Repository: py-pdf/pypdf
Branch: main
Commit: 04b0a38f56ad
Files: 207
Total size: 2.4 MB
Directory structure:
gitextract_mui37wu0/
├── .git-blame-ignore-revs
├── .github/
│ ├── ISSUE_TEMPLATE/
│ │ ├── bug-report.md
│ │ └── feature-request.md
│ ├── SECURITY.md
│ ├── dependabot.yaml
│ ├── scripts/
│ │ ├── check_gh_pages_updates.py
│ │ ├── check_pr_title.py
│ │ └── check_urls.py
│ └── workflows/
│ ├── benchmark.yaml
│ ├── create-github-release.yaml
│ ├── gh-pages-check.yaml
│ ├── github-ci.yaml
│ ├── publish-to-pypi.yaml
│ ├── release.yaml
│ ├── title-check.yaml
│ └── urls-check.yaml
├── .gitignore
├── .gitmodules
├── .pre-commit-config.yaml
├── .readthedocs.yaml
├── CHANGELOG.md
├── CONTRIBUTING.md
├── CONTRIBUTORS.md
├── LICENSE
├── Makefile
├── README.md
├── docs/
│ ├── Makefile
│ ├── _static/
│ │ └── releasing.drawio
│ ├── conf.py
│ ├── dev/
│ │ ├── cmaps.md
│ │ ├── deprecations.md
│ │ ├── documentation.md
│ │ ├── intro.md
│ │ ├── pdf-format.md
│ │ ├── pypdf-parsing.md
│ │ ├── pypdf-writing.md
│ │ ├── releasing.md
│ │ └── testing.md
│ ├── index.rst
│ ├── make.bat
│ ├── meta/
│ │ ├── changelog-v1.md
│ │ ├── comparisons.md
│ │ ├── faq.md
│ │ ├── history.md
│ │ ├── migration-1-to-2.md
│ │ ├── project-governance.md
│ │ ├── scope-of-pypdf.md
│ │ └── taking-ownership.md
│ ├── modules/
│ │ ├── Destination.rst
│ │ ├── DocumentInformation.rst
│ │ ├── Field.rst
│ │ ├── Fit.rst
│ │ ├── PageObject.rst
│ │ ├── PageRange.rst
│ │ ├── PaperSize.rst
│ │ ├── PdfDocCommon.rst
│ │ ├── PdfReader.rst
│ │ ├── PdfWriter.rst
│ │ ├── RectangleObject.rst
│ │ ├── Transformation.rst
│ │ ├── XmpInformation.rst
│ │ ├── annotations.rst
│ │ ├── constants.rst
│ │ ├── errors.rst
│ │ └── generic.rst
│ └── user/
│ ├── add-javascript.md
│ ├── add-watermark.md
│ ├── adding-pdf-annotations.md
│ ├── cropping-and-transforming.md
│ ├── encryption-decryption.md
│ ├── extract-images.md
│ ├── extract-text.md
│ ├── file-size.md
│ ├── forms.md
│ ├── handle-attachments.md
│ ├── handling-outlines.md
│ ├── installation.md
│ ├── merging-pdfs.md
│ ├── metadata.md
│ ├── pdf-version-support.md
│ ├── pdfa-compliance.md
│ ├── post-processing-in-text-extraction.md
│ ├── reading-pdf-annotations.md
│ ├── robustness.md
│ ├── security.md
│ ├── streaming-data.md
│ ├── suppress-warnings.md
│ └── viewer-preferences.md
├── make_release.py
├── pypdf/
│ ├── __init__.py
│ ├── _cmap.py
│ ├── _codecs/
│ │ ├── __init__.py
│ │ ├── _codecs.py
│ │ ├── adobe_glyphs.py
│ │ ├── core_font_metrics.py
│ │ ├── pdfdoc.py
│ │ ├── std.py
│ │ ├── symbol.py
│ │ └── zapfding.py
│ ├── _crypt_providers/
│ │ ├── __init__.py
│ │ ├── _base.py
│ │ ├── _cryptography.py
│ │ ├── _fallback.py
│ │ └── _pycryptodome.py
│ ├── _doc_common.py
│ ├── _encryption.py
│ ├── _font.py
│ ├── _page.py
│ ├── _page_labels.py
│ ├── _protocols.py
│ ├── _reader.py
│ ├── _text_extraction/
│ │ ├── __init__.py
│ │ ├── _layout_mode/
│ │ │ ├── __init__.py
│ │ │ ├── _fixed_width_page.py
│ │ │ ├── _text_state_manager.py
│ │ │ └── _text_state_params.py
│ │ └── _text_extractor.py
│ ├── _utils.py
│ ├── _version.py
│ ├── _writer.py
│ ├── annotations/
│ │ ├── __init__.py
│ │ ├── _base.py
│ │ ├── _markup_annotations.py
│ │ └── _non_markup_annotations.py
│ ├── constants.py
│ ├── errors.py
│ ├── filters.py
│ ├── generic/
│ │ ├── __init__.py
│ │ ├── _appearance_stream.py
│ │ ├── _base.py
│ │ ├── _data_structures.py
│ │ ├── _files.py
│ │ ├── _fit.py
│ │ ├── _image_inline.py
│ │ ├── _image_xobject.py
│ │ ├── _link.py
│ │ ├── _outline.py
│ │ ├── _rectangle.py
│ │ ├── _utils.py
│ │ └── _viewerpref.py
│ ├── pagerange.py
│ ├── papersizes.py
│ ├── py.typed
│ ├── types.py
│ └── xmp.py
├── pyproject.toml
├── requirements/
│ ├── ci-3.11.txt
│ ├── ci.in
│ ├── ci.txt
│ ├── dev.in
│ ├── dev.txt
│ ├── docs.in
│ └── docs.txt
├── resources/
│ ├── 010-pdflatex-forms.txt
│ ├── AEO.1172.layout.rot180.txt
│ ├── AEO.1172.layout.txt
│ ├── Claim Maker Alerts Guide_pg2.layout.txt
│ ├── Epic.Page.layout.txt
│ ├── afm_to_dataclass.py
│ ├── crazyones.txt
│ ├── crazyones_layout_vertical_space.txt
│ ├── crazyones_layout_vertical_space_font_height_weight.txt
│ ├── jpeg.txt
│ ├── multicolumn-lorem-ipsum.txt
│ └── toy.layout.txt
└── tests/
├── __init__.py
├── bench.py
├── conftest.py
├── example_files.yaml
├── generic/
│ ├── __init__.py
│ ├── test_base.py
│ ├── test_data_structures.py
│ ├── test_files.py
│ ├── test_image_inline.py
│ ├── test_image_xobject.py
│ └── test_link.py
├── scripts/
│ ├── __init__.py
│ ├── data/
│ │ └── commits__version_4_0_1.json
│ ├── test_example_files.py
│ └── test_make_release.py
├── test_annotations.py
├── test_appearance_stream.py
├── test_cmap.py
├── test_codecs.py
├── test_constants.py
├── test_doc_common.py
├── test_encryption.py
├── test_filters.py
├── test_font.py
├── test_forms.py
├── test_generic.py
├── test_images.py
├── test_javascript.py
├── test_merger.py
├── test_page.py
├── test_page_labels.py
├── test_pagerange.py
├── test_papersizes.py
├── test_pdfa.py
├── test_protocols.py
├── test_reader.py
├── test_text_extraction.py
├── test_utils.py
├── test_workflows.py
├── test_writer.py
├── test_xmp.py
└── utils.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .git-blame-ignore-revs
================================================
# This file helps us to ignore style / formatting / doc changes
# in git blame. That is useful when we're trying to find the root cause of an
# error.
# Docstring formatting
a89ff74d8c0203278a039d9496a3d8df4d134f84
# STY: Apply pre-commit (black, isort) + use snake_case variables (#832)
eef03d935dfeacaa75848b39082cf94d833d3174
# STY: Apply black and isort
baeb7d23278de0f8d00ca9f2b656bf0674f08937
# STY: Documentation, Variable names (#839)
444fca22836df061d9d23e71ffb7d68edcdfa766
================================================
FILE: .github/ISSUE_TEMPLATE/bug-report.md
================================================
---
name: Report a bug
about: Something broke!
title: ''
labels: Bug
assignees: ''
---
Replace this: What happened? What were you trying to achieve?
## Environment
Which environment were you using when you encountered the problem?
```bash
$ python -m platform
# TODO: Your output goes here
$ python -c "import pypdf;print(pypdf._debug_versions)"
# TODO: Your output goes here
```
## Code + PDF
This is a minimal, complete example that shows the issue:
```python
# TODO: Your code goes here
```
Share here the PDF file(s) that cause the issue. The smaller they are, the
better. Let us know if we may add them to our tests!
## Traceback
This is the complete traceback I see:
```
# TODO: Your traceback goes here (if applicable)
```
================================================
FILE: .github/ISSUE_TEMPLATE/feature-request.md
================================================
---
name: Request a Feature
about: What do you think is missing in pypdf?
title: ''
labels: Feature Request
assignees: ''
---
## Explanation
Explain briefly what you want to achieve.
## Code Example
How would your feature be used? (Remove this if it is not applicable.)
```python
from pypdf import PdfReader, PdfWriter
... # your new feature in action!
```
================================================
FILE: .github/SECURITY.md
================================================
# Security Policy
## Supported Versions
Security fixes are applied to the latest version.
## Reporting a Vulnerability
If you find a potential security issue, please report it using the
[private vulnerability reporting](https://docs.github.com/en/code-security/security-advisories/guidance-on-reporting-and-writing-information-about-vulnerabilities/privately-reporting-a-security-vulnerability) feature of GitHub to
automatically inform all relevant team members. Otherwise, please
get in touch with stefan6419846 through e-mail (current maintainer,
address in GitHub profile).
Please also have a look at our [corresponding user documentation](https://pypdf.readthedocs.io/en/stable/user/security.html),
which includes some information about possibly invalid reports.
We will try to find a fix in a timely manner and will then issue a security
advisory together with the update via GitHub, as well as requesting a CVE
([example](https://github.com/py-pdf/pypdf/security/advisories/GHSA-xcjx-m2pj-8g79)).
If you do not get a reaction within 30 days, please open a public issue on GitHub.
================================================
FILE: .github/dependabot.yaml
================================================
# Set update schedule for GitHub Actions
version: 2
updates:
- package-ecosystem: "github-actions"
directory: "/"
schedule:
interval: "daily"
commit-message:
prefix: "DEV"
================================================
FILE: .github/scripts/check_gh_pages_updates.py
================================================
"""Check that all GitHub pages JavaScript dependencies are up-to-date.""" # noqa: INP001
import base64
import hashlib
import json
import re
import sys
import urllib.request
from pathlib import Path
JSDELIVR_RE = re.compile(
r"(https://cdn\.jsdelivr\.net/npm/"
r"(?P<name>[^@/]+)@(?P<version>[^/]+)"
r"/(?P<path>[^\"']+))"
)
def fetch_json(url: str) -> dict:
"""Retrieve JSON data from the given URL."""
with urllib.request.urlopen(url, timeout=15) as resp: # noqa: S310 # Controlled input.
return json.load(resp)
def fetch_bytes(url: str) -> bytes:
"""Retrieve bytes data from the given URL."""
with urllib.request.urlopen(url, timeout=30) as resp: # noqa: S310 # Controlled input.
return resp.read()
def get_latest_version(pkg: str) -> str:
"""Get the latest version for this package."""
data = fetch_json(f"https://registry.npmjs.org/{pkg}")
return data["dist-tags"]["latest"]
def sri_hash(content: bytes) -> str:
"""Calculate the SRI hash for the given content."""
digest = hashlib.sha384(content).digest()
return "sha384-" + base64.b64encode(digest).decode("ascii")
def scan_html(path: Path) -> list[re.Match[str]]:
"""Scan the given HTML file for external JavaScript includes."""
text = path.read_text(encoding="utf-8", errors="ignore")
return list(JSDELIVR_RE.finditer(text))
def main() -> None:
"""Perform the checks."""
outdated_found = False
for html_path in sorted(Path("gh-pages").rglob("*.html"), key=str):
matches = scan_html(html_path)
if not matches:
continue
sys.stdout.write(f"\n📄 {html_path} ...\n\n")
for m in matches:
pkg = m.group("name")
current_version = m.group("version")
full_url = m.group(1)
try:
latest_version = get_latest_version(pkg)
except Exception as e:
sys.stdout.write(f" ⚠️ {pkg}: npm lookup failed ({e})\n")
continue
if current_version == latest_version:
sys.stdout.write(f" ✅ {pkg} {current_version}\n")
continue
outdated_found = True
latest_url = full_url.replace(
f"@{current_version}/", f"@{latest_version}/"
)
try:
latest_bytes = fetch_bytes(latest_url)
latest_sri = sri_hash(latest_bytes)
except Exception as e:
sys.stdout.write(f" ⚠️ {pkg}: failed to fetch latest file ({e})\n")
continue
sys.stdout.write(f" ❌ {pkg}\n")
sys.stdout.write(f" Current: {current_version}\n")
sys.stdout.write(f" Latest: {latest_version}\n")
sys.stdout.write(f" Latest SRI: {latest_sri}\n")
sys.stdout.write("\n")
if outdated_found:
sys.stdout.write("\n❗ Outdated dependencies detected\n")
sys.exit(1)
sys.stdout.write("\n🎉 All CDN dependencies are up to date\n")
if __name__ == "__main__":
main()
================================================
FILE: .github/scripts/check_pr_title.py
================================================
"""Check that all PR titles follow the desired scheme.""" # noqa: INP001
import os
import sys
KNOWN_PREFIXES = (
"SEC: ",
"BUG: ",
"ENH: ",
"DEP: ",
"PI: ",
"ROB: ",
"DOC: ",
"TST: ",
"DEV: ",
"STY: ",
"MAINT: ",
"REL: ", # For internal use only.
)
PR_TITLE = os.getenv("PR_TITLE", "")
if not PR_TITLE.startswith(KNOWN_PREFIXES) or not PR_TITLE.split(": ", maxsplit=1)[1]:
sys.stderr.write(
f"The PR title '{PR_TITLE}' does not follow the project's naming scheme: "
"https://pypdf.readthedocs.io/en/latest/dev/intro.html#commit-messages\n",
)
sys.stderr.write(
"If you do not know which one to choose or if multiple apply, make a best guess. "
"Nobody will complain if it does not quite fit :-)\n",
)
sys.exit(1)
else:
sys.stdout.write(f"PR title '{PR_TITLE}' appears to be valid.\n")
================================================
FILE: .github/scripts/check_urls.py
================================================
"""Check that all test data URLs are still accessible.""" # noqa: INP001
import ast
import sys
from collections.abc import Iterator
from operator import itemgetter
from pathlib import Path
from tests import _get_data_from_url, read_yaml_to_list_of_dicts
URL_PREFIXES_TO_IGNORE = (
"http://ns.adobe.com/tiff/1.0/",
"http://www.example.com",
"https://example.com",
"https://martin-thoma.com",
"https://pypdf.readthedocs.io/",
"https://www.example.com",
)
PDF_URLS_WHICH_DO_NOT_LOOK_LIKE_PDFS = {
"https://github.com/user-attachments/files/18381726/tika-957721.pdf",
}
def get_urls_from_test_files() -> Iterator[str]:
"""Retrieve all URLs defined in the test files."""
tests_directory = Path(__file__).parent.parent.parent / "tests"
for test_file in sorted(tests_directory.rglob("test_*.py")):
tree = ast.parse(source=test_file.read_text(encoding="utf-8"), filename=str(test_file))
for node in ast.walk(tree):
if not isinstance(node, ast.Constant):
continue
if not isinstance(node.value, str):
continue
if not node.value.startswith(("http://", "https://")):
continue
yield node.value
def get_urls_from_example_files() -> Iterator[str]:
"""Retrieve all URLs defined in the `example_files.yaml`."""
pdfs = read_yaml_to_list_of_dicts(Path(__file__).parent.parent.parent / "tests" / "example_files.yaml")
yield from map(itemgetter("url"), pdfs)
def check_url(url: str) -> bool:
"""Check if the given URL appears to still be valid."""
if url.startswith(URL_PREFIXES_TO_IGNORE):
return True
try:
data = _get_data_from_url(url)
except Exception as exception:
sys.stderr.write(f"Error getting data from {url}: {exception}\n")
return False
if len(data) < 75:
sys.stderr.write(f"Not enough data from {url}: {data}\n")
return False
if (
url.lower().endswith(".pdf") and
url not in PDF_URLS_WHICH_DO_NOT_LOOK_LIKE_PDFS and
not data.startswith(b"%PDF-")
):
sys.stderr.write(f"The file at {url} does not look like a PDF: {data[:50]}\n")
return False
sys.stdout.write(f"URL {url} looks good.\n")
return True
def main() -> bool:
"""Check if there are invalid URLs."""
urls: set[str] = set()
for url in get_urls_from_test_files():
urls.add(url)
for url in get_urls_from_example_files():
urls.add(url)
is_valid = True
for url in sorted(urls):
is_valid &= check_url(url)
return not is_valid
if __name__ == "__main__":
sys.exit(main())
================================================
FILE: .github/workflows/benchmark.yaml
================================================
name: Benchmarking pypdf
on:
push:
branches:
- main
permissions:
contents: write
deployments: write
jobs:
benchmark:
name: "Benchmark ${{ matrix.name }}"
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ['3.x']
include:
- python-version: '3.x'
name: 'CPython'
- python-version: 'pypy3.11'
name: 'PyPy 3.11'
steps:
- name: Checkout Code
uses: actions/checkout@v6
with:
submodules: 'recursive'
- name: Setup Python
uses: actions/setup-python@v6
with:
python-version: ${{ matrix.python-version }}
- name: Install requirements
run: |
pip install -r requirements/ci-3.11.txt
- name: Install pypdf
run: |
pip install .
- name: Run benchmark
run: |
pytest tests/bench.py --benchmark-json output.json
- name: Store benchmark result
uses: benchmark-action/github-action-benchmark@v1
with:
name: "${{ matrix.name }} Benchmark"
tool: 'pytest'
output-file-path: output.json
# Use personal access token instead of GITHUB_TOKEN due to https://github.community/t/github-action-not-triggering-gh-pages-upon-push/16096
github-token: ${{ secrets.GITHUB_TOKEN }}
auto-push: true
# Show alert with commit comment on detecting possible performance regression
alert-threshold: '200%'
comment-on-alert: true
fail-on-alert: true
================================================
FILE: .github/workflows/create-github-release.yaml
================================================
name: Create a GitHub release page
on:
push:
tags:
- '*.*.*'
workflow_dispatch:
permissions:
contents: write
jobs:
build_and_publish:
name: Create a GitHub release page
runs-on: ubuntu-latest
steps:
- name: Checkout Repository
uses: actions/checkout@v6
- name: Prepare variables
id: prepare_variables
run: |
git fetch --tags --force
latest_tag=$(git describe --tags --abbrev=0)
echo "latest_tag=${latest_tag}" >> "$GITHUB_ENV"
echo "date=$(date +'%Y-%m-%d')" >> "$GITHUB_ENV"
EOF=$(dd if=/dev/urandom bs=15 count=1 status=none | base64)
echo "tag_body<<$EOF" >> "$GITHUB_ENV"
git --no-pager tag -l "${latest_tag}" --format='%(contents:body)' >> "$GITHUB_ENV"
echo "$EOF" >> "$GITHUB_ENV"
- name: Create GitHub Release 🚀
uses: softprops/action-gh-release@v2
with:
tag_name: ${{ env.latest_tag }}
name: Version ${{ env.latest_tag }}, ${{ env.date }}
draft: false
prerelease: false
body: ${{ env.tag_body }}
================================================
FILE: .github/workflows/gh-pages-check.yaml
================================================
name: 'GitHub Pages Check'
on:
workflow_dispatch:
schedule:
- cron: 0 6 * * 1
jobs:
url-check:
name: GitHub Pages check
runs-on: ubuntu-latest
steps:
- name: Checkout GitHub Pages
uses: actions/checkout@v6
with:
ref: 'gh-pages'
path: 'gh-pages'
- name: Checkout main (tools)
uses: actions/checkout@v6
with:
ref: main
path: main
- name: Setup Python
uses: actions/setup-python@v6
with:
python-version: '3.x'
- name: Check GitHub Pages
run: |
export PYTHONPATH="$GITHUB_WORKSPACE"
python main/.github/scripts/check_gh_pages_updates.py
================================================
FILE: .github/workflows/github-ci.yaml
================================================
# This workflow will install Python dependencies, run tests and lint with a variety of Python versions
# For more information see: https://docs.github.com/en/actions/tutorials/build-and-test-code/python
name: CI
on:
push:
branches:
- main
paths-ignore:
- '**/*.md'
- '**/*.rst'
pull_request:
branches:
- main
paths-ignore:
- '**/*.md'
- '**/*.rst'
workflow_dispatch:
jobs:
test_windows:
name: pytest on windows
runs-on: windows-latest
steps:
- name: Checkout Code
uses: actions/checkout@v6
with:
submodules: 'recursive'
- name: Cache Downloaded Files
id: cache-downloaded-files-windows
uses: actions/cache@v5
if: github.ref == 'refs/heads/main'
with:
path: '**/tests/pdf_cache/*'
key: cache-downloaded-files-main-${{ github.run_id }}
restore-keys: |
cache-downloaded-files-main-
cache-downloaded-files
enableCrossOsArchive: true
- name: Restore Downloaded Files
uses: actions/cache/restore@v5
if: github.ref != 'refs/heads/main'
with:
path: '**/tests/pdf_cache/*'
key: cache-downloaded-files-main-
restore-keys: |
cache-downloaded-files-main-
cache-downloaded-files
enableCrossOsArchive: true
- name: Setup Python
uses: actions/setup-python@v6
with:
python-version: '3.x'
allow-prereleases: true
- name: Upgrade pip
run: |
python -m pip install --upgrade pip
- name: Install requirements (Python 3.11+)
run: |
pip install -r requirements/ci-3.11.txt
- name: Install cryptography
run: |
pip install cryptography
- name: Install pypdf
run: |
pip install .
- name: Prepare
run: |
python -c "from tests import download_test_pdfs; download_test_pdfs()"
- name: Test with pytest
run: |
python -m pytest tests --cov=pypdf --cov-append -n auto -vv -p no:benchmark
test_macos:
name: pytest on macOS
runs-on: macos-latest
steps:
- name: Checkout Code
uses: actions/checkout@v6
with:
submodules: 'recursive'
- name: Cache Downloaded Files
id: cache-downloaded-files-mac
uses: actions/cache@v5
if: github.ref == 'refs/heads/main'
with:
path: '**/tests/pdf_cache/*'
key: cache-downloaded-files-main-${{ github.run_id }}
restore-keys: |
cache-downloaded-files-main-
cache-downloaded-files
- name: Restore Downloaded Files
uses: actions/cache/restore@v5
if: github.ref != 'refs/heads/main'
with:
path: '**/tests/pdf_cache/*'
key: cache-downloaded-files-main-
restore-keys: |
cache-downloaded-files-main-
cache-downloaded-files
- name: Setup Python (3.11+)
uses: actions/setup-python@v6
with:
python-version: '3.x'
allow-prereleases: true
- name: Upgrade pip
run: |
python -m pip install --upgrade pip
- name: Install requirements (Python 3.11+)
run: |
pip install -r requirements/ci-3.11.txt
- name: Install cryptography
run: |
pip install cryptography
- name: Install OS dependencies
run:
brew install ghostscript jbig2dec poppler
- name: Install pypdf
run: |
pip install .
- name: Prepare
run: |
python -c "from tests import download_test_pdfs; download_test_pdfs()"
- name: Test with pytest
run: |
python -m pytest tests --cov=pypdf --cov-append -n auto -vv -p no:benchmark
tests:
name: "pytest on ${{ matrix.python-version }} (crypto-lib: ${{ matrix.use-crypto-lib }})"
runs-on: ubuntu-24.04
strategy:
matrix:
python-version: ['3.9', '3.10', '3.11', '3.12', '3.13', '3.14', 'pypy3.11']
use-crypto-lib: ['cryptography']
include:
- python-version: '3.9'
use-crypto-lib: 'pycryptodome'
- python-version: '3.9'
use-crypto-lib: 'none'
steps:
- name: Update APT packages
run:
sudo apt-get update
- name: Install APT dependencies
run:
sudo apt-get install ghostscript jbig2dec poppler-utils
- name: Checkout Code
uses: actions/checkout@v6
with:
submodules: 'recursive'
- name: Cache Downloaded Files
id: cache-downloaded-files
uses: actions/cache@v5
if: github.ref == 'refs/heads/main'
with:
path: '**/tests/pdf_cache/*'
key: cache-downloaded-files-main-${{ github.run_id }}
restore-keys: |
cache-downloaded-files-main-
cache-downloaded-files
- name: Restore Downloaded Files
uses: actions/cache/restore@v5
if: github.ref != 'refs/heads/main'
with:
path: '**/tests/pdf_cache/*'
key: cache-downloaded-files-main-
restore-keys: |
cache-downloaded-files-main-
cache-downloaded-files
- name: Setup Python
uses: actions/setup-python@v6
if: matrix.python-version == '3.9' || matrix.python-version == '3.10'
with:
python-version: ${{ matrix.python-version }}
cache: 'pip'
cache-dependency-path: '**/requirements/ci.txt'
- name: Setup Python (3.11+)
uses: actions/setup-python@v6
if: matrix.python-version != '3.9' && matrix.python-version != '3.10'
with:
python-version: ${{ matrix.python-version }}
allow-prereleases: true
cache: 'pip'
cache-dependency-path: '**/requirements/ci-3.11.txt'
- name: Upgrade pip
run: |
python -m pip install --upgrade pip
- name: Install requirements (Python 3)
run: |
pip install -r requirements/ci.txt
if: matrix.python-version == '3.9' || matrix.python-version == '3.10'
- name: Install requirements (Python 3.11+)
run: |
pip install -r requirements/ci-3.11.txt
if: matrix.python-version != '3.9' && matrix.python-version != '3.10'
- name: Remove pycryptodome and cryptography
run: |
pip uninstall pycryptodome cryptography -y
- name: Install cryptography
run: |
pip install cryptography
if: matrix.use-crypto-lib == 'cryptography'
- name: Install pycryptodome
run: |
pip install pycryptodome
if: matrix.use-crypto-lib == 'pycryptodome'
- name: Install pypdf
run: |
pip install .
- name: Download test files
run: |
python -c "from tests import download_test_pdfs; download_test_pdfs()"
- name: Test with pytest
run: |
python -m pytest tests --cov=pypdf --cov-append -n auto -vv -p no:benchmark
if: ${{ !startsWith(matrix.python-version, 'pypy') }}
- name: Test with pytest (PyPy, no coverage)
# Coverage on PyPy is skipped because running coverage with PyPy is slow and the CPython tests already provide
# complete coverage data for the same code.
run: |
python -m pytest tests -n auto -vv -p no:benchmark -o faulthandler_timeout=60 --dist=loadfile
if: ${{ startsWith(matrix.python-version, 'pypy') }}
- name: Rename coverage data file
run: mv .coverage ".coverage.$RANDOM"
if: ${{ !startsWith(matrix.python-version, 'pypy') }}
- name: Upload coverage data
uses: actions/upload-artifact@v7
if: ${{ !startsWith(matrix.python-version, 'pypy') }}
with:
name: coverage-data.${{ matrix.python-version }}-${{ matrix.use-crypto-lib }}
path: .coverage.*
if-no-files-found: ignore
include-hidden-files: true
codestyle:
name: Check code style issues
runs-on: ubuntu-24.04
steps:
- name: Checkout Code
uses: actions/checkout@v6
with:
submodules: 'recursive'
- name: Setup Python
uses: actions/setup-python@v6
with:
python-version: '3.x'
cache: 'pip'
cache-dependency-path: '**/requirements/ci-3.11.txt'
- name: Upgrade pip
run: |
python -m pip install --upgrade pip
- name: Install requirements
run: |
pip install -r requirements/ci-3.11.txt
- name: Install pypdf
run: |
pip install .
- name: Test with ruff
run: |
echo `ruff --version`
ruff check .
- name: Test with mypy
run: |
mypy pypdf
- name: Install docs requirements
run: |
pip install -r requirements/docs.txt
- name: Test docs build
working-directory: ./docs
run: |
sphinx-build --nitpicky --fail-on-warning --keep-going --show-traceback -d _build/doctrees --builder html . _build/html
- name: Test docs examples
working-directory: ./docs
run: |
sphinx-build -d _build/doctrees --builder doctest . _build/doctest
- name: Check with pre-commit
run: |
pip install -r requirements/dev.txt
pre-commit run --all-files --show-diff-on-failure
package:
name: Build & verify package
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- uses: actions/setup-python@v6
with:
python-version: '3.x'
- run: python -m pip install flit check-wheel-contents
- run: flit build
- run: ls -l dist
- name: Test CHANGELOG.md present in sdist
run: tar -tzf dist/*.tar.gz | grep -q 'CHANGELOG.md'
- name: Test of bdist
run: check-wheel-contents dist/*.whl
- name: Test installing package
run: python -m pip install .
- name: Test running installed package
working-directory: /tmp
run: python -c "import pypdf;print(pypdf.__version__)"
coverage:
name: Combine & check coverage.
runs-on: ubuntu-latest
needs: tests
steps:
- uses: actions/checkout@v6
- uses: actions/setup-python@v6
with:
python-version: '3.x'
- run: python -m pip install --upgrade coverage[toml]
- uses: actions/download-artifact@v8
with:
pattern: coverage-data*
merge-multiple: true
- name: Check Number of Downloaded Files
run: |
downloaded_files_count=$(find \.coverage* -type f | wc -l)
if [ $downloaded_files_count -eq 8 ]; then
echo "The expected number of files (8) were downloaded."
else
echo "ERROR: Expected 8 files, but found $downloaded_files_count files."
exit 1
fi
- name: Combine coverage & create xml report
run: |
python -m coverage combine
python -m coverage xml
- name: Upload Coverage to Codecov
uses: codecov/codecov-action@v5
with:
token: ${{ secrets.CODECOV_TOKEN }}
files: ./coverage.xml
================================================
FILE: .github/workflows/publish-to-pypi.yaml
================================================
name: Publish Python Package to PyPI
on:
push:
tags:
- '*.*.*'
workflow_dispatch:
jobs:
build:
name: Build distribution
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- name: Set up Python
uses: actions/setup-python@v6
with:
python-version: '3.x'
- name: Install pypa/build
run: >-
python3 -m
pip install
build
--user
- name: Build a binary wheel and a source tarball
run: python3 -m build
- name: Store the distribution packages
uses: actions/upload-artifact@v7
with:
name: python-package-distributions
path: dist/
publish-to-pypi:
name: Publish Python distribution to PyPI
needs:
- build
runs-on: ubuntu-latest
environment:
name: pypi
url: https://pypi.org/p/pypdf
permissions:
id-token: write # IMPORTANT: mandatory for trusted publishing
steps:
- name: Download all the dists
uses: actions/download-artifact@v8
with:
name: python-package-distributions
path: dist/
- name: Publish distribution to PyPI
uses: pypa/gh-action-pypi-publish@release/v1
================================================
FILE: .github/workflows/release.yaml
================================================
# This action assumes that there is a REL-commit which already has a
# Markdown-formatted git tag. Hence, the CHANGELOG is already adjusted,
# and it's decided what should be in the release.
# This action only ensures the release is done with the proper contents
# and that it's announced with a GitHub release.
name: Create git tag
# Disabled for now; uses a dummy `workflow_dispatch` trigger that we usually do not use anyway.
# To activate this again, we have to fix https://github.com/py-pdf/pypdf/issues/2753
on:
workflow_dispatch:
# push:
# branches:
# - main
permissions:
contents: write
env:
HEAD_COMMIT_MESSAGE: ${{ github.event.head_commit.message }}
jobs:
build_and_publish:
name: Publish a new version
runs-on: ubuntu-latest
if: "${{ startsWith(github.event.head_commit.message, 'REL: ') }}"
steps:
- name: Checkout Repository
uses: actions/checkout@v6
- name: Extract version from commit message
id: extract_version
run: |
VERSION=$(echo "$HEAD_COMMIT_MESSAGE" | grep -oP '(?<=REL: )\d+\.\d+\.\d+')
echo "version=$VERSION" >> $GITHUB_OUTPUT
- name: Extract tag message from commit message
id: extract_message
run: |
VERSION="${{ steps.extract_version.outputs.version }}"
delimiter="$(openssl rand -hex 8)"
MESSAGE=$(echo "$HEAD_COMMIT_MESSAGE" | sed "0,/REL: $VERSION/s///" )
echo "message<<${delimiter}" >> $GITHUB_OUTPUT
echo "$MESSAGE" >> $GITHUB_OUTPUT
echo "${delimiter}" >> $GITHUB_OUTPUT
- name: Create Git Tag
run: |
VERSION="${{ steps.extract_version.outputs.version }}"
MESSAGE="${{ steps.extract_message.outputs.message }}"
git config user.name github-actions
git config user.email github-actions@github.com
git tag "$VERSION" -m "$MESSAGE"
git push origin $VERSION
================================================
FILE: .github/workflows/title-check.yaml
================================================
name: 'PR Title Check'
on:
pull_request:
# check when PR
# * is created,
# * title is edited, and
# * new commits are added (to ensure failing title blocks merging)
types: [opened, reopened, edited, synchronize]
jobs:
title-check:
name: Title check
runs-on: ubuntu-latest
steps:
- name: Checkout Code
uses: actions/checkout@v6
- name: Check PR title
env:
PR_TITLE: ${{ github.event.pull_request.title }}
run: python .github/scripts/check_pr_title.py
================================================
FILE: .github/workflows/urls-check.yaml
================================================
name: 'URL Check'
on:
workflow_dispatch:
schedule:
- cron: 0 6 * * 1
jobs:
url-check:
name: URL check
runs-on: ubuntu-latest
steps:
- name: Checkout Code
uses: actions/checkout@v6
- name: Setup Python
uses: actions/setup-python@v6
with:
python-version: '3.x'
- name: Install requirements
run:
pip install pyyaml Pillow
- name: Check URLs
run: |
export PYTHONPATH="$GITHUB_WORKSPACE"
python .github/scripts/check_urls.py
================================================
FILE: .gitignore
================================================
*.pyc
*.swp
.DS_Store
.tox
build
.idea/*
*.egg-info/
dist/*
__pycache__/
# in-project virtual environments
venv/
.venv/
# Code coverage artifacts
.coverage*
coverage.xml
# Editors / IDEs
.vscode/
# Docs
docs/_build/
.cspell/
# Files generated by some of the scripts
dont_commit_*.pdf
pypdf-output.pdf
annotated-pdf-link.pdf
Image9.png
pypdf_pdfLocation.txt
.python-version
tests/pdf_cache/
docs/meta/CHANGELOG.md
docs/meta/CONTRIBUTORS.md
extracted-images/
RELEASE_COMMIT_MSG.md
RELEASE_TAG_MSG.md
.envrc
================================================
FILE: .gitmodules
================================================
[submodule "sample-files"]
path = sample-files
url = https://github.com/py-pdf/sample-files
================================================
FILE: .pre-commit-config.yaml
================================================
# pre-commit run --all-files
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v6.0.0
    hooks:
      - id: check-ast
      - id: check-case-conflict
      - id: check-docstring-first
      - id: check-yaml
      - id: debug-statements
      - id: end-of-file-fixer
        exclude: "resources/.*|docs/make.bat"
      - id: fix-byte-order-marker
      - id: trailing-whitespace
      - id: mixed-line-ending
        args: ['--fix=lf']
        exclude: "docs/make.bat"
      - id: check-added-large-files
        args: ['--maxkb=1000']
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.15.0
    hooks:
      - id: ruff-check
        args: ['--fix']
  - repo: https://github.com/asottile/pyupgrade
    rev: v3.21.2
    hooks:
      - id: pyupgrade
        args: [--py39-plus]
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.17.1
    hooks:
      - id: mypy
        files: ^pypdf/.*
================================================
FILE: .readthedocs.yaml
================================================
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
version: 2
build:
  os: ubuntu-lts-latest
  tools:
    python: "latest"
# Build documentation in the "docs/" directory with Sphinx
sphinx:
  configuration: docs/conf.py
# If using Sphinx, optionally build your docs in additional formats such as PDF
formats: all
# Optionally declare the Python requirements required to build your docs
python:
  install:
    - requirements: requirements/docs.txt
    - method: pip
      path: .
      extra_requirements:
        - full
================================================
FILE: CHANGELOG.md
================================================
# CHANGELOG
## Version 6.9.1, 2026-03-17
### Security (SEC)
- Improve performance and limit length of array-based content streams (#3686)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/6.9.0...6.9.1)
## Version 6.9.0, 2026-03-15
### New Features (ENH)
- Expose /Perms verification result on Encryption object (#3672)
### Performance Improvements (PI)
- Fix O(n²) performance in NameObject read/write (#3679)
- Batch-parse all objects in ObjStm on first access (#3677)
### Bug Fixes (BUG)
- Avoid sharing array-based content streams between pages (#3681)
- Avoid accessing invalid page when inserting blank page under some conditions (#3529)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/6.8.0...6.9.0)
## Version 6.8.0, 2026-03-09
### Security (SEC)
- Limit allowed `/Length` value of stream (#3675)
### New Features (ENH)
- Add /IRT (in-reply-to) support for markup annotations (#3631)
### Documentation (DOC)
- Avoid using `PageObject.replace_contents` on PdfReader (#3669)
- Document how to disable jbig2dec calls
[Full Changelog](https://github.com/py-pdf/pypdf/compare/6.7.5...6.8.0)
## Version 6.7.5, 2026-03-02
### Security (SEC)
- Improve the performance of the ASCIIHexDecode filter (#3666)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/6.7.4...6.7.5)
## Version 6.7.4, 2026-02-27
### Security (SEC)
- Allow limiting output length for RunLengthDecode filter (#3664)
### Robustness (ROB)
- Deal with invalid annotations in extract_links (#3659)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/6.7.3...6.7.4)
## Version 6.7.3, 2026-02-24
### Security (SEC)
- Use zlib decompression limit when retrieving XFA data (#3658)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/6.7.2...6.7.3)
## Version 6.7.2, 2026-02-22
### Security (SEC)
- Prevent infinite loop from circular xref /Prev references (#3655)
### Bug Fixes (BUG)
- Fix wrong LUT size error (#3651)
- Fix handling of page boxes defined on `/Pages` (#3650)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/6.7.1...6.7.2)
## Version 6.7.1, 2026-02-17
### Security (SEC)
- Detect cyclic references when accessing TreeObject.children (#3645)
- Limit size of `/ToUnicode` entries (#3646)
- Limit FlateDecode recovery attempts (#3644)
### Bug Fixes (BUG)
- Avoid own object replacement logic in `PageObject.replace_contents` (#3638)
- Fix UnboundLocalError when update_page_form_field_values with /Sig (#3634)
### Robustness (ROB)
- Avoid division by zero when decoding FlateDecode PNG prediction (#3641)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/6.7.0...6.7.1)
## Version 6.7.0, 2026-02-08
### Deprecations (DEP)
- Deprecate support for abbreviations in decode_stream_data (#3617)
### New Features (ENH)
- Add ability to add font resources for 14 Adobe Core fonts in text widget annotations (#3624)
### Bug Fixes (BUG)
- Avoid invalid load for ICCBased FlateDecode images in mode 1 (#3619)
### Robustness (ROB)
- Fix AESV2 decryption when /Length missing in encrypt dict (#3629)
- Fix merging when annotations point to NullObject (#3613)
- Check for `self._info` being None in `compress_identical_objects` (#3612)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/6.6.2...6.7.0)
## Version 6.6.2, 2026-01-26
### Security (SEC)
- Detect cyclic references when retrieving outlines (#3610)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/6.6.1...6.6.2)
## Version 6.6.1, 2026-01-25
### Robustness (ROB)
- `/AcroForm` might be NullObject (#3601)
- Handle missing font bounding boxes gracefully (#3600)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/6.6.0...6.6.1)
## Version 6.6.0, 2026-01-09
### Security (SEC)
- Improve handling of partially broken PDF files (#3594)
### Deprecations (DEP)
- Block common page content modifications when assigned to reader (#3582)
### New Features (ENH)
- Embellishments to generated text appearance streams (#3571)
### Bug Fixes (BUG)
- Do not consider multi-byte BOM-like sequences as BOMs (#3589)
### Robustness (ROB)
- Avoid empty FlateDecode outputs without warning (#3579)
### Documentation (DOC)
- Add outlines documentation and link it in User Guide (#3511)
### Developer Experience (DEV)
- Add PyPy 3.11 to test matrix and benchmarks (#3574)
### Maintenance (MAINT)
- Fix compatibility with Pillow >= 12.1.0 (#3590)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/6.5.0...6.6.0)
## Version 6.5.0, 2025-12-21
### New Features (ENH)
- Limit jbig2dec memory usage (#3576)
- FontDescriptor: Initiate from embedded font resource (#3551)
### Robustness (ROB)
- Allow fallback to PBM files for jbig2dec without PNG support (#3567)
- Use warning instead of error for early EOD for RunLengthDecode (#3548)
### Developer Experience (DEV)
- Test with macOS as well (#3401)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/6.4.2...6.5.0)
## Version 6.4.2, 2025-12-14
### Bug Fixes (BUG)
- Fix KeyError when flattening form field without /Font in resources (#3554)
### Robustness (ROB)
- Allow deleting non-existent annotations (#3559)
### Documentation (DOC)
- Fix level of attachment heading (#3560)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/6.4.1...6.4.2)
## Version 6.4.1, 2025-12-07
### Performance Improvements (PI)
- Optimize loop for layout mode text extraction (#3543)
### Bug Fixes (BUG)
- Do not fail on choice field without /Opt key (#3540)
### Documentation (DOC)
- Document possible issues with merge_page and clipping (#3546)
- Add some notes about library security (#3545)
### Maintenance (MAINT)
- Use CORE_FONT_METRICS for widths where possible (#3526)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/6.4.0...6.4.1)
## Version 6.4.0, 2025-11-23
### Security (SEC)
- Reduce default limit for LZW decoding
### New Features (ENH)
- Parse and format comb fields in text widget annotations (#3519)
### Robustness (ROB)
- Silently ignore Adobe Ascii85 whitespace for suffix detection (#3528)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/6.3.0...6.4.0)
## Version 6.3.0, 2025-11-16
### New Features (ENH)
- Wrap and align text in flattened PDF forms (#3465)
### Bug Fixes (BUG)
- Fix missing "PreventGC" when cloning (#3520)
- Preserve JPEG image quality by default (#3516)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/6.2.0...6.3.0)
## Version 6.2.0, 2025-11-09
### New Features (ENH)
- Add 'strict' parameter to PdfWriter (#3503)
### Bug Fixes (BUG)
- PdfWriter.append fails when there are articles being None (#3509)
### Documentation (DOC)
- Execute docs examples in CI (#3507)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/6.1.3...6.2.0)
## Version 6.1.3, 2025-10-22
### Security (SEC)
- Allow limiting size of LZWDecode streams (#3502)
- Avoid infinite loop when reading broken DCT-based inline images (#3501)
### Bug Fixes (BUG)
- PageObject.scale() scales media box incorrectly (#3489)
### Robustness (ROB)
- Fail with explicit exception when image mode is an empty array (#3500)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/6.1.2...6.1.3)
## Version 6.1.2, 2025-10-19
### Bug Fixes (BUG)
- Fix handling of zero-length StreamObject (#3485)
### Robustness (ROB)
- Deal with wrong size for incremental PDF files (#3495)
- Improve handling for malformed cross-reference tables (#3483)
### Developer Experience (DEV)
- Use released Python 3.14
- Use Mapping instead of dict in type hint of update_page_form_field_values (#3490)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/6.1.1...6.1.2)
## Version 6.1.1, 2025-09-28
### Bug Fixes (BUG)
- Insert new embedded files in a sorted manner (#3477)
- Fix name tree handling for embedded files with Kids-based inputs (#3475)
- Make embedding files not break PDF/A-3 compliance (#3472)
### Documentation (DOC)
- Document AFRelationship handling for PDF/A and provide constants (#3478)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/6.1.0...6.1.1)
## Version 6.1.0, 2025-09-21
### New Features (ENH)
- Enhance XMP metadata handling with creation and setter methods (#3410)
- Add all font metrics for base 14 Type 1 PDF fonts (#3363)
- Allow deleting embedded files (#3461)
- Add support for Python in FIPS mode for document identifier (#3438)
### Bug Fixes (BUG)
- Fix handling of UTF-16 encoded destination titles (#3463)
- Guard empty input to prevent IndexError (#3448)
### Developer Experience (DEV)
- Fix type hint for XMP metadata setter to add bytes type (#3464)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/6.0.0...6.1.0)
## Version 6.0.0, 2025-08-11
### Security (SEC)
- Limit decompressed size for FlateDecode filter (#3430)
### Deprecations (DEP)
- Drop Python 3.8 support (#3412)
### New Features (ENH)
- Move BlackIs1 functionality to tiff_header (#3421)
### Robustness (ROB)
- Skip Go-To actions without a destination (#3420)
### Developer Experience (DEV)
- Update code style related libraries (#3414)
- Update mypy to 1.17.0 (#3413)
- Stop testing on Python 3.8 and start testing on Python 3.14 (#3411)
### Maintenance (MAINT)
- Cleanup deprecations (#3424)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/5.9.0...6.0.0)
## Version 5.9.0, 2025-07-27
### New Features (ENH)
- Automatically preserve links in added pages (#3298)
- Allow writing/updating all properties of an embedded file (#3374)
### Bug Fixes (BUG)
- Fix XMP handling dropping indirect references (#3392)
### Robustness (ROB)
- Deal with DecodeParms being empty list (#3388)
### Documentation (DOC)
- Document how to read and modify XMP metadata (#3383)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/5.8.0...5.9.0)
## Version 5.8.0, 2025-07-13
### New Features (ENH)
- Implement flattening for writer (#3312)
### Bug Fixes (BUG)
- Unterminated object when using PdfWriter with incremental=True (#3345)
### Robustness (ROB)
- Resolve some image extraction edge cases (#3371)
- Ignore faulty trailing newline during RLE decoding (#3355)
- Gracefully handle odd-length strings in parse_bfchar (#3348)
### Developer Experience (DEV)
- Modernize license specifiers (#3338)
### Maintenance (MAINT)
- Reduce max-complexity of tool.ruff.lint.mccabe (#3365)
- Refactor text extraction code
[Full Changelog](https://github.com/py-pdf/pypdf/compare/5.7.0...5.8.0)
## Version 5.7.0, 2025-06-29
### Performance Improvements (PI)
- Performance optimization for LZW decoding (#3329)
### Robustness (ROB)
- Flate decoding for streams with faulty tail bytes (#3332)
- dc_creator could be a Bag as well (#3333)
- Handle tree being NullObject when retrieving named destinations (#3331)
### Maintenance (MAINT)
- Move inline-image mappings to constants (#3328)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/5.6.1...5.7.0)
## Version 5.6.1, 2025-06-22
### New Features (ENH)
- Add PDF/A XMP metadata support (#3314)
### Robustness (ROB)
- Deal with annotations not being lists on merge (#3321)
- Handle NullObject for cmap encoding Differences entry (#3317)
### Developer Experience (DEV)
- Update ruff to 0.12.0 (#3316)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/5.6.0...5.6.1)
## Version 5.6.0, 2025-06-01
### New Features (ENH)
- Add basic support for JBIG2 by using jbig2dec (#3163)
### Bug Fixes (BUG)
- Fix crashes by removing unnecessary line (#3293)
- Add delimiters to NameObject.renumber_table (#3286)
### Robustness (ROB)
- Handle DecodeParms being a NullObject (#3285)
### Code Style (STY)
- Update to mypy 1.16.0 (#3300)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/5.5.0...5.6.0)
## Version 5.5.0, 2025-05-11
### New Features (ENH)
- Add support for IndirectObject.__iter__ (#3228)
- Allow filtering by font when removing text (#3216)
### Bug Fixes (BUG)
- Add missing named destinations being ByteStringObjects (#3282)
- Get font information more reliably when removing text (#3252)
- T* 2D Translation consistent with PDF 1.7 Spec (#3250)
- Add font stack to q/Q operations in layout mode (#3225)
- Avoid completely hiding image loading issues like exceeding image size limits (#3221)
- Using compress_identical_objects on transformed content duplicates differing content (#3197)
- Consider BlackIs1 parameter for CCITTFaxDecode filter (#3196)
### Robustness (ROB)
- Deal with insufficient cm matrix during text extraction (#3283)
- Allow merging when annotations miss D entry (#3281)
- Fix merging documents if there are no Dests (#3280)
- Fix crash on malformed action in outline (#3278)
- Fix compression issues for removed images which might be None (#3246)
- Attempt to deal with non-rectangular FlateDecode streams (#3245)
- Handle some None values for broken PDF files (#3230)
### Developer Experience (DEV)
- Multiple style improvements
- Update ruff to 0.11.0
### Maintenance (MAINT)
- Conform ASCIIHexDecode implementation to specification (#3274)
- Modify comments of filters that do not use decode_parms (#3260)
### Code Style (STY)
- Simplify warnings & debugging in layout mode text extraction (#3271)
- Standardize mypy assert statements (#3276)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/5.4.0...5.5.0)
## Version 5.4.0, 2025-03-16
### New Features (ENH)
- Add support for `IndirectObject.__contains__` (#3155)
### Bug Fixes (BUG)
- Fix detection of inline images followed by names or numbers (#3173)
### Robustness (ROB)
- Consider root objects without catalog type as fallback (#3175)
- Raise proper error on infinite loop when reading objects (#3169)
### Documentation (DOC)
- Mention memory consumption of text extraction (#3168)
### Developer Experience (DEV)
- Upgrade to ruff 0.10.0 (#3191)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/5.3.1...5.4.0)
## Version 5.3.1, 2025-03-02
### Bug Fixes (BUG)
- Use the correct name StandardEncoding for the predefined cmap (#3156)
- Handle inline images containing `EI ` sequences (#3152)
- Fix check box value which should be name object (#3124)
- Fix stream position on inline image fallback extraction (#3120)
- Fix object count for incremental writer (#3117)
### Robustness (ROB)
- Avoid index errors on empty lines in xref table (#3162)
- Improve handling of LZW decoder table overflow (#3159)
- Ignore non-numbers for width when building font width map (#3158)
- Avoid negative seek values when reading partially broken files (#3157)
### Documentation (DOC)
- Fixed PageObject.images example usage for replacing image (#3149)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/5.3.0...5.3.1)
## Version 5.3.0, 2025-02-09
### New Features (ENH)
- Handle attachments in /Kids and provide object-oriented API (#3108)
### Bug Fixes (BUG)
- Handle annotations being None on merging (#3111)
### Robustness (ROB)
- Prevent excessive layout mode text output from Type3 fonts (#3082)
### Documentation (DOC)
- stefan6419846 becomes BDFL of pypdf (#3078)
- Tidy the visitor function description (#3086)
### Developer Experience (DEV)
- Remove ignoring multiple Ruff rules
- Remove unused mutmut configuration (#3092)
### Testing (TST)
- Fix warning assertions to use `pytest.warns()` (#3083)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/5.2.0...5.3.0)
## Version 5.2.0, 2025-01-26
### Deprecations (DEP)
- Deprecate with replacement CCITParameters (#3019)
- Correct deprecation of interiour_color (#2947)
### New Features (ENH)
- Support alternative (U)F names for embedded file retrieval (#3072)
- Adding support for reading .metadata.keywords (#2939)
### Bug Fixes (BUG)
- Handle further Tf operators in text extraction layout mode (#3073)
- Ensure `add_metadata` can deal with `_info = None` (#3040)
- Handle IndirectObject in CCITTFaxDecode filter (#2965)
- Handle chained colorspace for inline images when no filter is set (#3008)
- Avoid extracting inline images twice and dropping other operators (#3002)
- Fixed reference of value with `str.__new__` in TextStringObject (#2952)
- Handle indirect objects in font width calculations (#2967)
- Title sometimes is bytes and not str (#2930)
- Fix undefined variable for text extraction (regression) (#2934)
- Don't close stream passed to PdfWriter.write() (#2909)
### Robustness (ROB)
- Handle zero height fonts when extracting text (#3075)
- Deal with content streams not containing streams (#3005)
- Gracefully handle some text operators when the operands are missing (#3006)
- Fall back to non-Adobe Ascii85 format for missing end markers (#3007)
- Ignore odd-length strings when processing cmap lines (#3009)
- Skip annotation destination being NullObject in PdfWriter (#2964)
- Skip destination page being None in PdfWriter (#2963)
- Fix infinite loop case when reading null objects within an Array
- Fixing infinite loop in ArrayObject read_from_stream (#2928)
### Documentation (DOC)
- Add note about default line colors (#3014)
### Developer Experience (DEV)
- Remove ignoring Ruff rule PGH004 (#3071)
- Tidy ignore array in tool.ruff.lint (#3069)
- Move Windows CI to Python 3.13 (#3003)
- Move to Ubuntu 22.04 (#3004)
### Maintenance (MAINT)
- Fix formatting of warning message and include exception message (#3076)
- Narrow return type for `ContentStream.operations` (#2941)
### Testing (TST)
- Fix image similarity for upcoming Ubuntu 24.04 (#3039)
- Replace broken Apache Tika Corpora urls (#3041)
### Code Style (STY)
- Add form feed to WHITESPACES (#3054)
- Lots of small internal changes
[Full Changelog](https://github.com/py-pdf/pypdf/compare/5.1.0...5.2.0)
## Version 5.1.0, 2024-10-27
### New Features (ENH)
- Add `layout_mode_font_height_weight` argument to `PageObject.extract_text()` (#2920)
### Bug Fixes (BUG)
- Fix font specifier for FreeText annotation (#2893)
- Line breaks are not generated due to incorrect calculation of text leading (#2890)
- Improve handling of spaces in text extraction (#2882)
### Robustness (ROB)
- Soft failure for flate encode image mode 1 with wrong LUT size (#2900)
### Documentation (DOC)
- Use latest package versions (#2907)
- Correct example of reading FileAttachment annotation (#2906)
### Developer Experience (DEV)
- Update pinned requirements (#2918)
- Make make_release.py compatible with Windows environment (#2894)
### Maintenance (MAINT)
- Remove references to outdated Python versions (#2919)
- Generalize the method of obtaining space_code (#2891)
- Unnecessary character mapping process (#2888)
- New LZW decoding implementation (#2887)
### Testing (TST)
- Add LzwCodec for encoding (#2883)
### Code Style (STY)
- Capitalize error messages (#2903)
- Modify error messages in PdfWriter (#2902)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/5.0.1...5.1.0)
## Version 5.0.1, 2024-09-29
### New Features (ENH)
- Add `full` parameter to PdfWriter constructor (#2865)
### Bug Fixes (BUG)
- Update pyproject.toml with minimum Python version of 3.8 (#2859)
- Cope with unbalanced delimiters in dictionary object (#2878)
- Cope with encoding with too many differences (#2873)
- Missing spaces in extract_text() method (#1328) (#2868)
- Tolerate truncated files and no warning when jumping startxref (#2855)
### Robustness (ROB)
- Repair PDF with invalid Root object (#2880)
- Continue parsing dictionary object when error is detected (#2872)
- Merge documents with invalid pages in named destinations (#2857)
- Tolerate comments in arrays (#2856)
### Developer Experience (DEV)
- Use latest Python version for benchmarking (#2879)
### Maintenance (MAINT)
- Add tests to source distributions (#2874)
- Refactor _update_field_annotation (#2862)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/5.0.0...5.0.1)
## Version 5.0.0, 2024-09-15
This version drops support for Python 3.7 (not maintained since July 2023), PdfMerger (use PdfWriter instead) and AnnotationBuilder (use annotations instead).
### Deprecations (DEP)
- Remove the deprecated PdfMerger and AnnotationBuilder classes and other deprecations cleanup (#2813)
- Drop Python 3.7 support (#2793)
### New Features (ENH)
- Add capability to remove /Info from PDF (#2820)
- Add incremental capability to PdfWriter (#2811)
- Add UniGB-UTF16 encodings (#2819)
- Accept utf strings for metadata (#2802)
- Report PdfReadError instead of RecursionError (#2800)
- Compress PDF files merging identical objects (#2795)
### Bug Fixes (BUG)
- Fix sheared image (#2801)
### Robustness (ROB)
- Robustify .set_data() (#2821)
- Raise PdfReadError when missing /Root in trailer (#2808)
- Fix extract_text() issues on damaged PDFs (#2760)
- Handle images with empty data when processing an image from bytes (#2786)
### Developer Experience (DEV)
- Fix coverage uploads (#2832)
- Test against Python 3.13 (#2776)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/4.3.1...5.0.0)
## Version 4.3.1, 2024-07-21
### Bug Fixes (BUG)
- Cope with Matrix entry in field annotations (#2736)
### Robustness (ROB)
- Cope with fields with upside down box/rectangle (#2729)
### Maintenance (MAINT)
- Add deprecate_with_replacement to StreamObject.initializeFromD… (#2728)
- Deal with cryptography>=43 moving ARC4 (#2765)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/4.3.0...4.3.1)
## Version 4.3.0, 2024-06-23
### New Features (ENH)
- Accept ETen-B5 and UniCNS-UTF16 encodings (#2721)
- Add decode_as_image() to ContentStreams (#2615)
- context manager for PdfReader (#2666)
- Add capability to set font and size in fields (#2636)
- Allow to pass input file without named argument (#2576)
### Bug Fixes (BUG)
- Fix deprecation for Ressources when using old constants (#2705)
- Fix images issue 4 bits encoding and LUT starting with UTF16_BOM (#2675)
- Reading large compressed images takes huge time to process (#2644)
- Highlighted Text Cannot Be Printed (#2604)
- Fix UnboundLocalError on malformed pdf (#2619)
### Robustness (ROB)
- Cope with missing Standard 14 fonts in fields (#2677)
- Improve inline image extraction (#2622)
- Cope with loops in Fields tree (#2656)
- Discard /I in choice fields for compatibility with Acrobat (#2614)
- Cope with some issues in pillow (#2595)
- Cope with some image extraction issues (#2591)
### Documentation (DOC)
- Various improvements on docstrings and examples
### Maintenance (MAINT)
- Deprecate interiour_color with replacement interior_color (#2706)
- Add deprecate_with_replacement to PdfWriter.find_bookmark (#2674)
### Code Style (STY)
- Change Link to be a non-markup annotation (#2714)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/4.2.0...4.3.0)
## Version 4.2.0, 2024-04-07
### New Features (ENH)
- Allow multiple charsets for NameObject.read_from_stream (#2585)
- Add support for /Kids in page labels (#2562)
- Allow to update fields on many pages (#2571)
- Tolerate PDF with invalid xref pointed objects (#2335)
- Add Enforce from PDF2.0 in viewer_preferences (#2511)
- Add += and -= operators to ArrayObject (#2510)
### Bug Fixes (BUG)
- Fix merge_page sometimes generating unknown operator 'QQ' (#2588)
- Fix fields update where annotations are kids of field (#2570)
- Process CMYK images without a filter correctly (#2557)
- Extract text in layout mode without finding resources (#2555)
- Prevent recursive loop in some PDF files (#2505)
### Robustness (ROB)
- Tolerate "truncated" xref (#2580)
- Replace error by warning for EOD in RunLengthDecode/ASCIIHexDecode (#2334)
- Rebuild xref table if one entry is invalid (#2528)
- Robustify stream extraction (#2526)
### Documentation (DOC)
- Update release process for latest changes (#2564)
- Encryption/decryption: Clone document instead of copying all pages (#2546)
- Minor improvements (#2542)
- Update annotation list (#2534)
- Update references and formatting (#2529)
- Correct threads reference, plus minor changes (#2521)
- Minor readability increases (#2515)
- Simplify PaperSize examples (#2504)
- Minor improvements (#2501)
### Developer Experience (DEV)
- Remove unused dependencies (#2572)
- Remove page labels PR link from message (#2561)
- Fix changelog generator regarding whitespace and handling of "Other" group (#2492)
- Add REL to known PR prefixes (#2554)
- Release using the REL commit instead of git tag (#2500)
- Unify code between PdfReader and PdfWriter (#2497)
- Bump softprops/action-gh-release from 1 to 2 (#2514)
### Maintenance (MAINT)
- Ressources → Resources (and internal name childs) (#2550)
- Fix typos found by codespell (#2549)
- Update Read the Docs configuration (#2538)
- Add root_object, _info and _ID to PdfReader (#2495)
### Testing (TST)
- Allow loading truncated images if required (#2586)
- Fix download issues from #2562 (#2578)
- Improve test_get_contents_from_nullobject to show real use-case (#2524)
- Add missing test annotations (#2507)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/4.1.0...4.2.0)
## Version 4.1.0, 2024-03-03
Generating name objects (`NameObject`) without a leading slash
is considered deprecated now. Previously, just a plain warning
would be logged, leading to possibly invalid PDF files. According
to our deprecation policy, this will log a *DeprecationWarning*
for now.
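
Callers who build names from external input may want to normalize them before creating a `NameObject`. The following is a minimal standard-library sketch of that idea, not pypdf's own implementation; `ensure_leading_slash` is a hypothetical helper name introduced only for this illustration.

```python
import warnings


def ensure_leading_slash(name: str) -> str:
    """Hypothetical helper: return `name` as a PDF name with a leading slash."""
    if name.startswith("/"):
        return name
    # Mirror the idea of a deprecation notice instead of silently
    # producing a name that could lead to an invalid PDF file.
    warnings.warn(
        f"PDF name {name!r} lacks a leading '/'; adding it",
        DeprecationWarning,
        stacklevel=2,
    )
    return "/" + name


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    fixed = ensure_leading_slash("Title")
    unchanged = ensure_leading_slash("/Author")
```

The normalized string can then be passed on to `NameObject`, so the deprecated code path is never reached.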
### New Features (ENH)
- Add get_pages_from_field (#2494)
- Add reattach_fields function (#2480)
- Automatic access to pointed object for IndirectObject (#2464)
### Bug Fixes (BUG)
- Missing error on name without leading / (#2387)
- encode_pdfdocencoding() always returns bytes (#2440)
- BI in text content identified as image tag (#2459)
### Robustness (ROB)
- Missing basefont entry in type 3 font (#2469)
### Documentation (DOC)
- Improve lossless compression example (#2488)
- Amend robustness documentation (#2479)
### Developer Experience (DEV)
- Fix changelog for UTF-8 characters (#2462)
### Maintenance (MAINT)
- Add _get_page_number_from_indirect in writer (#2493)
- Remove user assignment for feature requests (#2483)
- Remove reference to old 2.0.0 branch (#2482)
### Testing (TST)
- Fix benchmark failures (#2481)
- Broken test due to expired test file URL (#2468)
- Resolve file naming conflict in test_iss1767 (#2445)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/4.0.2...4.1.0)
## Version 4.0.2, 2024-02-18
### Bug Fixes (BUG)
- Use NumberObject for /Border elements of annotations (#2451)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/4.0.1...4.0.2)
## Version 4.0.1, 2024-01-28
### Bug Fixes (BUG)
- layout mode text extraction ZeroDivisionError (#2417)
### Testing (TST)
- Skip tests using fpdf2 if it's not installed (#2419)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/4.0.0...4.0.1)
## Version 4.0.0, 2024-01-19
### Deprecations (DEP)
- Drop Python 3.6 support (#2369)
- Remove deprecated code (#2367)
- Remove deprecated XMP properties (#2386)
### New Features (ENH)
- Add "layout" mode for text extraction (#2388)
- Add Jupyter Notebook integration for PdfReader (#2375)
- Improve/rewrite PDF permission retrieval (#2400)
### Bug Fixes (BUG)
- PdfWriter.add_uri was setting the wrong type (#2406)
- Add support for GBK2K cmaps (#2385)
### Maintenance (MAINT)
- Return None instead of -1 when page is not attached (#2376)
- Complete FileSpecificationDictionaryEntries constants (#2416)
- Replace warning with logging.error (#2377)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/3.17.4...4.0.0)
## Version 3.17.4, 2023-12-24
### Bug Fixes (BUG)
- Handle IndirectObject as image filter (#2355)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/3.17.3...3.17.4)
## Version 3.17.3, 2023-12-17
### Robustness (ROB)
- Out-of-bounds issue in handle_tj (text extraction) (#2342)
### Developer Experience (DEV)
- Make make_release.py easier to configure (#2348)
### Maintenance (MAINT)
- Bump actions/download-artifact from 3 to 4 (#2344)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/3.17.2...3.17.3)
## Version 3.17.2, 2023-12-10
### Bug Fixes (BUG)
- Cope with deflated images with CMYK Black Only (#2322)
- Handle indirect objects as parameters for CCITTFaxDecode (#2307)
- check words length in _cmap type1_alternative function (#2310)
### Robustness (ROB)
- Relax flate decoding for too many lookup values (#2331)
- Let _build_destination skip in case of missing /D key (#2018)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/3.17.1...3.17.2)
## Version 3.17.1, 2023-11-14
### Bug Fixes (BUG)
- Mediabox expansion size when applying non-right angle rotation (#2282)
### Robustness (ROB)
- MissingWidth is IndirectObject (#2288)
- Initialize states array with an empty value (#2280)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/3.17.0...3.17.1)
## Version 3.17.0, 2023-10-29
### Security (SEC)
- Infinite recursion when using PdfWriter(clone_from=reader) (#2264)
### New Features (ENH)
- Add parameter to select images to be removed (#2214)
### Bug Fixes (BUG)
- Correctly handle image mode 1 with FlateDecode (#2249)
- Error when filling a value with parentheses #2268 (#2269)
- Handle empty root outline (#2239)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/3.16.4...3.17.0)
## Version 3.16.4, 2023-10-10
### Bug Fixes (BUG)
- Avoid exceeding recursion depth when retrieving image mode (#2251)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/3.16.3...3.16.4)
## Version 3.16.3, 2023-10-08
### Bug Fixes (BUG)
- Invalid cm/tm in visitor functions (#2206)
- Encrypt / decrypt Stream object dictionaries (#2228)
- Support nested color spaces for the /DeviceN color space (#2241)
- Images property fails if NullObject in list (#2215)
### Developer Experience (DEV)
- Unify mypy options and warn redundant workarounds (#2223)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/3.16.2...3.16.3)
## Version 3.16.2, 2023-09-24
### Bug Fixes (BUG)
- PDF size increases because of too high float writing precision (#2213)
- Fix test_watermarking_reportlab_rendering() (#2203)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/3.16.1...3.16.2)
## Version 3.16.1, 2023-09-17
⚠️ The 'rename PdfWriter.create_viewer_preference to
PdfWriter.create_viewer_preferences (#2190)' could be a breaking change for you,
if you use it. As it was only introduced last week I'm confident enough that
nobody will be affected though. Hence only the patch update.
### Bug Fixes (BUG)
- Missing new line in extract_text with cm operations (#2142)
- _get_fonts not processing properly CIDFonts and annotations (#2194)
### Maintenance (MAINT)
- Rename PdfWriter.create_viewer_preference to PdfWriter.create_viewer_preferences (#2190)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/3.16.0...3.16.1)
## Version 3.16.0, 2023-09-10
### Security (SEC)
- Infinite recursion caused by IndirectObject clone (#2156)
### New Features (ENH)
- Ease access to ViewerPreferences (#2144)
### Bug Fixes (BUG)
- Catch the case where w[0] is an IndirectObject instead of an int (#2154)
- Cope with indirect objects in filters and remove deprecated code (#2177)
- Accept tabs in cmaps (#2174) / cope with extra space (#2151)
- Merge pages without resources (#2150)
- getcontents() shall return None if contents is NullObject (#2161)
- Fix conversion from 1 to LA (#2175)
### Robustness (ROB)
- Accept XYZ with no arguments (#2178)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/3.15.5...3.16.0)
## Version 3.15.5, 2023-09-03
### Bug Fixes (BUG)
- Cope with missing /I in articles (#2134)
- Fix image look-up table in EncodedStreamObject (#2128)
- remove_images not operating in sub level forms (#2133)
### Robustness (ROB)
- Cope with damaged PDF (#2129)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/3.15.4...3.15.5)
## Version 3.15.4, 2023-08-27
### Performance Improvements (PI)
- Making pypdf as fast as pdfrw (#2086)
### Maintenance (MAINT)
- Relax typing_extensions version (#2104)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/3.15.3...3.15.4)
## Version 3.15.3, 2023-08-26
### Bug Fixes (BUG)
- Check version of crypt provider (#2115)
- TypeError: can't concat str to bytes (#2114)
- Require flit_core >= 3.9 (#2091)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/3.15.2...3.15.3)
## Version 3.15.2, 2023-08-20
### Security (SEC)
- Avoid endless recursion of reading damaged PDF file (#2093)
### Performance Improvements (PI)
- Reuse content stream (#2101)
### Maintenance (MAINT)
- Make ParseError inherit from PyPdfError (#2097)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/3.15.1...3.15.2)
## Version 3.15.1, 2023-08-13
### Performance Improvements (PI)
- optimize _decode_png_prediction (#2068)
### Bug Fixes (BUG)
- Fix incorrect tm_matrix in call to visitor_text (#2060)
- Writing German characters into form fields (#2047)
- Prevent stall when accessing image in corrupted pdf (#2081)
- append() fails when articles do not have /T (#2080)
### Robustness (ROB)
- Cope with xref not followed by separator (#2083)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/3.15.0...3.15.1)
## Version 3.15.0, 2023-08-06
### New Features (ENH)
- Add `level` parameter to compress_content_streams (#2044)
- Process /uniHHHH for text_extract (#2043)
### Bug Fixes (BUG)
- Fix AnnotationBuilder.link (#2066)
- JPX image without ColorSpace (#2062)
- Added check for field /Info when cloning reader document (#2055)
- Fix indexed/CMYK images (#2039)
### Maintenance (MAINT)
- Cryptography as primary dependency (#2053)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/3.14.0...3.15.0)
## Version 3.14.0, 2023-07-29
### New Features (ENH)
- Accelerate image list keys generation (#2014)
- Use `cryptography` for encryption/decryption as a fallback for PyCryptodome (#2000)
- Extract LaTeX characters (#2016)
- ASCIIHexDecode.decode now returns bytes instead of str (#1994)
### Bug Fixes (BUG)
- Add RunLengthDecode filter (#2012)
- Process /Separation ColorSpace (#2007)
- Handle single element ColorSpace list (#2026)
- Process lookup decoded as TextStringObjects (#2008)
### Robustness (ROB)
- Cope with garbage collector during cloning (#1841)
### Maintenance (MAINT)
- Cleanup of annotations (#1745)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/3.13.0...3.14.0)
## Version 3.13.0, 2023-07-23
### New Features (ENH)
- Add is_open in outlines in PdfReader and PdfWriter (#1960)
### Bug Fixes (BUG)
- Search /DA in hierarchy fields (#2002)
- Cope with different ISO date length (#1999)
- Decode Black only/CMYK deviceN images (#1984)
- Process CMYK in deflate images (#1977)
### Developer Experience (DEV)
- Add mypy to pre-commit (#2001)
- Release automation (#1991, #1985)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/3.12.2...3.13.0)
## Version 3.12.2, 2023-07-16
### Bug Fixes (BUG)
- Accept calRGB and calGray color_spaces (#1968)
- Process 2bits and 4bits images (#1967)
- Check for AcroForm and ensure it is not None (#1965)
### Developer Experience (DEV)
- Automate the release process (#1970)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/3.12.1...3.12.2)
## Version 3.12.1, 2023-07-09
### Bug Fixes (BUG)
- Prevent updating page contents after merging page (stamping/watermarking) (#1952)
- % to be hex encoded in names (#1958)
- Inverse color in CMYK images (#1947)
- Dates conversion not working with Z00'00' (#1946)
- Support UTF-16-LE Strings (#1884)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/3.12.0...3.12.1)
## Version 3.12.0, 2023-07-02
### New Features (ENH)
- Add AES support for encrypting PDF files (#1918, #1935, #1936, #1938)
- Add page deletion feature to PdfWriter (#1843)
### Bug Fixes (BUG)
- PdfReader.get_fields() attempts to delete non-existing index "/Off" (#1933)
- Remove unused objects when cloning_from (#1926)
- Add the TK.SIZE into the trailer (#1911)
- add_named_destination() maintains named destination list sort order (#1930)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/3.11.1...3.12.0)
## Version 3.11.1, 2023-06-25
### Bug Fixes (BUG)
- Cascaded filters in image objects (#1913)
- Append pdf with named destination using numbers for pages (#1858)
- Ignore "/B" fields only on pages in PdfWriter.append() (#1875)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/3.11.0...3.11.1)
## Version 3.11.0, 2023-06-23
### New Features (ENH)
- Add page_number property (#1856)
### Bug Fixes (BUG)
- File expansion when updating with Page Contents (#1906)
- Missing Alternate in indexed/ICCbased colorspaces (#1896)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/3.10.0...3.11.0)
## Version 3.10.0, 2023-06-18
### New Features (ENH)
- Extraction of inline images (#1850)
- Add capability to replace image (#1849)
- Extend images interface by returning an ImageFile(File) class (#1848)
- Add set_data to EncodedStreamObject (#1854)
### Bug Fixes (BUG)
- Fix RGB FlateEncode Images(PNG) and transparency (#1834)
- Generate static appearance for fields (#1864)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/3.9.1...3.10.0)
## Version 3.9.1, 2023-06-04
### Deprecations (DEP)
- Deprecate PdfMerger (#1866)
### Bug Fixes (BUG)
- Ignore UTF-8 decode errors (#1865)
### Robustness (ROB)
- Handle missing /Type entry in Page tree (#1859)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/3.9.0...3.9.1)
## Version 3.9.0, 2023-05-21
### New Features (ENH)
- Simplify metadata input (Document Information Dictionary) (#1851)
- Extend cmap compatibility to GBK_EUC_H/V (#1812)
### Bug Fixes (BUG)
- Prevent infinite loop when no character follows after a comment (#1828)
- get_contents does not return ContentStream (#1847)
- Accept XYZ destination with zoom missing (default to zoom=0.0) (#1844)
- Cope with 1 Bit images (#1815)
### Robustness (ROB)
- Handle missing /Type entry in Page tree (#1845)
### Documentation (DOC)
- Expand file size explanations (#1835)
- Add comparison with pdfplumber (#1837)
- Clarify that PyPDF2 is dead (#1827)
- Add Hunter King as Contributor for #1806
### Maintenance (MAINT)
- Refactor internal Encryption class (#1821)
- Add R parameter to generate_values (#1820)
- Make encryption_key parameter of write_to_stream optional (#1819)
- Prepare for adding AES encryption support (#1818)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/3.8.1...3.9.0)
## Version 3.8.1, 2023-04-23
### Bug Fixes (BUG)
- Convert color space before saving (#1802)
### Documentation (DOC)
- PDF/A (#1807)
- Use append instead of add_page
- Document core mechanics of pypdf (#1783)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/3.8.0...3.8.1)
## Version 3.8.0, 2023-04-16
### New Features (ENH)
- Add transform method to Transformation class (#1765)
- Cope with UC2 fonts in text_extraction (#1785)
### Robustness (ROB)
- Invalid startxref pointing 1 char before (#1784)
### Maintenance (MAINT)
- Mark code handling old parameters as deprecated (#1798)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/3.7.1...3.8.0)
## Version 3.7.1, 2023-04-09
### Security (SEC)
- Warn about PDF encryption security (#1755)
### Robustness (ROB)
- Prevent loop in Cloning (#1770)
- Capture UnicodeDecodeError at PdfReader.pdf_header (#1768)
### Documentation (DOC)
- Add .readthedocs.yaml and bump docs dependencies using `tox -e deps` (#1750, #1752)
### Developer Experience (DEV)
- Make make_changelog.py idempotent
### Maintenance (MAINT)
- Move generation of file identifiers to a method (#1760)
### Testing (TST)
- Add xmp test (#1775)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/3.7.0...3.7.1)
## Version 3.7.0, 2023-03-26
### Security (SEC)
- Use Python's secrets module instead of random module (#1748)
### New Features (ENH)
- Add AnnotationBuilder.highlight text markup annotation (#1740)
- Add AnnotationBuilder.popup (#1665)
- Add AnnotationBuilder.polyline annotation support (#1726)
- Add clone_from parameter in PdfWriter constructor (#1703)
### Bug Fixes (BUG)
- 'DictionaryObject' object has no attribute 'indirect_reference' (#1729)
### Robustness (ROB)
- Handle params NullObject in decode_stream_data (#1738)
### Documentation (DOC)
- Project scope (#1743)
### Maintenance (MAINT)
- Add AnnotationFlag (#1746)
- Add LazyDict.__str__ (#1727)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/3.6.0...3.7.0)
## Version 3.6.0, 2023-03-18
### New Features (ENH)
- Extend PdfWriter.append() to PageObjects (#1704)
- Support qualified names in update_page_form_field_values (#1695)
### Robustness (ROB)
- Tolerate streams without length field (#1717)
- Accept DictionaryObject in /D of NamedDestination (#1720)
- Widths def in cmap calls IndirectObject (#1719)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/3.5.2...3.6.0)
## Version 3.5.2, 2023-03-12
⚠️ We discovered that compress_content_streams has to be applied to a page of
the PdfWriter. It must not be applied to a page of the PdfReader!
### Bug Fixes (BUG)
- compress_content_stream not readable in Adobe Acrobat (#1698)
- Pass logging parameters correctly in set_need_appearances_writer (#1697)
- Write /Root/AcroForm in set_need_appearances_writer (#1639)
### Robustness (ROB)
- Allow more whitespaces within linearized file (#1701)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/3.5.1...3.5.2)
## Version 3.5.1, 2023-03-05
### Robustness (ROB)
- Some attributes not copied in DictionaryObject._clone (#1635)
- Allow merging pages with annots multiple times (#1624)
### Testing (TST)
- Replace pytest.mark.external by enable_socket (#1657)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/3.5.0...3.5.1)
## Version 3.5.0, 2023-02-26
### New Features (ENH)
- Add reader.attachments public interface (#1611, #1661)
- Add PdfWriter.remove_objects_from_page(page: PageObject, to_delete: ObjectDeletionFlag) (#1648)
- Allow free-text annotation to have transparent border/background (#1664)
### Bug Fixes (BUG)
- Allow decryption with empty password for AlgV5 (#1663)
- Let PdfWriter.pages return PageObject after calling `clone_document_from_reader()` (#1613)
- Invalid font pointed during merge_resources (#1641)
### Robustness (ROB)
- Cope with invalid objects in IndirectObject.clone (#1637)
- Improve tolerance to invalid Names/Dests (#1658)
- Decode encoded values in get_fields (#1636)
- Let PdfWriter.merge cope with missing "/Fields" (#1628)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/3.4.1...3.5.0)
## Version 3.4.1, 2023-02-12
### Bug Fixes (BUG)
- Switch from trimbox to cropbox when merging pages (#1622)
- Text extraction not working with one glyph to char sequence (#1620)
### Robustness (ROB)
- Fix 2 cases of "object has no attribute 'indirect_reference'" (#1616)
### Testing (TST)
- Add multiple retry on get_url for external PDF downloads (#1626)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/3.4.0...3.4.1)
## Version 3.4.0, 2023-02-05
NOTICE: pypdf changed the way it represents numbers parsed from PDF files.
pypdf<3.4.0 represented numbers as Decimal, pypdf>=3.4.0 represents them as
floats. Several other PDF libraries do this, as well as many PDF viewers.
We hope this fixes issues caused by excessive precision and gives a speed boost.
In case your PDF documents rely on more than 18 decimals of precision you
should check if they still work as expected.
To clarify: This does not affect the text shown in PDF documents. It affects
numbers, e.g. when graphics are drawn on the PDF or very exact positions are
used. Typically, 5 decimals should be enough.
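The precision difference described in this notice can be illustrated with a small sketch. It uses only Python's standard library and does not call pypdf; the sample value is made up for illustration.

```python
from decimal import Decimal

# A made-up coordinate with more digits than a float can hold.
raw = "0.12345678901234567890123"

as_decimal = Decimal(raw)  # keeps every digit (how pypdf<3.4.0 stored numbers)
as_float = float(raw)      # IEEE 754 double (how pypdf>=3.4.0 stores numbers)

# The Decimal round-trips exactly; the float is only the nearest double.
decimal_is_exact = str(as_decimal) == raw
float_is_exact = Decimal(as_float) == as_decimal

# Rounded to 5 decimals, both representations agree.
same_at_5_decimals = round(as_float, 5) == float(round(as_decimal, 5))

print(decimal_is_exact, float_is_exact, same_at_5_decimals)
```

Only documents that depend on the digits lost in the float conversion should behave differently.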
### New Features (ENH)
- Enable merging forms with overlapping names (#1553)
- Add 'over' parameter to merge_transformed_page & co (#1567)
### Bug Fixes (BUG)
- Fix getter of the PageObject.rotation property with an indirect object (#1602)
- Restore merge_transformed_page & co (#1567)
- Replace decimal by float (#1563)
### Robustness (ROB)
- PdfWriter.remove_images: /Contents might not be in page_ref (#1598)
### Developer Experience (DEV)
- Introduce ruff (#1586, #1609)
### Maintenance (MAINT)
- Remove decimal (#1608)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/3.3.0...3.4.0)
## Version 3.3.0, 2023-01-22
### New Features (ENH)
- Add page label support to PdfWriter (#1558)
- Accept inline images with space before EI (#1552)
- Add circle annotation support (#1556)
- Add polygon annotation support (#1557)
- Make merging pages produce a deterministic PDF (#1542, #1543)
### Bug Fixes (BUG)
- Fix error in cmap extraction (#1544)
- Remove erroneous assertion check (#1564)
- Fix dictionary access of optional page label keys (#1562)
### Robustness (ROB)
- Set ignore_eof=True for read_until_regex (#1521)
### Documentation (DOC)
- Paper size (#1550)
### Developer Experience (DEV)
- Fix broken combination of dependencies of docs.txt
- Annotate tests appropriately (#1551)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/3.2.1...3.3.0)
## Version 3.2.1, 2023-01-08
### Bug Fixes (BUG)
- Accept hierarchical fields (#1529)
### Documentation (DOC)
- Use google style docstrings (#1534)
- Fix linked markdown documents (#1537)
### Developer Experience (DEV)
- Update docs config (#1535)
## Version 3.2.0, 2022-12-31
### Performance Improvements (PI)
- Help the specializing adaptive interpreter (#1522)
### New Features (ENH)
- Add support for page labels (#1519)
### Bug Fixes (BUG)
- upgrade clone_document_root (#1520)
[Full Changelog](https://github.com/py-pdf/pypdf/compare/3.1.0...3.2.0)
## Version 3.1.0, 2022-12-23
Move PyPDF2 to pypdf (#1513). The name is now all lowercase, with no number,
both for installation and for import. PyPDF2 will no longer receive updates.
The community should move back to its roots.
If you were still using pyPdf or PyPDF2 < 2.0.0, I recommend reading the
migration guide: https://pypdf.readthedocs.io/en/latest/user/migration-1-to-2.html
pypdf==3.1.0 is only different from PyPDF2==3.0.0 in the package name.
Replacing "PyPDF2" by "pypdf" should be enough if you migrate from
`PyPDF2==3.0.0` to `pypdf==3.1.0`.
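The rename described above can be sketched as a plain text substitution. The helper `rename_package` below is hypothetical and not part of pypdf; it replaces every standalone `PyPDF2` token, so it would also touch comments and string literals that mention the old name.

```python
import re


def rename_package(source: str) -> str:
    """Replace every standalone 'PyPDF2' token with 'pypdf' (hypothetical helper)."""
    return re.sub(r"\bPyPDF2\b", "pypdf", source)


# A made-up snippet that still uses the old package name.
old = "import PyPDF2\nreader = PyPDF2.PdfReader('example.pdf')\n"
new = rename_package(old)
print(new)
```

Remember to change the installed package as well, since the name changed for installation too.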
[Full Changelog](https://github.com/py-pdf/pypdf/compare/3.0.0...3.1.0)
## Version 3.0.0, 2022-12-22
### BREAKING CHANGES ⚠️
- Deprecate features with PyPDF2==3.0.0 (#1489)
- Refactor Fit / Zoom parameters (#1437)
### New Features (ENH)
- Add Cloning (#1371)
- Allow int for indirect_reference in PdfWriter.get_object (#1490)
### Documentation (DOC)
- How to read PDFs from S3 (#1509)
- Make MyST parse all links as simple hyperlinks (#1506)
- Changed 'latest' for 'stable' generated docs (#1495)
- Adjust deprecation procedure (#1487)
### Maintenance (MAINT)
- Use typing.IO for file streams (#1498)
[Full Changelog](https://github.com/py-pdf/PyPDF2/compare/2.12.1...3.0.0)
## Version 2.12.1, 2022-12-10
### Documentation (DOC)
- Deduplicate extract_text docstring (#1485)
- How to cite PyPDF2 (#1476)
### Maintenance (MAINT)
Consistency changes:
- indirect_ref/ido ➔ indirect_reference, dest ➔ page_destination (#1467)
- owner_pwd/user_pwd ➔ owner_password/user_password (#1483)
- position ➔ page_number in Merger.merge (#1482)
- indirect_ref ➔ indirect_reference (#1484)
[Full Changelog](https://github.com/py-pdf/PyPDF2/compare/2.12.0...2.12.1)
## Version 2.12.0, 2022-12-10
### New Features (ENH)
- Add support to extract gray scale images (#1460)
- Add 'threads' property to PdfWriter (#1458)
- Add 'open_destination' property to PdfWriter (#1431)
- Make PdfReader.get_object accept integer arguments (#1459)
### Bug Fixes (BUG)
- Scale PDF annotations (#1479)
### Robustness (ROB)
- Padding issue with AES encryption (#1469)
- Accept empty object as null objects (#1477)
### Documentation (DOC)
- Add module documentation for the PaperSize class (#1447)
### Maintenance (MAINT)
- Use 'page_number' instead of 'pagenum' (#1365)
- Add List of pages to PageRangeSpec (#1456)
### Testing (TST)
- Cleanup temporary files (#1454)
- Mark test_tounicode_is_identity as external (#1449)
- Use Ubuntu 20.04 for running CI test suite (#1452)
[Full Changelog](https://github.com/py-pdf/PyPDF2/compare/2.11.2...2.12.0)
## Version 2.11.2, 2022-11-20
### New Features (ENH)
- Add remove_from_tree (#1432)
- Add AnnotationBuilder.rectangle (#1388)
### Bug Fixes (BUG)
- JavaScript executed twice (#1439)
- ToUnicode stores /Identity-H instead of stream (#1433)
- Declare Pillow as optional dependency (#1392)
### Developer Experience (DEV)
- Link 'Full Changelog' automatically
- Modify read_string_from_stream to a benchmark (#1415)
- Improve error reporting of read_object (#1412)
- Test Python 3.11 (#1404)
- Extend Flake8 ignore list (#1410)
- Use correct pytest markers (#1407)
- Move project configuration to pyproject.toml (#1382)
[Full Changelog](https://github.com/py-pdf/PyPDF2/compare/2.11.1...2.11.2)
## Version 2.11.1, 2022-10-09
### Bug Fixes (BUG)
- td matrix (#1373)
- Cope with cmap from #1322 (#1372)
### Robustness (ROB)
- Cope with str returned from get_data in cmap (#1380)
[Full Changelog](https://github.com/py-pdf/PyPDF2/compare/2.11.0...2.11.1)
## Version 2.11.0, 2022-09-25
### New Features (ENH)
- Addition of optional visitor-functions in extract_text() (#1252)
- Add metadata.creation_date and modification_date (#1364)
- Add PageObject.images attribute (#1330)
### Bug Fixes (BUG)
- Lookup index in _xobj_to_image can be ByteStringObject (#1366)
- 'IndexError: index out of range' when using extract_text (#1361)
- Errors in transfer_rotation_to_content() (#1356)
### Robustness (ROB)
- Ensure update_page_form_field_values does not fail if no fields (#1346)
[Full Changelog](https://github.com/py-pdf/PyPDF2/compare/2.10.9...2.11.0)
## Version 2.10.9, 2022-09-18
### New Features (ENH)
- Add rotation property and transfer_rotation_to_content (#1348)
### Performance Improvements (PI)
- Avoid string concatenation with large embedded base64-encoded images (#1350)
### Bug Fixes (BUG)
- Format floats using their intrinsic decimal precision (#1267)
### Robustness (ROB)
- Fix merge_page for pages without resources (#1349)
[Full Changelog](https://github.com/py-pdf/PyPDF2/compare/2.10.8...2.10.9)
## Version 2.10.8, 2022-09-14
### New Features (ENH)
- Add PageObject.user_unit property (#1336)
### Robustness (ROB)
- Improve NameObject reading/writing (#1345)
[Full Changelog](https://github.com/py-pdf/PyPDF2/compare/2.10.7...2.10.8)
## Version 2.10.7, 2022-09-11
### Bug Fixes (BUG)
- Fix Error in transformations (#1341)
- Decode #23 in NameObject (#1342)
### Testing (TST)
- Use pytest.warns() for warnings, and .raises() for exceptions (#1325)
[Full Changelog](https://github.com/py-pdf/PyPDF2/compare/2.10.6...2.10.7)
## Version 2.10.6, 2022-09-09
### Robustness (ROB)
- Fix infinite loop due to Invalid object (#1331)
- Fix image extraction issue with superfluous whitespaces (#1327)
[Full Changelog](https://github.com/py-pdf/PyPDF2/compare/2.10.5...2.10.6)
## Version 2.10.5, 2022-09-04
### New Features (ENH)
- Process XRefStm (#1297)
- Auto-detect RTL for text extraction (#1309)
### Bug Fixes (BUG)
- Avoid scaling cropbox twice (#1314)
### Robustness (ROB)
- Fix offset correction in revised PDF (#1318)
- Crop data of /U and /O in encryption dictionary to 48 bytes (#1317)
- MultiLine bfrange in cmap (#1299)
- Cope with 2 digit codes in bfchar (#1310)
- Accept '/annn' charset as ASCII code (#1316)
- Log errors during Float / NumberObject initialization (#1315)
- Cope with corrupted entries in xref table (#1300)
### Documentation (DOC)
- Migration guide (PyPDF2 1.x ➔ 2.x) (#1324)
- Creating a coverage report (#1319)
- Fix AnnotationBuilder.free_text example (#1311)
- Fix usage of page.scale by replacing it with page.scale_by (#1313)
### Maintenance (MAINT)
- PdfReaderProtocol (#1303)
- Throw PdfReadError if Trailer can't be read (#1298)
- Remove catching OverflowException (#1302)
[Full Changelog](https://github.com/py-pdf/PyPDF2/compare/2.10.4...2.10.5)
## Version 2.10.4, 2022-08-28
### Robustness (ROB)
- Fix errors/warnings on no /Resources within extract_text (#1276)
- Add required line separators in ContentStream ArrayObjects (#1281)
### Maintenance (MAINT)
- Use NameObject idempotency (#1290)
### Testing (TST)
- Rectangle deletion (#1289)
- Add workflow tests (#1287)
- Remove files after tests ran (#1286)
### Packaging (PKG)
- Add minimum version for typing_extensions requirement (#1277)
[Full Changelog](https://github.com/py-pdf/PyPDF2/compare/2.10.3...2.10.4)
## Version 2.10.3, 2022-08-21
### Robustness (ROB)
- Decrypt returns empty bytestring (#1258)
### Developer Experience (DEV)
- Modify CI to better verify built package contents (#1244)
### Maintenance (MAINT)
- Remove 'mine' as PdfMerger always creates the stream (#1261)
- Let PdfMerger._create_stream raise NotImplemented (#1251)
- password param of _security._alg32(...) is only a string, not bytes (#1259)
- Remove unreachable code in read_block_backwards (#1250) and sign function in _extract_text (#1262)
### Testing (TST)
- Delete annotations (#1263)
- Close PdfMerger in tests (#1260)
- PdfReader.xmp_metadata workflow (#1257)
- Various PdfWriter (Layout, Bookmark deprecation) (#1249)
[Full Changelog](https://github.com/py-pdf/PyPDF2/compare/2.10.2...2.10.3)
## Version 2.10.2, 2022-08-15
BUG: Add PyPDF2.generic to PyPI distribution
## Version 2.10.1, 2022-08-15
### Bug Fixes (BUG)
- TreeObject.remove_child had a non-PdfObject assignment for Count (#1233, #1234)
- Fix stream truncated prematurely (#1223)
### Documentation (DOC)
- Fix docstring formatting (#1228)
### Maintenance (MAINT)
- Split generic.py (#1229)
### Testing (TST)
- Decrypt AlgV4 with owner password (#1239)
- AlgV5.generate_values (#1238)
- TreeObject.remove_child / empty_tree (#1235, #1236)
- create_string_object (#1232)
- Free-Text annotations (#1231)
- generic._base (#1230)
- Strict get fonts (#1226)
- Increase PdfReader coverage (#1219, #1225)
- Increase PdfWriter coverage (#1237)
- 100% coverage for utils.py (#1217)
- PdfWriter exception non-binary stream (#1218)
- Don't check coverage for deprecated code (#1216)
[Full Changelog](https://github.com/py-pdf/PyPDF2/compare/2.10.0...2.10.1)
## Version 2.10.0, 2022-08-07
### New Features (ENH)
- "with" support for PdfMerger and PdfWriter (#1193)
- Add AnnotationBuilder.text(...) to build text annotations (#1202)
### Bug Fixes (BUG)
- Allow IndirectObjects as stream filters (#1211)
### Documentation (DOC)
- Font scrambling
- Page vs Content scaling (#1208)
- Example for orientation parameter of extract_text (#1206)
- Fix AnnotationBuilder parameter formatting (#1204)
### Developer Experience (DEV)
- Add flake8-print (#1203)
### Maintenance (MAINT)
- Introduce WrongPasswordError / FileNotDecryptedError / EmptyFileError (#1201)
[Full Changelog](https://github.com/py-pdf/PyPDF2/compare/2.9.0...2.10.0)
## Version 2.9.0, 2022-07-31
### New Features (ENH)
- Add ability to add hex encoded colors to outline items (#1186)
- Add support for pathlib.Path in PdfMerger.merge (#1190)
- Add link annotation (#1189)
- Add capability to filter text extraction by orientation (#1175)
### Bug Fixes (BUG)
- Named Dest in PDF1.1 (#1174)
- Incomplete Graphic State save/restore (#1172)
### Documentation (DOC)
- Update changelog url in package metadata (#1180)
- Mention camelot for table extraction (#1179)
- Mention pyHanko for signing PDF documents (#1178)
- We have had CMAP support for a while (#1177)
### Maintenance (MAINT)
- Consistent usage of warnings / log messages (#1164)
- Consistent terminology for outline items (#1156)
[Full Changelog](https://github.com/py-pdf/PyPDF2/compare/2.8.1...2.9.0)
## Version 2.8.1, 2022-07-25
### Bug Fixes (BUG)
- u_hash in AlgV4.compute_key (#1170)
### Robustness (ROB)
- Fix loading of file from #134 (#1167)
- Cope with empty DecodeParams (#1165)
### Documentation (DOC)
- Typo in merger deprecation warning message (#1166)
### Maintenance (MAINT)
- Package updates; solve mypy strict remarks (#1163)
### Testing (TST)
- Add test from #325 (#1169)
[Full Changelog](https://github.com/py-pdf/PyPDF2/compare/2.8.0...2.8.1)
## Version 2.8.0, 2022-07-24
### New Features (ENH)
- Add writer.add_annotation, page.annotations, and generic.AnnotationBuilder (#1120)
### Bug Fixes (BUG)
- Set /AS for /Btn form fields in writer (#1161)
- Ignore if /Perms verify failed (#1157)
### Robustness (ROB)
- Cope with utf16 character for space calculation (#1155)
- Cope with null params for FitH / FitV destination (#1152)
- Handle outlines without valid destination (#1076)
### Developer Experience (DEV)
- Introduce _utils.logger_warning (#1148)
### Maintenance (MAINT)
- Break up parse_to_unicode (#1162)
- Add diagnostic output to exception in read_from_stream (#1159)
- Reduce PdfReader.read complexity (#1151)
### Testing (TST)
- Add workflow tests found by arc testing (#1154)
- Decrypt file which is not encrypted (#1149)
- Test CryptRC4 encryption class; test image extraction filters (#1147)
[Full Changelog](https://github.com/py-pdf/PyPDF2/compare/2.7.0...2.8.0)
## Version 2.7.0, 2022-07-21
### New Features (ENH)
- Add `outline_count` property (#1129)
### Bug Fixes (BUG)
- Make reader.get_fields also return dropdowns with options (#1114)
- Add deprecated EncodedStreamObject functions back until PyPDF2==3.0.0 (#1139)
### Robustness (ROB)
- Cope with missing /W entry (#1136)
- Cope with invalid parent xref (#1133)
### Documentation (DOC)
- Contributors file (#1132)
- Fix type in signature of PdfWriter.add_uri (#1131)
### Developer Experience (DEV)
- Add .git-blame-ignore-revs (#1141)
### Code Style (STY)
- Fixing typos (#1137)
- Reuse code via get_outlines_property in tests (#1130)
[Full Changelog](https://github.com/py-pdf/PyPDF2/compare/2.6.0...2.7.0)
## Version 2.6.0, 2022-07-17
### New Features (ENH)
- Add color and font_format to PdfReader.outlines[i] (#1104)
- Extract Text Enhancement (whitespaces) (#1084)
### Bug Fixes (BUG)
- Use `build_destination` for named destination outlines (#1128)
- Avoid a crash when a ToUnicode CMap has an empty dstString in beginbfchar (#1118)
- Prevent deduplication of PageObject (#1105)
- None-check in DictionaryObject.read_from_stream (#1113)
- Avoid IndexError in _cmap.parse_to_unicode (#1110)
### Documentation (DOC)
- Explanation for git submodule
- Watermark and stamp (#1095)
### Maintenance (MAINT)
- Text extraction improvements (#1126)
- Destination.color returns ArrayObject instead of tuple as fallback (#1119)
- Use add_bookmark_destination in add_bookmark (#1100)
- Use add_bookmark_destination in add_bookmark_dict (#1099)
### Testing (TST)
- Add test for arab text (#1127)
- Add xfail for decryption fail (#1125)
- Add xfail test for IndexError when extracting text (#1124)
- Add MCVE showing outline title issue (#1123)
### Code Style (STY)
- Use IntFlag for permissions_flag / update_page_form_field_values (#1094)
- Simplify code (#1101)
[Full Changelog](https://github.com/py-pdf/PyPDF2/compare/2.5.0...2.6.0)
## Version 2.5.0, 2022-07-10
### New Features (ENH)
- Add support for indexed color spaces / BitsPerComponent for decoding PNGs (#1067)
- Add PageObject._get_fonts (#1083)
### Performance Improvements (PI)
- Use iterative DFS in PdfWriter._sweep_indirect_references (#1072)
### Bug Fixes (BUG)
- Let Page.scale also scale the crop-/trim-/bleed-/artbox (#1066)
- Column default for CCITTFaxDecode (#1079)
### Robustness (ROB)
- Guard against None-value in _get_outlines (#1060)
### Documentation (DOC)
- Stamps and watermarks (#1082)
- OCR vs PDF text extraction (#1081)
- Python Version support
- Formatting of CHANGELOG
### Developer Experience (DEV)
- Cache downloaded files (#1070)
- Speed-up for CI (#1069)
### Maintenance (MAINT)
- Set page.rotate(angle: int) (#1092)
- Issue #416 was fixed by #1015 (#1078)
### Testing (TST)
- Image extraction (#1080)
- Image extraction (#1077)
### Code Style (STY)
- Apply black
- Typo in Changelog
[Full Changelog](https://github.com/py-pdf/PyPDF2/compare/2.4.2...2.5.0)
## Version 2.4.2, 2022-07-05
### New Features (ENH)
- Add PdfReader.xfa attribute (#1026)
### Bug Fixes (BUG)
- Wrong page inserted when PdfMerger.merge is done (#1063)
- Resolve IndirectObject when it refers to a free entry (#1054)
### Developer Experience (DEV)
- Added {posargs} to tox.ini (#1055)
### Maintenance (MAINT)
- Remove PyPDF2._utils.bytes_type (#1053)
### Testing (TST)
- Scale page (indirect rect object) (#1057)
- Simplify pathlib PdfReader test (#1056)
- IndexError of VirtualList (#1052)
- Invalid XML in xmp information (#1051)
- No pycryptodome (#1050)
- Increase test coverage (#1045)
### Code Style (STY)
- DOC of compress_content_streams (#1061)
- Minimize diff for #879 (#1049)
[Full Changelog](https://github.com/py-pdf/PyPDF2/compare/2.4.1...2.4.2)
## Version 2.4.1, 2022-06-30
### New Features (ENH)
- Add writer.pdf_header property (getter and setter) (#1038)
### Performance Improvements (PI)
- Remove b_ call in FloatObject.write_to_stream (#1044)
- Check duplicate objects in writer._sweep_indirect_references (#207)
### Documentation (DOC)
- How to suppress exceptions/warnings/log messages (#1037)
- Remove hyphen from lossless (#1041)
- Compression of content streams (#1040)
- Fix inconsistent variable names in add-watermark.md (#1039)
- File size reduction
- Add CHANGELOG to the rendered docs (#1023)
### Maintenance (MAINT)
- Handle XML error when reading XmpInformation (#1030)
- Deduplicate Code / add mutmut config (#1022)
### Code Style (STY)
- Use unnecessary one-line function / class attribute (#1043)
- Docstring formatting (#1033)
[Full Changelog](https://github.com/py-pdf/PyPDF2/compare/2.4.0...2.4.1)
## Version 2.4.0, 2022-06-26
### New Features (ENH)
- Support R6 decrypting (#1015)
- Add PdfReader.pdf_header (#1013)
### Performance Improvements (PI)
- Remove ord_ calls (#1014)
### Bug Fixes (BUG)
- Fix missing page for bookmark (#1016)
### Robustness (ROB)
- Deal with invalid Destinations (#1028)
### Documentation (DOC)
- get_form_text_fields does not extract dropdown data (#1029)
- Adjust PdfWriter.add_uri docstring
- Mention crypto extra_requires for installation (#1017)
### Developer Experience (DEV)
- Use `\n` line endings everywhere (#1027)
- Adjust string formatting to be able to use mutmut (#1020)
- Update Bug report template
[Full Changelog](https://github.com/py-pdf/PyPDF2/compare/2.3.1...2.4.0)
## Version 2.3.1, 2022-06-19
BUG: Forgot to add the internal `_codecs` subpackage.
[Full Changelog](https://github.com/py-pdf/PyPDF2/compare/2.3.0...2.3.1)
## Version 2.3.0, 2022-06-19
The highlight of this release is improved support for file encryption
(AES-128 and AES-256, R5 only). See #749 for the amazing work of
@exiledkingcc 🎊 Thank you 🤗
### Deprecations (DEP)
- Rename names to be PEP8-compliant (#967)
- `PdfWriter.get_page`: the pageNumber parameter is renamed to page_number
- `PyPDF2.filters`:
  * For all classes, a parameter rename: decodeParms ➔ decode_parms
  * decodeStreamData ➔ decode_stream_data
- `PyPDF2.xmp`:
  * XmpInformation.rdfRoot ➔ XmpInformation.rdf_root
  * XmpInformation.xmp_createDate ➔ XmpInformation.xmp_create_date
  * XmpInformation.xmp_creatorTool ➔ XmpInformation.xmp_creator_tool
  * XmpInformation.xmp_metadataDate ➔ XmpInformation.xmp_metadata_date
  * XmpInformation.xmp_modifyDate ➔ XmpInformation.xmp_modify_date
  * XmpInformation.xmpMetadata ➔ XmpInformation.xmp_metadata
  * XmpInformation.xmpmm_documentId ➔ XmpInformation.xmpmm_document_id
  * XmpInformation.xmpmm_instanceId ➔ XmpInformation.xmpmm_instance_id
- `PyPDF2.generic`:
  * readHexStringFromStream ➔ read_hex_string_from_stream
  * initializeFromDictionary ➔ initialize_from_dictionary
  * createStringObject ➔ create_string_object
  * TreeObject.hasChildren ➔ TreeObject.has_children
  * TreeObject.emptyTree ➔ TreeObject.empty_tree
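
The renames above all follow the same camelCase to snake_case pattern. As a rough illustration of that pattern only (the helper `to_snake_case` is hypothetical and not part of PyPDF2, and this is not how the renames were actually performed), the mapping could be sketched as:

```python
import re


def to_snake_case(name: str) -> str:
    """Convert a camelCase identifier to snake_case (hypothetical helper)."""
    # Insert an underscore before every uppercase letter that is not at the
    # start of the name, then lowercase the whole result.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


for old in ("decodeParms", "readHexStringFromStream", "xmp_createDate"):
    print(old, "->", to_snake_case(old))
```

Names that are already snake_case pass through unchanged, so such a helper could be applied to a whole list of identifiers when checking which ones still need a rename.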
### New Features (ENH)
- Add decrypt support for V5 and AES-128, AES-256 (R5 only) (#749)
### Robustness (ROB)
- Fix corrupted (wrongly) linear PDF (#1008)
### Maintenance (MAINT)
- Move PDF_Samples folder into resources
- Fix typos (#1007)
### Testing (TST)
- Improve encryption/decryption test (#1009)
- Add merger test cases with real PDFs (#1006)
- Add mutmut config
### Code Style (STY)
- Put pure data mappings in separate files (#1005)
- Make encryption module private, apply pre-commit (#1010)
[Full Changelog](https://github.com/py-pdf/PyPDF2/compare/2.2.1...2.3.0)
## Version 2.2.1, 2022-06-17
### Performance Improvements (PI)
- Remove b_ calls (#992, #986)
- Apply improvements to _utils suggested by perflint (#993)
### Robustness (ROB)
- utf-16-be codec can't decode (...) (#995)
### Documentation (DOC)
- Remove reference to Scripts (#987)
### Developer Experience (DEV)
- Fix type annotations for add_bookmarks (#1000)
### Testing (TST)
- Add test for PdfMerger (#1001)
- Add tests for XMP information (#996)
- reader.get_fields / zlib issue / LZW decode issue (#1004)
- reader.get_fields with report generation (#1002)
- Improve test coverage by extracting texts (#998)
### Code Style (STY)
- Apply fixes suggested by pylint (#999)
[Full Changelog](https://github.com/py-pdf/PyPDF2/compare/2.2.0...2.2.1)
## Version 2.2.0, 2022-06-13
The 2.2.0 release improves text extraction again via (#969):
* Improvements around /Encoding / /ToUnicode
* Extraction of CMaps improved
* Fallback for font def missing
* Support for /Identity-H and /Identity-V: utf-16-be
* Support for /GB-EUC-H / /GB-EUC-V / GBp/c-EUC-H / /GBpc-EUC-V (beta release for evaluation)
* Arabic (for evaluation)
* Whitespace extraction improvements
Those changes should mainly improve the text extraction for non-ASCII alphabets,
e.g. Russian / Chinese / Japanese / Korean / Arabic.
[Full Changelog](https://github.com/py-pdf/PyPDF2/compare/2.1.1...2.2.0)
## Version 2.1.1, 2022-06-12
### New Features (ENH)
- Add support for pathlib as input for PdfReader (#979)
### Performance Improvements (PI)
- Optimize read_next_end_line (#646)
### Bug Fixes (BUG)
- Adobe Acrobat 'Would you like to save this file?' (#970)
### Documentation (DOC)
- Notes on annotations (#982)
- Who uses PyPDF2
- Show the intended ➔ instead of `\xe2\x9e\x94` in robustness page (#958)
### Maintenance (MAINT)
- pre-commit / requirements.txt updates (#977)
- Mark read_next_end_line as deprecated (#965)
- Export `PageObject` in PyPDF2 root (#960)
### Testing (TST)
- Add MCVE of issue #416 (#980)
- FlateDecode.decode decodeParms (#964)
- Xmp module (#962)
- utils.paeth_predictor (#959)
### Code Style (STY)
- Use more tuples and list/dict comprehensions (#976)
[Full Changelog](https://github.com/py-pdf/PyPDF2/compare/2.1.0...2.1.1)
## Version 2.1.0, 2022-06-06
The highlight of the 2.1.0 release is the most massive improvement to the
text extraction capabilities of PyPDF2 since 2016 🥳🎊 A very big thank you goes
to [pubpub-zz](https://github.com/pubpub-zz) who invested a lot of time and
knowledge of the PDF format to finally get those improvements into PyPDF2.
Thank you 🤗💚
In case the new function causes any issues, you can use `_extract_text_old`
for the old functionality. Please also open a bug ticket in that case.
Several people have attempted to bring similar improvements to PyPDF2. All of
those attempts were valuable. The main reason why they didn't get merged is the
large number of open PRs / issues. The PR by pubpub-zz was the most comprehensive
one and also incorporated the latest changes of PyPDF2 2.0.0.
Thank you to [VictorCarlquist](https://github.com/VictorCarlquist) for #858 and
[asabramo](https://github.com/asabramo) for #464 🤗
### New Features (ENH)
- Massive text extraction improvement (#924). Closed many open issues:
  - Exceptions / missing spaces in extract_text() method (#17) 🕺
  - Whitespace issues in extract_text() (#42) 💃
  - pypdf2 reads the hifenated words in a new line (#246)
  - PyPDF2 failing to read unicode character (#37)
  - Unable to read bullets (#230)
  - ExtractText yields nothing for apparently good PDF (#168) 🎉
  - Encoding issue in extract_text() (#235)
  - extractText() doesn't work on Chinese PDF (#252)
  - encoding error (#260)
  - Trouble with apostophes in names in text "O'Doul" (#384)
  - extract_text works for some PDF files, but not the others (#437)
  - Euro sign not being recognized by extractText (#443)
  - Failed extracting text from French texts (#524)
  - extract_text doesn't extract ligatures correctly (#598)
  - reading spanish text - mark convert issue (#635)
  - Read PDF changed from text to random symbols (#654)
  - .extractText() reads / as 1. (#789)
- Update glyphlist (#947) - inspired by #464
- Allow adding PageRange objects (#948)
### Bug Fixes (BUG)
- Delete .python-version file (#944)
- Compare StreamObject.decoded_self with None (#931)
### Robustness (ROB)
- Fix some conversion errors on non conform PDF (#932)
### Documentation (DOC)
- Elaborate on PDF text extraction difficulties (#939)
- Add logo (#942)
- rotate vs Transformation().rotate (#937)
- Example how to use PyPDF2 with AWS S3 (#938)
- How to deprecate (#930)
- Fix typos on robustness page (#935)
- Remove scripts (pdfcat) from docs (#934)
### Developer Experience (DEV)
- Ignore .python-version file
- Mark deprecated code with no-cover (#943)
- Automatically create Github releases from tags (#870)
### Testing (TST)
- Text extraction for non-latin alphabets (#954)
- Ignore PdfReadWarning in benchmark (#949)
- writer.remove_text (#946)
- Add test for Tree and _security (#945)
### Code Style (STY)
- black, isort, Flake8, splitting buildCharMap (#950)
[Full Changelog](https://github.com/py-pdf/PyPDF2/compare/2.0.0...2.1.0)
## Version 2.0.0, 2022-06-01
The 2.0.0 release of PyPDF2 includes three core changes:
1. Dropping support for Python 3.5 and older.
2. Introducing type annotations.
3. Interface changes, mostly to have PEP8-compliant names
We introduced a [deprecation process](https://github.com/py-pdf/PyPDF2/pull/930)
that hopefully helps users to avoid unexpected breaking changes.
### Breaking Changes (DEP)
- PyPDF2 2.0 requires Python 3.6+. Python 2.7 and 3.5 support were dropped.
- PdfFileReader: The "warndest" parameter was removed
- PdfFileReader and PdfFileMerger no longer have the `overwriteWarnings`
parameter. The new behavior is `overwriteWarnings=False`.
- merger: OutlinesObject was removed without replacement.
- merger.py ➔ _merger.py: You must import PdfFileMerger from PyPDF2 directly.
- utils:
  * `ConvertFunctionsToVirtualList` was removed
  * `formatWarning` was removed
  * `isInt(obj)`: Use `isinstance(obj, int)` instead
  * `u_(s)`: Use `s` directly
  * `chr_(c)`: Use `chr(c)` instead
  * `barray(b)`: Use `bytearray(b)` instead
  * `isBytes(b)`: Use `isinstance(b, bytes)` instead
  * `xrange_fn`: Use `range` instead
  * `string_type`: Use `str` instead
  * `isString(s)`: Use `isinstance(s, str)` instead
  * `_basestring`: Use `str` instead
  * All Exceptions are now in `PyPDF2.errors`:
    - PageSizeNotDefinedError
    - PdfReadError
    - PdfReadWarning
    - PyPdfError
- `PyPDF2.pdf` (the `pdf` module) no longer exists. The contents were moved within
  the library. You should most likely import directly from `PyPDF2` instead.
  The `RectangleObject` is in `PyPDF2.generic`.
- The `Resources`, `Scripts`, and `Tests` will no longer be part of the distribution
files on PyPI. This should have little to no impact on most people. The
`Tests` are renamed to `tests`, the `Resources` are renamed to `resources`.
Both are still in the git repository. The `Scripts` are now in
[cpdf](https://github.com/py-pdf/cpdf). `Sample_Code` was moved to the `docs`.
For a full list of deprecated functions, please see the changelog of version
1.28.0.
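
For code that used the removed `utils` helpers, the built-in replacements listed above could look roughly like the following sketch. The variable names are made up for illustration, and the old calls are shown only as comments:

```python
# Sample values standing in for whatever the calling code used to pass
# to the removed PyPDF2 helpers (hypothetical names).
obj, data, text, code_point = 42, b"data", "text", 65

is_int = isinstance(obj, int)        # was: isInt(obj)
is_bytes = isinstance(data, bytes)   # was: isBytes(data)
is_string = isinstance(text, str)    # was: isString(text)
same_text = text                     # was: u_(text)
char = chr(code_point)               # was: chr_(code_point)
buffer = bytearray(data)             # was: barray(data)
numbers = list(range(3))             # was: xrange_fn(3)

print(is_int, is_bytes, is_string, char, buffer, numbers)
```

Because all replacements are Python built-ins, no import from `PyPDF2` is needed for them.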
### New Features (ENH)
- Improve space setting for text extraction (#922)
- Allow setting the decryption password in `PdfReader.__init__` (#920)
- Add Page.add_transformation (#883)
### Bug Fixes (BUG)
- Fix error adding transformation to page without /Contents (#908)
### Robustness (ROB)
- Cope with invalid length in streams (#861)
### Documentation (DOC)
- Fix style of 1.25 and 1.27 patch notes (#927)
- Transformation (#907)
### Developer Experience (DEV)
- Create flake8 config file (#916)
- Use relative imports (#875)
### Maintenance (MAINT)
- Use Python 3.6 language features (#849)
- Add wrapper function for PendingDeprecationWarnings (#928)
- Use new PEP8 compliant names (#884)
- Explicitly represent transformation matrix (#878)
- Inline PAGE_RANGE_HELP string (#874)
- Remove unnecessary generics imports (#873)
- Remove star imports (#865)
- merger.py ➔ _merger.py (#864)
- Type annotations for all functions/methods (#854)
- Add initial type support with mypy (#853)
### Testing (TST)
- Regression test for xmp_metadata converter (#923)
- Checkout submodule sample-files for benchmark
- Add text extracting performance benchmark
- Use new PyPDF2 API in benchmark (#902)
- Make test suite fail for uncaught warnings (#892)
- Remove -OO testrun from CI (#901)
- Improve tests for convert_to_int (#899)
[Full Changelog](https://github.com/py-pdf/PyPDF2/compare/1.28.4...2.0.0)
## PyPDF2 1.X
See [CHANGELOG PyPDF2 1.X](changelog-v1.md)
================================================
FILE: CONTRIBUTING.md
================================================
Please check the [documentation page dedicated to development](https://pypdf.readthedocs.io/en/stable/dev/intro.html).
## Creating issues / tickets
Please go here: https://github.com/py-pdf/pypdf/issues
Typically you should not send e-mails. An e-mail might only reach one person, and
it could go into spam or that person might be busy. Please create issues on
GitHub instead.
Please use the templates provided.
Keep in mind that although PDF has an official specification, there are tons of
variations which might require special handling. Thus, please always provide a
reproducing example file for us to work with. Otherwise, we have to guess possible
issues, leading to unnecessary overhead - especially since most of the contributions
happen during our free time.
If you already know a fix, consider opening a pull request after reporting the issue
to make life easier for everyone.
## Creating Pull Requests
We appreciate it when people make PRs, but please be aware that pypdf is used by many
people. That means:
* We rarely make breaking changes and have a [deprecation process](https://pypdf.readthedocs.io/en/latest/dev/deprecations.html).
* New features, especially adding to the public interface, typically need to be
discussed first.
Before you make bigger changes, open an issue to make the suggestion.
Note which interface changes you want to make.
================================================
FILE: CONTRIBUTORS.md
================================================
# Contributors
pypdf has had a lot of contributors since it started as pyPdf in 2005. We are
a free software project without any company affiliation. We cannot pay
contributors, but we do value their contributions. A lot of time, effort, and
expertise went into this project. With this list, we recognize these awesome
people 🤗
The list is definitely not complete. You can find more contributors via the git
history and [GitHub's 'Contributors' feature](https://github.com/py-pdf/pypdf/graphs/contributors).
## Contributors to the pypdf (formerly pyPdf / PyPDF2) project
* [abyesilyurt](https://github.com/abyesilyurt)
* [ArkieCoder](https://github.com/ArkieCoder)
* [Beers, PJ](https://github.com/PJBrs)
* [Clauss, Christian](https://github.com/cclauss)
* [DL6ER](https://github.com/DL6ER)
* [Duy, Phan Thanh](https://github.com/zuypt)
* [ediamondscience](https://github.com/ediamondscience)
* [Ermeson, Felipe](https://github.com/FelipeErmeson)
* [Freitag, François](https://github.com/francoisfreitag)
* [Gagnon, William G.](https://github.com/williamgagnon)
* [Gillard, James](https://github.com/jgillard)
* [Górny, Michał](https://github.com/mgorny)
* [Grillo, Miguel](https://github.com/Ineffable22)
* [Gutteridge, David H.](https://github.com/dhgutteridge)
* [Hale, Joseph](https://github.com/thehale)
* [harshhes](https://github.com/harshhes)
* [Jackowitz, Noah](https://github.com/hackowitz-af) | [LinkedIn](https://www.linkedin.com/in/noah-jackowitz/)
* [JianzhengLuo](https://github.com/JianzhengLuo)
* [Karvonen, Harry](https://github.com/Hatell/)
* [King, Hunter](https://github.com/neversphere)
* [Kotler, Mitchell](https://github.com/mitchelljkotler)
* [KourFrost](https://github.com/KourFrost)
* [Lightup1](https://github.com/Lightup1)
* [Majumder, Jonah](https://github.com/jonahmajumder)
* [Manini, Lorenzo](https://github.com/lorenzomanini)
* [maxbeer99](https://github.com/maxbeer99)
* [McNeil, Karen](https://github.com/karenlmcneil): Arabic Language Support
* [Mérino, Antoine](https://github.com/Merinorus)
* [Murphy, Kevin](https://github.com/kmurphy4)
* [nalin-udhaar](https://github.com/nalin-udhaar)
* [Noah-Houghton](https://github.com/Noah-Houghton) | [LinkedIn](https://www.linkedin.com/in/noah-h-554992a0/)
* [Paramonov, Alexey](https://github.com/alexey-v-paramonov)
* [Paternault, Louis](https://framagit.org/spalax)
* [Perrensen, Olsen](https://github.com/olsonperrensen)
* [pilotandy](https://github.com/pilotandy)
* [Pinheiro, Arthur](https://github.com/xilopaint)
* [pmiller66](https://github.com/pmiller66)
* [Poddar, Arka](https://github.com/postmeback)
* [programmarchy](https://github.com/programmarchy)
* [pubpub-zz](https://github.com/pubpub-zz): involved in community development
* [Ramos, Leodanis Pozo](https://github.com/lpozo)
* [RitchieP](https://github.com/RitchieP) | [LinkedIn](https://www.linkedin.com/in/ritchie-p-892b31115/) | [StackOverflow](https://stackoverflow.com/users/13328625/casual-r?tab=profile)
* [robbiebusinessacc](https://github.com/robbiebusinessacc)
* [Roder, Thomas](https://github.com/MrTomRod)
* [Rogmann, Sascha](https://github.com/srogmann)
* [Röthenbacher, Thomas](https://github.com/troethe)
* [shartzog](https://github.com/shartzog)
* [stefan6419846](https://github.com/stefan6419846): Maintainer of pypdf since January 2025
* [sietzeberends](https://github.com/sietzeberends)
* [Stober, Marc](https://github.com/marcstober)
* [Stüber, Timo](https://github.com/omit66)
* [Thoma, Martin](https://github.com/MartinThoma): Maintainer of pypdf from April 2022 to January 2025. I hope to build a great community with many awesome contributors. [LinkedIn](https://www.linkedin.com/in/martin-thoma/) | [StackOverflow](https://stackoverflow.com/users/562769/martin-thoma) | [Blog](https://martin-thoma.com/)
* [Thomas, Reuben](https://github.com/rrthomas)
* [Tobeabellwether](https://github.com/Tobeabellwether)
* [van Alst, Ludo](https://github.com/LudovA)
* [WevertonGomes](https://github.com/WevertonGomesCosta)
* [Wilson, Huon](https://github.com/huonw)
* ztravis
## Adding a new contributor
Contributors are:
* Anybody who has a commit in `main` - no matter how small or how many. Also if it's via *co-authored-by*.
* People who opened helpful issues:
1. Bugs: with complete MCVE
2. Well-described feature requests
3. Potentially some more.
The maintainers of pypdf have the last call on that one.
* Community work: This is exceptional. If the maintainers of pypdf see people
being super helpful in answering issues / discussions or being very active on
Stack Overflow, we also consider them contributors to pypdf.
Contributors can add themselves or ask via a GitHub issue to be added.
Please use the following format:
```
* Last name, First name: 140-characters of text; links to LinkedIn / GitHub / other profiles and personal pages are ok
```
OR
```
* GitHub Username: 140-characters of text; links to LinkedIn / GitHub / other profiles and personal pages are ok
```
and add the entry in alphabetical order. The 140 characters are everything visible after the `Name:`.
Please don't use images.
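
A new entry can be checked against the format above with a few lines of Python. This is only a sketch: the helper names are hypothetical, and it assumes that "visible" means the text left after Markdown links are reduced to their labels.

```python
import re

# Matches Markdown links such as [label](https://example.org) and captures the label.
LINK = re.compile(r"\[([^\]]*)\]\([^)]*\)")


def visible_text(entry: str) -> str:
    """Return the text after 'Name:' with links reduced to their labels."""
    # Reduce links first so that the ':' inside URLs does not confuse the split.
    without_links = LINK.sub(r"\1", entry)
    _, _, description = without_links.partition(":")
    return description.strip()


def is_valid_entry(entry: str, limit: int = 140) -> bool:
    """Check the bullet prefix and the length limit of the visible text."""
    return entry.startswith("* ") and len(visible_text(entry)) <= limit


entry = "* [McNeil, Karen](https://github.com/karenlmcneil): Arabic Language Support"
print(visible_text(entry), is_valid_entry(entry))
```

The check does not verify the alphabetical order; that still needs a look at the neighbouring entries.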
================================================
FILE: LICENSE
================================================
Copyright (c) 2006-2008, Mathieu Fenniak
Some contributions copyright (c) 2007, Ashish Kulkarni <kulkarni.ashish@gmail.com>
Some contributions copyright (c) 2014, Steve Witham <switham_github@mac-guyver.com>
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
* The name of the author may not be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
================================================
FILE: Makefile
================================================
maint:
	pre-commit autoupdate
	pip-compile -U requirements/ci.in
	pip-compile -U requirements/dev.in
	pip-compile -U requirements/docs.in

release:
	python make_release.py
	git commit -eF RELEASE_COMMIT_MSG.md

clean:
	python -m pip install pyclean
	pyclean .
	rm -rf tests/__pycache__ pypdf/__pycache__ htmlcov docs/_build dist pypdf.egg-info .pytest_cache .mypy_cache .benchmarks

test:
	pytest tests --cov --cov-report term-missing -vv --cov-report html --durations=3 --timeout=60 pypdf

testtype:
	pytest tests --cov --cov-report term-missing -vv --cov-report html --durations=3 --timeout=30 --typeguard-packages=pypdf

benchmark:
	pytest tests/bench.py

mypy:
	mypy pypdf --ignore-missing-imports --check-untyped --strict

ruff:
	ruff check pypdf tests make_release.py
================================================
FILE: README.md
================================================
# pypdf
pypdf is a free and open-source pure-python PDF library capable of splitting,
[merging](https://pypdf.readthedocs.io/en/stable/user/merging-pdfs.html),
[cropping, and transforming](https://pypdf.readthedocs.io/en/stable/user/cropping-and-transforming.html)
the pages of PDF files. It can also add
custom data, viewing options, and
[passwords](https://pypdf.readthedocs.io/en/stable/user/encryption-decryption.html)
to PDF files. pypdf can
[retrieve text](https://pypdf.readthedocs.io/en/stable/user/extract-text.html)
and
[metadata](https://pypdf.readthedocs.io/en/stable/user/metadata.html)
from PDFs as well.
See [pdfly](https://github.com/py-pdf/pdfly) for a CLI application that uses pypdf to interact with PDFs.
## Installation
Install pypdf using pip:
```
pip install pypdf
```
For using pypdf with AES encryption or decryption, install extra dependencies:
```
pip install pypdf[crypto]
```
> **NOTE**: `pypdf` 3.1.0 and above include significant improvements compared to
> previous versions. Please refer to [the migration
> guide](https://pypdf.readthedocs.io/en/latest/user/migration-1-to-2.html) for
> more information.
## Usage
```python
from pypdf import PdfReader
reader = PdfReader("example.pdf")
number_of_pages = len(reader.pages)
page = reader.pages[0]
text = page.extract_text()
```
pypdf can do a lot more, e.g. splitting, merging, reading and creating annotations, decrypting and encrypting. Check out the
[documentation](https://pypdf.readthedocs.io/en/stable/) for additional usage
examples!
For questions and answers, visit
[StackOverflow](https://stackoverflow.com/questions/tagged/pypdf)
(tagged with [pypdf](https://stackoverflow.com/questions/tagged/pypdf)).
## Contributions
Maintaining pypdf is a collaborative effort. You can support the project by
writing documentation, helping to narrow down issues, and submitting code.
See the [CONTRIBUTING.md](https://github.com/py-pdf/pypdf/blob/main/CONTRIBUTING.md) file for more information.
### Q&A
The experience pypdf users have covers the whole range from beginner to expert. You can contribute to the pypdf community by answering questions
on [StackOverflow](https://stackoverflow.com/questions/tagged/pypdf),
helping in [discussions](https://github.com/py-pdf/pypdf/discussions),
and asking users who report issues for [MCVEs](https://stackoverflow.com/help/minimal-reproducible-example) (Code + example PDF!).
### Issues
A good bug ticket includes an MCVE - a minimal complete verifiable example.
For pypdf, this means that you must upload a PDF that causes the bug to occur
as well as the code you're executing with all of the output. Use
`print(pypdf.__version__)` to tell us which version you're using.
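
If it helps, the version can be collected together with basic environment details in one go. The following is a minimal sketch using only the standard library; the function name `environment_report` is made up for this example, and the platform and Python lines are an optional addition beyond the `pypdf` version mentioned above.

```python
import platform
import sys
from importlib import metadata


def environment_report(package: str = "pypdf") -> str:
    """Collect platform, Python, and package versions for a bug report."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        # The package is not installed in the current environment.
        version = "not installed"
    return "\n".join(
        [
            f"platform: {platform.platform()}",
            f"python: {sys.version.split()[0]}",
            f"{package}: {version}",
        ]
    )


print(environment_report())
```

The printed lines can be pasted into the issue next to the code and the example PDF.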
### Code
All code contributions are welcome, but smaller ones have a better chance to
get included in a timely manner. Adding unit tests for new features or test
cases for bugs you've fixed helps us to ensure that the Pull Request (PR) is fine.
pypdf includes a test suite which can be executed with `pytest`:
```bash
$ pytest
===================== test session starts =====================
platform linux -- Python 3.6.15, pytest-7.0.1, pluggy-1.0.0
rootdir: /home/moose/GitHub/Martin/pypdf
plugins: cov-3.0.0
collected 233 items
tests/test_basic_features.py .. [ 0%]
tests/test_constants.py . [ 1%]
tests/test_filters.py .................x..... [ 11%]
tests/test_generic.py ................................. [ 25%]
............. [ 30%]
tests/test_javascript.py .. [ 31%]
tests/test_merger.py . [ 32%]
tests/test_page.py ......................... [ 42%]
tests/test_pagerange.py ................ [ 49%]
tests/test_papersizes.py .................. [ 57%]
tests/test_reader.py .................................. [ 72%]
............... [ 78%]
tests/test_utils.py .................... [ 87%]
tests/test_workflows.py .......... [ 91%]
tests/test_writer.py ................. [ 98%]
tests/test_xmp.py ... [100%]
========== 232 passed, 1 xfailed, 1 warning in 4.52s ==========
```
================================================
FILE: docs/Makefile
================================================
# Minimal makefile for Sphinx documentation
#
# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = _build
# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
================================================
FILE: docs/_static/releasing.drawio
================================================
<mxfile host="Electron" type="device">
<diagram name="Seite-1" id="xmn08oupI2gSAHxAwkuE">
<mxGraphModel dx="394" dy="220" grid="1" gridSize="10" guides="1" tooltips="1" connect="1" arrows="1" fold="1" page="1" pageScale="1" pageWidth="827" pageHeight="1169" math="0" shadow="0">
<root>
<mxCell id="0" />
<mxCell id="1" parent="0" />
<mxCell id="Sy3GnD-ZVnJThFurnhwo-33" value="" style="rounded=1;whiteSpace=wrap;html=1;fillColor=#d5e8d4;strokeColor=#82b366;arcSize=21;" parent="1" vertex="1">
<mxGeometry x="130" y="790" width="280" height="290" as="geometry" />
</mxCell>
<mxCell id="Sy3GnD-ZVnJThFurnhwo-21" value="" style="rounded=1;whiteSpace=wrap;html=1;fillColor=#f5f5f5;fontColor=#333333;strokeColor=#666666;" parent="1" vertex="1">
<mxGeometry x="60" y="330" width="480" height="250" as="geometry" />
</mxCell>
<mxCell id="Sy3GnD-ZVnJThFurnhwo-4" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;exitX=0.5;exitY=1;exitDx=0;exitDy=0;" parent="1" source="Sy3GnD-ZVnJThFurnhwo-1" target="Sy3GnD-ZVnJThFurnhwo-3" edge="1">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<mxCell id="Sy3GnD-ZVnJThFurnhwo-1" value="python make_release.py" style="rounded=1;whiteSpace=wrap;html=1;" parent="1" vertex="1">
<mxGeometry x="180" y="80" width="120" height="60" as="geometry" />
</mxCell>
<mxCell id="Sy3GnD-ZVnJThFurnhwo-6" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;exitX=0.5;exitY=1;exitDx=0;exitDy=0;entryX=0.5;entryY=0;entryDx=0;entryDy=0;" parent="1" source="Sy3GnD-ZVnJThFurnhwo-3" target="Sy3GnD-ZVnJThFurnhwo-5" edge="1">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<mxCell id="Sy3GnD-ZVnJThFurnhwo-3" value="Manually adjust CHANGELOG.md changes" style="rounded=1;whiteSpace=wrap;html=1;" parent="1" vertex="1">
<mxGeometry x="180" y="170" width="120" height="60" as="geometry" />
</mxCell>
<mxCell id="Sy3GnD-ZVnJThFurnhwo-9" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;exitX=0.5;exitY=1;exitDx=0;exitDy=0;" parent="1" source="Sy3GnD-ZVnJThFurnhwo-5" target="Sy3GnD-ZVnJThFurnhwo-8" edge="1">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<mxCell id="Sy3GnD-ZVnJThFurnhwo-10" value="Yes" style="edgeLabel;html=1;align=center;verticalAlign=middle;resizable=0;points=[];" parent="Sy3GnD-ZVnJThFurnhwo-9" vertex="1" connectable="0">
<mxGeometry x="0.1768" y="-2" relative="1" as="geometry">
<mxPoint as="offset" />
</mxGeometry>
</mxCell>
<mxCell id="Sy3GnD-ZVnJThFurnhwo-12" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;exitX=0.5;exitY=1;exitDx=0;exitDy=0;entryX=0.5;entryY=0;entryDx=0;entryDy=0;" parent="1" source="Sy3GnD-ZVnJThFurnhwo-5" target="Sy3GnD-ZVnJThFurnhwo-11" edge="1">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<mxCell id="Sy3GnD-ZVnJThFurnhwo-13" value="No" style="edgeLabel;html=1;align=center;verticalAlign=middle;resizable=0;points=[];" parent="Sy3GnD-ZVnJThFurnhwo-12" vertex="1" connectable="0">
<mxGeometry x="0.3105" y="2" relative="1" as="geometry">
<mxPoint as="offset" />
</mxGeometry>
</mxCell>
<mxCell id="Sy3GnD-ZVnJThFurnhwo-5" value="Is there a breaking change" style="rounded=1;whiteSpace=wrap;html=1;" parent="1" vertex="1">
<mxGeometry x="180" y="260" width="120" height="60" as="geometry" />
</mxCell>
<mxCell id="Sy3GnD-ZVnJThFurnhwo-24" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;exitX=0.5;exitY=1;exitDx=0;exitDy=0;" parent="1" source="Sy3GnD-ZVnJThFurnhwo-7" target="Sy3GnD-ZVnJThFurnhwo-23" edge="1">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<mxCell id="Sy3GnD-ZVnJThFurnhwo-7" value="Adjust the CHANGELOG.md" style="rounded=1;whiteSpace=wrap;html=1;" parent="1" vertex="1">
<mxGeometry x="170" y="600" width="120" height="60" as="geometry" />
</mxCell>
<mxCell id="Sy3GnD-ZVnJThFurnhwo-17" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;exitX=0.5;exitY=1;exitDx=0;exitDy=0;" parent="1" source="Sy3GnD-ZVnJThFurnhwo-8" target="Sy3GnD-ZVnJThFurnhwo-7" edge="1">
<mxGeometry relative="1" as="geometry">
<Array as="points">
<mxPoint x="150" y="460" />
<mxPoint x="230" y="460" />
</Array>
</mxGeometry>
</mxCell>
<mxCell id="Sy3GnD-ZVnJThFurnhwo-8" value="Major version bump in _version.py" style="rounded=1;whiteSpace=wrap;html=1;" parent="1" vertex="1">
<mxGeometry x="90" y="370" width="120" height="60" as="geometry" />
</mxCell>
<mxCell id="Sy3GnD-ZVnJThFurnhwo-15" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;exitX=0.5;exitY=1;exitDx=0;exitDy=0;" parent="1" source="Sy3GnD-ZVnJThFurnhwo-11" target="Sy3GnD-ZVnJThFurnhwo-14" edge="1">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<mxCell id="Sy3GnD-ZVnJThFurnhwo-16" value="Yes" style="edgeLabel;html=1;align=center;verticalAlign=middle;resizable=0;points=[];" parent="Sy3GnD-ZVnJThFurnhwo-15" vertex="1" connectable="0">
<mxGeometry x="-0.2562" y="3" relative="1" as="geometry">
<mxPoint as="offset" />
</mxGeometry>
</mxCell>
<mxCell id="Sy3GnD-ZVnJThFurnhwo-20" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;exitX=0.5;exitY=1;exitDx=0;exitDy=0;" parent="1" source="Sy3GnD-ZVnJThFurnhwo-11" target="Sy3GnD-ZVnJThFurnhwo-19" edge="1">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<mxCell id="Sy3GnD-ZVnJThFurnhwo-11" value="Is there a new feature?" style="rounded=1;whiteSpace=wrap;html=1;" parent="1" vertex="1">
<mxGeometry x="250" y="370" width="120" height="60" as="geometry" />
</mxCell>
<mxCell id="Sy3GnD-ZVnJThFurnhwo-18" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;exitX=0.5;exitY=1;exitDx=0;exitDy=0;entryX=0.5;entryY=0;entryDx=0;entryDy=0;" parent="1" source="Sy3GnD-ZVnJThFurnhwo-14" target="Sy3GnD-ZVnJThFurnhwo-7" edge="1">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<mxCell id="Sy3GnD-ZVnJThFurnhwo-14" value="Minor version bump" style="rounded=1;whiteSpace=wrap;html=1;" parent="1" vertex="1">
<mxGeometry x="250" y="490" width="120" height="60" as="geometry" />
</mxCell>
<mxCell id="Sy3GnD-ZVnJThFurnhwo-35" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;exitX=0.5;exitY=1;exitDx=0;exitDy=0;entryX=1;entryY=0.5;entryDx=0;entryDy=0;" parent="1" source="Sy3GnD-ZVnJThFurnhwo-19" target="Sy3GnD-ZVnJThFurnhwo-23" edge="1">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<mxCell id="Sy3GnD-ZVnJThFurnhwo-19" value="Patch version bump" style="rounded=1;whiteSpace=wrap;html=1;" parent="1" vertex="1">
<mxGeometry x="400" y="490" width="120" height="60" as="geometry" />
</mxCell>
<mxCell id="Sy3GnD-ZVnJThFurnhwo-22" value="Semantic Versioning" style="text;html=1;align=center;verticalAlign=middle;whiteSpace=wrap;rounded=0;fontStyle=1;fontSize=18;fontColor=#6E6E6E;" parent="1" vertex="1">
<mxGeometry x="450" y="350" width="60" height="30" as="geometry" />
</mxCell>
<mxCell id="Sy3GnD-ZVnJThFurnhwo-27" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;exitX=0.5;exitY=1;exitDx=0;exitDy=0;entryX=0.5;entryY=0;entryDx=0;entryDy=0;" parent="1" source="Sy3GnD-ZVnJThFurnhwo-23" edge="1">
<mxGeometry relative="1" as="geometry">
<mxPoint x="230" y="810" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="Sy3GnD-ZVnJThFurnhwo-23" value="git commit -eF RELEASE_COMMIT_MSG.md" style="rounded=1;whiteSpace=wrap;html=1;" parent="1" vertex="1">
<mxGeometry x="75" y="700" width="310" height="60" as="geometry" />
</mxCell>
<mxCell id="Sy3GnD-ZVnJThFurnhwo-30" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;exitX=0.5;exitY=1;exitDx=0;exitDy=0;entryX=0.5;entryY=0;entryDx=0;entryDy=0;" parent="1" target="Sy3GnD-ZVnJThFurnhwo-28" edge="1">
<mxGeometry relative="1" as="geometry">
<mxPoint x="230" y="870" as="sourcePoint" />
</mxGeometry>
</mxCell>
<mxCell id="Sy3GnD-ZVnJThFurnhwo-31" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;exitX=0.5;exitY=1;exitDx=0;exitDy=0;entryX=0.5;entryY=0;entryDx=0;entryDy=0;" parent="1" source="Sy3GnD-ZVnJThFurnhwo-28" target="Sy3GnD-ZVnJThFurnhwo-29" edge="1">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<mxCell id="Sy3GnD-ZVnJThFurnhwo-28" value="Build and push packages to PyPI" style="rounded=1;whiteSpace=wrap;html=1;" parent="1" vertex="1">
<mxGeometry x="170" y="910" width="120" height="60" as="geometry" />
</mxCell>
<mxCell id="Sy3GnD-ZVnJThFurnhwo-29" value="Create release on GitHub" style="rounded=1;whiteSpace=wrap;html=1;" parent="1" vertex="1">
<mxGeometry x="170" y="1010" width="120" height="60" as="geometry" />
</mxCell>
<mxCell id="Sy3GnD-ZVnJThFurnhwo-36" value="GitHub Action" style="text;html=1;align=center;verticalAlign=middle;whiteSpace=wrap;rounded=0;fontStyle=1;fontSize=18;fontColor=#6F9958;" parent="1" vertex="1">
<mxGeometry x="325" y="813" width="60" height="30" as="geometry" />
</mxCell>
<mxCell id="srRZveQdFgRCeiaoivwE-1" value="Create tag on<div>GitHub</div>" style="rounded=1;whiteSpace=wrap;html=1;" vertex="1" parent="1">
<mxGeometry x="170" y="810" width="120" height="60" as="geometry" />
</mxCell>
</root>
</mxGraphModel>
</diagram>
</mxfile>
================================================
FILE: docs/conf.py
================================================
"""
Configuration file for the Sphinx documentation builder.
This file only contains a selection of the most common options.
For a full list see the documentation:
https://www.sphinx-doc.org/en/master/usage/configuration.html
"""
# -- Path setup --------------------------------------------------------------
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
import datetime
import os
import shutil
import sys
from pathlib import Path
sys.path.insert(0, os.path.abspath("."))
sys.path.insert(0, os.path.abspath("../"))
import pypdf as py_pkg
shutil.copyfile("../CHANGELOG.md", "meta/CHANGELOG.md")
shutil.copyfile("../CONTRIBUTORS.md", "meta/CONTRIBUTORS.md")
# -- Project information -----------------------------------------------------
project = py_pkg.__name__
copyright = f"2006 - {datetime.datetime.now(tz=datetime.timezone.utc).year}, Mathieu Fenniak and pypdf contributors"
author = "Mathieu Fenniak"
# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#
# The short X.Y version.
version = py_pkg.__version__
# The full version, including alpha/beta/rc tags.
release = py_pkg.__version__
# -- General configuration ---------------------------------------------------
# If your documentation needs a minimal Sphinx version, state it here.
needs_sphinx = "4.0.0"
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
"sphinx.ext.autodoc",
"sphinx.ext.intersphinx",
"sphinx.ext.autosummary",
"sphinx.ext.coverage",
"sphinx.ext.mathjax",
"sphinx.ext.viewcode",
"sphinx.ext.napoleon",
"sphinx.ext.doctest",
# External
"myst_parser",
]
python_version = ".".join(map(str, sys.version_info[:2]))
intersphinx_mapping = {
"python": (f"https://docs.python.org/{python_version}", None),
"Pillow": ("https://pillow.readthedocs.io/en/latest/", None),
}
nitpick_ignore_regex = [
# For reasons unclear at this stage, the io module prefixes everything with _io
# and this confuses sphinx
(
r"py:class",
r"(_io.(FileIO|BytesIO|Buffered(Reader|Writer))|pypdf.*PdfDocCommon)",
),
]
autodoc_default_options = {
"member-order": "bysource",
"members": True,
"show-inheritance": True,
"undoc-members": True,
}
autodoc_inherit_docstrings = False
autodoc_typehints_format = "short"
python_use_unqualified_type_names = True
# Add any paths that contain templates here, relative to this directory.
templates_path = ["_templates"]
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]
# Configure MyST extension.
myst_all_links_external = False
myst_heading_anchors = 3
# -- Options for HTML output -------------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = "sphinx_rtd_theme"
# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
html_theme_options = {
"canonical_url": "",
"analytics_id": "",
"logo_only": True,
"prev_next_buttons_location": "bottom",
"style_external_links": False,
# Toc options
"collapse_navigation": True,
"sticky_navigation": True,
"navigation_depth": 4,
"includehidden": True,
"titles_only": False,
}
html_logo = "_static/logo.png"
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ["_static"]
# -- Options for Napoleon -----------------------------------------------------
napoleon_google_docstring = True
napoleon_numpy_docstring = False # Explicitly prefer Google style docstring
napoleon_use_param = True # for type hint support
napoleon_use_rtype = False # False, so the return type is inline with the description.
# -- Options for Doctest ------------------------------------------------------
# Most doc examples use hardcoded input and output file names.
# To execute these examples, real files need to be read and written.
#
# By default, documentation examples run with the working directory set to where
# the "sphinx-build" command was invoked. To avoid relative paths in docs and to
# allow running the "sphinx-build" command from any directory, we modify the
# current working directory in each tested file. Tests are executed against our
# temporary directory where we have copied all necessary resources.
#
# Each doc page that requires file operations must use the "testsetup" directive
# to call the "pypdf_test_setup" function to prepare the test environment for that
# page.
#
# def pypdf_test_setup(group: str, resources: dict[str, str] = {}) -> None
#
# Args:
# group: A unique name for group of tests. Typically we group tests by doc page.
# For each doc page we create a test folder under
# "_build/doctest/pypdf_test/<group>". This avoids file name conflicts
# between different doc pages.
# resources: A dictionary of source files to copy into the test folder.
# Key is the destination file name (relative to the test folder).
# Value is the source file path (relative to the root folder).
#
# Examples:
# ```{testsetup}
# pypdf_test_setup("user/add-javascript", {
# "example.pdf": "../resources/example.pdf",
# })
# ```
pypdf_test_src_root_dir = os.path.abspath(".")
pypdf_test_dst_root_dir = os.path.abspath("_build/doctest/pypdf_test")
if Path(pypdf_test_dst_root_dir).exists():
    shutil.rmtree(pypdf_test_dst_root_dir)
Path(pypdf_test_dst_root_dir).mkdir(parents=True)
doctest_global_setup = f"""
def pypdf_test_global_setup():
    import os
    import shutil
    from pathlib import Path
    src_root_dir = {pypdf_test_src_root_dir.__repr__()}
    dst_root_dir = {pypdf_test_dst_root_dir.__repr__()}
    global pypdf_test_orig_dir
    pypdf_test_orig_dir = os.getcwd()
    os.chdir(dst_root_dir)
    global pypdf_test_setup
    def pypdf_test_setup(group: str, resources: dict[str, str] = {{}}) -> None:
        dst_dir = os.path.join(dst_root_dir, group)
        Path(dst_dir).mkdir(parents=True)
        os.chdir(dst_dir)
        for (dst_path, src_path) in resources.items():
            src = os.path.normpath(os.path.join(src_root_dir, src_path))
            dst = os.path.join(dst_dir, dst_path)
            shutil.copyfile(src, dst)
pypdf_test_global_setup()
"""
doctest_global_cleanup = f"""
def pypdf_test_global_cleanup():
    import os
    dst_root_dir = {pypdf_test_dst_root_dir.__repr__()}
    os.chdir(pypdf_test_orig_dir)
    has_files = False
    for name in os.listdir(dst_root_dir):
        file_name = os.path.join(dst_root_dir, name)
        if os.path.isfile(file_name):
            if not has_files:
                print("Docs page was not configured properly for running code examples")
                print("Please use 'pypdf_test_setup' function in 'testsetup' directive")
                print("Deleting unexpected file(s) in " + dst_root_dir)
                has_files = True
            print(f"- {{name}}")
            os.remove(file_name)  # Avoid side effects on other tests
pypdf_test_global_cleanup()
"""
================================================
FILE: docs/dev/cmaps.md
================================================
# CMaps
Looking at the cmap of "crazyones":
```bash
pdftk crazyones.pdf output crazyones-uncomp.pdf uncompress
```
You can see this:
```text
begincmap
/CMapName /T1Encoding-UTF16 def
/CMapType 2 def
/CIDSystemInfo <<
/Registry (Adobe)
/Ordering (UCS)
/Supplement 0
>> def
1 begincodespacerange
<00> <FF>
endcodespacerange
1 beginbfchar
<1B> <FB00>
endbfchar
endcmap
CMapName currentdict /CMap defineresource pop
```
## codespacerange
A codespacerange declares which byte sequences are valid character codes and
how many bytes each code has. Here, `<00> <FF>` means that every code is a
single byte between `00` and `FF` (decimal 0 to 255). It does not map anything
to Unicode by itself. The mapping comes from the `bfchar` entries:
```text
1 beginbfchar
<1B> <FB00>
```
That means that `1B` (hex for 27) maps to the Unicode character [`FB00`](https://unicode-table.com/en/FB00/) - the ligature ff (two lowercase f's).
Each `bfchar` entry maps exactly one code. A run of consecutive codes is mapped
with a `beginbfrange` block instead, which lists the first code, the last code,
and the Unicode value assigned to the first code.
Within the text stream, there is
```text
(The)-342(mis\034ts.)
```
`\034` is octal for the decimal value 28 (hex `1C`).
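
The `bfchar` entry shown above can be read with a few lines of Python. This is only a minimal sketch using the standard library: the helper name `parse_bfchar` is made up for this illustration, it is not pypdf's CMap parser, and it assumes the target values are UTF-16BE hex strings.

```python
import re

# The relevant part of the CMap from the "crazyones" example above.
CMAP_SNIPPET = """
1 begincodespacerange
<00> <FF>
endcodespacerange
1 beginbfchar
<1B> <FB00>
endbfchar
"""


def parse_bfchar(cmap_text: str) -> dict[int, str]:
    """Collect the code -> Unicode pairs of all bfchar blocks (hypothetical helper)."""
    mapping: dict[int, str] = {}
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, flags=re.S):
        for code, target in re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", block):
            # Assumption: the target is a UTF-16BE encoded hex string.
            mapping[int(code, 16)] = bytes.fromhex(target).decode("utf-16-be")
    return mapping


bfchar_map = parse_bfchar(CMAP_SNIPPET)
octal_escape_value = int("034", 8)  # the \034 escape from the text stream
```

Here `bfchar_map` has a single entry for code 27 (hex `1B`), and `octal_escape_value` is the decimal value of the `\034` escape.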
================================================
FILE: docs/dev/deprecations.md
================================================
# The Deprecation Process
pypdf strives to be an excellent library for its current users and for new
ones. We are careful with introducing potentially breaking changes, but we
will do them if they provide value for the community in the long run.
We hope and think that deprecations will not happen frequently. If they do,
users can rely on the following procedure.
## Semantic Versioning
pypdf uses [semantic versioning](https://semver.org/). If you want to avoid
breaking changes, please use dependency pinning (also known as version pinning).
In Python, this is done by specifying the exact version you want to use in a
`requirements.txt` file. A tool that can support you is `pip-compile` from
[`pip-tools`](https://pypi.org/project/pip-tools/).
If you are using [Poetry](https://pypi.org/project/poetry/) it is done with the
`poetry.lock` file.
## How pypdf deprecates features
Assume the current version of pypdf is `x.y.z`. After a discussion (e.g., via
GitHub issues), we decided to remove a class / function / method. This is how
we do it:
1. `x.y.(z+1)`: Add a DeprecationWarning. If there is a replacement,
the replacement is introduced at the same time, and the warning names it and
states when the removal will happen.
The docs let users know about the deprecation, the replacement, and when the removal will happen.
The CHANGELOG informs about it.
2. `(x+1).0.0`: Remove / change the code in the breaking way by replacing
DeprecationWarnings with DeprecationErrors.
We do this to help people who didn't look at the warnings before.
The CHANGELOG informs about it.
3. `(x+2).0.0`: The DeprecationErrors are removed.
This means the users have three warnings in the CHANGELOG, a DeprecationWarning
until the next major release and a DeprecationError until the major release
after that.
Please note that adding warnings can be a breaking change for some users; most
likely just in the CI.
This means it needs to be properly documented.
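
The two phases can be sketched in plain Python. This is a hedged illustration and not pypdf's actual helper code: `old_function`, `new_function`, and the local `DeprecationError` class are stand-ins invented for this example.

```python
import warnings


class DeprecationError(Exception):
    """Local stand-in for the error raised once a feature is removed."""


def new_function() -> str:
    """Hypothetical replacement for the deprecated function."""
    return "result"


def old_function_warning_phase() -> str:
    """Step 1: the old name still works, but emits a DeprecationWarning."""
    warnings.warn(
        "old_function is deprecated and will be removed in the next major "
        "release. Use new_function instead.",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not at this helper
    )
    return new_function()


def old_function_error_phase() -> str:
    """Step 2: the old name only tells the caller what to use instead."""
    raise DeprecationError(
        "old_function is deprecated and was removed. Use new_function instead."
    )
```

Projects that turn warnings into errors in their CI will already fail in the first phase, which is why adding the warning can be a breaking change for them.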
================================================
FILE: docs/dev/documentation.md
================================================
# Documentation
This documentation is built with [Sphinx](https://www.sphinx-doc.org/) and
hosted by [Read the Docs](https://about.readthedocs.com/).
## Testing code snippets
Almost all Python code snippets in the documentation are tested using Sphinx's extension
[sphinx.ext.doctest](https://www.sphinx-doc.org/en/master/usage/extensions/doctest.html).
This helps to make sure that there are no typos, missing imports, or other problems in:
- code snippets marked with the `testcode` directive in `*.md` files
- code snippets from Python docstrings imported via the `autoclass` directive in `*.rst` files
The CI pipeline is configured to run Sphinx's `doctest` build automatically for each PR.
It is also possible to run it locally:
1. First you need to install docs requirements
```bash
pip install -r requirements/docs.txt
```
2. Change current directory
```bash
cd docs
```
3. Run the `doctest` build. It indirectly uses the `sphinx-build` command line tool
installed with the docs requirements. See
[Sphinx's docs](https://www.sphinx-doc.org/en/master/usage/quickstart.html#running-the-build)
for details.
```bash
make doctest
```
4. If everything is okay, the output should show a `Doctest summary` without failures.
## API Reference
### Method / Function Docstrings
We use Google-Style Docstrings:
```
def example(param1: int, param2: str) -> bool:
    """
    Example function with PEP 484 type annotations.
    Args:
        param1: The first parameter.
        param2: The second parameter.
    Returns:
        The return value. True for success, False otherwise.
    Raises:
        AttributeError: The ``Raises`` section is a list of all exceptions
            that are relevant to the interface.
        ValueError: If `param2` is equal to `param1`.
    Examples:
        Examples should be written in doctest format, and should illustrate how
        to use the function.
        >>> print([i for i in example_generator(4)])
        [0, 1, 2, 3]
    """
```
* The order of sections is (1) Args (2) Returns (3) Raises (4) Examples
* If there is no return value, remove the 'Returns' block
* Properties should not have any sections
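
As a small, self-contained illustration of the section order, here is a function whose `Examples` block can actually be executed as a doctest. The function `count_up` is invented for this sketch and is not part of pypdf. The snippet runs the docstring example with the standard library's `doctest` module.

```python
import doctest


def count_up(stop: int) -> list[int]:
    """
    Return the integers from 0 up to, but not including, ``stop``.

    Args:
        stop: The exclusive upper bound.

    Returns:
        The integers in ascending order.

    Raises:
        ValueError: If ``stop`` is negative.

    Examples:
        >>> count_up(4)
        [0, 1, 2, 3]
    """
    if stop < 0:
        raise ValueError("stop must not be negative")
    return list(range(stop))


# Collect and run only the doctest examples of count_up.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for doc_test in finder.find(count_up, "count_up", globs={"count_up": count_up}):
    runner.run(doc_test)
results = runner.summarize(verbose=False)
```

`results.failed` stays at zero as long as the example output in the docstring matches what the function returns.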
## Issues and PRs
An issue can be used to discuss what we want to achieve.
A PR can be used to discuss how we achieve it.
## Commit Messages
We want to have descriptive commits in the `main` branch. For this reason, every
pull request (PR) is squashed. That means no matter how many commits a PR has,
in the end only one combined commit will be in `main`.
The title of the PR will be used as the first line of that combined commit message.
The first comment within the commit will be used as the message body.
See [developer intro](intro.md#commit-messages) for more details.
================================================
FILE: docs/dev/intro.md
================================================
# Developer Intro
pypdf is a library and hence its users are developers. This document is not for
the users, but for people who want to work on pypdf itself.
```{note}
Our CI (continuous integration) validates that relevant standards are met with your contribution.
Especially for regular contributors or larger changes, it is highly recommended that you set up your own development environment
to already cover the most important aspects locally. This greatly helps us to reduce the noise compared to when you open an untested
PR early and use our CI to do your debugging and improvements from there. The maintainers usually receive a notification on every push
to a branch where a corresponding PR is open, possibly hiding important notifications.
```
## Installing Requirements
```
pip install -r requirements/dev.txt
```
## Running Tests
See [testing pypdf with pytest](testing.md).
## The sample-files git submodule
The reason for having the submodule `sample-files` is that we want to keep
the size of the pypdf repository small while we also want to have an extensive
test suite. Those two goals contradict each other.
The `resources` folder should contain a select set of core examples that cover
most cases we typically want to test for. The `sample-files` submodule might
cover a lot more edge cases, such as the behavior we get when file sizes get
bigger or when files come from different PDF producers.
To get the sample-files folder, you need to execute:
```
git submodule update --init
```
## Tools: git and pre-commit
Git is a command line application for version control. If you don't know it,
you can [play ohmygit](https://ohmygit.org/) to learn it.
GitHub is the service where the pypdf project is hosted. While git is free and
open source, GitHub is a paid service by Microsoft, but free in a lot of
cases.
[pre-commit](https://pypi.org/project/pre-commit/) is a command line application
that uses git hooks to automatically execute code. This allows you to avoid
style issues and other code quality issues. After you have run `pre-commit install`
once in your local copy of pypdf, it will automatically be executed when
you `git commit`.
## Commit Messages
Having a clean commit message helps people to quickly understand what the commit
is about, without actually looking at the changes. The first line of the
commit message is used to [auto-generate the CHANGELOG](https://github.com/py-pdf/pypdf/blob/main/make_release.py).
For this reason, the format should be:
```
PREFIX: DESCRIPTION
BODY
```
The `PREFIX` can be:
* `SEC`: Security improvements. Typically, an infinite loop that was possible.
* `BUG`: A bug was fixed. Likely there are one or multiple issues. Then write in
the `BODY`: `Closes #123` where 123 is the issue number on GitHub.
It would be absolutely amazing if you could write a regression test in those
cases. That is a test that would fail without the fix.
A bug is always an issue for pypdf users - test code or CI that was fixed is
not considered a bug here.
* `ENH`: A new feature! Describe in the body what it can be used for.
* `DEP`: Deprecation. Either marking something as "this is going to be removed"
or actually removing it.
* `PI`: A performance improvement. This could also be a reduction in the
file size of PDF files generated by pypdf.
* `ROB`: A robustness change. Dealing better with broken PDF files.
* `DOC`: A documentation change.
* `TST`: Adding or adjusting tests.
* `DEV`: Developer experience improvements, e.g., pre-commit or setting up CI.
* `MAINT`: Quite a lot of different stuff. Performance improvements are, for sure,
the most interesting changes in here. Refactorings as well.
* `STY`: A style change. Something that makes pypdf code more consistent.
Typically, a small change. It could also be better error messages for
end users.
The prefix is used to generate the CHANGELOG. Every PR must have exactly one -
if you feel like several match, take the top one from this list that matches for
your PR.
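
The title format can be checked mechanically. The following is a minimal sketch under stated assumptions: the function `split_title` and the constant names are hypothetical, and this is not the check that the project's CI runs.

```python
import re

# Prefixes in the order of the list above; earlier entries take precedence
# when several seem to match.
KNOWN_PREFIXES = (
    "SEC", "BUG", "ENH", "DEP", "PI", "ROB", "DOC", "TST", "DEV", "MAINT", "STY",
)
TITLE_PATTERN = re.compile(r"^(?P<prefix>[A-Z]+): (?P<description>\S.*)$")


def split_title(title: str) -> tuple[str, str]:
    """Split 'PREFIX: DESCRIPTION' and reject unknown prefixes (hypothetical helper)."""
    match = TITLE_PATTERN.match(title)
    if match is None or match["prefix"] not in KNOWN_PREFIXES:
        raise ValueError(f"Title does not follow 'PREFIX: DESCRIPTION': {title!r}")
    return match["prefix"], match["description"]


prefix, description = split_title("BUG: Fix crash on empty page")
```

A title without a known prefix, such as `Fix crash on empty page`, raises a `ValueError` in this sketch.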
## Pull Request Size
Smaller Pull Requests (PRs) are preferred as it's typically easier to merge
them. For example, if you have some typos, a few code-style changes, a new
feature, and a bug-fix, that could be three or four PRs.
A PR must be complete. That means if you introduce a new feature, it must be
finished within the PR and have a test for that feature.
## Benchmarks
We need to keep an eye on performance, and thus we have a few benchmarks.
See [py-pdf.github.io/pypdf/dev/bench](https://py-pdf.github.io/pypdf/dev/bench/)
================================================
FILE: docs/dev/pdf-format.md
================================================
# The PDF Format
It is recommended to look in the PDF specification for details and clarifications.
* [PDF Specification Archive](https://pdfa.org/resource/pdf-specification-archive/)
* [Portable Document Format Reference Manual, 1993. ISBN 0-201-62628-4](https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/pdfreference1.0.pdf)
* [ISO 32000-1:2008 (PDF 1.7)](https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf)
* ISO 32000-2:2020 (PDF 2.0)
```{note}
We currently generate files with a header for PDF 1.3 by default. At the same time, we strive
to support the PDF 1.7 specification.
Features specific to PDF 2.0 might be available, but we always ensure that older versions do
not break due to the rather limited general PDF 2.0 support in the wild and to not break for
old PDF files. For this reason, some historical aspects (like insecure encryption algorithms)
are required to be supported, although PDF 2.0 deprecates most of them and allows more secure
variants.
```
The following is only intended to give a very rough overview of the format.
## Overall Structure
A PDF consists of:
1. Header: Contains the version of the PDF, e.g. `%PDF-1.7`
2. Body: Contains a sequence of indirect objects
3. Cross-reference table (xref): Contains a list of the indirect objects in the body
4. Trailer
## The xref table
A cross-reference table (xref) is a table of the indirect objects in the body.
It allows quick access to those objects by pointing to their location in the file.
It looks like this:
```text
xref 42 5
0000001000 65535 f
0000001234 00000 n
0000001987 00000 n
0000011987 00000 n
0000031987 00000 n
```
Let's go through it step-by-step:
* `xref` is just a keyword that specifies the start of the xref table.
* `42` is the numerical ID of the first object in this xref section; `5` is the number of entries in the xref table.
* Now every object has 3 entries `nnnnnnnnnn ggggg n`: a 10-digit byte offset,
a 5-digit generation number, and a literal keyword which is either `n` or `f`.
* `nnnnnnnnnn` is the byte offset of the object. It tells the reader where
the object is in the file.
* `ggggg` is the generation number. It tells the reader how old the object is.
* `n` means that the object is a normal in-use object, `f` means that the object
is a free object.
* The first free object always has a generation number of 65535. It forms
the head of a linked-list of all free objects.
* The generation number of a normal object is usually 0. It is only
incremented when an object number is reused after the object was freed,
so that references to the old object can be told apart from the new one.
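
To make the field layout concrete, here is a minimal sketch that parses the example section above. `XrefEntry` and `parse_xref_section` are names made up for this illustration. pypdf's real parser works on the byte stream and also handles cases that are ignored here, such as multiple subsections.

```python
from typing import NamedTuple


class XrefEntry(NamedTuple):
    object_number: int
    offset: int
    generation: int
    in_use: bool


def parse_xref_section(lines: list[str]) -> list[XrefEntry]:
    """Parse one classic xref section given as text lines (hypothetical helper)."""
    header = lines[0].split()
    # The last two header tokens are the first object number and the entry count.
    first_id, count = int(header[-2]), int(header[-1])
    entries = []
    for index, line in enumerate(lines[1 : 1 + count]):
        offset, generation, kind = line.split()
        entries.append(
            XrefEntry(first_id + index, int(offset), int(generation), kind == "n")
        )
    return entries


SECTION = """xref 42 5
0000001000 65535 f
0000001234 00000 n
0000001987 00000 n
0000011987 00000 n
0000031987 00000 n"""

entries = parse_xref_section(SECTION.splitlines())
```

Each entry carries the object number derived from the header, so the last line of the example describes object 46.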
## The body
The body is a sequence of indirect objects:
`counter generation_number obj << the_object >> endobj`
* `counter` (integer) is a unique identifier for the object.
* `generation_number` (integer) is the generation number of the object.
* `obj` marks the start of the object.
* `the_object` is the object itself. It can be empty. Starts with `/Keyword` to
specify which kind of object it is.
* `endobj` marks the end of the object.
A concrete example can be found in `test_reader.py::test_get_images_raw`:
```text
1 0 obj << /Count 1 /Kids [4 0 R] /Type /Pages >> endobj
2 0 obj << >> endobj
3 0 obj << >> endobj
4 0 obj << /Contents 3 0 R /CropBox [0.0 0.0 2550.0 3508.0]
/MediaBox [0.0 0.0 2550.0 3508.0] /Parent 1 0 R
/Resources << /Font << >> >>
/Rotate 0 /Type /Page >> endobj
5 0 obj << /Pages 1 0 R /Type /Catalog >> endobj
```
## The trailer
The trailer looks like this:
```text
trailer << /Root 5 0 R
/Size 6
>>
startxref 1234
%%EOF
```
Let's go through it:
* `trailer <<` indicates that the *trailer dictionary* starts. It ends with `>>`.
* `startxref` is a keyword followed by the byte-location of the `xref` keyword.
As the trailer is always at the bottom of the file, this allows readers to
quickly find the xref table.
* `%%EOF` is the end-of-file marker.
The trailer dictionary is a key-value list. The keys are specified in
Table 15 of the PDF Reference 1.7, e.g. `/Root` and `/Size` (both are required).
* `/Root` (dictionary) contains the document catalog.
* The `5` is the object number of the catalog dictionary.
* `0` is the generation number of the catalog dictionary.
* `R` is the keyword that indicates that the object is a reference to the
catalog dictionary.
* `/Size` (integer) contains the total number of entries in the file's xref table.
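
A reader can locate these pieces by scanning the end of the file. The sketch below works on the example trailer only and uses regular expressions for brevity. The helper names are hypothetical, and a real parser such as pypdf's has to be far more tolerant of whitespace, comments, and broken files.

```python
import re

# The example trailer from above, as it would appear at the end of a file.
TAIL = b"""trailer << /Root 5 0 R
/Size 6
>>
startxref 1234
%%EOF
"""


def read_startxref(data: bytes) -> int:
    """Return the byte offset given after the last startxref keyword."""
    matches = re.findall(rb"startxref\s+(\d+)\s+%%EOF", data)
    if not matches:
        raise ValueError("no startxref found")
    return int(matches[-1])


def read_root_reference(data: bytes) -> tuple[int, int]:
    """Return (object number, generation number) of the /Root reference."""
    match = re.search(rb"/Root\s+(\d+)\s+(\d+)\s+R", data)
    if match is None:
        raise ValueError("no /Root entry found")
    return int(match[1]), int(match[2])


xref_offset = read_startxref(TAIL)
root_reference = read_root_reference(TAIL)
```

With the example data, `xref_offset` is the position to seek to for the xref table and `root_reference` identifies the catalog object.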
## Reading PDF files
Most PDF files are compressed. If you want to read them, first uncompress them:
```bash
pdftk crazyones.pdf output crazyones-uncomp.pdf uncompress
```
Then rename `crazyones-uncomp.pdf` to `crazyones-uncomp.txt` and open it in
your favorite IDE / text editor.
================================================
FILE: docs/dev/pypdf-parsing.md
================================================
# How pypdf parses PDF files
pypdf uses {class}`~pypdf.PdfReader` to parse PDF files.
The method {py:meth}`PdfReader.read <pypdf.PdfReader.read>` shows the basic
structure of parsing:
1. **Finding and reading the cross-reference tables / trailer**: The
cross-reference table (xref table) is a table of byte offsets that indicate
the locations of objects within the file. The trailer provides additional
information such as the root object (Catalog) and the Info object containing
metadata.
2. **Parsing the objects**: After locating the xref table and the trailer, pypdf
proceeds to parse the objects in the PDF. Objects in a PDF can be of various
types such as dictionaries, arrays, streams, and simple data types (e.g.,
integers, strings). pypdf parses these objects and stores them in
{py:meth}`PdfReader.resolved_objects <pypdf.PdfReader.resolved_objects>`,
populated by {py:meth}`cache_indirect_object <pypdf.PdfReader.cache_indirect_object>`.
3. **Decoding content streams**: The content of a PDF is typically stored in
content streams, which are sequences of PDF operators and operands. pypdf
decodes these content streams by applying filters (e.g., `FlateDecode`,
`LZWDecode`) specified in the stream's dictionary. This is only done when the
object is requested by {py:meth}`PdfReader.get_object <pypdf.PdfReader.get_object>`
which uses the `PdfReader._get_object_from_stream` method.
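
Decoding a `FlateDecode` stream is zlib decompression. The following standard-library sketch illustrates only that one filter. It is not pypdf's filter code, and the sample content stream is made up.

```python
import zlib

# A tiny, made-up content stream: operators and operands that draw text.
content_stream = b"BT /F1 12 Tf 72 720 Td (Hello) Tj ET"

# What a PDF producer stores when the stream dictionary says /Filter /FlateDecode.
encoded = zlib.compress(content_stream)

# What a reader does before it can interpret the operators.
decoded = zlib.decompress(encoded)
```

Because decoding costs time, it makes sense to do it only when an object is actually requested, as described in step 3.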
## References
[PDF 1.7 specification](https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf):
* 7.5 File Structure
* 7.5.4 Cross-Reference Table
* 7.8 Content Streams and Resources
================================================
FILE: docs/dev/pypdf-writing.md
================================================
# How pypdf writes PDF files
pypdf uses {py:class}`PdfWriter <pypdf.PdfWriter>` to write PDF files. pypdf has
{py:class}`PdfObject <pypdf.generic.PdfObject>` and several subclasses with the
{py:meth}`write_to_stream <pypdf.generic.PdfObject.write_to_stream>` method.
The {py:meth}`PdfWriter.write <pypdf.PdfWriter.write>` method uses the
`write_to_stream` methods of the referenced objects.
The {py:meth}`PdfWriter.write_stream <pypdf.PdfWriter.write_stream>` method
has the following core steps:
1. `_sweep_indirect_references`: This step ensures that any circular references
to objects are correctly handled. It adds the object reference numbers of any
circularly referenced objects to an external reference map, so that
self-page-referencing trees can reference the correct new object location,
rather than copying in a new copy of the page object.
2. **Write the File Header and Body** with `_write_pdf_structure`: In this step,
the PDF header and objects are written to the output stream. This includes
the PDF version (e.g., %PDF-1.7) and the objects that make up the content of
the PDF, such as pages, annotations, and form fields. The locations (byte
offsets) of these objects are stored for later use in generating the xref
table.
3. **Write the Cross-Reference Table** with `_write_xref_table`: Using the stored
object locations, this step generates and writes the cross-reference table
(xref table) to the output stream. The cross-reference table contains the
byte offsets for each object in the PDF file, allowing for quick random
access to objects when reading the PDF.
4. **Write the File Trailer** with `_write_trailer`: The trailer is written to
the output stream in this step. The trailer contains essential information,
such as the number of objects in the PDF, the location of the root object
(Catalog), and the Info object containing metadata. The trailer also
specifies the location of the xref table.
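
The order of steps 2 to 4 can be sketched with the standard library. This is a simplified illustration, not pypdf's implementation. `write_minimal_pdf` is a hypothetical helper that expects already serialized object bodies and assumes that the first object is the catalog.

```python
import io


def write_minimal_pdf(objects: list[bytes]) -> bytes:
    """Write header, body, xref table, and trailer for serialized objects."""
    stream = io.BytesIO()
    # Header
    stream.write(b"%PDF-1.7\n")
    # Body: remember the byte offset of every object for the xref table.
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(stream.tell())
        stream.write(b"%d 0 obj\n" % number + body + b"\nendobj\n")
    # Cross-reference table: entry 0 is the head of the free list.
    xref_location = stream.tell()
    stream.write(b"xref\n0 %d\n" % (len(objects) + 1))
    stream.write(b"0000000000 65535 f \n")
    for offset in offsets:
        stream.write(b"%010d 00000 n \n" % offset)
    # Trailer: assumes object 1 is the catalog.
    stream.write(b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1))
    stream.write(b"startxref\n%d\n%%%%EOF\n" % xref_location)
    return stream.getvalue()


pdf_bytes = write_minimal_pdf(
    [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [] /Count 0 >>",
    ]
)
```

The offsets have to be collected while writing the body, because the xref table and the `startxref` value can only be produced once every object's position is known.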
## How others do it
Looking at alternative software designs and implementations can help to improve
our choices.
### fpdf2
[fpdf2](https://pypi.org/project/fpdf2/) has a [`PDFObject` class](https://github.com/PyFPDF/fpdf2/blob/master/fpdf/syntax.py)
with a `serialize` method which roughly maps to `pypdf.generic.PdfObject.write_to_stream`.
Some other similarities include:
* [fpdf.output.OutputProducer.bufferize](https://github.com/PyFPDF/fpdf2/blob/master/fpdf/output.py#L370-L485) vs. {py:meth}`pypdf.PdfWriter.write_stream <pypdf.PdfWriter.write_stream>`
* [fpdf.syntax.Name](https://github.com/PyFPDF/fpdf2/blob/master/fpdf/syntax.py#L124) vs. {py:class}`pypdf.generic.NameObject <pypdf.generic.NameObject>`
* [fpdf.syntax.build_obj_dict](https://github.com/PyFPDF/fpdf2/blob/master/fpdf/syntax.py#L222) vs. {py:class}`pypdf.generic.DictionaryObject <pypdf.generic.DictionaryObject>`
* [fpdf.structure_tree.NumberTree](https://github.com/PyFPDF/fpdf2/blob/master/fpdf/structure_tree.py#L17) vs. {py:class}`pypdf.generic.TreeObject <pypdf.generic.TreeObject>`
### pdfrw
[pdfrw](https://pypi.org/project/pdfrw/), in contrast, seems to work more with
the standard Python objects (bool, float, string) and not wrap them in custom
objects, if possible. It still has:
* [PdfArray](https://github.com/pmaupin/pdfrw/blob/master/pdfrw/objects/pdfarray.py#L13)
* [PdfDict](https://github.com/pmaupin/pdfrw/blob/master/pdfrw/objects/pdfdict.py#L49)
* [PdfName](https://github.com/pmaupin/pdfrw/blob/master/pdfrw/objects/pdfname.py#L65)
* [PdfString](https://github.com/pmaupin/pdfrw/blob/master/pdfrw/objects/pdfstring.py#L322)
* [PdfIndirect](https://github.com/pmaupin/pdfrw/blob/master/pdfrw/objects/pdfindirect.py#L10)
The core classes of pdfrw are
[PdfReader](https://github.com/pmaupin/pdfrw/blob/master/pdfrw/pdfreader.py#L26)
and
[PdfWriter](https://github.com/pmaupin/pdfrw/blob/master/pdfrw/pdfwriter.py#L224)
================================================
FILE: docs/dev/releasing.md
================================================
# Releasing
A `pypdf` release contains the following artifacts:
* A new [release on PyPI](https://pypi.org/project/pypdf/)
* A [release commit](https://github.com/py-pdf/pypdf/commit/91391b18bb8ec9e6e561e2795d988e8634a01a50)
* Containing a changelog update
* A new [git tag](https://github.com/py-pdf/pypdf/tags)
* A [GitHub release](https://github.com/py-pdf/pypdf/releases/tag/3.15.0)
## Who does it?
`pypdf` should typically only be released by one of the core maintainers / the
core maintainer. At the moment, this usually is stefan6419846.
Any owner of the py-pdf organization also has the technical permissions to
release.
## How is it done?
### With direct push permissions
This is the typical way for the core maintainer/benevolent dictator.
The release contains the following steps:
1. Update the CHANGELOG.md and the _version.py via `python make_release.py`.
This also prepares the release commit message.
2. Create a release commit: `git commit -eF RELEASE_COMMIT_MSG.md`.
3. Push commit: `git push`.
4. Create the tag: `git tag -s 6.7.1 -eF RELEASE_COMMIT_MSG.md`.
5. Push the tag: `git push origin 6.7.1`.
6. CI now builds a source distribution and a wheel, which it pushes to PyPI. It also
creates the corresponding GitHub release.

### Using a Pull Request
This is the typical way for collaborators who do not have direct push permissions for
the `main` branch.
The release contains the following steps:
1. Update the CHANGELOG.md and the _version.py via `python make_release.py`.
This also prepares the release commit message.
2. Push the changes to a dedicated branch.
3. Open a pull request starting with `REL: `, followed by the new version number.
4. Wait for the approval of another eligible maintainer.
5. Merge the pull request with the name being the PR title and the body being
the content of `RELEASE_COMMIT_MSG.md`.
6. Create the tag: `git tag -s 6.7.1 -eF RELEASE_COMMIT_MSG.md`.
7. Push the tag: `git push origin 6.7.1`.
8. CI now builds a source distribution and a wheel, which it pushes to PyPI. It also
creates the corresponding GitHub release.
### The Release Tag
* Use the release version as the tag name. No need for a leading "v".
* Use the changelog entry as the body.
## When are releases done?
There is no need to wait for anything. If the CI is green (all tests succeeded),
we can release.
At the moment, there is no fixed release cycle - except that we usually release
on Sunday.
================================================
FILE: docs/dev/testing.md
================================================
# Testing
pypdf uses [`pytest`](https://docs.pytest.org/en/7.1.x/) for testing.
To run the tests, you need to install the CI (Continuous Integration) requirements by running `pip install -r requirements/ci.txt` or
`pip install -r requirements/ci-3.11.txt` if running Python ≥ 3.11.
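If the choice between the two requirements files should be scripted, a sketch like the following could be used. `ci_requirements_file` is a hypothetical helper, not part of pypdf; the file names and the 3.11 threshold are taken from the sentence above.

```python
import sys


def ci_requirements_file(version_info=sys.version_info) -> str:
    """Pick the CI requirements file for the given Python version."""
    # Only major and minor version matter for the decision.
    if tuple(version_info[:2]) >= (3, 11):
        return "requirements/ci-3.11.txt"
    return "requirements/ci.txt"


if __name__ == "__main__":
    print(f"pip install -r {ci_requirements_file()}")
```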
## Deselecting groups of tests
pypdf makes use of the following pytest markers:
* `slow`: Tests that require more than 5 seconds.
* `samples`: Tests that require [the `sample-files` git submodule](https://github.com/py-pdf/sample-files) to be initialized. As of October 2022, this is about 25 MB.
* `enable_socket`: Tests that download PDF documents. They are stored locally and thus only need to be downloaded once. As of October 2022, this is about 200 MB.
* To successfully run the tests, please download most of the documents beforehand: `python -c "from tests import download_test_pdfs; download_test_pdfs()"`
You can disable them by `pytest -m "not enable_socket"` or `pytest -m "not samples"`.
You can even disable all of them: `pytest -m "not enable_socket and not samples and not slow"`.
Please note that this reduces test coverage. The CI will always test all files.
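The `-m` option takes a boolean marker expression, so several markers can be combined with `and`. The following sketch builds such a command line from a list of marker names; `deselect_command` is a hypothetical helper for illustration and not part of the test suite.

```python
from __future__ import annotations

import shlex


def deselect_command(markers: list[str]) -> str:
    """Build a pytest command line that skips tests carrying any given marker."""
    if not markers:
        return "pytest"
    # One expression for all markers: "not a and not b and not c".
    expression = " and ".join(f"not {marker}" for marker in markers)
    return shlex.join(["pytest", "-m", expression])


if __name__ == "__main__":
    print(deselect_command(["enable_socket", "samples", "slow"]))
```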
## Docstrings in Unit tests
The first line of a docstring in a unit test should be written in a way that
you could prefix it with "This test ensures that ...", e.g.
* Invalid XML in xmp_metadata is gracefully handled.
* The identity is returning its input.
* xmp_modify_date is extracted correctly.
This way, plugins like [`pytest-testdox`](https://pypi.org/project/pytest-testdox/)
can generate really nice output when the tests are running. This looks similar
to the output of [mocha.js](https://mochajs.org/).
If the test is a regression test, write
> This test is a regression test for issue #1234
If the regression test is just one parameter of other tests, then add it as
a comment for that parameter.
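A minimal sketch of what such docstrings might look like in practice. The `identity` function is a hypothetical stand-in for real code under test, and the issue number is the placeholder used above.

```python
def identity(value):
    """Hypothetical function under test; it only exists for this example."""
    return value


def test_identity():
    """The identity is returning its input."""
    assert identity(42) == 42


def test_identity_keeps_empty_list():
    """An empty list is passed through unchanged.

    This test is a regression test for issue #1234
    """
    assert identity([]) == []
```

In both cases the first docstring line still reads naturally after "This test ensures that ...".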
## Evaluate a PR in-progress version
You may want to test a version from a PR which has not been released yet.
The easiest way is to use pip and install a version from git:
a) Go to the PR and identify the repository and branch.
Example: repository __pubpub-zz__ / branch __iss2200__.

b) You can then install the version using pip from git:
Example:
```
pip install git+https://github.com/pubpub-zz/pypdf.git@iss2200
```
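After installing, it may help to confirm that pypdf is actually available in the active environment. The sketch below uses only the standard library; `installed_version` is a hypothetical helper name. Note that a branch build may report the same version number as the latest release.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str = "pypdf") -> str | None:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version())
```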
================================================
FILE: docs/index.rst
================================================
.. pypdf documentation main file, created by
sphinx-quickstart on Thu Apr 7 20:13:19 2022.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
Welcome to pypdf
=================
pypdf is a `free <https://en.wikipedia.org/wiki/Free_software>`_ and open
source pure-python PDF library capable of splitting,
merging, cropping, and transforming the pages of PDF files. It can also add
custom data, viewing options, and passwords to PDF files.
pypdf can retrieve text and metadata from PDFs as well.
See `pdfly <https://github.com/py-pdf/pdfly>`_ for a CLI application that uses pypdf to interact with PDFs.
You can contribute to `pypdf on GitHub <https://github.com/py-pdf/pypdf>`_.
.. toctree::
:caption: User Guide
:maxdepth: 1
user/installation
user/robustness
user/security
user/suppress-warnings
user/metadata
user/extract-text
user/post-processing-in-text-extraction
user/extract-images
user/handle-attachments
user/encryption-decryption
user/merging-pdfs
user/cropping-and-transforming
user/reading-pdf-annotations
user/adding-pdf-annotations
user/add-watermark
user/add-javascript
user/viewer-preferences
user/forms
user/handling-outlines
user/streaming-data
user/file-size
user/pdf-version-support
user/pdfa-compliance
.. toctree::
:caption: API Reference
:maxdepth: 1
modules/PdfReader
modules/PdfWriter
modules/Destination
modules/DocumentInformation
modules/Field
modules/Fit
modules/PageObject
modules/PageRange
modules/PaperSize
modules/RectangleObject
modules/Transformation
modules/XmpInformation
modules/annotations
modules/constants
modules/errors
modules/generic
modules/PdfDocCommon
.. toctree::
:caption: Developer Guide
:maxdepth: 1
dev/intro
dev/pdf-format
dev/pypdf-parsing
dev/pypdf-writing
dev/cmaps
dev/deprecations
dev/documentation
dev/testing
dev/releasing
.. toctree::
:caption: About pypdf
:maxdepth: 1
meta/CHANGELOG
meta/changelog-v1
meta/migration-1-to-2
meta/project-governance
meta/taking-ownership
meta/history
meta/CONTRIBUTORS
meta/scope-of-pypdf
meta/comparisons
meta/faq
Indices and tables
==================
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
================================================
FILE: docs/make.bat
================================================
@ECHO OFF
pushd %~dp0
REM Command file for Sphinx documentation
if "%SPHINXBUILD%" == "" (
set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=.
set BUILDDIR=_build
%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
echo.
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
echo.installed, then set the SPHINXBUILD environment variable to point
echo.to the full path of the 'sphinx-build' executable. Alternatively you
echo.may add the Sphinx directory to PATH.
echo.
echo.If you don't have Sphinx installed, grab it from
echo.https://www.sphinx-doc.org/
exit /b 1
)
if "%1" == "" goto help
%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end
:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
:end
popd
================================================
FILE: docs/meta/changelog-v1.md
================================================
# Changelog of PyPDF2 1.X
## Version 1.28.4, 2022-05-29
Bug Fixes (BUG):
- XmpInformation._converter_date was unusable (#921)
[Full Changelog](https://github.com/py-pdf/PyPDF2/compare/1.28.3...1.28.4)
## Version 1.28.3, 2022-05-28
### Deprecations (DEP)
- PEP8 renaming (#905)
### Bug Fixes (BUG)
- XmpInformation missing method _getText (#917)
- Fix PendingDeprecationWarning on _merge_page (#904)
[Full Changelog](https://github.com/py-pdf/PyPDF2/compare/1.28.2...1.28.3)
## Version 1.28.2, 2022-05-23
### Bug Fixes (BUG)
- PendingDeprecationWarning for getContents (#893)
- PendingDeprecationWarning on using PdfMerger (#891)
[Full Changelog](https://github.com/py-pdf/PyPDF2/compare/1.28.1...1.28.2)
## Version 1.28.1, 2022-05-22
### Bug Fixes (BUG)
- Incorrectly show deprecation warnings on internal usage (#887)
### Maintenance (MAINT)
- Add stacklevel=2 to deprecation warnings (#889)
- Remove duplicate warnings imports (#888)
[Full Changelog](https://github.com/py-pdf/PyPDF2/compare/1.28.0...1.28.1)
## Version 1.28.0, 2022-05-22
This release adds a lot of deprecation warnings in preparation of the
PyPDF2 2.0.0 release. The changes are mostly using snake_case function-, method-,
and variable-names as well as using properties instead of getter-methods.
Maintenance (MAINT):
- Remove IronPython Fallback for zlib (#868)
[Full Changelog](https://github.com/py-pdf/PyPDF2/compare/1.27.12...1.27.13)
### Deprecations (DEP)
* Make the `PyPDF2.utils` module private
* Rename of core classes:
* PdfFileReader ➔ PdfReader
* PdfFileWriter ➔ PdfWriter
* PdfFileMerger ➔ PdfMerger
* Use PEP8 conventions for function names and parameters
* If a property and a getter-method are both present, use the property
#### Details
In many places:
- getObject ➔ get_object
- writeToStream ➔ write_to_stream
- readFromStream ➔ read_from_stream
PyPDF2.generic
- readObject ➔ read_object
- convertToInt ➔ convert_to_int
- DocumentInformation.getText ➔ DocumentInformation._get_text :
This method should typically not be used; please let me know if you need it.
PdfReader class:
- `reader.getPage(pageNumber)` ➔ `reader.pages[page_number]`
- `reader.getNumPages()` / `reader.numPages` ➔ `len(reader.pages)`
- getDocumentInfo ➔ metadata
- flattenedPages attribute ➔ flattened_pages
- resolvedObjects attribute ➔ resolved_objects
- xrefIndex attribute ➔ xref_index
- getNamedDestinations / namedDestinations attribute ➔ named_destinations
- getPageLayout / pageLayout ➔ page_layout attribute
- getPageMode / pageMode ➔ page_mode attribute
- getIsEncrypted / isEncrypted ➔ is_encrypted attribute
- getOutlines ➔ get_outlines
- readObjectHeader ➔ read_object_header
- cacheGetIndirectObject ➔ cache_get_indirect_object
- cacheIndirectObject ➔ cache_indirect_object
- getDestinationPageNumber ➔ get_destination_page_number
- readNextEndLine ➔ read_next_end_line
- _zeroXref ➔ _zero_xref
- _authenticateUserPassword ➔ _authenticate_user_password
- _pageId2Num attribute ➔ _page_id2num
- _buildDestination ➔ _build_destination
- _buildOutline ➔ _build_outline
- _getPageNumberByIndirect(indirectRef) ➔ _get_page_number_by_indirect(indirect_ref)
- _getObjectFromStream ➔ _get_object_from_stream
- _decryptObject ➔ _decrypt_object
- _flatten(..., indirectRef) ➔ _flatten(..., indirect_ref)
- _buildField ➔ _build_field
- _checkKids ➔ _check_kids
- _writeField ➔ _write_field
- _write_field(..., fieldAttributes) ➔ _write_field(..., field_attributes)
- _read_xref_subsections(..., getEntry, ...) ➔ _read_xref_subsections(..., get_entry, ...)
PdfWriter class:
- `writer.getPage(pageNumber)` ➔ `writer.pages[page_number]`
- `writer.getNumPages()` ➔ `len(writer.pages)`
- addMetadata ➔ add_metadata
- addPage ➔ add_page
- addBlankPage ➔ add_blank_page
- addAttachment(fname, fdata) ➔ add_attachment(filename, data)
- insertPage ➔ insert_page
- insertBlankPage ➔ insert_blank_page
- appendPagesFromReader ➔ append_pages_from_reader
- updatePageFormFieldValues ➔ update_page_form_field_values
- cloneReaderDocumentRoot ➔ clone_reader_document_root
- cloneDocumentFromReader ➔ clone_document_from_reader
- getReference ➔ get_reference
- getOutlineRoot ➔ get_outline_root
- getNamedDestRoot ➔ get_named_dest_root
- addBookmarkDestination ➔ add_bookmark_destination
- addBookmarkDict ➔ add_bookmark_dict
- addBookmark ➔ add_bookmark
- addNamedDestinationObject ➔ add_named_destination_object
- addNamedDestination ➔ add_named_destination
- removeLinks ➔ remove_links
- removeImages(ignoreByteStringObject) ➔ remove_images(ignore_byte_string_object)
- removeText(ignoreByteStringObject) ➔ remove_text(ignore_byte_string_object)
- addURI ➔ add_uri
- addLink ➔ add_link
- getPage(pageNumber) ➔ get_page(page_number)
- getPageLayout / setPageLayout / pageLayout ➔ page_layout attribute
- getPageMode / setPageMode / pageMode ➔ page_mode attribute
- _addObject ➔ _add_object
- _addPage ➔ _add_page
- _sweepIndirectReferences ➔ _sweep_indirect_references
PdfMerger class
- `__init__` parameter: strict=True ➔ strict=False (the PdfFileMerger still has the old default)
- addMetadata ➔ add_metadata
- addNamedDestination ➔ add_named_destination
- setPageLayout ➔ set_page_layout
- setPageMode ➔ set_page_mode
Page class:
- artBox / bleedBox / cropBox / mediaBox / trimBox ➔ artbox / bleedbox / cropbox / mediabox / trimbox
- getWidth, getHeight ➔ width / height
- getLowerLeft_x / getUpperLeft_x ➔ left
- getUpperRight_x / getLowerRight_x ➔ right
- getLowerLeft_y / getLowerRight_y ➔ bottom
- getUpperRight_y / getUpperLeft_y ➔ top
- getLowerLeft / setLowerLeft ➔ lower_left property
- upperRight ➔ upper_right
- mergePage ➔ merge_page
- rotateClockwise / rotateCounterClockwise ➔ rotate_clockwise
- _mergeResources ➔ _merge_resources
- _contentStreamRename ➔ _content_stream_rename
- _pushPopGS ➔ _push_pop_gs
- _addTransformationMatrix ➔ _add_transformation_matrix
- _mergePage ➔ _merge_page
XmpInformation class:
- getElement(..., aboutUri, ...) ➔ get_element(..., about_uri, ...)
- getNodesInNamespace(..., aboutUri, ...) ➔ get_nodes_in_namespace(..., about_uri, ...)
- _getText ➔ _get_text
utils.py:
- matrixMultiply ➔ matrix_multiply
- RC4_encrypt is moved to the security module
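Many of the renames listed above follow a mechanical camelCase to snake_case pattern. The sketch below illustrates that pattern with a regular expression; `to_snake_case` is a hypothetical helper and not part of PyPDF2. Several renames do not follow it (for example `getDocumentInfo` became `metadata`), so the lists above remain the reference.

```python
import re

# Insert "_" before an upper-case letter that follows a lower-case letter or
# digit, and before the last capital of an acronym followed by lower case.
_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")


def to_snake_case(name: str) -> str:
    """Convert a camelCase identifier to snake_case (rough sketch)."""
    return _BOUNDARY.sub("_", name).lower()


if __name__ == "__main__":
    for old in ("getObject", "writeToStream", "addURI", "_zeroXref"):
        print(old, "->", to_snake_case(old))
```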
## Version 1.27.12, 2022-05-02
### Bug Fixes (BUG)
- _rebuild_xref_table expects trailer to be a dict (#857)
### Documentation (DOC)
- Security Policy
[Full Changelog](https://github.com/py-pdf/PyPDF2/compare/1.27.11...1.27.12)
## Version 1.27.11, 2022-05-02
### Bug Fixes (BUG)
- Incorrectly issued xref warning/exception (#855)
[Full Changelog](https://github.com/py-pdf/PyPDF2/compare/1.27.10...1.27.11)
## Version 1.27.10, 2022-05-01
### Robustness (ROB)
- Handle missing destinations in reader (#840)
- warn-only in readStringFromStream (#837)
- Fix corruption in startxref or xref table (#788 and #830)
### Documentation (DOC)
- Project Governance (#799)
- History of PyPDF2
- PDF feature/version support (#816)
- More details on text parsing issues (#815)
### Developer Experience (DEV)
- Add benchmark command to Makefile
- Ignore IronPython parts for code coverage (#826)
### Maintenance (MAINT)
- Split pdf module (#836)
- Separated CCITTFax param parsing/decoding (#841)
- Update requirements files
### Testing (TST)
- Use external repository for larger/more PDFs for testing (#820)
- Swap incorrect test names (#838)
- Add test for PdfFileReader and page properties (#835)
- Add tests for PyPDF2.generic (#831)
- Add tests for utils, form fields, PageRange (#827)
- Add test for ASCII85Decode (#825)
- Add test for FlateDecode (#823)
- Add test for filters.ASCIIHexDecode (#822)
### Code Style (STY)
- Apply pre-commit (black, isort) + use snake_case variables (#832)
- Remove debug code (#828)
- Documentation, Variable names (#839)
[Full Changelog](https://github.com/py-pdf/PyPDF2/compare/1.27.9...1.27.10)
## Version 1.27.9, 2022-04-24
A change I would like to highlight is the performance improvement for
large PDF files (#808) 🎉
### New Features (ENH)
- Add papersizes (#800)
- Allow setting permission flags when encrypting (#803)
- Allow setting form field flags (#802)
### Bug Fixes (BUG)
- TypeError in xmp._converter_date (#813)
- Improve spacing for text extraction (#806)
- Fix PDFDocEncoding Character Set (#809)
### Robustness (ROB)
- Use null ID when encrypted but no ID given (#812)
- Handle recursion error (#804)
### Documentation (DOC)
- CMaps (#811)
- The PDF Format + commit prefixes (#810)
- Add compression example (#792)
### Developer Experience (DEV)
- Add Benchmark for Performance Testing (#781)
### Maintenance (MAINT)
- Validate PDF magic byte in strict mode (#814)
- Make PdfFileMerger.addBookmark() behave like PdfFileWriter's (#339)
- Quadratic runtime while parsing reduced to linear (#808)
### Testing (TST)
- Newlines in text extraction (#807)
[Full Changelog](https://github.com/py-pdf/PyPDF2/compare/1.27.8...1.27.9)
## Version 1.27.8, 2022-04-21
### Bug Fixes (BUG)
- Use 1MB as offset for readNextEndLine (#321)
- 'PdfFileWriter' object has no attribute 'stream' (#787)
### Robustness (ROB)
- Invalid float object; use 0 as fallback (#782)
### Documentation (DOC)
- Robustness (#785)
[Full Changelog](https://github.com/py-pdf/PyPDF2/compare/1.27.7...1.27.8)
## Version 1.27.7, 2022-04-19
### Bug Fixes (BUG)
- Import exceptions from PyPDF2.errors in PyPDF2.utils (#780)
### Code Style (STY)
- Naming in 'make_changelog.py'
## Version 1.27.6, 2022-04-18
### Deprecations (DEP)
- Remove support for Python 2.6 and older (#776)
### New Features (ENH)
- Extract document permissions (#320)
### Bug Fixes (BUG)
- Clip by trimBox when merging pages, which would otherwise be ignored (#240)
- Add overwriteWarnings parameter PdfFileMerger (#243)
- IndexError for getPage() of decrypted file (#359)
- Handle cases where decodeParms is an ArrayObject (#405)
- Updated PDF fields don't show up when page is written (#412)
- Set Linked Form Value (#414)
- Fix zlib -5 error for corrupt files (#603)
- Fix reading more than last1K for EOF (#642)
- Accidental import
### Robustness (ROB)
- Allow extra whitespace before "obj" in readObjectHeader (#567)
### Documentation (DOC)
- Link to pdftoc in Sample_Code (#628)
- Working with annotations (#764)
- Structure history
### Developer Experience (DEV)
- Add issue templates (#765)
- Add tool to generate changelog
### Maintenance (MAINT)
- Use grouped constants instead of string literals (#745)
- Add error module (#768)
- Use decorators for @staticmethod (#775)
- Split long functions (#777)
### Testing (TST)
- Run tests in CI once with -OO Flags (#770)
- Filling out forms (#771)
- Add tests for Writer (#772)
- Error cases (#773)
- Check Error messages (#769)
- Regression test for issue #88
- Regression test for issue #327
### Code Style (STY)
- Make variable naming more consistent in tests
[Full changelog](https://github.com/py-pdf/PyPDF2/compare/1.27.5...1.27.6)
## Version 1.27.5, 2022-04-15
### Security (SEC)
- ContentStream_readInlineImage had potential infinite loop (#740)
### Bug fixes (BUG)
- Fix merging encrypted files (#757)
- CCITTFaxDecode decodeParms can be an ArrayObject (#756)
### Robustness improvements (ROBUST)
- title sometimes None (#744)
### Documentation (DOC)
- Adjust short description of the package
### Tests and Test setup (TST)
- Rewrite JS tests from unittest to pytest (#746)
- Increase Test coverage, mainly with filters (#756)
- Add test for inline images (#758)
### Developer Experience Improvements (DEV)
- Remove unused Travis-CI configuration (#747)
- Show code coverage (#754, #755)
- Add mutmut (#760)
### Miscellaneous
- STY: Closing file handles, explicit exports, ... (#743)
[Full Changelog](https://github.com/py-pdf/PyPDF2/compare/1.27.4...1.27.5)
## Version 1.27.4, 2022-04-12
### Bug fixes (BUG)
- Guard formatting of `__init__.__doc__` string (#738)
### Packaging (PKG)
- Add more precise license field to setup (#733)
### Testing (TST)
- Add test for issue #297
### Miscellaneous
- DOC: Miscallenious ➔ Miscellaneous (Typo)
- TST: Fix CI triggering (master ➔ main) (#739)
- STY: Fix various style issues (#742)
[Full Changelog](https://github.com/py-pdf/PyPDF2/compare/1.27.3...1.27.4)
## Version 1.27.3, 2022-04-10
- PKG: Make Tests not a subpackage (#728)
- BUG: Fix ASCII85Decode.decode assertion (#729)
- BUG: Error in Chinese character encoding (#463)
- BUG: Code duplication in Scripts/2-up.py
- ROBUST: Guard 'obj.writeToStream' with 'if obj is not None'
- ROBUST: Ignore a /Prev entry with value 0 in the trailer
- MAINT: Remove Sample_Code (#726)
- TST: Close file handle in test_writer (#722)
- TST: Fix test_get_images (#730)
- DEV: Make tox use pytest and add more Python versions (#721)
- DOC: Many (#720, #723-725, #469)
[Full Changelog](https://github.com/py-pdf/PyPDF2/compare/1.27.2...1.27.3)
## Version 1.27.2, 2022-04-09
- Add Scripts (including `pdfcat`), Resources, Tests, and Sample_Code back to
PyPDF2. They were removed by accident in 1.27.0, but might get removed with 2.0.0.
See [discussions/718](https://github.com/py-pdf/PyPDF2/discussions/718).
[Full Changelog](https://github.com/py-pdf/PyPDF2/compare/1.27.1...1.27.2)
## Version 1.27.1, 2022-04-08
- Fixed project links on PyPI page after migration from mstamy2
to MartinThoma to the py-pdf organization on GitHub
- Documentation is now at [pypdf2.readthedocs.io](https://pypdf2.readthedocs.io/en/latest/)
[Full Changelog](https://github.com/py-pdf/PyPDF2/compare/1.27.0...1.27.1)
## Version 1.27.0, 2022-04-07
Features:
- Add alpha channel support for png files in Script (#614)
### Bug fixes (BUG)
- Fix formatWarning for filename without slash (#612)
- Add whitespace between words for extractText() (#569, #334)
- "invalid escape sequence" SyntaxError (#522)
- Avoid error when printing warning in pythonw (#486)
- Stream operations can be List or Dict (#665)
### Documentation (DOC)
- Added Scripts/pdf-image-extractor.py
- Documentation improvements (#550, #538, #324, #426, #394)
### Tests and Test setup (TST)
- Add GitHub Action which automatically runs unit tests via pytest and
static code analysis with Flake8 (#660)
- Add several unit tests (#661, #663)
- Add .coveragerc to create coverage reports
### Developer Experience Improvements (DEV)
- Pre commit: Developers can now `pre-commit install` to avoid tiny issues like trailing whitespaces
### Miscellaneous
- Add the LICENSE file to the distributed packages (#288)
- Use setuptools instead of distutils (#599)
- Improvements for the PyPI page (#644)
- Python 3 changes (#504, #366)
[Full Changelog](https://github.com/py-pdf/PyPDF2/compare/1.26.0...1.27.0)
## Version 1.26.0, 2016-05-18
- NOTE: Active maintenance on PyPDF2 is resuming after a hiatus
- Fixed a bug where image resources were incorrectly
overwritten when merging pages
- Added dictionary for JavaScript actions to the root (louib)
- Added unit tests for the JS functionality (louib)
- Add more Python 3 compatibility when reading inline images (im2703
and VyacheslavHashov)
- Return NullObject instead of raising error when failing to resolve
object (ctate)
- Don't output warning for non-zeroed xref table when strict=False
(BenRussert)
- Remove extraneous zeroes from output formatting (speedplane)
- Fix bug where reading an inline image would cut off prematurely
in certain cases (speedplane)
## Version 1.25.1, 2015-07-20
- Fix bug when parsing inline images. Occurred when merging
certain pages with inline images
- Fixed type error when creating outlines by utilizing the
isString() test
## Version 1.25, 2015-07-07
BUGFIXES:
- Added Python 3 algorithm for ASCII85Decode. Fixes issue when
reading reportlab-generated files with Py 3 (jerickbixly)
- Recognize more escape sequences which would otherwise throw an
exception (manuelzs, robertsoakes)
- Fixed overflow error in generic.py. Occurred
when reading a too-large int in Python 2 (by Raja Jamwal)
- Allow access to files which were encrypted with an empty
password. Previously threw a "File has not been decrypted"
exception (Elena Williams)
- Do not attempt to decode an empty data stream. Previously
would cause an error in decode algorithms (vladir)
- Fixed some type issues specific to Py 2 or Py 3
- Fix issue when stream data begins with whitespace (soloma83)
- Recognize abbreviated filter names (AlmightyOatmeal and
Matthew Weiss)
- Copy decryption key from PdfFileReader to PdfFileMerger.
Allows usage of PdfFileMerger with encrypted files (twolfson)
- Fixed bug which occurred when a NameObject is present at end
of a file stream. Threw a "Stream has ended unexpectedly"
exception (speedplane)
FEATURES:
- Initial work on a test suite; to be expanded in future.
Tests and Resources directory added, README updated (robertsoakes)
- Added document cloning methods to PdfFileWriter:
appendPagesFromReader, cloneReaderDocumentRoot, and
cloneDocumentFromReader. See official documentation (robertsoakes)
- Added method for writing to form fields: updatePageFormFieldValues.
This will be enhanced in the future. See official documentation
(robertsoakes)
- New addAttachment method. See documentation. Support for adding
and extracting embedded files to be enhanced in the future
(moshekaplan)
- Added methods to get page number of given PageObject or
Destination: getPageNumber and getDestinationPageNumber.
See documentation (mozbugbox)
OTHER ENHANCEMENTS:
- Enhanced type handling (Brent Amrhein)
- Enhanced exception handling in NameObject (sbywater)
- Enhanced extractText method output (peircej)
- Better exception handling
- Enhanced regex usage in NameObject class (speedplane)
## Version 1.24, 2014-12-31
- Bugfixes for reading files in Python 3 (by Anthony Tuininga and
pqqp)
- Appropriate errors are now raised instead of infinite loops (by
naure and Cyrus Vafadari)
- Bugfix for parsing number tokens with leading spaces (by Maxim
Kamenkov)
- Don't crash on bad /Outlines reference (by eshellman)
- Conform tabs/spaces and blank lines to PEP 8 standards
- Utilize the readUntilRegex method when reading Number Objects
(by Brendan Jurd)
- More bugfixes for Python 3 and clearer exception handling
- Fixed encoding issue in merger (with eshellman)
- Created separate folder for scripts
## Version 1.23, 2014-08-11
- Documentation now available at pythonhosted.org
- Bugfix in pagerange.py for when `__init__.__doc__` has no value (by
Vladir Cruz)
- Fix typos in OutlinesObject().add() (by shilluc)
- Re-added a missing return statement in a utils.py method
- Corrected viewing mode names (by Jason Scheirer)
- New PdfFileWriter method: addJS() (by vfigueiro)
- New bookmark features: color, boldness, italics, and page fit
(by Joshua Arnott)
- New PdfFileReader method: getFields(). Used to extract field
information from PDFs with interactive forms. See documentation
for details
- Converted README file to markdown format (by Stephen Bussard)
- Several improvements to overall performance and efficiency
(by mozbugbox)
- Fixed a bug where geospatial information was not scaling along with
its page
- Fixed a type issue and a Python 3 issue in the decryption algorithms
(with Francisco Vieira and koba-ninkigumi)
- Fixed a bug causing an infinite loop in the ASCII 85 decoding
algorithm (by madmaardigan)
- Annotations (links, comment windows, etc.) are now preserved when
pages are merged together
- Used the Destination class in addLink() and addBookmark() so that
the page fit option could be properly customized
## Version 1.22, 2014-05-29
- Added .DS_Store to .gitignore (for Mac users) (by Steve Witham)
- Removed `__init__()` implementation in NameObject (by Steve Witham)
- Fixed bug (inf. loop) when merging pages in Python 3 (by commx)
- Corrected error when calculating height in scaleTo()
- Removed unnecessary code from DictionaryObject (by Georges Dubus)
- Fixed bug where an exception was thrown upon reading a NULL string
(by speedplane)
- Allow string literals (non-unicode strings in Python 2) to be passed
to PdfFileReader
- Allow ConvertFunctionsToVirtualList to be indexed with slices and
longs (in Python 2) (by Matt Gilson)
- Major improvements and bugfixes to addLink() method (see documentation
in source code) (by Henry Keiter)
- General code clean-up and improvements (with Steve Witham and Henry Keiter)
- Fixed bug that caused crash when comments are present at end of
dictionary
## Version 1.21, 2014-04-21
- Fix for when /Type isn't present in the Pages dictionary (by Rob1080)
- More tolerance for extra whitespace in Indirect Objects
- Improved Exception handling
- Fixed error in getHeight() method (by Simon Kaempflein)
- implement use of utils.string_type to resolve Py2-3 compatibility issues
- Prevent exception for multiple definitions in a dictionary (with carlosfunk)
(only when strict = False)
- Fixed errors when parsing a slice using pdfcat on command line (by
Steve Witham)
- Tolerance for EOF markers within 1024 bytes of the actual end of the
file (with David Wolever)
- Added overwriteWarnings parameter to PdfFileReader constructor, if False
PyPDF2 will NOT overwrite methods from Python's warnings.py module with
a custom implementation.
- Fix NumberObject and NameObject constructors for compatibility with PyPy
(Rüdiger Jungbeck, Xavier Dupré, shezadkhan137, Steven Witham)
- Utilize utils.Str in pdf.py and pagerange.py to resolve type issues (by
egbutter)
- Improvements in implementing StringIO for Python 2 and BytesIO for
Python 3 (by Xavier Dupré)
- Added `\x00` to Whitespaces, defined utils.WHITESPACES to clarify code (by
Maxim Kamenkov)
- Bugfix for merging 3 or more resources with the same name (by lucky-user)
- Improvements to Xref parsing algorithm (by speedplane)
## Version 1.20, 2014-01-27
- Official Python 3+ support (with contributions from TWAC and cgammans)
Support for Python versions 2.6 and 2.7 will be maintained
- Command line concatenation (see pdfcat in sample code) (by Steve Witham)
- New FAQ; link included in README
- Allow more (although unnecessary) escape sequences
- Prevent exception when reading a null object in decoding parameters
- Corrected error in reading destination types (added a slash since they
are name objects)
- Corrected TypeError in scaleTo() method
- addBookmark() method in PdfFileMerger now returns bookmark (so nested
bookmarks can be created)
- Additions to Sample Code and Sample PDFs
- changes to allow 2up script to work (see sample code) (by Dylan McNamee)
- changes to metadata encoding (by Chris Hiestand)
- New methods for links: addLink() (by Enrico Lambertini) and removeLinks()
- Bugfix to handle nested bookmarks correctly (by Jamie Lentin)
- New methods removeImages() and removeText() available for PdfFileWriter
(by Tien Haï)
- Exception handling for illegal characters in Name Objects
## Version 1.19, 2013-10-08
BUGFIXES:
- Removed pop in sweepIndirectReferences to prevent infinite loop
(provided by ian-su-sirca)
- Fixed bug caused by whitespace when parsing PDFs generated by AutoCad
- Fixed a bug caused by reading a 'null' ASCII value in a dictionary
object (primarily in PDFs generated by AutoCad).
FEATURES:
- Added new folders for PyPDF2 sample code and example PDFs; see README
for each folder
- Added a method for debugging purposes to show current location while
parsing
- Ability to create custom metadata (by jamma313)
- Ability to access and customize document layout and view mode
(by Joshua Arnott)
OTHER:
- Added and corrected some documentation
- Added some more warnings and exception messages
- Removed old test/debugging code
UPCOMING:
- More bugfixes (We have received many problematic PDFs via email, we
will work with them)
- Documentation - It's time for PyPDF2 to get its own documentation
since it has grown much since the original pyPdf
- A FAQ to answer common questions
## Version 1.18, 2013-08-19
- Fixed a bug where older versions of objects were incorrectly added to the
cache, resulting in outdated or missing pages, images, and other objects
(from speedplane)
- Fixed a bug in parsing the xref table where new xref values were
overwritten; also cleaned up code (from speedplane)
- New method mergeRotatedAroundPointPage which merges a page while rotating
it around a point (from speedplane)
- Updated Destination syntax to respect PDF 1.6 specifications (from
jamma313)
- Prevented infinite loop when a PdfFileReader object was instantiated
with an empty file (from Jerome Nexedi)
Other Changes:
- Downloads now available via PyPI
- Installation through pip library is fixed
## Version 1.17, 2013-07-25
- Removed one (from pdf.py) of the two Destination classes. Both
classes had the same name, but were slightly different in content,
causing some errors. (from Janne Vanhala)
- Corrected and Expanded README file to demonstrate PdfFileMerger
- Added filter for LZW encoded streams (from Michal Horejsek)
- PyPDF2 issue tracker enabled on Github to allow community
discussion and collaboration
## Versions -1.16, -2013-06-30
- Note: This ChangeLog has not been kept up-to-date for a while.
Hopefully we can keep better track of it from now on. Some of the
changes listed here come from previous versions 1.14 and 1.15; they
were only vaguely defined. With the new _version.py file we should
have more structured and better documented versioning from now on.
- Defined `PyPDF2.__version__`
- Fixed encrypt() method (from Martijn The)
- Improved error handling on PDFs with truncated streams (from cecilkorik)
- Python 3 support (from kushal-kumaran)
- Fixed example code in README (from Jeremy Bethmont)
- Fixed a bug caused by DecimalError Exception (from Adam Morris)
- Many other bug fixes and features by:
jeansch
Anton Vlasenko
Joseph Walton
Jan Oliver Oelerich
Fabian Henze
And any others I missed.
Thanks for contributing!
## Version 1.13, 2010-12-04
- Fixed a typo in code for reading a "\b" escape character in strings.
- Improved `__repr__` in FloatObject.
- Fixed a bug in reading octal escape sequences in strings.
- Added getWidth and getHeight methods to the RectangleObject class.
- Fixed compatibility warnings with Python 2.4 and 2.5.
- Added addBlankPage and insertBlankPage methods on PdfFileWriter class.
- Fixed a bug with circular references in page's object trees (typically
annotations) that prevented correctly writing out a copy of those pages.
- New merge page functions allow application of a transformation matrix.
- To all patch contributors: I did a poor job of keeping this ChangeLog
up-to-date for this release, so I am missing attributions here for any
changes you submitted. Sorry! I'll do better in the future.
## Version 1.12, 2008-09-02
- Added support for XMP metadata.
- Fix reading files with xref streams with multiple /Index values.
- Fix extracting content streams that use graphics operators longer than 2
characters. Affects merging PDF files.
## Version 1.11, 2008-05-09
- Patch from Hartmut Goebel to permit RectangleObjects to accept NumberObject
or FloatObject values.
- PDF compatibility fixes.
- Fix to read object xref stream in correct order.
- Fix for comments inside content streams.
## Version 1.10, 2007-10-04
- Text strings from PDF files are returned as Unicode string objects when
pyPdf determines that they can be decoded (as UTF-16 strings, or as
PDFDocEncoding strings). Unicode objects are also written out when
necessary. This means that string objects in pyPdf can be either
generic.ByteStringObject instances, or generic.TextStringObject instances.
- The extractText method now returns a unicode string object.
- All document information properties now return unicode string objects. In
the event that a document provides docinfo properties that are not decoded by
pyPdf, the raw byte strings can be accessed with an "_raw" property (i.e.,
title_raw rather than title).
- generic.DictionaryObject instances have been enhanced to be easier to use.
Values coming out of dictionary objects will automatically be de-referenced
(.getObject will be called on them), unless accessed by the new "raw_get"
method. DictionaryObjects can now only contain PdfObject instances (as keys
and values), making it easier to debug where non-PdfObject values (which
cannot be written out) are entering dictionaries.
- Support for reading named destinations and outlines in PDF files. Original
patch by Ashish Kulkarni.
- Stream compatibility reading enhancements for malformed PDF files.
- Cross reference table reading enhancements for malformed PDF files.
- Encryption documentation.
- Replace some "assert" statements with error raising.
- Minor optimizations to FlateDecode algorithm increase speed when using PNG
predictors.
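
The string-decoding behaviour described in the 1.10 notes above can be sketched roughly as follows. This is a minimal illustration, not pyPdf's actual code: the function name `decode_pdf_text_string` is hypothetical, and plain ASCII is used as a simplified stand-in for PDFDocEncoding (the real encoding covers more code points). The idea is that bytes starting with a UTF-16 big-endian byte order mark are decoded as UTF-16, other bytes are tried against the single-byte encoding, and anything that cannot be decoded stays a byte string.

```python
import codecs
from typing import Union


def decode_pdf_text_string(raw: bytes) -> Union[str, bytes]:
    """Return a str when the bytes can be decoded, else the original bytes.

    Hypothetical helper for illustration only; not part of pyPdf.
    """
    if raw.startswith(codecs.BOM_UTF16_BE):
        try:
            # Strip the 2-byte BOM, then decode the rest as UTF-16 big-endian.
            return raw[2:].decode("utf-16-be")
        except UnicodeDecodeError:
            return raw  # malformed UTF-16: keep the raw bytes
    try:
        # Assumption: ASCII stands in for PDFDocEncoding in this sketch.
        return raw.decode("ascii")
    except UnicodeDecodeError:
        return raw  # not decodable: keep the raw bytes


print(repr(decode_pdf_text_string(b"\xfe\xff\x00H\x00i")))
print(repr(decode_pdf_text_string(b"Title")))
print(repr(decode_pdf_text_string(b"\x80\x81")))
```

Returning either a text value or the untouched bytes mirrors the split between TextStringObject and ByteStringObject mentioned in the notes: callers can check the type to see whether decoding succeeded.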
## Version 1.9, 2006-12-15
- Fix several serious bugs introduced in version 1.8, caused by a failure to
run through our PDF test suite before releasing that version.
- Fix bug in NullObject reading and writing.
## Version 1.8, 2006-12-14
- Add support for decryp
Directory structure:
gitextract_mui37wu0/
├── .git-blame-ignore-revs
├── .github/
│ ├── ISSUE_TEMPLATE/
│ │ ├── bug-report.md
│ │ └── feature-request.md
│ ├── SECURITY.md
│ ├── dependabot.yaml
│ ├── scripts/
│ │ ├── check_gh_pages_updates.py
│ │ ├── check_pr_title.py
│ │ └── check_urls.py
│ └── workflows/
│ ├── benchmark.yaml
│ ├── create-github-release.yaml
│ ├── gh-pages-check.yaml
│ ├── github-ci.yaml
│ ├── publish-to-pypi.yaml
│ ├── release.yaml
│ ├── title-check.yaml
│ └── urls-check.yaml
├── .gitignore
├── .gitmodules
├── .pre-commit-config.yaml
├── .readthedocs.yaml
├── CHANGELOG.md
├── CONTRIBUTING.md
├── CONTRIBUTORS.md
├── LICENSE
├── Makefile
├── README.md
├── docs/
│ ├── Makefile
│ ├── _static/
│ │ └── releasing.drawio
│ ├── conf.py
│ ├── dev/
│ │ ├── cmaps.md
│ │ ├── deprecations.md
│ │ ├── documentation.md
│ │ ├── intro.md
│ │ ├── pdf-format.md
│ │ ├── pypdf-parsing.md
│ │ ├── pypdf-writing.md
│ │ ├── releasing.md
│ │ └── testing.md
│ ├── index.rst
│ ├── make.bat
│ ├── meta/
│ │ ├── changelog-v1.md
│ │ ├── comparisons.md
│ │ ├── faq.md
│ │ ├── history.md
│ │ ├── migration-1-to-2.md
│ │ ├── project-governance.md
│ │ ├── scope-of-pypdf.md
│ │ └── taking-ownership.md
│ ├── modules/
│ │ ├── Destination.rst
│ │ ├── DocumentInformation.rst
│ │ ├── Field.rst
│ │ ├── Fit.rst
│ │ ├── PageObject.rst
│ │ ├── PageRange.rst
│ │ ├── PaperSize.rst
│ │ ├── PdfDocCommon.rst
│ │ ├── PdfReader.rst
│ │ ├── PdfWriter.rst
│ │ ├── RectangleObject.rst
│ │ ├── Transformation.rst
│ │ ├── XmpInformation.rst
│ │ ├── annotations.rst
│ │ ├── constants.rst
│ │ ├── errors.rst
│ │ └── generic.rst
│ └── user/
│ ├── add-javascript.md
│ ├── add-watermark.md
│ ├── adding-pdf-annotations.md
│ ├── cropping-and-transforming.md
│ ├── encryption-decryption.md
│ ├── extract-images.md
│ ├── extract-text.md
│ ├── file-size.md
│ ├── forms.md
│ ├── handle-attachments.md
│ ├── handling-outlines.md
│ ├── installation.md
│ ├── merging-pdfs.md
│ ├── metadata.md
│ ├── pdf-version-support.md
│ ├── pdfa-compliance.md
│ ├── post-processing-in-text-extraction.md
│ ├── reading-pdf-annotations.md
│ ├── robustness.md
│ ├── security.md
│ ├── streaming-data.md
│ ├── suppress-warnings.md
│ └── viewer-preferences.md
├── make_release.py
├── pypdf/
│ ├── __init__.py
│ ├── _cmap.py
│ ├── _codecs/
│ │ ├── __init__.py
│ │ ├── _codecs.py
│ │ ├── adobe_glyphs.py
│ │ ├── core_font_metrics.py
│ │ ├── pdfdoc.py
│ │ ├── std.py
│ │ ├── symbol.py
│ │ └── zapfding.py
│ ├── _crypt_providers/
│ │ ├── __init__.py
│ │ ├── _base.py
│ │ ├── _cryptography.py
│ │ ├── _fallback.py
│ │ └── _pycryptodome.py
│ ├── _doc_common.py
│ ├── _encryption.py
│ ├── _font.py
│ ├── _page.py
│ ├── _page_labels.py
│ ├── _protocols.py
│ ├── _reader.py
│ ├── _text_extraction/
│ │ ├── __init__.py
│ │ ├── _layout_mode/
│ │ │ ├── __init__.py
│ │ │ ├── _fixed_width_page.py
│ │ │ ├── _text_state_manager.py
│ │ │ └── _text_state_params.py
│ │ └── _text_extractor.py
│ ├── _utils.py
│ ├── _version.py
│ ├── _writer.py
│ ├── annotations/
│ │ ├── __init__.py
│ │ ├── _base.py
│ │ ├── _markup_annotations.py
│ │ └── _non_markup_annotations.py
│ ├── constants.py
│ ├── errors.py
│ ├── filters.py
│ ├── generic/
│ │ ├── __init__.py
│ │ ├── _appearance_stream.py
│ │ ├── _base.py
│ │ ├── _data_structures.py
│ │ ├── _files.py
│ │ ├── _fit.py
│ │ ├── _image_inline.py
│ │ ├── _image_xobject.py
│ │ ├── _link.py
│ │ ├── _outline.py
│ │ ├── _rectangle.py
│ │ ├── _utils.py
│ │ └── _viewerpref.py
│ ├── pagerange.py
│ ├── papersizes.py
│ ├── py.typed
│ ├── types.py
│ └── xmp.py
├── pyproject.toml
├── requirements/
│ ├── ci-3.11.txt
│ ├── ci.in
│ ├── ci.txt
│ ├── dev.in
│ ├── dev.txt
│ ├── docs.in
│ └── docs.txt
├── resources/
│ ├── 010-pdflatex-forms.txt
│ ├── AEO.1172.layout.rot180.txt
│ ├── AEO.1172.layout.txt
│ ├── Claim Maker Alerts Guide_pg2.layout.txt
│ ├── Epic.Page.layout.txt
│ ├── afm_to_dataclass.py
│ ├── crazyones.txt
│ ├── crazyones_layout_vertical_space.txt
│ ├── crazyones_layout_vertical_space_font_height_weight.txt
│ ├── jpeg.txt
│ ├── multicolumn-lorem-ipsum.txt
│ └── toy.layout.txt
└── tests/
├── __init__.py
├── bench.py
├── conftest.py
├── example_files.yaml
├── generic/
│ ├── __init__.py
│ ├── test_base.py
│ ├── test_data_structures.py
│ ├── test_files.py
│ ├── test_image_inline.py
│ ├── test_image_xobject.py
│ └── test_link.py
├── scripts/
│ ├── __init__.py
│ ├── data/
│ │ └── commits__version_4_0_1.json
│ ├── test_example_files.py
│ └── test_make_release.py
├── test_annotations.py
├── test_appearance_stream.py
├── test_cmap.py
├── test_codecs.py
├── test_constants.py
├── test_doc_common.py
├── test_encryption.py
├── test_filters.py
├── test_font.py
├── test_forms.py
├── test_generic.py
├── test_images.py
├── test_javascript.py
├── test_merger.py
├── test_page.py
├── test_page_labels.py
├── test_pagerange.py
├── test_papersizes.py
├── test_pdfa.py
├── test_protocols.py
├── test_reader.py
├── test_text_extraction.py
├── test_utils.py
├── test_workflows.py
├── test_writer.py
├── test_xmp.py
└── utils.py
SYMBOL INDEX (1963 symbols across 85 files)
FILE: .github/scripts/check_gh_pages_updates.py
function fetch_json (line 18) | def fetch_json(url: str) -> dict:
function fetch_bytes (line 24) | def fetch_bytes(url: str) -> bytes:
function get_latest_version (line 30) | def get_latest_version(pkg: str) -> str:
function sri_hash (line 36) | def sri_hash(content: bytes) -> str:
function scan_html (line 42) | def scan_html(path: Path) -> list[re.Match[str]]:
function main (line 48) | def main() -> None:
FILE: .github/scripts/check_urls.py
function get_urls_from_test_files (line 24) | def get_urls_from_test_files() -> Iterator[str]:
function get_urls_from_example_files (line 39) | def get_urls_from_example_files() -> Iterator[str]:
function check_url (line 45) | def check_url(url: str) -> bool:
function main (line 72) | def main() -> bool:
FILE: make_release.py
class Change (line 16) | class Change:
function main (line 26) | def main(changelog_path: str) -> None:
function print_instructions (line 64) | def print_instructions(new_version: str) -> None:
function adjust_version_py (line 75) | def adjust_version_py(version: str) -> None:
function get_version_interactive (line 81) | def get_version_interactive(new_version: str, changes: str) -> str:
function is_semantic_version (line 97) | def is_semantic_version(version: str) -> bool:
function write_commit_msg_file (line 108) | def write_commit_msg_file(new_version: str, commit_changes: str) -> None:
function write_release_msg_file (line 122) | def write_release_msg_file(
function strip_header (line 138) | def strip_header(md: str) -> str:
function version_bump (line 143) | def version_bump(git_tag: str) -> str:
function get_changelog (line 159) | def get_changelog(changelog_path: str) -> str:
function write_changelog (line 174) | def write_changelog(new_changelog: str, changelog_path: str) -> None:
function get_formatted_changes (line 187) | def get_formatted_changes(git_tag: str) -> tuple[str, str]:
function get_most_recent_git_tag (line 266) | def get_most_recent_git_tag() -> str:
function get_author_mapping (line 279) | def get_author_mapping(line_count: int) -> dict[str, str]:
function get_git_commits_since_tag (line 305) | def get_git_commits_since_tag(git_tag: str) -> list[Change]:
function parse_commit_line (line 336) | def parse_commit_line(line: str, authors: dict[str, str]) -> Change:
FILE: pypdf/_cmap.py
function get_encoding (line 41) | def get_encoding(
function _parse_encoding (line 59) | def _parse_encoding(
function _parse_to_unicode (line 118) | def _parse_to_unicode(
function prepare_cm (line 151) | def prepare_cm(ft: DictionaryObject) -> bytes:
function process_cm_line (line 192) | def process_cm_line(
function _check_mapping_size (line 225) | def _check_mapping_size(size: int) -> None:
function parse_bfrange (line 230) | def parse_bfrange(
function parse_bfchar (line 298) | def parse_bfchar(line: bytes, map_dict: dict[Any, Any], int_entry: list[...
function _type1_alternative (line 322) | def _type1_alternative(
FILE: pypdf/_codecs/__init__.py
function fill_from_encoding (line 8) | def fill_from_encoding(enc: str) -> list[str]:
function rev_encoding (line 18) | def rev_encoding(enc: list[str]) -> dict[str, int]:
FILE: pypdf/_codecs/_codecs.py
class Codec (line 15) | class Codec(ABC):
method encode (line 19) | def encode(self, data: bytes) -> bytes:
method decode (line 32) | def decode(self, data: bytes) -> bytes:
class LzwCodec (line 45) | class LzwCodec(Codec):
method __init__ (line 53) | def __init__(self, max_output_length: int = 75_000_000) -> None:
method _initialize_encoding_table (line 56) | def _initialize_encoding_table(self) -> None:
method _increase_next_code (line 63) | def _increase_next_code(self) -> None:
method encode (line 73) | def encode(self, data: bytes) -> bytes:
method _pack_codes_into_bytes (line 115) | def _pack_codes_into_bytes(self, codes: list[int]) -> bytes:
method _initialize_decoding_table (line 149) | def _initialize_decoding_table(self) -> None:
method _next_code_decode (line 157) | def _next_code_decode(self, data: bytes) -> int:
method decode (line 211) | def decode(self, data: bytes) -> bytes:
method _add_entry_decode (line 267) | def _add_entry_decode(self, old_string: bytes, new_char: int) -> None:
FILE: pypdf/_codecs/adobe_glyphs.py
function _complete (line 13963) | def _complete() -> None:
FILE: pypdf/_crypt_providers/_base.py
class CryptBase (line 29) | class CryptBase:
method encrypt (line 30) | def encrypt(self, data: bytes) -> bytes: # pragma: no cover
method decrypt (line 33) | def decrypt(self, data: bytes) -> bytes: # pragma: no cover
class CryptIdentity (line 37) | class CryptIdentity(CryptBase):
FILE: pypdf/_crypt_providers/_cryptography.py
class CryptRC4 (line 47) | class CryptRC4(CryptBase):
method __init__ (line 48) | def __init__(self, key: bytes) -> None:
method encrypt (line 51) | def encrypt(self, data: bytes) -> bytes:
method decrypt (line 55) | def decrypt(self, data: bytes) -> bytes:
class CryptAES (line 60) | class CryptAES(CryptBase):
method __init__ (line 61) | def __init__(self, key: bytes) -> None:
method encrypt (line 64) | def encrypt(self, data: bytes) -> bytes:
method decrypt (line 73) | def decrypt(self, data: bytes) -> bytes:
function rc4_encrypt (line 91) | def rc4_encrypt(key: bytes, data: bytes) -> bytes:
function rc4_decrypt (line 96) | def rc4_decrypt(key: bytes, data: bytes) -> bytes:
function aes_ecb_encrypt (line 101) | def aes_ecb_encrypt(key: bytes, data: bytes) -> bytes:
function aes_ecb_decrypt (line 106) | def aes_ecb_decrypt(key: bytes, data: bytes) -> bytes:
function aes_cbc_encrypt (line 111) | def aes_cbc_encrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
function aes_cbc_decrypt (line 116) | def aes_cbc_decrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
FILE: pypdf/_crypt_providers/_fallback.py
class CryptRC4 (line 37) | class CryptRC4(CryptBase):
method __init__ (line 38) | def __init__(self, key: bytes) -> None:
method encrypt (line 45) | def encrypt(self, data: bytes) -> bytes:
method decrypt (line 57) | def decrypt(self, data: bytes) -> bytes:
class CryptAES (line 61) | class CryptAES(CryptBase):
method __init__ (line 62) | def __init__(self, key: bytes) -> None:
method encrypt (line 65) | def encrypt(self, data: bytes) -> bytes:
method decrypt (line 68) | def decrypt(self, data: bytes) -> bytes:
function rc4_encrypt (line 72) | def rc4_encrypt(key: bytes, data: bytes) -> bytes:
function rc4_decrypt (line 76) | def rc4_decrypt(key: bytes, data: bytes) -> bytes:
function aes_ecb_encrypt (line 80) | def aes_ecb_encrypt(key: bytes, data: bytes) -> bytes:
function aes_ecb_decrypt (line 84) | def aes_ecb_decrypt(key: bytes, data: bytes) -> bytes:
function aes_cbc_encrypt (line 88) | def aes_cbc_encrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
function aes_cbc_decrypt (line 92) | def aes_cbc_decrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
FILE: pypdf/_crypt_providers/_pycryptodome.py
class CryptRC4 (line 39) | class CryptRC4(CryptBase):
method __init__ (line 40) | def __init__(self, key: bytes) -> None:
method encrypt (line 43) | def encrypt(self, data: bytes) -> bytes:
method decrypt (line 46) | def decrypt(self, data: bytes) -> bytes:
class CryptAES (line 50) | class CryptAES(CryptBase):
method __init__ (line 51) | def __init__(self, key: bytes) -> None:
method encrypt (line 54) | def encrypt(self, data: bytes) -> bytes:
method decrypt (line 60) | def decrypt(self, data: bytes) -> bytes:
function rc4_encrypt (line 76) | def rc4_encrypt(key: bytes, data: bytes) -> bytes:
function rc4_decrypt (line 80) | def rc4_decrypt(key: bytes, data: bytes) -> bytes:
function aes_ecb_encrypt (line 84) | def aes_ecb_encrypt(key: bytes, data: bytes) -> bytes:
function aes_ecb_decrypt (line 88) | def aes_ecb_decrypt(key: bytes, data: bytes) -> bytes:
function aes_cbc_encrypt (line 92) | def aes_cbc_encrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
function aes_cbc_decrypt (line 96) | def aes_cbc_decrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
FILE: pypdf/_doc_common.py
function convert_to_int (line 90) | def convert_to_int(d: bytes, size: int) -> Union[int, tuple[Any, ...]]:
class DocumentInformation (line 98) | class DocumentInformation(DictionaryObject):
method __init__ (line 113) | def __init__(self) -> None:
method _get_text (line 116) | def _get_text(self, key: str) -> Optional[str]:
method title (line 125) | def title(self) -> Optional[str]:
method title_raw (line 139) | def title_raw(self) -> Optional[str]:
method author (line 144) | def author(self) -> Optional[str]:
method author_raw (line 154) | def author_raw(self) -> Optional[str]:
method subject (line 159) | def subject(self) -> Optional[str]:
method subject_raw (line 169) | def subject_raw(self) -> Optional[str]:
method creator (line 174) | def creator(self) -> Optional[str]:
method creator_raw (line 186) | def creator_raw(self) -> Optional[str]:
method producer (line 191) | def producer(self) -> Optional[str]:
method producer_raw (line 203) | def producer_raw(self) -> Optional[str]:
method creation_date (line 208) | def creation_date(self) -> Optional[datetime]:
method creation_date_raw (line 213) | def creation_date_raw(self) -> Optional[str]:
method modification_date (line 223) | def modification_date(self) -> Optional[datetime]:
method modification_date_raw (line 232) | def modification_date_raw(self) -> Optional[str]:
method keywords (line 243) | def keywords(self) -> Optional[str]:
method keywords_raw (line 253) | def keywords_raw(self) -> Optional[str]:
class PdfDocCommon (line 258) | class PdfDocCommon:
method root_object (line 275) | def root_object(self) -> DictionaryObject:
method pdf_header (line 280) | def pdf_header(self) -> str:
method get_object (line 284) | def get_object(
method _replace_object (line 290) | def _replace_object(self, indirect: IndirectObject, obj: PdfObject) ->...
method _info (line 295) | def _info(self) -> Optional[DictionaryObject]:
method metadata (line 299) | def metadata(self) -> Optional[DocumentInformation]:
method xmp_metadata (line 314) | def xmp_metadata(self) -> Optional[XmpInformation]:
method viewer_preferences (line 318) | def viewer_preferences(self) -> Optional[ViewerPreferences]:
method get_num_pages (line 332) | def get_num_pages(self) -> int:
method get_page (line 353) | def get_page(self, page_number: int) -> PageObject:
method _get_page_in_node (line 371) | def _get_page_in_node(
method named_destinations (line 409) | def named_destinations(self) -> dict[str, Destination]:
method get_named_dest_root (line 413) | def get_named_dest_root(self) -> ArrayObject:
method _get_named_destinations (line 447) | def _get_named_destinations(
method get_fields (line 523) | def get_fields(
method _get_qualified_field_name (line 573) | def _get_qualified_field_name(self, parent: DictionaryObject) -> str:
method _build_field (line 586) | def _build_field(
method _check_kids (line 628) | def _check_kids(
method _write_field (line 647) | def _write_field(self, fileobj: Any, field: Any, field_attributes: Any...
method get_form_text_fields (line 684) | def get_form_text_fields(self, full_qualified_name: bool = False) -> d...
method get_pages_showing_field (line 722) | def get_pages_showing_field(
method open_destination (line 793) | def open_destination(
method open_destination (line 823) | def open_destination(self, dest: Union[None, str, Destination, PageObj...
method outline (line 827) | def outline(self) -> OutlineType:
method _get_outline (line 835) | def _get_outline(
method threads (line 894) | def threads(self) -> Optional[ArrayObject]:
method _get_page_number_by_indirect (line 915) | def _get_page_number_by_indirect(
method get_page_number (line 920) | def get_page_number(self, page: PageObject) -> Optional[int]:
method get_destination_page_number (line 934) | def get_destination_page_number(self, destination: Destination) -> Opt...
method _build_destination (line 947) | def _build_destination(
method _build_outline_item (line 977) | def _build_outline_item(self, node: DictionaryObject) -> Optional[Dest...
method pages (line 1055) | def pages(self) -> list[PageObject]:
method page_labels (line 1072) | def page_labels(self) -> list[str]:
method page_layout (line 1082) | def page_layout(self) -> Optional[str]:
method page_mode (line 1110) | def page_mode(self) -> Optional[PagemodeType]:
method _flatten (line 1135) | def _flatten(
method remove_page (line 1220) | def remove_page(
method _get_indirect_object (line 1264) | def _get_indirect_object(self, num: int, gen: int) -> Optional[PdfObje...
method decode_permissions (line 1280) | def decode_permissions(
method user_access_permissions (line 1308) | def user_access_permissions(self) -> Optional[UserAccessPermissions]:
method are_permissions_valid (line 1325) | def are_permissions_valid(self) -> Optional[bool]:
method is_encrypted (line 1346) | def is_encrypted(self) -> bool:
method xfa (line 1356) | def xfa(self) -> Optional[dict[str, Any]]:
method attachments (line 1379) | def attachments(self) -> Mapping[str, list[bytes]]:
method attachment_list (line 1389) | def attachment_list(self) -> Generator[EmbeddedFile, None, None]:
method _list_attachments (line 1393) | def _list_attachments(self) -> list[str]:
method _get_attachment_list (line 1408) | def _get_attachment_list(self, name: str) -> list[bytes]:
method _get_attachments (line 1414) | def _get_attachments(
method _repr_mimebundle_ (line 1457) | def _repr_mimebundle_(
class LazyDict (line 1475) | class LazyDict(Mapping[Any, Any]):
method __init__ (line 1476) | def __init__(self, *args: Any, **kwargs: Any) -> None:
method __getitem__ (line 1479) | def __getitem__(self, key: str) -> Any:
method __iter__ (line 1483) | def __iter__(self) -> Iterator[Any]:
method __len__ (line 1486) | def __len__(self) -> int:
method __str__ (line 1489) | def __str__(self) -> str:
FILE: pypdf/_encryption.py
class CryptFilter (line 60) | class CryptFilter:
method __init__ (line 61) | def __init__(
method encrypt_object (line 71) | def encrypt_object(self, obj: PdfObject) -> PdfObject:
method decrypt_object (line 94) | def decrypt_object(self, obj: PdfObject) -> PdfObject:
function _padding (line 117) | def _padding(data: bytes) -> bytes:
class AlgV4 (line 121) | class AlgV4:
method compute_key (line 123) | def compute_key(
method compute_O_value_key (line 208) | def compute_O_value_key(owner_password: bytes, rev: int, key_size: int...
method compute_O_value (line 259) | def compute_O_value(rc4_key: bytes, user_password: bytes, rev: int) ->...
method compute_U_value (line 281) | def compute_U_value(key: bytes, rev: int, id1_entry: bytes) -> bytes:
method verify_user_password (line 341) | def verify_user_password(
method verify_owner_password (line 400) | def verify_owner_password(
class AlgV5 (line 472) | class AlgV5:
method verify_owner_password (line 474) | def verify_owner_password(
method verify_user_password (line 546) | def verify_user_password(
method calculate_hash (line 573) | def calculate_hash(R: int, password: bytes, salt: bytes, udata: bytes)...
method verify_perms (line 594) | def verify_perms(
method generate_values (line 621) | def generate_values(
method compute_U_value (line 643) | def compute_U_value(R: int, password: bytes, key: bytes) -> tuple[byte...
method compute_O_value (line 680) | def compute_O_value(
method compute_Perms_value (line 726) | def compute_Perms_value(key: bytes, p: int, metadata_encrypted: bool) ...
class PasswordType (line 764) | class PasswordType(IntEnum):
class EncryptAlgorithm (line 770) | class EncryptAlgorithm(tuple, Enum): # type: ignore # noqa: SLOT001
class EncryptionValues (line 779) | class EncryptionValues:
class Encryption (line 787) | class Encryption:
method __init__ (line 812) | def __init__(
method is_decrypted (line 843) | def is_decrypted(self) -> bool:
method encrypt_object (line 846) | def encrypt_object(self, obj: PdfObject, idnum: int, generation: int) ...
method decrypt_object (line 854) | def decrypt_object(self, obj: PdfObject, idnum: int, generation: int) ...
method _is_encryption_object (line 863) | def _is_encryption_object(obj: PdfObject) -> bool:
method _make_crypt_filter (line 875) | def _make_crypt_filter(self, idnum: int, generation: int) -> CryptFilter:
method _get_crypt (line 939) | def _get_crypt(
method _encode_password (line 952) | def _encode_password(password: Union[bytes, str]) -> bytes:
method verify (line 962) | def verify(self, password: Union[bytes, str]) -> PasswordType:
method verify_v4 (line 970) | def verify_v4(self, password: bytes) -> tuple[bytes, PasswordType]:
method verify_v5 (line 998) | def verify_v5(self, password: bytes) -> tuple[bytes, PasswordType]:
method write_entry (line 1019) | def write_entry(
method compute_values_v4 (line 1071) | def compute_values_v4(self, user_password: bytes, owner_password: byte...
method read (line 1091) | def read(encryption_entry: DictionaryObject, first_id_entry: bytes) ->...
method make (line 1160) | def make(
FILE: pypdf/_font.py
class FontDescriptor (line 14) | class FontDescriptor:
class CoreFontMetrics (line 37) | class CoreFontMetrics:
class Font (line 43) | class Font:
method _collect_tt_t1_character_widths (line 72) | def _collect_tt_t1_character_widths(
method _collect_cid_character_widths (line 104) | def _collect_cid_character_widths(
method _add_default_width (line 177) | def _add_default_width(current_widths: dict[str, int], flags: int) -> ...
method _parse_font_descriptor (line 196) | def _parse_font_descriptor(font_descriptor_obj: DictionaryObject) -> d...
method from_font_resource (line 220) | def from_font_resource(
method as_font_resource (line 305) | def as_font_resource(self) -> DictionaryObject:
method text_width (line 317) | def text_width(self, text: str = "") -> float:
FILE: pypdf/_page.py
function _get_rectangle (line 94) | def _get_rectangle(self: Any, name: str, defaults: Iterable[str]) -> Rec...
function _set_rectangle (line 114) | def _set_rectangle(self: Any, name: str, value: Union[RectangleObject, f...
function _delete_rectangle (line 118) | def _delete_rectangle(self: Any, name: str) -> None:
function _create_rectangle_accessor (line 122) | def _create_rectangle_accessor(name: str, fallback: Iterable[str]) -> pr...
class Transformation (line 130) | class Transformation:
method __init__ (line 159) | def __init__(self, ctm: CompressedTransformationMatrix = (1, 0, 0, 1, ...
method matrix (line 163) | def matrix(self) -> TransformationMatrixType:
method compress (line 176) | def compress(matrix: TransformationMatrixType) -> CompressedTransforma...
method _to_cm (line 196) | def _to_cm(self) -> str:
method transform (line 203) | def transform(self, m: "Transformation") -> "Transformation":
method translate (line 225) | def translate(self, tx: float = 0, ty: float = 0) -> "Transformation":
method scale (line 240) | def scale(
method rotate (line 269) | def rotate(self, rotation: float) -> "Transformation":
method __repr__ (line 289) | def __repr__(self) -> str:
method apply_on (line 293) | def apply_on(self, pt: list[float], as_object: bool = False) -> list[f...
method apply_on (line 297) | def apply_on(
method apply_on (line 302) | def apply_on(
class ImageFile (line 327) | class ImageFile:
method replace (line 354) | def replace(self, new_image: Image, **kwargs: Any) -> None:
method __str__ (line 411) | def __str__(self) -> str:
method __repr__ (line 414) | def __repr__(self) -> str:
class VirtualListImages (line 418) | class VirtualListImages(Sequence[ImageFile]):
method __init__ (line 425) | def __init__(
method __len__ (line 434) | def __len__(self) -> int:
method keys (line 437) | def keys(self) -> list[Union[str, list[str]]]:
method items (line 440) | def items(self) -> list[tuple[Union[str, list[str]], ImageFile]]:
method __getitem__ (line 444) | def __getitem__(self, index: Union[int, str, list[str]]) -> ImageFile:
method __getitem__ (line 448) | def __getitem__(self, index: slice) -> Sequence[ImageFile]:
method __getitem__ (line 451) | def __getitem__(
method __iter__ (line 472) | def __iter__(self) -> Iterator[ImageFile]:
method __str__ (line 476) | def __str__(self) -> str:
class PageObject (line 481) | class PageObject(DictionaryObject):
method __init__ (line 500) | def __init__(
method hash_bin (line 514) | def hash_bin(self) -> int:
method hash_value_data (line 529) | def hash_value_data(self) -> bytes:
method user_unit (line 535) | def user_unit(self) -> float:
method create_blank_page (line 546) | def create_blank_page(
method _get_ids_image (line 591) | def _get_ids_image(
method _get_image (line 629) | def _get_image(
method images (line 674) | def images(self) -> VirtualListImages:
method _translate_value_inline_image (line 713) | def _translate_value_inline_image(self, k: str, v: PdfObject) -> PdfOb...
method _get_inline_images (line 728) | def _get_inline_images(self) -> dict[str, ImageFile]:
method rotation (line 775) | def rotation(self) -> int:
method rotation (line 786) | def rotation(self, r: float) -> None:
method transfer_rotation_to_content (line 789) | def transfer_rotation_to_content(self) -> None:
method rotate (line 824) | def rotate(self, angle: int) -> "PageObject":
method _merge_resources (line 840) | def _merge_resources(
method _content_stream_rename (line 922) | def _content_stream_rename(
method _add_transformation_matrix (line 944) | def _add_transformation_matrix(
method _get_contents_as_bytes (line 960) | def _get_contents_as_bytes(self) -> Optional[bytes]:
method get_contents (line 975) | def get_contents(self) -> Optional[ContentStream]:
method replace_contents (line 996) | def replace_contents(
method merge_page (line 1059) | def merge_page(
method _merge_page (line 1081) | def _merge_page(
method _merge_page_writer (line 1188) | def _merge_page_writer(
method _expand_mediabox (line 1326) | def _expand_mediabox(
method merge_transformed_page (line 1369) | def merge_transformed_page(
method merge_scaled_page (line 1401) | def merge_scaled_page(
method merge_rotated_page (line 1419) | def merge_rotated_page(
method merge_translated_page (line 1441) | def merge_translated_page(
method add_transformation (line 1465) | def add_transformation(
method scale (line 1515) | def scale(self, sx: float, sy: float) -> None:
method scale_by (line 1569) | def scale_by(self, factor: float) -> None:
method scale_to (line 1580) | def scale_to(self, width: float, height: float) -> None:
method compress_content_streams (line 1594) | def compress_content_streams(self, level: int = -1) -> None:
method page_number (line 1618) | def page_number(self) -> Optional[int]:
method _debug_for_extract (line 1634) | def _debug_for_extract(self) -> str: # pragma: no cover
method _extract_text (line 1672) | def _extract_text(
method _layout_mode_fonts (line 1835) | def _layout_mode_fonts(self) -> dict[str, Font]:
method _layout_mode_text (line 1861) | def _layout_mode_text(
method extract_text (line 1920) | def extract_text(
method extract_xform_text (line 2056) | def extract_xform_text(
method _get_fonts (line 2091) | def _get_fonts(self) -> tuple[set[str], set[str]]:
method annotations (line 2140) | def annotations(self) -> Optional[ArrayObject]:
method annotations (line 2146) | def annotations(self, value: Optional[ArrayObject]) -> None:
class _VirtualList (line 2162) | class _VirtualList(Sequence[PageObject]):
method __init__ (line 2163) | def __init__(
method __len__ (line 2172) | def __len__(self) -> int:
method __getitem__ (line 2176) | def __getitem__(self, index: int) -> PageObject:
method __getitem__ (line 2180) | def __getitem__(self, index: slice) -> Sequence[PageObject]:
method __getitem__ (line 2183) | def __getitem__(
method __delitem__ (line 2200) | def __delitem__(self, index: Union[int, slice]) -> None:
method __iter__ (line 2247) | def __iter__(self) -> Iterator[PageObject]:
method __str__ (line 2251) | def __str__(self) -> str:
function _get_fonts_walk (line 2256) | def _get_fonts_walk(
FILE: pypdf/_page_labels.py
function number2uppercase_roman_numeral (line 75) | def number2uppercase_roman_numeral(num: int) -> str:
function number2lowercase_roman_numeral (line 103) | def number2lowercase_roman_numeral(number: int) -> str:
function number2uppercase_letter (line 107) | def number2uppercase_letter(number: int) -> str:
function number2lowercase_letter (line 123) | def number2lowercase_letter(number: int) -> str:
function get_label_from_nums (line 127) | def get_label_from_nums(dictionary_object: DictionaryObject, index: int)...
function index2label (line 164) | def index2label(reader: PdfCommonDocProtocol, index: int) -> str:
function nums_insert (line 213) | def nums_insert(
function nums_clear_range (line 243) | def nums_clear_range(
function nums_next (line 270) | def nums_next(
FILE: pypdf/_protocols.py
class PdfObjectProtocol (line 10) | class PdfObjectProtocol(Protocol):
method clone (line 13) | def clone(
method _reference_clone (line 21) | def _reference_clone(self, clone: Any, pdf_dest: Any) -> Any:
method get_object (line 24) | def get_object(self) -> Optional["PdfObjectProtocol"]:
method hash_value (line 27) | def hash_value(self) -> bytes:
method write_to_stream (line 30) | def write_to_stream(
class XmpInformationProtocol (line 36) | class XmpInformationProtocol(PdfObjectProtocol):
class PdfCommonDocProtocol (line 40) | class PdfCommonDocProtocol(Protocol):
method pdf_header (line 42) | def pdf_header(self) -> str:
method pages (line 46) | def pages(self) -> list[Any]:
method root_object (line 50) | def root_object(self) -> PdfObjectProtocol:
method get_object (line 53) | def get_object(self, indirect_reference: Any) -> Optional[PdfObjectPro...
method strict (line 57) | def strict(self) -> bool:
class PdfReaderProtocol (line 61) | class PdfReaderProtocol(PdfCommonDocProtocol, Protocol):
method xref (line 64) | def xref(self) -> dict[int, dict[int, Any]]:
method trailer (line 69) | def trailer(self) -> dict[str, Any]:
class PdfWriterProtocol (line 73) | class PdfWriterProtocol(PdfCommonDocProtocol, Protocol):
method write (line 81) | def write(self, stream: Union[Path, StrByteType]) -> tuple[bool, IO[An...
method _add_object (line 85) | def _add_object(self, obj: Any) -> Any:
FILE: pypdf/_reader.py
class PdfReader (line 95) | class PdfReader(PdfDocCommon):
method __init__ (line 118) | def __init__(
method _initialize_stream (line 159) | def _initialize_stream(self, stream: Union[StrByteType, Path]) -> None:
method _handle_encryption (line 174) | def _handle_encryption(self, password: Optional[Union[str, bytes]]) ->...
method __enter__ (line 194) | def __enter__(self) -> Self:
method __exit__ (line 197) | def __exit__(
method close (line 205) | def close(self) -> None:
method root_object (line 217) | def root_object(self) -> DictionaryObject:
method _info (line 261) | def _info(self) -> Optional[DictionaryObject]:
method _ID (line 281) | def _ID(self) -> Optional[ArrayObject]:
method pdf_header (line 296) | def pdf_header(self) -> str:
method xmp_metadata (line 312) | def xmp_metadata(self) -> Optional[XmpInformation]:
method _get_page_number_by_indirect (line 320) | def _get_page_number_by_indirect(
method _get_object_from_stream (line 348) | def _get_object_from_stream(
method get_object (line 427) | def get_object(
method read_object_header (line 577) | def read_object_header(self, stream: StreamType) -> tuple[int, int]:
method cache_get_indirect_object (line 604) | def cache_get_indirect_object(
method cache_indirect_object (line 612) | def cache_indirect_object(
method _replace_object (line 625) | def _replace_object(self, indirect_reference: IndirectObject, obj: Pdf...
method read (line 635) | def read(self, stream: StreamType) -> None:
method _basic_validation (line 701) | def _basic_validation(self, stream: StreamType) -> None:
method _find_eof_marker (line 719) | def _find_eof_marker(self, stream: StreamType) -> None:
method _find_startxref_pos (line 752) | def _find_startxref_pos(self, stream: StreamType) -> int:
method _read_standard_xref_table (line 778) | def _read_standard_xref_table(self, stream: StreamType) -> None:
method _read_xref_tables_and_trailers (line 898) | def _read_xref_tables_and_trailers(
method _process_xref_stream (line 947) | def _process_xref_stream(self, xrefstream: DictionaryObject) -> None:
method _read_xref (line 959) | def _read_xref(self, stream: StreamType) -> Optional[int]:
method _read_xref_other_error (line 985) | def _read_xref_other_error(
method _read_pdf15_xref_stream (line 1026) | def _read_pdf15_xref_stream(
method _get_xref_issues (line 1073) | def _get_xref_issues(stream: StreamType, startxref: int) -> int:
method _find_pdf_objects (line 1108) | def _find_pdf_objects(cls, data: bytes) -> Iterable[tuple[int, int, in...
method _find_pdf_trailers (line 1149) | def _find_pdf_trailers(cls, data: bytes) -> Iterable[int]:
method _rebuild_xref_table (line 1169) | def _rebuild_xref_table(self, stream: StreamType) -> None:
method _read_xref_subsections (line 1221) | def _read_xref_subsections(
method _pairs (line 1256) | def _pairs(self, array: list[int]) -> Iterable[tuple[int, int]]:
method decrypt (line 1263) | def decrypt(self, password: Union[str, bytes]) -> PasswordType:
method is_encrypted (line 1289) | def is_encrypted(self) -> bool:
method add_form_topname (line 1298) | def add_form_topname(self, name: str) -> Optional[DictionaryObject]:
method rename_form_topname (line 1341) | def rename_form_topname(self, name: str) -> Optional[DictionaryObject]:
method _repr_mimebundle_ (line 1369) | def _repr_mimebundle_(
FILE: pypdf/_text_extraction/__init__.py
class OrientationNotFoundError (line 19) | class OrientationNotFoundError(Exception):
function set_custom_rtl (line 23) | def set_custom_rtl(
function mult (line 69) | def mult(m: list[float], n: list[float]) -> list[float]:
function orient (line 80) | def orient(m: list[float]) -> int:
function crlf_space_check (line 90) | def crlf_space_check(
function get_text_operands (line 155) | def get_text_operands(
function get_display_str (line 195) | def get_display_str(
FILE: pypdf/_text_extraction/_layout_mode/_fixed_width_page.py
class BTGroup (line 16) | class BTGroup(TypedDict):
function bt_group (line 41) | def bt_group(tj_op: TextStateParams, rendered_text: str, dispaced_tx: fl...
function recurs_to_target_op (line 63) | def recurs_to_target_op(
function y_coordinate_groups (line 208) | def y_coordinate_groups(
function text_show_operations (line 256) | def text_show_operations(
function fixed_char_width (line 328) | def fixed_char_width(bt_groups: list[BTGroup], scale_weight: float = 1.2...
function fixed_width_page (line 348) | def fixed_width_page(
FILE: pypdf/_text_extraction/_layout_mode/_text_state_manager.py
class TextStateManager (line 18) | class TextStateManager:
method __init__ (line 36) | def __init__(self) -> None:
method set_state_param (line 51) | def set_state_param(self, op: bytes, value: Union[float, list[Any]]) -...
method set_font (line 66) | def set_font(self, font: Font, size: float) -> None:
method text_state_params (line 78) | def text_state_params(self, value: Union[bytes, str] = "") -> TextStat...
method raw_transform (line 128) | def raw_transform(
method new_transform (line 140) | def new_transform(
method reset_tm (line 155) | def reset_tm(self) -> TextStateManagerChainMapType:
method reset_trm (line 164) | def reset_trm(self) -> TextStateManagerChainMapType:
method remove_q (line 170) | def remove_q(self) -> TextStateManagerChainMapType:
method add_q (line 179) | def add_q(self) -> None:
method add_cm (line 184) | def add_cm(self, *args: Any) -> TextStateManagerChainMapType:
method _complete_matrix (line 191) | def _complete_matrix(self, operands: list[float]) -> list[float]:
method add_tm (line 197) | def add_tm(self, operands: list[float]) -> TextStateManagerChainMapType:
method add_trm (line 206) | def add_trm(self, operands: list[float]) -> TextStateManagerChainMapType:
method effective_transform (line 216) | def effective_transform(self) -> list[float]:
FILE: pypdf/_text_extraction/_layout_mode/_text_state_params.py
class TextStateParams (line 12) | class TextStateParams:
method __post_init__ (line 56) | def __post_init__(self) -> None:
method font_size_matrix (line 84) | def font_size_matrix(self) -> list[float]:
method displaced_transform (line 95) | def displaced_transform(self) -> list[float]:
method render_transform (line 99) | def render_transform(self) -> list[float]:
method displacement_matrix (line 103) | def displacement_matrix(
method word_tx (line 118) | def word_tx(self, word: str, td_offset: float = 0.0) -> float:
method to_dict (line 133) | def to_dict(inst: "TextStateParams") -> dict[str, Any]:
FILE: pypdf/_text_extraction/_text_extractor.py
class TextExtraction (line 38) | class TextExtraction:
method __init__ (line 47) | def __init__(self) -> None:
method initialize_extraction (line 115) | def initialize_extraction(
method compute_str_widths (line 133) | def compute_str_widths(self, str_widths: float) -> float:
method process_operation (line 136) | def process_operation(self, operator: bytes, operands: list[Any]) -> N...
method _post_process_text_operation (line 145) | def _post_process_text_operation(self, str_widths: float) -> None:
method _handle_tj (line 168) | def _handle_tj(
method _flush_text (line 204) | def _flush_text(self) -> None:
method _handle_bt (line 215) | def _handle_bt(self, operands: list[Any]) -> None:
method _handle_et (line 220) | def _handle_et(self, operands: list[Any]) -> None:
method _handle_save_graphics_state (line 224) | def _handle_save_graphics_state(self, operands: list[Any]) -> None:
method _handle_restore_graphics_state (line 238) | def _handle_restore_graphics_state(self, operands: list[Any]) -> None:
method _handle_cm (line 253) | def _handle_cm(self, operands: list[Any]) -> None:
method _handle_tz (line 266) | def _handle_tz(self, operands: list[Any]) -> None:
method _handle_tw (line 270) | def _handle_tw(self, operands: list[Any]) -> None:
method _handle_tl (line 274) | def _handle_tl(self, operands: list[Any]) -> None:
method _handle_tf (line 279) | def _handle_tf(self, operands: list[Any]) -> None:
method _handle_td (line 308) | def _handle_td(self, operands: list[Any]) -> float:
method _handle_tm (line 320) | def _handle_tm(self, operands: list[Any]) -> float:
method _handle_t_star (line 327) | def _handle_t_star(self, operands: list[Any]) -> float:
method _handle_tj_operation (line 335) | def _handle_tj_operation(self, operands: list[Any]) -> float:
FILE: pypdf/_utils.py
function parse_iso8824_date (line 78) | def parse_iso8824_date(text: Optional[str]) -> Optional[datetime]:
function format_iso8824_date (line 110) | def format_iso8824_date(dt: datetime) -> str:
function _get_max_pdf_version_header (line 135) | def _get_max_pdf_version_header(header1: str, header2: str) -> str:
function read_until_whitespace (line 159) | def read_until_whitespace(stream: StreamType, maxchars: Optional[int] = ...
function read_non_whitespace (line 184) | def read_non_whitespace(stream: StreamType) -> bytes:
function skip_over_whitespace (line 201) | def skip_over_whitespace(stream: StreamType) -> bool:
function check_if_whitespace_only (line 221) | def check_if_whitespace_only(value: bytes) -> bool:
function skip_over_comment (line 235) | def skip_over_comment(stream: StreamType) -> None:
function read_until_regex (line 245) | def read_until_regex(stream: StreamType, regex: Pattern[bytes]) -> bytes:
function read_block_backwards (line 285) | def read_block_backwards(stream: StreamType, to_read: int) -> bytes:
function read_previous_line (line 310) | def read_previous_line(stream: StreamType) -> bytes:
function matrix_multiply (line 368) | def matrix_multiply(
function mark_location (line 377) | def mark_location(stream: StreamType) -> None:
function ord_ (line 390) | def ord_(b: str) -> int:
function ord_ (line 395) | def ord_(b: bytes) -> bytes:
function ord_ (line 400) | def ord_(b: int) -> int:
function ord_ (line 404) | def ord_(b: Union[int, str, bytes]) -> Union[int, bytes]:
function deprecate (line 410) | def deprecate(msg: str, stacklevel: int = 3) -> None:
function deprecation (line 414) | def deprecation(msg: str) -> None:
function deprecate_with_replacement (line 418) | def deprecate_with_replacement(old_name: str, new_name: str, removed_in:...
function deprecation_with_replacement (line 426) | def deprecation_with_replacement(old_name: str, new_name: str, removed_i...
function deprecate_no_replacement (line 433) | def deprecate_no_replacement(name: str, removed_in: str) -> None:
function deprecation_no_replacement (line 438) | def deprecation_no_replacement(name: str, removed_in: str) -> None:
function logger_error (line 443) | def logger_error(message: str, *, source: str, **values: Any) -> None:
function logger_warning (line 455) | def logger_warning(msg: str, src: str) -> None:
function rename_kwargs (line 474) | def rename_kwargs(
function _human_readable_bytes (line 509) | def _human_readable_bytes(bytes: int) -> str:
class classproperty (line 554) | class classproperty: # noqa: N801
method __init__ (line 560) | def __init__(self, method=None) -> None: # type: ignore # noqa: ANN001
method __get__ (line 563) | def __get__(self, instance, cls=None) -> Any: # type: ignore # noqa:...
method getter (line 566) | def getter(self, method) -> Self: # type: ignore # noqa: ANN001
class File (line 572) | class File:
method __str__ (line 588) | def __str__(self) -> str:
method __repr__ (line 591) | def __repr__(self) -> str:
class Version (line 596) | class Version:
method __init__ (line 599) | def __init__(self, version_str: str) -> None:
method _parse_version (line 603) | def _parse_version(self, version_str: str) -> list[tuple[int, str]]:
method __eq__ (line 618) | def __eq__(self, other: object) -> bool:
method __hash__ (line 623) | def __hash__(self) -> int:
method __lt__ (line 627) | def __lt__(self, other: Any) -> bool:
FILE: pypdf/_writer.py
class ObjectDeletionFlag (line 129) | class ObjectDeletionFlag(enum.IntFlag):
function _rolling_checksum (line 142) | def _rolling_checksum(stream: BytesIO, blocksize: int = 65536) -> str:
class PdfWriter (line 149) | class PdfWriter(PdfDocCommon):
method __init__ (line 173) | def __init__(
method is_encrypted (line 313) | def is_encrypted(self) -> bool:
method root_object (line 323) | def root_object(self) -> DictionaryObject:
method _info (line 334) | def _info(self) -> Optional[DictionaryObject]:
method _info (line 349) | def _info(self, value: Optional[Union[IndirectObject, DictionaryObject...
method xmp_metadata (line 364) | def xmp_metadata(self) -> Optional[XmpInformation]:
method xmp_metadata (line 369) | def xmp_metadata(self, value: Union[XmpInformation, bytes, None]) -> N...
method with_as_usage (line 393) | def with_as_usage(self) -> bool:
method with_as_usage (line 398) | def with_as_usage(self, value: bool) -> None:
method __enter__ (line 402) | def __enter__(self) -> Self:
method __exit__ (line 412) | def __exit__(
method pdf_header (line 423) | def pdf_header(self) -> str:
method pdf_header (line 436) | def pdf_header(self, new_header: Union[str, bytes]) -> None:
method _add_object (line 441) | def _add_object(self, obj: PdfObject) -> IndirectObject:
method get_object (line 456) | def get_object(
method _replace_object (line 469) | def _replace_object(
method _add_page (line 490) | def _add_page(
method set_need_appearances_writer (line 548) | def set_need_appearances_writer(self, state: bool = True) -> None:
method create_viewer_preferences (line 581) | def create_viewer_preferences(self) -> ViewerPreferences:
method add_page (line 588) | def add_page(
method insert_page (line 613) | def insert_page(
method _get_page_number_by_indirect (line 641) | def _get_page_number_by_indirect(
method add_blank_page (line 665) | def add_blank_page(
method insert_blank_page (line 690) | def insert_blank_page(
method open_destination (line 731) | def open_destination(
method open_destination (line 737) | def open_destination(self, dest: Union[None, str, Destination, PageObj...
method add_js (line 756) | def add_js(self, javascript: str) -> None:
method add_attachment (line 795) | def add_attachment(self, filename: str, data: Union[str, bytes]) -> "E...
method append_pages_from_reader (line 813) | def append_pages_from_reader(
method _merge_content_stream_to_page (line 845) | def _merge_content_stream_to_page(
method _add_apstream_object (line 884) | def _add_apstream_object(
method update_page_form_field_values (line 939) | def update_page_form_field_values(
method reattach_fields (line 1080) | def reattach_fields(
method clone_reader_document_root (line 1130) | def clone_reader_document_root(self, reader: PdfReader) -> None:
method clone_document_from_reader (line 1183) | def clone_document_from_reader(
method _compute_document_identifier (line 1231) | def _compute_document_identifier(self) -> ByteStringObject:
method generate_file_identifiers (line 1237) | def generate_file_identifiers(self) -> None:
method encrypt (line 1257) | def encrypt(
method _resolve_links (line 1316) | def _resolve_links(self) -> None:
method write_stream (line 1329) | def write_stream(self, stream: StreamType) -> None:
method write (line 1350) | def write(self, stream: Union[Path, StrByteType]) -> tuple[bool, IO[An...
method list_objects_in_increment (line 1382) | def list_objects_in_increment(self) -> list[IndirectObject]:
method _write_increment (line 1406) | def _write_increment(self, stream: StreamType) -> None:
method _write_pdf_structure (line 1477) | def _write_pdf_structure(self, stream: StreamType) -> tuple[list[int],...
method _write_xref_table (line 1497) | def _write_xref_table(
method _write_trailer (line 1513) | def _write_trailer(self, stream: StreamType, xref_location: int) -> None:
method metadata (line 1538) | def metadata(self) -> Optional[DocumentInformation]:
method metadata (line 1553) | def metadata(
method add_metadata (line 1565) | def add_metadata(self, infos: dict[str, Any]) -> None:
method compress_identical_objects (line 1585) | def compress_identical_objects(
method get_reference (line 1661) | def get_reference(self, obj: PdfObject) -> IndirectObject:
method get_outline_root (line 1667) | def get_outline_root(self) -> TreeObject:
method get_threads_root (line 1686) | def get_threads_root(self) -> ArrayObject:
method threads (line 1706) | def threads(self) -> ArrayObject:
method add_outline_item_destination (line 1717) | def add_outline_item_destination(
method add_outline_item_dict (line 1755) | def add_outline_item_dict(
method add_outline_item (line 1778) | def add_outline_item(
method add_outline (line 1856) | def add_outline(self) -> None:
method add_named_destination_array (line 1861) | def add_named_destination_array(
method add_named_destination_object (line 1875) | def add_named_destination_object(
method add_named_destination (line 1886) | def add_named_destination(
method remove_links (line 1909) | def remove_links(self) -> None:
method remove_annotations (line 1914) | def remove_annotations(
method _remove_annots_from_page (line 1930) | def _remove_annots_from_page(
method remove_objects_from_page (line 1948) | def remove_objects_from_page(
method _remove_objects_from_page__clean (line 2013) | def _remove_objects_from_page__clean(
method _remove_objects_from_page__clean_forms (line 2056) | def _remove_objects_from_page__clean_forms(
method remove_images (line 2139) | def remove_images(
method remove_text (line 2163) | def remove_text(self, font_names: Optional[list[str]] = None) -> None:
method add_uri (line 2220) | def add_uri(
method _get_page_layout (line 2297) | def _get_page_layout(self) -> Optional[LayoutType]:
method _set_page_layout (line 2303) | def _set_page_layout(self, layout: Union[NameObject, LayoutType]) -> N...
method set_page_layout (line 2338) | def set_page_layout(self, layout: LayoutType) -> None:
method page_layout (line 2367) | def page_layout(self) -> Optional[LayoutType]:
method page_layout (line 2392) | def page_layout(self, layout: LayoutType) -> None:
method _get_page_mode (line 2404) | def _get_page_mode(self) -> Optional[PagemodeType]:
method page_mode (line 2411) | def page_mode(self) -> Optional[PagemodeType]:
method page_mode (line 2434) | def page_mode(self, mode: PagemodeType) -> None:
method add_annotation (line 2445) | def add_annotation(
method clean_page (line 2499) | def clean_page(self, page: Union[PageObject, IndirectObject]) -> PageO...
method _create_stream (line 2526) | def _create_stream(
method append (line 2562) | def append(
method merge (line 2630) | def merge(
method _merge__process_named_dests (line 2786) | def _merge__process_named_dests(self, dest: Any, reader: PdfDocCommon,...
method _add_articles_thread (line 2820) | def _add_articles_thread(
method add_filtered_articles (line 2885) | def add_filtered_articles(
method _get_cloned_page (line 2921) | def _get_cloned_page(
method _insert_filtered_annotations (line 2938) | def _insert_filtered_annotations(
method _get_filtered_outline (line 2995) | def _get_filtered_outline(
method _clone_outline (line 3047) | def _clone_outline(self, dest: Destination) -> TreeObject:
method _insert_filtered_outline (line 3066) | def _insert_filtered_outline(
method close (line 3082) | def close(self) -> None:
method find_outline_item (line 3086) | def find_outline_item(
method reset_translation (line 3116) | def reset_translation(
method set_page_label (line 3144) | def set_page_label(
method _set_page_label (line 3195) | def _set_page_label(
method _repr_mimebundle_ (line 3260) | def _repr_mimebundle_(
function _pdf_objectify (line 3292) | def _pdf_objectify(obj: Union[dict[str, Any], str, float, list[Any]]) ->...
function _create_outline_item (line 3313) | def _create_outline_item(
FILE: pypdf/annotations/_base.py
class AnnotationDictionary (line 8) | class AnnotationDictionary(DictionaryObject, ABC):
method __init__ (line 9) | def __init__(self) -> None:
method flags (line 21) | def flags(self) -> AnnotationFlag:
method flags (line 25) | def flags(self, value: AnnotationFlag) -> None:
FILE: pypdf/annotations/_markup_annotations.py
function _get_bounding_rectangle (line 31) | def _get_bounding_rectangle(vertices: list[Vertex]) -> RectangleObject:
class MarkupAnnotation (line 42) | class MarkupAnnotation(AnnotationDictionary, ABC):
method __init__ (line 65) | def __init__(
class Text (line 101) | class Text(MarkupAnnotation):
method __init__ (line 114) | def __init__(
class FreeText (line 131) | class FreeText(MarkupAnnotation):
method __init__ (line 134) | def __init__(
class Line (line 195) | class Line(MarkupAnnotation):
method __init__ (line 196) | def __init__(
class PolyLine (line 235) | class PolyLine(MarkupAnnotation):
method __init__ (line 236) | def __init__(
class Rectangle (line 257) | class Rectangle(MarkupAnnotation):
method __init__ (line 258) | def __init__(
class Highlight (line 280) | class Highlight(MarkupAnnotation):
method __init__ (line 281) | def __init__(
class Ellipse (line 305) | class Ellipse(MarkupAnnotation):
method __init__ (line 306) | def __init__(
class Polygon (line 329) | class Polygon(MarkupAnnotation):
method __init__ (line 330) | def __init__(
FILE: pypdf/annotations/_non_markup_annotations.py
class Link (line 15) | class Link(AnnotationDictionary):
method __init__ (line 16) | def __init__(
class Popup (line 79) | class Popup(AnnotationDictionary):
method __init__ (line 80) | def __init__(
FILE: pypdf/constants.py
class StrEnum (line 6) | class StrEnum(str, Enum): # Once we are on Python 3.11+: enum.StrEnum
method __str__ (line 7) | def __str__(self) -> str:
class Core (line 11) | class Core:
class TrailerKeys (line 21) | class TrailerKeys:
class CatalogAttributes (line 30) | class CatalogAttributes:
class EncryptionDictAttributes (line 35) | class EncryptionDictAttributes:
class UserAccessPermissions (line 50) | class UserAccessPermissions(IntFlag):
method _is_reserved (line 90) | def _is_reserved(cls, name: str) -> bool:
method _is_active (line 95) | def _is_active(cls, name: str) -> bool:
method to_dict (line 99) | def to_dict(self) -> dict[str, bool]:
method from_dict (line 109) | def from_dict(cls, value: dict[str, bool]) -> "UserAccessPermissions":
method all (line 127) | def all(cls) -> "UserAccessPermissions":
class Resources (line 131) | class Resources:
class PagesAttributes (line 147) | class PagesAttributes:
class PageAttributes (line 158) | class PageAttributes:
class FileSpecificationDictionaryEntries (line 198) | class FileSpecificationDictionaryEntries:
class StreamAttributes (line 216) | class StreamAttributes:
class FilterTypes (line 228) | class FilterTypes(StrEnum):
class FilterTypeAbbreviations (line 242) | class FilterTypeAbbreviations:
class LzwFilterParameters (line 254) | class LzwFilterParameters:
class CcittFaxDecodeParameters (line 267) | class CcittFaxDecodeParameters:
class ImageAttributes (line 283) | class ImageAttributes:
class ColorSpaces (line 301) | class ColorSpaces:
class TypArguments (line 307) | class TypArguments:
class TypFitArguments (line 316) | class TypFitArguments:
class GoToActionArguments (line 329) | class GoToActionArguments:
class AnnotationDictionaryAttributes (line 335) | class AnnotationDictionaryAttributes:
class InteractiveFormDictEntries (line 355) | class InteractiveFormDictEntries:
class FieldDictionaryAttributes (line 366) | class FieldDictionaryAttributes:
class FfBits (line 386) | class FfBits(IntFlag):
method attributes (line 445) | def attributes(cls) -> tuple[str, ...]:
method attributes_dict (line 472) | def attributes_dict(cls) -> dict[str, str]:
class CheckboxRadioButtonAttributes (line 498) | class CheckboxRadioButtonAttributes:
method attributes (line 504) | def attributes(cls) -> tuple[str, ...]:
method attributes_dict (line 520) | def attributes_dict(cls) -> dict[str, str]:
class FieldFlag (line 539) | class FieldFlag(IntFlag):
class DocumentInformationAttributes (line 547) | class DocumentInformationAttributes:
class PageLayouts (line 561) | class PageLayouts:
class GraphicsStateParameters (line 575) | class GraphicsStateParameters:
class CatalogDictionary (line 607) | class CatalogDictionary:
class OutlineFontFlag (line 644) | class OutlineFontFlag(IntFlag):
class PageLabelStyle (line 651) | class PageLabelStyle:
class AnnotationFlag (line 664) | class AnnotationFlag(IntFlag):
class ImageType (line 709) | class ImageType(IntFlag):
class AFRelationship (line 767) | class AFRelationship:
class BorderStyles (line 785) | class BorderStyles:
class FontFlags (line 799) | class FontFlags(IntFlag):
FILE: pypdf/errors.py
class DeprecationError (line 8) | class DeprecationError(Exception):
class DependencyError (line 12) | class DependencyError(Exception):
class PyPdfError (line 19) | class PyPdfError(Exception):
class PdfReadError (line 23) | class PdfReadError(PyPdfError):
class PageSizeNotDefinedError (line 27) | class PageSizeNotDefinedError(PyPdfError):
class PdfReadWarning (line 31) | class PdfReadWarning(UserWarning):
class PdfStreamError (line 35) | class PdfStreamError(PdfReadError):
class ParseError (line 39) | class ParseError(PyPdfError):
class FileNotDecryptedError (line 46) | class FileNotDecryptedError(PdfReadError):
class WrongPasswordError (line 54) | class WrongPasswordError(FileNotDecryptedError):
class EmptyFileError (line 58) | class EmptyFileError(PdfReadError):
class EmptyImageDataError (line 62) | class EmptyImageDataError(PyPdfError):
class LimitReachedError (line 69) | class LimitReachedError(PyPdfError):
class XmpDocumentError (line 73) | class XmpDocumentError(PyPdfError, RuntimeError):
FILE: pypdf/filters.py
function _decompress_with_limit (line 88) | def _decompress_with_limit(data: bytes) -> bytes:
function decompress (line 98) | def decompress(data: bytes) -> bytes:
class FlateDecode (line 169) | class FlateDecode:
method decode (line 171) | def decode(
method _get_parameters (line 227) | def _get_parameters(parameters: DictionaryObject) -> tuple[int, int, i...
method _decode_png_prediction (line 241) | def _decode_png_prediction(data: bytes, columns: int, rowlength: int) ...
method encode (line 305) | def encode(data: bytes, level: int = -1) -> bytes:
class ASCIIHexDecode (line 320) | class ASCIIHexDecode:
method decode (line 327) | def decode(
class RunLengthDecode (line 372) | class RunLengthDecode:
method decode (line 387) | def decode(
class LZWDecode (line 449) | class LZWDecode:
class Decoder (line 450) | class Decoder:
method __init__ (line 454) | def __init__(self, data: bytes) -> None:
method decode (line 457) | def decode(self) -> bytes:
method decode (line 461) | def decode(
class ASCII85Decode (line 481) | class ASCII85Decode:
method decode (line 485) | def decode(
class DCTDecode (line 515) | class DCTDecode:
method decode (line 517) | def decode(
class JPXDecode (line 538) | class JPXDecode:
method decode (line 540) | def decode(
class CCITTParameters (line 561) | class CCITTParameters:
method group (line 574) | def group(self) -> int:
function __create_old_class_instance (line 585) | def __create_old_class_instance(
class CCITTFaxDecode (line 598) | class CCITTFaxDecode:
method _get_parameters (line 609) | def _get_parameters(
method decode (line 636) | def decode(
class JBIG2Decode (line 695) | class JBIG2Decode:
method decode (line 697) | def decode(
method _is_binary_compatible (line 750) | def _is_binary_compatible() -> bool:
function _deprecate_inline_image_filters (line 764) | def _deprecate_inline_image_filters(filter_name: str, old_name: str, new...
function decode_stream_data (line 773) | def decode_stream_data(stream: StreamObject) -> bytes:
FILE: pypdf/generic/_appearance_stream.py
class BaseStreamConfig (line 24) | class BaseStreamConfig:
class BaseStreamAppearance (line 31) | class BaseStreamAppearance(DecodedStreamObject):
method __init__ (line 34) | def __init__(self, layout: Optional[BaseStreamConfig] = None) -> None:
class TextAlignment (line 48) | class TextAlignment(IntEnum):
class TextStreamAppearance (line 56) | class TextStreamAppearance(BaseStreamAppearance):
method _scale_text (line 65) | def _scale_text(
method _generate_appearance_stream_data (line 143) | def _generate_appearance_stream_data(
method __init__ (line 305) | def __init__(
method _find_annotation_font_resource (line 420) | def _find_annotation_font_resource(
method from_text_annotation (line 458) | def from_text_annotation(
FILE: pypdf/generic/_base.py
class PdfObject (line 64) | class PdfObject(PdfObjectProtocol):
method hash_bin (line 69) | def hash_bin(self) -> int:
method hash_value_data (line 81) | def hash_value_data(self) -> bytes:
method hash_value (line 84) | def hash_value(self) -> bytes:
method replicate (line 90) | def replicate(
method clone (line 107) | def clone(
method _reference_clone (line 138) | def _reference_clone(
method get_object (line 196) | def get_object(self) -> Optional["PdfObject"]:
method write_to_stream (line 200) | def write_to_stream(
class NullObject (line 206) | class NullObject(PdfObject):
method clone (line 207) | def clone(
method hash_bin (line 218) | def hash_bin(self) -> int:
method write_to_stream (line 228) | def write_to_stream(
method read_from_stream (line 238) | def read_from_stream(stream: StreamType) -> "NullObject":
method __repr__ (line 244) | def __repr__(self) -> str:
method __eq__ (line 247) | def __eq__(self, other: object) -> bool:
method __hash__ (line 250) | def __hash__(self) -> int:
class BooleanObject (line 254) | class BooleanObject(PdfObject):
method __init__ (line 255) | def __init__(self, value: Any) -> None:
method clone (line 258) | def clone(
method hash_bin (line 270) | def hash_bin(self) -> int:
method __eq__ (line 280) | def __eq__(self, o: object, /) -> bool:
method __hash__ (line 287) | def __hash__(self) -> int:
method __repr__ (line 290) | def __repr__(self) -> str:
method write_to_stream (line 293) | def write_to_stream(
method read_from_stream (line 306) | def read_from_stream(stream: StreamType) -> "BooleanObject":
class IndirectObject (line 316) | class IndirectObject(PdfObject):
method __init__ (line 317) | def __init__(self, idnum: int, generation: int, pdf: Any) -> None: # ...
method __hash__ (line 322) | def __hash__(self) -> int:
method hash_bin (line 325) | def hash_bin(self) -> int:
method replicate (line 335) | def replicate(
method clone (line 341) | def clone(
method indirect_reference (line 378) | def indirect_reference(self) -> "IndirectObject": # type: ignore[over...
method get_object (line 381) | def get_object(self) -> Optional["PdfObject"]:
method __deepcopy__ (line 384) | def __deepcopy__(self, memo: Any) -> "IndirectObject":
method _get_object_with_check (line 387) | def _get_object_with_check(self) -> Optional["PdfObject"]:
method __getattr__ (line 396) | def __getattr__(self, name: str) -> Any:
method __getitem__ (line 405) | def __getitem__(self, key: Any) -> Any:
method __contains__ (line 409) | def __contains__(self, key: Any) -> bool:
method __iter__ (line 412) | def __iter__(self) -> Any:
method __float__ (line 415) | def __float__(self) -> str:
method __int__ (line 419) | def __int__(self) -> int:
method __str__ (line 423) | def __str__(self) -> str:
method __repr__ (line 427) | def __repr__(self) -> str:
method __eq__ (line 430) | def __eq__(self, other: object) -> bool:
method __ne__ (line 439) | def __ne__(self, other: object) -> bool:
method write_to_stream (line 442) | def write_to_stream(
method read_from_stream (line 452) | def read_from_stream(stream: StreamType, pdf: Any) -> "IndirectObject"...
class FloatObject (line 482) | class FloatObject(float, PdfObject):
method __new__ (line 483) | def __new__(
method clone (line 497) | def clone(
method hash_bin (line 509) | def hash_bin(self) -> int:
method myrepr (line 519) | def myrepr(self) -> str:
method __repr__ (line 525) | def __repr__(self) -> str:
method as_numeric (line 528) | def as_numeric(self) -> float:
method write_to_stream (line 531) | def write_to_stream(
class NumberObject (line 541) | class NumberObject(int, PdfObject):
method __new__ (line 544) | def __new__(cls, value: Any) -> Self:
method clone (line 551) | def clone(
method hash_bin (line 563) | def hash_bin(self) -> int:
method as_numeric (line 573) | def as_numeric(self) -> int:
method write_to_stream (line 576) | def write_to_stream(
method read_from_stream (line 586) | def read_from_stream(stream: StreamType) -> Union["NumberObject", "Flo...
class ByteStringObject (line 593) | class ByteStringObject(bytes, PdfObject):
method clone (line 602) | def clone(
method hash_bin (line 616) | def hash_bin(self) -> int:
method original_bytes (line 627) | def original_bytes(self) -> bytes:
method write_to_stream (line 631) | def write_to_stream(
method __str__ (line 642) | def __str__(self) -> str:
class TextStringObject (line 652) | class TextStringObject(str, PdfObject): # noqa: SLOT000
method __new__ (line 666) | def __new__(cls, value: Any) -> Self:
method clone (line 701) | def clone(
method hash_bin (line 717) | def hash_bin(self) -> int:
method original_bytes (line 728) | def original_bytes(self) -> bytes:
method get_original_bytes (line 739) | def get_original_bytes(self) -> bytes:
method get_encoded_bytes (line 755) | def get_encoded_bytes(self) -> bytes:
method write_to_stream (line 774) | def write_to_stream(
class NameObject (line 796) | class NameObject(str, PdfObject): # noqa: SLOT000
method clone (line 804) | def clone(
method hash_bin (line 816) | def hash_bin(self) -> int:
method write_to_stream (line 826) | def write_to_stream(
method renumber (line 835) | def renumber(self) -> bytes:
method _sanitize (line 853) | def _sanitize(self) -> "NameObject":
method surfix (line 869) | def surfix(cls) -> bytes: # noqa: N805
method unnumber (line 874) | def unnumber(sin: bytes) -> bytes:
method read_from_stream (line 894) | def read_from_stream(stream: StreamType, pdf: Any) -> "NameObject": #...
function encode_pdfdocencoding (line 924) | def encode_pdfdocencoding(unicode_string: str) -> bytes:
function is_null_or_none (line 937) | def is_null_or_none(x: Any) -> TypeGuard[Union[None, NullObject, Indirec...
FILE: pypdf/generic/_data_structures.py
class ArrayObject (line 99) | class ArrayObject(list[Any], PdfObject):
method replicate (line 100) | def replicate(
method clone (line 115) | def clone(
method hash_bin (line 152) | def hash_bin(self) -> int:
method items (line 162) | def items(self) -> Iterable[Any]:
method _to_lst (line 166) | def _to_lst(self, lst: Any) -> list[Any]:
method __add__ (line 183) | def __add__(self, lst: Any) -> "ArrayObject":
method __iadd__ (line 202) | def __iadd__(self, lst: Any) -> Self:
method __isub__ (line 217) | def __isub__(self, lst: Any) -> Self:
method write_to_stream (line 227) | def write_to_stream(
method read_from_stream (line 241) | def read_from_stream(
class DictionaryObject (line 272) | class DictionaryObject(dict[Any, Any], PdfObject):
method replicate (line 273) | def replicate(
method clone (line 287) | def clone(
method _clone (line 311) | def _clone(
method hash_bin (line 424) | def hash_bin(self) -> int:
method raw_get (line 436) | def raw_get(self, key: Any) -> Any:
method get_inherited (line 439) | def get_inherited(self, key: str, default: Any = None) -> Any:
method __setitem__ (line 464) | def __setitem__(self, key: Any, value: Any) -> Any:
method setdefault (line 471) | def setdefault(self, key: Any, value: Optional[Any] = None) -> Any:
method __getitem__ (line 478) | def __getitem__(self, key: Any) -> PdfObject:
method xmp_metadata (line 482) | def xmp_metadata(self) -> Optional[XmpInformationProtocol]:
method write_to_stream (line 504) | def write_to_stream(
method _get_next_object_position (line 522) | def _get_next_object_position(
method _read_unsized_from_stream (line 534) | def _read_unsized_from_stream(
method read_from_stream (line 553) | def read_from_stream(
class TreeObject (line 690) | class TreeObject(DictionaryObject):
method __init__ (line 691) | def __init__(self, dct: Optional[DictionaryObject] = None) -> None:
method has_children (line 696) | def has_children(self) -> bool:
method __iter__ (line 699) | def __iter__(self) -> Any:
method children (line 702) | def children(self) -> Iterable[Any]:
method add_child (line 726) | def add_child(self, child: Any, pdf: PdfWriterProtocol) -> None:
method inc_parent_counter_default (line 729) | def inc_parent_counter_default(
method inc_parent_counter_outline (line 742) | def inc_parent_counter_outline(
method insert_child (line 759) | def insert_child(
method _remove_node_from_tree (line 809) | def _remove_node_from_tree(
method remove_child (line 852) | def remove_child(self, child: Any) -> None:
method remove_from_tree (line 889) | def remove_from_tree(self) -> None:
method empty_tree (line 895) | def empty_tree(self) -> None:
function _reset_node_tree_relationship (line 908) | def _reset_node_tree_relationship(child_obj: Any) -> None:
class StreamObject (line 925) | class StreamObject(DictionaryObject):
method __init__ (line 926) | def __init__(self) -> None:
method replicate (line 930) | def replicate(
method _clone (line 955) | def _clone(
method hash_bin (line 987) | def hash_bin(self) -> int:
method get_data (line 998) | def get_data(self) -> bytes:
method set_data (line 1001) | def set_data(self, data: bytes) -> None:
method hash_value_data (line 1004) | def hash_value_data(self) -> bytes:
method write_to_stream (line 1009) | def write_to_stream(
method initialize_from_dictionary (line 1024) | def initialize_from_dictionary(
method flate_encode (line 1039) | def flate_encode(self, level: int = -1) -> "EncodedStreamObject":
method decode_as_image (line 1071) | def decode_as_image(self, pillow_parameters: Union[dict[str, Any], Non...
class DecodedStreamObject (line 1101) | class DecodedStreamObject(StreamObject):
class EncodedStreamObject (line 1105) | class EncodedStreamObject(StreamObject):
method __init__ (line 1106) | def __init__(self) -> None:
method get_data (line 1110) | def get_data(self) -> bytes:
method set_data (line 1127) | def set_data(self, data: bytes) -> None:
class ContentStream (line 1147) | class ContentStream(DecodedStreamObject):
method __init__ (line 1168) | def __init__(
method replicate (line 1223) | def replicate(
method clone (line 1253) | def clone(
method _clone (line 1289) | def _clone(
method _parse_content_stream (line 1315) | def _parse_content_stream(self, stream: StreamType) -> None:
method _read_inline_image (line 1346) | def _read_inline_image(self, stream: StreamType) -> dict[str, Any]:
method get_data (line 1422) | def get_data(self) -> bytes:
method set_data (line 1444) | def set_data(self, data: bytes) -> None:
method operations (line 1449) | def operations(self) -> list[tuple[Any, bytes]]:
method operations (line 1456) | def operations(self, operations: list[tuple[Any, bytes]]) -> None:
method isolate_graphics_state (line 1460) | def isolate_graphics_state(self) -> None:
method write_to_stream (line 1468) | def write_to_stream(
function read_object (line 1476) | def read_object(
class Field (line 1526) | class Field(TreeObject):
method __init__ (line 1534) | def __init__(self, data: DictionaryObject) -> None:
method field_type (line 1558) | def field_type(self) -> Optional[NameObject]:
method parent (line 1563) | def parent(self) -> Optional[DictionaryObject]:
method kids (line 1568) | def kids(self) -> Optional["ArrayObject"]:
method name (line 1573) | def name(self) -> Optional[str]:
method alternate_name (line 1578) | def alternate_name(self) -> Optional[str]:
method mapping_name (line 1583) | def mapping_name(self) -> Optional[str]:
method flags (line 1593) | def flags(self) -> Optional[int]:
method value (line 1601) | def value(self) -> Optional[Any]:
method default_value (line 1610) | def default_value(self) -> Optional[Any]:
method additional_actions (line 1615) | def additional_actions(self) -> Optional[DictionaryObject]:
class Destination (line 1625) | class Destination(TreeObject):
method __init__ (line 1646) | def __init__(
method dest_array (line 1700) | def dest_array(self) -> "ArrayObject":
method write_to_stream (line 1710) | def write_to_stream(
method title (line 1734) | def title(self) -> Optional[str]:
method page (line 1739) | def page(self) -> Optional[IndirectObject]:
method typ (line 1744) | def typ(self) -> Optional[str]:
method zoom (line 1749) | def zoom(self) -> Optional[int]:
method left (line 1754) | def left(self) -> Optional[FloatObject]:
method right (line 1759) | def right(self) -> Optional[FloatObject]:
method top (line 1764) | def top(self) -> Optional[FloatObject]:
method bottom (line 1769) | def bottom(self) -> Optional[FloatObject]:
method color (line 1774) | def color(self) -> Optional["ArrayObject"]:
method font_format (line 1781) | def font_format(self) -> Optional[OutlineFontFlag]:
method outline_count (line 1790) | def outline_count(self) -> Optional[int]:
FILE: pypdf/generic/_files.py
class EmbeddedFile (line 32) | class EmbeddedFile:
method __init__ (line 40) | def __init__(self, name: str, pdf_object: DictionaryObject, parent: Ar...
method name (line 52) | def name(self) -> str:
method _create_new (line 57) | def _create_new(cls, writer: PdfWriter, name: str, content: str | byte...
method _get_names_array (line 107) | def _get_names_array(cls, writer: PdfWriter) -> ArrayObject:
method _get_insertion_index (line 151) | def _get_insertion_index(cls, names_array: ArrayObject, name: str) -> ...
method alternative_name (line 167) | def alternative_name(self) -> str | None:
method alternative_name (line 179) | def alternative_name(self, value: TextStringObject | None) -> None:
method description (line 191) | def description(self) -> str | None:
method description (line 199) | def description(self, value: TextStringObject | None) -> None:
method associated_file_relationship (line 207) | def associated_file_relationship(self) -> str:
method associated_file_relationship (line 212) | def associated_file_relationship(self, value: NameObject) -> None:
method _embedded_file (line 217) | def _embedded_file(self) -> StreamObject:
method _params (line 228) | def _params(self) -> DictionaryObject:
method _ensure_params (line 233) | def _ensure_params(self) -> DictionaryObject:
method subtype (line 241) | def subtype(self) -> str | None:
method subtype (line 249) | def subtype(self, value: NameObject | None) -> None:
method content (line 258) | def content(self) -> bytes:
method content (line 263) | def content(self, value: str | bytes) -> None:
method size (line 270) | def size(self) -> int | None:
method size (line 278) | def size(self, value: NumberObject | None) -> None:
method creation_date (line 287) | def creation_date(self) -> datetime.datetime | None:
method creation_date (line 292) | def creation_date(self, value: datetime.datetime | None) -> None:
method modification_date (line 302) | def modification_date(self) -> datetime.datetime | None:
method modification_date (line 307) | def modification_date(self, value: datetime.datetime | None) -> None:
method checksum (line 317) | def checksum(self) -> bytes | None:
method checksum (line 325) | def checksum(self, value: ByteStringObject | None) -> None:
method delete (line 333) | def delete(self) -> None:
method __repr__ (line 350) | def __repr__(self) -> str:
method _load_from_names (line 354) | def _load_from_names(cls, names: ArrayObject) -> Generator[EmbeddedFile]:
method _load (line 373) | def _load(cls, catalog: DictionaryObject) -> Generator[EmbeddedFile]:
FILE: pypdf/generic/_fit.py
class Fit (line 6) | class Fit:
method __init__ (line 7) | def __init__(
method xyz (line 18) | def xyz(
method fit (line 46) | def fit(cls) -> "Fit":
method fit_horizontally (line 59) | def fit_horizontally(cls, top: Optional[float] = None) -> "Fit":
method fit_vertically (line 79) | def fit_vertically(cls, left: Optional[float] = None) -> "Fit":
method fit_rectangle (line 83) | def fit_rectangle(
method fit_box (line 116) | def fit_box(cls) -> "Fit":
method fit_box_horizontally (line 129) | def fit_box_horizontally(cls, top: Optional[float] = None) -> "Fit":
method fit_box_vertically (line 149) | def fit_box_vertically(cls, left: Optional[float] = None) -> "Fit":
method __str__ (line 168) | def __str__(self) -> str:
FILE: pypdf/generic/_image_inline.py
function _check_end_image_marker (line 48) | def _check_end_image_marker(stream: StreamType) -> bool:
function extract_inline__ascii_hex_decode (line 55) | def extract_inline__ascii_hex_decode(stream: StreamType) -> bytes:
function extract_inline__ascii85_decode (line 93) | def extract_inline__ascii85_decode(stream: StreamType) -> bytes:
function extract_inline__run_length_decode (line 122) | def extract_inline__run_length_decode(stream: StreamType) -> bytes:
function extract_inline__dct_decode (line 161) | def extract_inline__dct_decode(stream: StreamType) -> bytes:
function extract_inline_default (line 208) | def extract_inline_default(stream: StreamType) -> bytes:
function is_followed_by_binary_data (line 260) | def is_followed_by_binary_data(stream: IO[bytes], length: int = 10) -> b...
FILE: pypdf/generic/_image_xobject.py
function _get_image_mode (line 42) | def _get_image_mode(
function bits2byte (line 124) | def bits2byte(data: bytes, size: tuple[int, int], bits: int) -> bytes:
function _extended_image_from_bytes (line 142) | def _extended_image_from_bytes(
function __handle_flate__indexed (line 162) | def __handle_flate__indexed(color_space: ArrayObject) -> tuple[Any, Any,...
function _handle_flate (line 180) | def _handle_flate(
function _handle_jpx (line 295) | def _handle_jpx(
function _apply_decode (line 332) | def _apply_decode(
function _get_mode_and_invert_color (line 375) | def _get_mode_and_invert_color(
function _xobj_to_image (line 405) | def _xobj_to_image(
FILE: pypdf/generic/_link.py
class NamedReferenceLink (line 42) | class NamedReferenceLink:
method __init__ (line 45) | def __init__(self, reference: TextStringObject, source_pdf: "PdfReader...
method find_referenced_page (line 50) | def find_referenced_page(self) -> Union[IndirectObject, None]:
method patch_reference (line 54) | def patch_reference(self, target_pdf: "PdfWriter", new_page: IndirectO...
class DirectReferenceLink (line 61) | class DirectReferenceLink:
method __init__ (line 64) | def __init__(self, reference: ArrayObject) -> None:
method find_referenced_page (line 68) | def find_referenced_page(self) -> IndirectObject:
method patch_reference (line 71) | def patch_reference(self, target_pdf: "PdfWriter", new_page: IndirectO...
function extract_links (line 79) | def extract_links(new_page: "PageObject", old_page: "PageObject") -> lis...
function _build_link (line 109) | def _build_link(indirect_object: IndirectObject, page: "PageObject") -> ...
function _create_link (line 130) | def _create_link(reference: PdfObject, source_pdf: "PdfReader") -> Optio...
FILE: pypdf/generic/_outline.py
class OutlineItem (line 8) | class OutlineItem(Destination):
method write_to_stream (line 9) | def write_to_stream(
FILE: pypdf/generic/_rectangle.py
class RectangleObject (line 7) | class RectangleObject(ArrayObject):
method __init__ (line 20) | def __init__(
method _ensure_is_number (line 28) | def _ensure_is_number(self, value: Any) -> Union[FloatObject, NumberOb...
method scale (line 33) | def scale(self, sx: float, sy: float) -> "RectangleObject":
method __repr__ (line 43) | def __repr__(self) -> str:
method left (line 47) | def left(self) -> FloatObject:
method left (line 51) | def left(self, f: float) -> None:
method bottom (line 55) | def bottom(self) -> FloatObject:
method bottom (line 59) | def bottom(self, f: float) -> None:
method right (line 63) | def right(self) -> FloatObject:
method right (line 67) | def right(self, f: float) -> None:
method top (line 71) | def top(self) -> FloatObject:
method top (line 75) | def top(self, f: float) -> None:
method lower_left (line 79) | def lower_left(self) -> tuple[float, float]:
method lower_left (line 87) | def lower_left(self, value: tuple[float, float]) -> None:
method lower_right (line 91) | def lower_right(self) -> tuple[float, float]:
method lower_right (line 99) | def lower_right(self, value: tuple[float, float]) -> None:
method upper_left (line 103) | def upper_left(self) -> tuple[float, float]:
method upper_left (line 111) | def upper_left(self, value: tuple[float, float]) -> None:
method upper_right (line 115) | def upper_right(self) -> tuple[float, float]:
method upper_right (line 123) | def upper_right(self, value: tuple[float, float]) -> None:
method width (line 127) | def width(self) -> float:
method height (line 131) | def height(self) -> float:
FILE: pypdf/generic/_utils.py
function hex_to_rgb (line 10) | def hex_to_rgb(value: str) -> tuple[float, float, float]:
function read_hex_string_from_stream (line 14) | def read_hex_string_from_stream(
function read_string_from_stream (line 62) | def read_string_from_stream(
function create_string_object (line 123) | def create_string_object(
function decode_pdfdocencoding (line 195) | def decode_pdfdocencoding(byte_array: bytes) -> str:
FILE: pypdf/generic/_viewerpref.py
class ViewerPreferences (line 40) | class ViewerPreferences(DictionaryObject):
method __init__ (line 41) | def __init__(self, obj: Optional[DictionaryObject] = None) -> None:
method _get_bool (line 50) | def _get_bool(self, key: str, default: Optional[BooleanObject]) -> Opt...
method _set_bool (line 53) | def _set_bool(self, key: str, v: bool) -> None:
method _get_name (line 56) | def _get_name(self, key: str, default: Optional[NameObject]) -> Option...
method _set_name (line 59) | def _set_name(self, key: str, lst: list[str], v: NameObject) -> None:
method _get_arr (line 66) | def _get_arr(self, key: str, default: Optional[list[Any]]) -> Optional...
method _set_arr (line 69) | def _set_arr(self, key: str, v: Optional[ArrayObject]) -> None:
method _get_int (line 80) | def _get_int(self, key: str, default: Optional[NumberObject]) -> Optio...
method _set_int (line 83) | def _set_int(self, key: str, v: int) -> None:
method PRINT_SCALING (line 87) | def PRINT_SCALING(self) -> NameObject:
method __new__ (line 90) | def __new__(cls: Any, value: Any = None) -> "ViewerPreferences": # no...
FILE: pypdf/pagerange.py
class PageRange (line 19) | class PageRange:
method __init__ (line 36) | def __init__(self, arg: Union[slice, "PageRange", str]) -> None:
method valid (line 80) | def valid(input: Any) -> bool:
method to_slice (line 95) | def to_slice(self) -> slice:
method __str__ (line 99) | def __str__(self) -> str:
method __repr__ (line 112) | def __repr__(self) -> str:
method indices (line 116) | def indices(self, n: int) -> tuple[int, int, int]:
method __eq__ (line 132) | def __eq__(self, other: object) -> bool:
method __hash__ (line 137) | def __hash__(self) -> int:
method __add__ (line 140) | def __add__(self, other: "PageRange") -> "PageRange":
function parse_filename_page_ranges (line 161) | def parse_filename_page_ranges(
FILE: pypdf/papersizes.py
class Dimensions (line 6) | class Dimensions(NamedTuple):
class PaperSize (line 11) | class PaperSize:
FILE: pypdf/xmp.py
function _identity (line 105) | def _identity(value: K) -> K:
function _converter_date (line 109) | def _converter_date(value: str) -> datetime.datetime:
function _format_datetime_utc (line 136) | def _format_datetime_utc(value: datetime.datetime) -> str:
function _generic_get (line 149) | def _generic_get(
class XmpInformation (line 164) | class XmpInformation(XmpInformationProtocol, PdfObject):
method __init__ (line 174) | def __init__(self, stream: ContentStream) -> None:
method create (line 187) | def create(cls) -> "XmpInformation":
method write_to_stream (line 198) | def write_to_stream(
method get_element (line 212) | def get_element(self, about_uri: str, namespace: str, name: str) -> It...
method get_nodes_in_namespace (line 220) | def get_nodes_in_namespace(self, about_uri: str, namespace: str) -> It...
method _get_text (line 231) | def _get_text(self, element: XmlElement) -> str:
method _get_single_value (line 238) | def _get_single_value(
method _getter_bag (line 260) | def _getter_bag(self, namespace: str, name: str) -> Optional[list[str]]:
method _get_seq_values (line 275) | def _get_seq_values(
method _get_langalt_values (line 304) | def _get_langalt_values(self, namespace: str, name: str) -> Optional[d...
method dc_contributor (line 323) | def dc_contributor(self) -> Optional[list[str]]:
method dc_contributor (line 328) | def dc_contributor(self, values: Optional[list[str]]) -> None:
method dc_coverage (line 332) | def dc_coverage(self) -> Optional[str]:
method dc_coverage (line 337) | def dc_coverage(self, value: Optional[str]) -> None:
method dc_creator (line 341) | def dc_creator(self) -> Optional[list[str]]:
method dc_creator (line 346) | def dc_creator(self, values: Optional[list[str]]) -> None:
method dc_date (line 350) | def dc_date(self) -> Optional[list[datetime.datetime]]:
method dc_date (line 355) | def dc_date(self, values: Optional[list[Union[str, datetime.datetime]]...
method dc_description (line 368) | def dc_description(self) -> Optional[dict[str, str]]:
method dc_description (line 373) | def dc_description(self, values: Optional[dict[str, str]]) -> None:
method dc_format (line 377) | def dc_format(self) -> Optional[str]:
method dc_format (line 382) | def dc_format(self, value: Optional[str]) -> None:
method dc_identifier (line 386) | def dc_identifier(self) -> Optional[str]:
method dc_identifier (line 391) | def dc_identifier(self, value: Optional[str]) -> None:
method dc_language (line 395) | def dc_language(self) -> Optional[list[str]]:
method dc_language (line 400) | def dc_language(self, values: Optional[list[str]]) -> None:
method dc_publisher (line 404) | def dc_publisher(self) -> Optional[list[str]]:
method dc_publisher (line 409) | def dc_publisher(self, values: Optional[list[str]]) -> None:
method dc_relation (line 413) | def dc_relation(self) -> Optional[list[str]]:
method dc_relation (line 418) | def dc_relation(self, values: Optional[list[str]]) -> None:
method dc_rights (line 422) | def dc_rights(self) -> Optional[dict[str, str]]:
method dc_rights (line 427) | def dc_rights(self, values: Optional[dict[str, str]]) -> None:
method dc_source (line 431) | def dc_source(self) -> Optional[str]:
method dc_source (line 436) | def dc_source(self, value: Optional[str]) -> None:
method dc_subject (line 440) | def dc_subject(self) -> Optional[list[str]]:
method dc_subject (line 445) | def dc_subject(self, values: Optional[list[str]]) -> None:
method dc_title (line 449) | def dc_title(self) -> Optional[dict[str, str]]:
method dc_title (line 454) | def dc_title(self, values: Optional[dict[str, str]]) -> None:
method dc_type (line 458) | def dc_type(self) -> Optional[list[str]]:
method dc_type (line 463) | def dc_type(self, values: Optional[list[str]]) -> None:
method pdf_keywords (line 467) | def pdf_keywords(self) -> Optional[str]:
method pdf_keywords (line 472) | def pdf_keywords(self, value: Optional[str]) -> None:
method pdf_pdfversion (line 476) | def pdf_pdfversion(self) -> Optional[str]:
method pdf_pdfversion (line 481) | def pdf_pdfversion(self, value: Optional[str]) -> None:
method pdf_producer (line 485) | def pdf_producer(self) -> Optional[str]:
method pdf_producer (line 490) | def pdf_producer(self, value: Optional[str]) -> None:
method xmp_create_date (line 494) | def xmp_create_date(self) -> Optional[datetime.datetime]:
method xmp_create_date (line 499) | def xmp_create_date(self, value: Optional[datetime.datetime]) -> None:
method xmp_modify_date (line 507) | def xmp_modify_date(self) -> Optional[datetime.datetime]:
method xmp_modify_date (line 512) | def xmp_modify_date(self, value: Optional[datetime.datetime]) -> None:
method xmp_metadata_date (line 520) | def xmp_metadata_date(self) -> Optional[datetime.datetime]:
method xmp_metadata_date (line 525) | def xmp_metadata_date(self, value: Optional[datetime.datetime]) -> None:
method xmp_creator_tool (line 533) | def xmp_creator_tool(self) -> Optional[str]:
method xmp_creator_tool (line 538) | def xmp_creator_tool(self, value: Optional[str]) -> None:
method xmpmm_document_id (line 542) | def xmpmm_document_id(self) -> Optional[str]:
method xmpmm_document_id (line 547) | def xmpmm_document_id(self, value: Optional[str]) -> None:
method xmpmm_instance_id (line 551) | def xmpmm_instance_id(self) -> Optional[str]:
method xmpmm_instance_id (line 556) | def xmpmm_instance_id(self, value: Optional[str]) -> None:
method pdfaid_part (line 560) | def pdfaid_part(self) -> Optional[str]:
method pdfaid_part (line 565) | def pdfaid_part(self, value: Optional[str]) -> None:
method pdfaid_conformance (line 569) | def pdfaid_conformance(self) -> Optional[str]:
method pdfaid_conformance (line 574) | def pdfaid_conformance(self, value: Optional[str]) -> None:
method custom_properties (line 578) | def custom_properties(self) -> dict[Any, Any]:
method _get_or_create_description (line 608) | def _get_or_create_description(self, about_uri: str = "") -> XmlElement:
method _clear_cache_entry (line 622) | def _clear_cache_entry(self, namespace: str, name: str) -> None:
method _set_single_value (line 628) | def _set_single_value(self, namespace: str, name: str, value: Optional...
method _set_bag_values (line 652) | def _set_bag_values(self, namespace: str, name: str, values: Optional[...
method _set_seq_values (line 680) | def _set_seq_values(self, namespace: str, name: str, values: Optional[...
method _set_langalt_values (line 708) | def _set_langalt_values(self, namespace: str, name: str, values: Optio...
method _get_namespace_prefix (line 737) | def _get_namespace_prefix(self, namespace: str) -> str:
method _update_stream (line 741) | def _update_stream(self) -> None:
FILE: resources/afm_to_dataclass.py
class Parser (line 18) | class Parser:
method __init__ (line 19) | def __init__(self) -> None:
method get_fonts (line 23) | def get_fonts(self) -> None:
method get_disclaimer (line 37) | def get_disclaimer(self, width: int = 95) -> str:
method _handle_font (line 50) | def _handle_font(self, file_name: str, font_data: str) -> list[str]: ...
method get_font_data (line 170) | def get_font_data(self) -> str:
FILE: tests/__init__.py
function _get_data_from_url (line 23) | def _get_data_from_url(url: str) -> bytes:
function get_data_from_url (line 41) | def get_data_from_url(url: Optional[str] = None, name: Optional[str] = N...
function _strip_position (line 77) | def _strip_position(line: str) -> str:
function normalize_warnings (line 98) | def normalize_warnings(caplog_text: str) -> list[str]:
function is_sublist (line 102) | def is_sublist(child_list, parent_list):
function read_yaml_to_list_of_dicts (line 119) | def read_yaml_to_list_of_dicts(yaml_file: Path) -> list[dict[str, str]]:
function download_test_pdfs (line 124) | def download_test_pdfs():
class PILContext (line 140) | class PILContext:
method __init__ (line 143) | def __init__(self) -> None:
method __enter__ (line 146) | def __enter__(self) -> Self:
method __exit__ (line 153) | def __exit__(self, type_, value, traceback) -> Optional[bool]:
FILE: tests/bench.py
function page_ops (line 19) | def page_ops(pdf_path, password):
function test_page_operations (line 53) | def test_page_operations(benchmark):
function merge (line 62) | def merge():
function test_merge (line 119) | def test_merge(benchmark):
function text_extraction (line 128) | def text_extraction(pdf_path):
function test_text_extraction (line 137) | def test_text_extraction(benchmark):
function read_string_from_stream_performance (line 142) | def read_string_from_stream_performance():
function test_read_string_from_stream_performance (line 147) | def test_read_string_from_stream_performance(benchmark):
function image_new_property (line 157) | def image_new_property(data):
function test_image_new_property_performance (line 217) | def test_image_new_property_performance(benchmark):
function image_extraction (line 225) | def image_extraction(data):
function test_large_compressed_image_performance (line 231) | def test_large_compressed_image_performance(benchmark):
FILE: tests/conftest.py
function pdf_file_path (line 9) | def pdf_file_path(tmp_path_factory):
function txt_file_path (line 14) | def txt_file_path(tmp_path_factory):
FILE: tests/generic/test_base.py
function test_text_string_object__looks_like_bom (line 18) | def test_text_string_object__looks_like_bom(source: bytes, expected: str...
function test_text_string_object__wrongly_detected_bom (line 25) | def test_text_string_object__wrongly_detected_bom():
FILE: tests/generic/test_data_structures.py
function test_dictionary_object__get_next_object_position (line 31) | def test_dictionary_object__get_next_object_position():
function test_tree_object__cyclic_reference (line 52) | def test_tree_object__cyclic_reference(caplog):
function test_array_object__clone_same_object_multiple_times (line 67) | def test_array_object__clone_same_object_multiple_times(caplog):
function test_array_object__clone_same_stream_multiple_times (line 79) | def test_array_object__clone_same_stream_multiple_times():
function test_dictionary_object__read_from_stream__limit (line 124) | def test_dictionary_object__read_from_stream__limit():
function _prepare_test_dictionary_object__read_from_stream__no_limit (line 138) | def _prepare_test_dictionary_object__read_from_stream__no_limit(
function test_dictionary_object__read_from_stream__no_limit (line 167) | def test_dictionary_object__read_from_stream__no_limit(tmp_path):
function test_dictionary_object__read_from_stream__no_limit__path (line 199) | def test_dictionary_object__read_from_stream__no_limit__path(tmp_path):
function _get_array_based_buffer (line 227) | def _get_array_based_buffer(stream_count: int, chunk_bytes: int) -> Byte...
function test_content_stream__array_based__performance (line 244) | def test_content_stream__array_based__performance():
function test_content_stream__array_based__length (line 250) | def test_content_stream__array_based__length():
function test_content_stream__array_based__output_length (line 260) | def test_content_stream__array_based__output_length():
FILE: tests/generic/test_files.py
function test_embedded_file__basic (line 30) | def test_embedded_file__basic(tmpdir):
function test_embedded_file__artificial (line 52) | def test_embedded_file__artificial():
function test_embedded_file__kids (line 83) | def test_embedded_file__kids():
function test_embedded_file__ensure_params__existing_params (line 122) | def test_embedded_file__ensure_params__existing_params():
function test_embedded_file__name_is_read_only (line 148) | def test_embedded_file__name_is_read_only():
function test_embedded_file__alternative_name_setter (line 158) | def test_embedded_file__alternative_name_setter():
function test_embedded_file__alternative_name__uf_key_only (line 177) | def test_embedded_file__alternative_name__uf_key_only():
function test_embedded_file__alternative_name__f_key_only (line 196) | def test_embedded_file__alternative_name__f_key_only():
function test_embedded_file__alternative_name__both_f_and_uf (line 216) | def test_embedded_file__alternative_name__both_f_and_uf():
function test_embedded_file__description_setter (line 234) | def test_embedded_file__description_setter():
function test_embedded_file__subtype_setter (line 249) | def test_embedded_file__subtype_setter():
function test_embedded_file__content_setter (line 264) | def test_embedded_file__content_setter():
function test_embedded_file__size_setter (line 276) | def test_embedded_file__size_setter():
function test_embedded_file__size_getter (line 291) | def test_embedded_file__size_getter():
function test_embedded_file__creation_date_setter (line 303) | def test_embedded_file__creation_date_setter():
function test_embedded_file__modification_date_setter (line 315) | def test_embedded_file__modification_date_setter():
function test_embedded_file__checksum_setter (line 327) | def test_embedded_file__checksum_setter():
function test_embedded_file__associated_file_relationship_setter (line 343) | def test_embedded_file__associated_file_relationship_setter():
function test_embedded_file__setters_integration (line 351) | def test_embedded_file__setters_integration():
function test_embedded_file__null_object_handling (line 377) | def test_embedded_file__null_object_handling():
function test_embedded_file__delete_without_parent (line 400) | def test_embedded_file__delete_without_parent():
function test_embedded_file__delete_known (line 406) | def test_embedded_file__delete_known():
function test_embedded_file__delete__no_indirect_reference (line 427) | def test_embedded_file__delete__no_indirect_reference():
function test_embedded_file__create__kids_based_name_tree (line 444) | def test_embedded_file__create__kids_based_name_tree():
function test_embedded_file__create__neither_kids_nor_names (line 479) | def test_embedded_file__create__neither_kids_nor_names():
function test_embedded_file__get_insertion_index (line 491) | def test_embedded_file__get_insertion_index():
function test_embedded_file__order (line 552) | def test_embedded_file__order():
FILE: tests/generic/test_image_inline.py
function test_is_followed_by_binary_data (line 12) | def test_is_followed_by_binary_data():
function test_extract_inline_dct__early_end_of_file (line 70) | def test_extract_inline_dct__early_end_of_file():
function test_extract_inline_dct__multiple_eod (line 81) | def test_extract_inline_dct__multiple_eod():
FILE: tests/generic/test_image_xobject.py
function test_get_imagemode_recursion_depth (line 19) | def test_get_imagemode_recursion_depth():
function test_handle_flate__image_mode_1 (line 36) | def test_handle_flate__image_mode_1(caplog):
function test_extended_image_frombytes_zero_data (line 126) | def test_extended_image_frombytes_zero_data():
function test_handle_flate__autodesk_indexed (line 135) | def test_handle_flate__autodesk_indexed():
function test_get_mode_and_invert_color (line 155) | def test_get_mode_and_invert_color():
function test_get_imagemode__empty_array (line 165) | def test_get_imagemode__empty_array():
function test_p_image_with_alpha_mask (line 175) | def test_p_image_with_alpha_mask():
function test_handle_flate__icc_based__image_mode_1 (line 216) | def test_handle_flate__icc_based__image_mode_1():
function test_handle_jpx__explicit_decode (line 241) | def test_handle_jpx__explicit_decode():
FILE: tests/generic/test_link.py
function test_extract_links__null_object_in_old_page (line 12) | def test_extract_links__null_object_in_old_page():
function test_extract_links (line 21) | def test_extract_links(caplog):
FILE: tests/scripts/test_example_files.py
function test_consistency (line 8) | def test_consistency():
FILE: tests/scripts/test_make_release.py
function test_strip_header (line 35) | def test_strip_header(data, expected):
function test_get_git_commits_since_tag (line 41) | def test_get_git_commits_since_tag():
function test_get_formatted_changes (line 87) | def test_get_formatted_changes():
function test_get_formatted_changes__other (line 131) | def test_get_formatted_changes__other():
FILE: tests/test_annotations.py
function test_ellipse (line 28) | def test_ellipse(pdf_file_path):
function test_text (line 48) | def test_text(pdf_file_path):
function test_free_text (line 69) | def test_free_text(pdf_file_path):
function test_free_text__font_specifier (line 109) | def test_free_text__font_specifier():
function test_annotation_dictionary (line 129) | def test_annotation_dictionary():
function test_polygon (line 135) | def test_polygon(pdf_file_path):
function test_polyline (line 158) | def test_polyline(pdf_file_path):
function test_line (line 184) | def test_line(pdf_file_path):
function test_rectangle (line 206) | def test_rectangle(pdf_file_path):
function test_highlight (line 228) | def test_highlight(pdf_file_path):
function test_link (line 292) | def test_link(pdf_file_path):
function test_popup (line 339) | def test_popup(caplog):
function test_markup_annotation_in_reply_to (line 374) | def test_markup_annotation_in_reply_to():
function test_markup_annotation_in_reply_to_group_type (line 411) | def test_markup_annotation_in_reply_to_group_type():
function test_markup_annotation_name_without_reply (line 434) | def test_markup_annotation_name_without_reply():
function test_markup_annotation_reply_type_without_reply (line 444) | def test_markup_annotation_reply_type_without_reply():
function test_markup_annotation_in_reply_to_custom_name (line 454) | def test_markup_annotation_in_reply_to_custom_name():
function test_markup_annotation_in_reply_to_unregistered (line 474) | def test_markup_annotation_in_reply_to_unregistered():
function test_markup_annotation_in_reply_to_indirect_object (line 485) | def test_markup_annotation_in_reply_to_indirect_object():
function test_outline_action_without_d_lenient (line 517) | def test_outline_action_without_d_lenient():
function test_outline_action_without_d_strict (line 523) | def test_outline_action_without_d_strict(pdf_file_path):
FILE: tests/test_appearance_stream.py
function test_comb (line 6) | def test_comb():
function test_scale_text (line 38) | def test_scale_text():
FILE: tests/test_cmap.py
function test_text_extraction_slow (line 46) | def test_text_extraction_slow(caplog, url: str, name: str, strict: bool):
function test_text_extraction_fast (line 82) | def test_text_extraction_fast(caplog, url: str, name: str, strict: bool):
function test_parse_encoding_advanced_encoding_not_implemented (line 91) | def test_parse_encoding_advanced_encoding_not_implemented(caplog):
function test_ascii_charset (line 100) | def test_ascii_charset():
function test_text_extraction_of_specific_pages (line 125) | def test_text_extraction_of_specific_pages(
function test_iss1533 (line 133) | def test_iss1533():
function test_cmap_encodings (line 160) | def test_cmap_encodings(caplog, url, name, page_index, within_text, capl...
function test_latex (line 169) | def test_latex():
function test_unixxx_glyphs (line 178) | def test_unixxx_glyphs():
function test_cmap_compute_space_width (line 186) | def test_cmap_compute_space_width():
function test_tabs_in_cmap (line 197) | def test_tabs_in_cmap():
function test_ignoring_non_put_entries (line 204) | def test_ignoring_non_put_entries():
function test_eten_b5 (line 211) | def test_eten_b5():
function test_missing_entries_in_cmap (line 217) | def test_missing_entries_in_cmap():
function test_null_missing_width (line 232) | def test_null_missing_width():
function test_unigb_utf16 (line 243) | def test_unigb_utf16():
function test_too_many_differences (line 254) | def test_too_many_differences():
function test_iss2925 (line 265) | def test_iss2925():
function test_iss2966 (line 275) | def test_iss2966():
function test_binascii_odd_length_string (line 286) | def test_binascii_odd_length_string(caplog):
function test_standard_encoding (line 298) | def test_standard_encoding(caplog):
function test_function_in_font_widths (line 310) | def test_function_in_font_widths(caplog):
function test_get_encoding__encoding_value_is_none (line 321) | def test_get_encoding__encoding_value_is_none():
function test_parse_bfchar (line 330) | def test_parse_bfchar(caplog):
function test_parse_bfrange__iteration_limit (line 341) | def test_parse_bfrange__iteration_limit():
function test_parse_bfchar__iteration_limit (line 414) | def test_parse_bfchar__iteration_limit():
FILE: tests/test_codecs.py
function test_encode_decode (line 28) | def test_encode_decode(data):
function test_encode_lzw (line 45) | def test_encode_lzw(plain, expected_encoded):
function test_decode_lzw (line 60) | def test_decode_lzw(encoded, expected_decoded):
function test_lzw_decoder_table_overflow (line 66) | def test_lzw_decoder_table_overflow(caplog):
function test_lzw_decoder_large_stream_performance (line 78) | def test_lzw_decoder_large_stream_performance(caplog):
function test_lzw_decoder__output_limit (line 83) | def test_lzw_decoder__output_limit():
FILE: tests/test_constants.py
function test_slash_prefix (line 10) | def test_slash_prefix():
function test_user_access_permissions__dict_handling (line 42) | def test_user_access_permissions__dict_handling():
function test_user_access_permissions__all (line 105) | def test_user_access_permissions__all():
FILE: tests/test_doc_common.py
function test_attachments (line 32) | def test_attachments(tmpdir):
function test_get_attachments__same_attachment_more_than_twice (line 90) | def test_get_attachments__same_attachment_more_than_twice():
function test_get_attachments__alternative_name_is_none (line 107) | def test_get_attachments__alternative_name_is_none():
function test_byte_encoded_named_destinations (line 122) | def test_byte_encoded_named_destinations():
function test_viewer_preferences__indirect_reference (line 181) | def test_viewer_preferences__indirect_reference():
function test_named_destinations__tree_is_null_object (line 194) | def test_named_destinations__tree_is_null_object():
function test_outline__issue3462 (line 203) | def test_outline__issue3462():
function test_flatten__cyclic_references (line 462) | def test_flatten__cyclic_references():
function test_get_outline__cyclic_references (line 480) | def test_get_outline__cyclic_references(caplog):
function test_get_outline__cyclic_references__nested_handling (line 504) | def test_get_outline__cyclic_references__nested_handling(caplog):
function test_xfa__decompression_limit (line 559) | def test_xfa__decompression_limit():
FILE: tests/test_encryption.py
function test_encryption (line 69) | def test_encryption(name, requires_aes):
function test_pdf_with_both_passwords (line 116) | def test_pdf_with_both_passwords(name, user_passwd, owner_passwd):
function test_aesv2_without_length_in_encrypt_dict (line 135) | def test_aesv2_without_length_in_encrypt_dict():
function test_read_page_from_encrypted_file_aes_256 (line 159) | def test_read_page_from_encrypted_file_aes_256(pdffile, password):
function test_merge_encrypted_pdfs (line 185) | def test_merge_encrypted_pdfs(names):
function test_encrypt_decrypt_with_cipher_class (line 208) | def test_encrypt_decrypt_with_cipher_class(cryptcls):
function test_attempt_decrypt_unencrypted_pdf (line 216) | def test_attempt_decrypt_unencrypted_pdf():
function test_alg_v5_generate_values (line 225) | def test_alg_v5_generate_values():
function test_pdf_encrypt (line 261) | def test_pdf_encrypt(pdf_file_path, alg, requires_aes):
function test_pdf_encrypt_multiple (line 315) | def test_pdf_encrypt_multiple(pdf_file_path, count):
function test_aes_decrypt_corrupted_data (line 353) | def test_aes_decrypt_corrupted_data():
function test_encrypt_stream_dictionary (line 361) | def test_encrypt_stream_dictionary(pdf_file_path):
function test_are_permissions_valid_none_for_unencrypted (line 387) | def test_are_permissions_valid_none_for_unencrypted():
function test_are_permissions_valid_none_before_decrypt (line 394) | def test_are_permissions_valid_none_before_decrypt():
function test_are_permissions_valid_true_for_valid_r6 (line 401) | def test_are_permissions_valid_true_for_valid_r6():
function test_are_permissions_valid_true_for_v4 (line 408) | def test_are_permissions_valid_true_for_v4():
function test_are_permissions_valid_false_when_tampered (line 420) | def test_are_permissions_valid_false_when_tampered():
FILE: tests/test_filters.py
function test_flate_decode_encode (line 64) | def test_flate_decode_encode(predictor, s):
function test_flatedecode_unsupported_predictor (line 72) | def test_flatedecode_unsupported_predictor():
function test_ascii_hex_decode_method (line 127) | def test_ascii_hex_decode_method(data, expected):
function test_ascii_hex_decode_missing_eod (line 135) | def test_ascii_hex_decode_missing_eod(caplog):
function test_decode_ahx (line 142) | def test_decode_ahx():
function test_ascii85decode_with_overflow (line 152) | def test_ascii85decode_with_overflow():
function test_ascii85decode_five_zero_bytes (line 169) | def test_ascii85decode_five_zero_bytes():
function test_ccitparameters (line 192) | def test_ccitparameters():
function test_ccittparameters (line 200) | def test_ccittparameters():
function test_ccitt_get_parameters (line 217) | def test_ccitt_get_parameters(parameters, expected_k, expected_black_is_1):
function test_ccitt_get_parameters__indirect_object (line 223) | def test_ccitt_get_parameters__indirect_object():
function test_ccitt_fax_decode (line 234) | def test_ccitt_fax_decode():
function test_decompress_zlib_error (line 253) | def test_decompress_zlib_error(caplog):
function test_lzw_decode_neg1 (line 261) | def test_lzw_decode_neg1():
function test_issue_399 (line 268) | def test_issue_399():
function test_image_without_pillow (line 274) | def test_image_without_pillow(tmp_path):
function test_issue_1737 (line 322) | def test_issue_1737():
function test_pa_image_extraction (line 330) | def test_pa_image_extraction():
function test_1bit_image_extraction (line 349) | def test_1bit_image_extraction():
function test_png_transparency_reverse (line 357) | def test_png_transparency_reverse():
function test_iss1787 (line 371) | def test_iss1787():
function test_tiff_predictor (line 387) | def test_tiff_predictor():
function test_rgba (line 398) | def test_rgba():
function test_cmyk (line 412) | def test_cmyk():
function test_iss1863 (line 431) | def test_iss1863():
function test_read_images (line 440) | def test_read_images():
function test_cascaded_filters_images (line 448) | def test_cascaded_filters_images():
function test_calrgb (line 457) | def test_calrgb():
function test_index_lookup (line 463) | def test_index_lookup():
function test_2bits_image (line 486) | def test_2bits_image():
function test_gray_devicen_cmyk (line 497) | def test_gray_devicen_cmyk():
function test_runlengthdecode (line 514) | def test_runlengthdecode():
function test_gray_separation_cmyk (line 534) | def test_gray_separation_cmyk():
function test_singleton_device (line 551) | def test_singleton_device():
function test_jpx_no_spacecode (line 560) | def test_jpx_no_spacecode():
function test_encodedstream_lookup (line 575) | def test_encodedstream_lookup():
function test_convert_1_to_la (line 584) | def test_convert_1_to_la():
function test_nested_device_n_color_space (line 594) | def test_nested_device_n_color_space():
function test_flate_decode_with_image_mode_1 (line 604) | def test_flate_decode_with_image_mode_1():
function test_flate_decode_with_image_mode_1__whitespace_at_end_of_lookup (line 614) | def test_flate_decode_with_image_mode_1__whitespace_at_end_of_lookup():
function test_ascii85decode__invalid_end__recoverable (line 623) | def test_ascii85decode__invalid_end__recoverable(caplog):
function test_ascii85decode__non_recoverable (line 634) | def test_ascii85decode__non_recoverable(caplog):
function test_ascii85decode__ignore_whitespaces (line 648) | def test_ascii85decode__ignore_whitespaces(caplog):
function test_ccitt_fax_decode__black_is_1 (line 656) | def test_ccitt_fax_decode__black_is_1():
function test_flate_decode__image_is_none_due_to_size_limit (line 678) | def test_flate_decode__image_is_none_due_to_size_limit(caplog):
function test_flate_decode__not_rectangular (line 697) | def test_flate_decode__not_rectangular(caplog):
function test_jbig2decode__binary_errors (line 714) | def test_jbig2decode__binary_errors():
function test_jbig2decode__edge_cases (line 747) | def test_jbig2decode__edge_cases(caplog):
function test_flate_decode_stream_with_faulty_tail_bytes (line 812) | def test_flate_decode_stream_with_faulty_tail_bytes():
function test_rle_decode_with_faulty_tail_byte_in_multi_encoded_stream (line 835) | def test_rle_decode_with_faulty_tail_byte_in_multi_encoded_stream(caplog):
function test_rle_decode_exception_with_corrupted_stream (line 854) | def test_rle_decode_exception_with_corrupted_stream(caplog):
function test_decompress (line 872) | def test_decompress():
function test_decompress__logging_on_invalid_data (line 901) | def test_decompress__logging_on_invalid_data(caplog):
function test_ccittfaxdecode__ccf_inline (line 910) | def test_ccittfaxdecode__ccf_inline():
function test_dctdecode__dct_inline (line 931) | def test_dctdecode__dct_inline():
function test_deprecate_inline_image_filters (line 986) | def test_deprecate_inline_image_filters():
function test_flatedecode__columns_is_zero (line 1019) | def test_flatedecode__columns_is_zero():
function test_runlengthdecode__decode_limit (line 1031) | def test_runlengthdecode__decode_limit():
function test_asciihexdecode__speed (line 1049) | def test_asciihexdecode__speed():
FILE: tests/test_font.py
function test_font_descriptor (line 7) | def test_font_descriptor():
FILE: tests/test_forms.py
function test_form_button__v_value_should_be_name_object (line 12) | def test_form_button__v_value_should_be_name_object():
FILE: tests/test_generic.py
class ChildDummy (line 53) | class ChildDummy(DictionaryObject):
method indirect_reference (line 55) | def indirect_reference(self):
function test_float_object_exception (line 59) | def test_float_object_exception(caplog):
function test_number_object_exception (line 64) | def test_number_object_exception(caplog):
function test_number_object_no_exception (line 69) | def test_number_object_no_exception():
function test_create_string_object_exception (line 73) | def test_create_string_object_exception():
function test_boolean_object (line 87) | def test_boolean_object(value, expected, tell):
function test_boolean_object_write (line 95) | def test_boolean_object_write():
function test_boolean_eq (line 103) | def test_boolean_eq():
function test_boolean_object_exception (line 118) | def test_boolean_object_exception():
function test_array_object_exception (line 125) | def test_array_object_exception():
function test_null_object_exception (line 132) | def test_null_object_exception():
function test_indirect_object_premature (line 140) | def test_indirect_object_premature(value):
function test_read_hex_string_from_stream (line 147) | def test_read_hex_string_from_stream():
function test_read_hex_string_from_stream_exception (line 152) | def test_read_hex_string_from_stream_exception():
function test_read_string_from_stream_exception (line 159) | def test_read_string_from_stream_exception():
function test_read_string_from_stream_not_in_escapedict_no_digit (line 166) | def test_read_string_from_stream_not_in_escapedict_no_digit():
function test_read_string_from_stream_multichar_eol (line 173) | def test_read_string_from_stream_multichar_eol():
function test_read_string_from_stream_multichar_eol2 (line 178) | def test_read_string_from_stream_multichar_eol2():
function test_read_string_from_stream_excape_digit (line 183) | def test_read_string_from_stream_excape_digit():
function test_read_string_from_stream_excape_digit2 (line 188) | def test_read_string_from_stream_excape_digit2():
function test_name_object (line 193) | def test_name_object(caplog):
function test_destination_fit_r (line 272) | def test_destination_fit_r():
function test_destination_fit_v (line 287) | def test_destination_fit_v():
function test_outline_item_write_to_stream (line 297) | def test_outline_item_write_to_stream():
function test_encode_pdfdocencoding_keyerror (line 305) | def test_encode_pdfdocencoding_keyerror():
function test_encode_pdfdocencoding_returns_bytes (line 312) | def test_encode_pdfdocencoding_returns_bytes(test_input):
function test_read_object_comment_exception (line 321) | def test_read_object_comment_exception():
function test_read_object_empty (line 329) | def test_read_object_empty():
function test_read_object_empty_in_array (line 335) | def test_read_object_empty_in_array():
function test_read_object_invalid (line 344) | def test_read_object_invalid():
function test_read_object_comment (line 352) | def test_read_object_comment():
function test_bytestringobject (line 359) | def test_bytestringobject():
function test_dictionaryobject_key_is_no_pdfobject (line 367) | def test_dictionaryobject_key_is_no_pdfobject():
function test_dictionaryobject_xmp_meta (line 374) | def test_dictionaryobject_xmp_meta():
function test_dictionaryobject_value_is_no_pdfobject (line 379) | def test_dictionaryobject_value_is_no_pdfobject():
function test_dictionaryobject_setdefault_key_is_no_pdfobject (line 386) | def test_dictionaryobject_setdefault_key_is_no_pdfobject():
function test_dictionaryobject_setdefault_value_is_no_pdfobject (line 393) | def test_dictionaryobject_setdefault_value_is_no_pdfobject():
function test_dictionaryobject_setdefault_value (line 400) | def test_dictionaryobject_setdefault_value():
function test_dictionaryobject_read_from_stream (line 405) | def test_dictionaryobject_read_from_stream():
function test_dictionaryobject_read_from_stream_broken (line 412) | def test_dictionaryobject_read_from_stream_broken():
function test_dictionaryobject_read_from_stream_unexpected_end (line 423) | def test_dictionaryobject_read_from_stream_unexpected_end():
function test_dictionaryobject_read_from_stream_stream_no_newline (line 431) | def test_dictionaryobject_read_from_stream_stream_no_newline():
function test_dictionaryobject_read_from_stream_stream_no_stream_length (line 440) | def test_dictionaryobject_read_from_stream_stream_no_stream_length(stric...
function test_dictionaryobject_read_from_stream_stream_stream_valid (line 468) | def test_dictionaryobject_read_from_stream_stream_stream_valid(
function test_rectangleobject (line 488) | def test_rectangleobject():
function test_textstringobject_exc (line 515) | def test_textstringobject_exc():
function test_textstringobject_autodetect_utf16 (line 520) | def test_textstringobject_autodetect_utf16():
function test_textstringobject__numbers_as_input (line 530) | def test_textstringobject__numbers_as_input():
function test_remove_child_not_in_tree (line 535) | def test_remove_child_not_in_tree():
function test_remove_child_not_in_that_tree (line 542) | def test_remove_child_not_in_that_tree():
function test_remove_child_not_found_in_tree (line 556) | def test_remove_child_not_found_in_tree():
function test_remove_child_found_in_tree (line 573) | def test_remove_child_found_in_tree():
function test_remove_child_in_tree (line 639) | def test_remove_child_in_tree():
function test_extract_text (line 693) | def test_extract_text(caplog, url: str, name: str, caplog_content: str):
function test_text_string_write_to_stream (line 705) | def test_text_string_write_to_stream():
function test_bool_repr (line 717) | def test_bool_repr(tmp_path):
function test_issue_997 (line 737) | def test_issue_997(pdf_file_path):
function test_checkboxradiobuttonattributes_opt (line 758) | def test_checkboxradiobuttonattributes_opt():
function test_name_object_invalid_decode (line 762) | def test_name_object_invalid_decode():
function test_indirect_object_invalid_read (line 779) | def test_indirect_object_invalid_read():
function test_create_string_object_utf16_bom (line 786) | def test_create_string_object_utf16_bom():
function test_create_string_object_force (line 829) | def test_create_string_object_force():
function test_float_object_decimal_to_string (line 868) | def test_float_object_decimal_to_string(value, expected):
function test_cloning (line 872) | def test_cloning(caplog):
function test_cloning_indirect_obj_keeps_hard_reference (line 917) | def test_cloning_indirect_obj_keeps_hard_reference():
function test_cloning_null_obj_keeps_hard_reference (line 945) | def test_cloning_null_obj_keeps_hard_reference():
function test_append_with_indirectobject_not_pointing (line 975) | def test_append_with_indirectobject_not_pointing(caplog):
function test_iss1615_1673 (line 990) | def test_iss1615_1673():
function test_destination_withoutzoom (line 1016) | def test_destination_withoutzoom():
function test_encodedstream_set_data (line 1028) | def test_encodedstream_set_data():
function test_set_data_2 (line 1067) | def test_set_data_2():
function test_calling_indirect_objects (line 1083) | def test_calling_indirect_objects():
function test_indirect_object_page_dimensions (line 1103) | def test_indirect_object_page_dimensions():
function test_indirect_object_contains (line 1112) | def test_indirect_object_contains():
function test_indirect_object_iter (line 1119) | def test_indirect_object_iter():
function test_array_operators (line 1126) | def test_array_operators():
function test_unitary_extract_inline_buffer_invalid (line 1155) | def test_unitary_extract_inline_buffer_invalid():
function test_unitary_extract_inline (line 1178) | def test_unitary_extract_inline():
function test_missing_hashbin (line 1244) | def test_missing_hashbin():
function test_is_null_or_none (line 1251) | def test_is_null_or_none():
function test_coverage_arrayobject (line 1266) | def test_coverage_arrayobject():
function test_coverage_streamobject (line 1282) | def test_coverage_streamobject():
function test_contentstream_arrayobject_containing_nullobject (line 1306) | def test_contentstream_arrayobject_containing_nullobject(caplog):
function test_build_link__go_to_action_without_destination (line 1317) | def test_build_link__go_to_action_without_destination():
function test_dictionaryobject__length_0_stream (line 1326) | def test_dictionaryobject__length_0_stream():
FILE: tests/test_images.py
function open_image (line 26) | def open_image(path: Union[Path, Image.Image, BytesIO]) -> Image.Image:
function image_size (line 39) | def image_size(image: Image.Image):
function image_similarity (line 45) | def image_similarity(
function test_image_similarity_one (line 83) | def test_image_similarity_one():
function test_image_similarity_zero (line 90) | def test_image_similarity_zero():
function test_image_similarity_mid (line 97) | def test_image_similarity_mid():
function test_image_new_property (line 116) | def test_image_new_property():
function test_image_extraction (line 220) | def test_image_extraction(src, page_index, image_key, expected):
function test_get_inline_image_without_xobject_resources (line 230) | def test_get_inline_image_without_xobject_resources():
function test_get_inline_image_without_xobject_resources_raises_when_missing (line 238) | def test_get_inline_image_without_xobject_resources_raises_when_missing():
function test_get_xobject_image_without_xobject_resources_raises (line 248) | def test_get_xobject_image_without_xobject_resources_raises():
function test_loop_in_image_keys (line 260) | def test_loop_in_image_keys():
function test_devicen_cmyk_black_only (line 268) | def test_devicen_cmyk_black_only():
function test_bi_in_text (line 284) | def test_bi_in_text():
function test_cmyk_no_filter (line 294) | def test_cmyk_no_filter():
function test_separation_1byte_to_rgb_inverted (line 303) | def test_separation_1byte_to_rgb_inverted():
function test_data_with_lf (line 319) | def test_data_with_lf():
function test_oserror (line 331) | def test_oserror():
function test_corrupted_jpeg_iss2266 (line 361) | def test_corrupted_jpeg_iss2266(pdf, pdf_name, images, images_name, filtr):
function test_large_compressed_image (line 394) | def test_large_compressed_image():
function test_ff_fe_starting_lut (line 403) | def test_ff_fe_starting_lut():
function test_inline_image_extraction (line 419) | def test_inline_image_extraction():
function test_extract_image_from_object (line 488) | def test_extract_image_from_object(caplog):
function test_extract_jpeg_with_explicit_quality (line 508) | def test_extract_jpeg_with_explicit_quality():
function test_4bits_images (line 521) | def test_4bits_images(caplog):
function test_no_filter_with_colorspace_as_list (line 532) | def test_no_filter_with_colorspace_as_list():
function test_contentstream__read_inline_image__fallback_is_successful (line 542) | def test_contentstream__read_inline_image__fallback_is_successful():
function test_inline_image_containing_ei_in_body (line 568) | def test_inline_image_containing_ei_in_body():
function test_jbig2decode (line 589) | def test_jbig2decode():
function test_jbig2decode__jbig2globals (line 609) | def test_jbig2decode__jbig2globals():
function test_jbig2decode__memory_limit (line 630) | def test_jbig2decode__memory_limit():
function test_get_ids_image__resources_is_none (line 655) | def test_get_ids_image__resources_is_none():
FILE: tests/test_javascript.py
function pdf_file_writer (line 11) | def pdf_file_writer():
function test_add_js (line 18) | def test_add_js(pdf_file_writer):
function test_added_js (line 29) | def test_added_js(pdf_file_writer):
FILE: tests/test_merger.py
function merger_operate (line 15) | def merger_operate(merger):
function check_outline (line 128) | def check_outline(tmp_path):
function test_merger_operations_by_traditional_usage_with_writer (line 151) | def test_merger_operations_by_traditional_usage_with_writer(tmp_path):
function test_merger_operations_by_semi_traditional_usage_with_writer (line 164) | def test_merger_operations_by_semi_traditional_usage_with_writer(tmp_path):
function test_merger_operation_by_new_usage_with_writer (line 176) | def test_merger_operation_by_new_usage_with_writer(tmp_path):
function test_merge_page_exception_with_writer (line 186) | def test_merge_page_exception_with_writer():
function test_merge_page_tuple_with_writer (line 198) | def test_merge_page_tuple_with_writer():
function test_merge_write_closed_fh_with_writer (line 205) | def test_merge_write_closed_fh_with_writer(pdf_file_path):
function test_trim_outline_list_with_writer (line 219) | def test_trim_outline_list_with_writer(pdf_file_path):
function test_zoom_with_writer (line 231) | def test_zoom_with_writer(pdf_file_path):
function test_zoom_xyz_no_left_with_add_page (line 243) | def test_zoom_xyz_no_left_with_add_page(pdf_file_path):
function test_zoom_xyz_no_left_with_writer (line 255) | def test_zoom_xyz_no_left_with_writer(pdf_file_path):
function test_outline_item_with_writer (line 267) | def test_outline_item_with_writer(pdf_file_path):
function test_trim_outline_with_writer (line 279) | def test_trim_outline_with_writer(pdf_file_path):
function test1_with_writer (line 291) | def test1_with_writer(pdf_file_path):
function test_sweep_recursion1_with_writer (line 303) | def test_sweep_recursion1_with_writer(pdf_file_path):
function test_sweep_recursion2_with_writer (line 333) | def test_sweep_recursion2_with_writer(url, name, pdf_file_path):
function test_sweep_indirect_list_newobj_is_none_with_writer (line 345) | def test_sweep_indirect_list_newobj_is_none_with_writer(caplog, pdf_file...
function test_iss1145_with_writer (line 360) | def test_iss1145_with_writer():
function test_iss1344_with_writer (line 370) | def test_iss1344_with_writer(caplog):
function test_articles_with_writer (line 383) | def test_articles_with_writer(caplog):
function test_null_articles_with_writer (line 397) | def test_null_articles_with_writer():
function test_get_reference (line 404) | def test_get_reference():
function test_direct_link_preserved (line 410) | def test_direct_link_preserved(pdf_file_path):
function test_direct_link_preserved_reordering (line 433) | def test_direct_link_preserved_reordering(pdf_file_path):
function test_direct_link_page_missing (line 459) | def test_direct_link_page_missing(pdf_file_path):
function test_named_reference_preserved (line 473) | def test_named_reference_preserved(pdf_file_path):
function test_named_ref_to_page_that_is_gone (line 499) | def test_named_ref_to_page_that_is_gone(pdf_file_path):
function test_merge__null_destination (line 513) | def test_merge__null_destination():
FILE: tests/test_page.py
function get_all_sample_files (line 42) | def get_all_sample_files():
function test_read (line 61) | def test_read(meta):
function test_page_operations (line 88) | def test_page_operations(pdf_path, password):
function test_mediabox_expansion_after_rotation (line 139) | def test_mediabox_expansion_after_rotation(
function test_transformation_equivalence (line 161) | def test_transformation_equivalence():
function test_transformation_equivalence2 (line 195) | def test_transformation_equivalence2():
function test_get_user_unit_property (line 246) | def test_get_user_unit_property():
function compare_dict_objects (line 252) | def compare_dict_objects(d1, d2):
function test_page_transformations (line 262) | def test_page_transformations():
function test_compress_content_streams (line 299) | def test_compress_content_streams(pdf_path, password):
function test_page_properties (line 321) | def test_page_properties():
function test_page_rotation (line 334) | def test_page_rotation():
function test_page_indirect_rotation (line 357) | def test_page_indirect_rotation():
function test_page_scale (line 365) | def test_page_scale():
function test_add_transformation_on_page_without_contents (line 375) | def test_add_transformation_on_page_without_contents():
function test_iss_1142 (line 384) | def test_iss_1142():
function test_extract_text (line 441) | def test_extract_text(url, name):
function test_extract_text_page_pdf_impossible_decode_xform (line 449) | def test_extract_text_page_pdf_impossible_decode_xform(caplog):
function test_extract_text_operator_t_star (line 461) | def test_extract_text_operator_t_star(): # L1266, L1267
function test_extract_text_visitor_callbacks (line 469) | def test_extract_text_visitor_callbacks():
function test_get_fonts (line 619) | def test_get_fonts(pdf_path, password, embedded, unembedded):
function test_get_fonts2 (line 631) | def test_get_fonts2():
function test_annotation_getter (line 668) | def test_annotation_getter():
function test_annotation_setter (line 709) | def test_annotation_setter(pdf_file_path):
function test_text_extraction_issue_1091 (line 771) | def test_text_extraction_issue_1091():
function test_empyt_password_1088 (line 782) | def test_empyt_password_1088():
function test_old_habibi (line 791) | def test_old_habibi():
function test_read_link_annotation (line 801) | def test_read_link_annotation():
function test_no_resources (line 831) | def test_no_resources():
function test_merge_page_reproducible_with_proc_set (line 840) | def test_merge_page_reproducible_with_proc_set():
function test_merge_resources (line 915) | def test_merge_resources(apage1, apage2, expected_result, expected_renam...
function test_merge_page_resources_smoke_test (line 936) | def test_merge_page_resources_smoke_test():
function test_merge_transformed_page_into_blank (line 1004) | def test_merge_transformed_page_into_blank():
function test_pages_printing (line 1041) | def test_pages_printing():
function test_del_pages (line 1051) | def test_del_pages():
function test_pdf_pages_missing_type (line 1100) | def test_pdf_pages_missing_type():
function test_merge_with_stream_wrapped_in_save_restore (line 1110) | def test_merge_with_stream_wrapped_in_save_restore():
function test_compression (line 1123) | def test_compression():
function test_merge_with_no_resources (line 1161) | def test_merge_with_no_resources():
function test_get_contents_from_nullobject (line 1171) | def test_get_contents_from_nullobject():
function test_pos_text_in_textvisitor (line 1182) | def test_pos_text_in_textvisitor():
function test_pos_text_in_textvisitor2 (line 1200) | def test_pos_text_in_textvisitor2():
function test_missing_basefont_in_type3 (line 1260) | def test_missing_basefont_in_type3():
function test_invalid_index (line 1268) | def test_invalid_index():
function test_negative_index (line 1275) | def test_negative_index():
function test_get_contents_as_bytes (line 1281) | def test_get_contents_as_bytes():
function test_recursive_get_page_from_node (line 1292) | def test_recursive_get_page_from_node():
function test_get_contents__none_type (line 1305) | def test_get_contents__none_type():
function test_extract_text__none_type (line 1322) | def test_extract_text__none_type():
function test_scale_by (line 1339) | def test_scale_by():
function test_box_rendering (line 1364) | def test_box_rendering(tmp_path):
function test_delete_non_existent_annotations (line 1400) | def test_delete_non_existent_annotations():
function test_replace_contents_on_reader (line 1409) | def test_replace_contents_on_reader():
function test_replace_contents_on_reader__indirect_reference (line 1427) | def test_replace_contents_on_reader__indirect_reference():
function test_merge_page__coverage (line 1441) | def test_merge_page__coverage():
function test_importing_without_pillow (line 1485) | def test_importing_without_pillow(tmp_path):
function test_replace_contents__null_object_cloning_error (line 1517) | def test_replace_contents__null_object_cloning_error():
function test_get_rectangle__size_handling (line 1538) | def test_get_rectangle__size_handling(caplog):
FILE: tests/test_page_labels.py
function test_number2uppercase_roman_numeral (line 44) | def test_number2uppercase_roman_numeral(number, expected):
function test_number2lowercase_roman_numeral (line 48) | def test_number2lowercase_roman_numeral():
function test_number2lowercase_letter (line 64) | def test_number2lowercase_letter(number, expected):
function test_number2uppercase_letter (line 68) | def test_number2uppercase_letter():
function test_index2label (line 74) | def test_index2label(caplog):
function test_index2label_kids (line 111) | def test_index2label_kids():
function test_index2label_kids__recursive (line 143) | def test_index2label_kids__recursive(caplog):
function test_get_label_from_nums__empty_nums_list (line 173) | def test_get_label_from_nums__empty_nums_list():
function test_index2label__empty_kids_list (line 179) | def test_index2label__empty_kids_list():
FILE: tests/test_pagerange.py
function test_equality (line 7) | def test_equality():
function test_hash (line 13) | def test_hash():
function test_str (line 28) | def test_str(page_range, expected):
function test_repr (line 36) | def test_repr(page_range, expected):
function test_equality_other_objectc (line 40) | def test_equality_other_objectc():
function test_idempotency (line 46) | def test_idempotency():
function test_str_init (line 59) | def test_str_init(range_str, expected):
function test_str_init_error (line 65) | def test_str_init_error():
function test_parse_filename_page_ranges (line 83) | def test_parse_filename_page_ranges(params, expected):
function test_parse_filename_page_ranges_err (line 87) | def test_parse_filename_page_ranges_err():
function test_addition (line 103) | def test_addition(a, b, expected):
function test_addition_gap (line 117) | def test_addition_gap(a: PageRange, b: PageRange):
function test_addition_non_page_range (line 123) | def test_addition_non_page_range():
function test_addition_stride (line 129) | def test_addition_stride():
FILE: tests/test_papersizes.py
function test_din_a0_paper_size (line 7) | def test_din_a0_paper_size():
function test_din_a_aspect_ratio (line 24) | def test_din_a_aspect_ratio(dimensions):
function test_din_a_size_doubling (line 33) | def test_din_a_size_doubling(dimensions_a, dimensions_b):
FILE: tests/test_pdfa.py
function is_pdfa1b_compliant (line 13) | def is_pdfa1b_compliant(src: BytesIO):
function test_pdfa (line 38) | def test_pdfa(src: Path, diagnostic_write_name: Optional[str]):
FILE: tests/test_protocols.py
class IPdfObjectProtocol (line 5) | class IPdfObjectProtocol(PdfObjectProtocol):
function test_pdfobjectprotocol (line 9) | def test_pdfobjectprotocol():
FILE: tests/test_reader.py
function test_get_num_pages (line 48) | def test_get_num_pages(src, num_pages):
function test_read_metadata (line 89) | def test_read_metadata(pdf_path, expected):
function test_read_metadata_title_is_utf8 (line 117) | def test_read_metadata_title_is_utf8():
function test_iss1943 (line 125) | def test_iss1943():
function test_broken_meta_data (line 148) | def test_broken_meta_data(pdf_path):
function test_get_annotations (line 168) | def test_get_annotations(src):
function test_get_attachments (line 185) | def test_get_attachments(src, nb_attachments):
function test_get_outline (line 206) | def test_get_outline(src, outline_elements):
function test_get_images (line 233) | def test_get_images(src, expected_images):
function test_get_images_raw (line 310) | def test_get_images_raw(
function test_issue297 (line 360) | def test_issue297(caplog):
function test_get_page_of_encrypted_file (line 383) | def test_get_page_of_encrypted_file(pdffile, password, should_fail):
function test_get_form (line 423) | def test_get_form(src, expected, expected_get_fields, txt_file_path):
function test_reading_choice_field_without_opt_key (line 451) | def test_reading_choice_field_without_opt_key():
function test_get_page_number (line 472) | def test_get_page_number(src, page_number):
function test_get_page_layout (line 484) | def test_get_page_layout(src, expected):
function test_get_page_mode (line 497) | def test_get_page_mode(src, expected):
function test_read_empty (line 503) | def test_read_empty():
function test_read_malformed_header (line 509) | def test_read_malformed_header(caplog):
function test_read_malformed_body (line 521) | def test_read_malformed_body():
function test_read_prev_0_trailer (line 529) | def test_read_prev_0_trailer():
function test_circular_xref_prev_reference (line 566) | def test_circular_xref_prev_reference(caplog):
function test_read_missing_startxref (line 602) | def test_read_missing_startxref():
function test_read_unknown_zero_pages (line 637) | def test_read_unknown_zero_pages(caplog):
function test_read_encrypted_without_decryption (line 689) | def test_read_encrypted_without_decryption():
function test_get_destination_page_number (line 697) | def test_get_destination_page_number():
function test_do_not_get_stuck_on_large_files_without_start_xref (line 706) | def test_do_not_get_stuck_on_large_files_without_start_xref():
function test_decrypt_when_no_id (line 722) | def test_decrypt_when_no_id():
function test_reader_properties (line 734) | def test_reader_properties():
function test_issue604 (line 747) | def test_issue604(caplog, strict):
function test_decode_permissions (line 781) | def test_decode_permissions():
function test_user_access_permissions (line 818) | def test_user_access_permissions():
function test_pages_attribute (line 852) | def test_pages_attribute():
function test_convert_to_int (line 870) | def test_convert_to_int():
function test_convert_to_int_error (line 874) | def test_convert_to_int_error():
function test_iss925 (line 881) | def test_iss925():
function test_get_object (line 894) | def test_get_object():
function test_extract_text_hello_world (line 900) | def test_extract_text_hello_world():
function test_read_path (line 919) | def test_read_path():
function test_read_not_binary_mode (line 925) | def test_read_not_binary_mode(caplog):
function test_read_form_416 (line 938) | def test_read_form_416():
function test_form_topname_with_and_without_acroform (line 947) | def test_form_topname_with_and_without_acroform(caplog):
function test_extract_text_xref_issue_2 (line 976) | def test_extract_text_xref_issue_2(caplog):
function test_extract_text_xref_issue_3 (line 991) | def test_extract_text_xref_issue_3(caplog):
function test_extract_text_pdf15 (line 1004) | def test_extract_text_pdf15():
function test_extract_text_xref_table_21_bytes_clrf (line 1013) | def test_extract_text_xref_table_21_bytes_clrf():
function test_get_fields (line 1022) | def test_get_fields():
function test_get_full_qualified_fields (line 1035) | def test_get_full_qualified_fields():
function test_get_fields_read_else_block (line 1056) | def test_get_fields_read_else_block():
function test_get_fields_read_else_block2 (line 1064) | def test_get_fields_read_else_block2():
function test_get_fields_read_else_block3 (line 1074) | def test_get_fields_read_else_block3():
function test_metadata_is_none (line 1081) | def test_metadata_is_none():
function test_get_fields_read_write_report (line 1089) | def test_get_fields_read_write_report(txt_file_path):
function test_xfa (line 1105) | def test_xfa(src):
function test_xfa_non_empty (line 1111) | def test_xfa_non_empty():
function test_header (line 1132) | def test_header(src, pdf_header):
function test_outline_color (line 1139) | def test_outline_color():
function test_outline_font_format (line 1145) | def test_outline_font_format():
function get_outline_property (line 1150) | def get_outline_property(outline, attribute_name: str):
function test_outline_title_issue_1121 (line 1164) | def test_outline_title_issue_1121():
function test_outline_count (line 1211) | def test_outline_count():
function test_outline_missing_title (line 1257) | def test_outline_missing_title(caplog):
function test_named_destination (line 1287) | def test_named_destination(url, name):
function test_outline_with_missing_named_destination (line 1293) | def test_outline_with_missing_named_destination():
function test_outline_with_empty_action (line 1302) | def test_outline_with_empty_action():
function test_outline_with_invalid_destinations (line 1311) | def test_outline_with_invalid_destinations():
function test_pdfreader_multiple_definitions (line 1319) | def test_pdfreader_multiple_definitions(caplog):
function test_wrong_password_error (line 1330) | def test_wrong_password_error():
function test_get_page_number_by_indirect (line 1339) | def test_get_page_number_by_indirect():
function test_corrupted_xref_table (line 1345) | def test_corrupted_xref_table():
function test_reader (line 1358) | def test_reader(caplog):
function test_zeroing_xref (line 1374) | def test_zeroing_xref():
function test_thread (line 1386) | def test_thread():
function test_build_outline_item (line 1402) | def test_build_outline_item(caplog):
function test_page_labels (line 1448) | def test_page_labels(src, page_labels):
function test_iss1559 (line 1454) | def test_iss1559():
function test_iss1652 (line 1463) | def test_iss1652():
function test_iss1689 (line 1472) | def test_iss1689():
function test_iss1710 (line 1480) | def test_iss1710():
function test_broken_file_header (line 1487) | def test_broken_file_header():
function test_iss1756 (line 1522) | def test_iss1756():
function test_iss1825 (line 1532) | def test_iss1825():
function test_iss2082 (line 1541) | def test_iss2082():
function test_issue_140 (line 1555) | def test_issue_140():
function test_xyz_with_missing_param (line 1564) | def test_xyz_with_missing_param():
function test_corrupted_xref (line 1576) | def test_corrupted_xref():
function test_truncated_xref (line 1584) | def test_truncated_xref(caplog):
function test_damaged_pdf (line 1592) | def test_damaged_pdf():
function test_looping_form (line 1607) | def test_looping_form(caplog):
function test_context_manager_with_stream (line 1632) | def test_context_manager_with_stream():
function test_iss2761 (line 1669) | def test_iss2761():
function test_iss2817 (line 1678) | def test_iss2817():
function test_truncated_files (line 1690) | def test_truncated_files(caplog):
function test_comments_in_array (line 1714) | def test_comments_in_array(caplog):
function test_space_in_names_to_continue_processing (line 1729) | def test_space_in_names_to_continue_processing(caplog):
function test_unbalanced_brackets_in_dictionary_object (line 1775) | def test_unbalanced_brackets_in_dictionary_object(caplog):
function test_repair_root (line 1784) | def test_repair_root(caplog):
function test_issue3151 (line 1865) | def test_issue3151(caplog):
function test_issue2886 (line 1874) | def test_issue2886(caplog):
function test_infinite_loop_for_length_value (line 1884) | def test_infinite_loop_for_length_value():
function test_trailer_cannot_be_read (line 1895) | def test_trailer_cannot_be_read():
function test_read_pdf15_xref_stream (line 1904) | def test_read_pdf15_xref_stream():
function test_read_standard_xref_table__two_whitespace_characters_between_offset_and_generation (line 1919) | def test_read_standard_xref_table__two_whitespace_characters_between_off...
function test_root_object_recovery_limit (line 1930) | def test_root_object_recovery_limit(caplog):
function test_rebuild_xref_table__speed (line 1973) | def test_rebuild_xref_table__speed():
function test_find_pdf_objects (line 1986) | def test_find_pdf_objects():
function test_find_pdf_trailers (line 2011) | def test_find_pdf_trailers(data: bytes, expected: list[int]):
function test_objstm_batch_parse_caches_all_objects (line 2016) | def test_objstm_batch_parse_caches_all_objects():
function test_objstm_cache_hit_returns_target (line 2030) | def test_objstm_cache_hit_returns_target():
function test_objstm_skips_cache_for_overridden_objects (line 2045) | def test_objstm_skips_cache_for_overridden_objects():
FILE: tests/test_text_extraction.py
function test_multi_language (line 26) | def test_multi_language(visitor_text):
function test_visitor_text_matrices (line 79) | def test_visitor_text_matrices(file_name, constraints):
function test_issue_2336 (line 110) | def test_issue_2336():
function test_font_class_to_dict (line 118) | def test_font_class_to_dict():
function test_uninterpretable_type3_font (line 155) | def test_uninterpretable_type3_font(mock_logger_warning):
function test_layout_mode_epic_page_fonts (line 168) | def test_layout_mode_epic_page_fonts():
function test_layout_mode_uncommon_operators (line 176) | def test_layout_mode_uncommon_operators():
function test_layout_mode_type0_font_widths (line 184) | def test_layout_mode_type0_font_widths():
function test_layout_mode_indirect_sequence_font_widths (line 197) | def test_layout_mode_indirect_sequence_font_widths(caplog):
function dummy_visitor_text (line 211) | def dummy_visitor_text(text, ctm, tm, fd, fs):
function test_layout_mode_warnings (line 216) | def test_layout_mode_warnings(mock_logger_warning):
function test_space_with_one_unit_smaller_than_font_width (line 229) | def test_space_with_one_unit_smaller_than_font_width():
function test_space_position_calculation (line 240) | def test_space_position_calculation():
function test_text_leading_height_unit (line 250) | def test_text_leading_height_unit():
function test_layout_mode_space_vertically_font_height_weight (line 258) | def test_layout_mode_space_vertically_font_height_weight():
function test_infinite_loop_arrays (line 294) | def test_infinite_loop_arrays():
function test_content_stream_is_dictionary_object (line 306) | def test_content_stream_is_dictionary_object(caplog):
function test_tz_with_no_operands (line 329) | def test_tz_with_no_operands():
function test_iss3060 (line 341) | def test_iss3060():
function test_iss3074 (line 352) | def test_iss3074():
function test_layout_mode_text_state (line 363) | def test_layout_mode_text_state():
function test_rotated_line_wrap (line 380) | def test_rotated_line_wrap():
function test_layout_mode_warns_on_malformed_content_stream (line 401) | def test_layout_mode_warns_on_malformed_content_stream(op, msg, caplog):
function test_process_operation__cm_multiplication_issue (line 408) | def test_process_operation__cm_multiplication_issue():
function test_rotated_layout_mode (line 421) | def test_rotated_layout_mode(caplog):
function test_extract_text__none_objects (line 438) | def test_extract_text__none_objects():
function test_extract_text__with_visitor_text (line 448) | def test_extract_text__with_visitor_text():
function test_extract_text__restore_cm_stack_pop_error (line 465) | def test_extract_text__restore_cm_stack_pop_error():
function test_slow_huge_string (line 480) | def test_slow_huge_string():
function test_extract_text_with_missing_font_bbox (line 492) | def test_extract_text_with_missing_font_bbox():
FILE: tests/test_utils.py
function test_skip_over_whitespace (line 54) | def test_skip_over_whitespace(stream, expected):
function test_check_if_whitespace_only (line 72) | def test_check_if_whitespace_only(value, expected):
function test_read_until_whitespace (line 76) | def test_read_until_whitespace():
function test_skip_over_comment (line 89) | def test_skip_over_comment(stream, remainder):
function test_read_until_regex_premature_ending_name (line 94) | def test_read_until_regex_premature_ending_name():
function test_read_until_regex_match_in_first_chunk (line 99) | def test_read_until_regex_match_in_first_chunk():
function test_read_until_regex_match_in_second_chunk (line 107) | def test_read_until_regex_match_in_second_chunk():
function test_read_until_regex_match_at_chunk_boundary (line 118) | def test_read_until_regex_match_at_chunk_boundary():
function test_read_until_regex_multi_byte_spanning_boundary (line 129) | def test_read_until_regex_multi_byte_spanning_boundary():
function test_read_until_regex_no_match_exhausted (line 141) | def test_read_until_regex_no_match_exhausted():
function test_read_until_regex_exponential_chunk_growth (line 149) | def test_read_until_regex_exponential_chunk_growth():
function test_read_until_regex_match_spanning_later_boundary (line 160) | def test_read_until_regex_match_spanning_later_boundary():
function test_read_until_regex_tail_overlap_is_fixed (line 173) | def test_read_until_regex_tail_overlap_is_fixed():
function test_matrix_multiply (line 201) | def test_matrix_multiply(a, b, expected):
function test_mark_location (line 205) | def test_mark_location():
function test_deprecate_no_replacement (line 211) | def test_deprecate_no_replacement():
function test_read_block_backwards (line 234) | def test_read_block_backwards(dat, pos, to_read, expected, expected_pos):
function test_read_block_backwards_at_start (line 245) | def test_read_block_backwards_at_start():
function test_read_previous_line (line 266) | def test_read_previous_line(dat, pos, expected, expected_pos):
function test_read_previous_line2 (line 274) | def test_read_previous_line2():
function test_get_max_pdf_version_header (line 300) | def test_get_max_pdf_version_header():
function test_read_block_backwards_exception (line 306) | def test_read_block_backwards_exception():
function test_deprecate_with_replacement (line 314) | def test_deprecate_with_replacement():
function test_deprecation_no_replacement (line 325) | def test_deprecation_no_replacement():
function test_logger_error (line 336) | def test_logger_error(caplog):
function test_rename_kwargs (line 347) | def test_rename_kwargs():
function test_rename_kwargs__stacklevel (line 385) | def test_rename_kwargs__stacklevel(tmp_path: Path) -> None:
function test_human_readable_bytes (line 431) | def test_human_readable_bytes(input_int, expected_output):
function test_file_class (line 436) | def test_file_class():
function test_parse_datetime (line 461) | def test_parse_datetime(text, expected):
function test_parse_datetime_edge_cases (line 474) | def test_parse_datetime_edge_cases(text, expected):
function test_parse_datetime_err (line 479) | def test_parse_datetime_err():
function test_format_iso8824_date (line 486) | def test_format_iso8824_date():
function test_format_iso8824_date_roundtrip (line 505) | def test_format_iso8824_date_roundtrip():
function test_is_sublist (line 527) | def test_is_sublist():
function test_version_compare (line 573) | def test_version_compare(left, right, is_less_than):
function test_version_compare_equal_str (line 577) | def test_version_compare_equal_str():
function test_version_compare_lt_str (line 582) | def test_version_compare_lt_str():
function test_bad_version (line 589) | def test_bad_version():
function test_version_eq_hash (line 593) | def test_version_eq_hash():
function test_classproperty (line 603) | def test_classproperty():
FILE: tests/test_workflows.py
function test_basic_features (line 32) | def test_basic_features(tmp_path):
function test_dropdown_items (line 80) | def test_dropdown_items():
function test_pdfreader_file_load (line 87) | def test_pdfreader_file_load():
function test_pdfreader_jpeg_image (line 113) | def test_pdfreader_jpeg_image():
function test_decrypt (line 136) | def test_decrypt():
function test_text_extraction_encrypted (line 151) | def test_text_extraction_encrypted():
function test_rotate (line 165) | def test_rotate(degree):
function test_rotate_45 (line 172) | def test_rotate_45():
function test_extract_textbench (line 248) | def test_extract_textbench(enable, url, pages):
function test_transform_compress_identical_objects (line 264) | def test_transform_compress_identical_objects():
function test_orientations (line 281) | def test_orientations():
function test_overlay (line 330) | def test_overlay(pdf_file_path, base_path, overlay_path):
function test_merge_with_warning (line 358) | def test_merge_with_warning(tmp_path, url, name):
function test_merge (line 378) | def test_merge(tmp_path, url, name):
function test_get_metadata (line 405) | def test_get_metadata(url, name, expected_metadata):
function test_extract_text (line 484) | def test_extract_text(url, name, strict, exception):
function test_compress_raised (line 524) | def test_compress_raised(url, name):
function test_get_fields_warns (line 544) | def test_get_fields_warns(tmp_path, caplog, url, name):
function test_get_fields_no_warning (line 569) | def test_get_fields_no_warning(tmp_path, url, name):
function test_scale_rectangle_indirect_object (line 580) | def test_scale_rectangle_indirect_object():
function test_merge_output (line 590) | def test_merge_output(caplog):
function test_image_extraction (line 665) | def test_image_extraction(url, name):
function test_image_extraction_strict (line 691) | def test_image_extraction_strict():
function test_image_extraction2 (line 728) | def test_image_extraction2(url, name):
function test_get_outline (line 766) | def test_get_outline(url, name):
function test_get_xfa (line 786) | def test_get_xfa(url, name):
function test_get_fonts (line 818) | def test_get_fonts(url, name, strict):
function test_get_xmp (line 851) | def test_get_xmp(url, name, strict):
function test_tounicode_is_identity (line 884) | def test_tounicode_is_identity():
function test_append_forms (line 893) | def test_append_forms():
function test_extra_test_iss1541 (line 918) | def test_extra_test_iss1541():
function test_fields_returning_stream (line 949) | def test_fields_returning_stream():
function test_replace_image (line 958) | def test_replace_image(tmp_path):
function test_inline_images (line 1003) | def test_inline_images():
function test_issue1899 (line 1041) | def test_issue1899():
function test_cr_with_cm_operation (line 1051) | def test_cr_with_cm_operation():
function remove_trailing_whitespace (line 1068) | def remove_trailing_whitespace(text: str) -> str:
function test_text_extraction_layout_mode (line 1087) | def test_text_extraction_layout_mode(pdf_path, expected_path):
function test_layout_mode_space_vertically (line 1096) | def test_layout_mode_space_vertically():
function test_layout_mode_rotations (line 1111) | def test_layout_mode_rotations(rotation, strip_rotated):
function test_text_extraction_invalid_mode (line 1129) | def test_text_extraction_invalid_mode():
function test_get_page_showing_field (line 1137) | def test_get_page_showing_field():
function test_extract_empty_page (line 1294) | def test_extract_empty_page():
function test_iss2815 (line 1303) | def test_iss2815():
FILE: tests/test_writer.py
function _get_write_target (line 47) | def _get_write_target(convert) -> Any:
function test_writer_exception_non_binary (line 55) | def test_writer_exception_non_binary(tmp_path, caplog):
function test_writer_clone (line 68) | def test_writer_clone():
function test_clone_metadata (line 81) | def test_clone_metadata():
function test_writer_clone_bookmarks (line 109) | def test_writer_clone_bookmarks():
function writer_operate (line 151) | def writer_operate(writer: PdfWriter) -> None:
function test_insert_blank_page (line 258) | def test_insert_blank_page():
function test_writer_operations_by_traditional_usage (line 348) | def test_writer_operations_by_traditional_usage(convert, needs_cleanup):
function test_writer_operations_by_semi_traditional_usage (line 373) | def test_writer_operations_by_semi_traditional_usage(convert, needs_clea...
function test_writer_operations_by_semi_new_traditional_usage (line 399) | def test_writer_operations_by_semi_new_traditional_usage(convert, needs_...
function test_writer_operation_by_new_usage (line 420) | def test_writer_operation_by_new_usage(convert, needs_cleanup):
function test_remove_images (line 438) | def test_remove_images(pdf_file_path, input_path):
function test_remove_images_sub_level (line 463) | def test_remove_images_sub_level():
function test_remove_text (line 490) | def test_remove_text(input_path, pdf_file_path):
function test_remove_text_all_operators (line 505) | def test_remove_text_all_operators(pdf_file_path):
function test_write_metadata (line 567) | def test_write_metadata(pdf_file_path):
function test_fill_form (line 592) | def test_fill_form(pdf_file_path):
function test_fill_form_with_qualified (line 617) | def test_fill_form_with_qualified():
function test_encrypt (line 639) | def test_encrypt(use_128bit, user_password, owner_password, pdf_file_path):
function test_add_outline_item (line 693) | def test_add_outline_item(pdf_file_path):
function test_add_named_destination (line 746) | def test_add_named_destination(pdf_file_path):
function test_add_named_destination_sort_order (line 783) | def test_add_named_destination_sort_order(pdf_file_path):
function test_add_uri (line 811) | def test_add_uri(pdf_file_path):
function test_link_annotation (line 848) | def test_link_annotation(pdf_file_path):
function test_io_streams (line 902) | def test_io_streams():
function test_regression_issue670 (line 918) | def test_regression_issue670(pdf_file_path):
function test_issue301 (line 928) | def test_issue301():
function test_append_pages_from_reader_append (line 938) | def test_append_pages_from_reader_append():
function test_sweep_indirect_references_nullobject_exception (line 951) | def test_sweep_indirect_references_nullobject_exception(pdf_file_path):
function test_some_appends (line 978) | def test_some_appends(pdf_file_path, url, name):
function test_pdf_header (line 985) | def test_pdf_header():
function test_write_dict_stream_object (line 997) | def test_write_dict_stream_object(pdf_file_path):
function test_add_single_annotation (line 1041) | def test_add_single_annotation(pdf_file_path):
function test_colors_in_outline_item (line 1071) | def test_colors_in_outline_item(pdf_file_path):
function test_write_empty_stream (line 1092) | def test_write_empty_stream():
function test_startup_dest (line 1102) | def test_startup_dest():
function test_iss471 (line 1144) | def test_iss471():
function test_reset_translation (line 1157) | def test_reset_translation():
function test_threads_empty (line 1185) | def test_threads_empty():
function test_append_without_annots_and_articles (line 1195) | def test_append_without_annots_and_articles():
function test_append_multiple (line 1214) | def test_append_multiple():
function test_set_page_label (line 1229) | def test_set_page_label(pdf_file_path):
function test_iss1601 (line 1364) | def test_iss1601():
function test_attachments (line 1386) | def test_attachments():
function test_iss1614 (line 1433) | def test_iss1614():
function test_new_removes (line 1448) | def test_new_removes():
function test_late_iss1654 (line 1536) | def test_late_iss1654():
function test_iss1723 (line 1549) | def test_iss1723():
function test_iss1767 (line 1559) | def test_iss1767():
function test_named_dest_page_number (line 1570) | def test_named_dest_page_number():
function test_update_form_fields (line 1595) | def test_update_form_fields(caplog, tmp_path):
function test_add_apstream_object (line 1676) | def test_add_apstream_object():
function test_merge_content_stream_to_page (line 1704) | def test_merge_content_stream_to_page():
function test_update_form_fields2 (line 1736) | def test_update_form_fields2(caplog):
function test_iss1862 (line 1818) | def test_iss1862():
function test_empty_objects_before_cloning (line 1830) | def test_empty_objects_before_cloning():
function test_watermark (line 1845) | def test_watermark():
function test_watermarking_speed (line 1865) | def test_watermarking_speed():
function test_watermark_rendering (line 1882) | def test_watermark_rendering(tmp_path):
function test_watermarking_reportlab_rendering (line 1919) | def test_watermarking_reportlab_rendering(tmp_path):
function test_da_missing_in_annot (line 1957) | def test_da_missing_in_annot():
function test_missing_fields (line 1983) | def test_missing_fields(pdf_file_path):
function test_missing_info (line 2005) | def test_missing_info():
function test_germanfields (line 2040) | def test_germanfields():
function test_no_t_in_articles (line 2063) | def test_no_t_in_articles():
function test_no_i_in_articles (line 2073) | def test_no_i_in_articles():
function test_damaged_pdf_length_returning_none (line 2083) | def test_damaged_pdf_length_returning_none():
function test_viewerpreferences (line 2096) | def test_viewerpreferences():
function test_extra_spaces_in_da_text (line 2164) | def test_extra_spaces_in_da_text(caplog):
function test_object_contains_indirect_reference_to_self (line 2179) | def test_object_contains_indirect_reference_to_self():
function test_remove_image_per_type (line 2190) | def test_remove_image_per_type():
function test_add_outlines_on_empty_dict (line 2220) | def test_add_outlines_on_empty_dict():
function test_merging_many_temporary_files (line 2299) | def test_merging_many_temporary_files(caplog):
function test_reattach_fields (line 2366) | def test_reattach_fields():
function test_get_pagenumber_from_indirectobject (line 2398) | def test_get_pagenumber_from_indirectobject():
function test_replace_object (line 2411) | def test_replace_object():
function test_mime_jupyter (line 2436) | def test_mime_jupyter():
function test_init_without_named_arg (line 2444) | def test_init_without_named_arg():
function test_i_in_choice_fields (line 2466) | def test_i_in_choice_fields():
function test_selfont (line 2478) | def test_selfont():
function test_no_resource_for_14_std_fonts (line 2502) | def test_no_resource_for_14_std_fonts():
function test_field_box_upside_down (line 2518) | def test_field_box_upside_down():
function test_matrix_entry_in_field_annots (line 2535) | def test_matrix_entry_in_field_annots():
function test_compress_identical_objects (line 2549) | def test_compress_identical_objects():
function test_set_need_appearances_writer (line 2571) | def test_set_need_appearances_writer():
function test_utf16_metadata (line 2577) | def test_utf16_metadata():
function test_increment_writer (line 2599) | def test_increment_writer(caplog):
function test_append_pdf_with_dest_without_page (line 2693) | def test_append_pdf_with_dest_without_page(caplog):
function test_destination_is_nullobject (line 2705) | def test_destination_is_nullobject():
function test_destination_page_is_none (line 2715) | def test_destination_page_is_none():
function test_stream_not_closed (line 2724) | def test_stream_not_closed():
function test_auto_write (line 2744) | def test_auto_write(tmp_path):
function test_deprecate_with_as (line 2752) | def test_deprecate_with_as():
function test_inline_image_q_operator_handling (line 2770) | def test_inline_image_q_operator_handling(tmp_path):
function test_insert_filtered_annotations__annotations_are_none (line 2805) | def test_insert_filtered_annotations__annotations_are_none():
function test_incremental_read (line 2816) | def test_incremental_read():
function test_compress_identical_objects__after_remove_images (line 2850) | def test_compress_identical_objects__after_remove_images():
function test_merge__process_named_dests__no_dests_in_source_file (line 2857) | def test_merge__process_named_dests__no_dests_in_source_file():
function test_insert_filtered_annotations__link_without_destination (line 2876) | def test_insert_filtered_annotations__link_without_destination():
function test_insert_filtered_annotations__annotations_are_no_list (line 2907) | def test_insert_filtered_annotations__annotations_are_no_list(caplog):
function test_unterminated_object__with_incremental_writer (line 2935) | def test_unterminated_object__with_incremental_writer():
function test_wrong_size_in_incremental_pdf (line 2948) | def test_wrong_size_in_incremental_pdf(caplog):
function test_flatten_form_field_without_font_in_resources (line 2971) | def test_flatten_form_field_without_font_in_resources():
function test_merge_with_null_acroform_does_not_raise_typeerror (line 2992) | def test_merge_with_null_acroform_does_not_raise_typeerror():
function test_compress_identical_objects__info_is_none (line 3014) | def test_compress_identical_objects__info_is_none():
function test_flatten_form_field_with_signature (line 3023) | def test_flatten_form_field_with_signature():
FILE: tests/test_xmp.py
function test_read_xmp_metadata_samples (line 24) | def test_read_xmp_metadata_samples(src):
function test_writer_xmp_metadata_samples (line 42) | def test_writer_xmp_metadata_samples():
function test_read_xmp_metadata (line 78) | def test_read_xmp_metadata(src, has_xmp):
function get_all_tiff (line 93) | def get_all_tiff(xmp: pypdf.xmp.XmpInformation):
function test_converter_date (line 105) | def test_converter_date():
function test_modify_date (line 122) | def test_modify_date():
function test_identity_function (line 137) | def test_identity_function(x):
function test_xmpmm_instance_id (line 153) | def test_xmpmm_instance_id(url, name, xmpmm_instance_id):
function test_xmp_dc_description_extraction (line 163) | def test_xmp_dc_description_extraction():
function test_dc_creator_extraction (line 179) | def test_dc_creator_extraction():
function test_custom_properties_extraction (line 191) | def test_custom_properties_extraction():
function test_dc_subject_extraction (line 203) | def test_dc_subject_extraction():
function test_invalid_xmp_information_handling (line 235) | def test_invalid_xmp_information_handling():
function test_pdfa_xmp_metadata_with_values (line 249) | def test_pdfa_xmp_metadata_with_values():
function test_pdfa_xmp_metadata_without_values (line 260) | def test_pdfa_xmp_metadata_without_values():
function test_xmp_metadata__content_stream_is_dictionary_object (line 271) | def test_xmp_metadata__content_stream_is_dictionary_object():
function test_dc_creator__bag_instead_of_seq (line 284) | def test_dc_creator__bag_instead_of_seq():
function test_dc_language__no_bag_container (line 294) | def test_dc_language__no_bag_container():
function test_reading_does_not_destroy_root_object (line 301) | def test_reading_does_not_destroy_root_object():
function test_xmp_information__write_to_stream (line 315) | def test_xmp_information__write_to_stream():
function test_pdf_writer__xmp_metadata_setter (line 332) | def test_pdf_writer__xmp_metadata_setter():
function test_xmp_information__create (line 389) | def test_xmp_information__create():
function test_xmp_information__set_dc_title (line 400) | def test_xmp_information__set_dc_title():
function test_xmp_information__set_dc_creator (line 412) | def test_xmp_information__set_dc_creator():
function test_xmp_information__set_dc_description (line 424) | def test_xmp_information__set_dc_description():
function test_xmp_information__set_dc_subject (line 436) | def test_xmp_information__set_dc_subject():
function test_xmp_information__set_dc_date (line 448) | def test_xmp_information__set_dc_date():
function test_xmp_information__set_single_fields (line 466) | def test_xmp_information__set_single_fields():
function test_xmp_information__set_bag_fields (line 491) | def test_xmp_information__set_bag_fields():
function test_xmp_information__set_dc_rights (line 526) | def test_xmp_information__set_dc_rights():
function test_xmp_information__set_pdf_fields (line 538) | def test_xmp_information__set_pdf_fields():
function test_xmp_information__set_xmp_date_fields (line 558) | def test_xmp_information__set_xmp_date_fields():
function test_xmp_information__set_xmp_creator_tool (line 592) | def test_xmp_information__set_xmp_creator_tool():
function test_xmp_information__set_xmpmm_fields (line 602) | def test_xmp_information__set_xmpmm_fields():
function test_xmp_information__set_pdfaid_fields (line 619) | def test_xmp_information__set_pdfaid_fields():
function test_xmp_information__create_with_writer (line 634) | def test_xmp_information__create_with_writer():
function test_xmp_information__namespace_prefix (line 657) | def test_xmp_information__namespace_prefix():
function test_xmp_information__owner_document_none_errors (line 670) | def test_xmp_information__owner_document_none_errors():
function test_xmp_information__remove_existing_attribute (line 752) | def test_xmp_information__remove_existing_attribute():
function test_xmp_information__edge_case_coverage (line 779) | def test_xmp_information__edge_case_coverage():
function test_xmp_information__create_new_description (line 801) | def test_xmp_information__create_new_description():
function test_xmp_information__get_text_skips_non_text_nodes (line 815) | def test_xmp_information__get_text_skips_non_text_nodes():
function test_xmp_information__get_or_create_description_mismatch_about_uri (line 827) | def test_xmp_information__get_or_create_description_mismatch_about_uri():
function test_xmp_information__attribute_handling (line 842) | def test_xmp_information__attribute_handling():
function test_xmp_information__create_and_set_metadata (line 868) | def test_xmp_information__create_and_set_metadata():
function test_xmp_information__external_entity_expansion (line 892) | def test_xmp_information__external_entity_expansion(tmpdir):
function test_xmp_information__exponential_entity_expansion (line 914) | def test_xmp_information__exponential_entity_expansion():
FILE: tests/utils.py
class PositionedText (line 11) | class PositionedText:
method __init__ (line 18) | def __init__(self, text, x, y, font_dict, font_size) -> None:
method get_base_font (line 26) | def get_base_font(self) -> str:
class Rectangle (line 37) | class Rectangle:
method __init__ (line 40) | def __init__(self, x, y, w, h) -> None:
method contains (line 46) | def contains(self, x, y) -> bool:
function extract_text_and_rectangles (line 53) | def extract_text_and_rectangles(
function extract_table (line 104) | def extract_table(
function extract_cell_text (line 186) | def extract_cell_text(cell_texts: list[PositionedText]) -> str:
function get_image_data (line 191) | def get_image_data(
class ReaderDummy (line 201) | class ReaderDummy:
method __init__ (line 202) | def __init__(self, strict=False) -> None:
method get_object (line 205) | def get_object(self, indirect_reference):
method get_reference (line 212) | def get_reference(self, obj):
Condensed preview: 207 files, each showing path, character count, and a content snippet (full structured content: 2,754K chars).
[
{
"path": ".git-blame-ignore-revs",
"chars": 488,
"preview": "# This file helps us to ignore style / formatting / doc changes\n# in git blame. That is useful when we're trying to find"
},
{
"path": ".github/ISSUE_TEMPLATE/bug-report.md",
"chars": 743,
"preview": "---\nname: Report a bug\nabout: Something broke!\ntitle: ''\nlabels: Bug\nassignees: ''\n\n---\n\nReplace this: What happened? Wh"
},
{
"path": ".github/ISSUE_TEMPLATE/feature-request.md",
"chars": 365,
"preview": "---\nname: Request a Feature\nabout: What do you think is missing in pypdf?\ntitle: ''\nlabels: Feature Request\nassignees: '"
},
{
"path": ".github/SECURITY.md",
"chars": 1105,
"preview": "# Security Policy\n\n## Supported Versions\n\nSecurity fixes are applied to the latest version.\n\n## Reporting a Vulnerabilit"
},
{
"path": ".github/dependabot.yaml",
"chars": 200,
"preview": "# Set update schedule for GitHub Actions\n\nversion: 2\nupdates:\n\n - package-ecosystem: \"github-actions\"\n directory: \"/"
},
{
"path": ".github/scripts/check_gh_pages_updates.py",
"chars": 3087,
"preview": "\"\"\"Check that all GitHub pages JavaScript dependencies are up-to-date.\"\"\" # noqa: INP001\n\nimport base64\nimport hashlib\n"
},
{
"path": ".github/scripts/check_pr_title.py",
"chars": 893,
"preview": "\"\"\"Check that all PR titles follow the desired scheme.\"\"\" # noqa: INP001\n\nimport os\nimport sys\n\nKNOWN_PREFIXES = (\n "
},
{
"path": ".github/scripts/check_urls.py",
"chars": 2685,
"preview": "\"\"\"Check that all test data URLs are still accessible.\"\"\" # noqa: INP001\nimport ast\nimport sys\nfrom collections.abc imp"
},
{
"path": ".github/workflows/benchmark.yaml",
"chars": 1513,
"preview": "name: Benchmarking pypdf\non:\n push:\n branches:\n - main\n\npermissions:\n contents: write\n deployments: write\n\njo"
},
{
"path": ".github/workflows/create-github-release.yaml",
"chars": 1126,
"preview": "name: Create a GitHub release page\n\non:\n push:\n tags:\n - '*.*.*'\n workflow_dispatch:\n\npermissions:\n contents:"
},
{
"path": ".github/workflows/gh-pages-check.yaml",
"chars": 706,
"preview": "name: 'GitHub Pages Check'\non:\n workflow_dispatch:\n schedule:\n - cron: 0 6 * * 1\n\njobs:\n url-check:\n name: GitH"
},
{
"path": ".github/workflows/github-ci.yaml",
"chars": 10893,
"preview": "# This workflow will install Python dependencies, run tests and lint with a variety of Python versions\n# For more inform"
},
{
"path": ".github/workflows/publish-to-pypi.yaml",
"chars": 1197,
"preview": "name: Publish Python Package to PyPI\n\non:\n push:\n tags:\n - '*.*.*'\n workflow_dispatch:\n\njobs:\n build:\n nam"
},
{
"path": ".github/workflows/release.yaml",
"chars": 1941,
"preview": "# This action assumes that there is a REL-commit which already has a\n# Markdown-formatted git tag. Hence, the CHANGELOG "
},
{
"path": ".github/workflows/title-check.yaml",
"chars": 533,
"preview": "name: 'PR Title Check'\non:\n pull_request:\n # check when PR\n # * is created,\n # * title is edited, and\n # * "
},
{
"path": ".github/workflows/urls-check.yaml",
"chars": 547,
"preview": "name: 'URL Check'\non:\n workflow_dispatch:\n schedule:\n - cron: 0 6 * * 1\n\njobs:\n url-check:\n name: URL check\n "
},
{
"path": ".gitignore",
"chars": 513,
"preview": "*.pyc\n*.swp\n.DS_Store\n.tox\nbuild\n.idea/*\n*.egg-info/\ndist/*\n__pycache__/\n\n# in-project virtual environments\nvenv/\n.venv/"
},
{
"path": ".gitmodules",
"chars": 94,
"preview": "[submodule \"sample-files\"]\n\tpath = sample-files\n\turl = https://github.com/py-pdf/sample-files\n"
},
{
"path": ".pre-commit-config.yaml",
"chars": 941,
"preview": "# pre-commit run --all-files\nrepos:\n- repo: https://github.com/pre-commit/pre-commit-hooks\n rev: v6.0.0\n hooks:\n"
},
{
"path": ".readthedocs.yaml",
"chars": 551,
"preview": "# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details\nversion: 2\n\n\nbuild:\n os: ubuntu-lts-latest\n"
},
{
"path": "CHANGELOG.md",
"chars": 73651,
"preview": "# CHANGELOG\n\n## Version 6.9.1, 2026-03-17\n\n### Security (SEC)\n- Improve performance and limit length of array-based cont"
},
{
"path": "CONTRIBUTING.md",
"chars": 1367,
"preview": "Please check the [documentation page dedicated to development](https://pypdf.readthedocs.io/en/stable/dev/intro.html).\n\n"
},
{
"path": "CONTRIBUTORS.md",
"chars": 5107,
"preview": "# Contributors\n\npypdf had a lot of contributors since it started as pyPdf in 2005. We are\na free software project withou"
},
{
"path": "LICENSE",
"chars": 1605,
"preview": "Copyright (c) 2006-2008, Mathieu Fenniak\nSome contributions copyright (c) 2007, Ashish Kulkarni <kulkarni.ashish@gmail.c"
},
{
"path": "Makefile",
"chars": 774,
"preview": "maint:\n\tpre-commit autoupdate\n\tpip-compile -U requirements/ci.in\n\tpip-compile -U requirements/dev.in\n\tpip-compile -U req"
},
{
"path": "README.md",
"chars": 4858,
"preview": "[](https://badge.fury.io/py/pypdf)\n[ and\nhosted by [Read the Docs](ht"
},
{
"path": "docs/dev/intro.md",
"chars": 4551,
"preview": "# Developer Intro\n\npypdf is a library and hence its users are developers. This document is not for\nthe users, but for pe"
},
{
"path": "docs/dev/pdf-format.md",
"chars": 4770,
"preview": "# The PDF Format\n\nIt is recommended to look in the PDF specification for details and clarifications.\n\n* [PDF Specificati"
},
{
"path": "docs/dev/pypdf-parsing.md",
"chars": 1646,
"preview": "# How pypdf parses PDF files\n\npypdf uses {class}`~pypdf.PdfReader` to parse PDF files.\nThe method {py:meth}`PdfReader.re"
},
{
"path": "docs/dev/pypdf-writing.md",
"chars": 3895,
"preview": "# How pypdf writes PDF files\n\npypdf uses {py:class}`PdfWriter <pypdf.PdfWriter>` to write PDF files. pypdf has\n{py:class"
},
{
"path": "docs/dev/releasing.md",
"chars": 2501,
"preview": "# Releasing\n\nA `pypdf` release contains the following artifacts:\n\n* A new [release on PyPI](https://pypi.org/project/pyp"
},
{
"path": "docs/dev/testing.md",
"chars": 2369,
"preview": "# Testing\n\npypdf uses [`pytest`](https://docs.pytest.org/en/7.1.x/) for testing.\n\nTo run the tests, you need to install "
},
{
"path": "docs/index.rst",
"chars": 2386,
"preview": ".. pypdf documentation main file, created by\n sphinx-quickstart on Thu Apr 7 20:13:19 2022.\n You can adapt this fil"
},
{
"path": "docs/make.bat",
"chars": 800,
"preview": "@ECHO OFF\r\n\r\npushd %~dp0\r\n\r\nREM Command file for Sphinx documentation\r\n\r\nif \"%SPHINXBUILD%\" == \"\" (\r\n\tset SPHINXBUILD=sp"
},
{
"path": "docs/meta/changelog-v1.md",
"chars": 33978,
"preview": "# Changelog of PyPDF2 1.X\n\n## Version 1.28.4, 2022-05-29\n\nBug Fixes (BUG):\n- XmpInformation._converter_date was unusabl"
},
{
"path": "docs/meta/comparisons.md",
"chars": 3746,
"preview": "# pypdf vs X\n\npypdf is a [free] and open source pure-python PDF library capable of\nsplitting, merging, cropping, and tra"
},
{
"path": "docs/meta/faq.md",
"chars": 2946,
"preview": "# Frequently Asked Questions\n\n## How is pypdf related to PyPDF2?\n\nPyPDF2 was a fork from the original pyPdf. After sever"
},
{
"path": "docs/meta/history.md",
"chars": 3529,
"preview": "# History of pypdf\n\n## The Origins: pyPdf (2005-2010)\n\nIn 2005, [Mathieu Fenniak] launched pyPdf \"as a PDF toolkit...\"\nf"
},
{
"path": "docs/meta/migration-1-to-2.md",
"chars": 8071,
"preview": "# Migration Guide: 1.x to 2.x\n\n`PyPDF2<2.0.0` ([docs](https://pypdf2.readthedocs.io/en/1.27.12/meta/history.html))\nis ve"
},
{
"path": "docs/meta/project-governance.md",
"chars": 7385,
"preview": "# Project Governance\n\nThis document describes how the pypdf project is managed. It describes the\ndifferent actors, their"
},
{
"path": "docs/meta/scope-of-pypdf.md",
"chars": 4391,
"preview": "# Scope of pypdf\n\nWhat features should pypdf have and which features will it never have?\n\npypdf aims at simplifying inte"
},
{
"path": "docs/meta/taking-ownership.md",
"chars": 1968,
"preview": "# Taking Ownership of pypdf\n\npypdf is currently maintained by stefan6419846. We want to avoid that\npypdf ever goes unmai"
},
{
"path": "docs/modules/Destination.rst",
"chars": 143,
"preview": "The Destination Class\n---------------------\n\n.. autoclass:: pypdf.generic.Destination\n :members:\n :undoc-members:\n"
},
{
"path": "docs/modules/DocumentInformation.rst",
"chars": 159,
"preview": "The DocumentInformation Class\n-----------------------------\n\n.. autoclass:: pypdf.DocumentInformation\n :members:\n "
},
{
"path": "docs/modules/Field.rst",
"chars": 125,
"preview": "The Field Class\n---------------\n\n.. autoclass:: pypdf.generic.Field\n :members:\n :undoc-members:\n :show-inherita"
},
{
"path": "docs/modules/Fit.rst",
"chars": 119,
"preview": "The Fit Class\n-------------\n\n.. autoclass:: pypdf.generic.Fit\n :members:\n :undoc-members:\n :show-inheritance:\n"
},
{
"path": "docs/modules/PageObject.rst",
"chars": 372,
"preview": "The PageObject Class\n--------------------\n\n.. autoclass:: pypdf._page.PageObject\n :members:\n :undoc-members:\n :"
},
{
"path": "docs/modules/PageRange.rst",
"chars": 129,
"preview": "The PageRange Class\n-------------------\n\n.. autoclass:: pypdf.PageRange\n :members:\n :undoc-members:\n :show-inhe"
},
{
"path": "docs/modules/PaperSize.rst",
"chars": 813,
"preview": "The PaperSize Class\n-------------------\n\n.. autoclass:: pypdf.PaperSize\n :members:\n :undoc-members:\n :show-inhe"
},
{
"path": "docs/modules/PdfDocCommon.rst",
"chars": 360,
"preview": "The PdfDocCommon Class\n----------------------\n\n**PdfDocCommon** is an abstract class which is inherited by :class:`~pypd"
},
{
"path": "docs/modules/PdfReader.rst",
"chars": 245,
"preview": "The PdfReader Class\n-------------------\n\n.. autoclass:: pypdf.PdfReader\n :members:\n :inherited-members:\n :undoc"
},
{
"path": "docs/modules/PdfWriter.rst",
"chars": 251,
"preview": "The PdfWriter Class\n-------------------\n\n.. autoclass:: pypdf.PdfWriter\n :members:\n :inherited-members:\n :undoc"
},
{
"path": "docs/modules/RectangleObject.rst",
"chars": 155,
"preview": "The RectangleObject Class\n-------------------------\n\n.. autoclass:: pypdf.generic.RectangleObject\n :members:\n :und"
},
{
"path": "docs/modules/Transformation.rst",
"chars": 144,
"preview": "The Transformation Class\n------------------------\n\n.. autoclass:: pypdf.Transformation\n :members:\n :undoc-members:"
},
{
"path": "docs/modules/XmpInformation.rst",
"chars": 149,
"preview": "The XmpInformation Class\n-------------------------\n\n.. autoclass:: pypdf.xmp.XmpInformation\n :members:\n :undoc-mem"
},
{
"path": "docs/modules/annotations.rst",
"chars": 138,
"preview": "The annotations module\n----------------------\n\n.. automodule:: pypdf.annotations\n :members:\n :undoc-members:\n :"
},
{
"path": "docs/modules/constants.rst",
"chars": 660,
"preview": "Constants\n---------\n\n.. autoclass:: pypdf.constants.AnnotationFlag\n :members:\n :undoc-members:\n :show-inheritan"
},
{
"path": "docs/modules/errors.rst",
"chars": 101,
"preview": "Errors\n------\n\n.. automodule:: pypdf.errors\n :members:\n :undoc-members:\n :show-inheritance:\n"
},
{
"path": "docs/modules/generic.rst",
"chars": 753,
"preview": "Generic PDF objects\n-------------------\n\n.. automodule:: pypdf.generic\n :members:\n :undoc-members:\n :show-inher"
},
{
"path": "docs/user/add-javascript.md",
"chars": 742,
"preview": "# Adding JavaScript to a PDF\n\nPDF readers vary in the extent they support JavaScript, with some not supporting it at all"
},
{
"path": "docs/user/add-watermark.md",
"chars": 3529,
"preview": "# Adding a Stamp or Watermark to a PDF\n\nAdding stamps or watermarks are two common ways to manipulate PDF files.\nA stamp"
},
{
"path": "docs/user/adding-pdf-annotations.md",
"chars": 7824,
"preview": "# Adding PDF Annotations\n\n```{note}\nBy default, some annotations might be invisible, for example polylines, as the defau"
},
{
"path": "docs/user/cropping-and-transforming.md",
"chars": 7286,
"preview": "# Cropping and Transforming PDFs\n\n```{note}\nJust because content is no longer visible, it is not gone.\nCropping works by"
},
{
"path": "docs/user/encryption-decryption.md",
"chars": 1837,
"preview": "# Encryption and Decryption of PDFs\n\nPDF encryption makes use of [`RC4`](https://en.wikipedia.org/wiki/RC4) and\n[`AES`]("
},
{
"path": "docs/user/extract-images.md",
"chars": 1647,
"preview": "# Extract Images\n\n```{note}\nIn order to use the following code you need to install optional\ndependencies, see [installat"
},
{
"path": "docs/user/extract-text.md",
"chars": 15970,
"preview": "# Extract Text from a PDF\n\nYou can extract text from a PDF:\n\n```{testsetup}\npypdf_test_setup(\"user/extract-text\", {\n "
},
{
"path": "docs/user/file-size.md",
"chars": 4318,
"preview": "# Reduce PDF File Size\n\nThere are multiple ways to reduce the size of a given PDF file. The easiest\none is to remove con"
},
{
"path": "docs/user/forms.md",
"chars": 6189,
"preview": "# Interactions with PDF Forms\n\n## Reading form fields\n\n```{testsetup}\npypdf_test_setup(\"user/forms\", {\n \"form.pdf\": \""
},
{
"path": "docs/user/handle-attachments.md",
"chars": 4042,
"preview": "# Handle Attachments\n\nPDF documents can contain attachments, from time to time named embedded file as well.\n\n## Retrieve"
},
{
"path": "docs/user/handling-outlines.md",
"chars": 6369,
"preview": "# Handling Outlines\n\nPDF outlines - also known as bookmarks - provide a structured navigation panel in PDF readers. `pyp"
},
{
"path": "docs/user/installation.md",
"chars": 2422,
"preview": "# Installation\n\nThere are several ways to install pypdf. The most common option is to use pip.\n\n## pip\n\npypdf requires P"
},
{
"path": "docs/user/merging-pdfs.md",
"chars": 5734,
"preview": "# Merging PDF files\n\n## Basic Example\n\n```{testsetup}\npypdf_test_setup(\"user/merging-pdfs\", {\n \"example.pdf\": \"../res"
},
{
"path": "docs/user/metadata.md",
"chars": 7734,
"preview": "# Metadata\n\nPDF files can have two types of metadata: \"Regular\" and XMP ones. They can both exist at the same time.\n\n## "
},
{
"path": "docs/user/pdf-version-support.md",
"chars": 1964,
"preview": "# PDF Version Support\n\nPDF comes in the following versions:\n\n* 1993: 1.0\n* 1994: 1.1\n* 1996: 1.2\n* 1999: 1.3\n* 2001: 1.4"
},
{
"path": "docs/user/pdfa-compliance.md",
"chars": 4759,
"preview": "# PDF/A Compliance\n\nPDF/A is a specialized, ISO-standardized version of the Portable Document Format\n(PDF) specifically "
},
{
"path": "docs/user/post-processing-in-text-extraction.md",
"chars": 3558,
"preview": "# Post-Processing of Text Extraction\n\nPost-processing can recognizably improve the results of text extraction. It is,\nho"
},
{
"path": "docs/user/reading-pdf-annotations.md",
"chars": 2637,
"preview": "# Reading PDF Annotations\n\nPDF 2.0 defines the following annotation types:\n\n* Text\n* Link\n* FreeText\n* Line\n* Square\n* C"
},
{
"path": "docs/user/robustness.md",
"chars": 1387,
"preview": "# Robustness and strict=False\n\nPDF is [specified in various versions](https://pdfa.org/resource/pdf-specification-archiv"
},
{
"path": "docs/user/security.md",
"chars": 3794,
"preview": "# Security\n\nWe strive to provide a library with secure defaults.\n\n## Configuration\n\n### Filters\n\n*pypdf* currently emplo"
},
{
"path": "docs/user/streaming-data.md",
"chars": 2617,
"preview": "# Streaming Data with pypdf\n\nIn some cases, you might want to avoid saving things explicitly as a file\nto disk, e.g. whe"
},
{
"path": "docs/user/suppress-warnings.md",
"chars": 2969,
"preview": "# Exceptions, Warnings, and Log messages\n\npypdf makes use of three mechanisms to show if something went wrong:\n\n* **Exce"
},
{
"path": "docs/user/viewer-preferences.md",
"chars": 2810,
"preview": "# Adding Viewer Preferences\n\nIt is possible to set viewer preferences of a PDF file.\n§12.2 of the [PDF 1.7 specification"
},
{
"path": "make_release.py",
"chars": 10011,
"preview": "\"\"\"Internal tool to update the CHANGELOG.\"\"\"\n\nimport json\nimport subprocess\nimport urllib.request\nfrom dataclasses impor"
},
{
"path": "pypdf/__init__.py",
"chars": 1283,
"preview": "\"\"\"\npypdf is a free and open-source pure-python PDF library capable of splitting,\nmerging, cropping, and transforming th"
},
{
"path": "pypdf/_cmap.py",
"chars": 12571,
"preview": "import binascii\nfrom binascii import Error as BinasciiError\nfrom binascii import unhexlify\nfrom math import ceil\nfrom ty"
},
{
"path": "pypdf/_codecs/__init__.py",
"chars": 1373,
"preview": "from .adobe_glyphs import adobe_glyphs\nfrom .pdfdoc import _pdfdoc_encoding\nfrom .std import _std_encoding\nfrom .symbol "
},
{
"path": "pypdf/_codecs/_codecs.py",
"chars": 10555,
"preview": "\"\"\"\nThis module is for codecs only.\n\nWhile the codec implementation can contain details of the PDF specification,\nthe mo"
},
{
"path": "pypdf/_codecs/adobe_glyphs.py",
"chars": 447211,
"preview": "# https://raw.githubusercontent.com/adobe-type-tools/agl-aglfn/master/glyphlist.txt\n\n# converted manually to python\n# Ex"
},
{
"path": "pypdf/_codecs/core_font_metrics.py",
"chars": 114968,
"preview": "# This file is based upon the 14 core AFM files provided by Adobe/Macromedia at\n# https://download.macromedia.com/pub/de"
},
{
"path": "pypdf/_codecs/pdfdoc.py",
"chars": 4269,
"preview": "# PDFDocEncoding Character Set: Table D.2 of PDF Reference 1.7\n# C.1 Predefined encodings sorted by character name of an"
},
{
"path": "pypdf/_codecs/std.py",
"chars": 2517,
"preview": "_std_encoding = [\n \"\\x00\",\n \"\\x01\",\n \"\\x02\",\n \"\\x03\",\n \"\\x04\",\n \"\\x05\",\n \"\\x06\",\n \"\\x07\",\n \"\\"
},
{
"path": "pypdf/_codecs/symbol.py",
"chars": 3734,
"preview": "# manually generated from https://www.unicode.org/Public/MAPPINGS/VENDORS/ADOBE/symbol.txt\n_symbol_encoding = [\n \"\\u0"
},
{
"path": "pypdf/_codecs/zapfding.py",
"chars": 3742,
"preview": "# manually generated from https://www.unicode.org/Public/MAPPINGS/VENDORS/ADOBE/zdingbat.txt\n\n_zapfding_encoding = [\n "
},
{
"path": "pypdf/_crypt_providers/__init__.py",
"chars": 3054,
"preview": "# Copyright (c) 2023, exiledkingcc\n# All rights reserved.\n#\n# Redistribution and use in source and binary forms, with or"
},
{
"path": "pypdf/_crypt_providers/_base.py",
"chars": 1711,
"preview": "# Copyright (c) 2023, exiledkingcc\n# All rights reserved.\n#\n# Redistribution and use in source and binary forms, with or"
},
{
"path": "pypdf/_crypt_providers/_cryptography.py",
"chars": 4557,
"preview": "# Copyright (c) 2023, exiledkingcc\n# All rights reserved.\n#\n# Redistribution and use in source and binary forms, with or"
},
{
"path": "pypdf/_crypt_providers/_fallback.py",
"chars": 3334,
"preview": "# Copyright (c) 2023, exiledkingcc\n# All rights reserved.\n#\n# Redistribution and use in source and binary forms, with or"
},
{
"path": "pypdf/_crypt_providers/_pycryptodome.py",
"chars": 3381,
"preview": "# Copyright (c) 2023, exiledkingcc\n# All rights reserved.\n#\n# Redistribution and use in source and binary forms, with or"
},
{
"path": "pypdf/_doc_common.py",
"chars": 54387,
"preview": "# Copyright (c) 2006, Mathieu Fenniak\n# Copyright (c) 2007, Ashish Kulkarni <kulkarni.ashish@gmail.com>\n# Copyright (c) "
},
{
"path": "pypdf/_encryption.py",
"chars": 49100,
"preview": "# Copyright (c) 2022, exiledkingcc\n# All rights reserved.\n#\n# Redistribution and use in source and binary forms, with or"
},
{
"path": "pypdf/_font.py",
"chars": 14255,
"preview": "from collections.abc import Sequence\nfrom dataclasses import dataclass, field\nfrom typing import Any, Union, cast\n\nfrom "
},
{
"path": "pypdf/_page.py",
"chars": 90186,
"preview": "# Copyright (c) 2006, Mathieu Fenniak\n# Copyright (c) 2007, Ashish Kulkarni <kulkarni.ashish@gmail.com>\n#\n# All rights r"
},
{
"path": "pypdf/_page_labels.py",
"chars": 8625,
"preview": "\"\"\"\nPage labels are shown by PDF viewers as \"the page number\".\n\nA page has a numeric index, starting at 0. Additionally,"
},
{
"path": "pypdf/_protocols.py",
"chars": 2123,
"preview": "\"\"\"Helpers for working with PDF types.\"\"\"\n\nfrom abc import abstractmethod\nfrom pathlib import Path\nfrom typing import IO"
},
{
"path": "pypdf/_reader.py",
"chars": 57168,
"preview": "# Copyright (c) 2006, Mathieu Fenniak\n# Copyright (c) 2007, Ashish Kulkarni <kulkarni.ashish@gmail.com>\n#\n# All rights r"
},
{
"path": "pypdf/_text_extraction/__init__.py",
"chars": 8515,
"preview": "\"\"\"\nCode related to text extraction.\n\nSome parts are still in _page.py. In doubt, they will stay there.\n\"\"\"\n\nimport math"
},
{
"path": "pypdf/_text_extraction/_layout_mode/__init__.py",
"chars": 340,
"preview": "\"\"\"Layout mode text extraction extension for pypdf\"\"\"\nfrom ..._font import Font\nfrom ._fixed_width_page import (\n fix"
},
{
"path": "pypdf/_text_extraction/_layout_mode/_fixed_width_page.py",
"chars": 15370,
"preview": "\"\"\"Extract PDF text preserving the layout of the source PDF\"\"\"\n\nfrom collections.abc import Iterator\nfrom itertools impo"
},
{
"path": "pypdf/_text_extraction/_layout_mode/_text_state_manager.py",
"chars": 8221,
"preview": "\"\"\"manage the PDF transform stack during \"layout\" mode text extraction\"\"\"\n\nfrom collections import ChainMap, Counter\nfro"
},
{
"path": "pypdf/_text_extraction/_layout_mode/_text_state_params.py",
"chars": 5481,
"preview": "\"\"\"A dataclass that captures the CTM and Text State for a tj operation\"\"\"\n\nimport math\nfrom dataclasses import dataclass"
},
{
"path": "pypdf/_text_extraction/_text_extractor.py",
"chars": 14386,
"preview": "# Copyright (c) 2006, Mathieu Fenniak\n# Copyright (c) 2007, Ashish Kulkarni <kulkarni.ashish@gmail.com>\n#\n# All rights r"
},
{
"path": "pypdf/_utils.py",
"chars": 20814,
"preview": "# Copyright (c) 2006, Mathieu Fenniak\n# All rights reserved.\n#\n# Redistribution and use in source and binary forms, with"
},
{
"path": "pypdf/_version.py",
"chars": 22,
"preview": "__version__ = \"6.9.1\"\n"
},
{
"path": "pypdf/_writer.py",
"chars": 131002,
"preview": "# Copyright (c) 2006, Mathieu Fenniak\n# Copyright (c) 2007, Ashish Kulkarni <kulkarni.ashish@gmail.com>\n#\n# All rights r"
},
{
"path": "pypdf/annotations/__init__.py",
"chars": 990,
"preview": "\"\"\"\nPDF specifies several annotation types which pypdf makes available here.\n\nThe names of the annotations and their att"
},
{
"path": "pypdf/annotations/_base.py",
"chars": 961,
"preview": "from abc import ABC\n\nfrom ..constants import AnnotationFlag\nfrom ..generic import NameObject, NumberObject\nfrom ..generi"
},
{
"path": "pypdf/annotations/_markup_annotations.py",
"chars": 11862,
"preview": "import sys\nimport uuid\nfrom abc import ABC\nfrom typing import Any, Literal, Optional, Union\n\nfrom ..constants import Ann"
},
{
"path": "pypdf/annotations/_non_markup_annotations.py",
"chars": 3649,
"preview": "from typing import TYPE_CHECKING, Any, Optional, Union\n\nfrom ..generic._base import (\n BooleanObject,\n NameObject,"
},
{
"path": "pypdf/constants.py",
"chars": 23623,
"preview": "\"\"\"Various constants, enums, and flags to aid readability.\"\"\"\n\nfrom enum import Enum, IntFlag, auto, unique\n\n\nclass StrE"
},
{
"path": "pypdf/errors.py",
"chars": 1947,
"preview": "\"\"\"\nAll errors/exceptions pypdf raises and all of the warnings it uses.\n\nPlease note that broken PDF files might cause o"
},
{
"path": "pypdf/filters.py",
"chars": 30413,
"preview": "# Copyright (c) 2006, Mathieu Fenniak\n# All rights reserved.\n#\n# Redistribution and use in source and binary forms, with"
},
{
"path": "pypdf/generic/__init__.py",
"chars": 3468,
"preview": "# Copyright (c) 2006, Mathieu Fenniak\n# All rights reserved.\n#\n# Redistribution and use in source and binary forms, with"
},
{
"path": "pypdf/generic/_appearance_stream.py",
"chars": 26834,
"preview": "import re\nfrom dataclasses import dataclass\nfrom enum import IntEnum\nfrom typing import Any, Optional, Union, cast\n\nfrom"
},
{
"path": "pypdf/generic/_base.py",
"chars": 32755,
"preview": "# Copyright (c) 2006, Mathieu Fenniak\n# All rights reserved.\n#\n# Redistribution and use in source and binary forms, with"
},
{
"path": "pypdf/generic/_data_structures.py",
"chars": 65753,
"preview": "# Copyright (c) 2006, Mathieu Fenniak\n# All rights reserved.\n#\n# Redistribution and use in source and binary forms, with"
},
{
"path": "pypdf/generic/_files.py",
"chars": 16238,
"preview": "from __future__ import annotations\n\nimport bisect\nfrom functools import cached_property\nfrom typing import TYPE_CHECKING"
},
{
"path": "pypdf/generic/_fit.py",
"chars": 5515,
"preview": "from typing import Any, Optional, Union\n\nfrom ._base import is_null_or_none\n\n\nclass Fit:\n def __init__(\n self,"
},
{
"path": "pypdf/generic/_image_inline.py",
"chars": 12787,
"preview": "# Copyright (c) 2024, pypdf contributors\n# All rights reserved.\n#\n# Redistribution and use in source and binary forms, w"
},
{
"path": "pypdf/generic/_image_xobject.py",
"chars": 21944,
"preview": "\"\"\"Functions to convert an image XObject to an image\"\"\"\n\nimport sys\nfrom io import BytesIO\nfrom typing import Any, Liter"
},
{
"path": "pypdf/generic/_link.py",
"chars": 5799,
"preview": "# All rights reserved.\n#\n# Redistribution and use in source and binary forms, with or without\n# modification, are permit"
},
{
"path": "pypdf/generic/_outline.py",
"chars": 1094,
"preview": "from typing import Union\n\nfrom .._utils import StreamType, deprecation_no_replacement\nfrom ._base import NameObject\nfrom"
},
{
"path": "pypdf/generic/_rectangle.py",
"chars": 3785,
"preview": "from typing import Any, Union\n\nfrom ._base import FloatObject, NumberObject\nfrom ._data_structures import ArrayObject\n\n\n"
},
{
"path": "pypdf/generic/_utils.py",
"chars": 7258,
"preview": "import codecs\nfrom typing import Union\n\nfrom .._codecs import _pdfdoc_encoding\nfrom .._utils import StreamType, logger_w"
},
{
"path": "pypdf/generic/_viewerpref.py",
"chars": 6774,
"preview": "# Copyright (c) 2023, Pubpub-ZZ\n#\n# All rights reserved.\n#\n# Redistribution and use in source and binary forms, with or "
},
{
"path": "pypdf/pagerange.py",
"chars": 7108,
"preview": "\"\"\"\nRepresentation and utils for ranges of PDF file pages.\n\nCopyright (c) 2014, Steve Witham <switham_github@mac-guyver."
},
{
"path": "pypdf/papersizes.py",
"chars": 1413,
"preview": "\"\"\"Helper to get paper sizes.\"\"\"\n\nfrom typing import NamedTuple\n\n\nclass Dimensions(NamedTuple):\n width: int\n heigh"
},
{
"path": "pypdf/py.typed",
"chars": 0,
"preview": ""
},
{
"path": "pypdf/types.py",
"chars": 1915,
"preview": "\"\"\"Helpers for working with PDF types.\"\"\"\n\nimport sys\nfrom typing import Literal, Union\n\nif sys.version_info[:2] >= (3, "
},
{
"path": "pypdf/xmp.py",
"chars": 29236,
"preview": "\"\"\"\nAnything related to Extensible Metadata Platform (XMP) metadata.\n\nhttps://en.wikipedia.org/wiki/Extensible_Metadata_"
},
{
"path": "pyproject.toml",
"chars": 9128,
"preview": "[build-system]\nrequires = [\"flit_core >=3.11,<4\"]\nbuild-backend = \"flit_core.buildapi\"\n\n[project]\nname = \"pypdf\"\nauthors"
},
{
"path": "requirements/ci-3.11.txt",
"chars": 1560,
"preview": "#\n# This file is autogenerated by pip-compile with Python 3.11\n# by the following command:\n#\n# pip-compile --output-f"
},
{
"path": "requirements/ci.in",
"chars": 168,
"preview": "coverage\nfpdf2\nmypy\npillow\ncryptography\npytest\npytest-benchmark\npytest-socket\npytest-timeout\npytest-xdist\npytest-cov\n# r"
},
{
"path": "requirements/ci.txt",
"chars": 1450,
"preview": "#\n# This file is autogenerated by pip-compile with Python 3.9\n# by the following command:\n#\n# pip-compile requirement"
},
{
"path": "requirements/dev.in",
"chars": 50,
"preview": "pillow\npip-tools\npre-commit\npytest-cov\nflit\nwheel\n"
},
{
"path": "requirements/dev.txt",
"chars": 1646,
"preview": "#\n# This file is autogenerated by pip-compile with Python 3.9\n# by the following command:\n#\n# pip-compile requirement"
},
{
"path": "requirements/docs.in",
"chars": 36,
"preview": "sphinx\nsphinx_rtd_theme\nmyst_parser\n"
},
{
"path": "requirements/docs.txt",
"chars": 1505,
"preview": "#\n# This file is autogenerated by pip-compile with Python 3.10\n# by the following command:\n#\n# pip-compile requiremen"
},
{
"path": "resources/010-pdflatex-forms.txt",
"chars": 104,
"preview": "Name\n\nCheck\n\nSubmit\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n 1"
},
{
"path": "resources/AEO.1172.layout.rot180.txt",
"chars": 1996,
"preview": "9 1of Page "
},
{
"path": "resources/AEO.1172.layout.txt",
"chars": 1076,
"preview": " STATUS: FNL\n"
},
{
"path": "resources/Claim Maker Alerts Guide_pg2.layout.txt",
"chars": 6005,
"preview": " Updated System Responses for Common Scenarios\n\n\n Scenario Before Change Af"
},
{
"path": "resources/Epic.Page.layout.txt",
"chars": 1056,
"preview": "All Postprocedure Notes\n Last edited 10/11/23 0919 by Danny Chaung, DO\n Date of Service 10/11/23 0918\n Status: Sig"
},
{
"path": "resources/afm_to_dataclass.py",
"chars": 7748,
"preview": "# ruff: noqa: T201, INP001, D100\n# Use this file to generate Font dataclasses for the 14 Adobe Core fonts.\nimport re\nimp"
},
{
"path": "resources/crazyones.txt",
"chars": 898,
"preview": "The Crazy Ones\nOctober 14, 1998\nHeres to the crazy ones. The misfits. The rebels. The troublemakers.\nThe round pegs in th"
},
{
"path": "resources/crazyones_layout_vertical_space.txt",
"chars": 983,
"preview": "The Crazy Ones\nOctober 14, 1998\n\n Heres to the crazy ones. The misfits. The rebels. The troublemakers.\n The round"
},
{
"path": "resources/crazyones_layout_vertical_space_font_height_weight.txt",
"chars": 989,
"preview": "The Crazy Ones\nOctober 14, 1998\n\n Heres to the crazy ones. The misfits. The rebels. The troublemakers.\n The round"
},
{
"path": "resources/jpeg.txt",
"chars": 82804,
"preview": "ffd8ffe000104a46494600010100000100010000ffdb0043000302020302020303030304030304050805050404050a070706080c0a0c0c0b0a0b0b0d"
},
{
"path": "resources/multicolumn-lorem-ipsum.txt",
"chars": 4404,
"preview": " Two-Column Document with Lorem Ipsum\n\n Y"
},
{
"path": "resources/toy.layout.txt",
"chars": 145,
"preview": "AWAY again1\n AWAY again2\n\n\n Something[cited]\n\n Single quote operator\n Double quote operat"
},
{
"path": "tests/__init__.py",
"chars": 4600,
"preview": "import concurrent.futures\nimport os\nimport ssl\nimport sys\nimport urllib.request\nfrom pathlib import Path\nfrom typing imp"
},
{
"path": "tests/bench.py",
"chars": 6622,
"preview": "\"\"\"\nBenchmark the speed of pypdf.\n\nThe results are on https://py-pdf.github.io/pypdf/dev/bench/\nPlease keep in mind that"
},
{
"path": "tests/conftest.py",
"chars": 381,
"preview": "\"\"\"Fixtures that are available automatically for all tests.\"\"\"\n\nimport uuid\n\nimport pytest\n\n\n@pytest.fixture(scope=\"sess"
},
{
"path": "tests/example_files.yaml",
"chars": 7507,
"preview": "- local_filename: 2201.00214.pdf\n url: https://arxiv.org/pdf/2201.00214.pdf\n- local_filename: ASurveyofImageClassificat"
},
{
"path": "tests/generic/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "tests/generic/test_base.py",
"chars": 1521,
"preview": "\"\"\"Test the pypdf.generic._base module.\"\"\"\nfrom io import BytesIO\n\nimport pytest\n\nfrom pypdf import PdfReader, PdfWriter"
},
{
"path": "tests/generic/test_data_structures.py",
"chars": 9423,
"preview": "\"\"\"Test the pypdf.generic._data_structures module.\"\"\"\nimport os\nimport subprocess\nimport sys\nfrom io import BytesIO\nfrom"
},
{
"path": "tests/generic/test_files.py",
"chars": 22081,
"preview": "\"\"\"Test the pypdf.generic._files module.\"\"\"\nimport datetime\nimport shutil\nimport subprocess\nfrom io import BytesIO\n\nimpo"
},
{
"path": "tests/generic/test_image_inline.py",
"chars": 2722,
"preview": "\"\"\"Test the pypdf.generic._image_inline module.\"\"\"\nfrom io import BytesIO\n\nimport pytest\n\nfrom pypdf import PdfReader\nfr"
},
{
"path": "tests/generic/test_image_xobject.py",
"chars": 9312,
"preview": "\"\"\"Test the pypdf.generic._image_xobject module.\"\"\"\nfrom io import BytesIO\n\nimport pytest\nfrom PIL import Image\n\nfrom py"
},
{
"path": "tests/generic/test_link.py",
"chars": 1634,
"preview": "\"\"\"Test the pypdf.generic._link module.\"\"\"\nfrom io import BytesIO\n\nimport pytest\n\nfrom pypdf import PageObject, PdfReade"
},
{
"path": "tests/scripts/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "tests/scripts/data/commits__version_4_0_1.json",
"chars": 26833,
"preview": "[\n {\n \"sha\": \"b7bfd0d7eddfd0865a94cc9e7027df6596242cf7\",\n \"node_id\": \"C_kwDOAC-ZndoAKGI3YmZkMGQ3ZWRkZmQwODY1YTk0Y"
},
{
"path": "tests/scripts/test_example_files.py",
"chars": 468,
"preview": "\"\"\"Tests related to the example files.\"\"\"\nfrom operator import itemgetter\nfrom pathlib import Path\n\nfrom tests import re"
},
{
"path": "tests/scripts/test_make_release.py",
"chars": 6351,
"preview": "\"\"\"Test the `make_release.py` script.\"\"\"\nimport sys\nfrom pathlib import Path\nfrom unittest import mock\n\nimport pytest\n\nD"
},
{
"path": "tests/test_annotations.py",
"chars": 14538,
"preview": "\"\"\"Test the pypdf.annotations submodule.\"\"\"\n\nfrom io import BytesIO\nfrom pathlib import Path\n\nimport pytest\n\nfrom pypdf "
},
{
"path": "tests/test_appearance_stream.py",
"chars": 3786,
"preview": "\"\"\"Test the pypdf.generic._appearance_stream module.\"\"\"\n\nfrom pypdf.generic._appearance_stream import BaseStreamConfig, "
},
{
"path": "tests/test_cmap.py",
"chars": 13736,
"preview": "\"\"\"Test the pypdf_cmap module.\"\"\"\nfrom io import BytesIO\n\nimport pytest\n\nfrom pypdf import PdfReader, PdfWriter\nfrom pyp"
},
{
"path": "tests/test_codecs.py",
"chars": 11495,
"preview": "\"\"\"Test LZW-related code.\"\"\"\nfrom io import BytesIO\n\nimport pytest\n\nfrom pypdf import PdfReader\nfrom pypdf._codecs._code"
},
{
"path": "tests/test_constants.py",
"chars": 3757,
"preview": "\"\"\"Test the pypdf.constants module.\"\"\"\nimport re\nfrom typing import Callable\n\nimport pytest\n\nfrom pypdf.constants import"
},
{
"path": "tests/test_doc_common.py",
"chars": 17715,
"preview": "\"\"\"Test the pypdf._doc_common module.\"\"\"\nimport itertools\nimport re\nimport shutil\nimport subprocess\nfrom io import Bytes"
},
{
"path": "tests/test_encryption.py",
"chars": 15321,
"preview": "\"\"\"Test the pypdf._encryption module.\"\"\"\nimport secrets\nfrom io import BytesIO\n\nimport pytest\n\nimport pypdf\nfrom pypdf i"
},
{
"path": "tests/test_filters.py",
"chars": 40439,
"preview": "\"\"\"Test the pypdf.filters module.\"\"\"\nimport os\nimport string\nimport subprocess\nimport sys\nimport zlib\nfrom io import Byt"
},
{
"path": "tests/test_font.py",
"chars": 1166,
"preview": "\"\"\"Test font-related functionality.\"\"\"\n\nfrom pypdf._font import Font\nfrom pypdf.generic import DictionaryObject, NameObj"
},
{
"path": "tests/test_forms.py",
"chars": 734,
"preview": "\"\"\"Test form-related functionality. Separate file to keep overview.\"\"\"\n\nfrom io import BytesIO\n\nimport pytest\n\nfrom pypd"
},
{
"path": "tests/test_generic.py",
"chars": 42998,
"preview": "\"\"\"Test the pypdf.generic module.\"\"\"\n\nimport codecs\nimport gc\nimport weakref\nfrom base64 import a85encode\nfrom copy impo"
},
{
"path": "tests/test_images.py",
"chars": 24174,
"preview": "\"\"\"\nTests which ensure that image extraction works properly go here.\n\nTypically, tests in here should compare the extrac"
},
{
"path": "tests/test_javascript.py",
"chars": 1558,
"preview": "\"\"\"Test topics around the usage of JavaScript in PDF documents.\"\"\"\nfrom typing import Any\n\nimport pytest\n\nfrom pypdf imp"
},
{
"path": "tests/test_merger.py",
"chars": 16027,
"preview": "\"\"\"Test merging PDF functionality.\"\"\"\nfrom io import BytesIO\nfrom pathlib import Path\n\nimport pytest\n\nimport pypdf\nfrom "
},
{
"path": "tests/test_page.py",
"chars": 52595,
"preview": "\"\"\"Test the pypdf._page module.\"\"\"\nimport json\nimport math\nimport os\nimport re\nimport shutil\nimport subprocess\nimport sy"
},
{
"path": "tests/test_page_labels.py",
"chars": 4764,
"preview": "\"\"\"Test the pypdf._page_labels module.\"\"\"\nfrom io import BytesIO\n\nimport pytest\n\nfrom pypdf import PdfReader\nfrom pypdf."
},
{
"path": "tests/test_pagerange.py",
"chars": 3649,
"preview": "\"\"\"Test the pypdf.pagerange module.\"\"\"\nimport pytest\n\nfrom pypdf.pagerange import PageRange, ParseError, parse_filename_"
},
{
"path": "tests/test_papersizes.py",
"chars": 1197,
"preview": "\"\"\"Test the pypdf.papersizes module.\"\"\"\nimport pytest\n\nfrom pypdf import papersizes\n\n\ndef test_din_a0_paper_size():\n "
},
{
"path": "tests/test_pdfa.py",
"chars": 1415,
"preview": "\"\"\"Ensure that pypdf doesn't break PDF/A compliance.\"\"\"\n\nfrom io import BytesIO\nfrom pathlib import Path\nfrom typing imp"
},
{
"path": "tests/test_protocols.py",
"chars": 411,
"preview": "\"\"\"Test the pypdf._protocols module.\"\"\"\nfrom pypdf._protocols import PdfObjectProtocol\n\n\nclass IPdfObjectProtocol(PdfObj"
}
]
// ... and 7 more files
About this extraction
This page contains the full source code of the py-pdf/pypdf GitHub repository, extracted and formatted as plain text. The extraction includes 207 files (2.4 MB), approximately 647.2k tokens, and a symbol index with 1963 extracted functions, classes, methods, constants, and types.
Extracted by GitExtract, a GitHub repo to text converter. Built by Nikandr Surkov.