Repository: pemistahl/grex
Branch: main
Commit: 99cc34770737
Files: 44
Total size: 513.5 KB
Directory structure:
gitextract_b41iglrb/
├── .editorconfig
├── .github/
│ ├── dependabot.yml
│ └── workflows/
│ ├── python-build.yml
│ ├── release.yml
│ └── rust-build.yml
├── .gitignore
├── Cargo.toml
├── LICENSE
├── README.md
├── README_PYPI.md
├── RELEASE_NOTES.md
├── benches/
│ ├── benchmark.rs
│ └── testcases.txt
├── demo.tape
├── grex.pyi
├── pyproject.toml
├── requirements.txt
├── src/
│ ├── builder.rs
│ ├── char_range.rs
│ ├── cluster.rs
│ ├── component.rs
│ ├── config.rs
│ ├── dfa.rs
│ ├── expression.rs
│ ├── format.rs
│ ├── grapheme.rs
│ ├── lib.rs
│ ├── macros.rs
│ ├── main.rs
│ ├── python.rs
│ ├── quantifier.rs
│ ├── regexp.rs
│ ├── substring.rs
│ ├── unicode_tables/
│ │ ├── decimal.rs
│ │ ├── mod.rs
│ │ ├── space.rs
│ │ └── word.rs
│ └── wasm.rs
└── tests/
├── cli_integration_tests.rs
├── lib_integration_tests.rs
├── property_tests.rs
├── python/
│ └── test_grex.py
├── wasm_browser_tests.rs
└── wasm_node_tests.rs
================================================
FILE CONTENTS
================================================
================================================
FILE: .editorconfig
================================================
# Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either expressed or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Editor configuration, see http://editorconfig.org
root = true
[*.rs]
charset = utf-8
indent_style = space
indent_size = 4
insert_final_newline = true
trim_trailing_whitespace = true
max_line_length = 100
[*.md]
max_line_length = off
trim_trailing_whitespace = false
================================================
FILE: .github/dependabot.yml
================================================
version: 2
updates:
  - package-ecosystem: "cargo"
    directory: "/"
    schedule:
      interval: "daily"
  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "daily"
================================================
FILE: .github/workflows/python-build.yml
================================================
#
# Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either expressed or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
name: Python Build
on:
  push:
    branches:
      - main
    paths:
      - 'Cargo.lock'
      - 'Cargo.toml'
      - 'pyproject.toml'
      - 'requirements.txt'
      - 'src/**'
      - 'tests/**'
      - '**.yml'
  pull_request:
    branches:
      - main
    paths:
      - 'Cargo.lock'
      - 'Cargo.toml'
      - 'pyproject.toml'
      - 'requirements.txt'
      - 'src/**'
      - 'tests/**'
      - '**.yml'
jobs:
  python-build:
    name: Python ${{ matrix.python-version }} on ${{ matrix.name }}
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        os: [ ubuntu-latest, macos-latest, windows-latest ]
        python-version: [ '3.12', '3.13', '3.14' ]
        include:
          - os: ubuntu-latest
            name: Linux 64-Bit
          - os: macos-latest
            name: MacOS 64-Bit
          - os: windows-latest
            name: Windows 64-Bit
    steps:
      - name: Check out repository
        uses: actions/checkout@v6
      - name: Set up Python
        uses: actions/setup-python@v6
        with:
          python-version: ${{ matrix.python-version }}
          cache: 'pip'
      - name: Install maturin and pytest
        run: pip install -r requirements.txt
      - name: Build Python extension
        run: maturin build
      - name: Install Python extension
        run: pip install --find-links=target/wheels grex
      - name: Run Python unit tests
        run: pytest tests/python/test_grex.py
================================================
FILE: .github/workflows/release.yml
================================================
#
# Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either expressed or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
name: Release
on:
  push:
    tags:
      - v1.*
jobs:
  rust-release-build:
    name: ${{ matrix.name }}
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
        include:
          - os: ubuntu-latest
            name: Rust Release Build on Linux
            x86_64-target: x86_64-unknown-linux-musl
            aarch64-target: aarch64-unknown-linux-musl
          - os: macos-latest
            name: Rust Release Build on MacOS
            x86_64-target: x86_64-apple-darwin
            aarch64-target: aarch64-apple-darwin
          - os: windows-latest
            name: Rust Release Build on Windows
            x86_64-target: x86_64-pc-windows-msvc
            aarch64-target: aarch64-pc-windows-msvc
    steps:
      - name: Check out repository
        uses: actions/checkout@v6
      - name: Build x86_64 target in release mode
        uses: houseabsolute/actions-rust-cross@v1
        with:
          target: ${{ matrix.x86_64-target }}
          args: '--release --locked'
      - name: Build aarch64 target in release mode
        uses: houseabsolute/actions-rust-cross@v1
        with:
          target: ${{ matrix.aarch64-target }}
          args: '--release --locked'
      - name: Get latest release version number
        id: get_version
        uses: battila7/get-version-action@v2
      - name: Create x86_64 zip file on Windows
        if: ${{ matrix.os == 'windows-latest' }}
        run: |
          choco install zip
          cd target/${{ matrix.x86_64-target }}/release
          zip grex-${{ steps.get_version.outputs.version }}-${{ matrix.x86_64-target }}.zip grex.exe
          cd ../../..
      - name: Create aarch64 zip file on Windows
        if: ${{ matrix.os == 'windows-latest' }}
        run: |
          cd target/${{ matrix.aarch64-target }}/release
          zip grex-${{ steps.get_version.outputs.version }}-${{ matrix.aarch64-target }}.zip grex.exe
          cd ../../..
      - name: Create x86_64 tar.gz file on Linux and macOS
        if: ${{ matrix.os != 'windows-latest' }}
        run: |
          chmod +x target/${{ matrix.x86_64-target }}/release/grex
          tar -zcf target/${{ matrix.x86_64-target }}/release/grex-${{ steps.get_version.outputs.version }}-${{ matrix.x86_64-target }}.tar.gz -C target/${{ matrix.x86_64-target }}/release grex
      - name: Create aarch64 tar.gz file on Linux and macOS
        if: ${{ matrix.os != 'windows-latest' }}
        run: |
          chmod +x target/${{ matrix.aarch64-target }}/release/grex
          tar -zcf target/${{ matrix.aarch64-target }}/release/grex-${{ steps.get_version.outputs.version }}-${{ matrix.aarch64-target }}.tar.gz -C target/${{ matrix.aarch64-target }}/release grex
      - name: Upload release and assets to GitHub
        uses: svenstaro/upload-release-action@v2
        with:
          repo_token: ${{ secrets.GITHUB_TOKEN }}
          tag: ${{ github.ref }}
          release_name: grex ${{ steps.get_version.outputs.version-without-v }}
          file_glob: true
          file: target/*/release/grex-${{ steps.get_version.outputs.version }}-*.{zip,tar.gz}
  python-linux-release-build:
    name: Python Release Build on Linux and target ${{ matrix.target }}
    needs: rust-release-build
    runs-on: ubuntu-latest
    strategy:
      matrix:
        target: [ x86_64, aarch64 ]
        linux: [ auto, musllinux_1_2 ]
    steps:
      - name: Check out repository
        uses: actions/checkout@v6
      - name: Build wheels
        uses: PyO3/maturin-action@v1
        with:
          target: ${{ matrix.target }}
          args: --release --out dist -i 3.12 3.13 3.14 pypy3.11
          sccache: 'true'
          manylinux: ${{ matrix.linux }}
      - name: Upload wheels
        uses: actions/upload-artifact@v5
        with:
          name: linux-${{ matrix.linux }}-${{ matrix.target }}-wheels
          path: dist
  python-windows-release-build:
    name: Python Release Build on Windows and target ${{ matrix.target }}
    needs: rust-release-build
    runs-on: windows-latest
    strategy:
      matrix:
        target: [ x86_64, aarch64 ]
    steps:
      - name: Check out repository
        uses: actions/checkout@v6
      - name: Build wheels
        uses: PyO3/maturin-action@v1
        with:
          target: ${{ matrix.target }}
          args: --release --out dist -i 3.12 3.13 3.14
          sccache: 'true'
      - name: Upload wheels
        uses: actions/upload-artifact@v5
        with:
          name: windows-${{ matrix.target }}-wheels
          path: dist
  python-macos-release-build:
    name: Python Release Build on MacOS and target ${{ matrix.target }}
    needs: rust-release-build
    runs-on: macos-latest
    strategy:
      matrix:
        target: [ x86_64, aarch64 ]
    steps:
      - name: Check out repository
        uses: actions/checkout@v6
      - name: Build wheels
        uses: PyO3/maturin-action@v1
        with:
          target: ${{ matrix.target }}
          args: --release --out dist -i 3.12 3.13 3.14 pypy3.11
          sccache: 'true'
      - name: Upload wheels
        uses: actions/upload-artifact@v5
        with:
          name: macos-${{ matrix.target }}-wheels
          path: dist
  python-release-upload:
    name: Publish wheels to PyPI
    needs: [ python-linux-release-build, python-windows-release-build, python-macos-release-build ]
    runs-on: ubuntu-latest
    steps:
      - name: Download wheels from previous jobs
        uses: actions/download-artifact@v6
        with:
          path: wheels
          merge-multiple: true
      - name: Upload to PyPI
        uses: PyO3/maturin-action@v1
        env:
          MATURIN_PYPI_TOKEN: ${{ secrets.PYPI_API_TOKEN }}
        with:
          command: upload
          args: --skip-existing wheels/*.whl
  rust-release-upload:
    name: Upload to crates.io
    needs: [ python-linux-release-build, python-windows-release-build, python-macos-release-build ]
    runs-on: ubuntu-latest
    steps:
      - name: Check out repository
        uses: actions/checkout@v6
      - name: Upload release to crates.io
        uses: katyo/publish-crates@v2
        with:
          registry-token: ${{ secrets.CARGO_REGISTRY_TOKEN }}
================================================
FILE: .github/workflows/rust-build.yml
================================================
#
# Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either expressed or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
name: Rust Build
on:
  push:
    branches:
      - main
    paths:
      - 'Cargo.lock'
      - 'Cargo.toml'
      - 'src/**'
      - 'tests/**'
      - '**.yml'
  pull_request:
    branches:
      - main
    paths:
      - 'Cargo.lock'
      - 'Cargo.toml'
      - 'src/**'
      - 'tests/**'
      - '**.yml'
jobs:
  rust-build:
    name: Rust on ${{ matrix.name }}
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
        include:
          - os: ubuntu-latest
            name: Linux 64-Bit
            target: x86_64-unknown-linux-musl
          - os: macos-latest
            name: MacOS 64-Bit
            target: x86_64-apple-darwin
            env:
              MACOSX_DEPLOYMENT_TARGET: 10.7
          - os: windows-latest
            name: Windows 64-Bit
            target: x86_64-pc-windows-msvc
    steps:
      - name: Check out repository
        uses: actions/checkout@v6
      - name: Add rustup target
        run: rustup target add ${{ matrix.target }}
      - name: Store or retrieve cargo caches
        uses: actions/cache@v4
        with:
          path: |
            ~/.cargo/bin/
            ~/.cargo/registry/index/
            ~/.cargo/registry/cache/
            ~/.cargo/git/db/
            target/
          key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }}
      - name: Build target in debug mode
        run: cargo build --target ${{ matrix.target }} --locked
      - name: Test target in debug mode
        run: cargo test --target ${{ matrix.target }}
      - name: Check Clippy lints
        run: cargo clippy --target ${{ matrix.target }} -- -Dwarnings
  wasm-build:
    name: WASM Build
    needs: rust-build
    runs-on: macos-latest
    steps:
      - name: Check out repository
        uses: actions/checkout@v6
      - name: Install wasm-pack
        run: curl https://rustwasm.github.io/wasm-pack/installer/init.sh -sSf | sh
      - name: Install Firefox and Geckodriver # not available anymore in macos-latest
        run: |
          brew install --cask firefox
          brew install geckodriver
      #- name: Enable Safari web driver
      #  run: sudo safaridriver --enable
      - name: Run WASM integration tests on NodeJS
        run: wasm-pack test --node -- --no-default-features
      - name: Run WASM integration tests in Chrome
        run: wasm-pack test --headless --chrome -- --no-default-features
      - name: Run WASM integration tests in Firefox
        run: wasm-pack test --headless --firefox -- --no-default-features
      # Safari WASM tests not working, reason unclear
      # Increasing driver timeout does not seem to work
      # https://github.com/wasm-bindgen/wasm-bindgen/pull/4320
      #- name: Run WASM integration tests in Safari
      #  env:
      #    WASM_BINDGEN_TEST_DRIVER_TIMEOUT: 10
      #  run: wasm-pack test --headless --safari -- --no-default-features
  coverage-report:
    name: Coverage Report
    needs: rust-build
    if: ${{ github.event_name == 'push' }}
    runs-on: ubuntu-latest
    container:
      image: xd009642/tarpaulin:develop-nightly
      options: --security-opt seccomp=unconfined
    steps:
      - name: Check out repository
        uses: actions/checkout@v6
      - name: Generate coverage report
        run: cargo +nightly tarpaulin --ignore-config --ignore-panics --ignore-tests --exclude-files src/python.rs src/main.rs src/wasm.rs --verbose --timeout 900 --out xml
      - name: Workaround for codecov/feedback#263
        run: git config --global --add safe.directory "$GITHUB_WORKSPACE"
      - name: Upload coverage report
        uses: codecov/codecov-action@v4
        with:
          token: ${{ secrets.CODECOV_TOKEN }}
          fail_ci_if_error: true
================================================
FILE: .gitignore
================================================
# Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either expressed or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
/pkg/
/target/
**/*.rs.bk
.idea
.project
.c9/
*.launch
.settings/
.metadata/
.venv
*.sublime-workspace
bin/
tmp/
out/
*.iml
*.ipr
*.iws
*.bak
*.tmp
*.class
*.html
.buildpath
.classpath
.vscode/*
!.vscode/settings.json
!.vscode/tasks.json
!.vscode/launch.json
!.vscode/extensions.json
.DS_Store
Thumbs.db
$RECYCLE.BIN/
._*
.AppleDouble
.LSOverride
*.lnk
Desktop.ini
ehthumbs.db
*.proptest-regressions
================================================
FILE: Cargo.toml
================================================
#
# Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either expressed or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
[package]
name = "grex"
version = "1.4.6"
authors = ["Peter M. Stahl <pemistahl@gmail.com>"]
description = """
grex generates regular expressions from user-provided test cases.
"""
homepage = "https://github.com/pemistahl/grex"
repository = "https://github.com/pemistahl/grex"
documentation = "https://docs.rs/grex"
license = "Apache-2.0"
readme = "README.md"
edition = "2021"
categories = ["command-line-utilities", "parsing"]
keywords = ["pattern", "regex", "regexp"]
[lib]
crate-type = ["cdylib", "rlib"]
[dependencies]
itertools = "0.14.0"
ndarray = "0.17.1"
petgraph = {version = "0.8.3", default-features = false, features = ["stable_graph"]}
regex = "1.12.2"
unicode-general-category = "1.1.0"
unicode-segmentation = "1.12.0"
[target.'cfg(not(target_family = "wasm"))'.dependencies]
clap = {version = "4.5.53", features = ["derive", "wrap_help"], optional = true}
pyo3 = {version = "0.27.1", optional = true}
[target.'cfg(target_family = "wasm")'.dependencies]
wasm-bindgen = "0.2.105"
[dev-dependencies]
indoc = "2.0.7"
rstest = "0.26.1"
[target.'cfg(not(target_family = "wasm"))'.dev-dependencies]
assert_cmd = "2.1.1"
criterion = "0.7.0"
predicates = "3.1.3"
proptest = "1.9.0"
tempfile = "3.23.0"
[target.'cfg(target_family = "wasm")'.dev-dependencies]
wasm-bindgen-test = "0.3.55"
[features]
default = ["cli"]
cli = ["clap"]
python = ["pyo3"]
[[bench]]
name = "benchmark"
harness = false
[profile.bench]
debug = true
================================================
FILE: LICENSE
================================================
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
================================================
FILE: README.md
================================================
<div align="center">

<br>
[Rust build status](https://github.com/pemistahl/grex/actions/workflows/rust-build.yml)
[Python build status](https://github.com/pemistahl/grex/actions/workflows/python-build.yml)
[Documentation on docs.rs](https://docs.rs/grex)
[Code coverage on Codecov](https://codecov.io/gh/pemistahl/grex)
[Dependency status](https://deps.rs/crate/grex/1.4.6)
[Demo website](https://pemistahl.github.io/grex-js/)
[crates.io downloads](https://crates.io/crates/grex)
[crates.io version](https://crates.io/crates/grex)
[grex on lib.rs](https://lib.rs/crates/grex)

[grex on PyPI](https://pypi.org/project/grex)
[Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0)
[Linux x86_64 download](https://github.com/pemistahl/grex/releases/download/v1.4.6/grex-v1.4.6-x86_64-unknown-linux-musl.tar.gz)
[Linux aarch64 download](https://github.com/pemistahl/grex/releases/download/v1.4.6/grex-v1.4.6-aarch64-unknown-linux-musl.tar.gz)
[macOS x86_64 download](https://github.com/pemistahl/grex/releases/download/v1.4.6/grex-v1.4.6-x86_64-apple-darwin.tar.gz)
[macOS aarch64 download](https://github.com/pemistahl/grex/releases/download/v1.4.6/grex-v1.4.6-aarch64-apple-darwin.tar.gz)
[Windows x86_64 download](https://github.com/pemistahl/grex/releases/download/v1.4.6/grex-v1.4.6-x86_64-pc-windows-msvc.zip)
[Windows aarch64 download](https://github.com/pemistahl/grex/releases/download/v1.4.6/grex-v1.4.6-aarch64-pc-windows-msvc.zip)
</div>
<br>

<br>
## 1. What does this tool do?
*grex* is a library as well as a command-line utility that is meant to simplify the often
complicated and tedious task of creating regular expressions. It does so by automatically
generating a single regular expression from user-provided test cases. The resulting
expression is guaranteed to match the test cases which it was generated from.
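To make the idea of "one expression from many test cases" concrete, here is a deliberately naive, std-only Rust sketch. It is *not* how *grex* works (its source tree includes a `dfa.rs` module, and its output is far more compact). The helpers `escape`, `common_prefix` and `naive_regex` are hypothetical. The sketch escapes regex metacharacters, factors out the shared prefix, and joins the remaining suffixes with `|` inside an anchored non-capturing group. The result therefore matches exactly the given inputs.

```rust
// Hypothetical helpers for illustration only; they are not part of grex's API.

/// Escapes characters that carry a special meaning in regular expressions.
fn escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        if "\\.+*?()|[]{}^$".contains(c) {
            out.push('\\');
        }
        out.push(c);
    }
    out
}

/// Returns the longest prefix (counted in chars) shared by all test cases.
fn common_prefix(cases: &[&str]) -> String {
    let Some(first) = cases.first() else {
        return String::new();
    };
    let mut len = first.chars().count();
    for case in &cases[1..] {
        // Shrink the prefix length to the number of leading chars both strings share.
        len = first
            .chars()
            .zip(case.chars())
            .take(len)
            .take_while(|(a, b)| a == b)
            .count();
    }
    first.chars().take(len).collect()
}

/// Builds an anchored expression matching exactly the given test cases.
/// An empty suffix (one case is the bare prefix) makes the group optional.
fn naive_regex(cases: &[&str]) -> String {
    let prefix = common_prefix(cases);
    let skip = prefix.chars().count();
    let mut suffixes: Vec<String> = cases
        .iter()
        .map(|case| escape(&case.chars().skip(skip).collect::<String>()))
        .collect();
    suffixes.sort();
    suffixes.dedup();
    let optional = suffixes.iter().any(|s| s.is_empty());
    let alternatives: Vec<String> = suffixes.into_iter().filter(|s| !s.is_empty()).collect();
    let body = match (alternatives.len(), optional) {
        (0, _) => String::new(),
        (1, false) => alternatives[0].clone(),
        (_, false) => format!("(?:{})", alternatives.join("|")),
        (_, true) => format!("(?:{})?", alternatives.join("|")),
    };
    format!("^{}{}$", escape(&prefix), body)
}
```

For example, `naive_regex(&["abc", "abd"])` yields `^ab(?:c|d)$`. A smarter generator could express the alternation `c|d` more compactly as the character class `[cd]`.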
This project started as a Rust port of the JavaScript tool
[*regexgen*](https://github.com/devongovett/regexgen) written by
[Devon Govett](https://github.com/devongovett). Although many further useful features
could be added to it, its development apparently ceased several years ago. The plan
is now to add these new features to *grex*, as Rust really shines when it comes to
command-line tools. *grex* offers all features that *regexgen* provides, and more.
The philosophy of this project is to generate, by default, the most specific regular expression
possible: one that matches exactly the given input and nothing else.
With the use of command-line flags (in the CLI tool) or preprocessing methods
(in the library), more generalized expressions can be created.
The produced expressions are [Perl-compatible regular expressions](https://www.pcre.org) which are also
compatible with the regular expression parser in Rust's [*regex* crate](https://crates.io/crates/regex).
Regular expression parsers in other programming languages have not been tested so far,
but they ought to be mostly compatible as well.
## 2. Do I still need to learn to write regexes then?
**Definitely, yes!** Using the standard settings, *grex* produces a regular expression that is guaranteed
to match only the test cases given as input and nothing else.
This has been verified by [property tests](https://github.com/pemistahl/grex/blob/main/tests/property_tests.rs).
However, if the conversion to shorthand character classes such as `\w` is enabled, the resulting regex matches
a much wider range of inputs than the test cases themselves. Knowledge about the consequences of this conversion is essential for finding
a correct regular expression for your business domain.
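The following rough sketch shows why shorthand conversion widens the match. Once every character is replaced by its class, many different inputs collapse onto the same pattern. The function `to_shorthand` is hypothetical and deliberately simplified. It treats only ASCII digits as `\d` and uses `char::is_alphanumeric` as an approximation of `\w`. Unicode-aware engines such as the *regex* crate define these classes more precisely.

```rust
// Hypothetical, simplified helper for illustration; not part of grex's API.
// ASCII digits become `\d`, whitespace becomes `\s`, other alphanumeric
// characters and `_` become `\w`; everything else stays an escaped literal.
fn to_shorthand(s: &str) -> String {
    let mut out = String::new();
    for c in s.chars() {
        if c.is_ascii_digit() {
            out.push_str("\\d");
        } else if c.is_whitespace() {
            out.push_str("\\s");
        } else if c.is_alphanumeric() || c == '_' {
            out.push_str("\\w");
        } else {
            // Keep the literal, escaping regex metacharacters.
            if "\\.+*?()|[]{}^$".contains(c) {
                out.push('\\');
            }
            out.push(c);
        }
    }
    out
}
```

For instance, `"a1 b"` and `"Z7\tq"` both map to `\w\d\s\w`. A regex built from one of them therefore also matches the other, and many more strings besides.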
*grex* uses an algorithm that tries to find the shortest possible regex for the given test cases.
Very often, though, the resulting expression is still longer or more complex than it needs to be.
In such cases, a more compact or elegant regex can only be created by hand.
Also, every regular expression engine has different built-in optimizations. *grex* does not know anything
about those and therefore cannot optimize its regexes for a specific engine.
**So, please learn how to write regular expressions!** Currently, the best use case for *grex* is to find
an initial, correct regex which should then be inspected by hand for further possible optimizations.
## 3. Current Features
- literals
- character classes
- detection of common prefixes and suffixes
- detection of repeated substrings and conversion to `{min,max}` quantifier notation
- alternation using `|` operator
- optionality using `?` quantifier
- escaping of non-ASCII characters, with optional conversion of astral code points to surrogate pairs
- case-sensitive or case-insensitive matching
- capturing or non-capturing groups
- optional anchors `^` and `$`
- fully compliant to [Unicode Standard 16.0](https://unicode.org/versions/Unicode15.0.0)
- fully compatible with [*regex* crate 1.11.0+](https://crates.io/crates/regex)
- correctly handles graphemes consisting of multiple Unicode symbols
- reads input strings from the command-line or from a file
- produces more readable expressions indented on multiple using optional verbose mode
- optional syntax highlighting for nicer output in supported terminals
## 4. How to install?
### 4.1 The command-line tool
You can download the self-contained executable for your platform above and put it in a place of your choice.
Alternatively, pre-compiled 64-bit binaries are available within the package managers [Scoop](https://scoop.sh)
(for Windows), [Homebrew](https://brew.sh) (for macOS and Linux), [MacPorts](https://www.macports.org) (for macOS), and [Huber](https://github.com/innobead/huber) (for macOS, Linux and Windows).
[Raúl Piracés](https://github.com/piraces) has contributed a [Chocolatey Windows package](https://community.chocolatey.org/packages/grex).
*grex* is also hosted on [crates.io](https://crates.io/crates/grex),
the official Rust package registry. If you are a Rust developer and already have the Rust
toolchain installed, you can install it by compiling from source using
[*cargo*](https://doc.rust-lang.org/cargo/), the Rust package manager.
So the summary of your installation options is:
```
( brew | cargo | choco | huber | port | scoop ) install grex
```
### 4.2 The library
In order to use *grex* as a library, simply add it as a dependency to your `Cargo.toml` file:
```toml
[dependencies]
grex = { version = "1.4.6", default-features = false }
```
The dependency *clap* is only needed for the command-line tool.
Disabling the default features prevents *clap* from being downloaded and compiled when only the library is used.
## 5. How to use?
Detailed explanations of the available settings are provided in the [library section](#52-the-library).
All settings can be freely combined with each other.
### 5.1 The command-line tool
Test cases are passed either directly (`grex a b c`) or from a file (`grex -f test_cases.txt`).
*grex* is able to receive its input from Unix pipelines as well, e.g. `cat test_cases.txt | grex -`.
The following table shows all available flags and options:
```
$ grex -h

grex 1.4.6
© 2019-today Peter M. Stahl <pemistahl@gmail.com>
Licensed under the Apache License, Version 2.0
Downloadable from https://crates.io/crates/grex
Source code at https://github.com/pemistahl/grex

grex generates regular expressions from user-provided test cases.

Usage: grex [OPTIONS] {INPUT...|--file <FILE>}

Input:
  [INPUT]...         One or more test cases separated by blank space
  -f, --file <FILE>  Reads test cases on separate lines from a file

Digit Options:
  -d, --digits      Converts any Unicode decimal digit to \d
  -D, --non-digits  Converts any character which is not a Unicode decimal digit to \D

Whitespace Options:
  -s, --spaces      Converts any Unicode whitespace character to \s
  -S, --non-spaces  Converts any character which is not a Unicode whitespace character to \S

Word Options:
  -w, --words      Converts any Unicode word character to \w
  -W, --non-words  Converts any character which is not a Unicode word character to \W

Escaping Options:
  -e, --escape           Replaces all non-ASCII characters with unicode escape sequences
      --with-surrogates  Converts astral code points to surrogate pairs if --escape is set

Repetition Options:
  -r, --repetitions
          Detects repeated non-overlapping substrings and converts them to {min,max} quantifier
          notation

      --min-repetitions <QUANTITY>
          Specifies the minimum quantity of substring repetitions to be converted if --repetitions
          is set [default: 1]

      --min-substring-length <LENGTH>
          Specifies the minimum length a repeated substring must have in order to be converted if
          --repetitions is set [default: 1]

Anchor Options:
      --no-start-anchor  Removes the caret anchor `^` from the resulting regular expression
      --no-end-anchor    Removes the dollar sign anchor `$` from the resulting regular expression
      --no-anchors       Removes the caret and dollar sign anchors from the resulting regular
                         expression

Display Options:
  -x, --verbose   Produces a nicer-looking regular expression in verbose mode
  -c, --colorize  Provides syntax highlighting for the resulting regular expression

Miscellaneous Options:
  -i, --ignore-case     Performs case-insensitive matching, letters match both upper and lower case
  -g, --capture-groups  Replaces non-capturing groups with capturing ones
  -h, --help            Prints help information
  -v, --version         Prints version information
```
### 5.2 The library
#### 5.2.1 Default settings
Test cases are passed either from a collection via [`RegExpBuilder::from()`](https://docs.rs/grex/1.4.6/grex/struct.RegExpBuilder.html#method.from)
or from a file via [`RegExpBuilder::from_file()`](https://docs.rs/grex/1.4.6/grex/struct.RegExpBuilder.html#method.from_file).
If read from a file, each test case must be on a separate line. Lines may end with either a line feed `\n` or a carriage
return followed by a line feed `\r\n`.
```rust
use grex::RegExpBuilder;
let regexp = RegExpBuilder::from(&["a", "aa", "aaa"]).build();
assert_eq!(regexp, "^a(?:aa?)?$");
```
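As an illustration of the file format described above, the following sketch splits file contents into test cases. The helper `parse_test_cases` is hypothetical and not grex's own implementation; it relies on the fact that Rust's `str::lines` already strips both `\n` and `\r\n` line endings and that a trailing line ending does not produce an extra empty test case.

```rust
/// Hypothetical helper: splits the contents of a test case file into
/// individual test cases, one per line.
fn parse_test_cases(contents: &str) -> Vec<String> {
    // `str::lines` splits on `\n` and also removes a preceding `\r`,
    // so Unix and Windows line endings are both handled.
    contents.lines().map(String::from).collect()
}

fn main() {
    // Mixed line endings plus a trailing newline.
    let contents = "a\r\naa\naaa\n";
    println!("{:?}", parse_test_cases(contents)); // ["a", "aa", "aaa"]
}
```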
#### 5.2.2 Convert to character classes
```rust
use grex::RegExpBuilder;

let regexp = RegExpBuilder::from(&["a", "aa", "123"])
    .with_conversion_of_digits()
    .with_conversion_of_words()
    .build();
assert_eq!(regexp, "^(?:\\d\\d\\d|\\w(?:\\w)?)$");
```
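The conversion can be pictured as a preprocessing step applied to each test case before the expression is built. The sketch below uses a hypothetical helper, `to_char_classes`, and is not grex's code: it approximates `\d` with `char::is_numeric` and `\w` with `char::is_alphanumeric` plus `_`, which only roughly match the Unicode definitions grex uses. Digits are checked first because every decimal digit is also a word character.

```rust
/// Hypothetical helper: replaces characters with shorthand classes.
/// `char::is_numeric` and `char::is_alphanumeric` only approximate the
/// Unicode definitions of `\d` and `\w`.
fn to_char_classes(test_case: &str, digits: bool, words: bool) -> String {
    let mut converted = String::new();
    for c in test_case.chars() {
        // Check digits first: every decimal digit is also a word character.
        if digits && c.is_numeric() {
            converted.push_str("\\d");
        } else if words && (c.is_alphanumeric() || c == '_') {
            converted.push_str("\\w");
        } else {
            converted.push(c);
        }
    }
    converted
}

fn main() {
    for test_case in ["a", "aa", "123"] {
        println!("{} -> {}", test_case, to_char_classes(test_case, true, true));
    }
}
```

For the input above, this yields `\w`, `\w\w` and `\d\d\d`; grex then merges converted test cases like these into a single expression.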
#### 5.2.3 Convert repeated substrings
```rust
use grex::RegExpBuilder;

let regexp = RegExpBuilder::from(&["aa", "bcbc", "defdefdef"])
    .with_conversion_of_repetitions()
    .build();
assert_eq!(regexp, "^(?:a{2}|(?:bc){2}|(?:def){3})$");
```
By default, *grex* converts every substring in this way that is at least one character long
and that is repeated at least once after its first occurrence. You can customize these two parameters if you like.
In the following example, the test case `aa` is not converted to `a{2}` because the repeated substring
`a` has a length of 1, but the minimum substring length has been set to 2.
```rust
use grex::RegExpBuilder;

let regexp = RegExpBuilder::from(&["aa", "bcbc", "defdefdef"])
    .with_conversion_of_repetitions()
    .with_minimum_substring_length(2)
    .build();
assert_eq!(regexp, "^(?:aa|(?:bc){2}|(?:def){3})$");
```
In the next example, the minimum number of repetitions is set to 2, so only the test case `defdefdef` is
converted because it is the only one whose substring is repeated twice.
```rust
use grex::RegExpBuilder;

let regexp = RegExpBuilder::from(&["aa", "bcbc", "defdefdef"])
    .with_conversion_of_repetitions()
    .with_minimum_repetitions(2)
    .build();
assert_eq!(regexp, "^(?:bcbc|aa|(?:def){3})$");
```
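The effect of the two parameters can be illustrated with a simplified sketch. The hypothetical function `convert_whole_repetition` below only handles test cases that consist entirely of one repeated unit and ignores escaping, whereas grex also detects repetitions inside longer strings as well as nested repetitions. It follows the semantics described above: a unit occurring *n* times counts as repeated *n* - 1 times.

```rust
/// Hypothetical helper (not grex's algorithm): converts a test case made up
/// entirely of one repeated unit into `{n}` quantifier notation.
fn convert_whole_repetition(test_case: &str, min_repetitions: usize, min_substring_length: usize) -> String {
    let chars: Vec<char> = test_case.chars().collect();
    let n = chars.len();
    // Try the shortest unit first so that "aaaa" becomes "a{4}", not "(?:aa){2}".
    for unit_len in min_substring_length.max(1)..=n / 2 {
        if n % unit_len != 0 {
            continue;
        }
        let count = n / unit_len;
        // A unit occurring `count` times is repeated `count - 1` times.
        if count - 1 < min_repetitions {
            continue;
        }
        let unit = &chars[..unit_len];
        if chars.chunks(unit_len).all(|chunk| chunk == unit) {
            let unit: String = unit.iter().collect();
            return if unit_len == 1 {
                format!("{unit}{{{count}}}")
            } else {
                format!("(?:{unit}){{{count}}}")
            };
        }
    }
    // No qualifying repetition: keep the test case as a literal.
    test_case.to_string()
}

fn main() {
    for case in ["aa", "bcbc", "defdefdef"] {
        println!("{} -> {}", case, convert_whole_repetition(case, 1, 1));
    }
}
```

With the default parameters `(1, 1)`, the three test cases become `a{2}`, `(?:bc){2}` and `(?:def){3}`, the same alternatives as in the first example above. Passing a minimum substring length of 2 leaves `aa` unchanged, and a minimum of 2 repetitions leaves both `aa` and `bcbc` unchanged.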
#### 5.2.4 Escape non-ASCII characters
```rust
use grex::RegExpBuilder;

let regexp = RegExpBuilder::from(&["You smell like 💩."])
    .with_escaping_of_non_ascii_chars(false)
    .build();
assert_eq!(regexp, "^You smell like \\u{1f4a9}\\.$");
```
Old versions of JavaScript do not support Unicode escape sequences for the astral code planes
(range `U+010000` to `U+10FFFF`). In order to support these symbols in JavaScript regular
expressions, the conversion to surrogate pairs is necessary. More information on that matter
can be found in [this article](https://mathiasbynens.be/notes/javascript-unicode).
```rust
use grex::RegExpBuilder;

let regexp = RegExpBuilder::from(&["You smell like 💩."])
    .with_escaping_of_non_ascii_chars(true)
    .build();
assert_eq!(regexp, "^You smell like \\u{d83d}\\u{dca9}\\.$");
```
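The arithmetic behind both escaping variants can be sketched with the standard library alone. The hypothetical helper `escape_non_ascii` below writes each non-ASCII character as `\u{...}` in lowercase hexadecimal, the notation used above, and, when `use_surrogate_pairs` is set, splits astral characters into their UTF-16 high and low surrogates via `char::encode_utf16`. It leaves out the escaping of regex metacharacters such as `.`, so it is not a replacement for grex's output.

```rust
/// Hypothetical helper: escapes every non-ASCII character as `\u{...}`.
/// With `use_surrogate_pairs`, astral characters (U+10000 to U+10FFFF) are
/// written as a UTF-16 high surrogate followed by a low surrogate.
fn escape_non_ascii(text: &str, use_surrogate_pairs: bool) -> String {
    let mut escaped = String::new();
    for c in text.chars() {
        if c.is_ascii() {
            escaped.push(c);
        } else if use_surrogate_pairs && (c as u32) > 0xFFFF {
            let mut buffer = [0u16; 2];
            // encode_utf16 fills both slots for an astral character.
            for &unit in c.encode_utf16(&mut buffer).iter() {
                escaped.push_str(&format!("\\u{{{:x}}}", unit));
            }
        } else {
            escaped.push_str(&format!("\\u{{{:x}}}", c as u32));
        }
    }
    escaped
}

fn main() {
    println!("{}", escape_non_ascii("You smell like 💩.", false));
    println!("{}", escape_non_ascii("You smell like 💩.", true));
}
```

For 💩 (U+1F4A9), the calculation is 0x1F4A9 - 0x10000 = 0xF4A9; the high surrogate is 0xD800 + (0xF4A9 >> 10) = 0xD83D and the low surrogate is 0xDC00 + (0xF4A9 & 0x3FF) = 0xDCA9, which gives `\u{d83d}\u{dca9}`.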
#### 5.2.5 Case-insensitive matching
The regular expressions that *grex* generates are case-sensitive by default.
Case-insensitive matching can be enabled like so:
```rust
use grex::RegExpBuilder;

let regexp = RegExpBuilder::from(&["big", "BIGGER"])
    .with_case_insensitive_matching()
    .build();
assert_eq!(regexp, "(?i)^big(?:ger)?$");
```
#### 5.2.6 Capturing Groups
Non-capturing groups are used by default.
Extending the previous example, you can switch to capturing groups instead.
```rust
use grex::RegExpBuilder;

let regexp = RegExpBuilder::from(&["big", "BIGGER"])
    .with_case_insensitive_matching()
    .with_capturing_groups()
    .build();
assert_eq!(regexp, "(?i)^big(ger)?$");
```
#### 5.2.7 Verbose mode
If you find the generated regular expression hard to read, you can enable verbose mode.
The expression is then put on multiple lines and indented to make its structure easier to follow.
```rust
use grex::RegExpBuilder;
use indoc::indoc;

let regexp = RegExpBuilder::from(&["a", "b", "bcd"])
    .with_verbose_mode()
    .build();

assert_eq!(regexp, indoc!(
    r#"
    (?x)
    ^
    (?:
      b
      (?:
        cd
      )?
      |
      a
    )
    $"#
));
```
#### 5.2.8 Disable anchors
By default, the anchors `^` and `$` are put around every generated regular expression in order
to ensure that it matches only the test cases given as input. Often enough, however, it is
desired to use the generated pattern as part of a larger one. For this purpose, the anchors
can be disabled, either individually or both at once.
```rust
use grex::RegExpBuilder;

let regexp = RegExpBuilder::from(&["a", "aa", "aaa"])
    .without_anchors()
    .build();
assert_eq!(regexp, "a(?:aa?)?");
```
### 5.3 Examples
The following examples show the various supported regex syntax features:
```shell
$ grex a b c
^[a-c]$

$ grex a c d e f
^[ac-f]$

$ grex a b x de
^(?:de|[abx])$

$ grex abc bc
^a?bc$

$ grex a b bc
^(?:bc?|a)$

$ grex [a-z]
^\[a\-z\]$

$ grex -r b ba baa baaa
^b(?:a{1,3})?$

$ grex -r b ba baa baaaa
^b(?:a{1,2}|a{4})?$

$ grex y̆ a z
^(?:y̆|[az])$

Note:
Grapheme y̆ consists of two Unicode symbols:
U+0079 (Latin Small Letter Y)
U+0306 (Combining Breve)

$ grex "I ♥ cake" "I ♥ cookies"
^I ♥ c(?:ookies|ake)$

Note:
Input containing blank space must be
surrounded by quotation marks.
```
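The note on y̆ can be verified with a few lines of standard-library Rust. The hypothetical helper `code_points` lists the code points of a string; iterating over `char`s yields two items for y̆, which is why an algorithm operating on single characters would split the grapheme apart. Rust's standard library offers no grapheme segmentation, so grouping such code points back together requires a dedicated Unicode segmentation library.

```rust
/// Hypothetical helper: lists the Unicode code points of a string.
fn code_points(s: &str) -> Vec<String> {
    s.chars().map(|c| format!("U+{:04X}", c as u32)).collect()
}

fn main() {
    // y̆ written with explicit escapes: LATIN SMALL LETTER Y + COMBINING BREVE.
    let y_breve = "y\u{0306}";
    println!("{:?}", code_points(y_breve)); // ["U+0079", "U+0306"]
}
```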
The string `"I ♥♥♥ 36 and ٣ and 💩💩."` serves as input for the following examples using the command-line notation:
```shell
$ grex <INPUT>
^I ♥♥♥ 36 and ٣ and 💩💩\.$

$ grex -e <INPUT>
^I \u{2665}\u{2665}\u{2665} 36 and \u{663} and \u{1f4a9}\u{1f4a9}\.$

$ grex -e --with-surrogates <INPUT>
^I \u{2665}\u{2665}\u{2665} 36 and \u{663} and \u{d83d}\u{dca9}\u{d83d}\u{dca9}\.$

$ grex -d <INPUT>
^I ♥♥♥ \d\d and \d and 💩💩\.$

$ grex -s <INPUT>
^I\s♥♥♥\s36\sand\s٣\sand\s💩💩\.$

$ grex -w <INPUT>
^\w ♥♥♥ \w\w \w\w\w \w \w\w\w 💩💩\.$

$ grex -D <INPUT>
^\D\D\D\D\D\D36\D\D\D\D\D٣\D\D\D\D\D\D\D\D$

$ grex -S <INPUT>
^\S \S\S\S \S\S \S\S\S \S \S\S\S \S\S\S$

$ grex -dsw <INPUT>
^\w\s♥♥♥\s\d\d\s\w\w\w\s\d\s\w\w\w\s💩💩\.$

$ grex -dswW <INPUT>
^\w\s\W\W\W\s\d\d\s\w\w\w\s\d\s\w\w\w\s\W\W\W$

$ grex -r <INPUT>
^I ♥{3} 36 and ٣ and 💩{2}\.$

$ grex -er <INPUT>
^I \u{2665}{3} 36 and \u{663} and \u{1f4a9}{2}\.$

$ grex -er --with-surrogates <INPUT>
^I \u{2665}{3} 36 and \u{663} and (?:\u{d83d}\u{dca9}){2}\.$

$ grex -dgr <INPUT>
^I ♥{3} \d(\d and ){2}💩{2}\.$

$ grex -rs <INPUT>
^I\s♥{3}\s36\sand\s٣\sand\s💩{2}\.$

$ grex -rw <INPUT>
^\w ♥{3} \w(?:\w \w{3} ){2}💩{2}\.$

$ grex -Dr <INPUT>
^\D{6}36\D{5}٣\D{8}$

$ grex -rS <INPUT>
^\S \S(?:\S{2} ){2}\S{3} \S \S{3} \S{3}$

$ grex -rW <INPUT>
^I\W{5}36\Wand\W٣\Wand\W{4}$

$ grex -drsw <INPUT>
^\w\s♥{3}\s\d(?:\d\s\w{3}\s){2}💩{2}\.$

$ grex -drswW <INPUT>
^\w\s\W{3}\s\d(?:\d\s\w{3}\s){2}\W{3}$
```
## 6. How to build?
In order to build the source code yourself, you need the
[stable Rust toolchain](https://www.rust-lang.org/tools/install) installed on your machine
so that [*cargo*](https://doc.rust-lang.org/cargo/), the Rust package manager, is available.
**Please note**: Rust >= 1.70.0 is required to build the CLI. For the library part, a Rust version older than 1.70.0 is sufficient.
```shell
git clone https://github.com/pemistahl/grex.git
cd grex
cargo build
```
The source code is accompanied by an extensive test suite consisting of unit tests, integration
tests and property tests. To run them, simply say:
```shell
cargo test
```
Benchmarks measuring the performance of several settings can be run with:
```shell
cargo bench
```
## 7. Python extension module
With the help of [PyO3](https://github.com/PyO3/pyo3) and
[Maturin](https://github.com/PyO3/maturin), the library has been compiled to a
Python extension module so that it can be used within any Python software as well.
It is available in the [Python Package Index](https://pypi.org/project/grex) and can
be installed with:
```shell
pip install grex
```
To build the Python extension module yourself, create a virtual environment and install
[Maturin](https://github.com/PyO3/maturin).
```shell
python -m venv /path/to/virtual/environment
source /path/to/virtual/environment/bin/activate
pip install maturin
maturin build
```
The Python library contains a single class named `RegExpBuilder` that can be imported like so:
```python
from grex import RegExpBuilder
```
## 8. WebAssembly support
This library can be compiled to [WebAssembly (WASM)](https://webassembly.org), which allows *grex* to be used
in any JavaScript-based project, be it in the browser or in the back end running on [Node.js](https://nodejs.org).
The easiest way to compile is to use [`wasm-pack`](https://rustwasm.github.io/wasm-pack). After the installation,
you can, for instance, build the library with the web target so that it can be used directly in the browser:

    wasm-pack build --target web

This creates a directory named `pkg` at the top level of this repository, containing the compiled wasm files
and JavaScript and TypeScript bindings. In an HTML file, you can then call *grex* like the following, for instance:
```html
<script type="module">
    import init, { RegExpBuilder } from "./pkg/grex.js";

    init().then(_ => {
        alert(RegExpBuilder.from(["hello", "world"]).build());
    });
</script>
```
There are also some integration tests available both for Node.js and for the browsers Chrome, Firefox and Safari.
To run them, simply say:

    wasm-pack test --node --headless --chrome --firefox --safari

If the tests fail to start in Safari, you need to enable Safari's web driver first by running:

    sudo safaridriver --enable

The output of `wasm-pack` will be hosted in a [separate repository](https://github.com/pemistahl/grex-js), which
allows further JavaScript-related configuration, tests and documentation to be added. *grex* will then be added to the
[npm registry](https://www.npmjs.com) as well, allowing for an easy download and installation within every JavaScript
or TypeScript project.
There is a [demo website](https://pemistahl.github.io/grex-js/) available where you can give grex a try.

## 9. How does it work?
1. A [deterministic finite automaton](https://en.wikipedia.org/wiki/Deterministic_finite_automaton) (DFA)
is created from the input strings.
2. The number of states and transitions between states in the DFA is reduced by applying
[Hopcroft's DFA minimization algorithm](https://en.wikipedia.org/wiki/DFA_minimization#Hopcroft.27s_algorithm).
3. The minimized DFA is expressed as a system of linear equations which are solved with
[Brzozowski's algebraic method](http://cs.stackexchange.com/questions/2016/how-to-convert-finite-automata-to-regular-expressions#2392),
resulting in the final regular expression.
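The three steps can be illustrated with a compact, simplified sketch. It is not grex's implementation: step 2 merges states with identical signatures bottom-up instead of running Hopcroft's algorithm (for the tree-shaped automaton built from a list of strings, this also yields the minimal DFA), and step 3 solves the state equations by plain back-substitution, which suffices only because this automaton has no loops. Character classes, repetitions, escaping and suffix factoring are omitted, so merged suffix states show up in the state count but not in the generated expression. All function and type names are hypothetical.

```rust
use std::collections::{BTreeMap, HashMap};

/// One DFA state: outgoing transitions (sorted by char) and whether it accepts.
#[derive(Default)]
struct State {
    transitions: BTreeMap<char, usize>,
    is_final: bool,
}

/// A state's signature: finality plus transitions to already-merged states.
type Signature = (bool, Vec<(char, usize)>);

/// Step 1: build a tree-shaped DFA (a trie) from the input strings.
fn build_trie(test_cases: &[&str]) -> Vec<State> {
    let mut states = vec![State::default()];
    for case in test_cases {
        let mut current = 0;
        for c in case.chars() {
            let existing = states[current].transitions.get(&c).copied();
            current = match existing {
                Some(next) => next,
                None => {
                    states.push(State::default());
                    let next = states.len() - 1;
                    states[current].transitions.insert(c, next);
                    next
                }
            };
        }
        states[current].is_final = true;
    }
    states
}

/// Step 2 (stand-in for Hopcroft's algorithm): merge states bottom-up whose
/// finality and transitions are identical. Returns the DFA and its start state.
fn minimize(trie: &[State]) -> (Vec<State>, usize) {
    fn visit(
        q: usize,
        trie: &[State],
        registry: &mut HashMap<Signature, usize>,
        dfa: &mut Vec<State>,
    ) -> usize {
        let transitions: Vec<(char, usize)> = trie[q]
            .transitions
            .iter()
            .map(|(&c, &next)| (c, visit(next, trie, registry, dfa)))
            .collect();
        let signature = (trie[q].is_final, transitions);
        if let Some(&id) = registry.get(&signature) {
            return id; // an equivalent state already exists
        }
        dfa.push(State {
            transitions: signature.1.iter().copied().collect(),
            is_final: signature.0,
        });
        registry.insert(signature, dfa.len() - 1);
        dfa.len() - 1
    }
    let mut registry = HashMap::new();
    let mut dfa = Vec::new();
    let start = visit(0, trie, &mut registry, &mut dfa);
    (dfa, start)
}

/// Step 3 (simplified): solve R(q) = c1 R(q1) | c2 R(q2) | ... (| ε if q is final)
/// by back-substitution, which works because the automaton has no loops.
fn to_regex(dfa: &[State], q: usize) -> String {
    let alternatives: Vec<String> = dfa[q]
        .transitions
        .iter()
        .map(|(&c, &next)| format!("{}{}", c, to_regex(dfa, next)))
        .collect();
    match (dfa[q].is_final, alternatives.len()) {
        (_, 0) => String::new(),
        (false, 1) => alternatives[0].clone(),
        (false, _) => format!("(?:{})", alternatives.join("|")),
        // A single optional character needs no group: "a?".
        (true, 1) if alternatives[0].chars().count() == 1 => format!("{}?", alternatives[0]),
        (true, _) => format!("(?:{})?", alternatives.join("|")),
    }
}

fn main() {
    let trie = build_trie(&["a", "aa", "aaa"]);
    let (dfa, start) = minimize(&trie);
    println!("trie states: {}, minimized states: {}", trie.len(), dfa.len());
    println!("^{}$", to_regex(&dfa, start));
}
```

For `a`, `aa` and `aaa`, this prints 4 trie states, 4 minimized states and `^a(?:aa?)?$`, the same expression as in the default settings example. For `abc` and `xbc`, the merging step shrinks 7 trie states to 4, while the generated `(?:abc|xbc)` shows that this sketch does not factor common suffixes.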
## 10. What's next for version 1.5.0?
Take a look at the [planned issues](https://github.com/pemistahl/grex/milestone/5).
## 11. Contributions
In case you want to contribute something to *grex*, I encourage you to do so.
Do you have ideas for cool features? Or have you found any bugs so far?
Feel free to open an issue or send a pull request. It's very much appreciated. :-)
================================================
FILE: README_PYPI.md
================================================
<div align="center">

<br>
</div>
<br>
## 1. What does this library do?
*grex* is a library that is meant to simplify the often complicated and tedious
task of creating regular expressions. It does so by automatically generating a
single regular expression from user-provided test cases. The resulting
expression is guaranteed to match the test cases which it was generated from.
This project started as a [Rust port](https://github.com/pemistahl/grex) of
the JavaScript tool [*regexgen*](https://github.com/devongovett/regexgen)
written by [Devon Govett](https://github.com/devongovett). Although a lot of
further useful features could be added to it, its development apparently
ceased several years ago. The Rust library offers new features and extended
Unicode support. With the help of [PyO3](https://github.com/PyO3/pyo3) and
[Maturin](https://github.com/PyO3/maturin), the library has been compiled to a
Python extension module so that it can be used within any Python software as well.
The philosophy of this project is to generate the most specific regular expression
possible by default which exactly matches the given input only and nothing else.
With the use of preprocessing methods, more generalized expressions can be created.
The produced expressions are [Perl-compatible regular expressions](https://www.pcre.org) which are also
compatible with the [regular expression module](https://docs.python.org/3/library/re.html) in Python's
standard library.
There is a [demo website](https://pemistahl.github.io/grex-js/) available where you can give grex a try.

## 2. Do I still need to learn to write regexes then?
**Definitely, yes!** Using the standard settings, *grex* produces a regular expression that is guaranteed
to match only the test cases given as input and nothing else. However, if the conversion to shorthand
character classes such as `\w` is enabled, the resulting regex matches a much wider scope of test cases.
Knowledge about the consequences of this conversion is essential for finding a correct regular expression
for your business domain.
*grex* uses an algorithm that tries to find the shortest possible regex for the given test cases.
Very often though, the resulting expression is still longer or more complex than it needs to be.
In such cases, a more compact or elegant regex can be created only by hand.
Also, every regular expression engine has different built-in optimizations. *grex* does not know anything
about those and therefore cannot optimize its regexes for a specific engine.
**So, please learn how to write regular expressions!** Currently, the best use case for *grex* is to find
an initial correct regex which should then be inspected by hand for possible further optimizations.
## 3. Current Features
- literals
- character classes
- detection of common prefixes and suffixes
- detection of repeated substrings and conversion to `{min,max}` quantifier notation
- alternation using `|` operator
- optionality using `?` quantifier
- escaping of non-ASCII characters, with optional conversion of astral code points to surrogate pairs
- case-sensitive or case-insensitive matching
- capturing or non-capturing groups
- optional anchors `^` and `$`
- fully compliant with [Unicode Standard 16.0](https://unicode.org/versions/Unicode16.0.0)
- correctly handles graphemes consisting of multiple Unicode symbols
- produces more readable expressions indented on multiple lines using optional verbose mode
- optional syntax highlighting for nicer output in supported terminals
## 4. How to install?
*grex* is available in the [Python Package Index](https://pypi.org/project/grex) and can be installed with:
```
pip install grex
```
The current version 1.0.2 corresponds to the latest version 1.4.6 of the Rust
library and command-line tool.
## 5. How to use?
This library contains a single class named `RegExpBuilder` that can be imported like so:
```python
from grex import RegExpBuilder
```
### 5.1 Default settings
```python
pattern = RegExpBuilder.from_test_cases(["a", "aa", "aaa"]).build()
assert pattern == "^a(?:aa?)?$"
```
### 5.2 Convert to character classes
```python
pattern = (RegExpBuilder.from_test_cases(["a", "aa", "123"])
.with_conversion_of_digits()
.with_conversion_of_words()
.build())
assert pattern == "^(?:\\d\\d\\d|\\w(?:\\w)?)$"
```
### 5.3 Convert repeated substrings
```python
pattern = (RegExpBuilder.from_test_cases(["aa", "bcbc", "defdefdef"])
.with_conversion_of_repetitions()
.build())
assert pattern == "^(?:a{2}|(?:bc){2}|(?:def){3})$"
```
By default, *grex* converts each substring this way which is at least a single character long
and which is subsequently repeated at least once. You can customize these two parameters if you like.
In the following example, the test case `aa` is not converted to `a{2}` because the repeated substring
`a` has a length of 1, but the minimum substring length has been set to 2.
```python
pattern = (RegExpBuilder.from_test_cases(["aa", "bcbc", "defdefdef"])
.with_conversion_of_repetitions()
.with_minimum_substring_length(2)
.build())
assert pattern == "^(?:aa|(?:bc){2}|(?:def){3})$"
```
Setting a minimum number of 2 repetitions in the next example, only the test case `defdefdef` will be
converted because it is the only one that is repeated twice.
```python
pattern = (RegExpBuilder.from_test_cases(["aa", "bcbc", "defdefdef"])
.with_conversion_of_repetitions()
.with_minimum_repetitions(2)
.build())
assert pattern == "^(?:bcbc|aa|(?:def){3})$"
```
### 5.4 Escape non-ASCII characters
```python
pattern = (RegExpBuilder.from_test_cases(["You smell like 💩."])
.with_escaping_of_non_ascii_chars(use_surrogate_pairs=False)
.build())
assert pattern == "^You smell like \\U0001f4a9\\.$"
```
Old versions of JavaScript do not support Unicode escape sequences for the astral code planes
(range `U+010000` to `U+10FFFF`). In order to support these symbols in JavaScript regular
expressions, the conversion to surrogate pairs is necessary. More information on that matter
can be found in [this article](https://mathiasbynens.be/notes/javascript-unicode).
```python
pattern = (RegExpBuilder.from_test_cases(["You smell like 💩."])
.with_escaping_of_non_ascii_chars(use_surrogate_pairs=True)
.build())
assert pattern == "^You smell like \\ud83d\\udca9\\.$"
```
### 5.5 Case-insensitive matching
The regular expressions that *grex* generates are case-sensitive by default.
Case-insensitive matching can be enabled like so:
```python
pattern = (RegExpBuilder.from_test_cases(["big", "BIGGER"])
.with_case_insensitive_matching()
.build())
assert pattern == "(?i)^big(?:ger)?$"
```
### 5.6 Capturing Groups
Non-capturing groups are used by default.
Extending the previous example, you can switch to capturing groups instead.
```python
pattern = (RegExpBuilder.from_test_cases(["big", "BIGGER"])
.with_case_insensitive_matching()
.with_capturing_groups()
.build())
assert pattern == "(?i)^big(ger)?$"
```
### 5.7 Verbose mode
If you find the generated regular expression hard to read, you can enable verbose mode.
The expression is then put on multiple lines and indented to make its structure easier to follow.
```python
import inspect

pattern = (RegExpBuilder.from_test_cases(["a", "b", "bcd"])
           .with_verbose_mode()
           .build())
assert pattern == inspect.cleandoc("""
    (?x)
    ^
    (?:
      b
      (?:
        cd
      )?
      |
      a
    )
    $
    """
)
```
### 5.8 Disable anchors
By default, the anchors `^` and `$` are put around every generated regular expression in order
to ensure that it matches only the test cases given as input. Often enough, however, it is
desired to use the generated pattern as part of a larger one. For this purpose, the anchors
can be disabled, either individually or both at once.
```python
pattern = (RegExpBuilder.from_test_cases(["a", "aa", "aaa"])
.without_anchors()
.build())
assert pattern == "a(?:aa?)?"
```
## 6. How to build?
In order to build the source code yourself, you need the
[stable Rust toolchain](https://www.rust-lang.org/tools/install) installed on your machine
so that [*cargo*](https://doc.rust-lang.org/cargo/), the Rust package manager, is available.
```shell
git clone https://github.com/pemistahl/grex.git
cd grex
cargo build
```
To build the Python extension module, create a virtual environment and install [Maturin](https://github.com/PyO3/maturin).
```shell
python -m venv /path/to/virtual/environment
source /path/to/virtual/environment/bin/activate
pip install maturin
maturin build
```
The Rust source code is accompanied by an extensive test suite consisting of unit tests, integration
tests and property tests. To run them, simply say:
```shell
cargo test
```
Additional Python tests can be run after installing pytest, which is an optional dependency:
```shell
maturin develop --extras=test
pytest tests/python/test_grex.py
```
================================================
FILE: RELEASE_NOTES.md
================================================
## grex 1.4.6 (released on 14 Nov 2025)
### Improvements
- All characters from the current Unicode standard 16.0 are now fully supported.
### Changes
- @jqnatividad has replaced the unmaintained unic-* dependencies. (#337)
- All other dependencies have been updated to their latest versions.
- Support for Python 3.14 has been added.
- Support for Python < 3.12 has been dropped.
## grex 1.4.5 (released on 06 Mar 2024)
### Improvements
- Type stubs for the Python bindings are now available, allowing better static code
analysis, better code completion in supported IDEs and easier understanding of the library's API.
- The code for creating regular expressions in verbose mode has been simplified and is more performant now.
- ARM64 binaries are now provided for every major platform (Linux, macOS, Windows).
### Bug Fixes
- For a small set of special characters, *grex* produced incorrect regular expressions when
the case-insensitivity feature was enabled. This has been fixed.
### Changes
- All dependencies have been updated to their latest versions.
## grex 1.4.4 (released on 24 Aug 2023)
### Bug Fixes
- The Python release workflow was incorrect as it produced too many wheels for upload.
This has been fixed.
## grex 1.4.3 (released on 24 Aug 2023)
### Features
- Python bindings are now available for the library. Use grex within any Python software. (#172)
### Changes
- All dependencies have been updated to their latest versions.
## grex 1.4.2 (released on 26 Jul 2023)
### Improvements
- All characters from the current Unicode standard 15.0 are now fully supported. (#128)
- A proper exit code is now returned if the provided user input cannot be handled by the CLI.
Big thanks to @spenserblack for the respective pull request. (#165)
### Changes
- It is no longer possible to call `RegExpBuilder.with_syntax_highlighting()` in the library
as it only makes sense for the CLI.
- The dependency `atty` has been removed in favor of `std::io::IsTerminal` in Rust >= 1.70.0.
As a result, Rust >= 1.70.0 is now needed to compile the CLI.
- All remaining dependencies have been updated to their latest versions.
### Bug Fixes
- Several bugs have been fixed that caused incorrect expressions to be generated in rare cases.
## grex 1.4.1 (released on 21 Oct 2022)
### Changes
- `clap` has been updated to version 4.0. The help output by `grex -h` now looks a little different.
### Bug Fixes
- A bug in the grapheme segmentation was fixed that caused test cases which contain backslashes to produce
incorrect regular expressions.
## grex 1.4.0 (released on 26 Jul 2022)
### Features
- The library can now be compiled to WebAssembly and be used in any JavaScript project. (#82)
- The supported character set for regular expression generation has been updated to the current Unicode Standard 14.0.
- `structopt` has been replaced with `clap` providing much nicer help output for the command-line tool.
### Improvements
- The regular expression generation performance has been significantly improved, especially for generating very long
expressions from a large set of test cases. This has been accomplished by reducing the number of memory allocations,
removing deprecated code and applying several minor optimizations.
### Bug Fixes
- Several bugs have been fixed that caused incorrect expressions to be generated in rare cases.
## grex 1.3.0 (released on 15 Sep 2021)
### Features
- anchors can now be disabled so that the generated expression can be used as part of a larger one (#30)
- the command-line tool can now be used within Unix pipelines (#45)
### Changes
- Additional methods have been added to `RegExpBuilder` in order to replace the enum `Feature` and make the library API more consistent. (#47)
### Bug Fixes
- Under rare circumstances, the conversion of repetitions did not work. This has been fixed. (#36)
## grex 1.2.0 (released on 28 Mar 2021)
### Features
- verbose mode is now supported with the `--verbose` flag to produce regular expressions which are easier to read (#17)
## grex 1.1.0 (released on 17 Apr 2020)
### Features
- case-insensitive matching regexes are now supported with the `--ignore-case` command-line flag or with `Feature::CaseInsensitivity` in the library (#23)
- non-capturing groups are now the default; capturing groups can be enabled with the `--capture-groups` command-line flag or with `Feature::CapturingGroup` in the library (#15)
- a lower bound for the conversion of repeated substrings can now be set by specifying `--min-repetitions` and `--min-substring-length` or using the library methods `RegExpBuilder.with_minimum_repetitions()` and `RegExpBuilder.with_minimum_substring_length()` (#10)
- test cases can now be passed from a file within the library as well using `RegExpBuilder::from_file()` (#13)
### Changes
- the rules for the conversion of test cases to shorthand character classes have been updated to be compliant with the newest Unicode Standard 13.0 (#21)
- the dependency on the unmaintained linked-list crate has been removed (#24)
### Bug Fixes
- test cases starting with a hyphen are now correctly parsed on the command-line (#12)
- the common substring detection algorithm now uses optionality expressions where possible instead of redundant union operations (#22)
### Test Coverage
- new unit tests, integration tests and property tests have been added
## grex 1.0.0 (released on 02 Feb 2020)
### Features
- conversion to character classes `\d`, `\D`, `\s`, `\S`, `\w`, `\W` is now supported
- repetition detection now works with arbitrarily nested expressions. Input strings such as `aaabaaab` which were previously converted to `^(aaab){2}$` are now converted to `^(a{3}b){2}$`.
- optional syntax highlighting for the produced regular expressions can now be enabled using the `--colorize` command-line flag or with the library method `RegExpBuilder.with_syntax_highlighting()`
### Test Coverage
- new unit tests, integration tests and property tests have been added
## grex 0.3.2 (released on 12 Jan 2020)
### Test Coverage
- new property tests have been added that revealed new bugs
### Bug Fixes
- entire rewrite of the repetition detection algorithm
- the former algorithm produced wrong regular expressions or even panicked for certain test cases
## grex 0.3.1 (released on 06 Jan 2020)
### Test Coverage
- property tests have been added using the [proptest](https://crates.io/crates/proptest) crate
- big thanks go to [Christophe Biocca](https://github.com/christophebiocca) for pointing me to the concept of property tests in the first place and for writing an initial implementation of these tests
### Bug Fixes
- some regular expression specific characters were not escaped correctly in the generated expression
- expressions consisting of a single alternation such as `^(abc|xyz)$` were missing the outer parentheses. This caused an erroneous match of strings such as `abc123` or `456xyz` because of precedence rules.
- the created DFA was wrong for repetition conversion in some corner cases. The input `a, aa, aaa, aaaa, aaab` previously returned the expression `^a{1,4}b?$` which erroneously matches `aaaab`. Now the correct expression `^(a{3}b|a{1,4})$` is returned.
### Documentation
- some minor documentation updates
## grex 0.3.0 (released on 24 Dec 2019)
### Features
- *grex* is now also available as a library
- escaping of non-ascii characters is now supported with the `-e` flag
- astral code points can be converted to surrogate pairs with the `--with-surrogates` flag
- repeated non-overlapping substrings can be converted to `{min,max}` quantifier notation using the `-r` flag
### Bug Fixes
- many many many bug fixes :-O
## grex 0.2.0 (released on 20 Oct 2019)
### Features
- character classes are now supported
- input strings can now be read from a text file
### Changes
- Unicode characters are no longer escaped by default
- the performance of the DFA minimization algorithm has been improved for large DFAs
- regular expressions are now always surrounded by anchors `^` and `$`
### Bug Fixes
- fixed a bug that caused a panic when giving an empty string as input
## grex 0.1.0 (released on 06 Oct 2019)
This is the very first release of *grex*. It aims at simplifying the construction of regular expressions based on matching example input.
### Features
- literals
- detection of common prefixes and suffixes
- alternation using `|` operator
- optionality using `?` quantifier
- concatenation of all of the former
================================================
FILE: benches/benchmark.rs
================================================
/*
* Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either expressed or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
use criterion::{criterion_group, criterion_main, Criterion};
use grex::RegExpBuilder;
use itertools::Itertools;
use std::fs::File;
use std::io::Read;

fn load_test_cases() -> Vec<String> {
    let mut f = File::open("./benches/testcases.txt").expect("Test cases could not be loaded");
    let mut s = String::new();
    f.read_to_string(&mut s).expect("Test cases could not be read");
    // `lines()` accepts both `\n` and `\r\n` endings and does not yield an
    // empty test case for a trailing newline, like `RegExpBuilder::from_file`.
    s.lines()
        .map(|test_case| test_case.to_string())
        .collect_vec()
}

fn benchmark_grex_with_default_settings(c: &mut Criterion) {
    let test_cases = load_test_cases();
    c.bench_function("grex with default settings", |bencher| {
        bencher.iter(|| RegExpBuilder::from(&test_cases).build())
    });
}

fn benchmark_grex_with_conversion_of_repetitions(c: &mut Criterion) {
    let test_cases = load_test_cases();
    c.bench_function("grex with conversion of repetitions", |bencher| {
        bencher.iter(|| {
            RegExpBuilder::from(&test_cases)
                .with_conversion_of_repetitions()
                .build()
        })
    });
}

fn benchmark_grex_with_conversion_of_digits(c: &mut Criterion) {
    let test_cases = load_test_cases();
    c.bench_function("grex with conversion of digits", |bencher| {
        bencher.iter(|| {
            RegExpBuilder::from(&test_cases)
                .with_conversion_of_digits()
                .build()
        })
    });
}

fn benchmark_grex_with_conversion_of_non_digits(c: &mut Criterion) {
    let test_cases = load_test_cases();
    c.bench_function("grex with conversion of non-digits", |bencher| {
        bencher.iter(|| {
            RegExpBuilder::from(&test_cases)
                .with_conversion_of_non_digits()
                .build()
        })
    });
}

fn benchmark_grex_with_conversion_of_words(c: &mut Criterion) {
    let test_cases = load_test_cases();
    c.bench_function("grex with conversion of words", |bencher| {
        bencher.iter(|| {
            RegExpBuilder::from(&test_cases)
                .with_conversion_of_words()
                .build()
        })
    });
}

fn benchmark_grex_with_conversion_of_non_words(c: &mut Criterion) {
    let test_cases = load_test_cases();
    c.bench_function("grex with conversion of non-words", |bencher| {
        bencher.iter(|| {
            RegExpBuilder::from(&test_cases)
                .with_conversion_of_non_words()
                .build()
        })
    });
}

fn benchmark_grex_with_conversion_of_whitespace(c: &mut Criterion) {
    let test_cases = load_test_cases();
    c.bench_function("grex with conversion of whitespace", |bencher| {
        bencher.iter(|| {
            RegExpBuilder::from(&test_cases)
                .with_conversion_of_whitespace()
                .build()
        })
    });
}

fn benchmark_grex_with_conversion_of_non_whitespace(c: &mut Criterion) {
    let test_cases = load_test_cases();
    c.bench_function("grex with conversion of non-whitespace", |bencher| {
        bencher.iter(|| {
            RegExpBuilder::from(&test_cases)
                .with_conversion_of_non_whitespace()
                .build()
        })
    });
}

fn benchmark_grex_with_case_insensitive_matching(c: &mut Criterion) {
    let test_cases = load_test_cases();
    c.bench_function("grex with case-insensitive matching", |bencher| {
        bencher.iter(|| {
            RegExpBuilder::from(&test_cases)
                .with_case_insensitive_matching()
                .build()
        })
    });
}

fn benchmark_grex_with_verbose_mode(c: &mut Criterion) {
    let test_cases = load_test_cases();
    c.bench_function("grex with verbose mode", |bencher| {
        bencher.iter(|| RegExpBuilder::from(&test_cases).with_verbose_mode().build())
    });
}

criterion_group!(
    benches,
    benchmark_grex_with_default_settings,
    benchmark_grex_with_conversion_of_repetitions,
    benchmark_grex_with_conversion_of_digits,
    benchmark_grex_with_conversion_of_non_digits,
    benchmark_grex_with_conversion_of_words,
    benchmark_grex_with_conversion_of_non_words,
    benchmark_grex_with_conversion_of_whitespace,
    benchmark_grex_with_conversion_of_non_whitespace,
    benchmark_grex_with_case_insensitive_matching,
    benchmark_grex_with_verbose_mode
);
criterion_main!(benches);
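Splitting file content on `"\n"` is not equivalent to calling `str::lines`, which `RegExpBuilder::from_file` uses. `split` keeps the `\r` of Windows line endings and yields an empty final element when the content ends with a newline. In a test-case file, that empty element becomes an empty test case. The std-only sketch below contrasts the two. The function names are illustrative, not part of grex.

```rust
/// Splits file content on '\n' only (illustrative name).
fn split_on_newline(content: &str) -> Vec<String> {
    content.split('\n').map(str::to_string).collect()
}

/// Splits file content with `str::lines`, which strips both `\n` and `\r\n`
/// and does not yield an empty item for a trailing line ending.
fn split_into_lines(content: &str) -> Vec<String> {
    content.lines().map(str::to_string).collect()
}
```

For `"Rocket Sled\r\nFlash Fire\n"`, the first function returns `["Rocket Sled\r", "Flash Fire", ""]` and the second returns `["Rocket Sled", "Flash Fire"]`.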
================================================
FILE: benches/testcases.txt
================================================
Rocket Sled
Elysian Heirloom
Kaleb's Favor
Blazing Renegade
Flash Fire
Silence
Talir's Favored
Timekeeper
Oasis Sanctuary
Rolant's Favor
Mantle of Justice
Eilyn's Favor
Thunderbird
Primal Incarnation
Vampire Bat
Vara's Favor
Devouring Shadow
Seat of Order
Seat of Fury
Seat of Impulse
Seat of Vengeance
Seat of Glory
Seat of Progress
Seat of Chaos
Seat of Mystery
Seat of Cunning
Seat of Wisdom
Firebomb
Grenadin
Iron Sword
Magmahound
Wisp
Rhinarc
Sentinel
Owl
Gemblade
Frog
Snowball
Pig
Serpent Hatchling
Carnosaur
Stormdancer
Illusionary Dragon
Spiteling
Vengeful Gargoyle
Muertis, Pale Rider
Occi, Pale Rider
Sangu, Pale Rider
Volan, Pale Rider
Direwood Beast
================================================
FILE: demo.tape
================================================
# demo.gif created with https://github.com/charmbracelet/vhs on macOS 13 (Ventura)
Require grex
Output demo.gif
Set Shell zsh
Set Theme "Whimsy"
Set Width 1200
Set Height 850
Set TypingSpeed 150ms
Type "grex -c 'regexes are awesome' 'regexes are awful'"
Sleep 3s
Enter
Sleep 10s
Up
Left 42
Type " --verbose"
Sleep 3s
Enter
Sleep 15s
Type "clear"
Enter
Type "grex -c haha HAHAHA"
Sleep 3s
Enter
Sleep 10s
Up
Left 12
Type " --repetitions"
Sleep 3s
Enter
Sleep 10s
Up
Left 12
Type " --verbose"
Sleep 3s
Enter
Sleep 15s
Up
Left 12
Type " --ignore-case"
Sleep 3s
Enter
Sleep 15s
================================================
FILE: grex.pyi
================================================
#
# Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either expressed or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import List


class RegExpBuilder:
    """This class builds regular expressions from user-provided test cases."""

    @classmethod
    def from_test_cases(cls, test_cases: List[str]) -> "RegExpBuilder":
        """Specify the test cases to build the regular expression from.

        The test cases need not be sorted because `RegExpBuilder` sorts them internally.

        Args:
            test_cases (list[str]): The list of test cases

        Raises:
            ValueError: if `test_cases` is empty
        """

    def with_conversion_of_digits(self) -> "RegExpBuilder":
        r"""Convert any Unicode decimal digit to character class `\d`.

        This method takes precedence over `with_conversion_of_words` if both are set.
        Decimal digits are converted to `\d`, the remaining word characters to `\w`.

        This method takes precedence over `with_conversion_of_non_whitespace` if both are set.
        Decimal digits are converted to `\d`, the remaining non-whitespace characters to `\S`.
        """

    def with_conversion_of_non_digits(self) -> "RegExpBuilder":
        r"""Convert any character which is not a Unicode decimal digit to character class `\D`.

        This method takes precedence over `with_conversion_of_non_words` if both are set.
        Non-digits which are also non-word characters are converted to `\D`.

        This method takes precedence over `with_conversion_of_non_whitespace` if both are set.
        Non-digits which are also non-space characters are converted to `\D`.
        """

    def with_conversion_of_whitespace(self) -> "RegExpBuilder":
        r"""Convert any Unicode whitespace character to character class `\s`.

        This method takes precedence over `with_conversion_of_non_digits` if both are set.
        Whitespace characters are converted to `\s`, the remaining non-digit characters to `\D`.

        This method takes precedence over `with_conversion_of_non_words` if both are set.
        Whitespace characters are converted to `\s`, the remaining non-word characters to `\W`.
        """

    def with_conversion_of_non_whitespace(self) -> "RegExpBuilder":
        r"""Convert any character which is not a Unicode whitespace character to character class `\S`."""

    def with_conversion_of_words(self) -> "RegExpBuilder":
        r"""Convert any Unicode word character to character class `\w`.

        This method takes precedence over `with_conversion_of_non_digits` if both are set.
        Word characters are converted to `\w`, the remaining non-digit characters to `\D`.

        This method takes precedence over `with_conversion_of_non_whitespace` if both are set.
        Word characters are converted to `\w`, the remaining non-space characters to `\S`.
        """

    def with_conversion_of_non_words(self) -> "RegExpBuilder":
        r"""Convert any character which is not a Unicode word character to character class `\W`.

        This method takes precedence over `with_conversion_of_non_whitespace` if both are set.
        Non-words which are also non-space characters are converted to `\W`.
        """

    def with_conversion_of_repetitions(self) -> "RegExpBuilder":
        """Detect repeated non-overlapping substrings and convert them to `{min,max}` quantifier notation."""

    def with_case_insensitive_matching(self) -> "RegExpBuilder":
        """Enable case-insensitive matching of test cases so that letters match both upper and lower case."""

    def with_capturing_groups(self) -> "RegExpBuilder":
        """Replace non-capturing groups with capturing ones."""

    def with_minimum_repetitions(self, quantity: int) -> "RegExpBuilder":
        """Specify the minimum quantity of substring repetitions to be converted
        if `with_conversion_of_repetitions` is set.

        If the quantity is not explicitly set with this method, a default value of 1 will be used.

        Args:
            quantity (int): The minimum quantity of substring repetitions

        Raises:
            ValueError: if `quantity` is zero
        """

    def with_minimum_substring_length(self, length: int) -> "RegExpBuilder":
        """Specify the minimum length a repeated substring must have in order
        to be converted if `with_conversion_of_repetitions` is set.

        If the length is not explicitly set with this method, a default value of 1 will be used.

        Args:
            length (int): The minimum substring length

        Raises:
            ValueError: if `length` is zero
        """

    def with_escaping_of_non_ascii_chars(self, use_surrogate_pairs: bool) -> "RegExpBuilder":
        """Convert non-ASCII characters to Unicode escape sequences.

        The parameter `use_surrogate_pairs` specifies whether to convert astral
        code points (range `U+010000` to `U+10FFFF`) to surrogate pairs.

        Args:
            use_surrogate_pairs (bool): Whether to convert astral code points to surrogate pairs
        """

    def with_verbose_mode(self) -> "RegExpBuilder":
        """Produce a nicer looking regular expression in verbose mode."""

    def without_start_anchor(self) -> "RegExpBuilder":
        """Remove the caret anchor `^` from the resulting regular expression,
        thereby allowing the test cases to be matched even when they do not occur
        at the start of a string.
        """

    def without_end_anchor(self) -> "RegExpBuilder":
        """Remove the dollar sign anchor `$` from the resulting regular expression,
        thereby allowing the test cases to be matched even when they do not occur
        at the end of a string.
        """

    def without_anchors(self) -> "RegExpBuilder":
        """Remove the caret and dollar sign anchors from the resulting regular expression,
        thereby allowing the test cases to be matched even when they occur within a larger
        string that contains other content as well.
        """

    def build(self) -> str:
        """Build the actual regular expression using the previously given settings."""
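The precedence rules in these docstrings correspond to the order of the `if`/`else if` chain in `GraphemeCluster::convert_to_char_classes` (`src/cluster.rs`): `\d`, `\w`, `\s`, `\D`, `\W`, `\S`. The Rust sketch below reproduces only that ordering. `Flags`, `to_class` and `convert` are hypothetical names. The std predicates (`char::is_numeric`, `char::is_alphanumeric`, `char::is_whitespace`) are assumed approximations of the Unicode tables grex actually uses (`DECIMAL_NUMBER`, `WORD`, `WHITE_SPACE`).

```rust
/// Hypothetical mirror of the six `is_*_converted` flags in grex's `RegExpConfig`.
#[derive(Default)]
struct Flags {
    digit: bool,
    non_digit: bool,
    space: bool,
    non_space: bool,
    word: bool,
    non_word: bool,
}

/// Maps one character to a shorthand class, checking the flags in the same
/// order as `convert_to_char_classes`: \d, \w, \s, \D, \W, \S.
fn to_class(c: char, f: &Flags) -> String {
    let is_digit = c.is_numeric(); // approximation: broader than Unicode `Nd`
    let is_word = c.is_alphanumeric() || c == '_'; // approximation of `\w`
    let is_space = c.is_whitespace(); // approximation of `\s`
    let class = if f.digit && is_digit {
        "\\d"
    } else if f.word && is_word {
        "\\w"
    } else if f.space && is_space {
        "\\s"
    } else if f.non_digit && !is_digit {
        "\\D"
    } else if f.non_word && !is_word {
        "\\W"
    } else if f.non_space && !is_space {
        "\\S"
    } else {
        // No enabled class applies: keep the character itself.
        return c.to_string();
    };
    class.to_string()
}

/// Converts every character of `s` independently.
fn convert(s: &str, f: &Flags) -> String {
    s.chars().map(|c| to_class(c, f)).collect()
}
```

With digits and words enabled, `a1 ` becomes `\w\d ` (the space is left alone). With whitespace and non-digits enabled, `a 1` becomes `\D\s1`, so `\s` wins over `\D` for the space.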
================================================
FILE: pyproject.toml
================================================
[project]
name = "grex"
version = "1.0.2"
authors = [{name = "Peter M. Stahl", email = "pemistahl@gmail.com"}]
description = "grex generates regular expressions from user-provided test cases."
readme = "README_PYPI.md"
requires-python = ">=3.12"
license = {file = "LICENSE"}
keywords = ["pattern", "regex", "regexp"]
classifiers = [
"Development Status :: 5 - Production/Stable",
"Intended Audience :: Developers",
"Intended Audience :: Information Technology",
"Intended Audience :: Science/Research",
"License :: OSI Approved :: Apache Software License",
"Programming Language :: Python :: 3.12",
"Programming Language :: Python :: 3.13",
"Programming Language :: Python :: 3.14",
"Programming Language :: Rust",
"Topic :: Software Development :: Libraries :: Python Modules",
"Topic :: Text Processing"
]
[project.urls]
homepage = "https://github.com/pemistahl/grex"
repository = "https://github.com/pemistahl/grex"
[project.optional-dependencies]
test = ["pytest == 9.0.1"]
[tool.maturin]
no-default-features = true
features = ["pyo3/extension-module", "pyo3/generate-import-lib", "python"]
[build-system]
requires = ["maturin>=1.1,<2.0"]
build-backend = "maturin"
================================================
FILE: requirements.txt
================================================
maturin == 1.10.1
pytest == 9.0.1
================================================
FILE: src/builder.rs
================================================
/*
* Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either expressed or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
use crate::config::RegExpConfig;
use crate::regexp::RegExp;
use itertools::Itertools;
use std::io::ErrorKind;
use std::path::PathBuf;

pub(crate) const MISSING_TEST_CASES_MESSAGE: &str =
    "No test cases have been provided for regular expression generation";

pub(crate) const MINIMUM_REPETITIONS_MESSAGE: &str =
    "Quantity of minimum repetitions must be greater than zero";

pub(crate) const MINIMUM_SUBSTRING_LENGTH_MESSAGE: &str =
    "Minimum substring length must be greater than zero";

/// This struct builds regular expressions from user-provided test cases.
#[derive(Clone)]
#[cfg_attr(feature = "python", pyo3::prelude::pyclass)]
pub struct RegExpBuilder {
    pub(crate) test_cases: Vec<String>,
    pub(crate) config: RegExpConfig,
}

impl RegExpBuilder {
    /// Specifies the test cases to build the regular expression from.
    ///
    /// The test cases need not be sorted because `RegExpBuilder` sorts them internally.
    ///
    /// ⚠ Panics if `test_cases` is empty.
    pub fn from<T: Clone + Into<String>>(test_cases: &[T]) -> Self {
        if test_cases.is_empty() {
            panic!("{}", MISSING_TEST_CASES_MESSAGE);
        }
        Self {
            test_cases: test_cases.iter().cloned().map(|it| it.into()).collect_vec(),
            config: RegExpConfig::new(),
        }
    }

    /// Specifies a text file containing test cases to build the regular expression from.
    ///
    /// The test cases need not be sorted because `RegExpBuilder` sorts them internally.
    ///
    /// Each test case needs to be on a separate line.
    /// Lines may be ended with either a newline (`\n`) or
    /// a carriage return with a line feed (`\r\n`).
    /// The final line ending is optional.
    ///
    /// ⚠ Panics if:
    /// - the file cannot be found
    /// - the file's encoding is not valid UTF-8 data
    /// - the file cannot be opened because of conflicting permissions
    pub fn from_file<T: Into<PathBuf>>(file_path: T) -> Self {
        match std::fs::read_to_string(file_path.into()) {
            Ok(file_content) => Self {
                test_cases: file_content.lines().map(|it| it.to_string()).collect_vec(),
                config: RegExpConfig::new(),
            },
            Err(error) => match error.kind() {
                ErrorKind::NotFound => panic!("The specified file could not be found"),
                ErrorKind::InvalidData => {
                    panic!("The specified file's encoding is not valid UTF-8")
                }
                ErrorKind::PermissionDenied => {
                    panic!("Permission denied: The specified file could not be opened")
                }
                _ => panic!("{}", error),
            },
        }
    }

    /// Converts any Unicode decimal digit to character class `\d`.
    ///
    /// This method takes precedence over
    /// [`with_conversion_of_words`](Self::with_conversion_of_words) if both are set.
    /// Decimal digits are converted to `\d`, the remaining word characters to `\w`.
    ///
    /// This method takes precedence over
    /// [`with_conversion_of_non_whitespace`](Self::with_conversion_of_non_whitespace) if both are set.
    /// Decimal digits are converted to `\d`, the remaining non-whitespace characters to `\S`.
    pub fn with_conversion_of_digits(&mut self) -> &mut Self {
        self.config.is_digit_converted = true;
        self
    }

    /// Converts any character which is not a Unicode decimal digit to character class `\D`.
    ///
    /// This method takes precedence over
    /// [`with_conversion_of_non_words`](Self::with_conversion_of_non_words) if both are set.
    /// Non-digits which are also non-word characters are converted to `\D`.
    ///
    /// This method takes precedence over
    /// [`with_conversion_of_non_whitespace`](Self::with_conversion_of_non_whitespace) if both are set.
    /// Non-digits which are also non-space characters are converted to `\D`.
    pub fn with_conversion_of_non_digits(&mut self) -> &mut Self {
        self.config.is_non_digit_converted = true;
        self
    }

    /// Converts any Unicode whitespace character to character class `\s`.
    ///
    /// This method takes precedence over
    /// [`with_conversion_of_non_digits`](Self::with_conversion_of_non_digits) if both are set.
    /// Whitespace characters are converted to `\s`, the remaining non-digit characters to `\D`.
    ///
    /// This method takes precedence over
    /// [`with_conversion_of_non_words`](Self::with_conversion_of_non_words) if both are set.
    /// Whitespace characters are converted to `\s`, the remaining non-word characters to `\W`.
    pub fn with_conversion_of_whitespace(&mut self) -> &mut Self {
        self.config.is_space_converted = true;
        self
    }

    /// Converts any character which is not a Unicode whitespace character to character class `\S`.
    pub fn with_conversion_of_non_whitespace(&mut self) -> &mut Self {
        self.config.is_non_space_converted = true;
        self
    }

    /// Converts any Unicode word character to character class `\w`.
    ///
    /// This method takes precedence over
    /// [`with_conversion_of_non_digits`](Self::with_conversion_of_non_digits) if both are set.
    /// Word characters are converted to `\w`, the remaining non-digit characters to `\D`.
    ///
    /// This method takes precedence over
    /// [`with_conversion_of_non_whitespace`](Self::with_conversion_of_non_whitespace) if both are set.
    /// Word characters are converted to `\w`, the remaining non-space characters to `\S`.
    pub fn with_conversion_of_words(&mut self) -> &mut Self {
        self.config.is_word_converted = true;
        self
    }

    /// Converts any character which is not a Unicode word character to character class `\W`.
    ///
    /// This method takes precedence over
    /// [`with_conversion_of_non_whitespace`](Self::with_conversion_of_non_whitespace) if both are set.
    /// Non-words which are also non-space characters are converted to `\W`.
    pub fn with_conversion_of_non_words(&mut self) -> &mut Self {
        self.config.is_non_word_converted = true;
        self
    }

    /// Detects repeated non-overlapping substrings and
    /// converts them to `{min,max}` quantifier notation.
    pub fn with_conversion_of_repetitions(&mut self) -> &mut Self {
        self.config.is_repetition_converted = true;
        self
    }

    /// Enables case-insensitive matching of test cases
    /// so that letters match both upper and lower case.
    pub fn with_case_insensitive_matching(&mut self) -> &mut Self {
        self.config.is_case_insensitive_matching = true;
        self
    }

    /// Replaces non-capturing groups with capturing ones.
    pub fn with_capturing_groups(&mut self) -> &mut Self {
        self.config.is_capturing_group_enabled = true;
        self
    }

    /// Specifies the minimum quantity of substring repetitions to be converted if
    /// [`with_conversion_of_repetitions`](Self::with_conversion_of_repetitions) is set.
    ///
    /// If the quantity is not explicitly set with this method, a default value of 1 will be used.
    ///
    /// ⚠ Panics if `quantity` is zero.
    pub fn with_minimum_repetitions(&mut self, quantity: u32) -> &mut Self {
        if quantity == 0 {
            panic!("{}", MINIMUM_REPETITIONS_MESSAGE);
        }
        self.config.minimum_repetitions = quantity;
        self
    }

    /// Specifies the minimum length a repeated substring must have in order to be converted if
    /// [`with_conversion_of_repetitions`](Self::with_conversion_of_repetitions) is set.
    ///
    /// If the length is not explicitly set with this method, a default value of 1 will be used.
    ///
    /// ⚠ Panics if `length` is zero.
    pub fn with_minimum_substring_length(&mut self, length: u32) -> &mut Self {
        if length == 0 {
            panic!("{}", MINIMUM_SUBSTRING_LENGTH_MESSAGE);
        }
        self.config.minimum_substring_length = length;
        self
    }

    /// Converts non-ASCII characters to Unicode escape sequences.
    /// The parameter `use_surrogate_pairs` specifies whether to convert astral code points
    /// (range `U+010000` to `U+10FFFF`) to surrogate pairs.
    pub fn with_escaping_of_non_ascii_chars(&mut self, use_surrogate_pairs: bool) -> &mut Self {
        self.config.is_non_ascii_char_escaped = true;
        self.config.is_astral_code_point_converted_to_surrogate = use_surrogate_pairs;
        self
    }

    /// Produces a nicer looking regular expression in verbose mode.
    pub fn with_verbose_mode(&mut self) -> &mut Self {
        self.config.is_verbose_mode_enabled = true;
        self
    }

    /// Removes the caret anchor `^` from the resulting regular expression,
    /// thereby allowing the test cases to be matched even when they do not occur
    /// at the start of a string.
    pub fn without_start_anchor(&mut self) -> &mut Self {
        self.config.is_start_anchor_disabled = true;
        self
    }

    /// Removes the dollar sign anchor `$` from the resulting regular expression,
    /// thereby allowing the test cases to be matched even when they do not occur
    /// at the end of a string.
    pub fn without_end_anchor(&mut self) -> &mut Self {
        self.config.is_end_anchor_disabled = true;
        self
    }

    /// Removes the caret and dollar sign anchors from the resulting regular expression,
    /// thereby allowing the test cases to be matched even when they occur
    /// within a larger string that contains other content as well.
    pub fn without_anchors(&mut self) -> &mut Self {
        self.config.is_start_anchor_disabled = true;
        self.config.is_end_anchor_disabled = true;
        self
    }

    /// Provides syntax highlighting for the resulting regular expression.
    ///
    /// ⚠ This method may only be used if the resulting regular expression is meant to
    /// be printed to the console. The regex string representation returned from enabling
    /// this setting cannot be fed into the [*regex*](https://crates.io/crates/regex) crate.
    #[cfg(feature = "cli")]
    #[doc(hidden)]
    pub fn with_syntax_highlighting(&mut self) -> &mut Self {
        self.config.is_output_colorized = true;
        self
    }

    /// Builds the actual regular expression using the previously given settings.
    pub fn build(&mut self) -> String {
        RegExp::from(&mut self.test_cases, &self.config).to_string()
    }
}
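`with_minimum_repetitions` only stores the threshold. It is applied in `create_ranges_of_repetitions` (`src/cluster.rs`), which coalesces back-to-back occurrences of a substring into ranges. A range is kept only if `count > config.minimum_repetitions`, so with the default of 1 a substring has to appear at least twice in a row. The std-only sketch below mirrors that step without `itertools`. `repeated_ranges` is a hypothetical name. The sketch assumes the start indices are sorted and non-overlapping; grex filters out overlapping candidates before this step.

```rust
use std::ops::Range;

/// Merges back-to-back occurrences of a substring (given by their sorted,
/// non-overlapping start indices) into ranges, then keeps only the ranges in
/// which the substring occurs more than `minimum_repetitions` times.
fn repeated_ranges(starts: &[usize], unit_len: usize, minimum_repetitions: u32) -> Vec<Range<usize>> {
    let mut merged: Vec<Range<usize>> = Vec::new();
    for &start in starts {
        let next = start..start + unit_len;
        if let Some(last) = merged.last_mut() {
            // This occurrence begins where the previous range ends: extend it.
            if last.end == next.start {
                last.end = next.end;
                continue;
            }
        }
        merged.push(next);
    }
    merged
        .into_iter()
        .filter(|r| ((r.end - r.start) / unit_len) as u32 > minimum_repetitions)
        .collect()
}
```

For `abab_ab`, the substring `ab` starts at 0, 2 and 5. The first two occurrences merge into `0..4` (two repetitions) and survive a minimum of 1, while the lone occurrence at 5 is dropped.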
================================================
FILE: src/char_range.rs
================================================
/*
* Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either expressed or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/// A lightweight replacement for `unic_char_range::CharRange`.
/// Represents a closed range of Unicode characters.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub(crate) struct CharRange {
    start: char,
    end: char,
}

impl CharRange {
    /// Creates a closed character range from start to end (inclusive).
    pub(crate) fn closed(start: char, end: char) -> Self {
        Self { start, end }
    }

    /// Checks if the given character is within this range.
    pub(crate) fn contains(&self, c: char) -> bool {
        c >= self.start && c <= self.end
    }

    /// Returns an iterator over all valid Unicode scalar values.
    /// This includes U+0000 to U+D7FF and U+E000 to U+10FFFF
    /// (excludes surrogate code points U+D800 to U+DFFF).
    pub(crate) fn all() -> CharRangeIter {
        CharRangeIter {
            current: '\0',
            done: false,
        }
    }
}

/// Iterator over all valid Unicode scalar values.
pub(crate) struct CharRangeIter {
    current: char,
    done: bool,
}

impl Iterator for CharRangeIter {
    type Item = char;

    fn next(&mut self) -> Option<Self::Item> {
        if self.done {
            return None;
        }
        let result = self.current;
        // Get the next valid Unicode scalar value
        let mut next_code_point = self.current as u32 + 1;
        // Skip over surrogate code points (U+D800 to U+DFFF) and find the next valid char
        loop {
            if next_code_point > 0x10FFFF {
                // We've reached the end of valid Unicode code points
                self.done = true;
                break;
            }
            match char::from_u32(next_code_point) {
                Some(next_char) => {
                    self.current = next_char;
                    break;
                }
                None => {
                    // Below U+10FFFF only surrogates are invalid chars: skip to the next one
                    next_code_point += 1;
                }
            }
        }
        Some(result)
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_char_range_contains() {
        let range = CharRange::closed('a', 'z');
        assert!(range.contains('a'));
        assert!(range.contains('m'));
        assert!(range.contains('z'));
        assert!(!range.contains('A'));
        assert!(!range.contains('0'));
    }

    #[test]
    fn test_char_range_all() {
        let all_chars: Vec<char> = CharRange::all().take(10).collect();
        assert_eq!(all_chars[0], '\0');
        assert_eq!(all_chars.len(), 10);
    }

    #[test]
    fn test_char_range_all_count() {
        // Valid Unicode scalar values: 0x110000 total code points - 0x800 surrogates = 0x10F800
        let count = CharRange::all().count();
        assert_eq!(count, 0x10F800);
    }
}
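`is_digit`, `is_word` and `is_space` in `src/cluster.rs` test membership with a linear `iter().any(|range| range.contains(c))` over their `Vec<CharRange>`. If those vectors are sorted by start and disjoint, a binary search via `slice::binary_search_by` is a possible alternative. That property is an assumption, because `convert_chars_to_range` is not shown in this excerpt. `Span` and `contains` below are hypothetical stand-ins for `CharRange`, not grex code.

```rust
use std::cmp::Ordering;

/// Closed character range, a hypothetical stand-in for `CharRange`.
struct Span {
    start: char,
    end: char,
}

/// Membership test in O(log n) over spans that are sorted by `start` and do not overlap.
fn contains(spans: &[Span], c: char) -> bool {
    spans
        .binary_search_by(|span| {
            if span.end < c {
                Ordering::Less // span lies entirely before `c`
            } else if span.start > c {
                Ordering::Greater // span lies entirely after `c`
            } else {
                Ordering::Equal // start <= c <= end
            }
        })
        .is_ok()
}
```

The trade-off is an up-front sort (and merge of overlapping ranges, if any) in exchange for logarithmic lookups, which matters most for large tables such as `WORD`.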
================================================
FILE: src/cluster.rs
================================================
/*
* Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either expressed or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
use crate::char_range::CharRange;
use crate::config::RegExpConfig;
use crate::grapheme::Grapheme;
use crate::unicode_tables::{DECIMAL_NUMBER, WHITE_SPACE, WORD};
use itertools::Itertools;
use std::cmp::Ordering;
use std::collections::HashMap;
use std::ops::Range;
use std::sync::LazyLock;
use unicode_general_category::GeneralCategory as GC;
use unicode_segmentation::UnicodeSegmentation;
#[derive(Clone, Debug, Eq, PartialEq)]
pub(crate) struct GraphemeCluster<'a> {
graphemes: Vec<Grapheme>,
config: &'a RegExpConfig,
}
impl<'a> GraphemeCluster<'a> {
pub(crate) fn from(s: &str, config: &'a RegExpConfig) -> Self {
Self {
graphemes: UnicodeSegmentation::graphemes(s, true)
.flat_map(|it| {
let contains_backslash = it.chars().count() == 2 && it.contains('\\');
let contains_combining_mark_or_unassigned_chars = it.chars().any(|c| {
let category = unicode_general_category::get_general_category(c);
matches!(
category,
// Mark categories
GC::NonspacingMark | GC::SpacingMark | GC::EnclosingMark |
// Other categories
GC::Control | GC::Format | GC::Surrogate | GC::PrivateUse | GC::Unassigned
)
});
if contains_backslash || contains_combining_mark_or_unassigned_chars {
it.chars()
.map(|c| {
Grapheme::from(
&c.to_string(),
config.is_capturing_group_enabled,
config.is_output_colorized,
config.is_verbose_mode_enabled,
)
})
.collect_vec()
} else {
vec![Grapheme::from(
it,
config.is_capturing_group_enabled,
config.is_output_colorized,
config.is_verbose_mode_enabled,
)]
}
})
.collect_vec(),
config,
}
}
pub(crate) fn from_graphemes(graphemes: Vec<Grapheme>, config: &'a RegExpConfig) -> Self {
Self { graphemes, config }
}
pub(crate) fn new(grapheme: Grapheme, config: &'a RegExpConfig) -> Self {
Self {
graphemes: vec![grapheme],
config,
}
}
pub(crate) fn convert_to_char_classes(&mut self) {
let is_digit_converted = self.config.is_digit_converted;
let is_non_digit_converted = self.config.is_non_digit_converted;
let is_space_converted = self.config.is_space_converted;
let is_non_space_converted = self.config.is_non_space_converted;
let is_word_converted = self.config.is_word_converted;
let is_non_word_converted = self.config.is_non_word_converted;
for grapheme in self.graphemes.iter_mut() {
grapheme.chars = grapheme
.chars
.iter()
.map(|it| {
it.chars()
.map(|c| {
if is_digit_converted && is_digit(c) {
"\\d".to_string()
} else if is_word_converted && is_word(c) {
"\\w".to_string()
} else if is_space_converted && is_space(c) {
"\\s".to_string()
} else if is_non_digit_converted && !is_digit(c) {
"\\D".to_string()
} else if is_non_word_converted && !is_word(c) {
"\\W".to_string()
} else if is_non_space_converted && !is_space(c) {
"\\S".to_string()
} else {
c.to_string()
}
})
.join("")
})
.collect_vec();
}
}
pub(crate) fn convert_repetitions(&mut self) {
let mut repetitions = vec![];
convert_repetitions(self.graphemes(), repetitions.as_mut(), self.config);
if !repetitions.is_empty() {
self.graphemes = repetitions;
}
}
pub(crate) fn merge(
first: &GraphemeCluster,
second: &GraphemeCluster,
config: &'a RegExpConfig,
) -> Self {
let mut graphemes = vec![];
graphemes.extend_from_slice(&first.graphemes);
graphemes.extend_from_slice(&second.graphemes);
Self { graphemes, config }
}
pub(crate) fn graphemes(&self) -> &Vec<Grapheme> {
&self.graphemes
}
pub(crate) fn graphemes_mut(&mut self) -> &mut Vec<Grapheme> {
&mut self.graphemes
}
pub(crate) fn size(&self) -> usize {
self.graphemes.len()
}
pub(crate) fn char_count(&self, is_non_ascii_char_escaped: bool) -> usize {
self.graphemes
.iter()
.map(|it| it.char_count(is_non_ascii_char_escaped))
.sum()
}
pub(crate) fn is_empty(&self) -> bool {
self.graphemes.is_empty()
}
}
fn is_digit(c: char) -> bool {
static VALID_NUMERIC_CHARS: LazyLock<Vec<CharRange>> =
LazyLock::new(|| convert_chars_to_range(DECIMAL_NUMBER));
VALID_NUMERIC_CHARS.iter().any(|range| range.contains(c))
}
fn is_word(c: char) -> bool {
static VALID_ALPHANUMERIC_CHARS: LazyLock<Vec<CharRange>> =
LazyLock::new(|| convert_chars_to_range(WORD));
VALID_ALPHANUMERIC_CHARS
.iter()
.any(|range| range.contains(c))
}
fn is_space(c: char) -> bool {
static VALID_SPACE_CHARS: LazyLock<Vec<CharRange>> =
LazyLock::new(|| convert_chars_to_range(WHITE_SPACE));
VALID_SPACE_CHARS.iter().any(|range| range.contains(c))
}
fn convert_repetitions(
graphemes: &[Grapheme],
repetitions: &mut Vec<Grapheme>,
config: &RegExpConfig,
) {
let repeated_substrings = collect_repeated_substrings(graphemes);
let ranges_of_repetitions = create_ranges_of_repetitions(repeated_substrings, config);
let coalesced_repetitions = coalesce_repetitions(ranges_of_repetitions);
replace_graphemes_with_repetitions(coalesced_repetitions, graphemes, repetitions, config)
}
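/// Maps every grapheme sequence of length 1..=len/2 to the start indices
/// (in ascending order) at which it occurs in `graphemes`.
fn collect_repeated_substrings(graphemes: &[Grapheme]) -> HashMap<Vec<String>, Vec<usize>> {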
let mut map = HashMap::new();
for i in 0..graphemes.len() {
let suffix = &graphemes[i..];
for j in 1..=graphemes.len() / 2 {
if suffix.len() >= j {
let prefix = suffix[..j].iter().map(|it| it.value()).collect_vec();
let indices = map.entry(prefix).or_insert_with(Vec::new);
indices.push(i);
}
}
}
map
}
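/// Keeps only substrings whose occurrences do not overlap, merges back-to-back
/// occurrences into ranges and retains the ranges that repeat more than
/// `config.minimum_repetitions` times. Longer substrings are processed first.
fn create_ranges_of_repetitions(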
repeated_substrings: HashMap<Vec<String>, Vec<usize>>,
config: &RegExpConfig,
) -> Vec<(Range<usize>, Vec<String>)> {
let mut repetitions = Vec::<(Range<usize>, Vec<String>)>::new();
for (prefix_length, group) in &repeated_substrings
.iter()
.filter(|&(prefix, indices)| {
indices
.iter()
.tuple_windows()
.all(|(first, second)| (second - first) >= prefix.len())
})
.sorted_by_key(|&(prefix, _)| prefix.len())
.rev()
.chunk_by(|&(prefix, _)| prefix.len())
{
for (prefix, indices) in group.sorted_by_key(|&(_, indices)| indices[0]) {
indices
.iter()
.map(|it| *it..it + prefix_length)
.coalesce(|x, y| {
if x.end == y.start {
Ok(x.start..y.end)
} else {
Err((x, y))
}
})
.filter(|range| {
let count = ((range.end - range.start) / prefix_length) as u32;
count > config.minimum_repetitions
})
.for_each(|range| repetitions.push((range, prefix.clone())));
}
}
repetitions
}
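/// Sorts the ranges by descending end (ties broken by ascending start) and
/// drops every range that overlaps the range kept before it.
fn coalesce_repetitions(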
ranges_of_repetitions: Vec<(Range<usize>, Vec<String>)>,
) -> Vec<(Range<usize>, Vec<String>)> {
ranges_of_repetitions
.iter()
.sorted_by(|&(first_range, _), &(second_range, _)| {
match second_range.end.cmp(&first_range.end) {
Ordering::Equal => first_range.start.cmp(&second_range.start),
other => other,
}
})
.coalesce(|first_tup, second_tup| {
let first_range = &first_tup.0;
let second_range = &second_tup.0;
if (first_range.contains(&second_range.start)
|| first_range.contains(&second_range.end))
&& second_range.end != first_range.start
{
Ok(first_tup)
} else {
Err((first_tup, second_tup))
}
})
.map(|(range, substr)| (range.clone(), substr.clone()))
.collect_vec()
}
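/// Splices one repeated `Grapheme` per coalesced range into a copy of
/// `graphemes`, then recurses into each grapheme's characters to detect nested
/// repetitions. The ranges arrive sorted by descending end, so splicing one
/// range does not shift the indices of the ranges still to come.
fn replace_graphemes_with_repetitions(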
coalesced_repetitions: Vec<(Range<usize>, Vec<String>)>,
graphemes: &[Grapheme],
repetitions: &mut Vec<Grapheme>,
config: &RegExpConfig,
) {
if coalesced_repetitions.is_empty() {
return;
}
for grapheme in graphemes {
repetitions.push(grapheme.clone());
}
for (range, substr) in coalesced_repetitions.iter() {
if range.end > repetitions.len() {
break;
}
let count = ((range.end - range.start) / substr.len()) as u32;
if substr.len() < config.minimum_substring_length as usize {
continue;
}
repetitions.splice(
range.clone(),
[Grapheme::new(
substr.clone(),
count,
count,
config.is_capturing_group_enabled,
config.is_output_colorized,
config.is_verbose_mode_enabled,
)]
.iter()
.cloned(),
);
}
for new_grapheme in repetitions.iter_mut() {
convert_repetitions(
&new_grapheme
.chars
.iter()
.map(|it| {
Grapheme::from(
it,
config.is_capturing_group_enabled,
config.is_output_colorized,
config.is_verbose_mode_enabled,
)
})
.collect_vec(),
new_grapheme.repetitions.as_mut(),
config,
);
}
}
fn convert_chars_to_range(chars: &[(char, char)]) -> Vec<CharRange> {
chars
.iter()
.map(|&(start, end)| CharRange::closed(start, end))
.collect_vec()
}
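The repetition detection in `convert_repetitions` runs in stages: `collect_repeated_substrings` records where every short substring starts, and `create_ranges_of_repetitions` keeps non-overlapping occurrences and merges back-to-back ones into ranges. The following is a minimal, self-contained sketch of those two stages under simplifying assumptions: it works on `char`s instead of grex's `Grapheme` type, it skips the sorting, chunking and nested-repetition handling of the real code, and the helper `back_to_back_runs` is hypothetical rather than part of grex.

```rust
use std::collections::HashMap;
use std::ops::Range;

/// Simplified counterpart of `collect_repeated_substrings`: maps every
/// substring of length 1..=len/2 to its start indices (ascending).
fn collect_repeated_substrings(chars: &[char]) -> HashMap<Vec<char>, Vec<usize>> {
    let mut map: HashMap<Vec<char>, Vec<usize>> = HashMap::new();
    for i in 0..chars.len() {
        let suffix = &chars[i..];
        for j in 1..=chars.len() / 2 {
            if suffix.len() >= j {
                map.entry(suffix[..j].to_vec()).or_default().push(i);
            }
        }
    }
    map
}

/// Hypothetical helper: merges back-to-back occurrences of a unit of length
/// `unit_len` into ranges and keeps the ranges that repeat more than
/// `minimum_repetitions` times (the same `>` test grex applies).
fn back_to_back_runs(indices: &[usize], unit_len: usize, minimum_repetitions: usize) -> Vec<Range<usize>> {
    // Like the filter in `create_ranges_of_repetitions`, skip units whose
    // occurrences overlap (e.g. "aa" inside "aaa").
    if !indices.windows(2).all(|w| w[1] - w[0] >= unit_len) {
        return Vec::new();
    }
    let mut runs: Vec<Range<usize>> = Vec::new();
    for &start in indices {
        let extends_last = runs.last().map_or(false, |last| last.end == start);
        if extends_last {
            // The previous run ends exactly where this occurrence starts: extend it.
            runs.last_mut().unwrap().end = start + unit_len;
        } else {
            runs.push(start..start + unit_len);
        }
    }
    runs.retain(|r| (r.end - r.start) / unit_len > minimum_repetitions);
    runs
}

fn main() {
    let chars: Vec<char> = "xababab".chars().collect();
    let map = collect_repeated_substrings(&chars);
    let ab = &map[&vec!['a', 'b']]; // "ab" starts at 1, 3 and 5
    println!("{:?}", back_to_back_runs(ab, 2, 1)); // [1..7]
}
```

With grex's default `minimum_repetitions` of 1, a unit must occur at least twice in a row before it is turned into a repetition.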
================================================
FILE: src/component.rs
================================================
/*
* Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either expressed or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
use crate::quantifier::Quantifier;
use std::fmt::{Display, Formatter, Result};
pub(crate) enum Component {
CapturedLeftParenthesis,
CapturedParenthesizedExpression(String, bool, bool),
Caret(bool),
CharClass(String),
DollarSign(bool),
Hyphen,
IgnoreCaseFlag,
IgnoreCaseAndVerboseModeFlag,
LeftBracket,
Pipe,
Quantifier(Quantifier, bool),
Repetition(u32, bool),
RepetitionRange(u32, u32, bool),
RightBracket,
RightParenthesis,
UncapturedLeftParenthesis,
UncapturedParenthesizedExpression(String, bool, bool),
VerboseModeFlag,
}
impl Component {
pub(crate) fn to_repr(&self, is_output_colorized: bool) -> String {
if is_output_colorized {
self.to_colored_string(false)
} else {
self.to_string()
}
}
pub(crate) fn to_colored_string(&self, is_escaped: bool) -> String {
match self {
Component::CapturedLeftParenthesis => Self::green_bold(&self.to_string(), is_escaped),
Component::CapturedParenthesizedExpression(
expr,
is_verbose_mode_enabled,
has_final_line_break,
) => {
if *is_verbose_mode_enabled {
if *has_final_line_break {
format!(
"\n{}\n{}\n{}\n",
Component::CapturedLeftParenthesis.to_colored_string(is_escaped),
expr,
Component::RightParenthesis.to_colored_string(is_escaped)
)
} else {
format!(
"\n{}\n{}\n{}",
Component::CapturedLeftParenthesis.to_colored_string(is_escaped),
expr,
Component::RightParenthesis.to_colored_string(is_escaped)
)
}
} else {
format!(
"{}{}{}",
Component::CapturedLeftParenthesis.to_colored_string(is_escaped),
expr,
Component::RightParenthesis.to_colored_string(is_escaped)
)
}
}
Component::Caret(is_verbose_mode_enabled) => {
if *is_verbose_mode_enabled {
format!(
"{}\n",
Self::yellow_bold(&Component::Caret(false).to_string(), is_escaped)
)
} else {
Self::yellow_bold(&self.to_string(), is_escaped)
}
}
Component::CharClass(value) => Self::black_on_bright_yellow(value, is_escaped),
Component::DollarSign(is_verbose_mode_enabled) => {
if *is_verbose_mode_enabled {
format!(
"\n{}",
Self::yellow_bold(&Component::DollarSign(false).to_string(), is_escaped)
)
} else {
Self::yellow_bold(&self.to_string(), is_escaped)
}
}
Component::Hyphen => Self::cyan_bold(&self.to_string(), is_escaped),
Component::IgnoreCaseFlag => {
Self::bright_yellow_on_black(&self.to_string(), is_escaped)
}
Component::IgnoreCaseAndVerboseModeFlag => {
format!("{}\n", Self::bright_yellow_on_black("(?ix)", is_escaped))
}
Component::LeftBracket => Self::cyan_bold(&self.to_string(), is_escaped),
Component::Pipe => Self::red_bold(&self.to_string(), is_escaped),
Component::Quantifier(quantifier, is_verbose_mode_enabled) => {
if *is_verbose_mode_enabled {
format!(
"{}\n",
Self::purple_bold(&quantifier.to_string(), is_escaped)
)
} else {
Self::purple_bold(&self.to_string(), is_escaped)
}
}
Component::Repetition(num, is_verbose_mode_enabled) => {
if *is_verbose_mode_enabled {
format!(
"{}\n",
Self::white_on_bright_blue(
&Component::Repetition(*num, false).to_string(),
is_escaped
)
)
} else {
Self::white_on_bright_blue(&self.to_string(), is_escaped)
}
}
Component::RepetitionRange(min, max, is_verbose_mode_enabled) => {
if *is_verbose_mode_enabled {
format!(
"{}\n",
Self::white_on_bright_blue(
&Component::RepetitionRange(*min, *max, false).to_string(),
is_escaped
)
)
} else {
Self::white_on_bright_blue(&self.to_string(), is_escaped)
}
}
Component::RightBracket => Self::cyan_bold(&self.to_string(), is_escaped),
Component::RightParenthesis => Self::green_bold(&self.to_string(), is_escaped),
Component::UncapturedLeftParenthesis => Self::green_bold(&self.to_string(), is_escaped),
Component::UncapturedParenthesizedExpression(
expr,
is_verbose_mode_enabled,
has_final_line_break,
) => {
if *is_verbose_mode_enabled {
if *has_final_line_break {
format!(
"\n{}\n{}\n{}\n",
Component::UncapturedLeftParenthesis.to_colored_string(is_escaped),
expr,
Component::RightParenthesis.to_colored_string(is_escaped)
)
} else {
format!(
"\n{}\n{}\n{}",
Component::UncapturedLeftParenthesis.to_colored_string(is_escaped),
expr,
Component::RightParenthesis.to_colored_string(is_escaped)
)
}
} else {
format!(
"{}{}{}",
Component::UncapturedLeftParenthesis.to_colored_string(is_escaped),
expr,
Component::RightParenthesis.to_colored_string(is_escaped)
)
}
}
Component::VerboseModeFlag => {
format!("{}\n", Self::bright_yellow_on_black("(?x)", is_escaped))
}
}
}
fn black_on_bright_yellow(value: &str, is_escaped: bool) -> String {
Self::color_code("103;30", value, is_escaped)
}
fn bright_yellow_on_black(value: &str, is_escaped: bool) -> String {
Self::color_code("40;93", value, is_escaped)
}
fn cyan_bold(value: &str, is_escaped: bool) -> String {
Self::color_code("1;36", value, is_escaped)
}
fn green_bold(value: &str, is_escaped: bool) -> String {
Self::color_code("1;32", value, is_escaped)
}
fn purple_bold(value: &str, is_escaped: bool) -> String {
Self::color_code("1;35", value, is_escaped)
}
fn red_bold(value: &str, is_escaped: bool) -> String {
Self::color_code("1;31", value, is_escaped)
}
fn white_on_bright_blue(value: &str, is_escaped: bool) -> String {
Self::color_code("104;37", value, is_escaped)
}
fn yellow_bold(value: &str, is_escaped: bool) -> String {
Self::color_code("1;33", value, is_escaped)
}
fn color_code(code: &str, value: &str, is_escaped: bool) -> String {
if is_escaped {
format!("\u{1b}\\[{}m\\{}\u{1b}\\[0m", code, value)
} else {
format!("\u{1b}[{}m{}\u{1b}[0m", code, value)
}
}
}
impl Display for Component {
fn fmt(&self, f: &mut Formatter<'_>) -> Result {
write!(
f,
"{}",
match self {
Component::CapturedLeftParenthesis => "(".to_string(),
Component::CapturedParenthesizedExpression(
expr,
is_verbose_mode_enabled,
has_final_line_break,
) =>
if *is_verbose_mode_enabled {
if *has_final_line_break {
format!(
"\n{}\n{}\n{}\n",
Component::CapturedLeftParenthesis,
expr,
Component::RightParenthesis
)
} else {
format!(
"\n{}\n{}\n{}",
Component::CapturedLeftParenthesis,
expr,
Component::RightParenthesis
)
}
} else {
format!(
"{}{}{}",
Component::CapturedLeftParenthesis,
expr,
Component::RightParenthesis
)
},
Component::Caret(is_verbose_mode_enabled) =>
if *is_verbose_mode_enabled {
"^\n".to_string()
} else {
"^".to_string()
},
Component::CharClass(value) => value.clone(),
Component::DollarSign(is_verbose_mode_enabled) =>
if *is_verbose_mode_enabled {
"\n$".to_string()
} else {
"$".to_string()
},
Component::Hyphen => "-".to_string(),
Component::IgnoreCaseFlag => "(?i)".to_string(),
Component::IgnoreCaseAndVerboseModeFlag => "(?ix)\n".to_string(),
Component::LeftBracket => "[".to_string(),
Component::Pipe => "|".to_string(),
Component::Quantifier(quantifier, is_verbose_mode_enabled) =>
if *is_verbose_mode_enabled {
format!("{}\n", quantifier)
} else {
quantifier.to_string()
},
Component::Repetition(num, is_verbose_mode_enabled) => {
if *num == 0 && *is_verbose_mode_enabled {
"{\\d+\\}\n".to_string()
} else if *num == 0 {
"{\\d+\\}".to_string()
} else if *is_verbose_mode_enabled {
format!("{{{}}}\n", num)
} else {
format!("{{{}}}", num)
}
}
Component::RepetitionRange(min, max, is_verbose_mode_enabled) => {
if *min == 0 && *max == 0 && *is_verbose_mode_enabled {
"{\\d+,\\d+\\}\n".to_string()
} else if *min == 0 && *max == 0 {
"{\\d+,\\d+\\}".to_string()
} else if *is_verbose_mode_enabled {
format!("{{{},{}}}\n", min, max)
} else {
format!("{{{},{}}}", min, max)
}
}
Component::RightBracket => "]".to_string(),
Component::RightParenthesis => ")".to_string(),
Component::UncapturedLeftParenthesis => "(?:".to_string(),
Component::UncapturedParenthesizedExpression(
expr,
is_verbose_mode_enabled,
has_final_line_break,
) => {
if *is_verbose_mode_enabled {
if *has_final_line_break {
format!(
"\n{}\n{}\n{}\n",
Component::UncapturedLeftParenthesis,
expr,
Component::RightParenthesis
)
} else {
format!(
"\n{}\n{}\n{}",
Component::UncapturedLeftParenthesis,
expr,
Component::RightParenthesis
)
}
} else {
format!(
"{}{}{}",
Component::UncapturedLeftParenthesis,
expr,
Component::RightParenthesis
)
}
}
Component::VerboseModeFlag => "(?x)\n".to_string(),
}
)
}
}
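`Component`'s `Display` impl relies on doubled braces to emit a literal `{` or `}` around repetition counts, and `color_code` wraps a value in ANSI SGR escape sequences. The sketch below isolates both techniques; the `Quant` enum is a hypothetical, trimmed-down stand-in for the `Repetition` and `RepetitionRange` variants, and only the unescaped branch of `color_code` is reproduced.

```rust
use std::fmt::{self, Display, Formatter};

/// Hypothetical stand-in for `Component::Repetition` / `RepetitionRange`.
enum Quant {
    Exactly(u32),
    Range(u32, u32),
}

impl Display for Quant {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        match self {
            // `{{` and `}}` are escaped braces, so "{{{}}}" renders as "{3}".
            Quant::Exactly(n) => write!(f, "{{{}}}", n),
            Quant::Range(min, max) => write!(f, "{{{},{}}}", min, max),
        }
    }
}

/// Same wrapping as the unescaped branch of `Component::color_code`:
/// ESC[<code>m, then the value, then the reset sequence ESC[0m.
fn color_code(code: &str, value: &str) -> String {
    format!("\u{1b}[{}m{}\u{1b}[0m", code, value)
}

fn main() {
    println!("{}", Quant::Exactly(3)); // {3}
    println!("{}", Quant::Range(2, 5)); // {2,5}
    // "104;37": bright blue background, white foreground (cf. `white_on_bright_blue`)
    println!("{}", color_code("104;37", &Quant::Range(2, 5).to_string()));
}
```

Codes such as `1;31` combine SGR parameters (`1` bold, `31` red foreground), and `\u{1b}[0m` resets all attributes afterwards.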
================================================
FILE: src/config.rs
================================================
/*
* Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either expressed or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#[derive(Clone, Debug, Hash, Ord, PartialOrd, Eq, PartialEq)]
pub(crate) struct RegExpConfig {
pub(crate) minimum_repetitions: u32,
pub(crate) minimum_substring_length: u32,
pub(crate) is_digit_converted: bool,
pub(crate) is_non_digit_converted: bool,
pub(crate) is_space_converted: bool,
pub(crate) is_non_space_converted: bool,
pub(crate) is_word_converted: bool,
pub(crate) is_non_word_converted: bool,
pub(crate) is_repetition_converted: bool,
pub(crate) is_case_insensitive_matching: bool,
pub(crate) is_capturing_group_enabled: bool,
pub(crate) is_non_ascii_char_escaped: bool,
pub(crate) is_astral_code_point_converted_to_surrogate: bool,
pub(crate) is_verbose_mode_enabled: bool,
pub(crate) is_start_anchor_disabled: bool,
pub(crate) is_end_anchor_disabled: bool,
pub(crate) is_output_colorized: bool,
}
impl RegExpConfig {
pub(crate) fn new() -> Self {
Self {
minimum_repetitions: 1,
minimum_substring_length: 1,
is_digit_converted: false,
is_non_digit_converted: false,
is_space_converted: false,
is_non_space_converted: false,
is_word_converted: false,
is_non_word_converted: false,
is_repetition_converted: false,
is_case_insensitive_matching: false,
is_capturing_group_enabled: false,
is_non_ascii_char_escaped: false,
is_astral_code_point_converted_to_surrogate: false,
is_verbose_mode_enabled: false,
is_start_anchor_disabled: false,
is_end_anchor_disabled: false,
is_output_colorized: false,
}
}
pub(crate) fn is_char_class_feature_enabled(&self) -> bool {
self.is_digit_converted
|| self.is_non_digit_converted
|| self.is_space_converted
|| self.is_non_space_converted
|| self.is_word_converted
|| self.is_non_word_converted
|| self.is_case_insensitive_matching
|| self.is_capturing_group_enabled
}
}
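`RegExpConfig::new()` sets both thresholds to `1` and every flag to `false`, so a derived `Default` would not be equivalent (it would yield `0` for the thresholds). A minimal sketch, assuming a hypothetical, trimmed-down `Config` type with only a few of the real fields, shows one possible design: a hand-written `Default` combined with struct-update syntax to override single flags, plus a predicate shaped like `is_char_class_feature_enabled`.

```rust
/// Hypothetical, trimmed-down mirror of `RegExpConfig`.
#[derive(Clone, Debug, PartialEq)]
struct Config {
    minimum_repetitions: u32,
    minimum_substring_length: u32,
    is_digit_converted: bool,
    is_word_converted: bool,
    is_case_insensitive_matching: bool,
    is_verbose_mode_enabled: bool,
}

impl Default for Config {
    // Hand-written because `#[derive(Default)]` would set the thresholds to 0.
    fn default() -> Self {
        Self {
            minimum_repetitions: 1,
            minimum_substring_length: 1,
            is_digit_converted: false,
            is_word_converted: false,
            is_case_insensitive_matching: false,
            is_verbose_mode_enabled: false,
        }
    }
}

impl Config {
    /// Same shape as `RegExpConfig::is_char_class_feature_enabled`, limited to
    /// the fields kept here; as in the original, verbose mode is not part of it.
    fn is_char_class_feature_enabled(&self) -> bool {
        self.is_digit_converted || self.is_word_converted || self.is_case_insensitive_matching
    }
}

fn main() {
    // Struct-update syntax: override one flag, take the rest from `default()`.
    let config = Config { is_digit_converted: true, ..Config::default() };
    println!("{}", config.is_char_class_feature_enabled()); // true
}
```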
================================================
FILE: src/dfa.rs
================================================
/*
* Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either expressed or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
use crate::cluster::GraphemeCluster;
use crate::config::RegExpConfig;
use crate::grapheme::Grapheme;
use itertools::Itertools;
use petgraph::graph::NodeIndex;
use petgraph::stable_graph::{Edges, StableGraph};
use petgraph::visit::Dfs;
use petgraph::{Directed, Direction};
use std::cmp::{max, min};
use std::collections::{BTreeSet, HashMap, HashSet};
type State = NodeIndex<u32>;
type StateLabel = String;
type EdgeLabel = Grapheme;
pub(crate) struct Dfa<'a> {
alphabet: BTreeSet<Grapheme>,
graph: StableGraph<StateLabel, EdgeLabel>,
initial_state: State,
final_state_indices: HashSet<usize>,
config: &'a RegExpConfig,
}
impl<'a> Dfa<'a> {
pub(crate) fn from(
grapheme_clusters: &[GraphemeCluster],
is_minimized: bool,
config: &'a RegExpConfig,
) -> Self {
let mut dfa = Self::new(config);
for cluster in grapheme_clusters {
dfa.insert(cluster);
}
if is_minimized {
dfa.minimize();
}
dfa
}
pub(crate) fn state_count(&self) -> usize {
self.graph.node_count()
}
pub(crate) fn states_in_depth_first_order(&self) -> Vec<State> {
let mut depth_first_search = Dfs::new(&self.graph, self.initial_state);
let mut states = vec![];
while let Some(state) = depth_first_search.next(&self.graph) {
states.push(state);
}
states
}
pub(crate) fn outgoing_edges(&self, state: State) -> Edges<'_, Grapheme, Directed> {
self.graph.edges_directed(state, Direction::Outgoing)
}
pub(crate) fn is_final_state(&self, state: State) -> bool {
self.final_state_indices.contains(&state.index())
}
fn new(config: &'a RegExpConfig) -> Self {
let mut graph = StableGraph::new();
let initial_state = graph.add_node("".to_string());
Self {
alphabet: BTreeSet::new(),
graph,
initial_state,
final_state_indices: HashSet::new(),
config,
}
}
fn insert(&mut self, cluster: &GraphemeCluster) {
let mut current_state = self.initial_state;
for grapheme in cluster.graphemes() {
self.alphabet.insert(grapheme.clone());
current_state = self.return_next_state(current_state, grapheme);
}
self.final_state_indices.insert(current_state.index());
}
fn return_next_state(&mut self, current_state: State, edge_label: &Grapheme) -> State {
match self.find_next_state(current_state, edge_label) {
Some(next_state) => next_state,
None => self.add_new_state(current_state, edge_label),
}
}
fn find_next_state(&mut self, current_state: State, grapheme: &Grapheme) -> Option<State> {
for next_state in self.graph.neighbors(current_state) {
let edge_idx = self.graph.find_edge(current_state, next_state).unwrap();
let current_grapheme = self.graph.edge_weight(edge_idx).unwrap();
if current_grapheme.value() != grapheme.value() {
continue;
}
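// The same grapheme was already seen with one repetition fewer: widen the
// existing edge to the combined repetition range instead of adding a state.
if current_grapheme.maximum() == grapheme.maximum() - 1 {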
let min = min(current_grapheme.minimum(), grapheme.minimum());
let max = max(current_grapheme.maximum(), grapheme.maximum());
let new_grapheme = Grapheme::new(
grapheme.chars().clone(),
min,
max,
self.config.is_capturing_group_enabled,
self.config.is_output_colorized,
self.config.is_verbose_mode_enabled,
);
self.graph
.update_edge(current_state, next_state, new_grapheme);
return Some(next_state);
} else if current_grapheme.maximum() == grapheme.maximum() {
return Some(next_state);
}
}
None
}
fn add_new_state(&mut self, current_state: State, edge_label: &Grapheme) -> State {
let next_state = self.graph.add_node("".to_string());
self.graph
.add_edge(current_state, next_state, edge_label.clone());
next_state
}
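/// Minimizes the DFA by partition refinement in the style of Hopcroft's
/// algorithm: blocks are split by their predecessors under each edge label
/// until the partition is stable, then the graph is rebuilt from the blocks.
#[allow(clippy::many_single_char_names)]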
fn minimize(&mut self) {
let mut p = self.get_initial_partition();
let mut w = p.iter().cloned().collect_vec();
while !w.is_empty() {
let a = w.remove(0);
for edge_label in self.alphabet.iter() {
let x = self.get_parent_states(&a, edge_label);
let mut replacements = vec![];
let mut is_replacement_needed = true;
let mut start_idx = 0;
while is_replacement_needed {
for (idx, y) in p.iter().enumerate().skip(start_idx) {
if x.intersection(y).count() == 0 || y.difference(&x).count() == 0 {
is_replacement_needed = false;
continue;
}
let i = x.intersection(y).copied().collect::<HashSet<State>>();
let d = y.difference(&x).copied().collect::<HashSet<State>>();
is_replacement_needed = true;
start_idx = idx;
replacements.push((y.clone(), i, d));
break;
}
if is_replacement_needed {
let (_, i, d) = replacements.last().unwrap();
p.remove(start_idx);
p.insert(start_idx, i.clone());
p.insert(start_idx + 1, d.clone());
}
}
for (y, i, d) in replacements {
if w.contains(&y) {
let idx = w.iter().position(|it| it == &y).unwrap();
w.remove(idx);
w.push(i);
w.push(d);
} else if i.len() <= d.len() {
w.push(i);
} else {
w.push(d);
}
}
}
}
self.recreate_graph(p.iter().filter(|&it| !it.is_empty()).collect_vec());
}
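fn get_initial_partition(&self) -> Vec<HashSet<State>> {
// `partition` puts the states for which the predicate holds (the non-final
// ones) into the first set. The block order influences the node indices
// that `recreate_graph` assigns later on.
let (non_final_states, final_states): (HashSet<State>, HashSet<State>) = self
.graph
.node_indices()
.partition(|&state| !self.final_state_indices.contains(&state.index()));
vec![non_final_states, final_states]
}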
fn get_parent_states(&self, a: &HashSet<State>, label: &Grapheme) -> HashSet<State> {
let mut x = HashSet::new();
for &state in a {
let direct_parent_states = self.graph.neighbors_directed(state, Direction::Incoming);
for parent_state in direct_parent_states {
let edge = self.graph.find_edge(parent_state, state).unwrap();
let grapheme = self.graph.edge_weight(edge).unwrap();
if grapheme.value() == label.value()
&& (grapheme.maximum() == label.maximum()
|| grapheme.minimum() == label.minimum())
{
x.insert(parent_state);
break;
}
}
}
x
}
fn recreate_graph(&mut self, p: Vec<&HashSet<State>>) {
let mut graph = StableGraph::<StateLabel, EdgeLabel>::new();
let mut final_state_indices = HashSet::new();
let mut state_mappings = HashMap::new();
let mut new_initial_state: Option<NodeIndex> = None;
for equivalence_class in p.iter() {
let new_state = graph.add_node("".to_string());
for old_state in equivalence_class.iter() {
if self.initial_state == *old_state {
new_initial_state = Some(new_state);
}
state_mappings.insert(*old_state, new_state);
}
}
for equivalence_class in p.iter() {
let old_source_state = *equivalence_class.iter().next().unwrap();
let new_source_state = state_mappings.get(&old_source_state).unwrap();
for old_target_state in self.graph.neighbors(old_source_state) {
let edge = self
.graph
.find_edge(old_source_state, old_target_state)
.unwrap();
let grapheme = self.graph.edge_weight(edge).unwrap().clone();
let new_target_state = state_mappings.get(&old_target_state).unwrap();
graph.add_edge(*new_source_state, *new_target_state, grapheme.clone());
if self.final_state_indices.contains(&old_target_state.index()) {
final_state_indices.insert(new_target_state.index());
}
}
}
self.initial_state = new_initial_state.unwrap();
self.final_state_indices = final_state_indices;
self.graph = graph;
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_state_count() {
let config = RegExpConfig::new();
let mut dfa = Dfa::new(&config);
assert_eq!(dfa.state_count(), 1);
dfa.insert(&GraphemeCluster::from("abcd", &RegExpConfig::new()));
assert_eq!(dfa.state_count(), 5);
}
#[test]
fn test_is_final_state() {
let config = RegExpConfig::new();
let dfa = Dfa::from(
&[GraphemeCluster::from("abcd", &RegExpConfig::new())],
true,
&config,
);
let intermediate_state = State::new(3);
assert!(!dfa.is_final_state(intermediate_state));
let final_state = State::new(4);
assert!(dfa.is_final_state(final_state));
}
#[test]
fn test_outgoing_edges() {
let config = RegExpConfig::new();
let dfa = Dfa::from(
&[
GraphemeCluster::from("abcd", &RegExpConfig::new()),
GraphemeCluster::from("abxd", &RegExpConfig::new()),
],
true,
&config,
);
let state = State::new(2);
let mut edges = dfa.outgoing_edges(state);
let first_edge = edges.next();
assert!(first_edge.is_some());
assert_eq!(
first_edge.unwrap().weight(),
&Grapheme::from("c", false, false, false)
);
let second_edge = edges.next();
assert!(second_edge.is_some());
assert_eq!(
second_edge.unwrap().weight(),
&Grapheme::from("x", false, false, false)
);
let third_edge = edges.next();
assert!(third_edge.is_none());
}
#[test]
fn test_states_in_depth_first_order() {
let config = RegExpConfig::new();
let dfa = Dfa::from(
&[
GraphemeCluster::from("abcd", &RegExpConfig::new()),
GraphemeCluster::from("axyz", &RegExpConfig::new()),
],
true,
&config,
);
let states = dfa.states_in_depth_first_order();
assert_eq!(states.len(), 7);
let first_state = states.get(0).unwrap();
let mut edges = dfa.outgoing_edges(*first_state);
assert_eq!(
edges.next().unwrap().weight(),
&Grapheme::from("a", false, false, false)
);
assert!(edges.next().is_none());
let second_state = states.get(1).unwrap();
edges = dfa.outgoing_edges(*second_state);
assert_eq!(
edges.next().unwrap().weight(),
&Grapheme::from("b", false, false, false)
);
assert_eq!(
edges.next().unwrap().weight(),
&Grapheme::from("x", false, false, false)
);
assert!(edges.next().is_none());
let third_state = states.get(2).unwrap();
edges = dfa.outgoing_edges(*third_state);
assert_eq!(
edges.next().unwrap().weight(),
&Grapheme::from("y", false, false, false)
);
assert!(edges.next().is_none());
let fourth_state = states.get(3).unwrap();
edges = dfa.outgoing_edges(*fourth_state);
assert_eq!(
edges.next().unwrap().weight(),
&Grapheme::from("z", false, false, false)
);
assert!(edges.next().is_none());
let fifth_state = states.get(4).unwrap();
edges = dfa.outgoing_edges(*fifth_state);
assert!(edges.next().is_none());
let sixth_state = states.get(5).unwrap();
edges = dfa.outgoing_edges(*sixth_state);
assert_eq!(
edges.next().unwrap().weight(),
&Grapheme::from("c", false, false, false)
);
assert!(edges.next().is_none());
let seventh_state = states.get(6).unwrap();
edges = dfa.outgoing_edges(*seventh_state);
assert_eq!(
edges.next().unwrap().weight(),
&Grapheme::from("d", false, false, false)
);
assert!(edges.next().is_none());
}
#[test]
fn test_minimization_algorithm() {
let config = RegExpConfig::new();
let mut dfa = Dfa::new(&config);
assert_eq!(dfa.graph.node_count(), 1);
assert_eq!(dfa.graph.edge_count(), 0);
dfa.insert(&GraphemeCluster::from("abcd", &RegExpConfig::new()));
assert_eq!(dfa.graph.node_count(), 5);
assert_eq!(dfa.graph.edge_count(), 4);
dfa.insert(&GraphemeCluster::from("abxd", &RegExpConfig::new()));
assert_eq!(dfa.graph.node_count(), 7);
assert_eq!(dfa.graph.edge_count(), 6);
dfa.minimize();
assert_eq!(dfa.graph.node_count(), 5);
assert_eq!(dfa.graph.edge_count(), 5);
}
#[test]
fn test_dfa_constructor() {
let config = RegExpConfig::new();
let dfa = Dfa::from(
&[
GraphemeCluster::from("abcd", &RegExpConfig::new()),
GraphemeCluster::from("abxd", &RegExpConfig::new()),
],
true,
&config,
);
assert_eq!(dfa.graph.node_count(), 5);
assert_eq!(dfa.graph.edge_count(), 5);
}
}
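`Dfa::insert` builds a prefix tree: each input string follows existing transitions as far as possible and `add_new_state` appends fresh states for the rest, after which `minimize` merges equivalent states. The sketch below reproduces the state counts asserted in the tests above (7 states before and 5 after minimization for `abcd` and `abxd`) on a hypothetical `Trie` type over `char`s. It assumes plain `Vec`/`BTreeMap` storage instead of petgraph and ignores the repetition-range merging done in `find_next_state`. To stay short it minimizes with Moore-style signature refinement, which is a different algorithm from the Hopcroft-style worklist in `minimize`, although for these inputs it arrives at the same state counts.

```rust
use std::collections::{BTreeMap, HashMap, HashSet};

/// Hypothetical prefix-tree DFA: state 0 is initial, `edges[s]` maps a
/// label to its target state and `finals[s]` marks accepting states.
struct Trie {
    edges: Vec<BTreeMap<char, usize>>,
    finals: Vec<bool>,
}

impl Trie {
    fn new() -> Self {
        Trie { edges: vec![BTreeMap::new()], finals: vec![false] }
    }

    fn state_count(&self) -> usize {
        self.edges.len()
    }

    /// Mirrors `Dfa::insert`: follow an existing transition if there is one,
    /// otherwise append a new state (cf. `add_new_state`).
    fn insert(&mut self, word: &str) {
        let mut state = 0;
        for c in word.chars() {
            let existing = self.edges[state].get(&c).copied();
            state = match existing {
                Some(next) => next,
                None => {
                    self.edges.push(BTreeMap::new());
                    self.finals.push(false);
                    let next = self.edges.len() - 1;
                    self.edges[state].insert(c, next);
                    next
                }
            };
        }
        self.finals[state] = true;
    }

    /// Moore-style refinement: a state's signature is its current block plus
    /// its (label, target block) pairs; re-partition until the block count is
    /// stable and return the number of blocks.
    fn minimized_state_count(&self) -> usize {
        let mut block: Vec<usize> = self.finals.iter().map(|&f| f as usize).collect();
        let mut count = block.iter().collect::<HashSet<_>>().len();
        loop {
            let mut ids: HashMap<(usize, Vec<(char, usize)>), usize> = HashMap::new();
            let next: Vec<usize> = (0..self.state_count())
                .map(|s| {
                    let signature: (usize, Vec<(char, usize)>) = (
                        block[s],
                        self.edges[s].iter().map(|(&c, &t)| (c, block[t])).collect(),
                    );
                    let fresh_id = ids.len();
                    *ids.entry(signature).or_insert(fresh_id)
                })
                .collect();
            // Signatures include the old block, so blocks only ever split;
            // an unchanged count therefore means the partition is stable.
            if ids.len() == count {
                return count;
            }
            count = ids.len();
            block = next;
        }
    }
}

fn main() {
    let mut trie = Trie::new();
    trie.insert("abcd");
    trie.insert("abxd");
    // 7 -> 5: the states reached via `c` and `x` merge, as do the two final states.
    println!("{} -> {}", trie.state_count(), trie.minimized_state_count());
}
```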
================================================
FILE: src/expression.rs
================================================
/*
* Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either expressed or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
use crate::cluster::GraphemeCluster;
use crate::config::RegExpConfig;
use crate::dfa::Dfa;
use crate::grapheme::Grapheme;
use crate::quantifier::Quantifier;
use crate::substring::Substring;
use itertools::EitherOrBoth::Both;
use itertools::Itertools;
use ndarray::{Array1, Array2};
use petgraph::prelude::EdgeRef;
use std::cmp::Reverse;
use std::collections::BTreeSet;
#[derive(Clone, Debug, Eq, PartialEq)]
pub(crate) enum Expression<'a> {
Alternation(Vec<Expression<'a>>, bool, bool, bool),
CharacterClass(BTreeSet<char>, bool),
Concatenation(Box<Expression<'a>>, Box<Expression<'a>>, bool, bool, bool),
Literal(GraphemeCluster<'a>, bool, bool),
Repetition(Box<Expression<'a>>, Quantifier, bool, bool, bool),
}
impl<'a> Expression<'a> {
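/// Converts the DFA into an expression with Brzozowski's algebraic method:
/// `a[(i, j)]` holds the transitions from state i to state j (states in
/// depth-first order), `b[i]` is the empty literal if state i is final, and
/// states are eliminated from the last to the first, leaving the solution for
/// the initial state in `b[0]`.
pub(crate) fn from(dfa: Dfa, config: &'a RegExpConfig) -> Self {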
let states = dfa.states_in_depth_first_order();
let state_count = dfa.state_count();
let mut a = Array2::<Option<Expression>>::default((state_count, state_count));
let mut b = Array1::<Option<Expression>>::default(state_count);
for (i, state) in states.iter().enumerate() {
if dfa.is_final_state(*state) {
b[i] = Some(Expression::new_literal(
GraphemeCluster::from("", config),
config,
));
}
for edge in dfa.outgoing_edges(*state) {
let literal = Expression::new_literal(
GraphemeCluster::new(edge.weight().clone(), config),
config,
);
let j = states.iter().position(|&it| it == edge.target()).unwrap();
a[(i, j)] = if a[(i, j)].is_some() {
Self::union(&a[(i, j)], &Some(literal), config)
} else {
Some(literal)
}
}
}
for n in (0..state_count).rev() {
if a[(n, n)].is_some() {
b[n] = Self::concatenate(
&Self::repeat_zero_or_more_times(&a[(n, n)], config),
&b[n],
config,
);
for j in 0..n {
a[(n, j)] = Self::concatenate(
&Self::repeat_zero_or_more_times(&a[(n, n)], config),
&a[(n, j)],
config,
);
}
}
for i in 0..n {
if a[(i, n)].is_some() {
b[i] =
Self::union(&b[i], &Self::concatenate(&a[(i, n)], &b[n], config), config);
for j in 0..n {
a[(i, j)] = Self::union(
&a[(i, j)],
&Self::concatenate(&a[(i, n)], &a[(n, j)], config),
config,
);
}
}
}
}
if !b.is_empty() && b[0].is_some() {
b[0].as_ref().unwrap().clone()
} else {
Expression::new_literal(GraphemeCluster::from("", config), config)
}
}
pub(crate) fn new_alternation(exprs: Vec<Expression<'a>>, config: &RegExpConfig) -> Self {
let mut options: Vec<Expression> = vec![];
Self::flatten_alternations(&mut options, exprs);
options.sort_by_key(|option| Reverse(option.len()));
Expression::Alternation(
options,
config.is_capturing_group_enabled,
config.is_output_colorized,
config.is_verbose_mode_enabled,
)
}
fn new_character_class(
first_char_set: BTreeSet<char>,
second_char_set: BTreeSet<char>,
config: &RegExpConfig,
) -> Self {
let union_set = first_char_set.union(&second_char_set).copied().collect();
Expression::CharacterClass(union_set, config.is_output_colorized)
}
fn new_concatenation(
expr1: Expression<'a>,
expr2: Expression<'a>,
config: &RegExpConfig,
) -> Self {
Expression::Concatenation(
Box::from(expr1),
Box::from(expr2),
config.is_capturing_group_enabled,
config.is_output_colorized,
config.is_verbose_mode_enabled,
)
}
pub(crate) fn new_literal(cluster: GraphemeCluster<'a>, config: &RegExpConfig) -> Self {
Expression::Literal(
cluster,
config.is_non_ascii_char_escaped,
config.is_astral_code_point_converted_to_surrogate,
)
}
fn new_repetition(expr: Expression<'a>, quantifier: Quantifier, config: &RegExpConfig) -> Self {
Expression::Repetition(
Box::from(expr),
quantifier,
config.is_capturing_group_enabled,
config.is_output_colorized,
config.is_verbose_mode_enabled,
)
}
fn is_empty(&self) -> bool {
match self {
Expression::Literal(cluster, _, _) => cluster.is_empty(),
_ => false,
}
}
pub(crate) fn is_single_codepoint(&self) -> bool {
match self {
Expression::CharacterClass(_, _) => true,
Expression::Literal(cluster, is_non_ascii_char_escaped, _) => {
cluster.char_count(*is_non_ascii_char_escaped) == 1
&& cluster.graphemes().first().unwrap().maximum() == 1
}
_ => false,
}
}
fn len(&self) -> usize {
match self {
Expression::Alternation(options, _, _, _) => options.first().unwrap().len(),
Expression::CharacterClass(_, _) => 1,
Expression::Concatenation(expr1, expr2, _, _, _) => expr1.len() + expr2.len(),
Expression::Literal(cluster, _, _) => cluster.size(),
Expression::Repetition(expr, _, _, _, _) => expr.len(),
}
}
pub(crate) fn precedence(&self) -> u8 {
match self {
Expression::Alternation(_, _, _, _) | Expression::CharacterClass(_, _) => 1,
Expression::Concatenation(_, _, _, _, _) | Expression::Literal(_, _, _) => 2,
Expression::Repetition(_, _, _, _, _) => 3,
}
}
pub(crate) fn remove_substring(&mut self, substring: &Substring, length: usize) {
match self {
Expression::Concatenation(expr1, expr2, _, _, _) => match substring {
Substring::Prefix => {
if let Expression::Literal(_, _, _) = **expr1 {
expr1.remove_substring(substring, length)
}
}
Substring::Suffix => {
if let Expression::Literal(_, _, _) = **expr2 {
expr2.remove_substring(substring, length)
}
}
},
Expression::Literal(cluster, _, _) => match substring {
Substring::Prefix => {
cluster.graphemes_mut().drain(..length);
}
Substring::Suffix => {
let graphemes = cluster.graphemes_mut();
graphemes.drain(graphemes.len() - length..);
}
},
_ => (),
}
}
pub(crate) fn value(&self, substring: Option<&Substring>) -> Option<Vec<Grapheme>> {
match self {
Expression::Concatenation(expr1, expr2, _, _, _) => match substring {
Some(value) => match value {
Substring::Prefix => expr1.value(None),
Substring::Suffix => expr2.value(None),
},
None => None,
},
Expression::Literal(cluster, _, _) => Some(cluster.graphemes().clone()),
_ => None,
}
}
fn repeat_zero_or_more_times(
expr: &Option<Expression<'a>>,
config: &'a RegExpConfig,
) -> Option<Expression<'a>> {
expr.as_ref()
.map(|value| Expression::new_repetition(value.clone(), Quantifier::KleeneStar, config))
}
fn concatenate(
a: &Option<Expression<'a>>,
b: &Option<Expression<'a>>,
config: &'a RegExpConfig,
) -> Option<Expression<'a>> {
if a.is_none() || b.is_none() {
return None;
}
let expr1 = a.as_ref().unwrap();
let expr2 = b.as_ref().unwrap();
if expr1.is_empty() {
return b.clone();
}
if expr2.is_empty() {
return a.clone();
}
if let (Expression::Literal(graphemes_a, _, _), Expression::Literal(graphemes_b, _, _)) =
(&expr1, &expr2)
{
return Some(Expression::new_literal(
GraphemeCluster::merge(graphemes_a, graphemes_b, config),
config,
));
}
if let (
Expression::Literal(graphemes_a, _, _),
Expression::Concatenation(first, second, _, _, _),
) = (&expr1, &expr2)
{
if let Expression::Literal(graphemes_first, _, _) = &**first {
let literal = Expression::new_literal(
GraphemeCluster::merge(graphemes_a, graphemes_first, config),
config,
);
return Some(Expression::new_concatenation(
literal,
*second.clone(),
config,
));
}
}
if let (
Expression::Literal(graphemes_b, _, _),
Expression::Concatenation(first, second, _, _, _),
) = (&expr2, &expr1)
{
if let Expression::Literal(graphemes_second, _, _) = &**second {
let literal = Expression::new_literal(
GraphemeCluster::merge(graphemes_second, graphemes_b, config),
config,
);
return Some(Expression::new_concatenation(
*first.clone(),
literal,
config,
));
}
}
Some(Expression::new_concatenation(
expr1.clone(),
expr2.clone(),
config,
))
}
fn union(
a: &Option<Expression<'a>>,
b: &Option<Expression<'a>>,
config: &'a RegExpConfig,
) -> Option<Expression<'a>> {
if let (Some(mut expr1), Some(mut expr2)) = (a.clone(), b.clone()) {
if expr1 != expr2 {
let common_prefix =
Self::remove_common_substring(&mut expr1, &mut expr2, Substring::Prefix);
let common_suffix =
Self::remove_common_substring(&mut expr1, &mut expr2, Substring::Suffix);
let mut result = if expr1.is_empty() {
Some(Expression::new_repetition(
expr2.clone(),
Quantifier::QuestionMark,
config,
))
} else if expr2.is_empty() {
Some(Expression::new_repetition(
expr1.clone(),
Quantifier::QuestionMark,
config,
))
} else {
None
};
if result.is_none() {
if let Expression::Repetition(expr, quantifier, _, _, _) = &expr1 {
if quantifier == &Quantifier::QuestionMark {
let alternation = Expression::new_alternation(
vec![*expr.clone(), expr2.clone()],
config,
);
result = Some(Expression::new_repetition(
alternation,
Quantifier::QuestionMark,
config,
));
}
}
}
if result.is_none() {
if let Expression::Repetition(expr, quantifier, _, _, _) = &expr2 {
if quantifier == &Quantifier::QuestionMark {
let alternation = Expression::new_alternation(
vec![expr1.clone(), *expr.clone()],
config,
);
result = Some(Expression::new_repetition(
alternation,
Quantifier::QuestionMark,
config,
));
}
}
}
if result.is_none() && expr1.is_single_codepoint() && expr2.is_single_codepoint() {
let first_char_set = Self::extract_character_set(expr1.clone());
let second_char_set = Self::extract_character_set(expr2.clone());
result = Some(Expression::new_character_class(
first_char_set,
second_char_set,
config,
));
}
if result.is_none() {
result = Some(Expression::new_alternation(vec![expr1, expr2], config));
}
if let Some(prefix) = common_prefix {
result = Some(Expression::new_concatenation(
Expression::new_literal(
GraphemeCluster::from_graphemes(prefix, config),
config,
),
result.unwrap(),
config,
));
}
if let Some(suffix) = common_suffix {
result = Some(Expression::new_concatenation(
result.unwrap(),
Expression::new_literal(
GraphemeCluster::from_graphemes(suffix, config),
config,
),
config,
));
}
result
} else if a.is_some() {
a.clone()
} else if b.is_some() {
b.clone()
} else {
None
}
} else if a.is_some() {
a.clone()
} else if b.is_some() {
b.clone()
} else {
None
}
}
fn flatten_alternations(
flattened_options: &mut Vec<Expression<'a>>,
current_options: Vec<Expression<'a>>,
) {
for option in current_options {
if let Expression::Alternation(expr_options, _, _, _) = option {
Self::flatten_alternations(flattened_options, expr_options);
} else {
flattened_options.push(option);
}
}
}
fn extract_character_set(expr: Expression) -> BTreeSet<char> {
match expr {
Expression::Literal(cluster, _, _) => {
let single_char = cluster
.graphemes()
.first()
.unwrap()
.value()
.chars()
.next()
.unwrap();
btreeset![single_char]
}
Expression::CharacterClass(char_set, _) => char_set,
_ => BTreeSet::new(),
}
}
fn remove_common_substring(
a: &mut Expression,
b: &mut Expression,
substring: Substring,
) -> Option<Vec<Grapheme>> {
let common_substring = Self::find_common_substring(a, b, &substring);
if let Some(value) = &common_substring {
a.remove_substring(&substring, value.len());
b.remove_substring(&substring, value.len());
}
common_substring
}
fn find_common_substring(
a: &Expression,
b: &Expression,
substring: &Substring,
) -> Option<Vec<Grapheme>> {
let mut graphemes_a = a.value(Some(substring)).unwrap_or_default();
let mut graphemes_b = b.value(Some(substring)).unwrap_or_default();
let mut common_graphemes = vec![];
if let Substring::Suffix = substring {
graphemes_a.reverse();
graphemes_b.reverse();
}
for pair in graphemes_a.iter().zip_longest(graphemes_b.iter()) {
match pair {
Both(grapheme_a, grapheme_b) => {
if grapheme_a == grapheme_b {
common_graphemes.push(grapheme_a.clone());
} else {
break;
}
}
_ => break,
}
}
if let Substring::Suffix = substring {
common_graphemes.reverse();
}
if common_graphemes.is_empty() {
None
} else {
Some(common_graphemes)
}
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn ensure_correct_string_representation_of_alternation_1() {
let config = RegExpConfig::new();
let literal1 = Expression::new_literal(GraphemeCluster::from("abc", &config), &config);
let literal2 = Expression::new_literal(GraphemeCluster::from("def", &config), &config);
let alternation = Expression::new_alternation(vec![literal1, literal2], &config);
assert_eq!(alternation.to_string(), "abc|def");
}
#[test]
fn ensure_correct_string_representation_of_alternation_2() {
let config = RegExpConfig::new();
let literal1 = Expression::new_literal(GraphemeCluster::from("a", &config), &config);
let literal2 = Expression::new_literal(GraphemeCluster::from("ab", &config), &config);
let literal3 = Expression::new_literal(GraphemeCluster::from("abc", &config), &config);
let alternation = Expression::new_alternation(vec![literal1, literal2, literal3], &config);
assert_eq!(alternation.to_string(), "abc|ab|a");
}
#[test]
fn ensure_correct_string_representation_of_character_class_1() {
let config = RegExpConfig::new();
let char_class = Expression::new_character_class(btreeset!['a'], btreeset!['b'], &config);
assert_eq!(char_class.to_string(), "[ab]");
}
#[test]
fn ensure_correct_string_representation_of_character_class_2() {
let config = RegExpConfig::new();
let char_class =
Expression::new_character_class(btreeset!['a', 'b'], btreeset!['c'], &config);
assert_eq!(char_class.to_string(), "[a-c]");
}
#[test]
fn ensure_correct_string_representation_of_concatenation_1() {
let config = RegExpConfig::new();
let literal1 = Expression::new_literal(GraphemeCluster::from("abc", &config), &config);
let literal2 = Expression::new_literal(GraphemeCluster::from("def", &config), &config);
let concatenation = Expression::new_concatenation(literal1, literal2, &config);
assert_eq!(concatenation.to_string(), "abcdef");
}
#[test]
fn ensure_correct_string_representation_of_concatenation_2() {
let config = RegExpConfig::new();
let literal1 = Expression::new_literal(GraphemeCluster::from("abc", &config), &config);
let literal2 = Expression::new_literal(GraphemeCluster::from("def", &config), &config);
let repetition = Expression::new_repetition(literal1, Quantifier::KleeneStar, &config);
let concatenation = Expression::new_concatenation(repetition, literal2, &config);
assert_eq!(concatenation.to_string(), "(?:abc)*def");
}
#[test]
fn ensure_correct_removal_of_prefix_in_literal() {
let config = RegExpConfig::new();
let mut literal =
Expression::new_literal(GraphemeCluster::from("abcdef", &config), &config);
assert_eq!(
literal.value(None),
Some(
vec!["a", "b", "c", "d", "e", "f"]
.iter()
.map(|&it| Grapheme::from(
it,
config.is_capturing_group_enabled,
config.is_output_colorized,
config.is_verbose_mode_enabled
))
.collect_vec()
)
);
literal.remove_substring(&Substring::Prefix, 2);
assert_eq!(
literal.value(None),
Some(
vec!["c", "d", "e", "f"]
.iter()
.map(|&it| Grapheme::from(
it,
config.is_capturing_group_enabled,
config.is_output_colorized,
config.is_verbose_mode_enabled
))
.collect_vec()
)
);
}
#[test]
fn ensure_correct_removal_of_suffix_in_literal() {
let config = RegExpConfig::new();
let mut literal =
Expression::new_literal(GraphemeCluster::from("abcdef", &config), &config);
assert_eq!(
literal.value(None),
Some(
vec!["a", "b", "c", "d", "e", "f"]
.iter()
.map(|&it| Grapheme::from(
it,
config.is_capturing_group_enabled,
config.is_output_colorized,
config.is_verbose_mode_enabled
))
.collect_vec()
)
);
literal.remove_substring(&Substring::Suffix, 2);
assert_eq!(
literal.value(None),
Some(
vec!["a", "b", "c", "d"]
.iter()
.map(|&it| Grapheme::from(
it,
config.is_capturing_group_enabled,
config.is_output_colorized,
config.is_verbose_mode_enabled
))
.collect_vec()
)
);
}
#[test]
fn ensure_correct_string_representation_of_repetition_1() {
let config = RegExpConfig::new();
let literal = Expression::new_literal(GraphemeCluster::from("abc", &config), &config);
let repetition = Expression::new_repetition(literal, Quantifier::KleeneStar, &config);
assert_eq!(repetition.to_string(), "(?:abc)*");
}
#[test]
fn ensure_correct_string_representation_of_repetition_2() {
let config = RegExpConfig::new();
let literal = Expression::new_literal(GraphemeCluster::from("a", &config), &config);
let repetition = Expression::new_repetition(literal, Quantifier::QuestionMark, &config);
assert_eq!(repetition.to_string(), "a?");
}
}
================================================
FILE: src/format.rs
================================================
/*
* Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either expressed or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
use crate::char_range::CharRange;
use crate::cluster::GraphemeCluster;
use crate::component::Component;
use crate::expression::Expression;
use crate::quantifier::Quantifier;
use itertools::Itertools;
use std::collections::BTreeSet;
use std::fmt::{Display, Formatter, Result};
impl Display for Expression<'_> {
fn fmt(&self, f: &mut Formatter<'_>) -> Result {
match self {
Expression::Alternation(
options,
is_capturing_group_enabled,
is_output_colorized,
is_verbose_mode_enabled,
) => format_alternation(
f,
self,
options,
*is_capturing_group_enabled,
*is_output_colorized,
*is_verbose_mode_enabled,
),
Expression::CharacterClass(char_set, is_output_colorized) => {
format_character_class(f, char_set, *is_output_colorized)
}
Expression::Concatenation(
expr1,
expr2,
is_capturing_group_enabled,
is_output_colorized,
is_verbose_mode_enabled,
) => format_concatenation(
f,
self,
expr1,
expr2,
*is_capturing_group_enabled,
*is_output_colorized,
*is_verbose_mode_enabled,
),
Expression::Literal(
cluster,
is_non_ascii_char_escaped,
is_astral_code_point_converted_to_surrogate,
) => format_literal(
f,
cluster,
*is_non_ascii_char_escaped,
*is_astral_code_point_converted_to_surrogate,
),
Expression::Repetition(
expr,
quantifier,
is_capturing_group_enabled,
is_output_colorized,
is_verbose_mode_enabled,
) => format_repetition(
f,
self,
expr,
quantifier,
*is_capturing_group_enabled,
*is_output_colorized,
*is_verbose_mode_enabled,
),
}
}
}
fn get_codepoint_position(c: char) -> usize {
CharRange::all().position(|it| it == c).unwrap()
}
fn format_alternation(
f: &mut Formatter<'_>,
expr: &Expression,
options: &[Expression],
is_capturing_group_enabled: bool,
is_output_colorized: bool,
is_verbose_mode_enabled: bool,
) -> Result {
let pipe_component = Component::Pipe.to_repr(is_output_colorized);
let disjunction_operator = if is_verbose_mode_enabled {
format!("\n{}\n", pipe_component)
} else {
pipe_component
};
let alternation_str = options
.iter()
.map(|option| {
if option.precedence() < expr.precedence() && !option.is_single_codepoint() {
if is_capturing_group_enabled {
Component::CapturedParenthesizedExpression(
option.to_string(),
is_verbose_mode_enabled,
true,
)
.to_repr(is_output_colorized)
} else {
Component::UncapturedParenthesizedExpression(
option.to_string(),
is_verbose_mode_enabled,
true,
)
.to_repr(is_output_colorized)
}
} else {
format!("{}", option)
}
})
.join(&disjunction_operator);
write!(f, "{}", alternation_str)
}
fn format_character_class(
f: &mut Formatter<'_>,
char_set: &BTreeSet<char>,
is_output_colorized: bool,
) -> Result {
let chars_to_escape = ['[', ']', '\\', '-', '^', '$'];
let escaped_char_set = char_set
.iter()
.map(|c| {
if chars_to_escape.contains(c) {
format!("{}{}", "\\", c)
} else if c == &'\n' {
"\\n".to_string()
} else if c == &'\r' {
"\\r".to_string()
} else if c == &'\t' {
"\\t".to_string()
} else {
c.to_string()
}
})
.collect_vec();
let char_positions = char_set
.iter()
.map(|&it| get_codepoint_position(it))
.collect_vec();
let mut subsets = vec![];
let mut subset = vec![];
for ((first_c, first_pos), (second_c, second_pos)) in
escaped_char_set.iter().zip(char_positions).tuple_windows()
{
if subset.is_empty() {
subset.push(first_c);
}
if second_pos == first_pos + 1 {
subset.push(second_c);
} else {
subsets.push(subset);
subset = vec![second_c];
}
}
subsets.push(subset);
let mut char_class_strs = vec![];
for subset in subsets.iter() {
if subset.len() <= 2 {
for c in subset.iter() {
char_class_strs.push((*c).to_string());
}
} else {
char_class_strs.push(format!(
"{}{}{}",
subset.first().unwrap(),
Component::Hyphen.to_repr(is_output_colorized),
subset.last().unwrap()
));
}
}
write!(
f,
"{}{}{}",
Component::LeftBracket.to_repr(is_output_colorized),
char_class_strs.join(""),
Component::RightBracket.to_repr(is_output_colorized)
)
}
fn format_concatenation(
f: &mut Formatter<'_>,
expr: &Expression,
expr1: &Expression,
expr2: &Expression,
is_capturing_group_enabled: bool,
is_output_colorized: bool,
is_verbose_mode_enabled: bool,
) -> Result {
let expr_strs = [expr1, expr2]
.iter()
.map(|&it| {
if it.precedence() < expr.precedence() && !it.is_single_codepoint() {
if is_capturing_group_enabled {
Component::CapturedParenthesizedExpression(
it.to_string(),
is_verbose_mode_enabled,
true,
)
.to_repr(is_output_colorized)
} else {
Component::UncapturedParenthesizedExpression(
it.to_string(),
is_verbose_mode_enabled,
true,
)
.to_repr(is_output_colorized)
}
} else {
format!("{}", it)
}
})
.collect_vec();
write!(
f,
"{}{}",
expr_strs.first().unwrap(),
expr_strs.last().unwrap()
)
}
fn format_literal(
f: &mut Formatter<'_>,
cluster: &GraphemeCluster,
is_non_ascii_char_escaped: bool,
is_astral_code_point_converted_to_surrogate: bool,
) -> Result {
let literal_str = cluster
.graphemes()
.iter()
.cloned()
.map(|mut grapheme| {
if grapheme.has_repetitions() {
grapheme
.repetitions_mut()
.iter_mut()
.for_each(|repeated_grapheme| {
repeated_grapheme.escape_regexp_symbols(
is_non_ascii_char_escaped,
is_astral_code_point_converted_to_surrogate,
);
});
} else {
grapheme.escape_regexp_symbols(
is_non_ascii_char_escaped,
is_astral_code_point_converted_to_surrogate,
);
}
grapheme.to_string()
})
.join("");
write!(f, "{}", literal_str)
}
fn format_repetition(
f: &mut Formatter<'_>,
expr: &Expression,
expr1: &Expression,
quantifier: &Quantifier,
is_capturing_group_enabled: bool,
is_output_colorized: bool,
is_verbose_mode_enabled: bool,
) -> Result {
if expr1.precedence() < expr.precedence() && !expr1.is_single_codepoint() {
if is_capturing_group_enabled {
write!(
f,
"{}{}",
Component::CapturedParenthesizedExpression(
expr1.to_string(),
is_verbose_mode_enabled,
false
)
.to_repr(is_output_colorized),
Component::Quantifier(quantifier.clone(), is_verbose_mode_enabled)
.to_repr(is_output_colorized)
)
} else {
write!(
f,
"{}{}",
Component::UncapturedParenthesizedExpression(
expr1.to_string(),
is_verbose_mode_enabled,
false
)
.to_repr(is_output_colorized),
Component::Quantifier(quantifier.clone(), is_verbose_mode_enabled)
.to_repr(is_output_colorized)
)
}
} else {
write!(
f,
"{}{}",
expr1,
Component::Quantifier(quantifier.clone(), is_verbose_mode_enabled)
.to_repr(is_output_colorized)
)
}
}
================================================
FILE: src/grapheme.rs
================================================
/*
* Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either expressed or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
use crate::component::Component;
use itertools::Itertools;
use std::fmt::{Display, Formatter, Result};
const CHARS_TO_ESCAPE: [&str; 14] = [
"(", ")", "[", "]", "{", "}", "+", "*", "-", ".", "?", "|", "^", "$",
];
const CHAR_CLASSES: [&str; 6] = ["\\d", "\\s", "\\w", "\\D", "\\S", "\\W"];
#[derive(Clone, Debug, Hash, Ord, PartialOrd, Eq, PartialEq)]
pub(crate) struct Grapheme {
pub(crate) chars: Vec<String>,
pub(crate) repetitions: Vec<Grapheme>,
min: u32,
max: u32,
is_capturing_group_enabled: bool,
is_output_colorized: bool,
is_verbose_mode_enabled: bool,
}
impl Grapheme {
pub(crate) fn from(
s: &str,
is_capturing_group_enabled: bool,
is_output_colorized: bool,
is_verbose_mode_enabled: bool,
) -> Self {
Self {
chars: vec![s.to_string()],
repetitions: vec![],
min: 1,
max: 1,
is_capturing_group_enabled,
is_output_colorized,
is_verbose_mode_enabled,
}
}
pub(crate) fn new(
chars: Vec<String>,
min: u32,
max: u32,
is_capturing_group_enabled: bool,
is_output_colorized: bool,
is_verbose_mode_enabled: bool,
) -> Self {
Self {
chars,
repetitions: vec![],
min,
max,
is_capturing_group_enabled,
is_output_colorized,
is_verbose_mode_enabled,
}
}
pub(crate) fn value(&self) -> String {
self.chars.join("")
}
pub(crate) fn chars(&self) -> &Vec<String> {
&self.chars
}
pub(crate) fn chars_mut(&mut self) -> &mut Vec<String> {
&mut self.chars
}
pub(crate) fn has_repetitions(&self) -> bool {
!self.repetitions.is_empty()
}
pub(crate) fn repetitions_mut(&mut self) -> &mut Vec<Grapheme> {
&mut self.repetitions
}
pub(crate) fn minimum(&self) -> u32 {
self.min
}
pub(crate) fn maximum(&self) -> u32 {
self.max
}
pub(crate) fn char_count(&self, is_non_ascii_char_escaped: bool) -> usize {
if is_non_ascii_char_escaped {
self.chars
.iter()
.map(|it| it.chars().map(|c| self.escape(c, false)).join(""))
.join("")
.chars()
.count()
} else {
self.chars.iter().map(|it| it.chars().count()).sum()
}
}
pub(crate) fn escape_non_ascii_chars(&mut self, use_surrogate_pairs: bool) {
self.chars = self
.chars
.iter()
.map(|it| {
it.chars()
.map(|c| self.escape(c, use_surrogate_pairs))
.join("")
})
.collect_vec();
}
pub(crate) fn escape_regexp_symbols(
&mut self,
is_non_ascii_char_escaped: bool,
is_astral_code_point_converted_to_surrogate: bool,
) {
let characters = self.chars_mut();
#[allow(clippy::needless_range_loop)]
for i in 0..characters.len() {
let mut character = characters[i].clone();
for char_to_escape in CHARS_TO_ESCAPE.iter() {
character =
character.replace(char_to_escape, &format!("{}{}", "\\", char_to_escape));
}
character = character
.replace('\n', "\\n")
.replace('\r', "\\r")
.replace('\t', "\\t");
if character == "\\" {
character = "\\\\".to_string();
}
characters[i] = character;
}
if is_non_ascii_char_escaped {
self.escape_non_ascii_chars(is_astral_code_point_converted_to_surrogate);
}
}
fn escape(&self, c: char, use_surrogate_pairs: bool) -> String {
if c.is_ascii() {
c.to_string()
} else if use_surrogate_pairs && ('\u{10000}'..='\u{10ffff}').contains(&c) {
self.convert_to_surrogate_pair(c)
} else {
c.escape_unicode().to_string()
}
}
fn convert_to_surrogate_pair(&self, c: char) -> String {
c.encode_utf16(&mut [0; 2])
.iter()
.map(|it| format!("\\u{{{:x}}}", it))
.join("")
}
}
impl Display for Grapheme {
fn fmt(&self, f: &mut Formatter<'_>) -> Result {
let is_single_char = self.char_count(false) == 1
|| (self.chars.len() == 1 && self.chars[0].matches('\\').count() == 1);
let is_range = self.min < self.max;
let is_repetition = self.min > 1;
let mut value = if self.repetitions.is_empty() {
self.value()
} else {
self.repetitions.iter().map(|it| it.to_string()).join("")
};
value = Component::CharClass(value.clone())
.to_repr(self.is_output_colorized && CHAR_CLASSES.contains(&&*value));
if !is_range && is_repetition && is_single_char {
write!(
f,
"{}{}",
value,
Component::Repetition(self.min, false).to_repr(self.is_output_colorized)
)
} else if !is_range && is_repetition && !is_single_char {
write!(
f,
"{}{}",
if self.is_capturing_group_enabled {
Component::CapturedParenthesizedExpression(
value,
self.is_verbose_mode_enabled,
false,
)
.to_repr(self.is_output_colorized)
} else {
Component::UncapturedParenthesizedExpression(
value,
self.is_verbose_mode_enabled,
false,
)
.to_repr(self.is_output_colorized)
},
Component::Repetition(self.min, self.is_verbose_mode_enabled)
.to_repr(self.is_output_colorized)
)
} else if is_range && is_single_char {
write!(
f,
"{}{}",
value,
Component::RepetitionRange(self.min, self.max, false)
.to_repr(self.is_output_colorized)
)
} else if is_range && !is_single_char {
write!(
f,
"{}{}",
if self.is_capturing_group_enabled {
Component::CapturedParenthesizedExpression(
value,
self.is_verbose_mode_enabled,
false,
)
.to_repr(self.is_output_colorized)
} else {
Component::UncapturedParenthesizedExpression(
value,
self.is_verbose_mode_enabled,
false,
)
.to_repr(self.is_output_colorized)
},
Component::RepetitionRange(self.min, self.max, self.is_verbose_mode_enabled)
.to_repr(self.is_output_colorized)
)
} else {
write!(f, "{}", value)
}
}
}
================================================
FILE: src/lib.rs
================================================
/*
* Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either expressed or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
//! ## 1. What does this tool do?
//!
//! *grex* is a library as well as a command-line utility that is meant to simplify the often
//! complicated and tedious task of creating regular expressions. It does so by automatically
//! generating a single regular expression from user-provided test cases. The resulting
//! expression is guaranteed to match the test cases which it was generated from.
//!
//! This project started as a Rust port of the JavaScript tool
//! [*regexgen*](https://github.com/devongovett/regexgen) written by
//! [Devon Govett](https://github.com/devongovett). Although many further useful features
//! could be added to it, its development apparently ceased several years ago. The plan
//! is now to add these new features to *grex*, as Rust really shines when it comes to
//! command-line tools. *grex* offers all features that *regexgen* provides, and more.
//!
//! The philosophy of this project is to generate, by default, the most specific regular
//! expression possible, one that matches exactly the given input and nothing else.
//! With the use of command-line flags (in the CLI tool) or preprocessing methods
//! (in the library), more generalized expressions can be created.
//!
//! The produced expressions are [Perl-compatible regular expressions](https://www.pcre.org)
//! which are also compatible with the regular expression parser in Rust's
//! [*regex crate*](https://crates.io/crates/regex).
//! Other regular expression parsers or respective libraries from other programming languages
//! have not been tested so far, but they ought to be mostly compatible as well.
//!
//! ## 2. Do I still need to learn to write regexes then?
//!
//! **Definitely, yes!** Using the standard settings, *grex* produces a regular expression that
//! is guaranteed to match only the test cases given as input and nothing else. This has been
//! verified by [property tests](https://github.com/pemistahl/grex/blob/main/tests/property_tests.rs).
//! However, if the conversion to shorthand character classes such as `\w` is enabled, the
//! resulting regex matches a much wider scope of test cases. Knowledge about the consequences of
//! this conversion is essential for finding a correct regular expression for your business domain.
//!
//! *grex* uses an algorithm that tries to find the shortest possible regex for the given test cases.
//! Very often though, the resulting expression is still longer or more complex than it needs to be.
//! In such cases, a more compact or elegant regex can be created only by hand.
//! Also, every regular expression engine has different built-in optimizations.
//! *grex* does not know anything about those and therefore cannot optimize its regexes
//! for a specific engine.
//!
//! **So, please learn how to write regular expressions!** Currently, the best use case for *grex*
//! is to find an initial correct regex which should then be inspected by hand to see whether
//! further optimizations are possible.
//!
//! ## 3. Current features
//!
//! - literals
//! - character classes
//! - detection of common prefixes and suffixes
//! - detection of repeated substrings and conversion to `{min,max}` quantifier notation
//! - alternation using `|` operator
//! - optionality using `?` quantifier
//! - escaping of non-ASCII characters, with optional conversion of astral code points to surrogate pairs
//! - case-sensitive or case-insensitive matching
//! - capturing or non-capturing groups
//! - optional anchors `^` and `$`
//! - fully compliant with [Unicode Standard 15.0](https://unicode.org/versions/Unicode15.0.0)
//! - fully compatible with [*regex* crate 1.9.0+](https://crates.io/crates/regex)
//! - correctly handles graphemes consisting of multiple Unicode symbols
//! - reads input strings from the command-line or from a file
//! - produces more readable expressions indented on multiple lines using optional verbose mode
//!
//! ## 4. How to use?
//!
//! The code snippets below show how to use the public API.
//!
//! For [more detailed examples](https://github.com/pemistahl/grex/tree/main#53-examples), please
//! take a look at the project's readme file on GitHub.
//!
//! ### 4.1 Default settings
//!
//! Test cases are passed either from a collection via [`RegExpBuilder::from()`]
//! or from a file via [`RegExpBuilder::from_file()`].
//!
//! ```
//! use grex::RegExpBuilder;
//!
//! let regexp = RegExpBuilder::from(&["a", "aa", "aaa"]).build();
//! assert_eq!(regexp, "^a(?:aa?)?$");
//! ```
//!
//! ### 4.2 Convert to character classes
//!
//! ```
//! use grex::RegExpBuilder;
//!
//! let regexp = RegExpBuilder::from(&["a", "aa", "123"])
//! .with_conversion_of_digits()
//! .with_conversion_of_words()
//! .build();
//! assert_eq!(regexp, "^(?:\\d\\d\\d|\\w(?:\\w)?)$");
//! ```
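//!
//! These conversions can be thought of as a preprocessing step that rewrites each test case
//! into shorthand class tokens before the expression is built. The sketch below illustrates
//! that idea with a hypothetical `to_shorthand_classes` helper, which is not part of the *grex*
//! API. As a simplifying assumption it only recognizes ASCII digits and ASCII word characters,
//! whereas `\d` and `\w` in the generated expression are Unicode-aware.
//!
//! ```rust
//! // Hypothetical helper, not part of grex: replaces every digit with `\d` and every
//! // remaining word character with `\w`. ASCII-only approximation of the Unicode classes.
//! fn to_shorthand_classes(test_case: &str, convert_digits: bool, convert_words: bool) -> String {
//!     test_case
//!         .chars()
//!         .map(|c| {
//!             // Digits are checked first because they are also word characters.
//!             if convert_digits && c.is_ascii_digit() {
//!                 "\\d".to_string()
//!             } else if convert_words && (c.is_ascii_alphanumeric() || c == '_') {
//!                 "\\w".to_string()
//!             } else {
//!                 c.to_string()
//!             }
//!         })
//!         .collect()
//! }
//! ```
//!
//! With both conversions enabled, `"123"` becomes `\d\d\d` while `"a"` and `"aa"` become `\w`
//! and `\w\w`, which is consistent with the alternation in the expected output above.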
//!
//! ### 4.3 Convert repeated substrings
//!
//! ```
//! use grex::RegExpBuilder;
//!
//! let regexp = RegExpBuilder::from(&["aa", "bcbc", "defdefdef"])
//! .with_conversion_of_repetitions()
//! .build();
//! assert_eq!(regexp, "^(?:a{2}|(?:bc){2}|(?:def){3})$");
//! ```
//!
//! By default, *grex* converts every substring that is at least one character long
//! and that is subsequently repeated at least once. You can customize these two parameters
//! if you like.
//!
//! In the following example, the test case `aa` is not converted to `a{2}` because the repeated
//! substring `a` has a length of 1, but the minimum substring length has been set to 2.
//!
//! ```
//! use grex::RegExpBuilder;
//!
//! let regexp = RegExpBuilder::from(&["aa", "bcbc", "defdefdef"])
//! .with_conversion_of_repetitions()
//! .with_minimum_substring_length(2)
//! .build();
//! assert_eq!(regexp, "^(?:aa|(?:bc){2}|(?:def){3})$");
//! ```
//!
//! In the next example, the minimum number of repetitions is set to 2, so only the test case
//! `defdefdef` is converted because it is the only one in which a substring is repeated twice.
//!
//! ```
//! use grex::RegExpBuilder;
//!
//! let regexp = RegExpBuilder::from(&["aa", "bcbc", "defdefdef"])
//! .with_conversion_of_repetitions()
//! .with_minimum_repetitions(2)
//! .build();
//! assert_eq!(regexp, "^(?:bcbc|aa|(?:def){3})$");
//! ```
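//!
//! The counting convention is easy to misread: a substring that is "repeated N times" occurs
//! N times after its first occurrence, so `bcbc` contains one repetition of `bc` and
//! `defdefdef` contains two repetitions of `def`. The sketch below checks this only for the
//! simple case where a whole test case consists of one unit repeated back to back. The helper
//! `split_into_repeated_unit` is hypothetical and much simpler than the detection performed by
//! *grex* itself.
//!
//! ```rust
//! // Hypothetical helper, not part of grex: if `s` is a single unit repeated back to back,
//! // returns that unit and its total number of occurrences.
//! fn split_into_repeated_unit(
//!     s: &str,
//!     min_unit_len: usize,
//!     min_repetitions: usize,
//! ) -> Option<(String, usize)> {
//!     let chars: Vec<char> = s.chars().collect();
//!     let n = chars.len();
//!     // A unit must occur at least twice, so it is at most half as long as `s`.
//!     // `max(1)` guards against a zero-length unit.
//!     for unit_len in min_unit_len.max(1)..=n / 2 {
//!         if n % unit_len != 0 {
//!             continue;
//!         }
//!         let unit = &chars[..unit_len];
//!         if chars.chunks(unit_len).all(|chunk| chunk == unit) {
//!             let occurrences = n / unit_len;
//!             // "Repeated N times" counts the occurrences after the first one.
//!             let repetitions = occurrences - 1;
//!             if repetitions >= min_repetitions {
//!                 return Some((unit.iter().collect(), occurrences));
//!             }
//!         }
//!     }
//!     None
//! }
//! ```
//!
//! With a minimum unit length of 1 and a minimum of 1 repetition, all three test cases qualify.
//! Raising the minimum unit length to 2 excludes `aa`, and raising the minimum repetitions to 2
//! leaves only `defdefdef`, in line with the examples above.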
//!
//! ### 4.4 Escape non-ASCII characters
//!
//! ```
//! use grex::RegExpBuilder;
//!
//! let regexp = RegExpBuilder::from(&["You smell like 💩."])
//! .with_escaping_of_non_ascii_chars(false)
//! .build();
//! assert_eq!(regexp, "^You smell like \\u{1f4a9}\\.$");
//! ```
//!
//! Old versions of JavaScript do not support Unicode escape sequences for
//! the astral code planes (range `U+010000` to `U+10FFFF`). In order to
//! support these symbols in JavaScript regular expressions, the conversion
//! to surrogate pairs is necessary. More information on that matter can be
//! found [here](https://mathiasbynens.be/notes/javascript-unicode).
//!
//! ```
//! use grex::RegExpBuilder;
//!
//! let regexp = RegExpBuilder::from(&["You smell like 💩."])
//! .with_escaping_of_non_ascii_chars(true)
//! .build();
//! assert_eq!(regexp, "^You smell like \\u{d83d}\\u{dca9}\\.$");
//! ```
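//!
//! The conversion itself is plain arithmetic. Subtract `0x10000` from the code point, then
//! add the upper 10 bits of the resulting 20-bit value to `0xD800` (high surrogate) and the
//! lower 10 bits to `0xDC00` (low surrogate). The hypothetical helpers below only sketch that
//! calculation. Their output can be cross-checked against the standard library's
//! `char::encode_utf16`.
//!
//! ```rust
//! // Hypothetical helper: splits an astral code point (U+10000 to U+10FFFF) into a
//! // UTF-16 surrogate pair. Returns `None` for characters in the Basic Multilingual Plane.
//! fn surrogate_pair(c: char) -> Option<(u16, u16)> {
//!     let code_point = c as u32;
//!     if code_point < 0x1_0000 {
//!         return None;
//!     }
//!     let offset = code_point - 0x1_0000; // fits into 20 bits
//!     let high = 0xD800 + (offset >> 10) as u16; // upper 10 bits
//!     let low = 0xDC00 + (offset & 0x3FF) as u16; // lower 10 bits
//!     Some((high, low))
//! }
//!
//! // Formats the pair in the `\u{...}` notation used in the expected output above.
//! fn escape_as_surrogate_pair(c: char) -> Option<String> {
//!     surrogate_pair(c).map(|(high, low)| format!("\\u{{{:x}}}\\u{{{:x}}}", high, low))
//! }
//! ```
//!
//! For 💩 (U+1F4A9), the offset is `0xF4A9`, which yields the pair `0xD83D` and `0xDCA9`.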
//!
//! ### 4.5 Case-insensitive matching
//!
//! The regular expressions that *grex* generates are case-sensitive by default.
//! Case-insensitive matching can be enabled like so:
//!
//! ```
//! use grex::RegExpBuilder;
//!
//! let regexp = RegExpBuilder::from(&["big", "BIGGER"])
//! .with_case_insensitive_matching()
//! .build();
//! assert_eq!(regexp, "(?i)^big(?:ger)?$");
//! ```
//!
//! ### 4.6 Capturing groups
//!
//! Non-capturing groups are used by default.
//! Extending the previous example, you can switch to capturing groups instead.
//!
//! ```
//! use grex::RegExpBuilder;
//!
//! let regexp = RegExpBuilder::from(&["big", "BIGGER"])
//! .with_case_insensitive_matching()
//! .with_capturing_groups()
//! .build();
//! assert_eq!(regexp, "(?i)^big(ger)?$");
//! ```
//!
//! ### 4.7 Verbose mode
//!
//! If you find the generated regular expression hard to read, you can enable verbose mode.
//! The expression is then put on multiple lines and indented to make it more pleasant to the eyes.
//!
//! ```
//! use grex::RegExpBuilder;
//! use indoc::indoc;
//!
//! let regexp = RegExpBuilder::from(&["a", "b", "bcd"])
//! .with_verbose_mode()
//! .build();
//!
//! assert_eq!(regexp, indoc!(
//!     r#"
//!     (?x)
//!     ^
//!       (?:
//!         b
//!         (?:
//!           cd
//!         )?
//!         |
//!         a
//!       )
//!     $"#
//! ));
//! ```
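//!
//! The layout above appears to follow a simple rule. `^` and an opening group
//! indent all following lines by two more spaces, while `$` and a closing `)`
//! (including any quantifier) are printed one level further out. The sketch below
//! applies that rule to an already tokenized expression. The function
//! `indent_tokens` is hypothetical and only illustrates the output format. It is
//! not how *grex* builds verbose expressions.
//!
//! ```rust
//! // Hypothetical helper: lay out regex tokens one per line, indenting by
//! // two spaces per open group and per level inside the `^ ... $` anchors.
//! fn indent_tokens(tokens: &[&str]) -> String {
//!     let mut depth = 0usize;
//!     let mut lines = Vec::with_capacity(tokens.len());
//!     for &token in tokens {
//!         // Closing tokens are printed one level further out.
//!         if token == "$" || token.starts_with(')') {
//!             depth = depth.saturating_sub(1);
//!         }
//!         lines.push(format!("{}{}", "  ".repeat(depth), token));
//!         // Opening tokens indent everything that follows them.
//!         if token == "^" || token == "(?:" || token == "(" {
//!             depth += 1;
//!         }
//!     }
//!     lines.join("\n")
//! }
//! ```
//!
//! Passing the tokens `["(?x)", "^", "(?:", "b", "(?:", "cd", ")?", "|", "a", ")", "$"]`
//! reproduces the verbose expression asserted above.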
//!
//! ### 4.8 Disable anchors
//!
//! By default, the anchors `^` and `$` are put around every generated regular expression in order
//! to ensure that it matches only the test cases given as input. Often, however, the generated
//! pattern is meant to be used as part of a larger one. For this purpose, the anchors can be
//! disabled either separately, with `without_start_anchor()` or `without_end_anchor()`, or both
//! at once, with `without_anchors()`.
//!
//! ```
//! use grex::RegExpBuilder;
//!
//! let regexp = RegExpBuilder::from(&["a", "aa", "aaa"])
//!     .without_anchors()
//!     .build();
//! assert_eq!(regexp, "a(?:aa?)?");
//! ```
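//!
//! Conceptually, the anchors are optional affixes around the pattern body. The
//! hypothetical helper `anchored` below mirrors the combinations offered by
//! `without_start_anchor()`, `without_end_anchor()` and `without_anchors()`. It is a
//! sketch and not part of *grex*:
//!
//! ```rust
//! // Hypothetical helper: surround a generated pattern body with the
//! // start and/or end anchor, depending on the two flags.
//! fn anchored(body: &str, start: bool, end: bool) -> String {
//!     format!(
//!         "{}{}{}",
//!         if start { "^" } else { "" },
//!         body,
//!         if end { "$" } else { "" }
//!     )
//! }
//! ```
//!
//! Without anchors, the body can be embedded in a larger pattern such as
//! `id-a(?:aa?)?-x`. If the anchors were kept, the embedded `^` and `$` would assert
//! the start and end of the whole input, so that pattern could never match.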
//!
//! ### 5. How does it work?
//!
//! 1. A [deterministic finite automaton](https://en.wikipedia.org/wiki/Deterministic_finite_automaton) (DFA)
//!    is created from the input strings.
//!
//! 2. The number of states and transitions between states in the DFA is reduced by applying
//!    [Hopcroft's DFA minimization algorithm](https://en.wikipedia.org/wiki/DFA_minimization#Hopcroft.27s_algorithm).
//!
//! 3. The minimized DFA is expressed as a system of linear equations which are solved with
//!    [Brzozowski's algebraic method](http://cs.stackexchange.com/questions/2016/how-to-convert-finite-automata-to-regular-expressions#2392),
//!    resulting in the final regular expression.
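//!
//! The following sketch walks through the three steps in simplified form. It is not
//! *grex*'s implementation and rests on several assumptions:
//!
//! * The DFA of step 1 is built as a trie, that is, a tree-shaped DFA.
//! * Step 2 is replaced by a bottom-up merge of states with equal finality and equal
//!   transitions. For acyclic automata this yields the minimal state count, but it is
//!   a different technique from Hopcroft's partition refinement.
//! * Step 3 solves the system R(q) = c1 R(q1) | ... | cn R(qn) (plus ε if q is final)
//!   by back-substitution. This suffices because a tree has no cycles, so Arden's
//!   lemma is never needed.
//!
//! Character classes, repetitions and suffix factoring are left out, and `to_regex`
//! works on the unminimized trie.
//!
//! ```rust
//! use std::collections::{BTreeMap, HashMap};
//!
//! // A state of a tree-shaped DFA (trie): finality plus outgoing transitions.
//! #[derive(Default)]
//! struct Node {
//!     is_final: bool,
//!     edges: BTreeMap<char, Node>,
//! }
//!
//! // Step 1: insert each input string as a path of transitions from the root.
//! fn build_trie(inputs: &[&str]) -> Node {
//!     let mut root = Node::default();
//!     for input in inputs {
//!         let mut node = &mut root;
//!         for c in input.chars() {
//!             node = node.edges.entry(c).or_default();
//!         }
//!         node.is_final = true;
//!     }
//!     root
//! }
//!
//! // Step 2 (stand-in, not Hopcroft): in an acyclic DFA, two states are
//! // equivalent iff they agree on finality and their transitions lead to
//! // equivalent states, so canonical ids can be assigned bottom-up.
//! fn canonical_id(node: &Node, registry: &mut HashMap<(bool, Vec<(char, usize)>), usize>) -> usize {
//!     let signature: Vec<(char, usize)> = node
//!         .edges
//!         .iter()
//!         .map(|(&c, child)| (c, canonical_id(child, registry)))
//!         .collect();
//!     let next_id = registry.len();
//!     *registry.entry((node.is_final, signature)).or_insert(next_id)
//! }
//!
//! // Number of states left after merging all equivalent states.
//! fn minimal_state_count(root: &Node) -> usize {
//!     let mut registry = HashMap::new();
//!     canonical_id(root, &mut registry);
//!     registry.len()
//! }
//!
//! // Escape characters that have a special meaning in regular expressions.
//! fn escape(c: char) -> String {
//!     if r"\.^$|?*+()[]{}".contains(c) {
//!         format!("\\{c}")
//!     } else {
//!         c.to_string()
//!     }
//! }
//!
//! // Step 3: R(q) is the alternation of `c R(child)` over all transitions,
//! // made optional when q is final; leaves contribute the empty string.
//! fn to_regex(node: &Node) -> String {
//!     let alternatives: Vec<String> = node
//!         .edges
//!         .iter()
//!         .map(|(&c, child)| format!("{}{}", escape(c), to_regex(child)))
//!         .collect();
//!     match (alternatives.len(), node.is_final) {
//!         (0, _) => String::new(),
//!         (1, false) => alternatives[0].clone(),
//!         (1, true) if alternatives[0].chars().count() == 1 => format!("{}?", alternatives[0]),
//!         (_, false) => format!("(?:{})", alternatives.join("|")),
//!         (_, true) => format!("(?:{})?", alternatives.join("|")),
//!     }
//! }
//! ```
//!
//! `to_regex(&build_trie(&["a", "aa", "aaa"]))` returns `a(?:aa?)?`, the same body as in
//! section 4.8. `minimal_state_count(&build_trie(&["ab", "cb"]))` returns `3` instead of
//! the trie's 5 states. For `["a", "b", "bcd"]`, the sketch yields `(?:a|b(?:cd)?)`. This
//! matches the same strings as *grex*'s `(?:b(?:cd)?|a)` but orders the alternatives
//! differently.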
#[macro_use]
mod macros;
mod builder;
mod char_range;
mod cluster;
mod component;
mod config;
mod dfa;
mod expression;
mod format;
mod grapheme;
mod quantifier;
mod regexp;
mod substring;
mod unicode_tables;
#[cfg(feature = "python")]
mod python;
#[cfg(target_family = "wasm")]
mod wasm;
pub use builder::RegExpBuilder;
#[cfg(target_family = "wasm")]
pub use wasm::RegExpBuilder as WasmRegExpBuilder;
================================================
FILE: src/macros.rs
================================================
/*
* Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either expressed or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
macro_rules! btreeset {
    ( $( $value: expr ),* ) => {{
        let mut set = std::collections::BTreeSet::new();
        $( set.insert($value); )*
        set
    }};
}
================================================
FILE: src/main.rs
================================================
/*
* Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either expressed or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#[cfg(not(target_family = "wasm"))]
mod cli {
    use clap::ArgAction;
    use clap::Parser;
    use grex::RegExpBuilder;
    use itertools::Itertools;
    use std::io::{stdin, BufRead, Error, ErrorKind, IsTerminal, Read};
    use std::path::PathBuf;
    #[derive(Parser)]
    #[command(
        author = "© 2019-today Peter M. Stahl <pemistahl@gmail.com>",
        about = "Licensed under the Apache License, Version 2.0\n\
                 Downloadable from https://crates.io/crates/grex\n\
                 Source code at https://github.com/pemistahl/grex\n\n\
                 grex generates reg
SYMBOL INDEX (641 symbols across 26 files)
FILE: benches/benchmark.rs
function load_test_cases (line 23) | fn load_test_cases() -> Vec<String> {
function benchmark_grex_with_default_settings (line 32) | fn benchmark_grex_with_default_settings(c: &mut Criterion) {
function benchmark_grex_with_conversion_of_repetitions (line 39) | fn benchmark_grex_with_conversion_of_repetitions(c: &mut Criterion) {
function benchmark_grex_with_conversion_of_digits (line 50) | fn benchmark_grex_with_conversion_of_digits(c: &mut Criterion) {
function benchmark_grex_with_conversion_of_non_digits (line 61) | fn benchmark_grex_with_conversion_of_non_digits(c: &mut Criterion) {
function benchmark_grex_with_conversion_of_words (line 72) | fn benchmark_grex_with_conversion_of_words(c: &mut Criterion) {
function benchmark_grex_with_conversion_of_non_words (line 83) | fn benchmark_grex_with_conversion_of_non_words(c: &mut Criterion) {
function benchmark_grex_with_conversion_of_whitespace (line 94) | fn benchmark_grex_with_conversion_of_whitespace(c: &mut Criterion) {
function benchmark_grex_with_conversion_of_non_whitespace (line 105) | fn benchmark_grex_with_conversion_of_non_whitespace(c: &mut Criterion) {
function benchmark_grex_with_case_insensitive_matching (line 116) | fn benchmark_grex_with_case_insensitive_matching(c: &mut Criterion) {
function benchmark_grex_with_verbose_mode (line 127) | fn benchmark_grex_with_verbose_mode(c: &mut Criterion) {
FILE: grex.pyi
class RegExpBuilder (line 19) | class RegExpBuilder:
method from_test_cases (line 23) | def from_test_cases(cls, test_cases: List[str]) -> "RegExpBuilder":
method with_conversion_of_digits (line 35) | def with_conversion_of_digits(self) -> "RegExpBuilder":
method with_conversion_of_non_digits (line 45) | def with_conversion_of_non_digits(self) -> "RegExpBuilder":
method with_conversion_of_whitespace (line 55) | def with_conversion_of_whitespace(self) -> "RegExpBuilder":
method with_conversion_of_non_whitespace (line 65) | def with_conversion_of_non_whitespace(self) -> "RegExpBuilder":
method with_conversion_of_words (line 68) | def with_conversion_of_words(self) -> "RegExpBuilder":
method with_conversion_of_non_words (line 78) | def with_conversion_of_non_words(self) -> "RegExpBuilder":
method with_conversion_of_repetitions (line 85) | def with_conversion_of_repetitions(self) -> "RegExpBuilder":
method with_case_insensitive_matching (line 88) | def with_case_insensitive_matching(self) -> "RegExpBuilder":
method with_capturing_groups (line 91) | def with_capturing_groups(self) -> "RegExpBuilder":
method with_minimum_repetitions (line 94) | def with_minimum_repetitions(self, quantity: int) -> "RegExpBuilder":
method with_minimum_substring_length (line 107) | def with_minimum_substring_length(self, length: int) -> "RegExpBuilder":
method with_escaping_of_non_ascii_chars (line 120) | def with_escaping_of_non_ascii_chars(self, use_surrogate_pairs: bool) ...
method with_verbose_mode (line 130) | def with_verbose_mode(self) -> "RegExpBuilder":
method without_start_anchor (line 133) | def without_start_anchor(self) -> "RegExpBuilder":
method without_end_anchor (line 139) | def without_end_anchor(self) -> "RegExpBuilder":
method without_anchors (line 145) | def without_anchors(self) -> "RegExpBuilder":
method build (line 151) | def build(self) -> str:
FILE: src/builder.rs
constant MISSING_TEST_CASES_MESSAGE (line 23) | pub(crate) const MISSING_TEST_CASES_MESSAGE: &str =
constant MINIMUM_REPETITIONS_MESSAGE (line 26) | pub(crate) const MINIMUM_REPETITIONS_MESSAGE: &str =
constant MINIMUM_SUBSTRING_LENGTH_MESSAGE (line 29) | pub(crate) const MINIMUM_SUBSTRING_LENGTH_MESSAGE: &str =
type RegExpBuilder (line 35) | pub struct RegExpBuilder {
method from (line 46) | pub fn from<T: Clone + Into<String>>(test_cases: &[T]) -> Self {
method from_file (line 69) | pub fn from_file<T: Into<PathBuf>>(file_path: T) -> Self {
method with_conversion_of_digits (line 97) | pub fn with_conversion_of_digits(&mut self) -> &mut Self {
method with_conversion_of_non_digits (line 111) | pub fn with_conversion_of_non_digits(&mut self) -> &mut Self {
method with_conversion_of_whitespace (line 125) | pub fn with_conversion_of_whitespace(&mut self) -> &mut Self {
method with_conversion_of_non_whitespace (line 131) | pub fn with_conversion_of_non_whitespace(&mut self) -> &mut Self {
method with_conversion_of_words (line 145) | pub fn with_conversion_of_words(&mut self) -> &mut Self {
method with_conversion_of_non_words (line 155) | pub fn with_conversion_of_non_words(&mut self) -> &mut Self {
method with_conversion_of_repetitions (line 162) | pub fn with_conversion_of_repetitions(&mut self) -> &mut Self {
method with_case_insensitive_matching (line 169) | pub fn with_case_insensitive_matching(&mut self) -> &mut Self {
method with_capturing_groups (line 175) | pub fn with_capturing_groups(&mut self) -> &mut Self {
method with_minimum_repetitions (line 186) | pub fn with_minimum_repetitions(&mut self, quantity: u32) -> &mut Self {
method with_minimum_substring_length (line 200) | pub fn with_minimum_substring_length(&mut self, length: u32) -> &mut S...
method with_escaping_of_non_ascii_chars (line 211) | pub fn with_escaping_of_non_ascii_chars(&mut self, use_surrogate_pairs...
method with_verbose_mode (line 218) | pub fn with_verbose_mode(&mut self) -> &mut Self {
method without_start_anchor (line 226) | pub fn without_start_anchor(&mut self) -> &mut Self {
method without_end_anchor (line 234) | pub fn without_end_anchor(&mut self) -> &mut Self {
method without_anchors (line 242) | pub fn without_anchors(&mut self) -> &mut Self {
method with_syntax_highlighting (line 255) | pub fn with_syntax_highlighting(&mut self) -> &mut Self {
method build (line 261) | pub fn build(&mut self) -> String {
FILE: src/char_range.rs
type CharRange (line 20) | pub(crate) struct CharRange {
method closed (line 27) | pub(crate) fn closed(start: char, end: char) -> Self {
method contains (line 32) | pub(crate) fn contains(&self, c: char) -> bool {
method all (line 39) | pub(crate) fn all() -> CharRangeIter {
type CharRangeIter (line 48) | pub(crate) struct CharRangeIter {
type Item (line 54) | type Item = char;
method next (line 56) | fn next(&mut self) -> Option<Self::Item> {
function test_char_range_contains (line 95) | fn test_char_range_contains() {
function test_char_range_all (line 105) | fn test_char_range_all() {
function test_char_range_all_count (line 112) | fn test_char_range_all_count() {
FILE: src/cluster.rs
type GraphemeCluster (line 30) | pub(crate) struct GraphemeCluster<'a> {
function from (line 36) | pub(crate) fn from(s: &str, config: &'a RegExpConfig) -> Self {
function from_graphemes (line 77) | pub(crate) fn from_graphemes(graphemes: Vec<Grapheme>, config: &'a RegEx...
function new (line 81) | pub(crate) fn new(grapheme: Grapheme, config: &'a RegExpConfig) -> Self {
function convert_to_char_classes (line 88) | pub(crate) fn convert_to_char_classes(&mut self) {
function convert_repetitions (line 125) | pub(crate) fn convert_repetitions(&mut self) {
function merge (line 133) | pub(crate) fn merge(
function graphemes (line 144) | pub(crate) fn graphemes(&self) -> &Vec<Grapheme> {
function graphemes_mut (line 148) | pub(crate) fn graphemes_mut(&mut self) -> &mut Vec<Grapheme> {
function size (line 152) | pub(crate) fn size(&self) -> usize {
function char_count (line 156) | pub(crate) fn char_count(&self, is_non_ascii_char_escaped: bool) -> usize {
function is_empty (line 163) | pub(crate) fn is_empty(&self) -> bool {
function is_digit (line 168) | fn is_digit(c: char) -> bool {
function is_word (line 174) | fn is_word(c: char) -> bool {
function is_space (line 182) | fn is_space(c: char) -> bool {
function convert_repetitions (line 188) | fn convert_repetitions(
function collect_repeated_substrings (line 199) | fn collect_repeated_substrings(graphemes: &[Grapheme]) -> HashMap<Vec<St...
function create_ranges_of_repetitions (line 215) | fn create_ranges_of_repetitions(
function coalesce_repetitions (line 254) | fn coalesce_repetitions(
function replace_graphemes_with_repetitions (line 282) | fn replace_graphemes_with_repetitions(
function convert_chars_to_range (line 342) | fn convert_chars_to_range(chars: &[(char, char)]) -> Vec<CharRange> {
FILE: src/component.rs
type Component (line 20) | pub(crate) enum Component {
method to_repr (line 42) | pub(crate) fn to_repr(&self, is_output_colorized: bool) -> String {
method to_colored_string (line 49) | pub(crate) fn to_colored_string(&self, is_escaped: bool) -> String {
method black_on_bright_yellow (line 187) | fn black_on_bright_yellow(value: &str, is_escaped: bool) -> String {
method bright_yellow_on_black (line 191) | fn bright_yellow_on_black(value: &str, is_escaped: bool) -> String {
method cyan_bold (line 195) | fn cyan_bold(value: &str, is_escaped: bool) -> String {
method green_bold (line 199) | fn green_bold(value: &str, is_escaped: bool) -> String {
method purple_bold (line 203) | fn purple_bold(value: &str, is_escaped: bool) -> String {
method red_bold (line 207) | fn red_bold(value: &str, is_escaped: bool) -> String {
method white_on_bright_blue (line 211) | fn white_on_bright_blue(value: &str, is_escaped: bool) -> String {
method yellow_bold (line 215) | fn yellow_bold(value: &str, is_escaped: bool) -> String {
method color_code (line 219) | fn color_code(code: &str, value: &str, is_escaped: bool) -> String {
method fmt (line 229) | fn fmt(&self, f: &mut Formatter<'_>) -> Result {
FILE: src/config.rs
type RegExpConfig (line 18) | pub(crate) struct RegExpConfig {
method new (line 39) | pub(crate) fn new() -> Self {
method is_char_class_feature_enabled (line 61) | pub(crate) fn is_char_class_feature_enabled(&self) -> bool {
FILE: src/dfa.rs
type State (line 28) | type State = NodeIndex<u32>;
type StateLabel (line 29) | type StateLabel = String;
type EdgeLabel (line 30) | type EdgeLabel = Grapheme;
type Dfa (line 32) | pub(crate) struct Dfa<'a> {
function from (line 41) | pub(crate) fn from(
function state_count (line 56) | pub(crate) fn state_count(&self) -> usize {
function states_in_depth_first_order (line 60) | pub(crate) fn states_in_depth_first_order(&self) -> Vec<State> {
function outgoing_edges (line 69) | pub(crate) fn outgoing_edges(&self, state: State) -> Edges<'_, Grapheme,...
function is_final_state (line 73) | pub(crate) fn is_final_state(&self, state: State) -> bool {
function new (line 77) | fn new(config: &'a RegExpConfig) -> Self {
function insert (line 89) | fn insert(&mut self, cluster: &GraphemeCluster) {
function return_next_state (line 99) | fn return_next_state(&mut self, current_state: State, edge_label: &Graph...
function find_next_state (line 106) | fn find_next_state(&mut self, current_state: State, grapheme: &Grapheme)...
function add_new_state (line 136) | fn add_new_state(&mut self, current_state: State, edge_label: &Grapheme)...
function minimize (line 144) | fn minimize(&mut self) {
function get_initial_partition (line 202) | fn get_initial_partition(&self) -> Vec<HashSet<State>> {
function get_parent_states (line 211) | fn get_parent_states(&self, a: &HashSet<State>, label: &Grapheme) -> Has...
function recreate_graph (line 231) | fn recreate_graph(&mut self, p: Vec<&HashSet<State>>) {
function test_state_count (line 279) | fn test_state_count() {
function test_is_final_state (line 289) | fn test_is_final_state() {
function test_outgoing_edges (line 305) | fn test_outgoing_edges() {
function test_states_in_depth_first_order (line 337) | fn test_states_in_depth_first_order() {
function test_minimization_algorithm (line 408) | fn test_minimization_algorithm() {
function test_dfa_constructor (line 428) | fn test_dfa_constructor() {
FILE: src/expression.rs
type Expression (line 31) | pub(crate) enum Expression<'a> {
function from (line 40) | pub(crate) fn from(dfa: Dfa, config: &'a RegExpConfig) -> Self {
function new_alternation (line 108) | pub(crate) fn new_alternation(exprs: Vec<Expression<'a>>, config: &RegEx...
function new_character_class (line 120) | fn new_character_class(
function new_concatenation (line 129) | fn new_concatenation(
function new_literal (line 143) | pub(crate) fn new_literal(cluster: GraphemeCluster<'a>, config: &RegExpC...
function new_repetition (line 151) | fn new_repetition(expr: Expression<'a>, quantifier: Quantifier, config: ...
function is_empty (line 161) | fn is_empty(&self) -> bool {
function is_single_codepoint (line 168) | pub(crate) fn is_single_codepoint(&self) -> bool {
function len (line 179) | fn len(&self) -> usize {
function precedence (line 189) | pub(crate) fn precedence(&self) -> u8 {
function remove_substring (line 197) | pub(crate) fn remove_substring(&mut self, substring: &Substring, length:...
function value (line 224) | pub(crate) fn value(&self, substring: Option<&Substring>) -> Option<Vec<...
function repeat_zero_or_more_times (line 238) | fn repeat_zero_or_more_times(
function concatenate (line 246) | fn concatenate(
function union (line 317) | fn union(
function flatten_alternations (line 430) | fn flatten_alternations(
function extract_character_set (line 443) | fn extract_character_set(expr: Expression) -> BTreeSet<char> {
function remove_common_substring (line 461) | fn remove_common_substring(
function find_common_substring (line 474) | fn find_common_substring(
function ensure_correct_string_representation_of_alternation_1 (line 518) | fn ensure_correct_string_representation_of_alternation_1() {
function ensure_correct_string_representation_of_alternation_2 (line 527) | fn ensure_correct_string_representation_of_alternation_2() {
function ensure_correct_string_representation_of_character_class_1 (line 537) | fn ensure_correct_string_representation_of_character_class_1() {
function ensure_correct_string_representation_of_character_class_2 (line 544) | fn ensure_correct_string_representation_of_character_class_2() {
function ensure_correct_string_representation_of_concatenation_1 (line 552) | fn ensure_correct_string_representation_of_concatenation_1() {
function ensure_correct_string_representation_of_concatenation_2 (line 561) | fn ensure_correct_string_representation_of_concatenation_2() {
function ensure_correct_removal_of_prefix_in_literal (line 571) | fn ensure_correct_removal_of_prefix_in_literal() {
function ensure_correct_removal_of_suffix_in_literal (line 608) | fn ensure_correct_removal_of_suffix_in_literal() {
function ensure_correct_string_representation_of_repetition_1 (line 645) | fn ensure_correct_string_representation_of_repetition_1() {
function ensure_correct_string_representation_of_repetition_2 (line 653) | fn ensure_correct_string_representation_of_repetition_2() {
FILE: src/format.rs
method fmt (line 27) | fn fmt(&self, f: &mut Formatter<'_>) -> Result {
function get_codepoint_position (line 89) | fn get_codepoint_position(c: char) -> usize {
function format_alternation (line 93) | fn format_alternation(
function format_character_class (line 135) | fn format_character_class(
function format_concatenation (line 207) | fn format_concatenation(
function format_literal (line 249) | fn format_literal(
function format_repetition (line 283) | fn format_repetition(
FILE: src/grapheme.rs
constant CHARS_TO_ESCAPE (line 21) | const CHARS_TO_ESCAPE: [&str; 14] = [
constant CHAR_CLASSES (line 25) | const CHAR_CLASSES: [&str; 6] = ["\\d", "\\s", "\\w", "\\D", "\\S", "\\W"];
type Grapheme (line 28) | pub(crate) struct Grapheme {
method from (line 39) | pub(crate) fn from(
method new (line 56) | pub(crate) fn new(
method value (line 75) | pub(crate) fn value(&self) -> String {
method chars (line 79) | pub(crate) fn chars(&self) -> &Vec<String> {
method chars_mut (line 83) | pub(crate) fn chars_mut(&mut self) -> &mut Vec<String> {
method has_repetitions (line 87) | pub(crate) fn has_repetitions(&self) -> bool {
method repetitions_mut (line 91) | pub(crate) fn repetitions_mut(&mut self) -> &mut Vec<Grapheme> {
method minimum (line 95) | pub(crate) fn minimum(&self) -> u32 {
method maximum (line 99) | pub(crate) fn maximum(&self) -> u32 {
method char_count (line 103) | pub(crate) fn char_count(&self, is_non_ascii_char_escaped: bool) -> us...
method escape_non_ascii_chars (line 116) | pub(crate) fn escape_non_ascii_chars(&mut self, use_surrogate_pairs: b...
method escape_regexp_symbols (line 128) | pub(crate) fn escape_regexp_symbols(
method escape (line 161) | fn escape(&self, c: char, use_surrogate_pairs: bool) -> String {
method convert_to_surrogate_pair (line 171) | fn convert_to_surrogate_pair(&self, c: char) -> String {
method fmt (line 180) | fn fmt(&self, f: &mut Formatter<'_>) -> Result {
FILE: src/main.rs
type Cli (line 39) | pub(crate) struct Cli {
function obtain_input (line 293) | pub(crate) fn obtain_input(cli: &Cli) -> Result<Vec<String>, Error> {
function handle_input (line 330) | pub(crate) fn handle_input(
function repetition_options_parser (line 422) | fn repetition_options_parser(value: &str) -> Result<u32, String> {
function main (line 437) | fn main() {
function main (line 447) | fn main() {}
FILE: src/python.rs
function grex (line 29) | fn grex(m: &Bound<'_, PyModule>) -> PyResult<()> {
method new (line 37) | fn new(test_cases: Vec<String>) -> PyResult<Self> {
method from_test_cases (line 58) | fn from_test_cases(_cls: &Bound<PyType>, test_cases: Vec<String>) -> PyR...
method py_with_conversion_of_digits (line 70) | fn py_with_conversion_of_digits(mut self_: PyRefMut<Self>) -> PyRefMut<S...
method py_with_conversion_of_non_digits (line 83) | fn py_with_conversion_of_non_digits(mut self_: PyRefMut<Self>) -> PyRefM...
method py_with_conversion_of_whitespace (line 96) | fn py_with_conversion_of_whitespace(mut self_: PyRefMut<Self>) -> PyRefM...
method py_with_conversion_of_non_whitespace (line 103) | fn py_with_conversion_of_non_whitespace(mut self_: PyRefMut<Self>) -> Py...
method py_with_conversion_of_words (line 116) | fn py_with_conversion_of_words(mut self_: PyRefMut<Self>) -> PyRefMut<Se...
method py_with_conversion_of_non_words (line 126) | fn py_with_conversion_of_non_words(mut self_: PyRefMut<Self>) -> PyRefMu...
method py_with_conversion_of_repetitions (line 133) | fn py_with_conversion_of_repetitions(mut self_: PyRefMut<Self>) -> PyRef...
method py_with_case_insensitive_matching (line 140) | fn py_with_case_insensitive_matching(mut self_: PyRefMut<Self>) -> PyRef...
method py_with_capturing_groups (line 147) | fn py_with_capturing_groups(mut self_: PyRefMut<Self>) -> PyRefMut<Self> {
method py_with_minimum_repetitions (line 162) | fn py_with_minimum_repetitions(
method py_with_minimum_substring_length (line 184) | fn py_with_minimum_substring_length(
method py_with_escaping_of_non_ascii_chars (line 204) | fn py_with_escaping_of_non_ascii_chars(
method py_with_verbose_mode (line 215) | fn py_with_verbose_mode(mut self_: PyRefMut<Self>) -> PyRefMut<Self> {
method py_without_start_anchor (line 223) | fn py_without_start_anchor(mut self_: PyRefMut<Self>) -> PyRefMut<Self> {
method py_without_end_anchor (line 231) | fn py_without_end_anchor(mut self_: PyRefMut<Self>) -> PyRefMut<Self> {
method py_without_anchors (line 240) | fn py_without_anchors(mut self_: PyRefMut<Self>) -> PyRefMut<Self> {
method py_build (line 248) | fn py_build(&mut self) -> String {
function replace_unicode_escape_sequences (line 259) | fn replace_unicode_escape_sequences(regexp: String) -> String {
FILE: src/quantifier.rs
type Quantifier (line 20) | pub(crate) enum Quantifier {
method fmt (line 26) | fn fmt(&self, f: &mut Formatter<'_>) -> Result {
FILE: src/regexp.rs
type RegExp (line 27) | pub(crate) struct RegExp<'a> {
function from (line 33) | pub(crate) fn from(test_cases: &'a mut Vec<String>, config: &'a RegExpCo...
function convert_for_case_insensitive_matching (line 71) | fn convert_for_case_insensitive_matching(test_cases: &mut Vec<String>) {
function convert_expr_to_regex (line 88) | fn convert_expr_to_regex(expr: &Expression, config: &RegExpConfig) -> Re...
function regex_matches_all_test_cases (line 97) | fn regex_matches_all_test_cases(regex: &Regex, test_cases: &[String]) ->...
function sort (line 103) | fn sort(test_cases: &mut Vec<String>) {
function grapheme_clusters (line 112) | fn grapheme_clusters(
function is_each_test_case_matched_after_rotating_alternations (line 136) | fn is_each_test_case_matched_after_rotating_alternations(
method fmt (line 162) | fn fmt(&self, f: &mut Formatter<'_>) -> Result {
function indent_regexp (line 248) | fn indent_regexp(regexp: String, config: &RegExpConfig) -> String {
FILE: src/substring.rs
type Substring (line 17) | pub(crate) enum Substring {
FILE: src/unicode_tables/decimal.rs
constant DECIMAL_NUMBER (line 25) | pub const DECIMAL_NUMBER: &[(char, char)] = &[
FILE: src/unicode_tables/space.rs
constant WHITE_SPACE (line 25) | pub const WHITE_SPACE: &[(char, char)] = &[
FILE: src/unicode_tables/word.rs
constant WORD (line 25) | pub const WORD: &[(char, char)] = &[
FILE: src/wasm.rs
type RegExpBuilder (line 29) | pub struct RegExpBuilder {
method from (line 40) | pub fn from(testCases: Box<[JsValue]>) -> Result<RegExpBuilder, JsValu...
method withConversionOfDigits (line 61) | pub fn withConversionOfDigits(&mut self) -> RegExpBuilder {
method withConversionOfNonDigits (line 74) | pub fn withConversionOfNonDigits(&mut self) -> RegExpBuilder {
method withConversionOfWhitespace (line 86) | pub fn withConversionOfWhitespace(&mut self) -> RegExpBuilder {
method withConversionOfNonWhitespace (line 93) | pub fn withConversionOfNonWhitespace(&mut self) -> RegExpBuilder {
method withConversionOfWords (line 105) | pub fn withConversionOfWords(&mut self) -> RegExpBuilder {
method withConversionOfNonWords (line 115) | pub fn withConversionOfNonWords(&mut self) -> RegExpBuilder {
method withConversionOfRepetitions (line 122) | pub fn withConversionOfRepetitions(&mut self) -> RegExpBuilder {
method withCaseInsensitiveMatching (line 129) | pub fn withCaseInsensitiveMatching(&mut self) -> RegExpBuilder {
method withCapturingGroups (line 135) | pub fn withCapturingGroups(&mut self) -> RegExpBuilder {
method withEscapingOfNonAsciiChars (line 143) | pub fn withEscapingOfNonAsciiChars(&mut self, useSurrogatePairs: bool)...
method withVerboseMode (line 152) | pub fn withVerboseMode(&mut self) -> RegExpBuilder {
method withoutStartAnchor (line 160) | pub fn withoutStartAnchor(&mut self) -> RegExpBuilder {
method withoutEndAnchor (line 168) | pub fn withoutEndAnchor(&mut self) -> RegExpBuilder {
method withoutAnchors (line 176) | pub fn withoutAnchors(&mut self) -> RegExpBuilder {
method withMinimumRepetitions (line 188) | pub fn withMinimumRepetitions(&mut self, quantity: u32) -> Result<RegE...
method withMinimumSubstringLength (line 202) | pub fn withMinimumSubstringLength(&mut self, length: u32) -> Result<Re...
method build (line 211) | pub fn build(&mut self) -> String {
FILE: tests/cli_integration_tests.rs
constant TEST_CASE (line 25) | const TEST_CASE: &str = "I ♥♥♥ 36 and ٣ and y̆y̆ and 💩💩.";
function succeeds (line 34) | fn succeeds() {
function succeeds_with_ignore_case_option (line 43) | fn succeeds_with_ignore_case_option() {
function succeeds_with_leading_hyphen (line 52) | fn succeeds_with_leading_hyphen() {
function succeeds_with_escape_option (line 61) | fn succeeds_with_escape_option() {
function succeeds_with_escape_and_surrogate_option (line 70) | fn succeeds_with_escape_and_surrogate_option() {
function succeeds_with_verbose_mode_option (line 79) | fn succeeds_with_verbose_mode_option() {
function succeeds_with_escape_and_verbose_mode_option (line 93) | fn succeeds_with_escape_and_verbose_mode_option() {
function succeeds_with_escape_and_surrogate_and_verbose_mode_option (line 107) | fn succeeds_with_escape_and_surrogate_and_verbose_mode_option() {
function succeeds_with_file_input (line 121) | fn succeeds_with_file_input() {
function succeeds_with_test_cases_from_stdin (line 133) | fn succeeds_with_test_cases_from_stdin() {
function succeeds_with_file_from_stdin (line 142) | fn succeeds_with_file_from_stdin() {
function fails_with_surrogate_but_without_escape_option (line 154) | fn fails_with_surrogate_but_without_escape_option() {
function fails_without_arguments (line 163) | fn fails_without_arguments() {
function fails_when_file_name_is_not_provided (line 171) | fn fails_when_file_name_is_not_provided() {
function fails_when_file_does_not_exist (line 180) | fn fails_when_file_does_not_exist() {
function fails_with_first_file_input_and_then_direct_input (line 192) | fn fails_with_first_file_input_and_then_direct_input() {
function succeeds (line 205) | fn succeeds() {
function succeeds_with_ignore_case_option (line 214) | fn succeeds_with_ignore_case_option() {
function succeeds_with_escape_option (line 223) | fn succeeds_with_escape_option() {
function succeeds_with_escape_and_surrogate_option (line 232) | fn succeeds_with_escape_and_surrogate_option() {
function succeeds_with_verbose_mode_option (line 241) | fn succeeds_with_verbose_mode_option() {
function succeeds_with_escape_and_verbose_mode_option (line 259) | fn succeeds_with_escape_and_verbose_mode_option() {
function succeeds_with_escape_and_surrogate_and_verbose_mode_option (line 277) | fn succeeds_with_escape_and_surrogate_and_verbose_mode_option() {
function succeeds_with_increased_minimum_repetitions (line 305) | fn succeeds_with_increased_minimum_repetitions() {
function succeeds_with_increased_minimum_substring_length (line 314) | fn succeeds_with_increased_minimum_substring_length() {
function fails_with_minimum_repetitions_equal_to_zero (line 323) | fn fails_with_minimum_repetitions_equal_to_zero() {
function fails_with_minimum_repetitions_equal_to_invalid_value (line 332) | fn fails_with_minimum_repetitions_equal_to_invalid_value() {
function fails_with_minimum_substring_length_equal_to_zero (line 341) | fn fails_with_minimum_substring_length_equal_to_zero() {
function fails_with_minimum_substring_length_equal_to_invalid_value (line 350) | fn fails_with_minimum_substring_length_equal_to_invalid_value() {
function succeeds (line 367) | fn succeeds() {
function succeeds_with_escape_option (line 376) | fn succeeds_with_escape_option() {
function succeeds_with_escape_and_surrogate_option (line 385) | fn succeeds_with_escape_and_surrogate_option() {
function succeeds_with_verbose_mode_option (line 394) | fn succeeds_with_verbose_mode_option() {
function succeeds_with_escape_and_verbose_mode_option (line 408) | fn succeeds_with_escape_and_verbose_mode_option() {
function succeeds_with_escape_and_surrogate_and_verbose_mode_option (line 422) | fn succeeds_with_escape_and_surrogate_and_verbose_mode_option() {
function succeeds_with_capturing_groups_option (line 442) | fn succeeds_with_capturing_groups_option() {
function succeeds_with_syntax_highlighting (line 451) | fn succeeds_with_syntax_highlighting() {
function succeeds (line 464) | fn succeeds() {
function succeeds_with_escape_option (line 473) | fn succeeds_with_escape_option() {
function succeeds_with_escape_and_surrogate_option (line 482) | fn succeeds_with_escape_and_surrogate_option() {
function succeeds_with_verbose_mode_option (line 497) | fn succeeds_with_verbose_mode_option() {
function succeeds_with_escape_and_verbose_mode_option (line 518) | fn succeeds_with_escape_and_verbose_mode_option() {
function succeeds_with_escape_and_surrogate_and_verbose_mode_option (line 545) | fn succeeds_with_escape_and_surrogate_and_verbose_mode_option() {
function succeeds_with_increased_minimum_repetitions (line 577) | fn succeeds_with_increased_minimum_repetitions() {
function succeeds_with_increased_minimum_substring_length (line 592) | fn succeeds_with_increased_minimum_substring_length() {
function succeeds (line 615) | fn succeeds() {
function succeeds_with_escape_option (line 624) | fn succeeds_with_escape_option() {
function succeeds_with_escape_and_surrogate_option (line 633) | fn succeeds_with_escape_and_surrogate_option() {
function succeeds_with_verbose_mode_option (line 642) | fn succeeds_with_verbose_mode_option() {
function succeeds_with_escape_and_verbose_mode_option (line 656) | fn succeeds_with_escape_and_verbose_mode_option() {
function succeeds_with_escape_and_surrogate_and_verbose_mode_option (line 670) | fn succeeds_with_escape_and_surrogate_and_verbose_mode_option() {
function succeeds (line 694) | fn succeeds() {
function succeeds_with_escape_option (line 703) | fn succeeds_with_escape_option() {
function succeeds_with_escape_and_surrogate_option (line 712) | fn succeeds_with_escape_and_surrogate_option() {
function succeeds_with_verbose_mode_option (line 727) | fn succeeds_with_verbose_mode_option() {
function succeeds_with_escape_and_verbose_mode_option (line 745) | fn succeeds_with_escape_and_verbose_mode_option() {
function succeeds_with_escape_and_surrogate_and_verbose_mode_option (line 769) | fn succeeds_with_escape_and_surrogate_and_verbose_mode_option() {
function succeeds (line 806) | fn succeeds() {
function succeeds_with_escape_option (line 815) | fn succeeds_with_escape_option() {
function succeeds_with_escape_and_surrogate_option (line 824) | fn succeeds_with_escape_and_surrogate_option() {
function succeeds_with_verbose_mode_option (line 833) | fn succeeds_with_verbose_mode_option() {
function succeeds_with_escape_and_verbose_mode_option (line 847) | fn succeeds_with_escape_and_verbose_mode_option() {
function succeeds_with_escape_and_surrogate_and_verbose_mode_option (line 861) | fn succeeds_with_escape_and_surrogate_and_verbose_mode_option() {
function succeeds (line 885) | fn succeeds() {
function succeeds_with_escape_option (line 894) | fn succeeds_with_escape_option() {
function succeeds_with_escape_and_surrogate_option (line 903) | fn succeeds_with_escape_and_surrogate_option() {
function succeeds_with_verbose_mode_option (line 918) | fn succeeds_with_verbose_mode_option() {
function succeeds_with_escape_and_verbose_mode_option (line 939) | fn succeeds_with_escape_and_verbose_mode_option() {
function succeeds_with_escape_and_surrogate_and_verbose_mode_option (line 966) | fn succeeds_with_escape_and_surrogate_and_verbose_mode_option() {
function succeeds (line 1005) | fn succeeds() {
function succeeds_with_escape_option (line 1014) | fn succeeds_with_escape_option() {
function succeeds_with_escape_and_surrogate_option (line 1023) | fn succeeds_with_escape_and_surrogate_option() {
function succeeds_with_verbose_mode_option (line 1038) | fn succeeds_with_verbose_mode_option() {
function succeeds_with_escape_and_verbose_mode_option (line 1052) | fn succeeds_with_escape_and_verbose_mode_option() {
function succeeds_with_escape_and_surrogate_and_verbose_mode_option (line 1066) | fn succeeds_with_escape_and_surrogate_and_verbose_mode_option() {
function succeeds (line 1091) | fn succeeds() {
function succeeds_with_escape_option (line 1100) | fn succeeds_with_escape_option() {
function succeeds_with_escape_and_surrogate_option (line 1115) | fn succeeds_with_escape_and_surrogate_option() {
function succeeds_with_verbose_mode_option (line 1131) | fn succeeds_with_verbose_mode_option() {
function succeeds_with_escape_and_verbose_mode_option (line 1158) | fn succeeds_with_escape_and_verbose_mode_option() {
function succeeds_with_escape_and_surrogate_and_verbose_mode_option (line 1186) | fn succeeds_with_escape_and_surrogate_and_verbose_mode_option() {
function succeeds (line 1227) | fn succeeds() {
function succeeds_with_escape_option (line 1236) | fn succeeds_with_escape_option() {
function succeeds_with_escape_and_surrogate_option (line 1245) | fn succeeds_with_escape_and_surrogate_option() {
function succeeds_with_verbose_mode_option (line 1260) | fn succeeds_with_verbose_mode_option() {
function succeeds_with_escape_and_verbose_mode_option (line 1274) | fn succeeds_with_escape_and_verbose_mode_option() {
function succeeds_with_escape_and_surrogate_and_verbose_mode_option (line 1288) | fn succeeds_with_escape_and_surrogate_and_verbose_mode_option() {
function succeeds (line 1313) | fn succeeds() {
function succeeds_with_escape_option (line 1322) | fn succeeds_with_escape_option() {
function succeeds_with_escape_and_surrogate_option (line 1337) | fn succeeds_with_escape_and_surrogate_option() {
function succeeds_with_verbose_mode_option (line 1353) | fn succeeds_with_verbose_mode_option() {
function succeeds_with_escape_and_verbose_mode_option (line 1381) | fn succeeds_with_escape_and_verbose_mode_option() {
function succeeds_with_escape_and_surrogate_and_verbose_mode_option (line 1410) | fn succeeds_with_escape_and_surrogate_and_verbose_mode_option() {
function succeeds (line 1451) | fn succeeds() {
function succeeds_with_escape_option (line 1460) | fn succeeds_with_escape_option() {
function succeeds_with_escape_and_surrogate_option (line 1469) | fn succeeds_with_escape_and_surrogate_option() {
function succeeds_with_verbose_mode_option (line 1484) | fn succeeds_with_verbose_mode_option() {
function succeeds_with_escape_and_verbose_mode_option (line 1498) | fn succeeds_with_escape_and_verbose_mode_option() {
function succeeds_with_escape_and_surrogate_and_verbose_mode_option (line 1512) | fn succeeds_with_escape_and_surrogate_and_verbose_mode_option() {
function succeeds (line 1537) | fn succeeds() {
function succeeds_with_escape_option (line 1546) | fn succeeds_with_escape_option() {
function succeeds_with_escape_and_surrogate_option (line 1561) | fn succeeds_with_escape_and_surrogate_option() {
function succeeds_with_verbose_mode_option (line 1577) | fn succeeds_with_verbose_mode_option() {
function succeeds_with_escape_and_verbose_mode_option (line 1604) | fn succeeds_with_escape_and_verbose_mode_option() {
function succeeds_with_escape_and_surrogate_and_verbose_mode_option (line 1632) | fn succeeds_with_escape_and_surrogate_and_verbose_mode_option() {
function succeeds (line 1672) | fn succeeds() {
function succeeds_with_escape_option (line 1681) | fn succeeds_with_escape_option() {
function succeeds_with_escape_and_surrogate_option (line 1690) | fn succeeds_with_escape_and_surrogate_option() {
function succeeds_with_verbose_mode_option (line 1706) | fn succeeds_with_verbose_mode_option() {
function succeeds_with_escape_and_verbose_mode_option (line 1720) | fn succeeds_with_escape_and_verbose_mode_option() {
function succeeds_with_escape_and_surrogate_and_verbose_mode_option (line 1741) | fn succeeds_with_escape_and_surrogate_and_verbose_mode_option() {
function succeeds (line 1767) | fn succeeds() {
function succeeds_with_escape_option (line 1782) | fn succeeds_with_escape_option() {
function succeeds_with_escape_and_surrogate_option (line 1798) | fn succeeds_with_escape_and_surrogate_option() {
function succeeds_with_verbose_mode_option (line 1815) | fn succeeds_with_verbose_mode_option() {
function succeeds_with_escape_and_verbose_mode_option (line 1844) | fn succeeds_with_escape_and_verbose_mode_option() {
function succeeds_with_escape_and_surrogate_and_verbose_mode_option (line 1874) | fn succeeds_with_escape_and_surrogate_and_verbose_mode_option() {
function succeeds (line 1916) | fn succeeds() {
function succeeds_with_escape_option (line 1925) | fn succeeds_with_escape_option() {
function succeeds_with_escape_and_surrogate_option (line 1934) | fn succeeds_with_escape_and_surrogate_option() {
function succeeds_with_verbose_mode_option (line 1943) | fn succeeds_with_verbose_mode_option() {
function succeeds_with_escape_and_verbose_mode_option (line 1957) | fn succeeds_with_escape_and_verbose_mode_option() {
function succeeds_with_escape_and_surrogate_and_verbose_mode_option (line 1971) | fn succeeds_with_escape_and_surrogate_and_verbose_mode_option() {
function succeeds (line 1995) | fn succeeds() {
function succeeds_with_escape_option (line 2004) | fn succeeds_with_escape_option() {
function succeeds_with_escape_and_surrogate_option (line 2013) | fn succeeds_with_escape_and_surrogate_option() {
function succeeds_with_verbose_mode_option (line 2028) | fn succeeds_with_verbose_mode_option() {
function succeeds_with_escape_and_verbose_mode_option (line 2042) | fn succeeds_with_escape_and_verbose_mode_option() {
function succeeds_with_escape_and_surrogate_and_verbose_mode_option (line 2062) | fn succeeds_with_escape_and_surrogate_and_verbose_mode_option() {
function succeeds (line 2091) | fn succeeds() {
function succeeds_with_escape_option (line 2100) | fn succeeds_with_escape_option() {
function succeeds_with_escape_and_surrogate_option (line 2109) | fn succeeds_with_escape_and_surrogate_option() {
function succeeds_with_verbose_mode_option (line 2118) | fn succeeds_with_verbose_mode_option() {
function succeeds_with_escape_and_verbose_mode_option (line 2132) | fn succeeds_with_escape_and_verbose_mode_option() {
function succeeds_with_escape_and_surrogate_and_verbose_mode_option (line 2146) | fn succeeds_with_escape_and_surrogate_and_verbose_mode_option() {
function succeeds (line 2170) | fn succeeds() {
function succeeds_with_escape_option (line 2179) | fn succeeds_with_escape_option() {
function succeeds_with_escape_and_surrogate_option (line 2188) | fn succeeds_with_escape_and_surrogate_option() {
function succeeds_with_verbose_mode_option (line 2203) | fn succeeds_with_verbose_mode_option() {
function succeeds_with_escape_and_verbose_mode_option (line 2227) | fn succeeds_with_escape_and_verbose_mode_option() {
function succeeds_with_escape_and_surrogate_and_verbose_mode_option (line 2257) | fn succeeds_with_escape_and_surrogate_and_verbose_mode_option() {
function succeeds (line 2296) | fn succeeds() {
function succeeds_with_escape_option (line 2305) | fn succeeds_with_escape_option() {
function succeeds_with_escape_and_surrogate_option (line 2314) | fn succeeds_with_escape_and_surrogate_option() {
function succeeds_with_verbose_mode_option (line 2323) | fn succeeds_with_verbose_mode_option() {
function succeeds_with_escape_and_verbose_mode_option (line 2337) | fn succeeds_with_escape_and_verbose_mode_option() {
function succeeds_with_escape_and_surrogate_and_verbose_mode_option (line 2351) | fn succeeds_with_escape_and_surrogate_and_verbose_mode_option() {
function succeeds (line 2375) | fn succeeds() {
function succeeds_with_escape_option (line 2384) | fn succeeds_with_escape_option() {
function succeeds_with_escape_and_surrogate_option (line 2393) | fn succeeds_with_escape_and_surrogate_option() {
function succeeds_with_verbose_mode_option (line 2408) | fn succeeds_with_verbose_mode_option() {
function succeeds_with_escape_and_verbose_mode_option (line 2426) | fn succeeds_with_escape_and_verbose_mode_option() {
function succeeds_with_escape_and_surrogate_and_verbose_mode_option (line 2450) | fn succeeds_with_escape_and_surrogate_and_verbose_mode_option() {
function succeeds (line 2483) | fn succeeds() {
function succeeds_with_escape_option (line 2492) | fn succeeds_with_escape_option() {
function succeeds_with_escape_and_surrogate_option (line 2501) | fn succeeds_with_escape_and_surrogate_option() {
function succeeds_with_verbose_mode_option (line 2516) | fn succeeds_with_verbose_mode_option() {
function succeeds_with_escape_and_verbose_mode_option (line 2530) | fn succeeds_with_escape_and_verbose_mode_option() {
function succeeds_with_escape_and_surrogate_and_verbose_mode_option (line 2550) | fn succeeds_with_escape_and_surrogate_and_verbose_mode_option() {
function succeeds (line 2575) | fn succeeds() {
function succeeds_with_escape_option (line 2584) | fn succeeds_with_escape_option() {
function succeeds_with_escape_and_surrogate_option (line 2599) | fn succeeds_with_escape_and_surrogate_option() {
function succeeds_with_verbose_mode_option (line 2615) | fn succeeds_with_verbose_mode_option() {
function succeeds_with_escape_and_verbose_mode_option (line 2635) | fn succeeds_with_escape_and_verbose_mode_option() {
function succeeds_with_escape_and_surrogate_and_verbose_mode_option (line 2656) | fn succeeds_with_escape_and_surrogate_and_verbose_mode_option() {
function succeeds (line 2686) | fn succeeds() {
function succeeds_with_escape_option (line 2695) | fn succeeds_with_escape_option() {
function succeeds_with_escape_and_surrogate_option (line 2704) | fn succeeds_with_escape_and_surrogate_option() {
function succeeds_with_verbose_mode_option (line 2719) | fn succeeds_with_verbose_mode_option() {
function succeeds_with_escape_and_verbose_mode_option (line 2733) | fn succeeds_with_escape_and_verbose_mode_option() {
function succeeds_with_escape_and_surrogate_and_verbose_mode_option (line 2753) | fn succeeds_with_escape_and_surrogate_and_verbose_mode_option() {
function succeeds (line 2778) | fn succeeds() {
function succeeds_with_escape_option (line 2787) | fn succeeds_with_escape_option() {
function succeeds_with_escape_and_surrogate_option (line 2802) | fn succeeds_with_escape_and_surrogate_option() {
function succeeds_with_verbose_mode_option (line 2818) | fn succeeds_with_verbose_mode_option() {
function succeeds_with_escape_and_verbose_mode_option (line 2838) | fn succeeds_with_escape_and_verbose_mode_option() {
function succeeds_with_escape_and_surrogate_and_verbose_mode_option (line 2859) | fn succeeds_with_escape_and_surrogate_and_verbose_mode_option() {
function succeeds (line 2889) | fn succeeds() {
function succeeds_with_escape_option (line 2898) | fn succeeds_with_escape_option() {
function succeeds_with_escape_and_surrogate_option (line 2907) | fn succeeds_with_escape_and_surrogate_option() {
function succeeds_with_verbose_mode_option (line 2922) | fn succeeds_with_verbose_mode_option() {
function succeeds_with_escape_and_verbose_mode_option (line 2936) | fn succeeds_with_escape_and_verbose_mode_option() {
function succeeds_with_escape_and_surrogate_and_verbose_mode_option (line 2956) | fn succeeds_with_escape_and_surrogate_and_verbose_mode_option() {
function succeeds (line 2981) | fn succeeds() {
function succeeds_with_escape_option (line 2990) | fn succeeds_with_escape_option() {
function succeeds_with_escape_and_surrogate_option (line 3005) | fn succeeds_with_escape_and_surrogate_option() {
function succeeds_with_verbose_mode_option (line 3021) | fn succeeds_with_verbose_mode_option() {
function succeeds_with_escape_and_verbose_mode_option (line 3045) | fn succeeds_with_escape_and_verbose_mode_option() {
function succeeds_with_escape_and_surrogate_and_verbose_mode_option (line 3070) | fn succeeds_with_escape_and_surrogate_and_verbose_mode_option() {
function succeeds (line 3104) | fn succeeds() {
function succeeds_with_escape_option (line 3113) | fn succeeds_with_escape_option() {
function succeeds_with_escape_and_surrogate_option (line 3128) | fn succeeds_with_escape_and_surrogate_option() {
function succeeds_with_verbose_mode_option (line 3144) | fn succeeds_with_verbose_mode_option() {
function succeeds_with_escape_and_verbose_mode_option (line 3164) | fn succeeds_with_escape_and_verbose_mode_option() {
function succeeds_with_escape_and_surrogate_and_verbose_mode_option (line 3185) | fn succeeds_with_escape_and_surrogate_and_verbose_mode_option() {
function succeeds (line 3211) | fn succeeds() {
function succeeds_with_escape_option (line 3226) | fn succeeds_with_escape_option() {
function succeeds_with_escape_and_surrogate_option (line 3242) | fn succeeds_with_escape_and_surrogate_option() {
function succeeds_with_verbose_mode_option (line 3259) | fn succeeds_with_verbose_mode_option() {
function succeeds_with_escape_and_verbose_mode_option (line 3280) | fn succeeds_with_escape_and_verbose_mode_option() {
function succeeds_with_escape_and_surrogate_and_verbose_mode_option (line 3302) | fn succeeds_with_escape_and_surrogate_and_verbose_mode_option() {
function succeeds (line 3333) | fn succeeds() {
function succeeds_with_escape_option (line 3342) | fn succeeds_with_escape_option() {
function succeeds_with_escape_and_surrogate_option (line 3351) | fn succeeds_with_escape_and_surrogate_option() {
function succeeds_with_verbose_mode_option (line 3366) | fn succeeds_with_verbose_mode_option() {
function succeeds_with_escape_and_verbose_mode_option (line 3380) | fn succeeds_with_escape_and_verbose_mode_option() {
function succeeds_with_escape_and_surrogate_and_verbose_mode_option (line 3400) | fn succeeds_with_escape_and_surrogate_and_verbose_mode_option() {
function succeeds (line 3425) | fn succeeds() {
function succeeds_with_escape_option (line 3434) | fn succeeds_with_escape_option() {
function succeeds_with_escape_and_surrogate_option (line 3449) | fn succeeds_with_escape_and_surrogate_option() {
function succeeds_with_verbose_mode_option (line 3465) | fn succeeds_with_verbose_mode_option() {
function succeeds_with_escape_and_verbose_mode_option (line 3485) | fn succeeds_with_escape_and_verbose_mode_option() {
function succeeds_with_escape_and_surrogate_and_verbose_mode_option (line 3506) | fn succeeds_with_escape_and_surrogate_and_verbose_mode_option() {
function succeeds (line 3536) | fn succeeds() {
function succeeds_with_escape_option (line 3545) | fn succeeds_with_escape_option() {
function succeeds_with_escape_and_surrogate_option (line 3554) | fn succeeds_with_escape_and_surrogate_option() {
function succeeds_with_verbose_mode_option (line 3569) | fn succeeds_with_verbose_mode_option() {
function succeeds_with_escape_and_verbose_mode_option (line 3583) | fn succeeds_with_escape_and_verbose_mode_option() {
function succeeds_with_escape_and_surrogate_and_verbose_mode_option (line 3603) | fn succeeds_with_escape_and_surrogate_and_verbose_mode_option() {
function succeeds (line 3628) | fn succeeds() {
function succeeds_with_escape_option (line 3637) | fn succeeds_with_escape_option() {
function succeeds_with_escape_and_surrogate_option (line 3652) | fn succeeds_with_escape_and_surrogate_option() {
function succeeds_with_verbose_mode_option (line 3668) | fn succeeds_with_verbose_mode_option() {
function succeeds_with_escape_and_verbose_mode_option (line 3698) | fn succeeds_with_escape_and_verbose_mode_option() {
function succeeds_with_escape_and_surrogate_and_verbose_mode_option (line 3729) | fn succeeds_with_escape_and_surrogate_and_verbose_mode_option() {
function succeeds (line 3769) | fn succeeds() {
function succeeds_with_escape_option (line 3778) | fn succeeds_with_escape_option() {
function succeeds_with_escape_and_surrogate_option (line 3787) | fn succeeds_with_escape_and_surrogate_option() {
function succeeds_with_verbose_mode_option (line 3802) | fn succeeds_with_verbose_mode_option() {
function succeeds_with_escape_and_verbose_mode_option (line 3816) | fn succeeds_with_escape_and_verbose_mode_option() {
function succeeds_with_escape_and_surrogate_and_verbose_mode_option (line 3830) | fn succeeds_with_escape_and_surrogate_and_verbose_mode_option() {
function succeeds (line 3855) | fn succeeds() {
function succeeds_with_escape_option (line 3864) | fn succeeds_with_escape_option() {
function succeeds_with_escape_and_surrogate_option (line 3879) | fn succeeds_with_escape_and_surrogate_option() {
function succeeds_with_verbose_mode_option (line 3895) | fn succeeds_with_verbose_mode_option() {
function succeeds_with_escape_and_verbose_mode_option (line 3919) | fn succeeds_with_escape_and_verbose_mode_option() {
function succeeds_with_escape_and_surrogate_and_verbose_mode_option (line 3944) | fn succeeds_with_escape_and_surrogate_and_verbose_mode_option() {
function succeeds_with_no_start_anchor_option (line 3978) | fn succeeds_with_no_start_anchor_option() {
function succeeds_with_no_end_anchor_option (line 3987) | fn succeeds_with_no_end_anchor_option() {
function succeeds_with_no_anchors_option (line 3996) | fn succeeds_with_no_anchors_option() {
function succeeds_with_verbose_mode_and_no_start_anchor_option (line 4009) | fn succeeds_with_verbose_mode_and_no_start_anchor_option() {
function succeeds_with_verbose_mode_and_no_end_anchor_option (line 4022) | fn succeeds_with_verbose_mode_and_no_end_anchor_option() {
function succeeds_with_verbose_mode_and_no_anchors_option (line 4035) | fn succeeds_with_verbose_mode_and_no_anchors_option() {
function init_command (line 4048) | fn init_command() -> Command {
FILE: tests/lib_integration_tests.rs
function succeeds (line 107) | fn succeeds(test_cases: Vec<&str>, expected_output: &str) {
function succeeds_with_ignore_case_option (line 119) | fn succeeds_with_ignore_case_option(test_cases: Vec<&str>, expected_outp...
function succeeds_with_escape_option (line 135) | fn succeeds_with_escape_option(test_cases: Vec<&str>, expected_output: &...
function succeeds_with_escape_and_surrogate_option (line 151) | fn succeeds_with_escape_and_surrogate_option(test_cases: Vec<&str>, expe...
function succeeds_with_capturing_groups_option (line 164) | fn succeeds_with_capturing_groups_option(test_cases: Vec<&str>, expected...
function succeeds_with_verbose_mode_option (line 315) | fn succeeds_with_verbose_mode_option(test_cases: Vec<&str>, expected_out...
function succeeds_with_ignore_case_and_verbose_mode_option (line 351) | fn succeeds_with_ignore_case_and_verbose_mode_option(
function succeeds_with_file_input (line 364) | fn succeeds_with_file_input() {
function succeeds (line 439) | fn succeeds(test_cases: Vec<&str>, expected_output: &str) {
function succeeds_with_ignore_case_option (line 452) | fn succeeds_with_ignore_case_option(test_cases: Vec<&str>, expected_outp...
function succeeds_with_escape_option (line 469) | fn succeeds_with_escape_option(test_cases: Vec<&str>, expected_output: &...
function succeeds_with_escape_and_surrogate_option (line 486) | fn succeeds_with_escape_and_surrogate_option(test_cases: Vec<&str>, expe...
function succeeds_with_verbose_mode_option (line 621) | fn succeeds_with_verbose_mode_option(test_cases: Vec<&str>, expected_out...
function succeeds_with_increased_minimum_repetitions (line 655) | fn succeeds_with_increased_minimum_repetitions(
function succeeds_with_increased_minimum_substring_length (line 675) | fn succeeds_with_increased_minimum_substring_length(
function succeeds_with_increased_minimum_repetitions_and_substring_length (line 695) | fn succeeds_with_increased_minimum_repetitions_and_substring_length(
function succeeds (line 736) | fn succeeds(test_cases: Vec<&str>, expected_output: &str) {
function succeeds_with_escape_option (line 750) | fn succeeds_with_escape_option(test_cases: Vec<&str>, expected_output: &...
function succeeds_with_escape_and_surrogate_option (line 765) | fn succeeds_with_escape_and_surrogate_option(test_cases: Vec<&str>, expe...
function succeeds (line 783) | fn succeeds(test_cases: Vec<&str>, expected_output: &str) {
function succeeds_with_escape_option (line 798) | fn succeeds_with_escape_option(test_cases: Vec<&str>, expected_output: &...
function succeeds_with_escape_and_surrogate_option (line 814) | fn succeeds_with_escape_and_surrogate_option(test_cases: Vec<&str>, expe...
function succeeds_with_increased_minimum_repetitions (line 832) | fn succeeds_with_increased_minimum_repetitions(
function succeeds (line 867) | fn succeeds(test_cases: Vec<&str>, expected_output: &str) {
function succeeds_with_escape_option (line 881) | fn succeeds_with_escape_option(test_cases: Vec<&str>, expected_output: &...
function succeeds_with_escape_and_surrogate_option (line 896) | fn succeeds_with_escape_and_surrogate_option(test_cases: Vec<&str>, expe...
function succeeds (line 914) | fn succeeds(test_cases: Vec<&str>, expected_output: &str) {
function succeeds_with_escape_option (line 929) | fn succeeds_with_escape_option(test_cases: Vec<&str>, expected_output: &...
function succeeds_with_escape_and_surrogate_option (line 945) | fn succeeds_with_escape_and_surrogate_option(test_cases: Vec<&str>, expe...
function succeeds_with_increased_minimum_repetitions (line 966) | fn succeeds_with_increased_minimum_repetitions(
function succeeds (line 1005) | fn succeeds(test_cases: Vec<&str>, expected_output: &str) {
function succeeds_with_escape_option (line 1019) | fn succeeds_with_escape_option(test_cases: Vec<&str>, expected_output: &...
function succeeds_with_escape_and_surrogate_option (line 1034) | fn succeeds_with_escape_and_surrogate_option(test_cases: Vec<&str>, expe...
function succeeds (line 1052) | fn succeeds(test_cases: Vec<&str>, expected_output: &str) {
function succeeds_with_escape_option (line 1067) | fn succeeds_with_escape_option(test_cases: Vec<&str>, expected_output: &...
function succeeds_with_escape_and_surrogate_option (line 1083) | fn succeeds_with_escape_and_surrogate_option(test_cases: Vec<&str>, expe...
function succeeds_with_increased_minimum_repetitions (line 1104) | fn succeeds_with_increased_minimum_repetitions(
function succeeds (line 1131) | fn succeeds(test_cases: Vec<&str>, expected_output: &str) {
function succeeds_with_escape_option (line 1146) | fn succeeds_with_escape_option(test_cases: Vec<&str>, expected_output: &...
function succeeds_with_escape_and_surrogate_option (line 1162) | fn succeeds_with_escape_and_surrogate_option(test_cases: Vec<&str>, expe...
function succeeds (line 1181) | fn succeeds(test_cases: Vec<&str>, expected_output: &str) {
function succeeds_with_escape_option (line 1197) | fn succeeds_with_escape_option(test_cases: Vec<&str>, expected_output: &...
function succeeds_with_escape_and_surrogate_option (line 1214) | fn succeeds_with_escape_and_surrogate_option(test_cases: Vec<&str>, expe...
function succeeds_with_increased_minimum_repetitions (line 1236) | fn succeeds_with_increased_minimum_repetitions(
function succeeds_with_increased_minimum_substring_length (line 1254) | fn succeeds_with_increased_minimum_substring_length(
function succeeds_with_increased_minimum_repetitions_and_substring_length (line 1272) | fn succeeds_with_increased_minimum_repetitions_and_substring_length(
function succeeds (line 1299) | fn succeeds(test_cases: Vec<&str>, expected_output: &str) {
function succeeds_with_escape_option (line 1314) | fn succeeds_with_escape_option(test_cases: Vec<&str>, expected_output: &...
function succeeds_with_escape_and_surrogate_option (line 1330) | fn succeeds_with_escape_and_surrogate_option(test_cases: Vec<&str>, expe...
function succeeds (line 1349) | fn succeeds(test_cases: Vec<&str>, expected_output: &str) {
function succeeds_with_escape_option (line 1365) | fn succeeds_with_escape_option(test_cases: Vec<&str>, expected_output: &...
function succeeds_with_escape_and_surrogate_option (line 1382) | fn succeeds_with_escape_and_surrogate_option(test_cases: Vec<&str>, expe...
function succeeds (line 1406) | fn succeeds(test_cases: Vec<&str>, expected_output: &str) {
function succeeds_with_escape_option (line 1421) | fn succeeds_with_escape_option(test_cases: Vec<&str>, expected_output: &...
function succeeds_with_escape_and_surrogate_option (line 1437) | fn succeeds_with_escape_and_surrogate_option(test_cases: Vec<&str>, expe...
function succeeds (line 1456) | fn succeeds(test_cases: Vec<&str>, expected_output: &str) {
function succeeds_with_escape_option (line 1472) | fn succeeds_with_escape_option(test_cases: Vec<&str>, expected_output: &...
function succeeds_with_escape_and_surrogate_option (line 1489) | fn succeeds_with_escape_and_surrogate_option(test_cases: Vec<&str>, expe...
function succeeds (line 1513) | fn succeeds(test_cases: Vec<&str>, expected_output: &str) {
function succeeds_with_escape_option (line 1529) | fn succeeds_with_escape_option(test_cases: Vec<&str>, expected_output: &...
function succeeds_with_escape_and_surrogate_option (line 1546) | fn succeeds_with_escape_and_surrogate_option(test_cases: Vec<&str>, expe...
function succeeds (line 1566) | fn succeeds(test_cases: Vec<&str>, expected_output: &str) {
function succeeds_with_escape_option (line 1583) | fn succeeds_with_escape_option(test_cases: Vec<&str>, expected_output: &...
function succeeds_with_escape_and_surrogate_option (line 1601) | fn succeeds_with_escape_and_surrogate_option(test_cases: Vec<&str>, expe...
function succeeds (line 1626) | fn succeeds(test_cases: Vec<&str>, expected_output: &str) {
function succeeds_with_escape_option (line 1640) | fn succeeds_with_escape_option(test_cases: Vec<&str>, expected_output: &...
function succeeds (line 1656) | fn succeeds(test_cases: Vec<&str>, expected_output: &str) {
function succeeds_with_escape_option (line 1668) | fn succeeds_with_escape_option(test_cases: Vec<&str>, expected_output: &...
function succeeds (line 1692) | fn succeeds(test_cases: Vec<&str>, expected_output: &str) {
function succeeds (line 1710) | fn succeeds(test_cases: Vec<&str>, expected_output: &str) {
function succeeds (line 1733) | fn succeeds(test_cases: Vec<&str>, expected_output: &str) {
function succeeds_with_escape_option (line 1747) | fn succeeds_with_escape_option(test_cases: Vec<&str>, expected_output: &...
function succeeds (line 1766) | fn succeeds(test_cases: Vec<&str>, expected_output: &str) {
function succeeds_with_escape_option (line 1781) | fn succeeds_with_escape_option(test_cases: Vec<&str>, expected_output: &...
function succeeds (line 1805) | fn succeeds(test_cases: Vec<&str>, expected_output: &str) {
function succeeds (line 1821) | fn succeeds(test_cases: Vec<&str>, expected_output: &str) {
function succeeds (line 1845) | fn succeeds(test_cases: Vec<&str>, expected_output: &str) {
function succeeds (line 1861) | fn succeeds(test_cases: Vec<&str>, expected_output: &str) {
function succeeds (line 1885) | fn succeeds(test_cases: Vec<&str>, expected_output: &str) {
function succeeds (line 1904) | fn succeeds(test_cases: Vec<&str>, expected_output: &str) {
function succeeds (line 1928) | fn succeeds(test_cases: Vec<&str>, expected_output: &str) {
function succeeds (line 1945) | fn succeeds(test_cases: Vec<&str>, expected_output: &str) {
function succeeds (line 1970) | fn succeeds(test_cases: Vec<&str>, expected_output: &str) {
function succeeds (line 1986) | fn succeeds(test_cases: Vec<&str>, expected_output: &str) {
function succeeds (line 2010) | fn succeeds(test_cases: Vec<&str>, expected_output: &str) {
function succeeds (line 2029) | fn succeeds(test_cases: Vec<&str>, expected_output: &str) {
function succeeds (line 2053) | fn succeeds(test_cases: Vec<&str>, expected_output: &str) {
function succeeds (line 2072) | fn succeeds(test_cases: Vec<&str>, expected_output: &str) {
function succeeds_without_start_anchor_option (line 2094) | fn succeeds_without_start_anchor_option(test_cases: Vec<&str>, expected_...
function succeeds_without_end_anchor_option (line 2106) | fn succeeds_without_end_anchor_option(test_cases: Vec<&str>, expected_ou...
function succeeds_without_anchors (line 2126) | fn succeeds_without_anchors(test_cases: Vec<&str>, expected_output: &str) {
function succeeds_with_verbose_mode_and_without_start_anchor_option (line 2144) | fn succeeds_with_verbose_mode_and_without_start_anchor_option(
function succeeds_with_verbose_mode_and_without_end_anchor_option (line 2164) | fn succeeds_with_verbose_mode_and_without_end_anchor_option(
function succeeds_with_verbose_mode_and_without_anchors_option (line 2192) | fn succeeds_with_verbose_mode_and_without_anchors_option(
function assert_that_regexp_is_correct (line 2206) | fn assert_that_regexp_is_correct(regexp: String, expected_output: &str, ...
function assert_that_regexp_matches_test_cases (line 2214) | fn assert_that_regexp_matches_test_cases(expected_output: &str, test_cas...
FILE: tests/property_tests.rs
function compile_regexp (line 633) | fn compile_regexp(regexp: &str) -> Result<Regex, Error> {
FILE: tests/python/test_grex.py
function test_default_settings (line 29) | def test_default_settings(test_cases, expected_pattern):
function test_escaping (line 42) | def test_escaping(test_cases, expected_pattern):
function test_escaping_with_surrogate_pairs (line 57) | def test_escaping_with_surrogate_pairs(test_cases, expected_pattern):
function test_capturing_groups (line 71) | def test_capturing_groups(test_cases, expected_pattern):
function test_without_anchors (line 86) | def test_without_anchors(test_cases, expected_pattern):
function test_case_insensitive_matching (line 101) | def test_case_insensitive_matching(test_cases, expected_pattern):
function test_verbose_mode (line 128) | def test_verbose_mode(test_cases, expected_pattern):
function test_case_insensitive_matching_and_verbose_mode (line 151) | def test_case_insensitive_matching_and_verbose_mode(test_cases, expected...
function test_conversion_of_repetitions (line 167) | def test_conversion_of_repetitions(test_cases, expected_pattern):
function test_escaping_and_conversion_of_repetitions (line 182) | def test_escaping_and_conversion_of_repetitions(test_cases, expected_pat...
function test_conversion_of_digits (line 198) | def test_conversion_of_digits(test_cases, expected_pattern):
function test_conversion_of_non_digits (line 213) | def test_conversion_of_non_digits(test_cases, expected_pattern):
function test_conversion_of_whitespace (line 228) | def test_conversion_of_whitespace(test_cases, expected_pattern):
function test_conversion_of_non_whitespace (line 243) | def test_conversion_of_non_whitespace(test_cases, expected_pattern):
function test_conversion_of_words (line 258) | def test_conversion_of_words(test_cases, expected_pattern):
function test_conversion_of_non_words (line 273) | def test_conversion_of_non_words(test_cases, expected_pattern):
function test_minimum_repetitions (line 289) | def test_minimum_repetitions(test_cases, expected_pattern):
function test_minimum_substring_length (line 306) | def test_minimum_substring_length(test_cases, expected_pattern):
function test_error_for_empty_test_cases (line 316) | def test_error_for_empty_test_cases():
function test_error_for_invalid_minimum_repetitions (line 325) | def test_error_for_invalid_minimum_repetitions():
function test_error_for_invalid_minimum_substring_length (line 334) | def test_error_for_invalid_minimum_substring_length():
FILE: tests/wasm_browser_tests.rs
function assert_regexpbuilder_succeeds (line 27) | fn assert_regexpbuilder_succeeds() {
function assert_regexpbuilder_fails (line 36) | fn assert_regexpbuilder_fails() {
function test_conversion_of_digits (line 47) | fn test_conversion_of_digits() {
function test_conversion_of_non_digits (line 57) | fn test_conversion_of_non_digits() {
function test_conversion_of_whitespace (line 67) | fn test_conversion_of_whitespace() {
function test_conversion_of_non_whitespace (line 77) | fn test_conversion_of_non_whitespace() {
function test_conversion_of_words (line 87) | fn test_conversion_of_words() {
function test_conversion_of_non_words (line 97) | fn test_conversion_of_non_words() {
function test_conversion_of_repetitions (line 107) | fn test_conversion_of_repetitions() {
function test_case_insensitive_matching (line 117) | fn test_case_insensitive_matching() {
function test_capturing_groups (line 131) | fn test_capturing_groups() {
function test_escaping_of_non_ascii_chars (line 141) | fn test_escaping_of_non_ascii_chars() {
function test_verbose_mode (line 155) | fn test_verbose_mode() {
function test_without_start_anchor (line 184) | fn test_without_start_anchor() {
function test_without_end_anchor (line 194) | fn test_without_end_anchor() {
function test_without_anchors (line 204) | fn test_without_anchors() {
function test_minimum_repetitions (line 214) | fn test_minimum_repetitions() {
function test_minimum_substring_length (line 228) | fn test_minimum_substring_length() {
FILE: tests/wasm_node_tests.rs
function assert_regexpbuilder_succeeds (line 25) | fn assert_regexpbuilder_succeeds() {
function assert_regexpbuilder_fails (line 34) | fn assert_regexpbuilder_fails() {
function test_conversion_of_digits (line 45) | fn test_conversion_of_digits() {
function test_conversion_of_non_digits (line 55) | fn test_conversion_of_non_digits() {
function test_conversion_of_whitespace (line 65) | fn test_conversion_of_whitespace() {
function test_conversion_of_non_whitespace (line 75) | fn test_conversion_of_non_whitespace() {
function test_conversion_of_words (line 85) | fn test_conversion_of_words() {
function test_conversion_of_non_words (line 95) | fn test_conversion_of_non_words() {
function test_conversion_of_repetitions (line 105) | fn test_conversion_of_repetitions() {
function test_case_insensitive_matching (line 115) | fn test_case_insensitive_matching() {
function test_capturing_groups (line 129) | fn test_capturing_groups() {
function test_escaping_of_non_ascii_chars (line 139) | fn test_escaping_of_non_ascii_chars() {
function test_verbose_mode (line 153) | fn test_verbose_mode() {
function test_without_start_anchor (line 182) | fn test_without_start_anchor() {
function test_without_end_anchor (line 192) | fn test_without_end_anchor() {
function test_without_anchors (line 202) | fn test_without_anchors() {
function test_minimum_repetitions (line 212) | fn test_minimum_repetitions() {
function test_minimum_substring_length (line 226) | fn test_minimum_substring_length() {
Condensed preview: 44 files, each showing path, character count, and a content snippet. The full structured content is 560K chars.
[
{
"path": ".editorconfig",
"chars": 876,
"preview": "# Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lic"
},
{
"path": ".github/dependabot.yml",
"chars": 206,
"preview": "version: 2\nupdates:\n - package-ecosystem: \"cargo\"\n directory: \"/\"\n schedule:\n interval: \"daily\"\n\n - package"
},
{
"path": ".github/workflows/python-build.yml",
"chars": 2146,
"preview": "#\n# Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com\n#\n# Licensed under the Apache License, Version 2.0 (the \"L"
},
{
"path": ".github/workflows/release.yml",
"chars": 6883,
"preview": "#\n# Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com\n#\n# Licensed under the Apache License, Version 2.0 (the \"L"
},
{
"path": ".github/workflows/rust-build.yml",
"chars": 4422,
"preview": "#\n# Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com\n#\n# Licensed under the Apache License, Version 2.0 (the \"L"
},
{
"path": ".gitignore",
"chars": 1008,
"preview": "# Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lic"
},
{
"path": "Cargo.toml",
"chars": 2046,
"preview": "#\n# Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com\n#\n# Licensed under the Apache License, Version 2.0 (the \"L"
},
{
"path": "LICENSE",
"chars": 11357,
"preview": " Apache License\n Version 2.0, January 2004\n "
},
{
"path": "README.md",
"chars": 22251,
"preview": "<div align=\"center\">\n\n \n\n <br>\n\n [\n\n<br>\n\n[\n\n### Improvements\n- All characters from the current Unicode standard 16.0 are no"
},
{
"path": "benches/benchmark.rs",
"chars": 4866,
"preview": "/*\n * Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com\n *\n * Licensed under the Apache License, Version 2.0 (th"
},
{
"path": "benches/testcases.txt",
"chars": 663,
"preview": "Rocket Sled\nElysian Heirloom\nKaleb's Favor\nBlazing Renegade\nFlash Fire\nSilence\nTalir's Favored\nTimekeeper\nOasis Sanctuar"
},
{
"path": "demo.tape",
"chars": 582,
"preview": "# demo.gif created with https://github.com/charmbracelet/vhs on macOS 13 (Ventura)\n\nRequire grex\nOutput demo.gif\n\nSet Sh"
},
{
"path": "grex.pyi",
"chars": 6626,
"preview": "#\n# Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com\n#\n# Licensed under the Apache License, Version 2.0 (the \"L"
},
{
"path": "pyproject.toml",
"chars": 1218,
"preview": "[project]\nname = \"grex\"\nversion = \"1.0.2\"\nauthors = [{name = \"Peter M. Stahl\", email = \"pemistahl@gmail.com\"}]\ndescripti"
},
{
"path": "requirements.txt",
"chars": 34,
"preview": "maturin == 1.10.1\npytest == 9.0.1\n"
},
{
"path": "src/builder.rs",
"chars": 11082,
"preview": "/*\n * Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com\n *\n * Licensed under the Apache License, Version 2.0 (th"
},
{
"path": "src/char_range.rs",
"chars": 3409,
"preview": "/*\n * Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com\n *\n * Licensed under the Apache License, Version 2.0 (th"
},
{
"path": "src/cluster.rs",
"chars": 11907,
"preview": "/*\n * Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com\n *\n * Licensed under the Apache License, Version 2.0 (th"
},
{
"path": "src/component.rs",
"chars": 14077,
"preview": "/*\n * Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com\n *\n * Licensed under the Apache License, Version 2.0 (th"
},
{
"path": "src/config.rs",
"chars": 2741,
"preview": "/*\n * Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com\n *\n * Licensed under the Apache License, Version 2.0 (th"
},
{
"path": "src/dfa.rs",
"chars": 15044,
"preview": "/*\n * Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com\n *\n * Licensed under the Apache License, Version 2.0 (th"
},
{
"path": "src/expression.rs",
"chars": 23720,
"preview": "/*\n * Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com\n *\n * Licensed under the Apache License, Version 2.0 (th"
},
{
"path": "src/format.rs",
"chars": 10109,
"preview": "/*\n * Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com\n *\n * Licensed under the Apache License, Version 2.0 (th"
},
{
"path": "src/grapheme.rs",
"chars": 7962,
"preview": "/*\n * Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com\n *\n * Licensed under the Apache License, Version 2.0 (th"
},
{
"path": "src/lib.rs",
"chars": 10797,
"preview": "/*\n * Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com\n *\n * Licensed under the Apache License, Version 2.0 (th"
},
{
"path": "src/macros.rs",
"chars": 796,
"preview": "/*\n * Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com\n *\n * Licensed under the Apache License, Version 2.0 (th"
},
{
"path": "src/main.rs",
"chars": 16375,
"preview": "/*\n * Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com\n *\n * Licensed under the Apache License, Version 2.0 (th"
},
{
"path": "src/python.rs",
"chars": 11031,
"preview": "/*\n * Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com\n *\n * Licensed under the Apache License, Version 2.0 (th"
},
{
"path": "src/quantifier.rs",
"chars": 1060,
"preview": "/*\n * Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com\n *\n * Licensed under the Apache License, Version 2.0 (th"
},
{
"path": "src/regexp.rs",
"chars": 9928,
"preview": "/*\n * Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com\n *\n * Licensed under the Apache License, Version 2.0 (th"
},
{
"path": "src/substring.rs",
"chars": 679,
"preview": "/*\n * Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com\n *\n * Licensed under the Apache License, Version 2.0 (th"
},
{
"path": "src/unicode_tables/decimal.rs",
"chars": 2037,
"preview": "/*\n * Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com\n *\n * Licensed under the Apache License, Version 2.0 (th"
},
{
"path": "src/unicode_tables/mod.rs",
"chars": 741,
"preview": "/*\n * Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com\n *\n * Licensed under the Apache License, Version 2.0 (th"
},
{
"path": "src/unicode_tables/space.rs",
"chars": 1158,
"preview": "/*\n * Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com\n *\n * Licensed under the Apache License, Version 2.0 (th"
},
{
"path": "src/unicode_tables/word.rs",
"chars": 15862,
"preview": "/*\n * Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com\n *\n * Licensed under the Apache License, Version 2.0 (th"
},
{
"path": "src/wasm.rs",
"chars": 9033,
"preview": "/*\n * Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com\n *\n * Licensed under the Apache License, Version 2.0 (th"
},
{
"path": "tests/cli_integration_tests.rs",
"chars": 123600,
"preview": "/*\n * Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com\n *\n * Licensed under the Apache License, Version 2.0 (th"
},
{
"path": "tests/lib_integration_tests.rs",
"chars": 90001,
"preview": "/*\n * Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com\n *\n * Licensed under the Apache License, Version 2.0 (th"
},
{
"path": "tests/property_tests.rs",
"chars": 25047,
"preview": "/*\n * Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com\n *\n * Licensed under the Apache License, Version 2.0 (th"
},
{
"path": "tests/python/test_grex.py",
"chars": 10368,
"preview": "#\n# Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com\n#\n# Licensed under the Apache License, Version 2.0 (the \"L"
},
{
"path": "tests/wasm_browser_tests.rs",
"chars": 6890,
"preview": "/*\n * Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com\n *\n * Licensed under the Apache License, Version 2.0 (th"
},
{
"path": "tests/wasm_node_tests.rs",
"chars": 6824,
"preview": "/*\n * Copyright © 2019-today Peter M. Stahl pemistahl@gmail.com\n *\n * Licensed under the Apache License, Version 2.0 (th"
}
]
About this extraction
This page contains the full source code of the pemistahl/grex GitHub repository, extracted and formatted as plain text. The extraction includes 44 files (513.5 KB), approximately 135.9k tokens, and a symbol index with 641 extracted functions, classes, methods, constants, and types.
Extracted by GitExtract.