Repository: uhop/node-re2
Branch: master
Commit: 209646f8995e
Files: 103
Total size: 248.6 KB

Directory structure:
gitextract_8333oox3/
├── .clinerules
├── .cursorrules
├── .editorconfig
├── .github/
│   ├── COPILOT-INSTRUCTIONS.md
│   ├── FUNDING.yml
│   ├── actions/
│   │   ├── linux-alpine-node-20/
│   │   │   ├── Dockerfile
│   │   │   ├── action.yml
│   │   │   └── entrypoint.sh
│   │   ├── linux-alpine-node-22/
│   │   │   ├── Dockerfile
│   │   │   ├── action.yml
│   │   │   └── entrypoint.sh
│   │   ├── linux-alpine-node-24/
│   │   │   ├── Dockerfile
│   │   │   ├── action.yml
│   │   │   └── entrypoint.sh
│   │   ├── linux-alpine-node-25/
│   │   │   ├── Dockerfile
│   │   │   ├── action.yml
│   │   │   └── entrypoint.sh
│   │   ├── linux-node-20/
│   │   │   ├── Dockerfile
│   │   │   ├── action.yml
│   │   │   └── entrypoint.sh
│   │   ├── linux-node-22/
│   │   │   ├── Dockerfile
│   │   │   ├── action.yml
│   │   │   └── entrypoint.sh
│   │   ├── linux-node-24/
│   │   │   ├── Dockerfile
│   │   │   ├── action.yml
│   │   │   └── entrypoint.sh
│   │   └── linux-node-25/
│   │       ├── Dockerfile
│   │       ├── action.yml
│   │       └── entrypoint.sh
│   ├── dependabot.yml
│   └── workflows/
│       ├── build.yml
│       └── tests.yml
├── .gitignore
├── .gitmodules
├── .prettierignore
├── .prettierrc
├── .vscode/
│   ├── c_cpp_properties.json
│   ├── launch.json
│   ├── settings.json
│   └── tasks.json
├── .windsurf/
│   ├── skills/
│   │   ├── docs-review/
│   │   │   └── SKILL.md
│   │   └── write-tests/
│   │       └── SKILL.md
│   └── workflows/
│       ├── add-module.md
│       ├── ai-docs-update.md
│       └── release-check.md
├── .windsurfrules
├── AGENTS.md
├── ARCHITECTURE.md
├── CLAUDE.md
├── CONTRIBUTING.md
├── LICENSE
├── README.md
├── bench/
│   ├── bad-pattern.mjs
│   └── set-match.mjs
├── binding.gyp
├── lib/
│   ├── accessors.cc
│   ├── addon.cc
│   ├── exec.cc
│   ├── isolate_data.h
│   ├── match.cc
│   ├── new.cc
│   ├── pattern.cc
│   ├── pattern.h
│   ├── replace.cc
│   ├── search.cc
│   ├── set.cc
│   ├── split.cc
│   ├── test.cc
│   ├── to_string.cc
│   ├── util.cc
│   ├── util.h
│   ├── wrapped_re2.h
│   └── wrapped_re2_set.h
├── llms-full.txt
├── llms.txt
├── package.json
├── re2.d.ts
├── re2.js
├── scripts/
│   └── verify-build.js
├── tests/
│   ├── manual/
│   │   ├── matchall-bench.js
│   │   ├── memory-check.js
│   │   ├── memory-monitor.js
│   │   ├── test-unicode-warning.mjs
│   │   └── worker.js
│   ├── test-cjs.cjs
│   ├── test-exec.mjs
│   ├── test-general.mjs
│   ├── test-groups.mjs
│   ├── test-invalid.mjs
│   ├── test-match.mjs
│   ├── test-matchAll.mjs
│   ├── test-prototype.mjs
│   ├── test-replace.mjs
│   ├── test-search.mjs
│   ├── test-set.mjs
│   ├── test-source.mjs
│   ├── test-split.mjs
│   ├── test-symbols.mjs
│   ├── test-test.mjs
│   ├── test-toString.mjs
│   └── test-unicode-classes.mjs
├── ts-tests/
│   └── test-types.ts
└── tsconfig.json

================================================
FILE CONTENTS
================================================

================================================
FILE: .clinerules
================================================
# node-re2 — AI Agent Rules

## Project identity

node-re2 provides Node.js bindings for RE2: a fast, safe alternative to backtracking regular expression engines. The npm package name is `re2`. It is a C++ native addon built with `node-gyp` and `nan`.

## Critical rules

- **CommonJS.** The project is `"type": "commonjs"`. Use `require()` in source, `import` in tests (`.mjs`).
- **No transpilation.** JavaScript code runs directly.
- **Do not modify vendored code.** Never edit files under `vendor/`. They are git submodules.
- **Do not modify or delete test expectations** without understanding why they changed.
- **Do not add comments or remove comments** unless explicitly asked.
- **Keep `re2.js` and `re2.d.ts` in sync.** All public API exposed from `re2.js` must be typed in `re2.d.ts`.
- **The addon must build on all supported platforms:** Linux (x64, arm64, Alpine), macOS (x64, arm64), Windows (x64, arm64).
- **RE2 is always Unicode-mode.** The `u` flag is always added implicitly.
- **Buffer support is a first-class feature.** All methods that accept strings must also accept Buffers, returning Buffers when given Buffer input.
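
The Buffer rule above describes a contract: string in, string out; Buffer in, Buffer out. A minimal sketch of that contract follows. `replaceKeepingType` is a hypothetical helper, not part of `re2.js`, and it uses the built-in `RegExp` as a stand-in engine so that it runs without the native addon. How the addon handles Buffers internally is not shown here.

```javascript
// Hypothetical helper for illustration only; not part of re2.js or the addon.
// The built-in RegExp stands in for the engine so the snippet runs anywhere.
function replaceKeepingType(input, regexp, replacement) {
  if (Buffer.isBuffer(input)) {
    // Buffer in, Buffer out: decode as UTF-8, transform, re-encode.
    const text = input.toString('utf8');
    return Buffer.from(text.replace(regexp, replacement), 'utf8');
  }
  // String in, string out.
  return input.replace(regexp, replacement);
}

const fromString = replaceKeepingType('a-b', /-/, '+');
const fromBuffer = replaceKeepingType(Buffer.from('a-b'), /-/, '+');
console.log(typeof fromString, fromString); // string a+b
console.log(Buffer.isBuffer(fromBuffer), fromBuffer.toString()); // true a+b
```

A test for a real method would presumably check the same two properties: the result type matches the input type, and the content is the same in both variants.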

## Code style

- C++ code: tabs, 4-wide indentation. JavaScript: 2-space indentation.
- Prettier: 80 char width, single quotes, no bracket spacing, no trailing commas, arrow parens "avoid" (see `.prettierrc`).
- nan (Native Abstractions for Node.js) for the C++ addon API.
- Semicolons are enforced by Prettier (default `semi: true`).

## Architecture quick reference

- `re2.js` is the main entry point. Loads `build/Release/re2.node`, sets up Symbol aliases (`Symbol.match`, `Symbol.search`, `Symbol.replace`, `Symbol.split`, `Symbol.matchAll`).
- C++ addon (`lib/*.cc`) wraps Google's RE2 via nan. Each RegExp method has its own `.cc` file.
- `lib/new.cc` handles construction: parse pattern/flags, translate RegExp → RE2 syntax (via `lib/pattern.cc`).
- `lib/pattern.cc` translates Unicode class names (`\p{Letter}` → `\p{L}`, `\p{Script=Latin}` → `\p{Latin}`).
- `lib/set.cc` implements `RE2.Set` for multi-pattern matching.
- `lib/util.cc` provides UTF-8 ↔ UTF-16 conversion and buffer helpers.
- Prebuilt artifacts downloaded at install time via `install-artifact-from-github`.
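
The Unicode class translation mentioned above can be sketched in JavaScript. This is an illustration under stated assumptions, not the project's code: the real translation is C++ in `lib/pattern.cc`, the function name `translateUnicodeClasses` is hypothetical, and the alias table is a tiny assumed subset.

```javascript
// Hypothetical JavaScript sketch; the real translation lives in lib/pattern.cc (C++).
// The alias table is an assumed, deliberately tiny subset for illustration.
const GENERAL_CATEGORY_ALIASES = new Map([
  ['Letter', 'L'],
  ['Number', 'N'],
  ['Punctuation', 'P']
]);

function translateUnicodeClasses(pattern) {
  // Match every escape sequence so that an escaped backslash ("\\\\" in source)
  // is consumed as a pair and never mistaken for the start of \p{...}.
  return pattern.replace(
    /\\(?:([pP])\{([^}]+)\}|[\s\S])/g,
    (whole, kind, body) => {
      if (!kind) return whole; // some other escape, for example \d or \\
      // \p{Script=Latin} -> \p{Latin}
      const script = /^Script=(.+)$/.exec(body);
      if (script) return '\\' + kind + '{' + script[1] + '}';
      // \p{Letter} -> \p{L}
      const alias = GENERAL_CATEGORY_ALIASES.get(body);
      return alias ? '\\' + kind + '{' + alias + '}' : whole;
    }
  );
}

console.log(translateUnicodeClasses('\\p{Letter}+ \\p{Script=Latin}'));
// prints: \p{L}+ \p{Latin}
```

Names that are not in the table pass through unchanged, so the sketch never rewrites a class it does not recognize.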

## Verification commands

- `npm test` — run the full test suite (worker threads)
- `node tests/test-.mjs` — run a single test file directly
- `npm run test:seq` — run sequentially
- `npm run test:proc` — run multi-process
- `npm run ts-check` — TypeScript type checking
- `npm run lint` — Prettier check
- `npm run lint:fix` — Prettier write
- `npm run verify-build` — quick smoke test
- `npm run rebuild` — rebuild the native addon (release)
- `npm run rebuild:dev` — rebuild the native addon (debug)

## File layout

- Entry point: `re2.js` + `re2.d.ts`
- C++ addon: `lib/*.cc`, `lib/*.h`
- Build config: `binding.gyp`
- Tests: `tests/test-*.mjs`
- TypeScript tests: `ts-tests/test-*.ts`
- Benchmarks: `bench/`
- Vendored deps: `vendor/re2/`, `vendor/abseil-cpp/` (git submodules)
- CI: `.github/workflows/`, `.github/actions/`

## When reading the codebase

- Start with `ARCHITECTURE.md` for the module map and dependency graph.
- `re2.d.ts` is the best API reference for the public API. It includes `internalSource` and Buffer overloads.
- `re2.js` is tiny — read it first for the JS-side setup.
- `lib/addon.cc` shows how all C++ methods are registered.
- `lib/wrapped_re2.h` defines the core C++ class.

================================================
FILE: .cursorrules
================================================
# node-re2 — AI Agent Rules

## Project identity

node-re2 provides Node.js bindings for RE2: a fast, safe alternative to backtracking regular expression engines. The npm package name is `re2`. It is a C++ native addon built with `node-gyp` and `nan`.

## Critical rules

- **CommonJS.** The project is `"type": "commonjs"`. Use `require()` in source, `import` in tests (`.mjs`).
- **No transpilation.** JavaScript code runs directly.
- **Do not modify vendored code.** Never edit files under `vendor/`. They are git submodules.
- **Do not modify or delete test expectations** without understanding why they changed.
- **Do not add comments or remove comments** unless explicitly asked.
- **Keep `re2.js` and `re2.d.ts` in sync.** All public API exposed from `re2.js` must be typed in `re2.d.ts`.
- **The addon must build on all supported platforms:** Linux (x64, arm64, Alpine), macOS (x64, arm64), Windows (x64, arm64).
- **RE2 is always Unicode-mode.** The `u` flag is always added implicitly.
- **Buffer support is a first-class feature.** All methods that accept strings must also accept Buffers, returning Buffers when given Buffer input.

## Code style

- C++ code: tabs, 4-wide indentation. JavaScript: 2-space indentation.
- Prettier: 80 char width, single quotes, no bracket spacing, no trailing commas, arrow parens "avoid" (see `.prettierrc`).
- nan (Native Abstractions for Node.js) for the C++ addon API.
- Semicolons are enforced by Prettier (default `semi: true`).

## Architecture quick reference

- `re2.js` is the main entry point. Loads `build/Release/re2.node`, sets up Symbol aliases (`Symbol.match`, `Symbol.search`, `Symbol.replace`, `Symbol.split`, `Symbol.matchAll`).
- C++ addon (`lib/*.cc`) wraps Google's RE2 via nan. Each RegExp method has its own `.cc` file.
- `lib/new.cc` handles construction: parse pattern/flags, translate RegExp → RE2 syntax (via `lib/pattern.cc`).
- `lib/pattern.cc` translates Unicode class names (`\p{Letter}` → `\p{L}`, `\p{Script=Latin}` → `\p{Latin}`).
- `lib/set.cc` implements `RE2.Set` for multi-pattern matching.
- `lib/util.cc` provides UTF-8 ↔ UTF-16 conversion and buffer helpers.
- Prebuilt artifacts downloaded at install time via `install-artifact-from-github`.
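
The Symbol aliases mentioned above are what let `String.prototype` methods delegate to a non-`RegExp` object. The sketch below shows the general mechanism with a hypothetical stand-in class: `TinyMatcher` is not the RE2 object, it does not load `re2.node`, and the way `re2.js` actually wires its aliases is an assumption here.

```javascript
// Hypothetical stand-in class; it is NOT the RE2 object and loads no native code.
class TinyMatcher {
  constructor(word) {
    this.word = word;
  }
  // Plain methods named after the RegExp-style API.
  match(str) {
    return str.includes(this.word) ? [this.word] : null;
  }
  search(str) {
    return str.indexOf(this.word);
  }
}

// Alias each well-known symbol to the method of the same name, so that
// String.prototype.match and String.prototype.search delegate to the object.
for (const name of ['match', 'search']) {
  TinyMatcher.prototype[Symbol[name]] = TinyMatcher.prototype[name];
}

const matcher = new TinyMatcher('re2');
console.log('node-re2'.search(matcher)); // 5
console.log('node-re2'.match(matcher)); // [ 're2' ]
```

The same pattern extends to `Symbol.replace`, `Symbol.split`, and `Symbol.matchAll`, each of which the corresponding string method looks up on its argument.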

## Verification commands

- `npm test` — run the full test suite (worker threads)
- `node tests/test-.mjs` — run a single test file directly
- `npm run test:seq` — run sequentially
- `npm run test:proc` — run multi-process
- `npm run ts-check` — TypeScript type checking
- `npm run lint` — Prettier check
- `npm run lint:fix` — Prettier write
- `npm run verify-build` — quick smoke test
- `npm run rebuild` — rebuild the native addon (release)
- `npm run rebuild:dev` — rebuild the native addon (debug)

## File layout

- Entry point: `re2.js` + `re2.d.ts`
- C++ addon: `lib/*.cc`, `lib/*.h`
- Build config: `binding.gyp`
- Tests: `tests/test-*.mjs`
- TypeScript tests: `ts-tests/test-*.ts`
- Benchmarks: `bench/`
- Vendored deps: `vendor/re2/`, `vendor/abseil-cpp/` (git submodules)
- CI: `.github/workflows/`, `.github/actions/`

## When reading the codebase

- Start with `ARCHITECTURE.md` for the module map and dependency graph.
- `re2.d.ts` is the best API reference for the public API. It includes `internalSource` and Buffer overloads.
- `re2.js` is tiny — read it first for the JS-side setup.
- `lib/addon.cc` shows how all C++ methods are registered.
- `lib/wrapped_re2.h` defines the core C++ class.

================================================
FILE: .editorconfig
================================================
root = true

[*]
charset = utf-8
end_of_line = lf
insert_final_newline = true
trim_trailing_whitespace = true
indent_style = space
indent_size = 2

[*.{h,cc,cpp}]
indent_style = tab
indent_size = 4

================================================
FILE: .github/COPILOT-INSTRUCTIONS.md
================================================
See [AGENTS.md](../AGENTS.md) for all AI agent rules and project conventions.
================================================ FILE: .github/FUNDING.yml ================================================ github: uhop buy_me_a_coffee: uhop ================================================ FILE: .github/actions/linux-alpine-node-20/Dockerfile ================================================ FROM node:20-alpine RUN apk add --no-cache python3 make gcc g++ linux-headers COPY entrypoint.sh /entrypoint.sh ENTRYPOINT ["/entrypoint.sh"] ================================================ FILE: .github/actions/linux-alpine-node-20/action.yml ================================================ name: 'Create a binary artifact for Node 20 on Alpine Linux' description: 'Create a binary artifact for Node 20 on Alpine Linux using musl' runs: using: 'docker' image: 'Dockerfile' args: - ${{inputs.node-version}} ================================================ FILE: .github/actions/linux-alpine-node-20/entrypoint.sh ================================================ #!/bin/sh set -e export USERNAME=`whoami` export DEVELOPMENT_SKIP_GETTING_ASSET=true npm i npm run build --if-present npm test npm run save-to-github ================================================ FILE: .github/actions/linux-alpine-node-22/Dockerfile ================================================ FROM node:22-alpine RUN apk add --no-cache python3 make gcc g++ linux-headers COPY entrypoint.sh /entrypoint.sh ENTRYPOINT ["/entrypoint.sh"] ================================================ FILE: .github/actions/linux-alpine-node-22/action.yml ================================================ name: 'Create a binary artifact for Node 22 on Alpine Linux' description: 'Create a binary artifact for Node 22 on Alpine Linux using musl' runs: using: 'docker' image: 'Dockerfile' args: - ${{inputs.node-version}} ================================================ FILE: .github/actions/linux-alpine-node-22/entrypoint.sh ================================================ #!/bin/sh set -e export USERNAME=`whoami` export 
DEVELOPMENT_SKIP_GETTING_ASSET=true npm i npm run build --if-present npm test npm run save-to-github ================================================ FILE: .github/actions/linux-alpine-node-24/Dockerfile ================================================ FROM node:24-alpine RUN apk add --no-cache python3 make gcc g++ linux-headers COPY entrypoint.sh /entrypoint.sh ENTRYPOINT ["/entrypoint.sh"] ================================================ FILE: .github/actions/linux-alpine-node-24/action.yml ================================================ name: 'Create a binary artifact for Node 24 on Alpine Linux' description: 'Create a binary artifact for Node 24 on Alpine Linux using musl' runs: using: 'docker' image: 'Dockerfile' args: - ${{inputs.node-version}} ================================================ FILE: .github/actions/linux-alpine-node-24/entrypoint.sh ================================================ #!/bin/sh set -e export USERNAME=`whoami` export DEVELOPMENT_SKIP_GETTING_ASSET=true npm i npm run build --if-present npm test npm run save-to-github ================================================ FILE: .github/actions/linux-alpine-node-25/Dockerfile ================================================ FROM node:25-alpine RUN apk add --no-cache python3 make gcc g++ linux-headers COPY entrypoint.sh /entrypoint.sh ENTRYPOINT ["/entrypoint.sh"] ================================================ FILE: .github/actions/linux-alpine-node-25/action.yml ================================================ name: 'Create a binary artifact for Node 25 on Alpine Linux' description: 'Create a binary artifact for Node 25 on Alpine Linux using musl' runs: using: 'docker' image: 'Dockerfile' args: - ${{inputs.node-version}} ================================================ FILE: .github/actions/linux-alpine-node-25/entrypoint.sh ================================================ #!/bin/sh set -e export USERNAME=`whoami` export DEVELOPMENT_SKIP_GETTING_ASSET=true npm i npm run build 
--if-present npm test npm run save-to-github ================================================ FILE: .github/actions/linux-node-20/Dockerfile ================================================ FROM node:20-bullseye RUN apt install python3 make gcc g++ COPY entrypoint.sh /entrypoint.sh ENTRYPOINT ["/entrypoint.sh"] ================================================ FILE: .github/actions/linux-node-20/action.yml ================================================ name: 'Create a binary artifact for Node 20 on Debian Bullseye Linux' description: 'Create a binary artifact for Node 20 on Debian Bullseye Linux' inputs: node-version: description: 'Node.js version' required: false default: '20' runs: using: 'docker' image: 'Dockerfile' args: - ${{inputs.node-version}} ================================================ FILE: .github/actions/linux-node-20/entrypoint.sh ================================================ #!/bin/sh set -e export USERNAME=`whoami` export DEVELOPMENT_SKIP_GETTING_ASSET=true npm i npm run build --if-present npm test npm run save-to-github ================================================ FILE: .github/actions/linux-node-22/Dockerfile ================================================ FROM node:22-bullseye RUN apt install python3 make gcc g++ COPY entrypoint.sh /entrypoint.sh ENTRYPOINT ["/entrypoint.sh"] ================================================ FILE: .github/actions/linux-node-22/action.yml ================================================ name: 'Create a binary artifact for Node 22 on Debian Bullseye Linux' description: 'Create a binary artifact for Node 22 on Debian Bullseye Linux' inputs: node-version: description: 'Node.js version' required: false default: '22' runs: using: 'docker' image: 'Dockerfile' args: - ${{inputs.node-version}} ================================================ FILE: .github/actions/linux-node-22/entrypoint.sh ================================================ #!/bin/sh set -e export USERNAME=`whoami` export 
DEVELOPMENT_SKIP_GETTING_ASSET=true npm i npm run build --if-present npm test npm run save-to-github ================================================ FILE: .github/actions/linux-node-24/Dockerfile ================================================ FROM node:24-bullseye RUN apt install python3 make gcc g++ COPY entrypoint.sh /entrypoint.sh ENTRYPOINT ["/entrypoint.sh"] ================================================ FILE: .github/actions/linux-node-24/action.yml ================================================ name: 'Create a binary artifact for Node 24 on Debian Bullseye Linux' description: 'Create a binary artifact for Node 24 on Debian Bullseye Linux' inputs: node-version: description: 'Node.js version' required: false default: '24' runs: using: 'docker' image: 'Dockerfile' args: - ${{inputs.node-version}} ================================================ FILE: .github/actions/linux-node-24/entrypoint.sh ================================================ #!/bin/sh set -e export USERNAME=`whoami` export DEVELOPMENT_SKIP_GETTING_ASSET=true npm i npm run build --if-present npm test npm run save-to-github ================================================ FILE: .github/actions/linux-node-25/Dockerfile ================================================ FROM node:25-trixie RUN apt install python3 make gcc g++ COPY entrypoint.sh /entrypoint.sh ENTRYPOINT ["/entrypoint.sh"] ================================================ FILE: .github/actions/linux-node-25/action.yml ================================================ name: 'Create a binary artifact for Node 25 on Debian Trixie Linux' description: 'Create a binary artifact for Node 25 on Debian Trixie Linux' inputs: node-version: description: 'Node.js version' required: false default: '25' runs: using: 'docker' image: 'Dockerfile' args: - ${{inputs.node-version}} ================================================ FILE: .github/actions/linux-node-25/entrypoint.sh ================================================ #!/bin/sh set -e 
export USERNAME=`whoami` export DEVELOPMENT_SKIP_GETTING_ASSET=true npm i npm run build --if-present npm test npm run save-to-github ================================================ FILE: .github/dependabot.yml ================================================ # To get started with Dependabot version updates, you'll need to specify which # package ecosystems to update and where the package manifests are located. # Please see the documentation for all configuration options: # https://help.github.com/github/administering-a-repository/configuration-options-for-dependency-updates version: 2 updates: - package-ecosystem: "npm" # See documentation for possible values directory: "/" # Location of package manifests schedule: interval: "weekly" - package-ecosystem: "github-actions" directory: "/" schedule: interval: "weekly" ================================================ FILE: .github/workflows/build.yml ================================================ name: Node.js builds on: push: tags: - v?[0-9]+.[0-9]+.[0-9]+.[0-9]+ - v?[0-9]+.[0-9]+.[0-9]+ - v?[0-9]+.[0-9]+ permissions: id-token: write contents: write attestations: write jobs: create-release: name: Create release runs-on: ubuntu-latest steps: - uses: actions/checkout@v6 - env: GH_TOKEN: ${{github.token}} run: | REF=${{github.ref}} TAG=${REF#"refs/tags/"} gh release create -t "Release ${TAG}" -n "" "${{github.ref}}" build: name: Node.js ${{matrix.node-version}} on ${{matrix.os}} needs: create-release runs-on: ${{matrix.os}} strategy: matrix: os: [macos-latest, windows-latest, macos-15-intel, windows-11-arm] node-version: [20, 22, 24, 25] steps: - uses: actions/checkout@v6 with: submodules: true - name: Setup Node.js ${{matrix.node-version}} uses: actions/setup-node@v6 with: node-version: ${{matrix.node-version}} - name: Install the package and run tests env: DEVELOPMENT_SKIP_GETTING_ASSET: true run: | npm i npm run build --if-present npm test - name: Save to GitHub env: GITHUB_TOKEN: ${{secrets.GITHUB_TOKEN}} run: npm 
run save-to-github - name: Attest if: env.CREATED_ASSET_NAME != '' uses: actions/attest-build-provenance@v4 with: subject-name: '${{ env.CREATED_ASSET_NAME }}' subject-path: '${{ github.workspace }}/build/Release/re2.node' build-linux-node-20: name: Node.js 20 on Bullseye needs: create-release runs-on: ubuntu-latest continue-on-error: true steps: - uses: actions/checkout@v6 with: submodules: true - name: Install, test, and create artifact uses: ./.github/actions/linux-node-20/ env: GITHUB_TOKEN: ${{secrets.GITHUB_TOKEN}} - name: Attest if: env.CREATED_ASSET_NAME != '' uses: actions/attest-build-provenance@v4 with: subject-name: '${{ env.CREATED_ASSET_NAME }}' subject-path: '${{ github.workspace }}/build/Release/re2.node' build-linux-node-22: name: Node.js 22 on Bullseye needs: create-release runs-on: ubuntu-latest continue-on-error: true steps: - uses: actions/checkout@v6 with: submodules: true - name: Install, test, and create artifact uses: ./.github/actions/linux-node-22/ env: GITHUB_TOKEN: ${{secrets.GITHUB_TOKEN}} - name: Attest if: env.CREATED_ASSET_NAME != '' uses: actions/attest-build-provenance@v4 with: subject-name: '${{ env.CREATED_ASSET_NAME }}' subject-path: '${{ github.workspace }}/build/Release/re2.node' build-linux-alpine-node-20: name: Node.js 20 on Alpine needs: create-release runs-on: ubuntu-latest continue-on-error: true steps: - uses: actions/checkout@v6 with: submodules: true - name: Install, test, and create artifact uses: ./.github/actions/linux-alpine-node-20/ env: GITHUB_TOKEN: ${{secrets.GITHUB_TOKEN}} - name: Attest if: env.CREATED_ASSET_NAME != '' uses: actions/attest-build-provenance@v4 with: subject-name: '${{ env.CREATED_ASSET_NAME }}' subject-path: '${{ github.workspace }}/build/Release/re2.node' build-linux-alpine-node-22: name: Node.js 22 on Alpine needs: create-release runs-on: ubuntu-latest continue-on-error: true steps: - uses: actions/checkout@v6 with: submodules: true - name: Install, test, and create artifact uses: 
./.github/actions/linux-alpine-node-22/ env: GITHUB_TOKEN: ${{secrets.GITHUB_TOKEN}} - name: Attest if: env.CREATED_ASSET_NAME != '' uses: actions/attest-build-provenance@v4 with: subject-name: '${{ env.CREATED_ASSET_NAME }}' subject-path: '${{ github.workspace }}/build/Release/re2.node' build-linux-arm64-node-20: name: Node.js 20 on Bullseye ARM64 needs: create-release runs-on: ubuntu-24.04-arm continue-on-error: true steps: - uses: actions/checkout@v6 with: submodules: true - name: Install, test, and create artifact uses: ./.github/actions/linux-node-20/ env: GITHUB_TOKEN: ${{secrets.GITHUB_TOKEN}} - name: Attest if: env.CREATED_ASSET_NAME != '' uses: actions/attest-build-provenance@v4 with: subject-name: '${{ env.CREATED_ASSET_NAME }}' subject-path: '${{ github.workspace }}/build/Release/re2.node' build-linux-arm64-node-22: name: Node.js 22 on Bullseye ARM64 needs: create-release runs-on: ubuntu-24.04-arm continue-on-error: true steps: - uses: actions/checkout@v6 with: submodules: true - name: Install, test, and create artifact uses: ./.github/actions/linux-node-22/ env: GITHUB_TOKEN: ${{secrets.GITHUB_TOKEN}} - name: Attest if: env.CREATED_ASSET_NAME != '' uses: actions/attest-build-provenance@v4 with: subject-name: '${{ env.CREATED_ASSET_NAME }}' subject-path: '${{ github.workspace }}/build/Release/re2.node' build-linux-arm64-alpine-node-20: name: Node.js 20 on Alpine ARM64 needs: create-release runs-on: ubuntu-24.04-arm continue-on-error: true steps: - uses: actions/checkout@v6 with: submodules: true - name: Install, test, and create artifact uses: ./.github/actions/linux-alpine-node-20/ env: GITHUB_TOKEN: ${{secrets.GITHUB_TOKEN}} - name: Attest if: env.CREATED_ASSET_NAME != '' uses: actions/attest-build-provenance@v4 with: subject-name: '${{ env.CREATED_ASSET_NAME }}' subject-path: '${{ github.workspace }}/build/Release/re2.node' build-linux-arm64-alpine-node-22: name: Node.js 22 on Alpine ARM64 needs: create-release runs-on: ubuntu-24.04-arm 
continue-on-error: true steps: - uses: actions/checkout@v6 with: submodules: true - name: Install, test, and create artifact uses: ./.github/actions/linux-alpine-node-22/ env: GITHUB_TOKEN: ${{secrets.GITHUB_TOKEN}} - name: Attest if: env.CREATED_ASSET_NAME != '' uses: actions/attest-build-provenance@v4 with: subject-name: '${{ env.CREATED_ASSET_NAME }}' subject-path: '${{ github.workspace }}/build/Release/re2.node' build-linux-node-24: name: Node.js 24 on Bullseye needs: create-release runs-on: ubuntu-latest continue-on-error: true steps: - uses: actions/checkout@v6 with: submodules: true - name: Install, test, and create artifact uses: ./.github/actions/linux-node-24/ env: GITHUB_TOKEN: ${{secrets.GITHUB_TOKEN}} - name: Attest if: env.CREATED_ASSET_NAME != '' uses: actions/attest-build-provenance@v4 with: subject-name: '${{ env.CREATED_ASSET_NAME }}' subject-path: '${{ github.workspace }}/build/Release/re2.node' build-linux-alpine-node-24: name: Node.js 24 on Alpine needs: create-release runs-on: ubuntu-latest continue-on-error: true steps: - uses: actions/checkout@v6 with: submodules: true - name: Install, test, and create artifact uses: ./.github/actions/linux-alpine-node-24/ env: GITHUB_TOKEN: ${{secrets.GITHUB_TOKEN}} - name: Attest if: env.CREATED_ASSET_NAME != '' uses: actions/attest-build-provenance@v4 with: subject-name: '${{ env.CREATED_ASSET_NAME }}' subject-path: '${{ github.workspace }}/build/Release/re2.node' build-linux-arm64-node-24: name: Node.js 24 on Bullseye ARM64 needs: create-release runs-on: ubuntu-24.04-arm continue-on-error: true steps: - uses: actions/checkout@v6 with: submodules: true - name: Install, test, and create artifact uses: ./.github/actions/linux-node-24/ env: GITHUB_TOKEN: ${{secrets.GITHUB_TOKEN}} - name: Attest if: env.CREATED_ASSET_NAME != '' uses: actions/attest-build-provenance@v4 with: subject-name: '${{ env.CREATED_ASSET_NAME }}' subject-path: '${{ github.workspace }}/build/Release/re2.node' 
build-linux-arm64-alpine-node-24: name: Node.js 24 on Alpine ARM64 needs: create-release runs-on: ubuntu-24.04-arm continue-on-error: true steps: - uses: actions/checkout@v6 with: submodules: true - name: Install, test, and create artifact uses: ./.github/actions/linux-alpine-node-24/ env: GITHUB_TOKEN: ${{secrets.GITHUB_TOKEN}} - name: Attest if: env.CREATED_ASSET_NAME != '' uses: actions/attest-build-provenance@v4 with: subject-name: '${{ env.CREATED_ASSET_NAME }}' subject-path: '${{ github.workspace }}/build/Release/re2.node' build-linux-node-25: name: Node.js 25 on Trixie needs: create-release runs-on: ubuntu-latest continue-on-error: true steps: - uses: actions/checkout@v6 with: submodules: true - name: Install, test, and create artifact uses: ./.github/actions/linux-node-25/ env: GITHUB_TOKEN: ${{secrets.GITHUB_TOKEN}} - name: Attest if: env.CREATED_ASSET_NAME != '' uses: actions/attest-build-provenance@v4 with: subject-name: '${{ env.CREATED_ASSET_NAME }}' subject-path: '${{ github.workspace }}/build/Release/re2.node' build-linux-alpine-node-25: name: Node.js 25 on Alpine needs: create-release runs-on: ubuntu-latest continue-on-error: true steps: - uses: actions/checkout@v6 with: submodules: true - name: Install, test, and create artifact uses: ./.github/actions/linux-alpine-node-25/ env: GITHUB_TOKEN: ${{secrets.GITHUB_TOKEN}} - name: Attest if: env.CREATED_ASSET_NAME != '' uses: actions/attest-build-provenance@v4 with: subject-name: '${{ env.CREATED_ASSET_NAME }}' subject-path: '${{ github.workspace }}/build/Release/re2.node' build-linux-arm64-node-25: name: Node.js 25 on Trixie ARM64 needs: create-release runs-on: ubuntu-24.04-arm continue-on-error: true steps: - uses: actions/checkout@v6 with: submodules: true - name: Install, test, and create artifact uses: ./.github/actions/linux-node-25/ env: GITHUB_TOKEN: ${{secrets.GITHUB_TOKEN}} - name: Attest if: env.CREATED_ASSET_NAME != '' uses: actions/attest-build-provenance@v4 with: subject-name: '${{ 
env.CREATED_ASSET_NAME }}' subject-path: '${{ github.workspace }}/build/Release/re2.node' build-linux-arm64-alpine-node-25: name: Node.js 25 on Alpine ARM64 needs: create-release runs-on: ubuntu-24.04-arm continue-on-error: true steps: - uses: actions/checkout@v6 with: submodules: true - name: Install, test, and create artifact uses: ./.github/actions/linux-alpine-node-25/ env: GITHUB_TOKEN: ${{secrets.GITHUB_TOKEN}} - name: Attest if: env.CREATED_ASSET_NAME != '' uses: actions/attest-build-provenance@v4 with: subject-name: '${{ env.CREATED_ASSET_NAME }}' subject-path: '${{ github.workspace }}/build/Release/re2.node' ================================================ FILE: .github/workflows/tests.yml ================================================ name: Node.js CI on: push: branches: ['*'] pull_request: branches: [master] jobs: tests: name: Node.js ${{matrix.node-version}} on ${{matrix.os}} permissions: contents: read runs-on: ${{matrix.os}} strategy: matrix: os: [ubuntu-latest, macOS-latest, windows-latest] node-version: [20, 22, 24, 25] steps: - uses: actions/checkout@v6 with: submodules: true - name: Setup Node.js ${{matrix.node-version}} uses: actions/setup-node@v6 with: node-version: ${{matrix.node-version}} - name: Install the package and run tests env: DEVELOPMENT_SKIP_GETTING_ASSET: true run: | npm i npm run build --if-present npm test ================================================ FILE: .gitignore ================================================ node_modules/ build/ report/ coverage/ .AppleDouble /.development /.developmentx /.xdevelopment /scripts/save-local.sh ================================================ FILE: .gitmodules ================================================ [submodule "vendor/re2"] path = vendor/re2 url = https://github.com/google/re2 [submodule "vendor/abseil-cpp"] path = vendor/abseil-cpp url = https://github.com/abseil/abseil-cpp [submodule "wiki"] path = wiki url = git@github.com:uhop/node-re2.wiki.git 
================================================ FILE: .prettierignore ================================================ /.windsurf/workflows ================================================ FILE: .prettierrc ================================================ { "printWidth": 80, "singleQuote": true, "bracketSpacing": false, "arrowParens": "avoid", "trailingComma": "none" } ================================================ FILE: .vscode/c_cpp_properties.json ================================================ { "configurations": [ { "name": "Mac", "includePath": [ "${workspaceFolder}/**", "/${env.NVM_INC}/**" ], "defines": [], "macFrameworkPath": [ "/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/System/Library/Frameworks" ], "compilerPath": "/usr/bin/clang", "cStandard": "c17", "cppStandard": "c++17", "intelliSenseMode": "macos-clang-arm64" } ], "version": 4 } ================================================ FILE: .vscode/launch.json ================================================ { // Use IntelliSense to learn about possible attributes. // Hover to view descriptions of existing attributes. 
// For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387 "version": "0.2.0", "configurations": [ { "type": "lldb", "request": "launch", "name": "Debug tests", "preLaunchTask": "npm: build:dev", "program": "${env:NVM_BIN}/node", "args": ["${workspaceFolder}/tests/tests.js"], "cwd": "${workspaceFolder}" } ] } ================================================ FILE: .vscode/settings.json ================================================ { "cSpell.words": [ "heya", "PCRE", "replacee", "Submatch" ] } ================================================ FILE: .vscode/tasks.json ================================================ { "version": "2.0.0", "tasks": [ { "type": "npm", "script": "build:dev", "group": "build", "problemMatcher": [], "label": "npm: build:dev", "detail": "node-gyp -j max build --debug" } ] } ================================================ FILE: .windsurf/skills/docs-review/SKILL.md ================================================ --- name: docs-review description: Review and improve English in documentation files for brevity and clarity. Use when asked to review docs, improve documentation writing, or edit prose for clarity. --- # Review Documentation for node-re2 Review and improve English in documentation files for brevity, clarity, and correctness. ## Steps 1. Read the target documentation file(s). 2. Check for: - Grammatical errors and awkward phrasing. - Verbose or redundant sentences — prefer concise, direct language. - Consistency with existing project terminology (RE2, RegExp, Buffer, nan, node-gyp, etc.). - Correct code examples that match the current API. - Accurate links (wiki, npm, GitHub). 3. Make edits directly in the file: - Preserve the existing structure and headings. - Do not add or remove comments in code examples unless explicitly asked. - Keep technical accuracy — do not change meaning. 4. If reviewing `README.md`, cross-check API descriptions against `re2.d.ts`. 5. 
If reviewing `llms.txt` or `llms-full.txt`, ensure examples are runnable and API signatures match `re2.d.ts`. 6. Report a summary of changes made. ## Style guidelines - Use active voice. - Prefer short sentences. - Use "RE2" (not "re2" or "Re2") when referring to the engine or the JS object. - Use backticks for code references: `RE2`, `Buffer`, `exec()`, etc. - Use "e.g." and "i.e." sparingly — prefer "for example" and "that is" in longer prose. - American English spelling. ================================================ FILE: .windsurf/skills/write-tests/SKILL.md ================================================ --- name: write-tests description: Write or update tape-six tests for a module or feature. Use when asked to write tests, add test coverage, or create typing tests for node-re2. --- # Write Tests for node-re2 Write or update tests using the tape-six testing library. ## Steps 1. Read `node_modules/tape-six/TESTING.md` for the full tape-six API reference (assertions, hooks, patterns, configuration). 2. Identify the module or feature to test. Read its source code to understand the public API. 3. Check existing tests in `tests/` for node-re2 conventions and patterns. 4. Create or update the test file in `tests/`: - For runtime tests use `.mjs`. - Import RE2 with: `import {RE2} from '../re2.js';` - Import tape-six with: `import test from 'tape-six';` - Test with both **string** and **Buffer** inputs — Buffer support is a first-class feature. - Test edge cases: empty strings, no match, global flag behavior, lastIndex, Unicode input. 5. For TypeScript typing tests, update `ts-tests/test-types.ts`: - Verify typed usage patterns compile correctly. // turbo 6. Run the new test file directly to verify: `node tests/test-.mjs` // turbo 7. Run the full test suite to check for regressions: `npm test` - If debugging, use `npm run test:seq` (runs sequentially, easier to trace issues). 8. Report results and any failures. 
## node-re2 test conventions - Test file naming: `test-*.mjs` in `tests/`. - TypeScript typing tests: `test-*.ts` in `ts-tests/`. - Runtime tests (`.mjs`): ESM imports, `import test from 'tape-six'`. - Tests are configured in `package.json` under the `"tape6"` section. - Test files should be directly executable: `node tests/test-foo.mjs`. - Existing tests use synchronous `t => { ... }` style (not async/promise-based). - Always test both string and Buffer variants of methods. - Use `t.ok()`, `t.equal()`, `t.deepEqual()`, `t.fail()` for assertions. - Use try/catch blocks to test error conditions (e.g., invalid patterns throwing `SyntaxError`). ================================================ FILE: .windsurf/workflows/add-module.md ================================================ --- description: Checklist for adding a new C++ method or JS feature to node-re2 --- # Add a New Module Follow these steps when adding a new method, feature, or C++ implementation. ## New C++ method (e.g., `lib/foo.cc`) 1. Create `lib/foo.cc` with the implementation. - Use nan for the Node.js addon API. - Follow existing patterns in `lib/exec.cc` or `lib/test.cc`. - Tabs for indentation, 4-wide. - Include `lib/wrapped_re2.h` and `lib/util.h` as needed. 2. Register the method in `lib/addon.cc`: - Add `Nan::SetPrototypeMethod(tpl, "foo", Foo);` or equivalent. 3. Add the method to `lib/wrapped_re2.h` if it needs a static declaration. 4. Add the source file to `binding.gyp` in the `"sources"` array. // turbo 5. Rebuild the addon: `npm run rebuild` 6. Update `re2.js` if JS-side setup is needed (e.g., Symbol aliases). 7. Update `re2.d.ts` with TypeScript declarations for the new method. - Keep `re2.js` and `re2.d.ts` in sync. 8. Create `tests/test-foo.mjs` with automated tests (tape-six, ESM): - `import {RE2} from '../re2.js';` - Test with strings and Buffers. - Test edge cases (empty input, no match, global flag, etc.). // turbo 9. Run the new test: `node tests/test-foo.mjs` 10. 
Update TypeScript tests in `ts-tests/test-types.ts` if the public API changed. 11. Update `README.md` with documentation for the new feature. 12. Update `ARCHITECTURE.md` — add to project layout and C++ addon table. 13. Update `llms.txt` and `llms-full.txt` with a description and examples. 14. Update `AGENTS.md` if the architecture quick reference needs updating. // turbo 15. Verify: `npm test` // turbo 16. Verify: `npm run ts-check` // turbo 17. Verify: `npm run lint` ## JS-only feature (e.g., new Symbol alias, helper) 1. Add the implementation to `re2.js`. 2. Update `re2.d.ts` with TypeScript declarations. 3. Create or update tests in `tests/`. // turbo 4. Run the new test: `node tests/test-.mjs` 5. Update `README.md`, `llms.txt`, `llms-full.txt`. 6. Update `AGENTS.md` and `ARCHITECTURE.md` if needed. // turbo 7. Verify: `npm test` // turbo 8. Verify: `npm run ts-check` // turbo 9. Verify: `npm run lint` ================================================ FILE: .windsurf/workflows/ai-docs-update.md ================================================ --- description: Update AI-facing documentation files after API or architecture changes --- # AI Documentation Update Update all AI-facing files after changes to the public API, modules, or project structure. ## Steps 1. Read `re2.js` and `re2.d.ts` to identify the current public API. 2. Read `AGENTS.md` and `ARCHITECTURE.md` for current state. 3. Identify what changed (new methods, new flags, new C++ files, renamed exports, removed features, etc.). 4. Update `llms.txt`: - Ensure the API section matches `re2.d.ts`. - Update common patterns if new features were added. - Keep it concise — this is for quick LLM consumption. 5. Update `llms-full.txt`: - Full API reference with all methods, options, and examples. - Include any new features, RE2.Set changes, or Buffer behavior. 6. Update `ARCHITECTURE.md` if project structure or module dependencies changed. 7. 
Update `AGENTS.md` if critical rules, commands, or architecture quick reference changed. 8. Sync `.windsurfrules`, `.cursorrules`, `.clinerules` if `AGENTS.md` changed: - These three files should be identical copies of the condensed rules. 9. Update `README.md` if the public-facing docs need to reflect new features. 10. Track progress with the todo list and provide a summary when done. ================================================ FILE: .windsurf/workflows/release-check.md ================================================ --- description: Pre-release verification checklist for node-re2 --- # Release Check Run through this checklist before publishing a new version. ## Steps 1. Check that `re2.js` and `re2.d.ts` are in sync (all exports, all types). 2. Check that `ARCHITECTURE.md` reflects any structural changes. 3. Check that `AGENTS.md` is up to date with any rule or workflow changes. 4. Check that `.windsurfrules`, `.clinerules`, `.cursorrules` are in sync with `AGENTS.md`. 5. Check that `llms.txt` and `llms-full.txt` are up to date with any API changes. 6. Verify `package.json`: - `files` array includes all necessary entries (`binding.gyp`, `lib`, `re2.d.ts`, `scripts/*.js`, `vendor`). - `main` points to `re2.js`. - `types` points to `re2.d.ts`. 7. Check that the copyright year in `LICENSE` includes the current year. 8. Bump `version` in `package.json`. 9. Update release history in `README.md`. 10. Run `npm install` to regenerate `package-lock.json`. // turbo 11. Rebuild the native addon: `npm run rebuild` // turbo 12. Run the quick smoke test: `npm run verify-build` // turbo 13. Run the full test suite: `npm test` // turbo 14. Run TypeScript check: `npm run ts-check` // turbo 15. Run lint: `npm run lint` // turbo 16. 
Dry-run publish to verify package contents: `npm pack --dry-run` ================================================ FILE: .windsurfrules ================================================ # node-re2 — AI Agent Rules ## Project identity node-re2 provides Node.js bindings for RE2: a fast, safe alternative to backtracking regular expression engines. The npm package name is `re2`. It is a C++ native addon built with `node-gyp` and `nan`. ## Critical rules - **CommonJS.** The project is `"type": "commonjs"`. Use `require()` in source, `import` in tests (`.mjs`). - **No transpilation.** JavaScript code runs directly. - **Do not modify vendored code.** Never edit files under `vendor/`. They are git submodules. - **Do not modify or delete test expectations** without understanding why they changed. - **Do not add comments or remove comments** unless explicitly asked. - **Keep `re2.js` and `re2.d.ts` in sync.** All public API exposed from `re2.js` must be typed in `re2.d.ts`. - **The addon must build on all supported platforms:** Linux (x64, arm64, Alpine), macOS (x64, arm64), Windows (x64, arm64). - **RE2 is always Unicode-mode.** The `u` flag is always added implicitly. - **Buffer support is a first-class feature.** All methods that accept strings must also accept Buffers, returning Buffers when given Buffer input. ## Code style - C++ code: tabs, 4-wide indentation. JavaScript: 2-space indentation. - Prettier: 80 char width, single quotes, no bracket spacing, no trailing commas, arrow parens "avoid" (see `.prettierrc`). - nan (Native Abstractions for Node.js) for the C++ addon API. - Semicolons are enforced by Prettier (default `semi: true`). ## Architecture quick reference - `re2.js` is the main entry point. Loads `build/Release/re2.node`, sets up Symbol aliases (`Symbol.match`, `Symbol.search`, `Symbol.replace`, `Symbol.split`, `Symbol.matchAll`). - C++ addon (`lib/*.cc`) wraps Google's RE2 via nan. Each RegExp method has its own `.cc` file. 
- `lib/new.cc` handles construction: parse pattern/flags, translate RegExp → RE2 syntax (via `lib/pattern.cc`). - `lib/pattern.cc` translates Unicode class names (`\p{Letter}` → `\p{L}`, `\p{Script=Latin}` → `\p{Latin}`). - `lib/set.cc` implements `RE2.Set` for multi-pattern matching. - `lib/util.cc` provides UTF-8 ↔ UTF-16 conversion and buffer helpers. - Prebuilt artifacts downloaded at install time via `install-artifact-from-github`. ## Verification commands - `npm test` — run the full test suite (worker threads) - `node tests/test-.mjs` — run a single test file directly - `npm run test:seq` — run sequentially - `npm run test:proc` — run multi-process - `npm run ts-check` — TypeScript type checking - `npm run lint` — Prettier check - `npm run lint:fix` — Prettier write - `npm run verify-build` — quick smoke test - `npm run rebuild` — rebuild the native addon (release) - `npm run rebuild:dev` — rebuild the native addon (debug) ## File layout - Entry point: `re2.js` + `re2.d.ts` - C++ addon: `lib/*.cc`, `lib/*.h` - Build config: `binding.gyp` - Tests: `tests/test-*.mjs` - TypeScript tests: `ts-tests/test-*.ts` - Benchmarks: `bench/` - Vendored deps: `vendor/re2/`, `vendor/abseil-cpp/` (git submodules) - CI: `.github/workflows/`, `.github/actions/` ## When reading the codebase - Start with `ARCHITECTURE.md` for the module map and dependency graph. - `re2.d.ts` is the best API reference for the public API. It includes `internalSource` and Buffer overloads. - `re2.js` is tiny — read it first for the JS-side setup. - `lib/addon.cc` shows how all C++ methods are registered. - `lib/wrapped_re2.h` defines the core C++ class. ================================================ FILE: AGENTS.md ================================================ # AGENTS.md — node-re2 > `node-re2` provides Node.js bindings for [RE2](https://github.com/google/re2): a fast, safe alternative to backtracking regular expression engines. The npm package name is `re2`. 
It is a C++ native addon built with `node-gyp` and `nan`. For project structure, module dependencies, and the architecture overview see [ARCHITECTURE.md](./ARCHITECTURE.md). For detailed usage docs see the [README](./README.md) and the [wiki](https://github.com/uhop/node-re2/wiki). ## Setup This project uses git submodules for vendored dependencies (RE2 and Abseil): ```bash git clone --recursive git@github.com:uhop/node-re2.git cd node-re2 npm install ``` If the native addon fails to download a prebuilt artifact, it builds locally via `node-gyp`. ## Commands - **Install:** `npm install` (downloads prebuilt artifact or builds from source) - **Build (release):** `npm run rebuild` (or `node-gyp -j max rebuild`) - **Build (debug):** `npm run rebuild:dev` (or `node-gyp -j max rebuild --debug`) - **Test:** `npm test` (runs `tape6 --flags FO`, worker threads) - **Test (sequential):** `npm run test:seq` - **Test (multi-process):** `npm run test:proc` - **Test (single file):** `node tests/test-.mjs` - **TypeScript check:** `npm run ts-check` - **Lint:** `npm run lint` (Prettier check) - **Lint fix:** `npm run lint:fix` (Prettier write) - **Verify build:** `npm run verify-build` ## Project structure ``` node-re2/ ├── package.json # Package config; "tape6" section configures test discovery ├── binding.gyp # node-gyp build configuration for the C++ addon ├── re2.js # Main entry point: loads native addon, sets up Symbol aliases ├── re2.d.ts # TypeScript declarations for the public API ├── tsconfig.json # TypeScript config (noEmit, strict, types: ["node"]) ├── lib/ # C++ source code (native addon) │ ├── addon.cc # Node.js addon initialization, method registration │ ├── wrapped_re2.h # WrappedRE2 class definition (core C++ wrapper) │ ├── wrapped_re2_set.h # WrappedRE2Set class definition (RE2.Set wrapper) │ ├── isolate_data.h # Per-isolate data struct for thread-safe addon state │ ├── new.cc # Constructor: parse pattern/flags, create RE2 instance │ ├── exec.cc # 
RE2.prototype.exec() implementation │ ├── test.cc # RE2.prototype.test() implementation │ ├── match.cc # RE2.prototype.match() implementation │ ├── replace.cc # RE2.prototype.replace() implementation │ ├── search.cc # RE2.prototype.search() implementation │ ├── split.cc # RE2.prototype.split() implementation │ ├── to_string.cc # RE2.prototype.toString() implementation │ ├── accessors.cc # Property accessors (source, flags, lastIndex, etc.) │ ├── pattern.cc # Pattern translation (RegExp → RE2 syntax, Unicode classes) │ ├── set.cc # RE2.Set implementation (multi-pattern matching) │ ├── util.cc # Shared utilities (UTF-8/UTF-16 conversion, buffer helpers) │ ├── util.h # Utility declarations │ └── pattern.h # Pattern translation declarations ├── scripts/ │ └── verify-build.js # Quick smoke test for the built addon ├── tests/ # Test files (test-*.mjs using tape-six) ├── ts-tests/ # TypeScript type-checking tests │ └── test-types.ts # Verifies type declarations compile correctly ├── bench/ # Benchmarks ├── vendor/ # Vendored C++ dependencies (git submodules) │ ├── re2/ # Google RE2 library source │ └── abseil-cpp/ # Abseil C++ library (RE2 dependency) └── .github/ # CI workflows, Dependabot config, actions ``` ## Code style - **CommonJS** throughout (`"type": "commonjs"` in package.json). - **No transpilation** — JavaScript code runs directly. - **C++ code** uses tabs for indentation, 4-wide. JavaScript uses 2-space indentation. - **Prettier** for JS/TS formatting (see `.prettierrc`): 80 char width, single quotes, no bracket spacing, no trailing commas, arrow parens "avoid". - **nan** (Native Abstractions for Node.js) for the C++ addon API. - Semicolons are enforced by Prettier (default `semi: true`). - Imports use `require()` syntax in source, `import` in tests (`.mjs`). ## Critical rules - **Do not modify vendored code.** Never edit files under `vendor/`. They are git submodules. - **Do not modify or delete test expectations** without understanding why they changed. 
- **Do not add comments or remove comments** unless explicitly asked. - **Keep `re2.js` and `re2.d.ts` in sync.** All public API exposed from `re2.js` must be typed in `re2.d.ts`. - **The addon must build on all supported platforms:** Linux (x64, arm64, Alpine), macOS (x64, arm64), Windows (x64, arm64). - **RE2 is always Unicode-mode.** The `u` flag is always added implicitly. - **Buffer support is a first-class feature.** All methods that accept strings must also accept Buffers, returning Buffers when given Buffer input. ## Architecture - `re2.js` is the main entry point. It loads the native C++ addon from `build/Release/re2.node` and sets up `Symbol.match`, `Symbol.search`, `Symbol.replace`, `Symbol.split`, and `Symbol.matchAll` on the prototype. - The C++ addon (`lib/*.cc`) wraps Google's RE2 library via nan. Each RegExp method has its own `.cc` file. - `lib/new.cc` handles construction: parsing patterns, translating RegExp syntax to RE2 syntax (via `lib/pattern.cc`), and creating the underlying `re2::RE2` instance. - `lib/pattern.cc` translates JavaScript RegExp features to RE2 equivalents, including Unicode class names (`\p{Letter}` → `\p{L}`, `\p{Script=Latin}` → `\p{Latin}`). - `lib/set.cc` implements `RE2.Set` for multi-pattern matching using `re2::RE2::Set`. - `lib/util.cc` provides UTF-8 ↔ UTF-16 conversion helpers and buffer utilities. - Prebuilt native artifacts are hosted on GitHub Releases and downloaded at install time via `install-artifact-from-github`. ## Writing tests ```js import test from 'tape-six'; import {RE2} from '../re2.js'; test('example', t => { const re = new RE2('a(b*)', 'i'); const result = re.exec('aBbC'); t.ok(result); t.equal(result[0], 'aBb'); t.equal(result[1], 'Bb'); }); ``` - Test files use `tape-six`: `.mjs` for runtime tests, `.ts` for TypeScript typing tests. - Test file naming convention: `test-*.mjs` in `tests/`, `test-*.ts` in `ts-tests/`. - Tests are configured in `package.json` under the `"tape6"` section. 
- Test files should be directly executable: `node tests/test-foo.mjs`. ## Key conventions - The library is a drop-in replacement for `RegExp` — the `RE2` object emulates the standard `RegExp` API. - `RE2.Set` provides multi-pattern matching: `new RE2.Set(patterns, flags, options)`. - Static helpers: `RE2.getUtf8Length(str)`, `RE2.getUtf16Length(buf)`. - `RE2.unicodeWarningLevel` controls behavior when non-Unicode regexps are created. - The `install` script tries to download a prebuilt `.node` artifact before falling back to `node-gyp rebuild`. - All C++ source is in `lib/`, all vendored third-party C++ is in `vendor/`. ================================================ FILE: ARCHITECTURE.md ================================================ # Architecture `node-re2` provides Node.js bindings for Google's [RE2](https://github.com/google/re2) regular expression engine. It is a C++ native addon built with `node-gyp` and `nan`. The `RE2` object is a drop-in replacement for `RegExp` with guaranteed linear-time matching (no ReDoS). 
## Project layout ``` package.json # Package config; "tape6" section configures test discovery binding.gyp # node-gyp build configuration for the C++ addon re2.js # Main entry point: loads native addon, sets up Symbol aliases re2.d.ts # TypeScript declarations for the public API tsconfig.json # TypeScript config (noEmit, strict, types: ["node"]) lib/ # C++ source code (native addon) ├── addon.cc # Node.js addon initialization, method registration ├── wrapped_re2.h # WrappedRE2 class definition (core C++ wrapper) ├── wrapped_re2_set.h # WrappedRE2Set class definition (RE2.Set wrapper) ├── isolate_data.h # Per-isolate data struct for thread-safe addon state ├── new.cc # Constructor: parse pattern/flags, create RE2 instance ├── exec.cc # RE2.prototype.exec() implementation ├── test.cc # RE2.prototype.test() implementation ├── match.cc # RE2.prototype.match() implementation ├── replace.cc # RE2.prototype.replace() implementation ├── search.cc # RE2.prototype.search() implementation ├── split.cc # RE2.prototype.split() implementation ├── to_string.cc # RE2.prototype.toString() implementation ├── accessors.cc # Property accessors (source, flags, lastIndex, etc.) ├── pattern.cc # Pattern translation (RegExp → RE2 syntax, Unicode classes) ├── pattern.h # Pattern translation declarations ├── set.cc # RE2.Set implementation (multi-pattern matching) ├── util.cc # Shared utilities (UTF-8/UTF-16 conversion, buffer helpers) └── util.h # Utility declarations scripts/ └── verify-build.js # Quick smoke test for the built addon tests/ # Test files (test-*.mjs using tape-six) ts-tests/ # TypeScript type-checking tests └── test-types.ts # Verifies type declarations compile correctly bench/ # Benchmarks vendor/ # Vendored C++ dependencies (git submodules) — DO NOT MODIFY ├── re2/ # Google RE2 library source └── abseil-cpp/ # Abseil C++ library (RE2 dependency) .github/ # CI workflows, Dependabot config, actions ``` ## Core concepts ### How the addon works 1. 
`re2.js` is the entry point. It loads the compiled C++ addon from `build/Release/re2.node`. 2. The addon exposes an `RE2` constructor that wraps `re2::RE2` from Google's RE2 library. 3. `re2.js` adds `Symbol.match`, `Symbol.search`, `Symbol.replace`, `Symbol.split`, and `Symbol.matchAll` to the prototype so `RE2` instances work with ES6 string methods. 4. The `RE2` constructor can be called with or without `new` (factory mode). ### C++ addon structure Each RegExp method has its own `.cc` file for maintainability: | File | Purpose | | --------------- | ---------------------------------------------------------------- | | `addon.cc` | Node.js module initialization, registers all methods/accessors | | `isolate_data.h` | Per-isolate data struct (`AddonData`) for thread-safe addon state | | `wrapped_re2.h` | `WrappedRE2` class: holds `re2::RE2*`, flags, lastIndex, source | | `new.cc` | Constructor: parses pattern + flags, translates syntax, creates RE2 instance | | `exec.cc` | `exec()` — find match with capture groups | | `test.cc` | `test()` — boolean match check | | `match.cc` | `match()` — String.prototype.match equivalent | | `replace.cc` | `replace()` — substitution with string or function replacer | | `search.cc` | `search()` — find index of first match | | `split.cc` | `split()` — split string by pattern | | `to_string.cc` | `toString()` — `/pattern/flags` representation | | `accessors.cc` | Property getters: `source`, `flags`, `lastIndex`, `global`, `ignoreCase`, `multiline`, `dotAll`, `unicode`, `sticky`, `hasIndices`, `internalSource` | | `pattern.cc` | Translates JS RegExp syntax to RE2 syntax, maps Unicode property names | | `set.cc` | `RE2.Set` — multi-pattern matching via `re2::RE2::Set` | | `util.cc` | UTF-8 ↔ UTF-16 conversion, buffer/string helpers | ### Pattern translation (pattern.cc) JavaScript RegExp features are translated to RE2 equivalents: - Named groups: `(?<name>...)` syntax is preserved (RE2 supports it natively).
- Unicode classes: long names like `\p{Letter}` are mapped to short names `\p{L}`. Script names like `\p{Script=Latin}` are mapped to `\p{Latin}`. - Backreferences and lookahead assertions are **not supported** — RE2 throws `SyntaxError`. ### Buffer support All methods accept both strings and Node.js Buffers: - Buffer inputs are assumed UTF-8 encoded. - Buffer inputs produce Buffer outputs (in composite result objects too). - Offsets and lengths are in bytes (not characters) when using Buffers. - The `useBuffers` property on replacer functions controls offset reporting in `replace()`. ### RE2.Set (set.cc) Multi-pattern matching using `re2::RE2::Set`: - `new RE2.Set(patterns, flags?, options?)` — compile multiple patterns into a single automaton. - `set.test(str)` — returns `true` if any pattern matches. - `set.match(str)` — returns array of indices of matching patterns. - Properties: `size`, `source`, `sources`, `flags`, `anchor`. ### Build system - `binding.gyp` defines the node-gyp build: compiles all `.cc` files in `lib/` plus vendored RE2 and Abseil sources. - Platform-specific compiler flags are set for GCC, Clang, and MSVC. - The `install` npm script first tries to download a prebuilt `re2.node` from GitHub Releases via `install-artifact-from-github`, falling back to a local `node-gyp rebuild`. - Prebuilt artifacts cover: Linux (x64, arm64, Alpine/musl), macOS (x64, arm64), Windows (x64, arm64). 
## Module dependency graph ``` re2.js ──→ build/Release/re2.node (compiled C++ addon) │ ├── lib/addon.cc (init) │ ├── lib/new.cc ──→ lib/pattern.cc │ ├── lib/exec.cc │ ├── lib/test.cc │ ├── lib/match.cc │ ├── lib/replace.cc │ ├── lib/search.cc │ ├── lib/split.cc │ ├── lib/to_string.cc │ ├── lib/accessors.cc │ └── lib/set.cc │ ├── lib/wrapped_re2.h (shared class definition) ├── lib/wrapped_re2_set.h (RE2.Set class) ├── lib/util.cc / lib/util.h (shared utilities) │ └── vendor/ (re2 + abseil-cpp) ``` ## Testing - **Framework**: tape-six (`tape6`) - **Run all**: `npm test` (worker threads via `tape6 --flags FO`) - **Run sequential**: `npm run test:seq` - **Run multi-process**: `npm run test:proc` - **Run single file**: `node tests/test-.mjs` - **TypeScript check**: `npm run ts-check` - **Lint**: `npm run lint` (Prettier check) - **Lint fix**: `npm run lint:fix` (Prettier write) - **Verify build**: `npm run verify-build` (quick smoke test) ## Import paths ```js // CommonJS (source, scripts) const RE2 = require('re2'); // ESM (tests) import {RE2} from '../re2.js'; ``` ================================================ FILE: CLAUDE.md ================================================ See [AGENTS.md](./AGENTS.md) for all AI agent rules and project conventions. ================================================ FILE: CONTRIBUTING.md ================================================ # Contributing to node-re2 Thank you for your interest in contributing! ## Getting started This project uses git submodules for vendored dependencies (RE2 and Abseil). Clone recursively: ```bash git clone --recursive git@github.com:uhop/node-re2.git cd node-re2 npm install ``` See [ARCHITECTURE.md](./ARCHITECTURE.md) for the module map and dependency graph. ## Development workflow 1. Make your changes. 2. Rebuild the addon: `npm run rebuild` 3. Lint: `npm run lint:fix` 4. Test: `npm test` 5. 
Type-check: `npm run ts-check` ## Code style - CommonJS (`require()`/`module.exports`) in JavaScript source, ESM (`import`) in tests (`.mjs`). - C++ code uses tabs (4-wide indentation). JavaScript uses 2-space indentation. - Formatted with Prettier — see `.prettierrc` for settings. - C++ addon API uses nan (Native Abstractions for Node.js). - Keep `re2.js` and `re2.d.ts` in sync. ## Important notes - Never edit files under `vendor/` — they are git submodules. - RE2 always operates in Unicode mode — the `u` flag is added implicitly. - Buffer support is a first-class feature — all methods must handle both strings and Buffers. ## AI agents If you are an AI coding agent, see [AGENTS.md](./AGENTS.md) for detailed project conventions, commands, and architecture. ================================================ FILE: LICENSE ================================================ This library is available under the terms of the modified BSD license. No external contributions are allowed under licenses which are fundamentally incompatible with the BSD license that this library is distributed under. The text of the BSD license is reproduced below. ------------------------------------------------------------------------------- The "New" BSD License: ********************** Copyright (c) 2005-2026, Eugene Lazutkin All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Eugene Lazutkin nor the names of other contributors may be used to endorse or promote products derived from this software without specific prior written permission. 
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. ================================================ FILE: README.md ================================================ # node-re2 [![NPM version][npm-img]][npm-url] [npm-img]: https://img.shields.io/npm/v/re2.svg [npm-url]: https://npmjs.org/package/re2 This project provides Node.js bindings for [RE2](https://github.com/google/re2): a fast, safe alternative to backtracking regular expression engines written by [Russ Cox](http://swtch.com/~rsc/) in C++. To learn more about RE2, start with [Regular Expression Matching in the Wild](http://swtch.com/~rsc/regexp/regexp3.html). More resources are on his [Implementing Regular Expressions](http://swtch.com/~rsc/regexp/) page. `RE2`'s regular expression language is almost a superset of what `RegExp` provides (see [Syntax](https://github.com/google/re2/wiki/Syntax)), but it lacks backreferences and lookahead assertions. See below for details. `RE2` always works in [Unicode mode](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicode) — character codes are interpreted as Unicode code points, not as binary values of UTF-16. See `RE2.unicodeWarningLevel` below for details. 
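
To illustrate what Unicode mode changes, here is a minimal sketch that uses only the built-in `RegExp`, not `RE2` itself. With the `u` flag, `.` consumes a whole code point; without it, `.` consumes a single UTF-16 code unit. `RE2` is described above as always behaving like the `u` case. The variable names are illustrative.

```js
// Built-in RegExp only; RE2 is not loaded here.
// U+1F600 is outside the BMP, so it occupies two UTF-16 code units.
const face = '\u{1F600}';

// Without `u`, `.` matches one UTF-16 code unit, so a single `.` cannot span the pair.
const withoutU = /^.$/.test(face);

// With `u`, `.` matches one Unicode code point.
const withU = /^.$/u.test(face);

console.log(face.length, withoutU, withU); // 2 false true
```
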
`RE2` emulates standard `RegExp`, making it a practical drop-in replacement in most cases. It also provides `String`-based regular expression methods. The constructor accepts `RegExp` directly, honoring all properties. It can work with [Node.js Buffers](https://nodejs.org/api/buffer.html) directly, reducing overhead and making processing of long files fast. The project is a C++ addon built with [nan](https://github.com/nodejs/nan). It cannot be used in web browsers. All documentation is in this README and in the [wiki](https://github.com/uhop/node-re2/wiki). ## Why use node-re2? The built-in Node.js regular expression engine can run in exponential time with a special combination: - A vulnerable regular expression - "Evil input" This can lead to what is known as a [Regular Expression Denial of Service (ReDoS)](https://www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS). To check if your regular expressions are vulnerable, try one of these projects: - [rxxr2](http://www.cs.bham.ac.uk/~hxt/research/rxxr2/) - [safe-regex](https://github.com/substack/safe-regex) Neither project is perfect. node-re2 protects against ReDoS by evaluating patterns in `RE2` instead of the built-in regex engine. To run the bundled benchmark (make sure node-re2 is built first): ```bash npx nano-bench bench/bad-pattern.mjs ``` ## Standard features `RE2` objects are created just like `RegExp`: * [`new RE2(pattern[, flags])`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp) Supported flags: `g` (global), `i` (ignoreCase), `m` (multiline), `s` (dotAll), `u` (unicode, always on), `y` (sticky), `d` (hasIndices). 
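
As a small sketch of the flags listed above: the snippet assumes the package is installed as `re2` and that `require('re2')` returns the constructor, as in the CommonJS import shown in `ARCHITECTURE.md`. The fallback to the built-in `RegExp` is an assumption of this example only, added so that it runs without the native addon; it relies on the drop-in compatibility described earlier.

```js
// Load RE2 if available; the RegExp fallback is only there so this sketch
// runs without the native addon.
let RE2;
try {
  RE2 = require('re2');
} catch {
  RE2 = RegExp;
}

// g = global, i = ignoreCase, s = dotAll; `u` is passed explicitly here,
// although RE2 treats it as always on.
const re = new RE2('a.c', 'gisu');

console.log(re.global, re.ignoreCase, re.dotAll, re.unicode);

// With `s`, `.` also matches a newline; with `i`, case is ignored.
const matched = re.test('A\nC');
console.log(matched);
```

Because the `g` flag is set, `test()` advances `lastIndex`, so the result is stored once rather than calling `test()` repeatedly on the same input.
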
Supported properties:

* [`re2.lastIndex`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/lastIndex)
* [`re2.global`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/global)
* [`re2.ignoreCase`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/ignoreCase)
* [`re2.multiline`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/multiline)
* [`re2.dotAll`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/dotAll)
* [`re2.unicode`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicode) (always `true`; see details below)
* [`re2.sticky`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/sticky)
* [`re2.hasIndices`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/hasIndices)
* [`re2.source`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/source)
* [`re2.flags`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/flags)

Supported methods:

* [`re2.exec(str)`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/exec)
* [`re2.test(str)`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/test)
* [`re2.toString()`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/toString)

Well-known symbol-based methods are supported (see [Symbols](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Symbol)):

* [`re2[Symbol.match](str)`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Symbol/match)
* [`re2[Symbol.matchAll](str)`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Symbol/matchAll)
* [`re2[Symbol.search](str)`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Symbol/search)
* [`re2[Symbol.replace](str, newSubStr|function)`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Symbol/replace)
* [`re2[Symbol.split](str[, limit])`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Symbol/split)

This lets you use `RE2` instances on strings directly, just like `RegExp`:

```js
const re = new RE2('1');
'213'.match(re); // [ '1', index: 1, input: '213' ]
'213'.search(re); // 1
'213'.replace(re, '+'); // 2+3
'213'.split(re); // [ '2', '3' ]

Array.from('2131'.matchAll(new RE2('1', 'g'))); // matchAll requires the g flag
// [['1', index: 1, input: '2131'], ['1', index: 3, input: '2131']]
```

[Named groups](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Named_capturing_group) are supported.

## Extensions

### Shortcut construction

`RE2` can be created from a regular expression:

```js
const re1 = new RE2(/ab*/ig); // from a RegExp object
const re2 = new RE2(re1); // from another RE2 object
```

### `String` methods

`RE2` provides the standard `String` regex methods with swapped receiver and argument:

* `re2.match(str)`
  * See [`str.match(regexp)`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/match)
* `re2.replace(str, newSubStr|function)`
  * See [`str.replace(regexp, newSubStr|function)`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace)
* `re2.search(str)`
  * See [`str.search(regexp)`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/search)
* `re2.split(str[, limit])`
  * See [`str.split(regexp[, limit])`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/split)

These methods are also available as well-known symbol-based methods for transparent use with ES6 string/regex
machinery.

### `Buffer` support

Most methods accept Buffers instead of strings for direct UTF-8 processing:

* `re2.exec(buf)`
* `re2.test(buf)`
* `re2.match(buf)`
* `re2.search(buf)`
* `re2.split(buf[, limit])`
* `re2.replace(buf, replacer)`

Differences from string-based versions:

* All buffers are assumed to be encoded as [UTF-8](https://en.wikipedia.org/wiki/UTF-8) (ASCII is a proper subset of UTF-8).
* Results are `Buffer` objects, even in composite objects. Convert with [`buf.toString()`](https://nodejs.org/api/buffer.html#buffer_buf_tostring_encoding_start_end).
* All offsets and lengths are in bytes, not characters (each UTF-8 character occupies 1–4 bytes). This lets you slice buffers directly without costly character-to-byte recalculations.

When `re2.replace()` is used with a replacer function, the replacer receives string arguments and character offsets by default. Set `useBuffers` to `true` on the function to receive byte offsets instead:

```js
function strReplacer(match, offset, input) {
  // typeof match == "string"
  return "<= " + offset + " characters|";
}

RE2("б").replace("абв", strReplacer);
// "а<= 1 characters|в"

function bufReplacer(match, offset, input) {
  // typeof match == "string"
  return "<= " + offset + " bytes|";
}
bufReplacer.useBuffers = true;

RE2("б").replace("абв", bufReplacer);
// "а<= 2 bytes|в"
```

This works for both string and buffer inputs. Buffer input produces buffer output; string input produces string output.

### `RE2.Set`

Use `RE2.Set` when the same string must be tested against many patterns. It builds a single automaton and frequently beats running individual regular expressions one by one.

While `test()` can be simulated by combining patterns with `|`, `match()` returns which patterns matched, something a single regular expression cannot do.

* `new RE2.Set(patterns[, flagsOrOptions][, options])`
  * `patterns` is any iterable of strings, `Buffer`s, `RegExp`, or `RE2` instances; flags (if provided) apply to the whole set.
  * `flagsOrOptions` can be a string/`Buffer` with standard flags (`i`, `m`, `s`, `u`, `g`, `y`, `d`).
  * `options.anchor` can be `'unanchored'` (default), `'start'`, or `'both'`.
* `set.test(str)` returns `true` if any pattern matches and `false` otherwise.
* `set.match(str)` returns an array of indexes of matching patterns.
  * This is an array of integer indices of patterns that matched sorted in ascending order.
  * If no patterns matched, an empty array is returned.
* Read-only properties:
  * `set.size` (number of patterns), `set.flags` (`RegExp` flags as a string), `set.anchor` (anchor mode as a string)
  * `set.source` (all patterns joined with `|` as a string), `set.sources` (individual pattern sources as an array of strings)

It is based on [RE2::Set](https://github.com/google/re2/blob/main/re2/set.h).

Example:

```js
const routes = new RE2.Set([
  '^/users/\\d+$',
  '^/posts/\\d+$'
], 'i', {anchor: 'start'});

routes.test('/users/7'); // true
routes.match('/posts/42'); // [1]
routes.sources; // ['^/users/\\d+$', '^/posts/\\d+$']
routes.toString(); // '/^/users/\\d+$|^/posts/\\d+$/iu'
```

To run the bundled benchmark (make sure node-re2 is built first):

```bash
npx nano-bench bench/set-match.mjs
```

### Calculate length

Two helpers convert between UTF-8 and UTF-16 sizes:

* `RE2.getUtf8Length(str)`: byte size needed to encode a string as a UTF-8 buffer.
* `RE2.getUtf16Length(buf)`: character count needed to decode a UTF-8 buffer as a string.

### Property: `internalSource`

`source` emulates the standard `RegExp` property and can recreate an identical `RE2` or `RegExp` instance. To inspect the RE2-translated pattern (useful for debugging), use the read-only `internalSource` property.

### Unicode warning level

`RE2` always works in Unicode mode. In most cases this is either invisible or preferred. For applications that need tight control, the static property `RE2.unicodeWarningLevel` governs what happens when a non-Unicode regular expression is created.
If a regular expression lacks the `u` flag, it is added silently by default:

```js
const x = /./;
x.flags; // ''
const y = new RE2(x);
y.flags; // 'u'
```

Values of `RE2.unicodeWarningLevel`:

* `'nothing'` (default): silently add `u`.
* `'warnOnce'`: warn once, then silently add `u`. Assigning this value resets the one-time flag.
* `'warn'`: warn every time, still add `u`.
* `'throw'`: throw `SyntaxError`.
* Any other value is silently ignored, leaving the previous value unchanged.

Warnings and exceptions help audit an application for stray non-Unicode regular expressions.

`RE2.unicodeWarningLevel` is global. Be careful in multi-threaded environments: it is shared across threads.

## How to install

```bash
npm install re2
```

The project works with other package managers but is not tested with them. See the wiki for notes on [yarn](https://github.com/uhop/node-re2/wiki/Using-with-yarn) and [pnpm](https://github.com/uhop/node-re2/wiki/Using-with-pnpm).

### Precompiled artifacts

The [install script](https://github.com/uhop/install-artifact-from-github/blob/master/bin/install-from-cache.js) attempts to download a prebuilt artifact from GitHub Releases. Override the download location with the `RE2_DOWNLOAD_MIRROR` environment variable.

If the download fails, the script builds RE2 locally using [node-gyp](https://github.com/nodejs/node-gyp).

## How to use

It is used just like `RegExp`.
```js
const RE2 = require('re2');

// with default flags
let re = new RE2('a(b*)');
let result = re.exec('abbc');
console.log(result[0]); // 'abb'
console.log(result[1]); // 'bb'

result = re.exec('aBbC');
console.log(result[0]); // 'a'
console.log(result[1]); // ''

// with explicit flags
re = new RE2('a(b*)', 'i');
result = re.exec('aBbC');
console.log(result[0]); // 'aBb'
console.log(result[1]); // 'Bb'

// from regular expression object
const regexp = new RegExp('a(b*)', 'i');
re = new RE2(regexp);
result = re.exec('aBbC');
console.log(result[0]); // 'aBb'
console.log(result[1]); // 'Bb'

// from regular expression literal
re = new RE2(/a(b*)/i);
result = re.exec('aBbC');
console.log(result[0]); // 'aBb'
console.log(result[1]); // 'Bb'

// from another RE2 object
const rex = new RE2(re);
result = rex.exec('aBbC');
console.log(result[0]); // 'aBb'
console.log(result[1]); // 'Bb'

// shortcut
result = new RE2('ab*').exec('abba');

// factory
result = RE2('ab*').exec('abba');
```

## Limitations (things RE2 does not support)

`RE2` avoids any regular expression features that require worst-case exponential time to evaluate. The most notable missing features are backreferences and lookahead assertions. If your application uses them, you should continue to use `RegExp`, but since they are fundamentally vulnerable to [ReDoS](https://www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS), consider replacing them.

`RE2` throws `SyntaxError` for unsupported features. Wrap `RE2` declarations in a try-catch to fall back to `RegExp`:

```js
let re = /(a)+(b)*/;
try {
  re = new RE2(re);
  // use RE2 as a drop-in replacement
} catch (e) {
  // use the original RegExp
}
const result = re.exec(sample);
```

`RE2` may also behave differently from the built-in engine in corner cases.

### Backreferences

`RE2` does not support backreferences: numbered references to previously matched groups (`\1`, `\2`, etc.).
Example:

```js
/(cat|dog)\1/.test("catcat"); // true
/(cat|dog)\1/.test("dogdog"); // true
/(cat|dog)\1/.test("catdog"); // false
/(cat|dog)\1/.test("dogcat"); // false
```

### Lookahead assertions

`RE2` does not support lookahead assertions, which make a match depend on subsequent contents.

```js
/abc(?=def)/; // match abc only if it is followed by def
/abc(?!def)/; // match abc only if it is not followed by def
```

### Mismatched behavior

`RE2` and the built-in engine may disagree in edge cases. Verify your regular expressions before switching. They should work in the vast majority of cases.

Example:

```js
const RE2 = require('re2');

const pattern = '(?:(a)|(b)|(c))+';

const built_in = new RegExp(pattern);
const re2 = new RE2(pattern);

const input = 'abc';

const bi_res = built_in.exec(input);
const re2_res = re2.exec(input);

console.log('bi_res: ' + bi_res); // prints: bi_res: abc,,,c
console.log('re2_res : ' + re2_res); // prints: re2_res : abc,a,b,c
```

### Unicode

`RE2` always works in Unicode mode. See `RE2.unicodeWarningLevel` above for details.

#### Unicode classes `\p{...}` and `\P{...}`

`RE2` supports a subset of Unicode classes as defined in [RE2 Syntax](https://github.com/google/re2/wiki/Syntax). Google RE2 natively supports only short names (e.g., `L` for `Letter`). Like `RegExp`, node-re2 also accepts long names by translating them to short names.

Only the `\p{name}` form is supported, not `\p{name=value}` in general. The exception is `Script` and `sc`, e.g., `\p{Script=Latin}` and `\p{sc=Cyrillic}`.

The same applies to `\P{...}`.

## Release history

- 1.24.0 *Fixed multi-threaded crash in worker threads (#235). Added named import: `import {RE2} from 're2'`. Added CJS test. Updated docs and dependencies.*
- 1.23.3 *Updated Abseil and dev dependencies.*
- 1.23.2 *Updated dev dependencies.*
- 1.23.1 *Updated Abseil and dev dependencies.*
- 1.23.0 *Updated all dependencies, upgraded tooling.
New feature: `RE2.Set` (thx, [Wes](https://github.com/wrmedford)).*
- 1.22.3 *Technical release: upgraded QEMU emulations to native ARM runners to speed up the build process.*
- 1.22.2 *Updated all dependencies and the list of pre-compiled targets: Node 20, 22, 24, 25 (thx, [Jiayu Liu](https://github.com/jimexist)).*
- 1.22.1 *Added support for translation of scripts as Unicode classes.*
- 1.22.0 *Added support for translation of Unicode classes (thx, [John Livingston](https://github.com/JohnXLivingston)). Added [attestations](https://github.com/uhop/node-re2/attestations).*
- 1.21.5 *Updated all dependencies and the list of pre-compiled targets. Fixed minor bugs. C++ style fix (thx, [Benjamin Brienen](https://github.com/BenjaminBrienen)). Added Windows 11 ARM build runner (thx, [Kagami Sascha Rosylight](https://github.com/saschanaz)).*
- 1.21.4 *Fixed a regression reported by [caroline-matsec](https://github.com/caroline-matsec), thx! Added pre-compilation targets for Alpine Linux on ARM. Updated deps.*
- 1.21.3 *Fixed an empty string regression reported by [Rhys Arkins](https://github.com/rarkins), thx! Updated deps.*
- 1.21.2 *Fixed another memory regression reported by [matthewvalentine](https://github.com/matthewvalentine), thx! Updated deps. Added more tests and benchmarks.*
- 1.21.1 *Fixed a memory regression reported by [matthewvalentine](https://github.com/matthewvalentine), thx! Updated deps.*
- 1.21.0 *Fixed the performance problem reported by [matthewvalentine](https://github.com/matthewvalentine) (thx!). The change improves performance for multiple use cases.*
- 1.20.12 *Updated deps. Maintenance chores. Fixes for buffer-related bugs: `exec()` index (reported by [matthewvalentine](https://github.com/matthewvalentine), thx) and `match()` index.*
- 1.20.11 *Updated deps. Added support for Node 22 (thx, [Elton Leong](https://github.com/eltonkl)).*
- 1.20.10 *Updated deps.
Removed files the pack used for development (thx, [Haruaki OTAKE](https://github.com/aaharu)). Added arm64 Linux prebuilds (thx, [Christopher M](https://github.com/cmanou)). Fixed non-`npm` `corepack` problem (thx, [Steven](https://github.com/styfle)).*
- 1.20.9 *Updated deps. Added more `abseil-cpp` files that manifested itself on NixOS. Thx, [Laura Hausmann](https://github.com/zotanmew).*
- 1.20.8 *Updated deps: `install-artifact-from-github`. A default HTTPS agent is used for fetching precompiled artifacts avoiding unnecessary long wait times.*
- 1.20.7 *Added more `abseil-cpp` files that manifested itself on ARM Alpine. Thx, [Laura Hausmann](https://github.com/zotanmew).*
- 1.20.6 *Updated deps, notably `node-gyp`.*
- 1.20.5 *Updated deps, added Node 21 and retired Node 16 as pre-compilation targets.*
- 1.20.4 *Updated deps. Fix: the 2nd argument of the constructor overrides flags. Thx, [gost-serb](https://github.com/gost-serb).*
- 1.20.3 *Fix: subsequent numbers are incorporated into group if they would form a legal group reference. Thx, [Oleksii Vasyliev](https://github.com/le0pard).*
- 1.20.2 *Fix: added a missing C++ file, which caused a bug on Alpine Linux. Thx, [rbitanga-manticore](https://github.com/rbitanga-manticore).*
- 1.20.1 *Fix: files included in the npm package to build the C++ code.*
- 1.20.0 *Updated RE2. New version uses `abseil-cpp` and required the adaptation work. Thx, [Stefano Rivera](https://github.com/stefanor).*

The rest can be consulted in the project's wiki [Release history](https://github.com/uhop/node-re2/wiki/Release-history).
## License

BSD-3-Clause

================================================
FILE: bench/bad-pattern.mjs
================================================
import {RE2} from '../re2.js';

const BAD_PATTERN = '([a-z]+)+$';
const BAD_INPUT = 'a'.repeat(10) + '!';

const regExp = new RegExp(BAD_PATTERN);
const re2 = new RE2(BAD_PATTERN);

export default {
  RegExp: n => {
    let count = 0;
    for (let i = 0; i < n; ++i) {
      if (regExp.test(BAD_INPUT)) ++count;
    }
    return count;
  },
  RE2: n => {
    let count = 0;
    for (let i = 0; i < n; ++i) {
      if (re2.test(BAD_INPUT)) ++count;
    }
    return count;
  }
};

================================================
FILE: bench/set-match.mjs
================================================
import {RE2} from '../re2.js';

const PATTERN_COUNT = 200;

const patterns = [];
for (let i = 0; i < PATTERN_COUNT; ++i) {
  patterns.push('token' + i + '(?:[a-z]+)?');
}

const INPUT_COUNT = 500;

const inputs = [];
for (let j = 0; j < INPUT_COUNT; ++j) {
  inputs.push(
    'xx' + (j % PATTERN_COUNT) + ' ' + (j & 7) + ' token' + (j % PATTERN_COUNT) + ' tail'
  );
}

const re2Set = new RE2.Set(patterns);
const re2List = patterns.map(p => new RE2(p));
const jsList = patterns.map(p => new RegExp(p));

export default {
  RegExp: n => {
    let count = 0;
    for (let i = 0; i < n; ++i) {
      for (const input of inputs) {
        const matches = [];
        for (const pattern of jsList) {
          if (pattern.test(input)) matches.push(pattern);
        }
        count += matches.length;
      }
    }
    return count;
  },
  RE2: n => {
    let count = 0;
    for (let i = 0; i < n; ++i) {
      for (const input of inputs) {
        const matches = [];
        for (const pattern of re2List) {
          if (pattern.test(input)) matches.push(pattern);
        }
        count += matches.length;
      }
    }
    return count;
  },
  'RE2.Set': n => {
    let count = 0;
    for (let i = 0; i < n; ++i) {
      for (const input of inputs) {
        const matches = re2Set.match(input);
        count += matches.length;
      }
    }
    return count;
  }
};

================================================
FILE: binding.gyp
================================================
{
  "targets": [
    {
"target_name": "re2", "sources": [ "lib/addon.cc", "lib/accessors.cc", "lib/pattern.cc", "lib/util.cc", "lib/new.cc", "lib/exec.cc", "lib/test.cc", "lib/match.cc", "lib/replace.cc", "lib/search.cc", "lib/split.cc", "lib/to_string.cc", "lib/set.cc", "vendor/re2/re2/bitmap256.cc", "vendor/re2/re2/bitstate.cc", "vendor/re2/re2/compile.cc", "vendor/re2/re2/dfa.cc", "vendor/re2/re2/filtered_re2.cc", "vendor/re2/re2/mimics_pcre.cc", "vendor/re2/re2/nfa.cc", "vendor/re2/re2/onepass.cc", "vendor/re2/re2/parse.cc", "vendor/re2/re2/perl_groups.cc", "vendor/re2/re2/prefilter.cc", "vendor/re2/re2/prefilter_tree.cc", "vendor/re2/re2/prog.cc", "vendor/re2/re2/re2.cc", "vendor/re2/re2/regexp.cc", "vendor/re2/re2/set.cc", "vendor/re2/re2/simplify.cc", "vendor/re2/re2/tostring.cc", "vendor/re2/re2/unicode_casefold.cc", "vendor/re2/re2/unicode_groups.cc", "vendor/re2/util/pcre.cc", "vendor/re2/util/rune.cc", "vendor/re2/util/strutil.cc", "vendor/abseil-cpp/absl/base/internal/cycleclock.cc", "vendor/abseil-cpp/absl/base/internal/low_level_alloc.cc", "vendor/abseil-cpp/absl/base/internal/raw_logging.cc", "vendor/abseil-cpp/absl/base/internal/spinlock.cc", "vendor/abseil-cpp/absl/base/internal/spinlock_wait.cc", "vendor/abseil-cpp/absl/base/internal/strerror.cc", "vendor/abseil-cpp/absl/base/internal/sysinfo.cc", "vendor/abseil-cpp/absl/base/internal/thread_identity.cc", "vendor/abseil-cpp/absl/base/internal/throw_delegate.cc", "vendor/abseil-cpp/absl/base/internal/unscaledcycleclock.cc", "vendor/abseil-cpp/absl/container/internal/hashtablez_sampler.cc", "vendor/abseil-cpp/absl/container/internal/hashtablez_sampler_force_weak_definition.cc", "vendor/abseil-cpp/absl/container/internal/raw_hash_set.cc", "vendor/abseil-cpp/absl/debugging/internal/borrowed_fixup_buffer.cc", "vendor/abseil-cpp/absl/debugging/internal/decode_rust_punycode.cc", "vendor/abseil-cpp/absl/debugging/internal/demangle.cc", "vendor/abseil-cpp/absl/debugging/internal/demangle_rust.cc", 
"vendor/abseil-cpp/absl/debugging/internal/address_is_readable.cc", "vendor/abseil-cpp/absl/debugging/internal/elf_mem_image.cc", "vendor/abseil-cpp/absl/debugging/internal/examine_stack.cc", "vendor/abseil-cpp/absl/debugging/internal/utf8_for_code_point.cc", "vendor/abseil-cpp/absl/debugging/internal/vdso_support.cc", "vendor/abseil-cpp/absl/debugging/stacktrace.cc", "vendor/abseil-cpp/absl/debugging/symbolize.cc", "vendor/abseil-cpp/absl/flags/commandlineflag.cc", "vendor/abseil-cpp/absl/flags/internal/commandlineflag.cc", "vendor/abseil-cpp/absl/flags/internal/flag.cc", "vendor/abseil-cpp/absl/flags/internal/private_handle_accessor.cc", "vendor/abseil-cpp/absl/flags/internal/program_name.cc", "vendor/abseil-cpp/absl/flags/marshalling.cc", "vendor/abseil-cpp/absl/flags/reflection.cc", "vendor/abseil-cpp/absl/flags/usage_config.cc", "vendor/abseil-cpp/absl/hash/internal/city.cc", "vendor/abseil-cpp/absl/hash/internal/hash.cc", "vendor/abseil-cpp/absl/log/internal/globals.cc", "vendor/abseil-cpp/absl/log/internal/log_format.cc", "vendor/abseil-cpp/absl/log/internal/log_message.cc", "vendor/abseil-cpp/absl/log/internal/log_sink_set.cc", "vendor/abseil-cpp/absl/log/internal/nullguard.cc", "vendor/abseil-cpp/absl/log/internal/proto.cc", "vendor/abseil-cpp/absl/log/internal/structured_proto.cc", "vendor/abseil-cpp/absl/log/globals.cc", "vendor/abseil-cpp/absl/log/log_sink.cc", "vendor/abseil-cpp/absl/numeric/int128.cc", "vendor/abseil-cpp/absl/strings/ascii.cc", "vendor/abseil-cpp/absl/strings/charconv.cc", "vendor/abseil-cpp/absl/strings/internal/charconv_bigint.cc", "vendor/abseil-cpp/absl/strings/internal/charconv_parse.cc", "vendor/abseil-cpp/absl/strings/internal/memutil.cc", "vendor/abseil-cpp/absl/strings/internal/str_format/arg.cc", "vendor/abseil-cpp/absl/strings/internal/str_format/bind.cc", "vendor/abseil-cpp/absl/strings/internal/str_format/extension.cc", "vendor/abseil-cpp/absl/strings/internal/str_format/float_conversion.cc", 
"vendor/abseil-cpp/absl/strings/internal/str_format/output.cc", "vendor/abseil-cpp/absl/strings/internal/str_format/parser.cc", "vendor/abseil-cpp/absl/strings/internal/utf8.cc", "vendor/abseil-cpp/absl/strings/match.cc", "vendor/abseil-cpp/absl/strings/numbers.cc", "vendor/abseil-cpp/absl/strings/str_cat.cc", "vendor/abseil-cpp/absl/strings/str_split.cc", "vendor/abseil-cpp/absl/synchronization/internal/create_thread_identity.cc", "vendor/abseil-cpp/absl/synchronization/internal/graphcycles.cc", "vendor/abseil-cpp/absl/synchronization/internal/futex_waiter.cc", "vendor/abseil-cpp/absl/synchronization/internal/kernel_timeout.cc", "vendor/abseil-cpp/absl/synchronization/internal/per_thread_sem.cc", "vendor/abseil-cpp/absl/synchronization/internal/waiter_base.cc", "vendor/abseil-cpp/absl/synchronization/mutex.cc", "vendor/abseil-cpp/absl/time/clock.cc", "vendor/abseil-cpp/absl/time/duration.cc", "vendor/abseil-cpp/absl/time/internal/cctz/src/time_zone_fixed.cc", "vendor/abseil-cpp/absl/time/internal/cctz/src/time_zone_if.cc", "vendor/abseil-cpp/absl/time/internal/cctz/src/time_zone_impl.cc", "vendor/abseil-cpp/absl/time/internal/cctz/src/time_zone_info.cc", "vendor/abseil-cpp/absl/time/internal/cctz/src/time_zone_libc.cc", "vendor/abseil-cpp/absl/time/internal/cctz/src/time_zone_lookup.cc", "vendor/abseil-cpp/absl/time/internal/cctz/src/time_zone_posix.cc", "vendor/abseil-cpp/absl/time/internal/cctz/src/zone_info_source.cc", "vendor/abseil-cpp/absl/time/time.cc", ], "cflags": [ "-std=c++2a", "-Wall", "-Wextra", "-Wno-sign-compare", "-Wno-unused-parameter", "-Wno-missing-field-initializers", "-Wno-cast-function-type", "-O3", "-g" ], "defines": [ "NDEBUG", "NOMINMAX" ], "include_dirs": [ " #include #include NAN_GETTER(WrappedRE2::GetSource) { if (!WrappedRE2::HasInstance(info.This())) { info.GetReturnValue().Set(Nan::New("(?:)").ToLocalChecked()); return; } auto re2 = Nan::ObjectWrap::Unwrap(info.This()); 
info.GetReturnValue().Set(Nan::New(re2->source).ToLocalChecked()); } NAN_GETTER(WrappedRE2::GetInternalSource) { if (!WrappedRE2::HasInstance(info.This())) { info.GetReturnValue().Set(Nan::New("(?:)").ToLocalChecked()); return; } auto re2 = Nan::ObjectWrap::Unwrap(info.This()); info.GetReturnValue().Set(Nan::New(re2->regexp.pattern()).ToLocalChecked()); } NAN_GETTER(WrappedRE2::GetFlags) { if (!WrappedRE2::HasInstance(info.This())) { info.GetReturnValue().Set(Nan::New("").ToLocalChecked()); return; } auto re2 = Nan::ObjectWrap::Unwrap(info.This()); std::string flags; if (re2->hasIndices) { flags += "d"; } if (re2->global) { flags += "g"; } if (re2->ignoreCase) { flags += "i"; } if (re2->multiline) { flags += "m"; } if (re2->dotAll) { flags += "s"; } flags += "u"; if (re2->sticky) { flags += "y"; } info.GetReturnValue().Set(Nan::New(flags).ToLocalChecked()); } NAN_GETTER(WrappedRE2::GetGlobal) { if (!WrappedRE2::HasInstance(info.This())) { info.GetReturnValue().SetUndefined(); return; } auto re2 = Nan::ObjectWrap::Unwrap(info.This()); info.GetReturnValue().Set(re2->global); } NAN_GETTER(WrappedRE2::GetIgnoreCase) { if (!WrappedRE2::HasInstance(info.This())) { info.GetReturnValue().SetUndefined(); return; } auto re2 = Nan::ObjectWrap::Unwrap(info.This()); info.GetReturnValue().Set(re2->ignoreCase); } NAN_GETTER(WrappedRE2::GetMultiline) { if (!WrappedRE2::HasInstance(info.This())) { info.GetReturnValue().SetUndefined(); return; } auto re2 = Nan::ObjectWrap::Unwrap(info.This()); info.GetReturnValue().Set(re2->multiline); } NAN_GETTER(WrappedRE2::GetDotAll) { if (!WrappedRE2::HasInstance(info.This())) { info.GetReturnValue().SetUndefined(); return; } auto re2 = Nan::ObjectWrap::Unwrap(info.This()); info.GetReturnValue().Set(re2->dotAll); } NAN_GETTER(WrappedRE2::GetUnicode) { if (!WrappedRE2::HasInstance(info.This())) { info.GetReturnValue().SetUndefined(); return; } info.GetReturnValue().Set(true); } NAN_GETTER(WrappedRE2::GetSticky) { if 
(!WrappedRE2::HasInstance(info.This())) { info.GetReturnValue().SetUndefined(); return; } auto re2 = Nan::ObjectWrap::Unwrap(info.This()); info.GetReturnValue().Set(re2->sticky); } NAN_GETTER(WrappedRE2::GetHasIndices) { if (!WrappedRE2::HasInstance(info.This())) { info.GetReturnValue().SetUndefined(); return; } auto re2 = Nan::ObjectWrap::Unwrap(info.This()); info.GetReturnValue().Set(re2->hasIndices); } NAN_GETTER(WrappedRE2::GetLastIndex) { if (!WrappedRE2::HasInstance(info.This())) { info.GetReturnValue().SetUndefined(); return; } auto re2 = Nan::ObjectWrap::Unwrap(info.This()); info.GetReturnValue().Set(static_cast(re2->lastIndex)); } NAN_SETTER(WrappedRE2::SetLastIndex) { if (!WrappedRE2::HasInstance(info.This())) { return Nan::ThrowTypeError("Cannot set lastIndex of an invalid RE2 object."); } auto re2 = Nan::ObjectWrap::Unwrap(info.This()); if (value->IsNumber()) { int n = value->NumberValue(Nan::GetCurrentContext()).FromMaybe(0); re2->lastIndex = n <= 0 ? 0 : n; } } std::atomic WrappedRE2::unicodeWarningLevel{WrappedRE2::NOTHING}; NAN_GETTER(WrappedRE2::GetUnicodeWarningLevel) { std::string level; switch (unicodeWarningLevel) { case THROW: level = "throw"; break; case WARN: level = "warn"; break; case WARN_ONCE: level = "warnOnce"; break; default: level = "nothing"; break; } info.GetReturnValue().Set(Nan::New(level).ToLocalChecked()); } NAN_SETTER(WrappedRE2::SetUnicodeWarningLevel) { if (value->IsString()) { Nan::Utf8String s(value); if (!strcmp(*s, "throw")) { unicodeWarningLevel = THROW; return; } if (!strcmp(*s, "warn")) { unicodeWarningLevel = WARN; return; } if (!strcmp(*s, "warnOnce")) { unicodeWarningLevel = WARN_ONCE; alreadyWarnedAboutUnicode = false; return; } if (!strcmp(*s, "nothing")) { unicodeWarningLevel = NOTHING; return; } } } ================================================ FILE: lib/addon.cc ================================================ #include "./wrapped_re2.h" #include "./wrapped_re2_set.h" #include "./isolate_data.h" #include 
#include static std::mutex addonDataMutex; static std::unordered_map addonDataMap; AddonData *getAddonData(v8::Isolate *isolate) { std::lock_guard lock(addonDataMutex); auto it = addonDataMap.find(isolate); return it != addonDataMap.end() ? it->second : nullptr; } void setAddonData(v8::Isolate *isolate, AddonData *data) { std::lock_guard lock(addonDataMutex); addonDataMap[isolate] = data; } void deleteAddonData(v8::Isolate *isolate) { std::lock_guard lock(addonDataMutex); auto it = addonDataMap.find(isolate); if (it != addonDataMap.end()) { delete it->second; addonDataMap.erase(it); } } static NAN_METHOD(GetUtf8Length) { auto t = info[0]->ToString(Nan::GetCurrentContext()); if (t.IsEmpty()) { return; } auto s = t.ToLocalChecked(); info.GetReturnValue().Set(static_cast(s->Utf8Length(v8::Isolate::GetCurrent()))); } static NAN_METHOD(GetUtf16Length) { if (node::Buffer::HasInstance(info[0])) { const auto *s = node::Buffer::Data(info[0]); info.GetReturnValue().Set(static_cast(getUtf16Length(s, s + node::Buffer::Length(info[0])))); return; } info.GetReturnValue().Set(-1); } static void cleanup(void *p) { v8::Isolate *isolate = static_cast(p); deleteAddonData(isolate); } // NAN_MODULE_INIT(WrappedRE2::Init) v8::Local WrappedRE2::Init() { Nan::EscapableHandleScope scope; // prepare constructor template auto tpl = Nan::New(New); tpl->SetClassName(Nan::New("RE2").ToLocalChecked()); auto instanceTemplate = tpl->InstanceTemplate(); instanceTemplate->SetInternalFieldCount(1); // save the template in per-isolate storage auto isolate = v8::Isolate::GetCurrent(); auto data = new AddonData(); data->re2Tpl.Reset(tpl); setAddonData(isolate, data); node::AddEnvironmentCleanupHook(isolate, cleanup, isolate); // prototype Nan::SetPrototypeMethod(tpl, "toString", ToString); Nan::SetPrototypeMethod(tpl, "exec", Exec); Nan::SetPrototypeMethod(tpl, "test", Test); Nan::SetPrototypeMethod(tpl, "match", Match); Nan::SetPrototypeMethod(tpl, "replace", Replace); Nan::SetPrototypeMethod(tpl, 
"search", Search); Nan::SetPrototypeMethod(tpl, "split", Split); Nan::SetPrototypeTemplate(tpl, "source", Nan::New("(?:)").ToLocalChecked()); Nan::SetPrototypeTemplate(tpl, "flags", Nan::New("").ToLocalChecked()); Nan::SetAccessor(instanceTemplate, Nan::New("source").ToLocalChecked(), GetSource); Nan::SetAccessor(instanceTemplate, Nan::New("flags").ToLocalChecked(), GetFlags); Nan::SetAccessor(instanceTemplate, Nan::New("global").ToLocalChecked(), GetGlobal); Nan::SetAccessor(instanceTemplate, Nan::New("ignoreCase").ToLocalChecked(), GetIgnoreCase); Nan::SetAccessor(instanceTemplate, Nan::New("multiline").ToLocalChecked(), GetMultiline); Nan::SetAccessor(instanceTemplate, Nan::New("dotAll").ToLocalChecked(), GetDotAll); Nan::SetAccessor(instanceTemplate, Nan::New("unicode").ToLocalChecked(), GetUnicode); Nan::SetAccessor(instanceTemplate, Nan::New("sticky").ToLocalChecked(), GetSticky); Nan::SetAccessor(instanceTemplate, Nan::New("hasIndices").ToLocalChecked(), GetHasIndices); Nan::SetAccessor(instanceTemplate, Nan::New("lastIndex").ToLocalChecked(), GetLastIndex, SetLastIndex); Nan::SetAccessor(instanceTemplate, Nan::New("internalSource").ToLocalChecked(), GetInternalSource); auto ctr = Nan::GetFunction(tpl).ToLocalChecked(); auto setCtr = WrappedRE2Set::Init(); Nan::Set(ctr, Nan::New("Set").ToLocalChecked(), setCtr); // properties Nan::Export(ctr, "getUtf8Length", GetUtf8Length); Nan::Export(ctr, "getUtf16Length", GetUtf16Length); Nan::SetAccessor(v8::Local(ctr), Nan::New("unicodeWarningLevel").ToLocalChecked(), GetUnicodeWarningLevel, SetUnicodeWarningLevel); return scope.Escape(ctr); } NODE_MODULE_INIT() { Nan::HandleScope scope; Nan::Set(module->ToObject(context).ToLocalChecked(), Nan::New("exports").ToLocalChecked(), WrappedRE2::Init()); } WrappedRE2::~WrappedRE2() { dropCache(); } // private methods void WrappedRE2::dropCache() { if (!lastString.IsEmpty()) { // lastString.ClearWeak(); lastString.Reset(); } if (!lastCache.IsEmpty()) { // 
lastCache.ClearWeak(); lastCache.Reset(); } lastStringValue.clear(); } const StrVal &WrappedRE2::prepareArgument(const v8::Local &arg, bool ignoreLastIndex) { size_t startFrom = ignoreLastIndex ? 0 : lastIndex; if (!lastString.IsEmpty()) { lastString.ClearWeak(); } if (!lastCache.IsEmpty()) { lastCache.ClearWeak(); } if (lastString == arg && !node::Buffer::HasInstance(arg) && !lastCache.IsEmpty()) { // we have a properly cached string lastStringValue.setIndex(startFrom); return lastStringValue; } dropCache(); if (node::Buffer::HasInstance(arg)) { // no need to cache buffers lastString.Reset(arg); auto argSize = node::Buffer::Length(arg); lastStringValue.reset(arg, argSize, argSize, startFrom, true); return lastStringValue; } // caching the string auto t = arg->ToString(Nan::GetCurrentContext()); if (t.IsEmpty()) { // do not process bad strings lastStringValue.isBad = true; return lastStringValue; } lastString.Reset(arg); auto isolate = v8::Isolate::GetCurrent(); auto s = t.ToLocalChecked(); auto argLength = s->Utf8Length(isolate); auto buffer = node::Buffer::New(isolate, s).ToLocalChecked(); lastCache.Reset(buffer); auto argSize = node::Buffer::Length(buffer); lastStringValue.reset(buffer, argSize, argLength, startFrom); return lastStringValue; }; void WrappedRE2::doneWithLastString() { if (!lastString.IsEmpty()) { static_cast &>(lastString).SetWeak(); } if (!lastCache.IsEmpty()) { static_cast &>(lastCache).SetWeak(); } } // StrVal void StrVal::setIndex(size_t newIndex) { isValidIndex = newIndex <= length; if (!isValidIndex) { index = newIndex; byteIndex = 0; return; } if (newIndex == index) return; if (isBuffer) { byteIndex = index = newIndex; return; } // String if (!newIndex) { byteIndex = index = 0; return; } if (newIndex == length) { byteIndex = size; index = length; return; } byteIndex = index < newIndex ? 
getUtf16PositionByCounter(data, byteIndex, newIndex - index) : getUtf16PositionByCounter(data, 0, newIndex); index = newIndex; } static char null_buffer[] = {'\0'}; void StrVal::reset(const v8::Local &arg, size_t argSize, size_t argLength, size_t newIndex, bool buffer) { clear(); isBuffer = buffer; size = argSize; length = argLength; data = size ? node::Buffer::Data(arg) : null_buffer; setIndex(newIndex); } ================================================ FILE: lib/exec.cc ================================================ #include "./wrapped_re2.h" #include NAN_METHOD(WrappedRE2::Exec) { // unpack arguments auto re2 = Nan::ObjectWrap::Unwrap(info.This()); if (!re2) { info.GetReturnValue().SetNull(); return; } PrepareLastString prep(re2, info[0]); StrVal& str = prep; if (str.isBad) return; // throws an exception if (re2->global || re2->sticky) { if (!str.isValidIndex) { re2->lastIndex = 0; info.GetReturnValue().SetNull(); return; } } // actual work std::vector groups(re2->regexp.NumberOfCapturingGroups() + 1); if (!re2->regexp.Match(str, str.byteIndex, str.size, re2->sticky ? re2::RE2::ANCHOR_START : re2::RE2::UNANCHORED, &groups[0], groups.size())) { if (re2->global || re2->sticky) { re2->lastIndex = 0; } info.GetReturnValue().SetNull(); return; } // form a result auto result = Nan::New(), indices = Nan::New(); int indexOffset = re2->global || re2->sticky ? 
re2->lastIndex : 0; if (str.isBuffer) { for (size_t i = 0, n = groups.size(); i < n; ++i) { const auto &item = groups[i]; const auto data = item.data(); if (data) { Nan::Set(result, i, Nan::CopyBuffer(data, item.size()).ToLocalChecked()); if (re2->hasIndices) { auto pair = Nan::New(); auto offset = data - str.data - str.byteIndex; auto length = item.size(); Nan::Set(pair, 0, Nan::New(indexOffset + static_cast(offset))); Nan::Set(pair, 1, Nan::New(indexOffset + static_cast(offset + length))); Nan::Set(indices, i, pair); } } else { Nan::Set(result, i, Nan::Undefined()); if (re2->hasIndices) { Nan::Set(indices, i, Nan::Undefined()); } } } Nan::Set(result, Nan::New("index").ToLocalChecked(), Nan::New(indexOffset + static_cast(groups[0].data() - str.data - str.byteIndex))); } else { for (size_t i = 0, n = groups.size(); i < n; ++i) { const auto &item = groups[i]; const auto data = item.data(); if (data) { Nan::Set(result, i, Nan::New(data, item.size()).ToLocalChecked()); if (re2->hasIndices) { auto pair = Nan::New(); auto offset = getUtf16Length(str.data + str.byteIndex, data); auto length = getUtf16Length(data, data + item.size()); Nan::Set(pair, 0, Nan::New(indexOffset + static_cast(offset))); Nan::Set(pair, 1, Nan::New(indexOffset + static_cast(offset + length))); Nan::Set(indices, i, pair); } } else { Nan::Set(result, i, Nan::Undefined()); if (re2->hasIndices) { Nan::Set(indices, i, Nan::Undefined()); } } } Nan::Set( result, Nan::New("index").ToLocalChecked(), Nan::New(indexOffset + static_cast(getUtf16Length(str.data + str.byteIndex, groups[0].data())))); } if (re2->global || re2->sticky) { re2->lastIndex += str.isBuffer ? 
groups[0].data() - str.data + groups[0].size() - str.byteIndex : getUtf16Length(str.data + str.byteIndex, groups[0].data() + groups[0].size()); } Nan::Set(result, Nan::New("input").ToLocalChecked(), info[0]); const auto &groupNames = re2->regexp.CapturingGroupNames(); if (!groupNames.empty()) { auto groups = Nan::New(); Nan::SetPrototype(groups, Nan::Null()); for (auto group : groupNames) { auto value = Nan::Get(result, group.first); if (!value.IsEmpty()) { Nan::Set(groups, Nan::New(group.second).ToLocalChecked(), value.ToLocalChecked()); } } Nan::Set(result, Nan::New("groups").ToLocalChecked(), groups); if (re2->hasIndices) { auto indexGroups = Nan::New(); Nan::SetPrototype(indexGroups, Nan::Null()); for (auto group : groupNames) { auto value = Nan::Get(indices, group.first); if (!value.IsEmpty()) { Nan::Set(indexGroups, Nan::New(group.second).ToLocalChecked(), value.ToLocalChecked()); } } Nan::Set(indices, Nan::New("groups").ToLocalChecked(), indexGroups); } } else { Nan::Set(result, Nan::New("groups").ToLocalChecked(), Nan::Undefined()); if (re2->hasIndices) { Nan::Set(indices, Nan::New("groups").ToLocalChecked(), Nan::Undefined()); } } if (re2->hasIndices) { Nan::Set(result, Nan::New("indices").ToLocalChecked(), indices); } info.GetReturnValue().Set(result); } ================================================ FILE: lib/isolate_data.h ================================================ #pragma once #include struct AddonData { Nan::Persistent re2Tpl; Nan::Persistent re2SetTpl; }; AddonData *getAddonData(v8::Isolate *isolate); void setAddonData(v8::Isolate *isolate, AddonData *data); void deleteAddonData(v8::Isolate *isolate); ================================================ FILE: lib/match.cc ================================================ #include "./wrapped_re2.h" #include NAN_METHOD(WrappedRE2::Match) { // unpack arguments auto re2 = Nan::ObjectWrap::Unwrap(info.This()); if (!re2) { info.GetReturnValue().SetNull(); return; } PrepareLastString prep(re2, info[0]); 
StrVal& str = prep; if (str.isBad) return; // throws an exception if (!str.isValidIndex) { re2->lastIndex = 0; info.GetReturnValue().SetNull(); return; } std::vector groups; size_t byteIndex = 0; auto anchor = re2::RE2::UNANCHORED; // actual work if (re2->global) { // global: collect all matches re2::StringPiece match; if (re2->sticky) { anchor = re2::RE2::ANCHOR_START; } while (re2->regexp.Match(str, byteIndex, str.size, anchor, &match, 1)) { groups.push_back(match); byteIndex = match.data() - str.data + match.size(); } if (groups.empty()) { info.GetReturnValue().SetNull(); return; } } else { // non-global: just like exec() if (re2->sticky) { byteIndex = str.byteIndex; anchor = RE2::ANCHOR_START; } groups.resize(re2->regexp.NumberOfCapturingGroups() + 1); if (!re2->regexp.Match(str, byteIndex, str.size, anchor, &groups[0], groups.size())) { if (re2->sticky) re2->lastIndex = 0; info.GetReturnValue().SetNull(); return; } } // form a result auto result = Nan::New(), indices = Nan::New(); if (str.isBuffer) { for (size_t i = 0, n = groups.size(); i < n; ++i) { const auto &item = groups[i]; const auto data = item.data(); if (data) { Nan::Set(result, i, Nan::CopyBuffer(data, item.size()).ToLocalChecked()); if (!re2->global && re2->hasIndices) { auto pair = Nan::New(); auto offset = data - str.data - byteIndex; auto length = item.size(); Nan::Set(pair, 0, Nan::New(static_cast(offset))); Nan::Set(pair, 1, Nan::New(static_cast(offset + length))); Nan::Set(indices, i, pair); } } else { Nan::Set(result, i, Nan::Undefined()); if (!re2->global && re2->hasIndices) Nan::Set(indices, i, Nan::Undefined()); } } if (!re2->global) { Nan::Set(result, Nan::New("index").ToLocalChecked(), Nan::New(static_cast(groups[0].data() - str.data))); Nan::Set(result, Nan::New("input").ToLocalChecked(), info[0]); } } else { for (size_t i = 0, n = groups.size(); i < n; ++i) { const auto &item = groups[i]; const auto data = item.data(); if (data) { Nan::Set(result, i, Nan::New(data, 
item.size()).ToLocalChecked()); if (!re2->global && re2->hasIndices) { auto pair = Nan::New(); auto offset = getUtf16Length(str.data + byteIndex, data); auto length = getUtf16Length(data, data + item.size()); Nan::Set(pair, 0, Nan::New(static_cast(offset))); Nan::Set(pair, 1, Nan::New(static_cast(offset + length))); Nan::Set(indices, i, pair); } } else { Nan::Set(result, i, Nan::Undefined()); if (!re2->global && re2->hasIndices) { Nan::Set(indices, i, Nan::Undefined()); } } } if (!re2->global) { Nan::Set(result, Nan::New("index").ToLocalChecked(), Nan::New(static_cast(getUtf16Length(str.data, groups[0].data())))); Nan::Set(result, Nan::New("input").ToLocalChecked(), info[0]); } } if (re2->global) { re2->lastIndex = 0; } else if (re2->sticky) { re2->lastIndex += str.isBuffer ? groups[0].data() - str.data + groups[0].size() - byteIndex : getUtf16Length(str.data + byteIndex, groups[0].data() + groups[0].size()); } if (!re2->global) { const auto &groupNames = re2->regexp.CapturingGroupNames(); if (!groupNames.empty()) { auto groups = Nan::New(); Nan::SetPrototype(groups, Nan::Null()); for (auto group : groupNames) { auto value = Nan::Get(result, group.first); if (!value.IsEmpty()) { Nan::Set(groups, Nan::New(group.second).ToLocalChecked(), value.ToLocalChecked()); } } Nan::Set(result, Nan::New("groups").ToLocalChecked(), groups); if (re2->hasIndices) { auto indexGroups = Nan::New(); Nan::SetPrototype(indexGroups, Nan::Null()); for (auto group : groupNames) { auto value = Nan::Get(indices, group.first); if (!value.IsEmpty()) { Nan::Set(indexGroups, Nan::New(group.second).ToLocalChecked(), value.ToLocalChecked()); } } Nan::Set(indices, Nan::New("groups").ToLocalChecked(), indexGroups); } } else { Nan::Set(result, Nan::New("groups").ToLocalChecked(), Nan::Undefined()); if (re2->hasIndices) { Nan::Set(indices, Nan::New("groups").ToLocalChecked(), Nan::Undefined()); } } if (re2->hasIndices) { Nan::Set(result, Nan::New("indices").ToLocalChecked(), indices); } } 
info.GetReturnValue().Set(result); } ================================================ FILE: lib/new.cc ================================================ #include "./wrapped_re2.h" #include "./util.h" #include "./pattern.h" #include #include #include #include #include std::atomic WrappedRE2::alreadyWarnedAboutUnicode{false}; static const char *deprecationMessage = "BMP patterns aren't supported by node-re2. An implicit \"u\" flag is assumed by the RE2 constructor. In a future major version, calling the RE2 constructor without the \"u\" flag may become forbidden, or cause a different behavior. Please see https://github.com/uhop/node-re2/issues/21 for more information."; inline bool ensureUniqueNamedGroups(const std::map &groups) { std::unordered_set names; for (auto group : groups) { if (!names.insert(group.second).second) { return false; } } return true; } NAN_METHOD(WrappedRE2::New) { if (!info.IsConstructCall()) { // call a constructor and return the result std::vector> parameters(info.Length()); for (size_t i = 0, n = info.Length(); i < n; ++i) { parameters[i] = info[i]; } auto isolate = v8::Isolate::GetCurrent(); auto data = getAddonData(isolate); if (!data) return; auto newObject = Nan::NewInstance(Nan::GetFunction(data->re2Tpl.Get(isolate)).ToLocalChecked(), parameters.size(), &parameters[0]); if (!newObject.IsEmpty()) { info.GetReturnValue().Set(newObject.ToLocalChecked()); } return; } // process arguments std::vector buffer; char *data = NULL; size_t size = 0; std::string source; bool global = false; bool ignoreCase = false; bool multiline = false; bool dotAll = false; bool unicode = false; bool sticky = false; bool hasIndices = false; auto context = Nan::GetCurrentContext(); bool needFlags = true; if (info.Length() > 1) { if (info[1]->IsString()) { auto isolate = v8::Isolate::GetCurrent(); auto t = info[1]->ToString(Nan::GetCurrentContext()); auto s = t.ToLocalChecked(); size = s->Utf8Length(isolate); buffer.resize(size + 1); data = &buffer[0]; 
s->WriteUtf8(isolate, data, buffer.size()); buffer[size] = '\0'; } else if (node::Buffer::HasInstance(info[1])) { size = node::Buffer::Length(info[1]); data = node::Buffer::Data(info[1]); } for (size_t i = 0; i < size; ++i) { switch (data[i]) { case 'g': global = true; break; case 'i': ignoreCase = true; break; case 'm': multiline = true; break; case 's': dotAll = true; break; case 'u': unicode = true; break; case 'y': sticky = true; break; case 'd': hasIndices = true; break; } } size = 0; needFlags = false; } bool needConversion = true; if (node::Buffer::HasInstance(info[0])) { size = node::Buffer::Length(info[0]); data = node::Buffer::Data(info[0]); source = escapeRegExp(data, size); } else if (info[0]->IsRegExp()) { const auto *re = v8::RegExp::Cast(*info[0]); auto isolate = v8::Isolate::GetCurrent(); auto t = re->GetSource()->ToString(Nan::GetCurrentContext()); auto s = t.ToLocalChecked(); size = s->Utf8Length(isolate); buffer.resize(size + 1); data = &buffer[0]; s->WriteUtf8(isolate, data, buffer.size()); buffer[size] = '\0'; source = escapeRegExp(data, size); if (needFlags) { v8::RegExp::Flags flags = re->GetFlags(); global = bool(flags & v8::RegExp::kGlobal); ignoreCase = bool(flags & v8::RegExp::kIgnoreCase); multiline = bool(flags & v8::RegExp::kMultiline); dotAll = bool(flags & v8::RegExp::kDotAll); unicode = bool(flags & v8::RegExp::kUnicode); sticky = bool(flags & v8::RegExp::kSticky); hasIndices = bool(flags & v8::RegExp::kHasIndices); needFlags = false; } } else if (info[0]->IsObject() && !info[0]->IsString()) { WrappedRE2 *re2 = nullptr; auto object = info[0]->ToObject(context).ToLocalChecked(); if (!object.IsEmpty() && object->InternalFieldCount() > 0) { re2 = Nan::ObjectWrap::Unwrap(object); } if (re2) { const auto &pattern = re2->regexp.pattern(); size = pattern.size(); buffer.resize(size); data = &buffer[0]; memcpy(data, pattern.data(), size); needConversion = false; source = re2->source; if (needFlags) { global = re2->global; ignoreCase = 
re2->ignoreCase; multiline = re2->multiline; dotAll = re2->dotAll; unicode = true; sticky = re2->sticky; hasIndices = re2->hasIndices; needFlags = false; } } } else if (info[0]->IsString()) { auto isolate = v8::Isolate::GetCurrent(); auto t = info[0]->ToString(Nan::GetCurrentContext()); auto s = t.ToLocalChecked(); size = s->Utf8Length(isolate); buffer.resize(size + 1); data = &buffer[0]; s->WriteUtf8(isolate, data, buffer.size()); buffer[size] = '\0'; source = escapeRegExp(data, size); } if (!data) { return Nan::ThrowTypeError("Expected string, Buffer, RegExp, or RE2 as the 1st argument."); } if (!unicode) { switch (unicodeWarningLevel) { case THROW: return Nan::ThrowSyntaxError(deprecationMessage); case WARN: printDeprecationWarning(deprecationMessage); break; case WARN_ONCE: if (!alreadyWarnedAboutUnicode) { printDeprecationWarning(deprecationMessage); alreadyWarnedAboutUnicode = true; } break; default: break; } } if (needConversion && translateRegExp(data, size, multiline, buffer)) { size = buffer.size() - 1; data = &buffer[0]; } // create and return an object re2::RE2::Options options; options.set_case_sensitive(!ignoreCase); options.set_one_line(!multiline); // to track this state, otherwise it is ignored options.set_dot_nl(dotAll); options.set_log_errors(false); // inappropriate when embedding std::unique_ptr re2(new WrappedRE2(re2::StringPiece(data, size), options, source, global, ignoreCase, multiline, dotAll, sticky, hasIndices)); if (!re2->regexp.ok()) { return Nan::ThrowSyntaxError(re2->regexp.error().c_str()); } if (!ensureUniqueNamedGroups(re2->regexp.CapturingGroupNames())) { return Nan::ThrowSyntaxError("duplicate capture group name"); } re2->Wrap(info.This()); re2.release(); info.GetReturnValue().Set(info.This()); } ================================================ FILE: lib/pattern.cc ================================================ #include "./pattern.h" #include "./wrapped_re2.h" #include #include #include static char hex[] = {'0', '1', '2', '3', 
'4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'}; inline bool isUpperCaseAlpha(char ch) { return 'A' <= ch && ch <= 'Z'; } inline bool isHexadecimal(char ch) { return ('0' <= ch && ch <= '9') || ('A' <= ch && ch <= 'F') || ('a' <= ch && ch <= 'f'); } static std::map unicodeClasses = { {"Uppercase_Letter", "Lu"}, {"Lowercase_Letter", "Ll"}, {"Titlecase_Letter", "Lt"}, {"Cased_Letter", "LC"}, {"Modifier_Letter", "Lm"}, {"Other_Letter", "Lo"}, {"Letter", "L"}, {"Nonspacing_Mark", "Mn"}, {"Spacing_Mark", "Mc"}, {"Enclosing_Mark", "Me"}, {"Mark", "M"}, {"Decimal_Number", "Nd"}, {"Letter_Number", "Nl"}, {"Other_Number", "No"}, {"Number", "N"}, {"Connector_Punctuation", "Pc"}, {"Dash_Punctuation", "Pd"}, {"Open_Punctuation", "Ps"}, {"Close_Punctuation", "Pe"}, {"Initial_Punctuation", "Pi"}, {"Final_Punctuation", "Pf"}, {"Other_Punctuation", "Po"}, {"Punctuation", "P"}, {"Math_Symbol", "Sm"}, {"Currency_Symbol", "Sc"}, {"Modifier_Symbol", "Sk"}, {"Other_Symbol", "So"}, {"Symbol", "S"}, {"Space_Separator", "Zs"}, {"Line_Separator", "Zl"}, {"Paragraph_Separator", "Zp"}, {"Separator", "Z"}, {"Control", "Cc"}, {"Format", "Cf"}, {"Surrogate", "Cs"}, {"Private_Use", "Co"}, {"Unassigned", "Cn"}, {"Other", "C"}, }; bool translateRegExp(const char *data, size_t size, bool multiline, std::vector &buffer) { std::string result; bool changed = false; if (!size) { result = "(?:)"; changed = true; } else if (multiline) { result = "(?m)"; changed = true; } for (size_t i = 0; i < size;) { char ch = data[i]; if (ch == '\\') { if (i + 1 < size) { ch = data[i + 1]; switch (ch) { case '\\': result += "\\\\"; i += 2; continue; case 'c': if (i + 2 < size) { ch = data[i + 2]; if (isUpperCaseAlpha(ch)) { result += "\\x"; result += hex[((ch - '@') / 16) & 15]; result += hex[(ch - '@') & 15]; i += 3; changed = true; continue; } } result += "\\c"; i += 2; continue; case 'u': if (i + 2 < size) { ch = data[i + 2]; if (isHexadecimal(ch)) { result += "\\x{"; result += ch; i += 3; for (size_t j 
= 0; j < 3 && i < size; ++i, ++j) { ch = data[i]; if (!isHexadecimal(ch)) { break; } result += ch; } result += '}'; changed = true; continue; } else if (ch == '{') { result += "\\x"; i += 2; changed = true; continue; } } result += "\\u"; i += 2; continue; case 'p': case 'P': if (i + 2 < size) { if (data[i + 2] == '{') { size_t j = i + 3; while (j < size && data[j] != '}') ++j; if (j < size) { result += "\\"; result += data[i + 1]; std::string name(data + i + 3, j - i - 3); if (unicodeClasses.find(name) != unicodeClasses.end()) { name = unicodeClasses[name]; } else if (name.size() > 7 && !strncmp(name.c_str(), "Script=", 7)) { name = name.substr(7); } else if (name.size() > 3 && !strncmp(name.c_str(), "sc=", 3)) { name = name.substr(3); } if (name.size() == 1) { result += name; } else { result += "{"; result += name; result += "}"; } i = j + 1; changed = true; continue; } } } result += "\\"; result += data[i + 1]; i += 2; continue; default: result += "\\"; size_t sym_size = getUtf8CharSize(ch); result.append(data + i + 1, sym_size); i += sym_size + 1; continue; } } } else if (ch == '/') { result += "\\/"; i += 1; changed = true; continue; } else if (ch == '(' && i + 2 < size && data[i + 1] == '?' 
&& data[i + 2] == '<') { if (i + 3 >= size || (data[i + 3] != '=' && data[i + 3] != '!')) { result += "(?P<"; i += 3; changed = true; continue; } } size_t sym_size = getUtf8CharSize(ch); result.append(data + i, sym_size); i += sym_size; } if (!changed) { return false; } buffer.resize(0); buffer.insert(buffer.end(), result.data(), result.data() + result.size()); buffer.push_back('\0'); return true; } std::string escapeRegExp(const char *data, size_t size) { std::string result; if (!size) { result = "(?:)"; } size_t prevBackSlashes = 0; for (size_t i = 0; i < size;) { char ch = data[i]; if (ch == '\\') { ++prevBackSlashes; } else if (ch == '/' && !(prevBackSlashes & 1)) { result += "\\/"; i += 1; prevBackSlashes = 0; continue; } else { prevBackSlashes = 0; } size_t sym_size = getUtf8CharSize(ch); result.append(data + i, sym_size); i += sym_size; } return result; } ================================================ FILE: lib/pattern.h ================================================ #pragma once #include #include // Shared helpers for translating JavaScript-style regular expressions // into RE2-compatible patterns. 
bool translateRegExp(const char *data, size_t size, bool multiline, std::vector &buffer); std::string escapeRegExp(const char *data, size_t size); ================================================ FILE: lib/replace.cc ================================================ #include "./wrapped_re2.h" #include #include #include #include inline int getMaxSubmatch( const char *data, size_t size, const std::map &namedGroups) { int maxSubmatch = 0, index, index2; const char *nameBegin; const char *nameEnd; for (size_t i = 0; i < size;) { char ch = data[i]; if (ch == '$') { if (i + 1 < size) { ch = data[i + 1]; switch (ch) { case '$': case '&': case '`': case '\'': i += 2; continue; case '0': case '1': case '2': case '3': case '4': case '5': case '6': case '7': case '8': case '9': index = ch - '0'; if (i + 2 < size) { ch = data[i + 2]; if ('0' <= ch && ch <= '9') { index2 = index * 10 + (ch - '0'); if (maxSubmatch < index2) maxSubmatch = index2; i += 3; continue; } } if (maxSubmatch < index) maxSubmatch = index; i += 2; continue; case '<': nameBegin = data + i + 2; nameEnd = (const char *)memchr(nameBegin, '>', size - i - 2); if (nameEnd) { std::string name(nameBegin, nameEnd - nameBegin); auto group = namedGroups.find(name); if (group != namedGroups.end()) { index = group->second; if (maxSubmatch < index) maxSubmatch = index; } i = nameEnd + 1 - data; } else { i += 2; } continue; } } ++i; continue; } i += getUtf8CharSize(ch); } return maxSubmatch; } inline std::string replace( const char *data, size_t size, const std::vector &groups, const re2::StringPiece &str, const std::map &namedGroups) { std::string result; size_t index, index2; const char *nameBegin; const char *nameEnd; for (size_t i = 0; i < size;) { char ch = data[i]; if (ch == '$') { if (i + 1 < size) { ch = data[i + 1]; switch (ch) { case '$': result += ch; i += 2; continue; case '&': result += (std::string)groups[0]; i += 2; continue; case '`': result += std::string(str.data(), groups[0].data() - str.data()); i += 2; 
continue; case '\'': result += std::string(groups[0].data() + groups[0].size(), str.data() + str.size() - groups[0].data() - groups[0].size()); i += 2; continue; case '0': case '1': case '2': case '3': case '4': case '5': case '6': case '7': case '8': case '9': index = ch - '0'; if (i + 2 < size) { ch = data[i + 2]; if ('0' <= ch && ch <= '9') { i += 3; index2 = index * 10 + (ch - '0'); if (index2 && index2 < groups.size()) { result += (std::string)groups[index2]; continue; } else if (index && index < groups.size()) { result += (std::string)groups[index]; result += ch; continue; } result += '$'; result += '0' + index; result += ch; continue; } ch = '0' + index; } i += 2; if (index && index < groups.size()) { result += (std::string)groups[index]; continue; } result += '$'; result += ch; continue; case '<': if (!namedGroups.empty()) { nameBegin = data + i + 2; nameEnd = (const char *)memchr(nameBegin, '>', size - i - 2); if (nameEnd) { std::string name(nameBegin, nameEnd - nameBegin); auto group = namedGroups.find(name); if (group != namedGroups.end()) { index = group->second; result += (std::string)groups[index]; } i = nameEnd + 1 - data; } else { result += "$<"; i += 2; } } else { result += "$<"; i += 2; } continue; } } result += '$'; ++i; continue; } size_t sym_size = getUtf8CharSize(ch); result.append(data + i, sym_size); i += sym_size; } return result; } static Nan::Maybe replace( WrappedRE2 *re2, const StrVal &replacee, const char *replacer, size_t replacer_size) { const re2::StringPiece str = replacee; const char *data = str.data(); size_t size = str.size(); const auto &namedGroups = re2->regexp.NamedCapturingGroups(); std::vector groups(std::min(re2->regexp.NumberOfCapturingGroups(), getMaxSubmatch(replacer, replacer_size, namedGroups)) + 1); const auto &match = groups[0]; size_t byteIndex = 0; std::string result; auto anchor = re2::RE2::UNANCHORED; if (re2->sticky) { if (!re2->global) byteIndex = replacee.byteIndex; anchor = re2::RE2::ANCHOR_START; } if 
(byteIndex) { result = std::string(data, byteIndex); } bool noMatch = true; while (byteIndex <= size && re2->regexp.Match(str, byteIndex, size, anchor, &groups[0], groups.size())) { noMatch = false; auto offset = match.data() - data; if (!re2->global && re2->sticky) { re2->lastIndex += replacee.isBuffer ? offset + match.size() - byteIndex : getUtf16Length(data + byteIndex, match.data() + match.size()); } if (match.data() == data || offset > static_cast(byteIndex)) { result += std::string(data + byteIndex, offset - byteIndex); } result += replace(replacer, replacer_size, groups, str, namedGroups); if (match.size()) { byteIndex = offset + match.size(); } else if ((size_t)offset < size) { auto sym_size = getUtf8CharSize(data[offset]); result.append(data + offset, sym_size); byteIndex = offset + sym_size; } else { byteIndex = size; break; } if (!re2->global) { break; } } if (byteIndex < size) { result += std::string(data + byteIndex, size - byteIndex); } if (re2->global) { re2->lastIndex = 0; } else if (re2->sticky) { if (noMatch) re2->lastIndex = 0; } return Nan::Just(result); } inline Nan::Maybe replace( const Nan::Callback *replacer, const std::vector &groups, const re2::StringPiece &str, const v8::Local &input, bool useBuffers, const std::map &namedGroups) { std::vector> argv; auto context = Nan::GetCurrentContext(); if (useBuffers) { for (size_t i = 0, n = groups.size(); i < n; ++i) { const auto &item = groups[i]; const auto data = item.data(); if (data) { argv.push_back(Nan::CopyBuffer(data, item.size()).ToLocalChecked()); } else { argv.push_back(Nan::Undefined()); } } argv.push_back(Nan::New(static_cast(groups[0].data() - str.data()))); } else { for (size_t i = 0, n = groups.size(); i < n; ++i) { const auto &item = groups[i]; const auto data = item.data(); if (data) { argv.push_back(Nan::New(data, item.size()).ToLocalChecked()); } else { argv.push_back(Nan::Undefined()); } } argv.push_back(Nan::New(static_cast(getUtf16Length(str.data(), groups[0].data())))); } 
argv.push_back(input); if (!namedGroups.empty()) { auto groups = Nan::New(); Nan::SetPrototype(groups, Nan::Null()); for (std::pair group : namedGroups) { Nan::Set(groups, Nan::New(group.first).ToLocalChecked(), argv[group.second]); } argv.push_back(groups); } auto maybeResult = Nan::CallAsFunction(replacer->GetFunction(), context->Global(), static_cast(argv.size()), &argv[0]); if (maybeResult.IsEmpty()) { return Nan::Nothing(); } auto result = maybeResult.ToLocalChecked(); if (node::Buffer::HasInstance(result)) { return Nan::Just(std::string(node::Buffer::Data(result), node::Buffer::Length(result))); } auto t = result->ToString(Nan::GetCurrentContext()); if (t.IsEmpty()) { return Nan::Nothing(); } v8::String::Utf8Value s(v8::Isolate::GetCurrent(), t.ToLocalChecked()); return Nan::Just(std::string(*s)); } static Nan::Maybe replace( WrappedRE2 *re2, const StrVal &replacee, const Nan::Callback *replacer, const v8::Local &input, bool useBuffers) { const re2::StringPiece str = replacee; const char *data = str.data(); size_t size = str.size(); std::vector groups(re2->regexp.NumberOfCapturingGroups() + 1); const auto &match = groups[0]; size_t byteIndex = 0; std::string result; auto anchor = re2::RE2::UNANCHORED; if (re2->sticky) { if (!re2->global) byteIndex = replacee.byteIndex; anchor = RE2::ANCHOR_START; } if (byteIndex) { result = std::string(data, byteIndex); } const auto &namedGroups = re2->regexp.NamedCapturingGroups(); bool noMatch = true; while (byteIndex <= size && re2->regexp.Match(str, byteIndex, size, anchor, &groups[0], groups.size())) { noMatch = false; auto offset = match.data() - data; if (!re2->global && re2->sticky) { re2->lastIndex += replacee.isBuffer ? 
offset + match.size() - byteIndex : getUtf16Length(data + byteIndex, match.data() + match.size()); } if (match.data() == data || offset > static_cast(byteIndex)) { result += std::string(data + byteIndex, offset - byteIndex); } const auto part = replace(replacer, groups, str, input, useBuffers, namedGroups); if (part.IsNothing()) { return part; } result += part.FromJust(); if (match.size()) { byteIndex = offset + match.size(); } else if ((size_t)offset < size) { auto sym_size = getUtf8CharSize(data[offset]); result.append(data + offset, sym_size); byteIndex = offset + sym_size; } else { byteIndex = size; break; } if (!re2->global) { break; } } if (byteIndex < size) { result += std::string(data + byteIndex, size - byteIndex); } if (re2->global) { re2->lastIndex = 0; } else if (re2->sticky) { if (noMatch) { re2->lastIndex = 0; } } return Nan::Just(result); } static bool requiresBuffers(const v8::Local &f) { auto flag(Nan::Get(f, Nan::New("useBuffers").ToLocalChecked()).ToLocalChecked()); if (flag->IsUndefined() || flag->IsNull() || flag->IsFalse()) { return false; } if (flag->IsNumber()) { return flag->NumberValue(Nan::GetCurrentContext()).FromMaybe(0) != 0; } if (flag->IsString()) { return flag->ToString(Nan::GetCurrentContext()).ToLocalChecked()->Length() > 0; } return true; } NAN_METHOD(WrappedRE2::Replace) { auto re2 = Nan::ObjectWrap::Unwrap(info.This()); if (!re2) { info.GetReturnValue().Set(info[0]); return; } PrepareLastString prep(re2, info[0]); StrVal& replacee = prep; if (replacee.isBad) return; // throws an exception if (!replacee.isValidIndex) { info.GetReturnValue().Set(info[0]); return; } std::string result; if (info[1]->IsFunction()) { auto fun = info[1].As(); const std::unique_ptr cb(new Nan::Callback(fun)); const auto replaced = replace(re2, replacee, cb.get(), info[0], requiresBuffers(fun)); if (replaced.IsNothing()) { info.GetReturnValue().Set(info[0]); return; } result = replaced.FromJust(); } else { v8::Local replacer; if 
(node::Buffer::HasInstance(info[1])) { replacer = info[1].As(); } else { auto t = info[1]->ToString(Nan::GetCurrentContext()); if (t.IsEmpty()) return; // throws an exception replacer = node::Buffer::New(v8::Isolate::GetCurrent(), t.ToLocalChecked()).ToLocalChecked(); } auto data = node::Buffer::Data(replacer); auto size = node::Buffer::Length(replacer); const auto replaced = replace(re2, replacee, data, size); if (replaced.IsNothing()) { info.GetReturnValue().Set(info[0]); return; } result = replaced.FromJust(); } if (replacee.isBuffer) { info.GetReturnValue().Set(Nan::CopyBuffer(result.data(), result.size()).ToLocalChecked()); return; } info.GetReturnValue().Set(Nan::New(result).ToLocalChecked()); } ================================================ FILE: lib/search.cc ================================================ #include "./wrapped_re2.h" NAN_METHOD(WrappedRE2::Search) { // unpack arguments auto re2 = Nan::ObjectWrap::Unwrap(info.This()); if (!re2) { info.GetReturnValue().Set(-1); return; } PrepareLastString prep(re2, info[0]); StrVal& str = prep; if (str.isBad) return; // throws an exception if (!str.data) return; // actual work re2::StringPiece match; if (re2->regexp.Match(str, 0, str.size, re2->sticky ? re2::RE2::ANCHOR_START : re2::RE2::UNANCHORED, &match, 1)) { info.GetReturnValue().Set(static_cast(str.isBuffer ? 
match.data() - str.data : getUtf16Length(str.data, match.data()))); return; } info.GetReturnValue().Set(-1); } ================================================ FILE: lib/set.cc ================================================ #include "./wrapped_re2_set.h" #include "./pattern.h" #include "./util.h" #include "./wrapped_re2.h" #include #include #include #include struct SetFlags { bool global = false; bool ignoreCase = false; bool multiline = false; bool dotAll = false; bool unicode = false; bool sticky = false; bool hasIndices = false; }; static bool parseFlags(const v8::Local &arg, SetFlags &flags) { const char *data = nullptr; size_t size = 0; std::vector buffer; if (arg->IsString()) { auto isolate = v8::Isolate::GetCurrent(); auto t = arg->ToString(Nan::GetCurrentContext()); if (t.IsEmpty()) { return false; } auto s = t.ToLocalChecked(); size = s->Utf8Length(isolate); buffer.resize(size + 1); s->WriteUtf8(isolate, &buffer[0], buffer.size()); buffer[buffer.size() - 1] = '\0'; data = &buffer[0]; } else if (node::Buffer::HasInstance(arg)) { size = node::Buffer::Length(arg); data = node::Buffer::Data(arg); } else { return false; } for (size_t i = 0; i < size; ++i) { switch (data[i]) { case 'd': flags.hasIndices = true; break; case 'g': flags.global = true; break; case 'i': flags.ignoreCase = true; break; case 'm': flags.multiline = true; break; case 's': flags.dotAll = true; break; case 'u': flags.unicode = true; break; case 'y': flags.sticky = true; break; default: return false; } } return true; } static bool sameEffectiveOptions(const SetFlags &a, const SetFlags &b) { return a.ignoreCase == b.ignoreCase && a.multiline == b.multiline && a.dotAll == b.dotAll && a.unicode == b.unicode; } static std::string flagsToString(const SetFlags &flags) { std::string result; if (flags.hasIndices) { result += 'd'; } if (flags.global) { result += 'g'; } if (flags.ignoreCase) { result += 'i'; } if (flags.multiline) { result += 'm'; } if (flags.dotAll) { result += 's'; } result += 
'u'; if (flags.sticky) { result += 'y'; } return result; } static bool collectIterable(const v8::Local &input, std::vector> &items) { auto context = Nan::GetCurrentContext(); auto isolate = v8::Isolate::GetCurrent(); if (input->IsArray()) { auto array = v8::Local::Cast(input); auto length = array->Length(); items.reserve(length); for (uint32_t i = 0; i < length; ++i) { auto maybe = Nan::Get(array, i); if (maybe.IsEmpty()) { return false; } items.push_back(maybe.ToLocalChecked()); } return true; } auto maybeObject = input->ToObject(context); if (maybeObject.IsEmpty()) { return false; } auto object = maybeObject.ToLocalChecked(); auto maybeIteratorFn = object->Get(context, v8::Symbol::GetIterator(isolate)); if (maybeIteratorFn.IsEmpty()) { return false; } auto iteratorFn = maybeIteratorFn.ToLocalChecked(); if (!iteratorFn->IsFunction()) { return false; } auto maybeIterator = iteratorFn.As()->Call(context, object, 0, nullptr); if (maybeIterator.IsEmpty()) { return false; } auto iterator = maybeIterator.ToLocalChecked(); if (!iterator->IsObject()) { return false; } auto nextKey = Nan::New("next").ToLocalChecked(); auto valueKey = Nan::New("value").ToLocalChecked(); auto doneKey = Nan::New("done").ToLocalChecked(); for (;;) { auto maybeNext = Nan::Get(iterator.As(), nextKey); if (maybeNext.IsEmpty()) { return false; } auto next = maybeNext.ToLocalChecked(); if (!next->IsFunction()) { return false; } auto maybeResult = next.As()->Call(context, iterator, 0, nullptr); if (maybeResult.IsEmpty()) { return false; } auto result = maybeResult.ToLocalChecked(); if (!result->IsObject()) { return false; } auto resultObj = result->ToObject(context).ToLocalChecked(); auto maybeDone = Nan::Get(resultObj, doneKey); if (maybeDone.IsEmpty()) { return false; } if (maybeDone.ToLocalChecked()->BooleanValue(isolate)) { break; } auto maybeValue = Nan::Get(resultObj, valueKey); if (maybeValue.IsEmpty()) { return false; } items.push_back(maybeValue.ToLocalChecked()); } return true; } static 
bool parseAnchor(const v8::Local &arg, re2::RE2::Anchor &anchor) { if (arg.IsEmpty() || arg->IsUndefined() || arg->IsNull()) { anchor = re2::RE2::UNANCHORED; return true; } v8::Local value = arg; if (arg->IsObject() && !arg->IsString()) { auto context = Nan::GetCurrentContext(); auto object = arg->ToObject(context).ToLocalChecked(); auto maybeAnchor = Nan::Get(object, Nan::New("anchor").ToLocalChecked()); if (maybeAnchor.IsEmpty()) { return false; } value = maybeAnchor.ToLocalChecked(); if (value->IsUndefined() || value->IsNull()) { anchor = re2::RE2::UNANCHORED; return true; } } if (!value->IsString()) { return false; } Nan::Utf8String val(value); std::string text(*val, val.length()); if (text == "unanchored") { anchor = re2::RE2::UNANCHORED; return true; } if (text == "start") { anchor = re2::RE2::ANCHOR_START; return true; } if (text == "both") { anchor = re2::RE2::ANCHOR_BOTH; return true; } return false; } static bool fillInput(const v8::Local &arg, StrVal &str, v8::Local &keepAlive) { if (node::Buffer::HasInstance(arg)) { auto size = node::Buffer::Length(arg); str.reset(arg, size, size, 0, true); return true; } auto context = Nan::GetCurrentContext(); auto isolate = v8::Isolate::GetCurrent(); auto t = arg->ToString(context); if (t.IsEmpty()) { return false; } auto s = t.ToLocalChecked(); auto utf8Length = s->Utf8Length(isolate); auto buffer = node::Buffer::New(isolate, s).ToLocalChecked(); keepAlive = buffer; str.reset(buffer, node::Buffer::Length(buffer), utf8Length, 0); return true; } static std::string anchorToString(re2::RE2::Anchor anchor) { switch (anchor) { case re2::RE2::ANCHOR_BOTH: return "both"; case re2::RE2::ANCHOR_START: return "start"; default: return "unanchored"; } } static std::string makeCombinedSource(const std::vector &sources) { if (sources.empty()) { return "(?:)"; } std::string combined; for (size_t i = 0, n = sources.size(); i < n; ++i) { if (i) { combined += '|'; } combined += sources[i]; } return combined; } static const char 
setDeprecationMessage[] = "BMP patterns aren't supported by node-re2. An implicit \"u\" flag is assumed by RE2.Set. In a future major version, calling RE2.Set without the \"u\" flag may become forbidden, or cause a different behavior. Please see https://github.com/uhop/node-re2/issues/21 for more information."; NAN_METHOD(WrappedRE2Set::New) { auto context = Nan::GetCurrentContext(); auto isolate = context->GetIsolate(); if (!info.IsConstructCall()) { std::vector<v8::Local<v8::Value>> parameters(info.Length()); for (size_t i = 0, n = info.Length(); i < n; ++i) { parameters[i] = info[i]; } auto isolate = context->GetIsolate(); auto addonData = getAddonData(isolate); if (!addonData) return; auto maybeNew = Nan::NewInstance(Nan::GetFunction(addonData->re2SetTpl.Get(isolate)).ToLocalChecked(), parameters.size(), &parameters[0]); if (!maybeNew.IsEmpty()) { info.GetReturnValue().Set(maybeNew.ToLocalChecked()); } return; } if (!info.Length()) { return Nan::ThrowTypeError("Expected an iterable of patterns as the 1st argument."); } SetFlags flags; bool haveFlags = false; bool flagsFromArg = false; v8::Local<v8::Value> flagsArg; v8::Local<v8::Value> optionsArg; if (info.Length() > 1) { if (info[1]->IsObject() && !info[1]->IsString() && !node::Buffer::HasInstance(info[1])) { optionsArg = info[1]; } else { flagsArg = info[1]; if (info.Length() > 2) { optionsArg = info[2]; } } } if (!flagsArg.IsEmpty()) { if (!parseFlags(flagsArg, flags)) { return Nan::ThrowTypeError("Invalid flags for RE2.Set."); } haveFlags = true; flagsFromArg = true; } re2::RE2::Anchor anchor = re2::RE2::UNANCHORED; if (!optionsArg.IsEmpty()) { if (!parseAnchor(optionsArg, anchor)) { return Nan::ThrowTypeError("Invalid anchor option for RE2.Set."); } } std::vector<v8::Local<v8::Value>> patterns; if (!collectIterable(info[0], patterns)) { return Nan::ThrowTypeError("Expected an iterable of patterns as the 1st argument."); } auto mergeFlags = [&](const SetFlags &candidate) { if (flagsFromArg) { return true; } if (!haveFlags) { flags = candidate; haveFlags = true; return true;
} return sameEffectiveOptions(flags, candidate); }; for (auto &value : patterns) { SetFlags patternFlags; bool hasFlagsForPattern = false; if (value->IsRegExp()) { const auto *re = v8::RegExp::Cast(*value); v8::RegExp::Flags reFlags = re->GetFlags(); patternFlags.global = bool(reFlags & v8::RegExp::kGlobal); patternFlags.ignoreCase = bool(reFlags & v8::RegExp::kIgnoreCase); patternFlags.multiline = bool(reFlags & v8::RegExp::kMultiline); patternFlags.dotAll = bool(reFlags & v8::RegExp::kDotAll); patternFlags.unicode = bool(reFlags & v8::RegExp::kUnicode); patternFlags.sticky = bool(reFlags & v8::RegExp::kSticky); patternFlags.hasIndices = bool(reFlags & v8::RegExp::kHasIndices); hasFlagsForPattern = true; } else if (value->IsObject()) { auto maybeObj = value->ToObject(context); if (!maybeObj.IsEmpty()) { auto obj = maybeObj.ToLocalChecked(); if (WrappedRE2::HasInstance(obj)) { auto re2 = Nan::ObjectWrap::Unwrap(obj); patternFlags.global = re2->global; patternFlags.ignoreCase = re2->ignoreCase; patternFlags.multiline = re2->multiline; patternFlags.dotAll = re2->dotAll; patternFlags.unicode = true; patternFlags.sticky = re2->sticky; patternFlags.hasIndices = re2->hasIndices; hasFlagsForPattern = true; } } } if (hasFlagsForPattern && !mergeFlags(patternFlags)) { return Nan::ThrowTypeError("All patterns in RE2.Set must use the same flags."); } } if (!flags.unicode) { switch (WrappedRE2::unicodeWarningLevel) { case WrappedRE2::THROW: return Nan::ThrowSyntaxError(setDeprecationMessage); case WrappedRE2::WARN: printDeprecationWarning(setDeprecationMessage); break; case WrappedRE2::WARN_ONCE: if (!WrappedRE2::alreadyWarnedAboutUnicode) { printDeprecationWarning(setDeprecationMessage); WrappedRE2::alreadyWarnedAboutUnicode = true; } break; default: break; } } re2::RE2::Options options; options.set_case_sensitive(!flags.ignoreCase); options.set_one_line(!flags.multiline); options.set_dot_nl(flags.dotAll); options.set_log_errors(false); std::unique_ptr set(new 
WrappedRE2Set(options, anchor, flagsToString(flags))); std::vector buffer; for (auto &value : patterns) { const char *data = nullptr; size_t size = 0; std::string source; if (node::Buffer::HasInstance(value)) { size = node::Buffer::Length(value); data = node::Buffer::Data(value); source = escapeRegExp(data, size); } else if (value->IsRegExp()) { const auto *re = v8::RegExp::Cast(*value); auto t = re->GetSource()->ToString(context); if (t.IsEmpty()) { return; } auto s = t.ToLocalChecked(); size = s->Utf8Length(isolate); buffer.resize(size + 1); s->WriteUtf8(isolate, &buffer[0], buffer.size()); buffer[size] = '\0'; data = &buffer[0]; source = escapeRegExp(data, size); } else if (value->IsString()) { auto t = value->ToString(context); if (t.IsEmpty()) { return; } auto s = t.ToLocalChecked(); size = s->Utf8Length(isolate); buffer.resize(size + 1); s->WriteUtf8(isolate, &buffer[0], buffer.size()); buffer[size] = '\0'; data = &buffer[0]; source = escapeRegExp(data, size); } else if (value->IsObject()) { auto maybeObj = value->ToObject(context); if (maybeObj.IsEmpty()) { return; } auto obj = maybeObj.ToLocalChecked(); if (!WrappedRE2::HasInstance(obj)) { return Nan::ThrowTypeError("Expected a string, Buffer, RegExp, or RE2 instance in the pattern list."); } auto re2 = Nan::ObjectWrap::Unwrap(obj); source = re2->source; data = source.data(); size = source.size(); } else { return Nan::ThrowTypeError("Expected a string, Buffer, RegExp, or RE2 instance in the pattern list."); } if (translateRegExp(data, size, flags.multiline, buffer)) { data = &buffer[0]; size = buffer.size() - 1; } std::string error; if (set->set.Add(re2::StringPiece(data, size), &error) < 0) { if (error.empty()) { error = "Invalid pattern in RE2.Set."; } return Nan::ThrowSyntaxError(error.c_str()); } set->sources.push_back(source); } if (!set->set.Compile()) { return Nan::ThrowError("RE2.Set could not be compiled."); } set->combinedSource = makeCombinedSource(set->sources); set->Wrap(info.This()); 
set.release(); info.GetReturnValue().Set(info.This()); } NAN_METHOD(WrappedRE2Set::Test) { auto re2set = Nan::ObjectWrap::Unwrap(info.This()); if (!re2set) { info.GetReturnValue().Set(false); return; } StrVal str; v8::Local keepAlive; if (!fillInput(info[0], str, keepAlive)) { return; } re2::RE2::Set::ErrorInfo errorInfo{re2::RE2::Set::kNoError}; bool matched = re2set->set.Match(str, nullptr, &errorInfo); if (!matched && errorInfo.kind != re2::RE2::Set::kNoError) { const char *message = "RE2.Set matching failed."; switch (errorInfo.kind) { case re2::RE2::Set::kOutOfMemory: message = "RE2.Set matching failed: out of memory."; break; case re2::RE2::Set::kInconsistent: message = "RE2.Set matching failed: inconsistent result."; break; case re2::RE2::Set::kNotCompiled: message = "RE2.Set matching failed: set is not compiled."; break; default: break; } return Nan::ThrowError(message); } info.GetReturnValue().Set(matched); } NAN_METHOD(WrappedRE2Set::Match) { auto re2set = Nan::ObjectWrap::Unwrap(info.This()); if (!re2set) { info.GetReturnValue().Set(Nan::New(0)); return; } StrVal str; v8::Local keepAlive; if (!fillInput(info[0], str, keepAlive)) { return; } std::vector matches; re2::RE2::Set::ErrorInfo errorInfo{re2::RE2::Set::kNoError}; bool matched = re2set->set.Match(str, &matches, &errorInfo); if (!matched && errorInfo.kind != re2::RE2::Set::kNoError) { const char *message = "RE2.Set matching failed."; switch (errorInfo.kind) { case re2::RE2::Set::kOutOfMemory: message = "RE2.Set matching failed: out of memory."; break; case re2::RE2::Set::kInconsistent: message = "RE2.Set matching failed: inconsistent result."; break; case re2::RE2::Set::kNotCompiled: message = "RE2.Set matching failed: set is not compiled."; break; default: break; } return Nan::ThrowError(message); } std::sort(matches.begin(), matches.end()); auto result = Nan::New(matches.size()); for (size_t i = 0, n = matches.size(); i < n; ++i) { Nan::Set(result, i, Nan::New(matches[i])); } 
info.GetReturnValue().Set(result); } NAN_METHOD(WrappedRE2Set::ToString) { auto re2set = Nan::ObjectWrap::Unwrap(info.This()); if (!re2set) { info.GetReturnValue().SetEmptyString(); return; } std::string result = "/"; result += re2set->combinedSource; result += "/"; result += re2set->flags; info.GetReturnValue().Set(Nan::New(result).ToLocalChecked()); } NAN_GETTER(WrappedRE2Set::GetFlags) { auto re2set = Nan::ObjectWrap::Unwrap(info.This()); if (!re2set) { info.GetReturnValue().Set(Nan::New("u").ToLocalChecked()); return; } info.GetReturnValue().Set(Nan::New(re2set->flags).ToLocalChecked()); } NAN_GETTER(WrappedRE2Set::GetSources) { auto re2set = Nan::ObjectWrap::Unwrap(info.This()); if (!re2set) { info.GetReturnValue().Set(Nan::New(0)); return; } auto result = Nan::New(re2set->sources.size()); for (size_t i = 0, n = re2set->sources.size(); i < n; ++i) { Nan::Set(result, i, Nan::New(re2set->sources[i]).ToLocalChecked()); } info.GetReturnValue().Set(result); } NAN_GETTER(WrappedRE2Set::GetSource) { auto re2set = Nan::ObjectWrap::Unwrap(info.This()); if (!re2set) { info.GetReturnValue().Set(Nan::New("(?:)").ToLocalChecked()); return; } info.GetReturnValue().Set(Nan::New(re2set->combinedSource).ToLocalChecked()); } NAN_GETTER(WrappedRE2Set::GetSize) { auto re2set = Nan::ObjectWrap::Unwrap(info.This()); if (!re2set) { info.GetReturnValue().Set(0); return; } info.GetReturnValue().Set(static_cast(re2set->sources.size())); } NAN_GETTER(WrappedRE2Set::GetAnchor) { auto re2set = Nan::ObjectWrap::Unwrap(info.This()); if (!re2set) { info.GetReturnValue().Set(Nan::New("unanchored").ToLocalChecked()); return; } info.GetReturnValue().Set(Nan::New(anchorToString(re2set->anchor)).ToLocalChecked()); } v8::Local WrappedRE2Set::Init() { Nan::EscapableHandleScope scope; auto tpl = Nan::New(New); tpl->SetClassName(Nan::New("RE2Set").ToLocalChecked()); auto instanceTemplate = tpl->InstanceTemplate(); instanceTemplate->SetInternalFieldCount(1); Nan::SetPrototypeMethod(tpl, "test", Test); 
Nan::SetPrototypeMethod(tpl, "match", Match); Nan::SetPrototypeMethod(tpl, "toString", ToString); Nan::SetAccessor(instanceTemplate, Nan::New("flags").ToLocalChecked(), GetFlags); Nan::SetAccessor(instanceTemplate, Nan::New("sources").ToLocalChecked(), GetSources); Nan::SetAccessor(instanceTemplate, Nan::New("source").ToLocalChecked(), GetSource); Nan::SetAccessor(instanceTemplate, Nan::New("size").ToLocalChecked(), GetSize); Nan::SetAccessor(instanceTemplate, Nan::New("anchor").ToLocalChecked(), GetAnchor); auto isolate = v8::Isolate::GetCurrent(); auto data = getAddonData(isolate); if (data) { data->re2SetTpl.Reset(tpl); } return scope.Escape(Nan::GetFunction(tpl).ToLocalChecked()); } ================================================ FILE: lib/split.cc ================================================ #include "./wrapped_re2.h" #include #include #include NAN_METHOD(WrappedRE2::Split) { auto result = Nan::New(); // unpack arguments auto re2 = Nan::ObjectWrap::Unwrap(info.This()); if (!re2) { Nan::Set(result, 0, info[0]); info.GetReturnValue().Set(result); return; } PrepareLastString prep(re2, info[0]); StrVal& str = prep; if (str.isBad) return; // throws an exception size_t limit = std::numeric_limits::max(); if (info.Length() > 1 && info[1]->IsNumber()) { size_t lim = info[1]->NumberValue(Nan::GetCurrentContext()).FromMaybe(0); if (lim > 0) { limit = lim; } } // actual work std::vector groups(re2->regexp.NumberOfCapturingGroups() + 1), pieces; const auto &match = groups[0]; size_t byteIndex = 0; while (byteIndex < str.size && re2->regexp.Match(str, byteIndex, str.size, RE2::UNANCHORED, &groups[0], groups.size())) { if (match.size()) { pieces.push_back(re2::StringPiece(str.data + byteIndex, match.data() - str.data - byteIndex)); byteIndex = match.data() - str.data + match.size(); pieces.insert(pieces.end(), groups.begin() + 1, groups.end()); } else { size_t sym_size = getUtf8CharSize(str.data[byteIndex]); pieces.push_back(re2::StringPiece(str.data + byteIndex, 
sym_size)); byteIndex += sym_size; } if (pieces.size() >= limit) { break; } } if (pieces.size() < limit && (byteIndex < str.size || (byteIndex == str.size && match.size()))) { pieces.push_back(re2::StringPiece(str.data + byteIndex, str.size - byteIndex)); } if (pieces.empty()) { Nan::Set(result, 0, info[0]); info.GetReturnValue().Set(result); return; } // form a result if (str.isBuffer) { for (size_t i = 0, n = std::min(pieces.size(), limit); i < n; ++i) { const auto &item = pieces[i]; if (item.data()) { Nan::Set(result, i, Nan::CopyBuffer(item.data(), item.size()).ToLocalChecked()); } else { Nan::Set(result, i, Nan::Undefined()); } } } else { for (size_t i = 0, n = std::min(pieces.size(), limit); i < n; ++i) { const auto &item = pieces[i]; if (item.data()) { Nan::Set(result, i, Nan::New(item.data(), item.size()).ToLocalChecked()); } else { Nan::Set(result, i, Nan::Undefined()); } } } info.GetReturnValue().Set(result); } ================================================ FILE: lib/test.cc ================================================ #include "./wrapped_re2.h" #include NAN_METHOD(WrappedRE2::Test) { // unpack arguments auto re2 = Nan::ObjectWrap::Unwrap(info.This()); if (!re2) { info.GetReturnValue().Set(false); return; } PrepareLastString prep(re2, info[0]); StrVal& str = prep; if (str.isBad) return; // throws an exception if (!re2->global && !re2->sticky) { info.GetReturnValue().Set(re2->regexp.Match(str, 0, str.size, re2::RE2::UNANCHORED, NULL, 0)); return; } if (!str.isValidIndex) { re2->lastIndex = 0; info.GetReturnValue().Set(false); return; } // actual work re2::StringPiece match; if (re2->regexp.Match(str, str.byteIndex, str.size, re2->sticky ? re2::RE2::ANCHOR_START : re2::RE2::UNANCHORED, &match, 1)) { re2->lastIndex += str.isBuffer ? 
match.data() - str.data + match.size() - str.byteIndex : getUtf16Length(str.data + str.byteIndex, match.data() + match.size()); info.GetReturnValue().Set(true); return; } re2->lastIndex = 0; info.GetReturnValue().Set(false); } ================================================ FILE: lib/to_string.cc ================================================ #include "./wrapped_re2.h" #include NAN_METHOD(WrappedRE2::ToString) { // unpack arguments auto re2 = Nan::ObjectWrap::Unwrap(info.This()); if (!re2) { info.GetReturnValue().SetEmptyString(); return; } // actual work std::string buffer("/"); buffer += re2->source; buffer += "/"; if (re2->hasIndices) { buffer += "d"; } if (re2->global) { buffer += "g"; } if (re2->ignoreCase) { buffer += "i"; } if (re2->multiline) { buffer += "m"; } if (re2->dotAll) { buffer += "s"; } buffer += "u"; if (re2->sticky) { buffer += "y"; } info.GetReturnValue().Set(Nan::New(buffer).ToLocalChecked()); } ================================================ FILE: lib/util.cc ================================================ #include "./util.h" void consoleCall(const v8::Local &methodName, v8::Local text) { auto context = Nan::GetCurrentContext(); auto maybeConsole = bind( Nan::Get(context->Global(), Nan::New("console").ToLocalChecked()), [context](v8::Local console) { return console->ToObject(context); }); if (maybeConsole.IsEmpty()) return; auto console = maybeConsole.ToLocalChecked(); auto maybeMethod = bind( Nan::Get(console, methodName), [context](v8::Local method) { return method->ToObject(context); }); if (maybeMethod.IsEmpty()) return; auto method = maybeMethod.ToLocalChecked(); if (!method->IsFunction()) return; Nan::CallAsFunction(method, console, 1, &text); } void printDeprecationWarning(const char *warning) { std::string prefixedWarning = "DeprecationWarning: "; prefixedWarning += warning; consoleCall(Nan::New("error").ToLocalChecked(), Nan::New(prefixedWarning).ToLocalChecked()); } v8::Local callToString(const v8::Local &object) { auto context 
= Nan::GetCurrentContext(); auto maybeMethod = bind( Nan::Get(object, Nan::New("toString").ToLocalChecked()), [context](v8::Local method) { return method->ToObject(context); }); if (maybeMethod.IsEmpty()) return Nan::New("No toString() is found").ToLocalChecked(); auto method = maybeMethod.ToLocalChecked(); if (!method->IsFunction()) return Nan::New("No toString() is found").ToLocalChecked(); auto maybeResult = Nan::CallAsFunction(method, object, 0, nullptr); if (maybeResult.IsEmpty()) { return Nan::New("nothing was returned").ToLocalChecked(); } auto result = maybeResult.ToLocalChecked(); if (result->IsObject()) { return callToString(result->ToObject(context).ToLocalChecked()); } Nan::Utf8String val(result->ToString(context).ToLocalChecked()); return Nan::New(std::string(*val, val.length())).ToLocalChecked(); } ================================================ FILE: lib/util.h ================================================ #pragma once #include "./wrapped_re2.h" template inline v8::MaybeLocal bind(v8::MaybeLocal

param, L lambda) { return param.IsEmpty() ? v8::MaybeLocal() : lambda(param.ToLocalChecked()); } void consoleCall(const v8::Local &methodName, v8::Local text); void printDeprecationWarning(const char *warning); v8::Local callToString(const v8::Local &object); ================================================ FILE: lib/wrapped_re2.h ================================================ #pragma once #include #include #include #include #include "./isolate_data.h" struct StrVal { char *data; size_t size, length; size_t index, byteIndex; bool isBuffer, isValidIndex, isBad; StrVal() : data(NULL), size(0), length(0), index(0), byteIndex(0), isBuffer(false), isValidIndex(false), isBad(false) {} operator re2::StringPiece() const { return re2::StringPiece(data, size); } void setIndex(size_t newIndex = 0); void reset(const v8::Local &arg, size_t size, size_t length, size_t newIndex = 0, bool buffer = false); void clear() { isBad = isBuffer = isValidIndex = false; size = length = index = byteIndex = 0; data = nullptr; } }; class WrappedRE2 : public Nan::ObjectWrap { private: WrappedRE2( const re2::StringPiece &pattern, const re2::RE2::Options &options, const std::string &src, const bool &g, const bool &i, const bool &m, const bool &s, const bool &y, const bool &d) : regexp(pattern, options), source(src), global(g), ignoreCase(i), multiline(m), dotAll(s), sticky(y), hasIndices(d), lastIndex(0) {} static NAN_METHOD(New); static NAN_METHOD(ToString); static NAN_GETTER(GetSource); static NAN_GETTER(GetFlags); static NAN_GETTER(GetGlobal); static NAN_GETTER(GetIgnoreCase); static NAN_GETTER(GetMultiline); static NAN_GETTER(GetDotAll); static NAN_GETTER(GetUnicode); static NAN_GETTER(GetSticky); static NAN_GETTER(GetHasIndices); static NAN_GETTER(GetLastIndex); static NAN_SETTER(SetLastIndex); static NAN_GETTER(GetInternalSource); // RegExp methods static NAN_METHOD(Exec); static NAN_METHOD(Test); // String methods static NAN_METHOD(Match); static NAN_METHOD(Replace); static 
NAN_METHOD(Search); static NAN_METHOD(Split); // strict Unicode warning support static NAN_GETTER(GetUnicodeWarningLevel); static NAN_SETTER(SetUnicodeWarningLevel); public: ~WrappedRE2(); static v8::Local Init(); static inline bool HasInstance(v8::Local object) { auto isolate = v8::Isolate::GetCurrent(); auto data = getAddonData(isolate); if (!data || data->re2Tpl.IsEmpty()) return false; return data->re2Tpl.Get(isolate)->HasInstance(object); } enum UnicodeWarningLevels { NOTHING, WARN_ONCE, WARN, THROW }; static std::atomic unicodeWarningLevel; static std::atomic alreadyWarnedAboutUnicode; re2::RE2 regexp; std::string source; bool global; bool ignoreCase; bool multiline; bool dotAll; bool sticky; bool hasIndices; size_t lastIndex; friend struct PrepareLastString; private: Nan::Persistent lastString; // weak pointer Nan::Persistent lastCache; // weak pointer StrVal lastStringValue; void dropCache(); const StrVal &prepareArgument(const v8::Local &arg, bool ignoreLastIndex = false); void doneWithLastString(); }; struct PrepareLastString { PrepareLastString(WrappedRE2 *re2, const v8::Local &arg, bool ignoreLastIndex = false) : re2(re2) { re2->prepareArgument(arg, ignoreLastIndex); } ~PrepareLastString() { re2->doneWithLastString(); } operator const StrVal&() const { return re2->lastStringValue; } operator StrVal&() { return re2->lastStringValue; } WrappedRE2 *re2; }; // utilities inline size_t getUtf8Length(const uint16_t *from, const uint16_t *to) { size_t n = 0; while (from != to) { uint16_t ch = *from++; if (ch <= 0x7F) ++n; else if (ch <= 0x7FF) n += 2; else if (0xD800 <= ch && ch <= 0xDFFF) { n += 4; if (from == to) break; ++from; } else if (ch < 0xFFFF) n += 3; else n += 4; } return n; } inline size_t getUtf16Length(const char *from, const char *to) { size_t n = 0; while (from != to) { unsigned ch = *from & 0xFF; if (ch < 0xF0) { if (ch < 0x80) { ++from; } else { if (ch < 0xE0) { from += 2; if (from == to + 1) { ++n; break; } } else { from += 3; if (from > to 
&& from < to + 3) { ++n; break; } } } ++n; } else { from += 4; n += 2; if (from > to && from < to + 4) break; } } return n; } inline size_t getUtf8CharSize(char ch) { return ((0xE5000000 >> ((ch >> 3) & 0x1E)) & 3) + 1; } inline size_t getUtf16PositionByCounter(const char *data, size_t from, size_t n) { for (; n > 0; --n) { size_t s = getUtf8CharSize(data[from]); from += s; if (s == 4 && n >= 2) --n; // this utf8 character will take two utf16 characters // the decrement above is protected to avoid an overflow of an unsigned integer } return from; } ================================================ FILE: lib/wrapped_re2_set.h ================================================ #pragma once #include #include #include #include "./isolate_data.h" #include #include class WrappedRE2Set : public Nan::ObjectWrap { public: static v8::Local Init(); static inline bool HasInstance(v8::Local object) { auto isolate = v8::Isolate::GetCurrent(); auto data = getAddonData(isolate); if (!data || data->re2SetTpl.IsEmpty()) return false; return data->re2SetTpl.Get(isolate)->HasInstance(object); } private: WrappedRE2Set(const re2::RE2::Options &options, re2::RE2::Anchor anchor, const std::string &flags) : set(options, anchor), flags(flags), anchor(anchor) {} static NAN_METHOD(New); static NAN_METHOD(Test); static NAN_METHOD(Match); static NAN_METHOD(ToString); static NAN_GETTER(GetFlags); static NAN_GETTER(GetSources); static NAN_GETTER(GetSource); static NAN_GETTER(GetSize); static NAN_GETTER(GetAnchor); re2::RE2::Set set; std::vector sources; std::string combinedSource; std::string flags; re2::RE2::Anchor anchor; }; ================================================ FILE: llms-full.txt ================================================ # node-re2 > Node.js bindings for RE2: a fast, safe alternative to backtracking regular expression engines. Drop-in RegExp replacement that prevents ReDoS (Regular Expression Denial of Service). Works with strings and Buffers. 
C++ native addon built with node-gyp and nan.

- Drop-in replacement for RegExp with linear-time matching guarantee
- Prevents ReDoS by disallowing backreferences and lookahead assertions
- Full Unicode mode (always on)
- Buffer support for high-performance binary/UTF-8 processing
- Named capture groups
- Symbol-based methods (Symbol.match, Symbol.search, Symbol.replace, Symbol.split, Symbol.matchAll)
- RE2.Set for multi-pattern matching
- Prebuilt binaries for Linux, macOS, Windows (x64 + arm64)
- TypeScript declarations included

## Install

```bash
npm install re2
```

Prebuilt native binaries are downloaded automatically. Falls back to building from source via node-gyp if no prebuilt is available.

## Quick start

```js
const RE2 = require('re2');

// Create and use like RegExp
const re = new RE2('a(b*)', 'i');
const result = re.exec('aBbC');
console.log(result[0]); // "aBb"
console.log(result[1]); // "Bb"

// Works with ES6 string methods
'hello world'.match(new RE2('\\w+', 'g')); // ['hello', 'world']
'hello world'.replace(new RE2('world'), 'RE2'); // 'hello RE2'
```

## Importing

```js
// CommonJS
const RE2 = require('re2');

// ESM
import { RE2 } from 're2';
```

## Construction

`new RE2(pattern[, flags])` or `RE2(pattern[, flags])` (factory mode).

Pattern can be:

- **String**: `new RE2('\\d+')`
- **String with flags**: `new RE2('\\d+', 'gi')`
- **RegExp**: `new RE2(/ab*/ig)` — copies pattern and flags.
- **RE2**: `new RE2(existingRE2)` — copies pattern and flags.
- **Buffer**: `new RE2(Buffer.from('pattern'))` — pattern from UTF-8 buffer.

Supported flags:

- `g` — global (find all matches)
- `i` — ignoreCase
- `m` — multiline (`^`/`$` match line boundaries)
- `s` — dotAll (`.` matches `\n`)
- `u` — unicode (always on, added implicitly)
- `y` — sticky (match at lastIndex only)
- `d` — hasIndices (include index info for capture groups)

Invalid patterns throw `SyntaxError`. Patterns with backreferences or lookahead throw `SyntaxError`.
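
Since construction throws `SyntaxError` for rejected patterns, callers that accept patterns from outside may want to turn that exception into a value. The following is a minimal sketch, not part of the re2 API: `tryCompile` is a hypothetical helper that takes the engine constructor as a parameter. It is demonstrated with the built-in `RegExp` so it runs without the native addon; passing `RE2` instead is assumed to behave the same way for the patterns RE2 rejects.

```js
// Hypothetical helper (not part of the re2 API): compile a pattern and
// report a SyntaxError as data instead of letting it propagate.
function tryCompile(Engine, pattern, flags) {
  try {
    return {ok: true, re: new Engine(pattern, flags)};
  } catch (error) {
    if (error instanceof SyntaxError) {
      return {ok: false, error};
    }
    throw error; // anything else is unexpected, so rethrow
  }
}

// Demonstrated with the built-in RegExp; pass RE2 as the first argument
// to guard against patterns that RE2 rejects.
const good = tryCompile(RegExp, '\\d+', 'g');
const bad = tryCompile(RegExp, '(unclosed', '');
console.log(good.ok, bad.ok); // true false
```

Taking the constructor as an argument keeps the guard independent of which engine is installed, which can be convenient when falling back from one engine to another.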
## Properties

### Instance properties

- `re.source` (string) — the pattern string, escaped for use in `new RE2(re.source)` or `new RegExp(re.source)`.
- `re.flags` (string) — the flags string (e.g., `'giu'`).
- `re.lastIndex` (number) — the index at which to start the next match (used with `g` or `y` flags).
- `re.global` (boolean) — whether the `g` flag is set.
- `re.ignoreCase` (boolean) — whether the `i` flag is set.
- `re.multiline` (boolean) — whether the `m` flag is set.
- `re.dotAll` (boolean) — whether the `s` flag is set.
- `re.unicode` (boolean) — always `true` (RE2 always operates in Unicode mode).
- `re.sticky` (boolean) — whether the `y` flag is set.
- `re.hasIndices` (boolean) — whether the `d` flag is set.
- `re.internalSource` (string) — the RE2-translated pattern (for debugging; may differ from `source`).

### Static properties

- `RE2.unicodeWarningLevel` (string) — controls behavior when a non-Unicode regexp is created:
  - `'nothing'` (default) — silently add `u` flag.
  - `'warnOnce'` — warn once, then silently add `u`. Assigning resets the one-time flag.
  - `'warn'` — warn every time.
  - `'throw'` — throw `SyntaxError` every time.

## RegExp methods

### re.exec(str)

Executes a search for a match. Returns a result array or `null`.

```js
const re = new RE2('a(b+)', 'g');
const result = re.exec('abbc abbc');
// result[0] === 'abb'
// result[1] === 'bb'
// result.index === 0
// result.input === 'abbc abbc'
// re.lastIndex === 3
```

With `d` flag (hasIndices), result has `.indices` property with `[start, end]` pairs for each group.

With `g` or `y` flag, advances `lastIndex`. Call repeatedly to iterate matches.

### re.test(str)

Returns `true` if the pattern matches, `false` otherwise.

```js
new RE2('\\d+').test('abc123'); // true
new RE2('\\d+').test('abcdef'); // false
```

With `g` or `y` flag, advances `lastIndex`.

### re.toString()

Returns `'/pattern/flags'` string representation.
```js
new RE2('abc', 'gi').toString(); // '/abc/giu'
```

## String methods (via Symbol)

RE2 instances implement well-known symbols, so they work directly with ES6 string methods:

### str.match(re) / re[Symbol.match](str)

```js
'test 123 test 456'.match(new RE2('\\d+', 'g')); // ['123', '456']
'test 123'.match(new RE2('(\\d+)')); // ['123', '123', index: 5, input: 'test 123']
```

### str.matchAll(re) / re[Symbol.matchAll](str)

Returns an iterator of all matches (requires `g` flag).

```js
const re = new RE2('\\d+', 'g');
for (const m of '1a2b3c'.matchAll(re)) {
  console.log(m[0]); // '1', '2', '3'
}
```

### str.search(re) / re[Symbol.search](str)

Returns the index of the first match, or `-1`.

```js
'hello world'.search(new RE2('world')); // 6
```

### str.replace(re, replacement) / re[Symbol.replace](str, replacement)

Returns a new string with matches replaced.

```js
'aabba'.replace(new RE2('b', 'g'), 'c'); // 'aacca'
```

Replacement string supports:

- `$1`, `$2`, ... — numbered capture groups.
- `$<name>` — named capture groups.
- `$&` — the matched substring.
- `` $` `` — portion before the match.
- `$'` — portion after the match.
- `$$` — literal `$`.

Replacement function receives `(match, ...groups, offset, input)`:

```js
'abc'.replace(new RE2('(b)'), (match, g1, offset) => `[${g1}@${offset}]`); // 'a[b@1]c'
```

### str.split(re[, limit]) / re[Symbol.split](str[, limit])

Splits string by pattern.

```js
'a1b2c3'.split(new RE2('\\d')); // ['a', 'b', 'c', '']
'a1b2c3'.split(new RE2('\\d'), 2); // ['a', 'b']
```

## String methods (direct)

These are convenience methods on the RE2 instance with swapped argument order:

- `re.match(str)` — equivalent to `str.match(re)`.
- `re.search(str)` — equivalent to `str.search(re)`.
- `re.replace(str, replacement)` — equivalent to `str.replace(re, replacement)`.
- `re.split(str[, limit])` — equivalent to `str.split(re, limit)`.
```js
const re = new RE2('\\d+', 'g');
re.match('test 123 test 456'); // ['123', '456']
re.search('test 123'); // 5
re.replace('test 1 and 2', 'N'); // 'test N and N' (global replaces all)
re.split('a1b2c'); // ['a', 'b', 'c']
```

## Buffer support

All methods accept Node.js Buffers (UTF-8) instead of strings. When given Buffer input, they return Buffer output.

```js
const re = new RE2('матч', 'g');
const buf = Buffer.from('тест матч тест');
const result = re.exec(buf);
// result[0] is a Buffer containing 'матч' in UTF-8
// result.index is in bytes (not characters)
```

Differences from string mode:

- All offsets and lengths are in **bytes**, not characters.
- Results contain Buffers instead of strings.
- Use `buf.toString()` to convert results back to strings.

### useBuffers on replacer functions

When using `re.replace(buf, replacerFn)`, the replacer receives string arguments and character offsets by default. Set `replacerFn.useBuffers = true` to receive byte offsets instead:

```js
function replacer(match, offset, input) {
  return '<' + offset + ' bytes>';
}
replacer.useBuffers = true;
new RE2('б').replace(Buffer.from('абв'), replacer);
```

## RE2.Set

Multi-pattern matching — compile many patterns into a single automaton and test/match against all of them at once. Faster than testing individual patterns when the number of patterns is large.

### Constructor

```js
new RE2.Set(patterns[, flagsOrOptions][, options])
```

- `patterns` — any iterable of strings, Buffers, RegExp, or RE2 instances.
- `flagsOrOptions` — optional string/Buffer with flags (apply to all patterns), or options object.
- `options.anchor` — `'unanchored'` (default), `'start'`, or `'both'`.

```js
const set = new RE2.Set([
  '^/users/\\d+$',
  '^/posts/\\d+$',
  '^/api/.*$'
], 'i', {anchor: 'start'});
```

### set.test(str)

Returns `true` if any pattern matches, `false` otherwise.
```js
set.test('/users/42'); // true
set.test('/unknown'); // false
```

### set.match(str)

Returns an array of indices of matching patterns, sorted ascending. Empty array if none match.

```js
set.match('/users/42'); // [0]
set.match('/api/users'); // [2]
set.match('/unknown'); // []
```

### Properties

- `set.size` (number) — number of patterns.
- `set.source` (string) — all patterns joined with `|`.
- `set.sources` (string[]) — individual pattern sources.
- `set.flags` (string) — flags string.
- `set.anchor` (string) — anchor mode.

### set.toString()

Returns `'/pattern1|pattern2|.../flags'`.

```js
set.toString(); // '/^/users/\\d+$|^/posts/\\d+$|^/api/.*$/iu'
```

## Static helpers

### RE2.getUtf8Length(str)

Calculate the byte size needed to encode a UTF-16 string as UTF-8.

```js
RE2.getUtf8Length('hello'); // 5
RE2.getUtf8Length('привет'); // 12
```

### RE2.getUtf16Length(buf)

Calculate the character count needed to encode a UTF-8 buffer as a UTF-16 string.

```js
RE2.getUtf16Length(Buffer.from('hello')); // 5
RE2.getUtf16Length(Buffer.from('привет')); // 6
```

## Named groups

Named capture groups are supported:

```js
const re = new RE2('(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})');
const result = re.exec('2024-01-15');
result.groups.year; // '2024'
result.groups.month; // '01'
result.groups.day; // '15'
```

Named backreferences in replacement strings:

```js
'2024-01-15'.replace(
  new RE2('(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})'),
  '$<day>/$<month>/$<year>'
);
// '15/01/2024'
```

## Unicode classes

RE2 supports Unicode property escapes. Long names are translated to RE2 short names:

```js
new RE2('\\p{Letter}+'); // same as \p{L}+
new RE2('\\p{Number}+'); // same as \p{N}+
new RE2('\\p{Script=Latin}+'); // same as \p{Latin}+
new RE2('\\p{sc=Cyrillic}+'); // same as \p{Cyrillic}+
new RE2('\\P{Letter}+'); // negated: non-letters
```

Only the `\p{name}` form is supported (not `\p{name=value}` in general). The exceptions are the `Script` and `sc` names.
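
As a rough illustration of the property-escape forms listed above, the sketch below counts runs of Latin letters, Cyrillic letters, and digits in a mixed string. It assumes the `re2` package is installed. The names `loadEngine` and `countMatches` are hypothetical helpers introduced only for this example, and the fallback to the built-in `RegExp` (which accepts these particular long names under the `u` flag) is an assumption added so the snippet still runs where the native addon is unavailable.

```js
// Hypothetical helper (not part of node-re2): prefer RE2, fall back to the
// built-in RegExp when the native addon cannot be loaded.
function loadEngine() {
  try {
    return require('re2');
  } catch (e) {
    return RegExp;
  }
}

const Engine = loadEngine();

// Hypothetical helper: count non-overlapping matches of `pattern` in `input`.
// The 'u' flag is passed explicitly: RE2 adds it anyway, while RegExp
// requires it to understand \p{...} escapes.
function countMatches(pattern, input) {
  const re = new Engine(pattern, 'gu');
  let count = 0;
  while (re.exec(input) !== null) ++count;
  return count;
}

const text = 'hello привет 123';

const latinWords = countMatches('\\p{Script=Latin}+', text); // 1 ('hello')
const cyrillicWords = countMatches('\\p{sc=Cyrillic}+', text); // 1 ('привет')
const numberRuns = countMatches('\\p{Number}+', text); // 1 ('123')
const letters = countMatches('\\p{Letter}', text); // 11 (5 Latin + 6 Cyrillic)

console.log({latinWords, cyrillicWords, numberRuns, letters});
```

Every pattern here matches at least one character, so the `exec` loop always advances `lastIndex`; a pattern that can match the empty string would need an explicit `lastIndex` bump to avoid looping forever.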
## Limitations RE2 does **not** support: - **Backreferences** (`\1`, `\2`, etc.) — throw `SyntaxError`. - **Lookahead assertions** (`(?=...)`, `(?!...)`) — throw `SyntaxError`. - **Lookbehind assertions** (`(?<=...)`, `(? 0 ? matches[0] : -1; } findRoute('/users/42'); // 0 findRoute('/posts/7'); // 1 findRoute('/api/v2/foo'); // 2 findRoute('/unknown'); // -1 ``` ### Validate user-supplied patterns safely ```js const RE2 = require('re2'); function safeMatch(input, pattern, flags) { try { const re = new RE2(pattern, flags); return re.test(input); } catch (e) { return false; // invalid pattern } } ``` ## TypeScript ```ts import RE2 from 're2'; const re: RE2 = new RE2('\\d+', 'g'); const result: RegExpExecArray | null = re.exec('test 123'); // Buffer overloads const bufResult: RE2BufferExecArray | null = re.exec(Buffer.from('test 123')); // RE2.Set const set: RE2Set = new RE2.Set(['a', 'b'], 'i'); const matches: number[] = set.match('abc'); ``` ## Project structure notes - Entry point: `re2.js` (loads native addon), types: `re2.d.ts`. - C++ addon source: `lib/*.cc`, `lib/*.h`. - Tests: `tests/test-*.mjs` (runtime), `ts-tests/test-*.ts` (type-checking). - Vendored dependencies: `vendor/re2/`, `vendor/abseil-cpp/` (git submodules) — **never modify files under `vendor/`**. ## Links - Docs: https://github.com/uhop/node-re2/wiki - npm: https://www.npmjs.com/package/re2 - Repository: https://github.com/uhop/node-re2 - RE2 syntax: https://github.com/google/re2/wiki/Syntax ================================================ FILE: llms.txt ================================================ # node-re2 > Node.js bindings for RE2: a fast, safe alternative to backtracking regular expression engines. Drop-in RegExp replacement that prevents ReDoS. Works with strings and Buffers. 
## Install

npm install re2

## Quick start

```js
// CommonJS
const RE2 = require('re2');

// ESM
import {RE2} from 're2';

const re = new RE2('a(b*)', 'i');
const result = re.exec('aBbC');
console.log(result[0]); // "aBb"
console.log(result[1]); // "Bb"
```

## Why use node-re2?

The built-in Node.js RegExp engine can run in exponential time with vulnerable patterns (ReDoS). RE2 guarantees linear-time matching by disallowing backreferences and lookahead assertions.

## API

### Construction

```js
const RE2 = require('re2');

const re1 = new RE2('\\d+'); // from string
const re2 = new RE2('\\d+', 'gi'); // with flags
const re3 = new RE2(/ab*/ig); // from RegExp
const re4 = new RE2(re3); // from another RE2
const re5 = RE2('\\d+'); // factory (no new)
```

Supported flags: `g` (global), `i` (ignoreCase), `m` (multiline), `s` (dotAll), `u` (unicode, always on), `y` (sticky), `d` (hasIndices).

### RegExp methods

- `re.exec(str)` — find match with capture groups.
- `re.test(str)` — boolean match check.
- `re.toString()` — `/pattern/flags` representation.

### String methods (via Symbol)

RE2 instances work with ES6 string methods:

```js
'abc'.match(re);
'abc'.search(re);
'abc'.replace(re, 'x');
'abc'.split(re);
Array.from('abc'.matchAll(re));
```

### String methods (direct)

- `re.match(str)` — equivalent to `str.match(re)`.
- `re.search(str)` — equivalent to `str.search(re)`.
- `re.replace(str, replacement)` — equivalent to `str.replace(re, replacement)`.
- `re.split(str[, limit])` — equivalent to `str.split(re, limit)`.

### Properties

- `re.source` — pattern string.
- `re.flags` — flags string.
- `re.lastIndex` — index for next match (with `g` or `y` flag).
- `re.global`, `re.ignoreCase`, `re.multiline`, `re.dotAll`, `re.unicode`, `re.sticky`, `re.hasIndices` — boolean flag accessors.
- `re.internalSource` — RE2-translated pattern (for debugging).

### Buffer support

All methods accept Buffers (UTF-8) instead of strings. Buffer input produces Buffer output. Offsets are in bytes.
```js
const re = new RE2('матч', 'g');
const buf = Buffer.from('тест матч тест');
const result = re.exec(buf);
// result[0] is a Buffer
```

### RE2.Set

Multi-pattern matching — test a string against many patterns at once.

```js
const set = new RE2.Set(['^/users/\\d+$', '^/posts/\\d+$'], 'i');
set.test('/users/7'); // true
set.match('/posts/42'); // [1]
set.sources; // ['^/users/\\d+$', '^/posts/\\d+$']
```

- `new RE2.Set(patterns[, flags][, options])` — compile patterns.
- `options.anchor`: `'unanchored'` (default), `'start'`, or `'both'`.
- `set.test(str)` — returns `true` if any pattern matches.
- `set.match(str)` — returns array of matching pattern indices.
- Properties: `size`, `source`, `sources`, `flags`, `anchor`.

### Static helpers

- `RE2.getUtf8Length(str)` — byte size of string as UTF-8.
- `RE2.getUtf16Length(buf)` — character count of UTF-8 buffer as UTF-16 string.
- `RE2.unicodeWarningLevel` — `'nothing'` (default), `'warnOnce'`, `'warn'`, or `'throw'`.

## Limitations

RE2 does not support:

- **Backreferences** (`\1`, `\2`, etc.)
- **Lookahead assertions** (`(?=...)`, `(?!...)`)

These throw `SyntaxError`. Use try-catch to fall back to RegExp when needed:

```js
let re = /pattern-with-lookahead/;
try {
  re = new RE2(re);
} catch (e) {
  /* use original RegExp */
}
```

## Project notes

- C++ addon source is in `lib/`. Vendored deps (`vendor/re2/`, `vendor/abseil-cpp/`) are git submodules — **never modify files under `vendor/`**.
## Links - Docs: https://github.com/uhop/node-re2/wiki - npm: https://www.npmjs.com/package/re2 - Full LLM reference: https://github.com/uhop/node-re2/blob/master/llms-full.txt ================================================ FILE: package.json ================================================ { "name": "re2", "version": "1.24.0", "description": "Bindings for RE2: fast, safe alternative to backtracking regular expression engines.", "homepage": "https://github.com/uhop/node-re2", "bugs": "https://github.com/uhop/node-re2/issues", "type": "commonjs", "main": "re2.js", "types": "re2.d.ts", "files": [ "binding.gyp", "lib", "re2.d.ts", "scripts/*.js", "vendor" ], "dependencies": { "install-artifact-from-github": "^1.4.0", "nan": "^2.26.2", "node-gyp": "^12.2.0" }, "devDependencies": { "@types/node": "^25.5.0", "nano-benchmark": "^1.0.15", "prettier": "^3.8.1", "tape-six": "^1.7.13", "tape-six-proc": "^1.2.8", "typescript": "^6.0.2" }, "scripts": { "test": "tape6 --flags FO", "test:seq": "tape6-seq --flags FO", "test:proc": "tape6-proc --flags FO", "save-to-github": "save-to-github-cache --artifact build/Release/re2.node", "install": "install-from-cache --artifact build/Release/re2.node --host-var RE2_DOWNLOAD_MIRROR --skip-path-var RE2_DOWNLOAD_SKIP_PATH --skip-ver-var RE2_DOWNLOAD_SKIP_VER || node-gyp -j max rebuild", "verify-build": "node scripts/verify-build.js", "build:dev": "node-gyp -j max build --debug", "build": "node-gyp -j max build", "build1": "node-gyp build", "rebuild:dev": "node-gyp -j max rebuild --debug", "rebuild": "node-gyp -j max rebuild", "rebuild1": "node-gyp rebuild", "clean": "node-gyp clean && node-gyp configure", "clean-build": "node-gyp clean", "ts-check": "tsc --noEmit", "lint": "prettier --check *.js *.ts tests/ bench/", "lint:fix": "prettier --write *.js *.ts tests/ bench/" }, "github": "https://github.com/uhop/node-re2", "repository": { "type": "git", "url": "git://github.com/uhop/node-re2.git" }, "keywords": [ "RegExp", "RegEx", "text 
processing", "PCRE alternative" ], "author": "Eugene Lazutkin (https://lazutkin.com/)", "funding": "https://github.com/sponsors/uhop", "license": "BSD-3-Clause", "tape6": { "tests": [ "/tests/test-*.*js", "/tests/test-*.*ts" ] } } ================================================ FILE: re2.d.ts ================================================ /// declare module 're2' { interface RE2BufferExecArray { index: number; input: Buffer; 0: Buffer; groups?: { [key: string]: Buffer; }; indices?: RegExpIndicesArray; } interface RE2BufferMatchArray { index?: number; input?: Buffer; 0: Buffer; groups?: { [key: string]: Buffer; }; } interface RE2 extends RegExp { readonly internalSource: string; exec(str: string): RegExpExecArray | null; exec(str: Buffer): RE2BufferExecArray | null; match(str: string): RegExpMatchArray | null; match(str: Buffer): RE2BufferMatchArray | null; test(str: string | Buffer): boolean; replace( str: K, replaceValue: string | Buffer ): K; replace( str: K, replacer: (substring: string, ...args: any[]) => string | Buffer ): K; search(str: string | Buffer): number; split(str: K, limit?: number): K[]; } interface RE2SetOptions { anchor?: 'unanchored' | 'start' | 'both'; } interface RE2Set { readonly size: number; readonly source: string; readonly sources: string[]; readonly flags: string; readonly anchor: 'unanchored' | 'start' | 'both'; match(str: string | Buffer): number[]; test(str: string | Buffer): boolean; toString(): string; } interface RE2SetConstructor { new ( patterns: Iterable, flagsOrOptions?: string | Buffer | RE2SetOptions, options?: RE2SetOptions ): RE2Set; ( patterns: Iterable, flagsOrOptions?: string | Buffer | RE2SetOptions, options?: RE2SetOptions ): RE2Set; readonly prototype: RE2Set; } interface RE2Constructor extends RegExpConstructor { new (pattern: Buffer | RegExp | RE2 | string): RE2; new (pattern: Buffer | string, flags?: string | Buffer): RE2; (pattern: Buffer | RegExp | RE2 | string): RE2; (pattern: Buffer | string, flags?: string | 
Buffer): RE2; readonly prototype: RE2; unicodeWarningLevel: 'nothing' | 'warnOnce' | 'warn' | 'throw'; getUtf8Length(value: string): number; getUtf16Length(value: Buffer): number; Set: RE2SetConstructor; RE2: RE2Constructor; } var RE2: RE2Constructor; export = RE2; } ================================================ FILE: re2.js ================================================ 'use strict'; const RE2 = require('./build/Release/re2.node'); // const RE2 = require('./build/Debug/re2.node'); const setAliases = (object, dict) => { for (let [name, alias] of Object.entries(dict)) { Object.defineProperty( object, alias, Object.getOwnPropertyDescriptor(object, name) ); } }; setAliases(RE2.prototype, { match: Symbol.match, search: Symbol.search, replace: Symbol.replace, split: Symbol.split }); RE2.prototype[Symbol.matchAll] = function* (str) { if (!this.global) throw TypeError( 'String.prototype.matchAll() is called with a non-global RE2 argument' ); const re = new RE2(this); re.lastIndex = this.lastIndex; for (;;) { const result = re.exec(str); if (!result) break; if (result[0] === '') ++re.lastIndex; yield result; } }; module.exports = RE2; module.exports.RE2 = RE2; ================================================ FILE: scripts/verify-build.js ================================================ 'use strict'; // This is a light-weight script to make sure that the package works. 
const assert = require('assert').strict; const RE2 = require("../re2"); const sample = "abbcdefabh"; const re1 = new RE2("ab*", "g"); assert(re1.test(sample)); const re2 = RE2("ab*"); assert(re2.test(sample)); const re3 = new RE2("abc"); assert(!re3.test(sample)); ================================================ FILE: tests/manual/matchall-bench.js ================================================ 'use strict'; const RE2 = require('../../re2'); const N = 1_000_000; const s = 'a'.repeat(N), re = new RE2('a', 'g'), matches = s.matchAll(re); let n = 0; for (const _ of matches) ++n; if (n !== s.length) console.log('Wrong result.'); console.log('Done.'); ================================================ FILE: tests/manual/memory-check.js ================================================ 'use strict'; const RE2 = require('../../re2.js'); const L = 20 * 1024 * 1024, N = 100; if (typeof globalThis.gc != 'function') console.log( "Warning: to run it with explicit gc() calls, you should use --expose-gc as a node's argument." ); const gc = typeof globalThis.gc == 'function' ? globalThis.gc : () => {}; const s = 'a'.repeat(L), objects = []; for (let i = 0; i < N; ++i) { const re2 = new RE2('x', 'g'); objects.push(re2); const result = s.replace(re2, ''); if (result.length !== s.length) console.log('Wrong result.'); gc(); } console.log( 'Done. Now it is spinning: check the memory consumption! To stop it, press Ctrl+C.' 
); for (;;); ================================================ FILE: tests/manual/memory-monitor.js ================================================ 'use strict'; const RE2 = require('../../re2'); const N = 5_000_000; console.log('Never-ending loop: exit with Ctrl+C.'); const aCharCode = 'a'.charCodeAt(0); const randomAlpha = () => String.fromCharCode(aCharCode + Math.floor(Math.random() * 26)); const humanizeNumber = n => { const negative = n < 0; if (negative) n = -n; const s = n.toFixed(); let group1 = s.length % 3; if (!group1) group1 = 3; let result = s.substring(0, group1); for (let i = group1; i < s.length; i += 3) { result += ',' + s.substring(i, i + 3); } return (negative ? '-' : '') + result; }; const CSI = '\x1B['; const cursorUp = (n = 1) => CSI + (n > 1 ? n.toFixed() : '') + 'A'; const sgr = (cmd = '') => CSI + (Array.isArray(cmd) ? cmd.join(';') : cmd) + 'm'; const RESET = sgr(); const NOTE = sgr(91); let first = true; const maxMemory = { heapTotal: 0, heapUsed: 0, external: 0, arrayBuffers: 0, rss: 0 }, labels = { heapTotal: 'heap total', heapUsed: 'heap used', external: 'external', arrayBuffers: 'array buffers', rss: 'resident set size' }, maxLabelSize = Math.max( ...Array.from(Object.values(labels)).map(label => label.length) ); const report = () => { const memoryUsage = process.memoryUsage(), previousMax = {...maxMemory}; console.log( (first ? '' : '\r' + cursorUp(6)) + ''.padStart(maxLabelSize + 1), 'Current'.padStart(15), 'Max'.padStart(15) ); for (const name in maxMemory) { const prefix = previousMax[name] && previousMax[name] < memoryUsage[name] ? 
NOTE : RESET; console.log( (labels[name] + ':').padStart(maxLabelSize + 1), prefix + humanizeNumber(memoryUsage[name]).padStart(15) + RESET, humanizeNumber(maxMemory[name]).padStart(15) ); } for (const [name, value] of Object.entries(maxMemory)) { maxMemory[name] = Math.max(value, memoryUsage[name]); } first = false; }; for (;;) { const re2 = new RE2(randomAlpha(), 'g'); let s = ''; for (let i = 0; i < N; ++i) s += randomAlpha(); let n = 0; for (const _ of s.matchAll(re2)) ++n; re2.lastIndex = 0; const r = s.replace(re2, ''); if (r.length + n != s.length) { console.log( 'ERROR!', 's:', s.length, 'r:', r.length, 'n:', n, 're2:', re2.toString() ); break; } report(); } ================================================ FILE: tests/manual/test-unicode-warning.mjs ================================================ import test from 'tape-six'; import {RE2} from '../../re2.js'; // tests // these tests modify the global state of RE2 and cannot be run in parallel with other tests in the same process test('test new unicode warnOnce', t => { let errorMessage = ''; const oldConsole = console; console = {error: msg => (errorMessage = msg)}; RE2.unicodeWarningLevel = 'warnOnce'; let a = new RE2('.*'); t.ok(errorMessage); errorMessage = ''; a = new RE2('.?'); t.notOk(errorMessage); RE2.unicodeWarningLevel = 'warnOnce'; a = new RE2('.+'); t.ok(errorMessage); RE2.unicodeWarningLevel = 'nothing'; console = oldConsole; }); test('test new unicode warn', t => { let errorMessage = ''; const oldConsole = console; console = {error: msg => (errorMessage = msg)}; RE2.unicodeWarningLevel = 'warn'; let a = new RE2('.*'); t.ok(errorMessage); errorMessage = ''; a = new RE2('.?'); t.ok(errorMessage); RE2.unicodeWarningLevel = 'nothing'; console = oldConsole; }); test('test new unicode throw', t => { RE2.unicodeWarningLevel = 'throw'; try { let a = new RE2('.'); t.fail(); // shouldn't be here } catch (e) { t.ok(e instanceof SyntaxError); } RE2.unicodeWarningLevel = 'nothing'; }); 
================================================ FILE: tests/manual/worker.js ================================================ 'use strict'; const {Worker, isMainThread} = require('worker_threads'); const RE2 = require('../../re2'); if (isMainThread) { // This re-loads the current file inside a Worker instance. console.log('Inside Master!'); const worker = new Worker(__filename); worker.on('exit', code => { console.log('Exit code:', code); test('#2'); }); test('#1'); } else { console.log('Inside Worker!'); test(); } function test(msg) { msg && console.log(isMainThread ? 'Main' : 'Worker', msg); const a = new RE2('^\\d+$'); console.log( isMainThread, a.test('123'), a.test('abc'), a.test('123abc'), a instanceof RE2 ); const b = RE2('^\\d+$'); console.log( isMainThread, b.test('123'), b.test('abc'), b.test('123abc'), b instanceof RE2 ); } ================================================ FILE: tests/test-cjs.cjs ================================================ const {test} = require('tape-six'); const RE2 = require('../re2.js'); test('CJS require', t => { t.ok(RE2, 'RE2 is loaded'); t.equal(typeof RE2, 'function', 'RE2 is a constructor'); }); test('CJS construct and test', t => { const re = new RE2('a(b*)', 'u'); t.ok(re instanceof RE2, 'instanceof RE2'); t.ok(re.test('aBb'), 'test matches'); t.notOk(re.test('xyz'), 'test rejects non-match'); }); test('CJS exec', t => { const re = new RE2('(\\d+)', 'u'); const result = re.exec('abc 123 def'); t.ok(result, 'exec returns a result'); t.equal(result[0], '123'); t.equal(result[1], '123'); t.equal(result.index, 4); }); test('CJS exec with Buffer', t => { const re = new RE2('(\\d+)', 'u'); const result = re.exec(Buffer.from('abc 123 def')); t.ok(result, 'exec returns a result'); t.ok(Buffer.isBuffer(result[0]), 'result is a Buffer'); t.equal(result[0].toString(), '123'); }); test('CJS match', t => { const re = new RE2('\\w+', 'gu'); const result = 'hello world'.match(re); t.ok(result, 'match returns a result'); 
  t.deepEqual(result, ['hello', 'world']);
});

test('CJS search', t => {
  const re = new RE2('world', 'u');
  const idx = 'hello world'.search(re);
  t.equal(idx, 6);
});

test('CJS replace', t => {
  const re = new RE2('world', 'u');
  const result = 'hello world'.replace(re, 'RE2');
  t.equal(result, 'hello RE2');
});

test('CJS split', t => {
  const re = new RE2('\\s+', 'u');
  const result = 'a b c'.split(re);
  t.deepEqual(result, ['a', 'b', 'c']);
});

test('CJS named groups', t => {
  const re = new RE2('(?P<year>\\d{4})-(?P<month>\\d{2})', 'u');
  const result = re.exec('2025-03');
  t.ok(result, 'exec returns a result');
  t.equal(result.groups.year, '2025');
  t.equal(result.groups.month, '03');
});

test('CJS RE2.Set', t => {
  const set = new RE2.Set(['abc', 'def', 'ghi'], 'u');
  t.ok(set, 'set is created');
  t.ok(set.test('abc'), 'test matches first pattern');
  t.deepEqual(set.match('abcghi'), [0, 2]);
});

test('CJS named import pattern', t => {
  const {RE2: NamedRE2} = require('../re2.js');
  t.equal(NamedRE2, RE2, 'RE2.RE2 === RE2');
  const re = new NamedRE2('abc', 'u');
  t.ok(re instanceof RE2, 'instance created via named import');
  t.ok(re.test('abc'), 'works correctly');
});

test('CJS static helpers', t => {
  t.equal(RE2.getUtf8Length('hello'), 5);
  t.equal(RE2.getUtf16Length(Buffer.from('hello')), 5);
});

================================================
FILE: tests/test-exec.mjs
================================================
import test from 'tape-six';
import {RE2} from '../re2.js';

// tests

// These tests are copied from MDN:
// https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/exec

test('exec basic', t => {
  const re = new RE2('quick\\s(brown).+?(jumps)', 'ig');

  t.equal(re.source, 'quick\\s(brown).+?(jumps)');
  t.ok(re.ignoreCase);
  t.ok(re.global);
  t.ok(!re.multiline);

  const result = re.exec('The Quick Brown Fox Jumps Over The Lazy Dog');

  t.deepEqual(Array.from(result), ['Quick Brown Fox Jumps', 'Brown', 'Jumps']);
  t.equal(result.index, 4);
  t.equal(result.input,
'The Quick Brown Fox Jumps Over The Lazy Dog'); t.equal(re.lastIndex, 25); }); test('exec succ', t => { const str = 'abbcdefabh'; const re = new RE2('ab*', 'g'); let result = re.exec(str); t.ok(result); t.equal(result[0], 'abb'); t.equal(result.index, 0); t.equal(re.lastIndex, 3); result = re.exec(str); t.ok(result); t.equal(result[0], 'ab'); t.equal(result.index, 7); t.equal(re.lastIndex, 9); result = re.exec(str); t.notOk(result); }); test('exec simple', t => { const re = new RE2('(hello \\S+)'); const result = re.exec('This is a hello world!'); t.equal(result[1], 'hello world!'); }); test('exec fail', t => { const re = new RE2('(a+)?(b+)?'); let result = re.exec('aaabb'); t.equal(result[1], 'aaa'); t.equal(result[2], 'bb'); result = re.exec('aaacbb'); t.equal(result[1], 'aaa'); t.equal(result[2], undefined); t.equal(result.length, 3); }); test('exec anchored to beginning', t => { const re = RE2('^hello', 'g'); const result = re.exec('hellohello'); t.deepEqual(Array.from(result), ['hello']); t.equal(result.index, 0); t.equal(re.lastIndex, 5); t.equal(re.exec('hellohello'), null); }); test('exec invalid', t => { const re = RE2(''); try { re.exec({ toString() { throw 'corner'; } }); t.fail(); // shouldn't be here } catch (e) { t.equal(e, 'corner'); } }); test('exec anchor 1', t => { const re = new RE2('b|^a', 'g'); let result = re.exec('aabc'); t.ok(result); t.equal(result.index, 0); t.equal(re.lastIndex, 1); result = re.exec('aabc'); t.ok(result); t.equal(result.index, 2); t.equal(re.lastIndex, 3); result = re.exec('aabc'); t.notOk(result); }); test('exec anchor 2', t => { const re = new RE2('(?:^a)', 'g'); let result = re.exec('aabc'); t.ok(result); t.equal(result.index, 0); t.equal(re.lastIndex, 1); result = re.exec('aabc'); t.notOk(result); }); // Unicode tests test('exec unicode', t => { const re = new RE2('охотник\\s(желает).+?(где)', 'ig'); t.equal(re.source, 'охотник\\s(желает).+?(где)'); t.ok(re.ignoreCase); t.ok(re.global); t.ok(!re.multiline); const 
result = re.exec('Каждый Охотник Желает Знать Где Сидит Фазан'); t.deepEqual(Array.from(result), [ 'Охотник Желает Знать Где', 'Желает', 'Где' ]); t.equal(result.index, 7); t.equal(result.input, 'Каждый Охотник Желает Знать Где Сидит Фазан'); t.equal(re.lastIndex, 31); t.equal( result.input.substr(result.index), 'Охотник Желает Знать Где Сидит Фазан' ); t.equal(result.input.substr(re.lastIndex), ' Сидит Фазан'); }); test('exec unicode subsequent', t => { const str = 'аббвгдеабё'; const re = new RE2('аб*', 'g'); let result = re.exec(str); t.ok(result); t.equal(result[0], 'абб'); t.equal(result.index, 0); t.equal(re.lastIndex, 3); result = re.exec(str); t.ok(result); t.equal(result[0], 'аб'); t.equal(result.index, 7); t.equal(re.lastIndex, 9); result = re.exec(str); t.notOk(result); }); test('exec unicode supplementary', t => { const re = new RE2('\\u{1F603}', 'g'); t.equal(re.source, '\\u{1F603}'); t.notOk(re.ignoreCase); t.ok(re.global); t.notOk(re.multiline); const result = re.exec('\u{1F603}'); // 1F603 is the SMILING FACE WITH OPEN MOUTH emoji t.deepEqual(Array.from(result), ['\u{1F603}']); t.equal(result.index, 0); t.equal(result.input, '\u{1F603}'); t.equal(re.lastIndex, 2); const re2 = new RE2('.', 'g'); t.equal(re2.source, '.'); t.notOk(re2.ignoreCase); t.ok(re2.global); t.notOk(re2.multiline); const result2 = re2.exec('\u{1F603}'); t.deepEqual(Array.from(result2), ['\u{1F603}']); t.equal(result2.index, 0); t.equal(result2.input, '\u{1F603}'); t.equal(re2.lastIndex, 2); const re3 = new RE2('[\u{1F603}-\u{1F605}]', 'g'); t.equal(re3.source, '[\u{1F603}-\u{1F605}]'); t.notOk(re3.ignoreCase); t.ok(re3.global); t.notOk(re3.multiline); const result3 = re3.exec('\u{1F604}'); t.deepEqual(Array.from(result3), ['\u{1F604}']); t.equal(result3.index, 0); t.equal(result3.input, '\u{1F604}'); t.equal(re3.lastIndex, 2); }); // Buffer tests test('exec buffer', t => { const re = new RE2('охотник\\s(желает).+?(где)', 'ig'); const buf = Buffer.from('Каждый Охотник Желает 
Знать Где Сидит Фазан'); const result = re.exec(buf); t.equal(result.length, 3); t.ok(result[0] instanceof Buffer); t.ok(result[1] instanceof Buffer); t.ok(result[2] instanceof Buffer); t.equal(result[0].toString(), 'Охотник Желает Знать Где'); t.equal(result[1].toString(), 'Желает'); t.equal(result[2].toString(), 'Где'); t.equal(result.index, 13); t.ok(result.input instanceof Buffer); t.equal( result.input.toString(), 'Каждый Охотник Желает Знать Где Сидит Фазан' ); t.equal(re.lastIndex, 58); t.equal( result.input.toString('utf8', result.index), 'Охотник Желает Знать Где Сидит Фазан' ); t.equal(result.input.toString('utf8', re.lastIndex), ' Сидит Фазан'); }); // Sticky tests test('exec sticky', t => { const re = new RE2('\\s+', 'y'); t.equal(re.exec('Hello world, how are you?'), null); re.lastIndex = 5; const result = re.exec('Hello world, how are you?'); t.deepEqual(Array.from(result), [' ']); t.equal(result.index, 5); t.equal(re.lastIndex, 6); const re2 = new RE2('\\s+', 'gy'); t.equal(re2.exec('Hello world, how are you?'), null); re2.lastIndex = 5; const result2 = re2.exec('Hello world, how are you?'); t.deepEqual(Array.from(result2), [' ']); t.equal(result2.index, 5); t.equal(re2.lastIndex, 6); }); test('exec supplemental', t => { const re = new RE2('\\w+', 'g'); const testString = '🤡🤡🤡 Hello clown world!'; let result = re.exec(testString); t.deepEqual(Array.from(result), ['Hello']); result = re.exec(testString); t.deepEqual(Array.from(result), ['clown']); result = re.exec(testString); t.deepEqual(Array.from(result), ['world']); }); // Multiline test test('exec multiline', t => { const re = new RE2('^xy', 'm'), pattern = ` xy1 xy2 (at start of line) xy3`; const result = re.exec(pattern); t.ok(result); t.equal(result[0], 'xy'); t.ok(result.index > 3); t.ok(result.index < pattern.length - 4); t.equal( result[0], pattern.substring(result.index, result.index + result[0].length) ); }); // dotAll tests test('exec dotAll', t => { t.ok(new RE2('a.c').test('abc')); 
  t.ok(new RE2(/a.c/).test('a c'));
  t.notOk(new RE2(/a.c/).test('a\nc'));

  t.ok(new RE2('a.c', 's').test('abc'));
  t.ok(new RE2(/a.c/s).test('a c'));
  t.ok(new RE2(/a.c/s).test('a\nc'));
});

// hasIndices tests

test('exec hasIndices', t => {
  t.notOk(new RE2('1').hasIndices);
  t.notOk(new RE2(/1/).hasIndices);

  const re = new RE2('(aa)(?<b>b)?(?<c>ccc)', 'd');
  t.ok(re.hasIndices);

  let result = re.exec('1aabccc2');

  t.equal(result.length, 4);
  t.equal(result.input, '1aabccc2');
  t.equal(result.index, 1);
  t.equal(Object.keys(result.groups).length, 2);
  t.equal(result.groups.b, 'b');
  t.equal(result.groups.c, 'ccc');
  t.equal(result[0], 'aabccc');
  t.equal(result[1], 'aa');
  t.equal(result[2], 'b');
  t.equal(result[3], 'ccc');
  t.equal(result.indices.length, 4);
  t.deepEqual(Array.from(result.indices), [
    [1, 7],
    [1, 3],
    [3, 4],
    [4, 7]
  ]);
  t.equal(Object.keys(result.indices.groups).length, 2);
  t.deepEqual(result.indices.groups.b, [3, 4]);
  t.deepEqual(result.indices.groups.c, [4, 7]);

  result = re.exec('1aaccc2');

  t.equal(result.length, 4);
  t.equal(result.input, '1aaccc2');
  t.equal(result.index, 1);
  t.equal(Object.keys(result.groups).length, 2);
  t.equal(result.groups.b, undefined);
  t.equal(result.groups.c, 'ccc');
  t.equal(result[0], 'aaccc');
  t.equal(result[1], 'aa');
  t.equal(result[2], undefined);
  t.equal(result[3], 'ccc');
  t.equal(result.indices.length, 4);
  t.deepEqual(Array.from(result.indices), [[1, 6], [1, 3], undefined, [3, 6]]);
  t.equal(Object.keys(result.indices.groups).length, 2);
  t.deepEqual(result.indices.groups.b, undefined);
  t.deepEqual(result.indices.groups.c, [3, 6]);

  try {
    const re = new RE2(new RegExp('1', 'd'));
    t.ok(re.hasIndices);
  } catch (e) {
    // squelch
  }
});

test('exec hasIndices lastIndex', t => {
  const re2 = new RE2('a', 'dg');

  t.equal(re2.lastIndex, 0);

  let result = re2.exec('abca');
  t.equal(re2.lastIndex, 1);
  t.equal(result.index, 0);
  t.deepEqual(Array.from(result.indices), [[0, 1]]);

  result = re2.exec('abca');
  t.equal(re2.lastIndex, 4);
  t.equal(result.index, 3);
t.deepEqual(Array.from(result.indices), [[3, 4]]); result = re2.exec('abca'); t.equal(re2.lastIndex, 0); t.equal(result, null); }); test('exec buffer vs string', t => { const re2 = new RE2('.', 'g'), pattern = 'abcdefg'; re2.lastIndex = 2; const result1 = re2.exec(pattern); re2.lastIndex = 2; const result2 = re2.exec(Buffer.from(pattern)); t.equal(result1[0], 'c'); t.deepEqual(result2[0], Buffer.from('c')); t.equal(result1.index, 2); t.equal(result2.index, 2); }); test('exec found empty string', t => { const re2 = new RE2('^.*?'), match = re2.exec(''); t.equal(match[0], ''); t.equal(match.index, 0); t.equal(match.input, ''); t.equal(match.groups, undefined); }); ================================================ FILE: tests/test-general.mjs ================================================ import test from 'tape-six'; import {default as RE2} from '../re2.js'; // utilities const compare = (re1, re2, t) => { // compares regular expression objects t.equal(re1.source, re2.source); t.equal(re1.global, re2.global); t.equal(re1.ignoreCase, re2.ignoreCase); t.equal(re1.multiline, re2.multiline); // (t.equal(re1.unicode, re2.unicode)); t.equal(re1.sticky, re2.sticky); }; // tests test('general ctr', t => { t.ok(!!RE2); t.ok(!!RE2.prototype); t.equal(RE2.toString(), 'function RE2() { [native code] }'); }); test('general inst', t => { let re1 = new RE2('\\d+'); t.ok(!!re1); t.ok(re1 instanceof RE2); let re2 = RE2('\\d+'); t.ok(!!re2); t.ok(re2 instanceof RE2); compare(re1, re2, t); re1 = new RE2('\\d+', 'm'); t.ok(!!re1); t.ok(re1 instanceof RE2); re2 = RE2('\\d+', 'm'); t.ok(!!re2); t.ok(re2 instanceof RE2); compare(re1, re2, t); }); test('general inst errors', t => { try { const re = new RE2([]); t.fail(); // shouldn't be here } catch (e) { t.ok(e instanceof TypeError); } try { const re = new RE2({}); t.fail(); // shouldn't be here } catch (e) { t.ok(e instanceof TypeError); } try { const re = new RE2(new Date()); t.fail(); // shouldn't be here } catch (e) { t.ok(e instanceof 
TypeError); } try { const re = new RE2(null); t.fail(); // shouldn't be here } catch (e) { t.ok(e instanceof TypeError); } try { const re = new RE2(); t.fail(); // shouldn't be here } catch (e) { t.ok(e instanceof TypeError); } try { const re = RE2(); t.fail(); // shouldn't be here } catch (e) { t.ok(e instanceof TypeError); } try { const re = RE2({ toString() { throw 'corner'; } }); t.fail(); // shouldn't be here } catch (e) { t.ok(e instanceof TypeError); } }); test('general in', t => { const re = new RE2('\\d+'); t.ok('exec' in re); t.ok('test' in re); t.ok('match' in re); t.ok('replace' in re); t.ok('search' in re); t.ok('split' in re); t.ok('source' in re); t.ok('flags' in re); t.ok('global' in re); t.ok('ignoreCase' in re); t.ok('multiline' in re); t.ok('dotAll' in re); t.ok('sticky' in re); t.ok('lastIndex' in re); }); test('general present', t => { const re = new RE2('\\d+'); t.equal(typeof re.exec, 'function'); t.equal(typeof re.test, 'function'); t.equal(typeof re.match, 'function'); t.equal(typeof re.replace, 'function'); t.equal(typeof re.search, 'function'); t.equal(typeof re.split, 'function'); t.equal(typeof re.source, 'string'); t.equal(typeof re.flags, 'string'); t.equal(typeof re.global, 'boolean'); t.equal(typeof re.ignoreCase, 'boolean'); t.equal(typeof re.multiline, 'boolean'); t.equal(typeof re.dotAll, 'boolean'); t.equal(typeof re.sticky, 'boolean'); t.equal(typeof re.lastIndex, 'number'); }); test('general lastIndex', t => { const re = new RE2('\\d+'); t.equal(re.lastIndex, 0); re.lastIndex = 5; t.equal(re.lastIndex, 5); re.lastIndex = 0; t.equal(re.lastIndex, 0); }); test('general RegExp', t => { let re1 = new RegExp('\\d+'); let re2 = new RE2('\\d+'); compare(re1, re2, t); re2 = new RE2(re1); compare(re1, re2, t); re1 = new RegExp('a', 'ig'); re2 = new RE2('a', 'ig'); compare(re1, re2, t); re2 = new RE2(re1); compare(re1, re2, t); re1 = /\s/gm; re2 = new RE2('\\s', 'mg'); compare(re1, re2, t); re2 = new RE2(re1); compare(re1, re2, t); re2 
= new RE2(/\s/gm); compare(/\s/gm, re2, t); re1 = new RE2('b', 'gm'); re2 = new RE2(re1); compare(re1, re2, t); re1 = new RE2('b', 'sgm'); re2 = new RE2(re1); compare(re1, re2, t); re2 = new RE2(/\s/gms); compare(/\s/gms, re2, t); }); test('general utf8', t => { const s = 'Привет!'; t.equal(s.length, 7); t.equal(RE2.getUtf8Length(s), 13); const b = Buffer.from(s); t.equal(b.length, 13); t.equal(RE2.getUtf16Length(b), 7); const s2 = '\u{1F603}'; t.equal(s2.length, 2); t.equal(RE2.getUtf8Length(s2), 4); const b2 = Buffer.from(s2); t.equal(b2.length, 4); t.equal(RE2.getUtf16Length(b2), 2); const s3 = '\uD83D'; t.equal(s3.length, 1); t.equal(RE2.getUtf8Length(s3), 3); const s4 = '🤡'; t.equal(s4.length, 2); t.equal(RE2.getUtf8Length(s4), 4); t.equal(RE2.getUtf16Length(Buffer.from(s4, 'utf8')), s4.length); const b3 = Buffer.from([0xf0]); t.equal(b3.length, 1); t.equal(RE2.getUtf16Length(b3), 2); try { RE2.getUtf8Length({ toString() { throw 'corner'; } }); t.fail(); // shouldn't be here } catch (e) { t.equal(e, 'corner'); } t.equal( RE2.getUtf16Length({ toString() { throw 'corner'; } }), -1 ); }); test('general flags', t => { let re = new RE2('a', 'u'); t.equal(re.flags, 'u'); re = new RE2('a', 'iu'); t.equal(re.flags, 'iu'); re = new RE2('a', 'mu'); t.equal(re.flags, 'mu'); re = new RE2('a', 'gu'); t.equal(re.flags, 'gu'); re = new RE2('a', 'yu'); t.equal(re.flags, 'uy'); re = new RE2('a', 'yiu'); t.equal(re.flags, 'iuy'); re = new RE2('a', 'yigu'); t.equal(re.flags, 'giuy'); re = new RE2('a', 'miu'); t.equal(re.flags, 'imu'); re = new RE2('a', 'ygu'); t.equal(re.flags, 'guy'); re = new RE2('a', 'myu'); t.equal(re.flags, 'muy'); re = new RE2('a', 'migyu'); t.equal(re.flags, 'gimuy'); re = new RE2('a', 'smigyu'); t.equal(re.flags, 'gimsuy'); }); test('general flags 2nd', t => { let re = new RE2(/a/, 'u'); t.equal(re.flags, 'u'); re = new RE2(/a/gm, 'iu'); t.equal(re.flags, 'iu'); re = new RE2(/a/gi, 'mu'); t.equal(re.flags, 'mu'); re = new RE2(/a/g, 'gu'); 
t.equal(re.flags, 'gu'); re = new RE2(/a/m, 'yu'); t.equal(re.flags, 'uy'); re = new RE2(/a/, 'yiu'); t.equal(re.flags, 'iuy'); re = new RE2(/a/gim, 'yigu'); t.equal(re.flags, 'giuy'); re = new RE2(/a/gm, 'miu'); t.equal(re.flags, 'imu'); re = new RE2(/a/i, 'ygu'); t.equal(re.flags, 'guy'); re = new RE2(/a/g, 'myu'); t.equal(re.flags, 'muy'); re = new RE2(/a/, 'migyu'); t.equal(re.flags, 'gimuy'); re = new RE2(/a/s, 'smigyu'); t.equal(re.flags, 'gimsuy'); }); ================================================ FILE: tests/test-groups.mjs ================================================ import test from 'tape-six'; import {RE2} from '../re2.js'; // tests test('groups normal', t => { t.equal(RE2('(?<a>\\d)').test('9'), true); t.deepEqual(RE2('(?<a>-)', 'g').match('a-b-c'), ['-', '-']); t.deepEqual(RE2('(?<a>-)').split('a-b-c'), ['a', '-', 'b', '-', 'c']); t.equal(RE2('(?<a>-)', 'g').search('a-b-c'), 1); }); test('groups exec', t => { let result = new RE2('(\\d)').exec('k9'); t.ok(result); t.equal(result[0], '9'); t.equal(result[1], '9'); t.equal(result.index, 1); t.equal(result.input, 'k9'); t.equal(typeof result.groups, 'undefined'); result = new RE2('(?<a>\\d)').exec('k9'); t.ok(result); t.equal(result[0], '9'); t.equal(result[1], '9'); t.equal(result.index, 1); t.equal(result.input, 'k9'); t.deepEqual(result.groups, {a: '9'}); }); test('groups match', t => { let result = new RE2('(\\d)').match('k9'); t.ok(result); t.equal(result[0], '9'); t.equal(result[1], '9'); t.equal(result.index, 1); t.equal(result.input, 'k9'); t.equal(typeof result.groups, 'undefined'); result = new RE2('(?<a>\\d)').match('k9'); t.ok(result); t.equal(result[0], '9'); t.equal(result[1], '9'); t.equal(result.index, 1); t.equal(result.input, 'k9'); t.deepEqual(result.groups, {a: '9'}); }); test('groups replace', t => { t.equal(RE2('(?<w>\\w)(?<d>\\d)', 'g').replace('a1b2c', '$2$1'), '1a2bc'); t.equal(RE2('(?<w>\\w)(?<d>\\d)', 'g').replace('a1b2c', '$<d>$<w>'), '1a2bc'); t.equal( RE2('(?<w>\\w)(?<d>\\d)', 'g').replace('a1b2c',
replacerByNumbers), '1a2bc' ); t.equal( RE2('(?<w>\\w)(?<d>\\d)', 'g').replace('a1b2c', replacerByNames), '1a2bc' ); function replacerByNumbers(match, group1, group2, index, source, groups) { return group2 + group1; } function replacerByNames(match, group1, group2, index, source, groups) { return groups.d + groups.w; } }); test('groups invalid', t => { try { RE2('(?<>.)'); t.fail(); // shouldn't be here } catch (e) { t.ok(e instanceof SyntaxError); } // TODO: do we need to enforce the correct id? // try { // RE2('(?<1>.)'); // t.fail(); // shouldn't be here // } catch(e) { // eval(t.TEST("e instanceof SyntaxError")); // } try { RE2('(?<a>.)(?<a>.)'); t.fail(); // shouldn't be here } catch (e) { t.ok(e instanceof SyntaxError); } }); ================================================ FILE: tests/test-invalid.mjs ================================================ import test from 'tape-six'; import {RE2} from '../re2.js'; // tests test('invalid', t => { let threw; // Backreferences threw = false; try { new RE2(/(a)\1/); } catch (e) { threw = true; t.ok(e instanceof SyntaxError); t.equal(e.message, 'invalid escape sequence: \\1'); } t.ok(threw); // Lookahead assertions // Positive threw = false; try { new RE2(/a(?=b)/); } catch (e) { threw = true; t.ok(e instanceof SyntaxError); t.equal(e.message, 'invalid perl operator: (?='); } t.ok(threw); // Negative threw = false; try { new RE2(/a(?!b)/); } catch (e) { threw = true; t.ok(e instanceof SyntaxError); t.equal(e.message, 'invalid perl operator: (?!'); } t.ok(threw); }); ================================================ FILE: tests/test-match.mjs ================================================ import test from 'tape-six'; import {RE2} from '../re2.js'; // tests // These tests are copied from MDN: // https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/match test('test match', t => { const str = 'For more information, see Chapter 3.4.5.1'; const re = new RE2(/(chapter \d+(\.\d)*)/i); const result =
re.match(str); t.equal(result.input, str); t.equal(result.index, 26); t.equal(result.length, 3); t.equal(result[0], 'Chapter 3.4.5.1'); t.equal(result[1], 'Chapter 3.4.5.1'); t.equal(result[2], '.1'); }); test('test_matchGlobal', t => { const re = new RE2(/[A-E]/gi); const result = re.match( 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz' ); t.deepEqual(result, ['A', 'B', 'C', 'D', 'E', 'a', 'b', 'c', 'd', 'e']); }); test('test match fail', t => { const re = new RE2('(a+)?(b+)?'); let result = re.match('aaabb'); t.equal(result[1], 'aaa'); t.equal(result[2], 'bb'); result = re.match('aaacbb'); t.equal(result[1], 'aaa'); t.equal(result[2], undefined); }); test('test match invalid', t => { const re = RE2(''); try { re.match({ toString() { throw 'corner'; } }); t.fail(); // shouldn't be here } catch (e) { t.equal(e, 'corner'); } }); // Unicode tests test('test match unicode', t => { const str = 'Это ГЛАВА 3.4.5.1'; const re = new RE2(/(глава \d+(\.\d)*)/i); const result = re.match(str); t.equal(result.input, str); t.equal(result.index, 4); t.equal(result.length, 3); t.equal(result[0], 'ГЛАВА 3.4.5.1'); t.equal(result[1], 'ГЛАВА 3.4.5.1'); t.equal(result[2], '.1'); }); // Buffer tests test('test match buffer', t => { const buf = Buffer.from('Это ГЛАВА 3.4.5.1'); const re = new RE2(/(глава \d+(\.\d)*)/i); const result = re.match(buf); t.ok(result.input instanceof Buffer); t.equal(result.length, 3); t.ok(result[0] instanceof Buffer); t.ok(result[1] instanceof Buffer); t.ok(result[2] instanceof Buffer); t.equal(result.input, buf); t.equal(result.index, 7); t.equal(result.input.toString('utf8', result.index), 'ГЛАВА 3.4.5.1'); t.equal(result[0].toString(), 'ГЛАВА 3.4.5.1'); t.equal(result[1].toString(), 'ГЛАВА 3.4.5.1'); t.equal(result[2].toString(), '.1'); }); // Sticky tests test('test match sticky', t => { const re = new RE2('\\s+', 'y'); t.equal(re.match('Hello world, how are you?'), null); re.lastIndex = 5; const result = re.match('Hello world, how are you?'); 
t.deepEqual(Array.from(result), [' ']); t.equal(result.index, 5); t.equal(re.lastIndex, 6); const re2 = new RE2('\\s+', 'gy'); t.equal(re2.match('Hello world, how are you?'), null); re2.lastIndex = 5; t.equal(re2.match('Hello world, how are you?'), null); const re3 = new RE2(/[A-E]/giy); const result3 = re3.match( 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz' ); t.deepEqual(result3, ['A', 'B', 'C', 'D', 'E']); }); // hasIndices tests test('test match has indices', t => { const re = new RE2('(aa)(?<b>b)?(?<c>ccc)', 'd'), str1 = '1aabccc2', str2 = '1aaccc2'; t.deepEqual(str1.match(re), re.exec(str1)); t.deepEqual(str2.match(re), re.exec(str2)); }); test('test match has indices global', t => { const re = new RE2('(?<zzz>a)', 'dg'), result = 'abca'.match(re); t.deepEqual(result, ['a', 'a']); t.notOk('indices' in result); t.notOk('groups' in result); }); test('test match lastIndex', t => { const re = new RE2(/./g), pattern = 'Я123'; re.lastIndex = 2; const result1 = pattern.match(re); t.deepEqual(result1, ['Я', '1', '2', '3']); t.equal(re.lastIndex, 0); const re2 = RE2(re); re2.lastIndex = 2; const result2 = re2.match(Buffer.from(pattern)); t.deepEqual( result2.map(b => b.toString()), ['Я', '1', '2', '3'] ); t.equal(re2.lastIndex, 0); }); ================================================ FILE: tests/test-matchAll.mjs ================================================ import test from 'tape-six'; import {RE2} from '../re2.js'; // tests // These tests are copied from MDN: // https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/matchAll test('test matchAll', t => { const str = 'test1test2'; const re = new RE2(/t(e)(st(\d?))/g); const result = Array.from(str.matchAll(re)); t.equal(result.length, 2); t.equal(result[0].input, str); t.equal(result[0].index, 0); t.equal(result[0].length, 4); t.equal(result[0][0], 'test1'); t.equal(result[0][1], 'e'); t.equal(result[0][2], 'st1'); t.equal(result[0][3], '1'); t.equal(result[1].input, str);
t.equal(result[1].index, 5); t.equal(result[1].length, 4); t.equal(result[1][0], 'test2'); t.equal(result[1][1], 'e'); t.equal(result[1][2], 'st2'); t.equal(result[1][3], '2'); }); test('test matchAll iterator', t => { const str = 'table football, foosball'; const re = new RE2('foo[a-z]*', 'g'); const expected = [ {start: 6, finish: 14}, {start: 16, finish: 24} ]; let i = 0; for (const match of str.matchAll(re)) { t.equal(match.index, expected[i].start); t.equal(match.index + match[0].length, expected[i].finish); ++i; } }); test('test matchAll non global', t => { const re = RE2('b'); try { 'abc'.matchAll(re); t.fail(); // shouldn't be here } catch (e) { t.ok(e instanceof TypeError); } }); test('test matchAll lastIndex', t => { const re = RE2('[a-c]', 'g'); re.lastIndex = 1; const expected = ['b', 'c']; let i = 0; for (const match of 'abc'.matchAll(re)) { t.equal(re.lastIndex, 1); t.equal(match[0], expected[i]); ++i; } }); test('test matchAll empty match', t => { const str = 'foo'; // Matches empty strings, but should not cause an infinite loop const re = new RE2('(?:)', 'g'); const result = Array.from(str.matchAll(re)); t.equal(result.length, str.length + 1); for (let i = 0; i < result.length; ++i) { t.equal(result[i][0], ''); } }); ================================================ FILE: tests/test-prototype.mjs ================================================ import test from 'tape-six'; import {RE2} from '../re2.js'; // tests test('test prototype', t => { t.equal(RE2.prototype.source, '(?:)'); t.equal(RE2.prototype.flags, ''); t.equal(RE2.prototype.global, undefined); t.equal(RE2.prototype.ignoreCase, undefined); t.equal(RE2.prototype.multiline, undefined); t.equal(RE2.prototype.dotAll, undefined); t.equal(RE2.prototype.sticky, undefined); t.equal(RE2.prototype.lastIndex, undefined); }); ================================================ FILE: tests/test-replace.mjs ================================================ import test from 'tape-six'; import {RE2} from 
'../re2.js'; // tests // These tests are copied from MDN: // https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace test('test replace string', t => { let re = new RE2(/apples/gi); let result = re.replace('Apples are round, and apples are juicy.', 'oranges'); t.equal(result, 'oranges are round, and oranges are juicy.'); re = new RE2(/xmas/i); result = re.replace('Twas the night before Xmas...', 'Christmas'); t.equal(result, 'Twas the night before Christmas...'); re = new RE2(/(\w+)\s(\w+)/); result = re.replace('John Smith', '$2, $1'); t.equal(result, 'Smith, John'); }); test('test replace functional replacer', t => { function replacer(match, p1, p2, p3, offset, string) { // p1 is nondigits, p2 digits, and p3 non-alphanumerics return [p1, p2, p3].join(' - '); } const re = new RE2(/([^\d]*)(\d*)([^\w]*)/); const result = re.replace('abc12345#$*%', replacer); t.equal(result, 'abc - 12345 - #$*%'); }); test('test replace functional upper to hyphen lower', t => { function upperToHyphenLower(match) { return '-' + match.toLowerCase(); } const re = new RE2(/[A-Z]/g); const result = re.replace('borderTop', upperToHyphenLower); t.equal(result, 'border-top'); }); test('test replace functional convert', t => { function convert(str, p1, offset, s) { return ((p1 - 32) * 5) / 9 + 'C'; } const re = new RE2(/(\d+(?:\.\d*)?)F\b/g); t.equal(re.replace('32F', convert), '0C'); t.equal(re.replace('41F', convert), '5C'); t.equal(re.replace('50F', convert), '10C'); t.equal(re.replace('59F', convert), '15C'); t.equal(re.replace('68F', convert), '20C'); t.equal(re.replace('77F', convert), '25C'); t.equal(re.replace('86F', convert), '30C'); t.equal(re.replace('95F', convert), '35C'); t.equal(re.replace('104F', convert), '40C'); t.equal(re.replace('113F', convert), '45C'); t.equal(re.replace('212F', convert), '100C'); }); test('test replace functional loop', t => { const logs = []; RE2(/(x_*)|(-)/g).replace('x-x_', function (match, p1, p2) { if (p1) { 
logs.push('on: ' + p1.length); } if (p2) { logs.push('off: 1'); } }); t.deepEqual(logs, ['on: 1', 'off: 1', 'on: 2']); }); test('test replace invalid', t => { const re = RE2(''); try { re.replace( { toString() { throw 'corner1'; } }, '' ); t.fail(); // shouldn't be here } catch (e) { t.equal(e, 'corner1'); } try { re.replace('', { toString() { throw 'corner2'; } }); t.fail(); // shouldn't be here } catch (e) { t.equal(e, 'corner2'); } let arg2Stringified = false; try { re.replace( { toString() { throw 'corner1'; } }, { toString() { arg2Stringified = true; throw 'corner2'; } } ); t.fail(); // shouldn't be here } catch (e) { t.equal(e, 'corner1'); t.notOk(arg2Stringified); } try { re.replace('', () => { throw 'corner2'; }); t.fail(); // shouldn't be here } catch (e) { t.equal(e, 'corner2'); } try { re.replace('', () => ({ toString() { throw 'corner2'; } })); t.fail(); // shouldn't be here } catch (e) { t.equal(e, 'corner2'); } }); // Unicode tests test('test replace string unicode', t => { let re = new RE2(/яблоки/gi); let result = re.replace('Яблоки красны, яблоки сочны.', 'апельсины'); t.equal(result, 'апельсины красны, апельсины сочны.'); re = new RE2(/иван/i); result = re.replace('Могуч Иван Иванов...', 'Сидор'); t.equal(result, 'Могуч Сидор Иванов...'); re = new RE2(/иван/gi); result = re.replace('Могуч Иван Иванов...', 'Сидор'); t.equal(result, 'Могуч Сидор Сидоров...'); re = new RE2(/([а-яё]+)\s+([а-яё]+)/i); result = re.replace('Пётр Петров', '$2, $1'); t.equal(result, 'Петров, Пётр'); }); test('test replace functional unicode', t => { function replacer(match, offset, string) { t.equal(typeof offset, 'number'); t.equal(typeof string, 'string'); t.ok(offset === 0 || offset === 7); t.equal(string, 'ИВАН и пЁтр'); return match.charAt(0).toUpperCase() + match.substr(1).toLowerCase(); } const re = new RE2(/(?:иван|пётр|сидор)/gi); const result = re.replace('ИВАН и пЁтр', replacer); t.equal(result, 'Иван и Пётр'); }); // Buffer tests test('test replace string 
buffer', t => { const re = new RE2(/яблоки/gi); let result = re.replace( Buffer.from('Яблоки красны, яблоки сочны.'), 'апельсины' ); t.ok(result instanceof Buffer); t.equal(result.toString(), 'апельсины красны, апельсины сочны.'); result = re.replace( Buffer.from('Яблоки красны, яблоки сочны.'), Buffer.from('апельсины') ); t.ok(result instanceof Buffer); t.equal(result.toString(), 'апельсины красны, апельсины сочны.'); result = re.replace('Яблоки красны, яблоки сочны.', Buffer.from('апельсины')); t.equal(typeof result, 'string'); t.equal(result, 'апельсины красны, апельсины сочны.'); }); test('test replace functional buffer', t => { function replacer(match, offset, string) { t.ok(match instanceof Buffer); t.equal(typeof offset, 'number'); t.equal(typeof string, 'string'); t.ok(offset === 0 || offset === 12); t.equal(string, 'ИВАН и пЁтр'); const s = match.toString(); return s.charAt(0).toUpperCase() + s.substr(1).toLowerCase(); } replacer.useBuffers = true; const re = new RE2(/(?:иван|пётр|сидор)/gi); const result = re.replace('ИВАН и пЁтр', replacer); t.equal(typeof result, 'string'); t.equal(result, 'Иван и Пётр'); }); test('test replace0', t => { const replacer = match => 'MARKER' + match; let re = new RE2(/^/g); let result = re.replace('foo bar', 'MARKER'); t.equal(result, 'MARKERfoo bar'); result = re.replace('foo bar', replacer); t.equal(result, 'MARKERfoo bar'); re = new RE2(/$/g); result = re.replace('foo bar', 'MARKER'); t.equal(result, 'foo barMARKER'); result = re.replace('foo bar', replacer); t.equal(result, 'foo barMARKER'); re = new RE2(/\b/g); result = re.replace('foo bar', 'MARKER'); t.equal(result, 'MARKERfooMARKER MARKERbarMARKER'); result = re.replace('foo bar', replacer); t.equal(result, 'MARKERfooMARKER MARKERbarMARKER'); }); // Sticky tests test('test replace sticky', t => { const re = new RE2(/[A-E]/y); t.equal(re.replace('ABCDEFABCDEF', '!'), '!BCDEFABCDEF'); t.equal(re.replace('ABCDEFABCDEF', '!'), 'A!CDEFABCDEF'); 
t.equal(re.replace('ABCDEFABCDEF', '!'), 'AB!DEFABCDEF'); t.equal(re.replace('ABCDEFABCDEF', '!'), 'ABC!EFABCDEF'); t.equal(re.replace('ABCDEFABCDEF', '!'), 'ABCD!FABCDEF'); t.equal(re.replace('ABCDEFABCDEF', '!'), 'ABCDEFABCDEF'); t.equal(re.replace('ABCDEFABCDEF', '!'), '!BCDEFABCDEF'); const re2 = new RE2(/[A-E]/gy); t.equal(re2.replace('ABCDEFABCDEF', '!'), '!!!!!FABCDEF'); t.equal(re2.replace('FABCDEFABCDE', '!'), 'FABCDEFABCDE'); re2.lastIndex = 3; t.equal(re2.replace('ABCDEFABCDEF', '!'), '!!!!!FABCDEF'); t.equal(re2.lastIndex, 0); }); // Non-matches test('test replace one non-match', t => { const replacer = (match, capture, offset, string) => { t.equal(typeof offset, 'number'); t.equal(typeof match, 'string'); t.equal(typeof string, 'string'); t.equal(typeof capture, 'undefined'); t.equal(offset, 0); t.equal(string, 'hello '); return ''; }; const re = new RE2(/hello (world)?/); re.replace('hello ', replacer); }); test('test replace two non-matches', t => { const replacer = (match, capture1, capture2, offset, string, groups) => { t.equal(typeof offset, 'number'); t.equal(typeof match, 'string'); t.equal(typeof string, 'string'); t.equal(typeof capture1, 'undefined'); t.equal(typeof capture2, 'undefined'); t.equal(offset, 1); t.equal(match, 'b & y'); t.equal(string, 'ab & yz'); t.equal(typeof groups, 'object'); t.equal(Object.keys(groups).length, 2); t.equal(groups.a, undefined); t.equal(groups.b, undefined); return ''; }; const re = new RE2(/b(?<a>1)?
& (?<b>2)?y/); const result = re.replace('ab & yz', replacer); t.equal(result, 'az'); }); test('test replace group simple', t => { const re = new RE2(/(2)/); let result = re.replace('123', '$0'); t.equal(result, '1$03'); result = re.replace('123', '$1'); t.equal(result, '123'); result = re.replace('123', '$2'); t.equal(result, '1$23'); result = re.replace('123', '$00'); t.equal(result, '1$003'); result = re.replace('123', '$01'); t.equal(result, '123'); result = re.replace('123', '$02'); t.equal(result, '1$023'); }); test('test replace group cases', t => { let re = new RE2(/(test)/g); let result = re.replace('123', '$1$20'); t.equal(result, '123'); re = new RE2(/(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)/g); result = re.replace('abcdefghijklmnopqrstuvwxyz123', '$10$20'); t.equal(result, 'jb0wo0123'); re = new RE2(/(.)(.)(.)(.)(.)/g); result = re.replace('abcdefghijklmnopqrstuvwxyz123', '$10$20'); t.equal(result, 'a0b0f0g0k0l0p0q0u0v0z123'); re = new RE2( /(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)/g ); result = re.replace('abcdefghijklmnopqrstuvwxyz123', '$10$20'); t.equal(result, 'jtvwxyz123'); re = new RE2(/abcd/g); result = re.replace('abcd123', '$1$2'); t.equal(result, '$1$2123'); }); test('test replace empty replacement', t => { t.equal('ac', 'abc'.replace(RE2('b'), '')); }); ================================================ FILE: tests/test-search.mjs ================================================ import test from 'tape-six'; import {RE2} from '../re2.js'; // tests test('test search', t => { const str = 'Total is 42 units.'; let re = new RE2(/\d+/i); let result = re.search(str); t.equal(result, 9); re = new RE2('\\b[a-z]+\\b'); result = re.search(str); t.equal(result, 6); re = new RE2('\\b\\w+\\b'); result = re.search(str); t.equal(result, 0); re = new RE2('z', 'gm'); result = re.search(str); t.equal(result, -1); }); test('test search invalid', t => { const re = RE2(''); try { re.search({ toString() { throw 'corner'; } }); t.fail(); //
shouldn't be here } catch (e) { t.equal(e, 'corner'); } }); test('test search unicode', t => { const str = 'Всего 42 штуки.'; let re = new RE2(/\d+/i); let result = re.search(str); t.equal(result, 6); re = new RE2('\\s[а-я]+'); result = re.search(str); t.equal(result, 8); re = new RE2('[а-яА-Я]+'); result = re.search(str); t.equal(result, 0); re = new RE2('z', 'gm'); result = re.search(str); t.equal(result, -1); }); test('test search buffer', t => { const buf = Buffer.from('Всего 42 штуки.'); let re = new RE2(/\d+/i); let result = re.search(buf); t.equal(result, 11); re = new RE2('\\s[а-я]+'); result = re.search(buf); t.equal(result, 13); re = new RE2('[а-яА-Я]+'); result = re.search(buf); t.equal(result, 0); re = new RE2('z', 'gm'); result = re.search(buf); t.equal(result, -1); }); test('test search sticky', t => { const str = 'Total is 42 units.'; let re = new RE2(/\d+/y); let result = re.search(str); t.equal(result, -1); re = new RE2('\\b[a-z]+\\b', 'y'); result = re.search(str); t.equal(result, -1); re = new RE2('\\b\\w+\\b', 'y'); result = re.search(str); t.equal(result, 0); re = new RE2('z', 'gmy'); result = re.search(str); t.equal(result, -1); }); ================================================ FILE: tests/test-set.mjs ================================================ import test from 'tape-six'; import {RE2} from '../re2.js'; test('test set basics', t => { const set = new RE2.Set(['foo', 'bar'], 'im'); t.ok(set instanceof Object); t.equal(typeof set.match, 'function'); t.equal(set.size, 2); t.equal(set.flags, 'imu'); t.equal(set.anchor, 'unanchored'); t.ok(Array.isArray(set.sources)); t.equal(set.sources[0], 'foo'); t.equal(set.source, 'foo|bar'); t.equal(set.toString(), '/foo|bar/imu'); }); test('test set matching', t => { const set = new RE2.Set(['foo', 'bar'], 'i'); const result = set.match('xxFOOxxbar'); t.equal(result.length, 2); result.sort((a, b) => a - b); t.deepEqual(result, [0, 1]); t.equal(set.test('nothing here'), false); 
t.equal(set.match('nothing here').length, 0); }); test('test set anchors', t => { const start = new RE2.Set(['abc'], {anchor: 'start'}); const both = new RE2.Set(['abc'], {anchor: 'both'}); t.equal(start.test('zabc'), false); t.equal(start.test('abc'), true); t.ok(both.test('abc')); t.notOk(both.test('abc1')); }); test('test set iterable', t => { function* gen() { yield 'cat'; yield 'dog'; } const set = new RE2.Set(gen()); t.equal(set.size, 2); const result = set.match('hotdog'); t.equal(result.length, 1); t.equal(result[0], 1); }); test('test set flags override', t => { const set = new RE2.Set([/abc/], 'i'); t.ok(set.test('ABC')); t.equal(set.flags, 'iu'); }); test('test set unicode inputs', t => { const patterns = ['🙂', '猫', '🍣+', '東京', '\\p{Hiragana}+']; const set = new RE2.Set(patterns, 'u'); const input = 'prefix🙂と猫と🍣🍣を食べる東京ひらがな'; const result = set.match(input); t.equal(result.length, 5); t.notEqual(result.indexOf(0), -1); t.notEqual(result.indexOf(1), -1); t.notEqual(result.indexOf(2), -1); t.notEqual(result.indexOf(3), -1); t.notEqual(result.indexOf(4), -1); const buf = Buffer.from(input); const bufResult = set.match(buf); t.equal(bufResult.length, 5); t.ok(set.test(buf)); const miss = new RE2.Set(['🚀', '漢字'], 'u'); t.notOk(miss.test(input)); t.equal(miss.match(input).length, 0); }); test('test set empty and duplicates', t => { const emptySet = new RE2.Set([]); t.equal(emptySet.size, 0); t.equal(emptySet.test('anything'), false); const dup = new RE2.Set(['foo', 'foo', 'bar']); const r = dup.match('foo bar'); // two foo entries plus bar t.equal(r.length, 3); r.sort((a, b) => a - b); t.deepEqual(r, [0, 1, 2]); }); test('test set inconsistent flags', t => { try { const set = new RE2.Set([/abc/i, /abc/m]); t.fail(); } catch (e) { t.ok(e instanceof TypeError); } }); test('test set invalid flags char', t => { try { const set = new RE2.Set(['foo'], 'q'); t.fail(); } catch (e) { t.ok(e instanceof TypeError); } }); test('test set anchor option with flags', t => { 
const set = new RE2.Set(['^foo', '^bar'], 'i', {anchor: 'both'}); t.equal(set.anchor, 'both'); t.equal(set.match('foo').length, 1); t.equal(set.match('xfoo').length, 0); }); test('test set invalid', t => { try { const set = new RE2.Set([null]); t.fail(); } catch (e) { t.ok(e instanceof TypeError); } }); ================================================ FILE: tests/test-source.mjs ================================================ import test from 'tape-six'; import {RE2} from '../re2.js'; // tests test('test source identity', t => { let re = new RE2('a\\cM\\u34\\u1234\\u10abcdz'); t.equal(re.source, 'a\\cM\\u34\\u1234\\u10abcdz'); re = new RE2('a\\cM\\u34\\u1234\\u{10abcd}z'); t.equal(re.source, 'a\\cM\\u34\\u1234\\u{10abcd}z'); re = new RE2(''); t.equal(re.source, '(?:)'); re = new RE2('foo/bar'); t.equal(re.source, 'foo\\/bar'); re = new RE2('foo\\/bar'); t.equal(re.source, 'foo\\/bar'); re = new RE2('(?<foo>bar)', 'u'); t.equal(re.source, '(?<foo>bar)'); }); test('test source translation', t => { let re = new RE2('a\\cM\\u34\\u1234\\u10abcdz'); t.equal(re.internalSource, 'a\\x0D\\x{34}\\x{1234}\\x{10ab}cdz'); re = new RE2('a\\cM\\u34\\u1234\\u{10abcd}z'); t.equal(re.internalSource, 'a\\x0D\\x{34}\\x{1234}\\x{10abcd}z'); re = new RE2(''); t.equal(re.internalSource, '(?:)'); re = new RE2('foo/bar'); t.equal(re.internalSource, 'foo\\/bar'); re = new RE2('foo\\/bar'); t.equal(re.internalSource, 'foo\\/bar'); re = new RE2('(?<foo>bar)', 'u'); t.equal(re.internalSource, '(?P<foo>bar)'); re = new RE2('foo\\/bar', 'm'); t.equal(re.internalSource, '(?m)foo\\/bar'); }); test('test source backslashes', t => { const compare = (source, expected) => { const s = new RE2(source).source; t.equal(s, expected); }; compare('a/b', 'a\\/b'); compare('a\/b', 'a\\/b'); compare('a\\/b', 'a\\/b'); compare('a\\\/b', 'a\\/b'); compare('a\\\\/b', 'a\\\\\\/b'); compare('a\\\\\/b', 'a\\\\\\/b'); compare('/a/b', '\\/a\\/b'); compare('\\/a/b', '\\/a\\/b'); compare('\\/a\\/b', '\\/a\\/b'); compare('\\/a\\\\/b',
'\\/a\\\\\\/b'); }); ================================================ FILE: tests/test-split.mjs ================================================ import test from 'tape-six'; import {RE2} from '../re2.js'; // utilities const verifyBuffer = (bufArray, t) => bufArray.map(x => { t.ok(x instanceof Buffer); return x.toString(); }); // tests // These tests are copied from MDN: // https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/split test('test split', t => { let re = new RE2(/\s+/); let result = re.split('Oh brave new world that has such people in it.'); t.deepEqual(result, [ 'Oh', 'brave', 'new', 'world', 'that', 'has', 'such', 'people', 'in', 'it.' ]); re = new RE2(','); result = re.split('Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec'); t.deepEqual(result, [ 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec' ]); re = new RE2(','); result = re.split(',Jan,Feb,Mar,Apr,May,Jun,,Jul,Aug,Sep,Oct,Nov,Dec,'); t.deepEqual(result, [ '', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', '', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec', '' ]); re = new RE2(/\s*;\s*/); result = re.split( 'Harry Trump ;Fred Barney; Helen Rigby ; Bill Abel ;Chris Hand ' ); t.deepEqual(result, [ 'Harry Trump', 'Fred Barney', 'Helen Rigby', 'Bill Abel', 'Chris Hand ' ]); re = new RE2(/\s+/); result = re.split('Hello World. How are you doing?', 3); t.deepEqual(result, ['Hello', 'World.', 'How']); re = new RE2(/(\d)/); result = re.split('Hello 1 word. Sentence number 2.'); t.deepEqual(result, ['Hello ', '1', ' word. 
Sentence number ', '2', '.']); t.deepEqual( RE2(/[x-z]*/) .split('asdfghjkl') .reverse() .join(''), 'lkjhgfdsa' ); }); test('test_splitInvalid', t => { const re = RE2(''); try { re.split({ toString() { throw 'corner'; } }); t.fail(); // shouldn't be here } catch (e) { t.equal(e, 'corner'); } }); test('test_cornerCases', t => { const re = new RE2(/1/); const result = re.split('23456'); t.deepEqual(result, ['23456']); }); // Unicode tests test('test split unicode', t => { let re = new RE2(/\s+/); let result = re.split('Она не понимает, что этим убивает меня.'); t.deepEqual(result, [ 'Она', 'не', 'понимает,', 'что', 'этим', 'убивает', 'меня.' ]); re = new RE2(','); result = re.split('Пн,Вт,Ср,Чт,Пт,Сб,Вс'); t.deepEqual(result, ['Пн', 'Вт', 'Ср', 'Чт', 'Пт', 'Сб', 'Вс']); re = new RE2(/\s*;\s*/); result = re.split('Ваня Иванов ;Петро Петренко; Саша Машин ; Маша Сашина'); t.deepEqual(result, [ 'Ваня Иванов', 'Петро Петренко', 'Саша Машин', 'Маша Сашина' ]); re = new RE2(/\s+/); result = re.split('Привет мир. Как дела?', 3); t.deepEqual(result, ['Привет', 'мир.', 'Как']); re = new RE2(/(\d)/); result = re.split('Привет 1 слово. Предложение номер 2.'); t.deepEqual(result, ['Привет ', '1', ' слово. Предложение номер ', '2', '.']); t.deepEqual( RE2(/[э-я]*/) .split('фывапролд') .reverse() .join(''), 'длорпавыф' ); }); // Buffer tests test('test split buffer', t => { let re = new RE2(/\s+/); let result = re.split(Buffer.from('Она не понимает, что этим убивает меня.')); t.deepEqual(verifyBuffer(result, t), [ 'Она', 'не', 'понимает,', 'что', 'этим', 'убивает', 'меня.' 
]); re = new RE2(','); result = re.split(Buffer.from('Пн,Вт,Ср,Чт,Пт,Сб,Вс')); t.deepEqual(verifyBuffer(result, t), [ 'Пн', 'Вт', 'Ср', 'Чт', 'Пт', 'Сб', 'Вс' ]); re = new RE2(/\s*;\s*/); result = re.split( Buffer.from('Ваня Иванов ;Петро Петренко; Саша Машин ; Маша Сашина') ); t.deepEqual(verifyBuffer(result, t), [ 'Ваня Иванов', 'Петро Петренко', 'Саша Машин', 'Маша Сашина' ]); re = new RE2(/\s+/); result = re.split(Buffer.from('Привет мир. Как дела?'), 3); t.deepEqual(verifyBuffer(result, t), ['Привет', 'мир.', 'Как']); re = new RE2(/(\d)/); result = re.split(Buffer.from('Привет 1 слово. Предложение номер 2.')); t.deepEqual(verifyBuffer(result, t), [ 'Привет ', '1', ' слово. Предложение номер ', '2', '.' ]); t.deepEqual( RE2(/[э-я]*/) .split(Buffer.from('фывапролд')) .reverse() .join(''), 'длорпавыф' ); }); test('test split alternation groups', t => { const re = new RE2(/(a)|(b)/); const result = re.split('xaxbx'); t.deepEqual(result, ['x', 'a', undefined, 'x', undefined, 'b', 'x']); const re2 = new RE2(/(a)|(b)/); const bufResult = re2.split(Buffer.from('xaxbx')); t.equal(bufResult.length, 7); t.ok(bufResult[0] instanceof Buffer); t.equal(bufResult[0].toString(), 'x'); t.ok(bufResult[1] instanceof Buffer); t.equal(bufResult[1].toString(), 'a'); t.equal(bufResult[2], undefined); t.equal(bufResult[4], undefined); t.ok(bufResult[5] instanceof Buffer); t.equal(bufResult[5].toString(), 'b'); }); // Sticky tests test('test split sticky', t => { const re = new RE2(/\s+/y); // sticky is ignored const result = re.split('Oh brave new world that has such people in it.'); t.deepEqual(result, [ 'Oh', 'brave', 'new', 'world', 'that', 'has', 'such', 'people', 'in', 'it.' ]); const result2 = re.split(' Oh brave new world that has such people in it.'); t.deepEqual(result2, [ '', 'Oh', 'brave', 'new', 'world', 'that', 'has', 'such', 'people', 'in', 'it.' 
]); }); ================================================ FILE: tests/test-symbols.mjs ================================================ import test from 'tape-six'; import {RE2} from '../re2.js'; // tests test('test match symbol', t => { if (typeof Symbol == 'undefined' || !Symbol.match) return; const str = 'For more information, see Chapter 3.4.5.1'; const re = new RE2(/(chapter \d+(\.\d)*)/i); const result = str.match(re); t.equal(result.input, str); t.equal(result.index, 26); t.equal(result.length, 3); t.equal(result[0], 'Chapter 3.4.5.1'); t.equal(result[1], 'Chapter 3.4.5.1'); t.equal(result[2], '.1'); }); test('test search symbol', t => { if (typeof Symbol == 'undefined' || !Symbol.search) return; const str = 'Total is 42 units.'; let re = new RE2(/\d+/i); let result = str.search(re); t.equal(result, 9); re = new RE2('\\b[a-z]+\\b'); result = str.search(re); t.equal(result, 6); re = new RE2('\\b\\w+\\b'); result = str.search(re); t.equal(result, 0); re = new RE2('z', 'gm'); result = str.search(re); t.equal(result, -1); }); test('test replace symbol', t => { if (typeof Symbol == 'undefined' || !Symbol.replace) return; let re = new RE2(/apples/gi); let result = 'Apples are round, and apples are juicy.'.replace(re, 'oranges'); t.equal(result, 'oranges are round, and oranges are juicy.'); re = new RE2(/xmas/i); result = 'Twas the night before Xmas...'.replace(re, 'Christmas'); t.equal(result, 'Twas the night before Christmas...'); re = new RE2(/(\w+)\s(\w+)/); result = 'John Smith'.replace(re, '$2, $1'); t.equal(result, 'Smith, John'); }); test('test split symbol', t => { if (typeof Symbol == 'undefined' || !Symbol.split) return; let re = new RE2(/\s+/); let result = 'Oh brave new world that has such people in it.'.split(re); t.deepEqual(result, [ 'Oh', 'brave', 'new', 'world', 'that', 'has', 'such', 'people', 'in', 'it.' 
]); re = new RE2(','); result = 'Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec'.split(re); t.deepEqual(result, [ 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec' ]); re = new RE2(/\s*;\s*/); result = 'Harry Trump ;Fred Barney; Helen Rigby ; Bill Abel ;Chris Hand '.split(re); t.deepEqual(result, [ 'Harry Trump', 'Fred Barney', 'Helen Rigby', 'Bill Abel', 'Chris Hand ' ]); re = new RE2(/\s+/); result = 'Hello World. How are you doing?'.split(re, 3); t.deepEqual(result, ['Hello', 'World.', 'How']); re = new RE2(/(\d)/); result = 'Hello 1 word. Sentence number 2.'.split(re); t.deepEqual(result, ['Hello ', '1', ' word. Sentence number ', '2', '.']); t.equal( 'asdfghjkl' .split(RE2(/[x-z]*/)) .reverse() .join(''), 'lkjhgfdsa' ); }); ================================================ FILE: tests/test-test.mjs ================================================ import test from 'tape-six'; import {RE2} from '../re2.js'; // tests // These tests are copied from MDN: // https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/test test('test test from exec', t => { let re = new RE2('quick\\s(brown).+?(jumps)', 'i'); t.equal(re.test('The Quick Brown Fox Jumps Over The Lazy Dog'), true); t.equal(re.test('tHE qUICK bROWN fOX jUMPS oVER tHE lAZY dOG'), true); t.equal(re.test('the quick brown fox jumps over the lazy dog'), true); t.equal(re.test('THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG'), true); t.equal(re.test('THE KWIK BROWN FOX JUMPS OVER THE LAZY DOG'), false); re = new RE2('ab*', 'g'); t.ok(re.test('abbcdefabh')); t.notOk(re.test('qwerty')); re = new RE2('(hello \\S+)'); t.ok(re.test('This is a hello world!')); t.notOk(re.test('This is a Hello world!')); }); test('test test successive', t => { const str = 'abbcdefabh'; const re = new RE2('ab*', 'g'); let result = re.test(str); t.ok(result); t.equal(re.lastIndex, 3); result = re.test(str); t.ok(result); t.equal(re.lastIndex, 9); result = re.test(str); 
t.notOk(result); }); test('test test simple', t => { const str = 'abbcdefabh'; const re1 = new RE2('ab*', 'g'); t.ok(re1.test(str)); const re2 = new RE2('ab*'); t.ok(re2.test(str)); const re3 = new RE2('abc'); t.notOk(re3.test(str)); }); test('test test anchored to beginning', t => { const re = RE2('^hello', 'g'); t.ok(re.test('hellohello')); t.notOk(re.test('hellohello')); }); test('test test invalid', t => { const re = RE2(''); try { re.test({ toString() { throw 'corner'; } }); t.fail(); // shouldn't be here } catch (e) { t.equal(e, 'corner'); } }); test('test test anchor 1', t => { const re = new RE2('b|^a', 'g'); let result = re.test('aabc'); t.ok(result); t.equal(re.lastIndex, 1); result = re.test('aabc'); t.ok(result); t.equal(re.lastIndex, 3); result = re.test('aabc'); t.notOk(result); }); test('test test anchor 2', t => { const re = new RE2('(?:^a)', 'g'); let result = re.test('aabc'); t.ok(result); t.equal(re.lastIndex, 1); result = re.test('aabc'); t.notOk(result); }); // Unicode tests test('test test unicode', t => { let re = new RE2('охотник\\s(желает).+?(где)', 'i'); t.ok(re.test('Каждый Охотник Желает Знать Где Сидит Фазан')); t.ok(re.test('кАЖДЫЙ оХОТНИК жЕЛАЕТ зНАТЬ гДЕ сИДИТ фАЗАН')); t.ok(re.test('каждый охотник желает знать где сидит фазан')); t.ok(re.test('КАЖДЫЙ ОХОТНИК ЖЕЛАЕТ ЗНАТЬ ГДЕ СИДИТ ФАЗАН')); t.notOk(re.test('Кажный Стрелок Хочет Найти Иде Прячется Птица')); re = new RE2('аб*', 'g'); t.ok(re.test('аббвгдеабё')); t.notOk(re.test('йцукен')); re = new RE2('(привет \\S+)'); t.ok(re.test('Это просто привет всем.')); t.notOk(re.test('Это просто Привет всем.')); }); test('test test unicode subsequent', t => { const str = 'аббвгдеабё'; const re = new RE2('аб*', 'g'); let result = re.test(str); t.ok(result); t.equal(re.lastIndex, 3); result = re.test(str); t.ok(result); t.equal(re.lastIndex, 9); result = re.test(str); t.notOk(result); }); // Buffer tests test('test test buffer', t => { let re = new RE2('охотник\\s(желает).+?(где)', 'i'); 
t.ok(re.test(Buffer.from('Каждый Охотник Желает Знать Где Сидит Фазан'))); t.ok(re.test(Buffer.from('кАЖДЫЙ оХОТНИК жЕЛАЕТ зНАТЬ гДЕ сИДИТ фАЗАН'))); t.ok(re.test(Buffer.from('каждый охотник желает знать где сидит фазан'))); t.ok(re.test(Buffer.from('КАЖДЫЙ ОХОТНИК ЖЕЛАЕТ ЗНАТЬ ГДЕ СИДИТ ФАЗАН'))); t.notOk( re.test(Buffer.from('Кажный Стрелок Хочет Найти Иде Прячется Птица')) ); re = new RE2('аб*', 'g'); t.ok(re.test(Buffer.from('аббвгдеабё'))); t.notOk(re.test(Buffer.from('йцукен'))); re = new RE2('(привет \\S+)'); t.ok(re.test(Buffer.from('Это просто привет всем.'))); t.notOk(re.test(Buffer.from('Это просто Привет всем.'))); }); // Sticky tests test('test test sticky', t => { const re = new RE2('\\s+', 'y'); t.notOk(re.test('Hello world, how are you?')); re.lastIndex = 5; t.ok(re.test('Hello world, how are you?')); t.equal(re.lastIndex, 6); const re2 = new RE2('\\s+', 'gy'); t.notOk(re2.test('Hello world, how are you?')); re2.lastIndex = 5; t.ok(re2.test('Hello world, how are you?')); t.equal(re2.lastIndex, 6); }); ================================================ FILE: tests/test-toString.mjs ================================================ import test from 'tape-six'; import {RE2} from '../re2.js'; // tests test('test toString', t => { t.equal(RE2('').toString(), '/(?:)/u'); t.equal(RE2('a').toString(), '/a/u'); t.equal(RE2('b', 'i').toString(), '/b/iu'); t.equal(RE2('c', 'g').toString(), '/c/gu'); t.equal(RE2('d', 'm').toString(), '/d/mu'); t.equal(RE2('\\d+', 'gi') + '', '/\\d+/giu'); t.equal(RE2('\\s*', 'gm') + '', '/\\s*/gmu'); t.equal(RE2('\\S{1,3}', 'ig') + '', '/\\S{1,3}/giu'); t.equal(RE2('\\D{,2}', 'mig') + '', '/\\D{,2}/gimu'); t.equal(RE2('^a{2,}', 'mi') + '', '/^a{2,}/imu'); t.equal(RE2('^a{5}$', 'gim') + '', '/^a{5}$/gimu'); t.equal(RE2('\\u{1F603}/', 'iy') + '', '/\\u{1F603}\\//iuy'); t.equal(RE2('^a{2,}', 'smi') + '', '/^a{2,}/imsu'); t.equal(RE2('c', 'ug').toString(), '/c/gu'); t.equal(RE2('d', 'um').toString(), '/d/mu'); t.equal(RE2('a', 
'd').toString(), '/a/du'); t.equal(RE2('a', 'dg').toString(), '/a/dgu'); t.equal(RE2('a', 'dgi').toString(), '/a/dgiu'); t.equal(RE2('a', 'dgimsy').toString(), '/a/dgimsuy'); }); ================================================ FILE: tests/test-unicode-classes.mjs ================================================ import test from 'tape-six'; import {RE2} from '../re2.js'; // tests test('test_unicodeClasses', t => { 'use strict'; let re2 = new RE2(/\p{L}/u); t.ok(re2.test('a')); t.notOk(re2.test('1')); re2 = new RE2(/\p{Letter}/u); t.ok(re2.test('a')); t.notOk(re2.test('1')); re2 = new RE2(/\p{Lu}/u); t.ok(re2.test('A')); t.notOk(re2.test('a')); re2 = new RE2(/\p{Uppercase_Letter}/u); t.ok(re2.test('A')); t.notOk(re2.test('a')); re2 = new RE2(/\p{Script=Latin}/u); t.ok(re2.test('a')); t.notOk(re2.test('ф')); re2 = new RE2(/\p{sc=Cyrillic}/u); t.notOk(re2.test('a')); t.ok(re2.test('ф')); }); ================================================ FILE: ts-tests/test-types.ts ================================================ import RE2 from 're2'; function assertType<T>(_val: T) {} function test_constructors() { const re1 = new RE2('abc'); const re2 = new RE2('abc', 'gi'); const re3 = new RE2(Buffer.from('abc')); const re4 = new RE2(Buffer.from('abc'), 'i'); const re5 = new RE2(/abc/i); const re6 = new RE2(re1); const re7 = RE2('abc'); const re8 = RE2('abc', 'gi'); const re9 = RE2(Buffer.from('abc')); const re10 = RE2(/abc/i); assertType(re1); assertType(re2); assertType(re3); assertType(re4); assertType(re5); assertType(re6); assertType(re7); assertType(re8); assertType(re9); assertType(re10); } function test_properties() { const re = new RE2('abc', 'dgimsuy'); assertType(re.source); assertType(re.flags); assertType(re.global); assertType(re.ignoreCase); assertType(re.multiline); assertType(re.dotAll); assertType(re.unicode); assertType(re.sticky); assertType(re.hasIndices); assertType(re.lastIndex); assertType(re.internalSource); re.lastIndex = 5; } function test_execTypes() { 
const re = new RE2('quick\\s(brown).+?(?<verb>jumps)', 'ig'); const result = re.exec('The Quick Brown Fox Jumps Over The Lazy Dog'); if (!(result && result.groups)) { throw 'Unexpected Result'; } assertType(result.index); assertType(result.input); assertType(result.groups['verb']); } function test_execBufferTypes() { const re = new RE2('abc', 'ig'); const result = re.exec(Buffer.from('xabcx')); if (!result) { throw 'Unexpected Result'; } assertType(result.index); assertType(result.input); assertType(result[0]); } function test_matchTypes() { const re = new RE2('quick\\s(brown).+?(?<verb>jumps)', 'ig'); const result = re.match('The Quick Brown Fox Jumps Over The Lazy Dog'); if (!(result && result.index && result.input && result.groups)) { throw 'Unexpected Result'; } assertType(result.index); assertType(result.input); assertType(result.groups['verb']); } function test_matchBufferTypes() { const re = new RE2('abc', 'i'); const result = re.match(Buffer.from('xabcx')); if (!result) { throw 'Unexpected Result'; } assertType(result[0]); } function test_testTypes() { const re = new RE2('abc'); assertType(re.test('xabcx')); assertType(re.test(Buffer.from('xabcx'))); } function test_searchTypes() { const re = new RE2('abc'); assertType(re.search('xabcx')); assertType(re.search(Buffer.from('xabcx'))); } function test_replaceTypes() { const re = new RE2('abc', 'g'); assertType(re.replace('xabcx', 'def')); assertType(re.replace('xabcx', (match: string) => match.toUpperCase())); assertType(re.replace(Buffer.from('xabcx'), Buffer.from('def'))); } function test_splitTypes() { const re = new RE2(','); assertType(re.split('a,b,c')); assertType(re.split('a,b,c', 2)); assertType(re.split(Buffer.from('a,b,c'))); assertType(re.split(Buffer.from('a,b,c'), 2)); } function test_toStringType() { const re = new RE2('abc', 'gi'); assertType(re.toString()); } function test_staticMembers() { assertType(RE2.getUtf8Length('hello')); assertType(RE2.getUtf16Length(Buffer.from('hello'))); assertType<'nothing' 
| 'warnOnce' | 'warn' | 'throw'>( RE2.unicodeWarningLevel ); RE2.unicodeWarningLevel = 'nothing'; const {RE2: NamedRE2} = RE2; assertType(NamedRE2); const re = new NamedRE2('abc'); assertType(re); } function test_setTypes() { const set = new RE2.Set(['alpha', Buffer.from('beta')], 'i', { anchor: 'start' }); assertType(set.match('alphabet')); assertType(set.test(Buffer.from('alphabet'))); assertType<'unanchored' | 'start' | 'both'>(set.anchor); assertType(set.sources); assertType(set.flags); assertType(set.size); assertType(set.source); assertType(set.toString()); const set2 = RE2.Set(['a', 'b']); assertType(set2.match('a')); const set3 = new RE2.Set([new RE2('a'), /b/]); assertType(set3.test('a')); const set4 = new RE2.Set(['a'], {anchor: 'both'}); assertType<'unanchored' | 'start' | 'both'>(set4.anchor); } test_constructors(); test_properties(); test_execTypes(); test_execBufferTypes(); test_matchTypes(); test_matchBufferTypes(); test_testTypes(); test_searchTypes(); test_replaceTypes(); test_splitTypes(); test_toStringType(); test_staticMembers(); test_setTypes(); ================================================ FILE: tsconfig.json ================================================ { "compilerOptions": { "noEmit": true, "lib": ["ES2022"], "types": ["node"], "declaration": true, "esModuleInterop": true, "strict": true, "allowUnusedLabels": false, "allowUnreachableCode": false, "exactOptionalPropertyTypes": true, "noFallthroughCasesInSwitch": true, "noImplicitOverride": true, "noImplicitReturns": true, "noPropertyAccessFromIndexSignature": true, "noUncheckedIndexedAccess": true, "noUnusedLocals": true, "noUnusedParameters": true, "forceConsistentCasingInFileNames": true, "skipDefaultLibCheck": false }, "include": ["**/*.ts"], "exclude": ["vendor/re2/app/**"] }