Full Code of cc1a2b/JShunter for AI

main d5cd4aeb5ea4 cached

30 files

366.9 KB

96.3k tokens

235 symbols

1 requests

Download .txt

Showing preview only (381K chars total). Download the full file or copy to clipboard to get everything.

Repository: cc1a2b/JShunter
Branch: main
Commit: d5cd4aeb5ea4
Files: 30
Total size: 366.9 KB

Directory structure:
gitextract_f1fvidta/

├── .gitignore
├── .jshunterignore.example
├── CHANGELOG.md
├── CREDITS.md
├── LICENSE
├── README.md
├── RULES.md
├── cmd/
│   └── jshunter/
│       └── main.go
├── go.mod
├── go.sum
├── internal/
│   └── jshunter/
│       ├── aws_pair.go
│       ├── cache.go
│       ├── concurrent_verify.go
│       ├── crawler.go
│       ├── csp.go
│       ├── detection.go
│       ├── diff.go
│       ├── har.go
│       ├── html_extract.go
│       ├── ignore.go
│       ├── jshunter.go
│       ├── ndjson.go
│       ├── robots.go
│       ├── rules_cli.go
│       ├── rules_loader.go
│       ├── sarif.go
│       ├── sourcemap.go
│       ├── stats.go
│       └── verify.go
└── patterns.json

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
# Compiled binary (rebuilt locally; release artifacts attached on GitHub)
/jshunter
/jshunter.exe
/jshunter_*
/dist/

# Go build/test cache
*.test
*.out

# Editor
.vscode/
.idea/
*.swp
.DS_Store

# Operator state
.jshunterignore
*.sarif
*.har


================================================
FILE: .jshunterignore.example
================================================
# JSHunter ignore file.
# One entry per line. Blank lines and lines starting with `#` are skipped.
#
# Supported kinds:
#   hash:<value_hash>           # suppress one specific finding (stable across runs)
#   rule:<rule_id_or_glob>      # suppress an entire rule or family
#   source:<source_glob>        # suppress all findings from a source matching glob
#   rule_value:<rule>:<value_glob>
#                               # suppress when rule matches AND value matches glob
#
# Globs use filepath.Match syntax (`*`, `?`, `[abc]`).

# Example: suppress an analytics SDK that always carries a public-but-rotating key
rule:slack.webhook
source:*/cdn.segment.com/*

# Example: suppress one specific known FP by its sha256 prefix
hash:a1b2c3d4e5f60718

# Example: suppress only test JWTs from a specific bundle
rule_value:jwt.token:*test_* 


================================================
FILE: CHANGELOG.md
================================================
# Changelog

All notable changes to JSHunter are tracked here. Dates are ISO-8601.

## [v0.6 — page-aware crawling, sourcemaps, cache, concurrent verify] — 2026-05-08

The "JS-aware crawler, not just a JS-file scanner" iteration.

### HTML page-awareness (`--inline-html`)

`golang.org/x/net/html` tokenizer walks the response, extracts:

- Every inline `<script>` body (scanned under `URL#inline[N]` source label)
- Every external `<script src=…>` reference plus its `integrity=` SRI hash
- Every `<meta http-equiv="Content-Security-Policy" content=…>` directive
- `<link rel="preload|modulepreload|prefetch" as="script">` referenced JS

Homepage HTML is the most common place JS-only crawlers miss secrets —
the `window.__INITIAL_STATE__ = {…}` blob, dev-only `<script type="module">`
init code, etc. Now first-class.

### Source map real parsing (`--sourcemap`)

`//# sourceMappingURL=` markers now drive a real fetch+parse pipeline:

- HTTP(S) maps fetched through the same hardened client (host limiter,
  max-bytes, SSRF guard).
- `data:application/json;base64,...` inline maps decoded.
- `data:application/json,...` percent-encoded maps unescaped.
- Each entry in `sourcesContent[]` is scanned as its own source under
  `<URL>.map#<original-path>`.

Modern bundlers (Vite, esbuild, webpack 5, Turbopack, Rspack) routinely
ship pre-minification sources verbatim — comments, dev-only code paths,
original variable names. This is the highest-leverage signal for secret
recon on a production site.

### CSP origin extraction (`--csp-origins`)

`Content-Security-Policy` response headers and `<meta http-equiv>` tags
are parsed; the host origins (excluding `'self'`, `'unsafe-*'`, `nonce-*`,
hash sources, `data:`, `blob:`, `ws:`, …) are emitted as
`[CSP] <source>\t<origin>` lines suitable for piping back into the URL
queue of a follow-up scan.

### robots.txt ingest (`--robots`)

Fetches `/robots.txt` for every unique host in the input and prints
`Disallow`, `Allow`, and `Sitemap` lines. Pure recon helper — JSHunter
does NOT respect robots.txt for its own crawling. Operators wanting
compliance pipe these paths back as the input list.

### Disk cache (`--cache-dir`)

Per-URL SHA-256 keyed on-disk cache. Two files per URL:

- `<hash>.body` — the response body
- `<hash>.meta` — JSON: status, content-type, ETag, Last-Modified, fetched_at

Re-runs attach `If-None-Match` / `If-Modified-Since`; 304 responses serve
from disk. Set-Cookie / Authorization-bearing responses are skipped
(security hazard). Mode 0600 on disk because cached bodies may carry
secrets.

### Concurrent verifier worker pool (`--verify-workers`)

`VerifyAllConcurrent` replaces the serial loop in `emitFinalOutput`. With
50 findings × 10 s timeout, serial = 8+ minutes; pooled (default 8) =
~1 min worst case. Per-host limiter still applies inside the workers, so
no provider gets slammed.

### SARIF partialFingerprints

Each SARIF result now carries `partialFingerprints`:

```json
"partialFingerprints": {
  "jshunter/valueHash": "<sha256/16>",
  "jshunter/ruleSecretType": "<rule_id>:<secret_type>"
}
```

GitHub Code Scanning uses these to persist dismissed/suppressed decisions
across runs even when the finding moves source/line.

### Go 1.24 modernization

- `ioutil.ReadAll` → `io.ReadAll` everywhere.
- `ioutil.ReadFile/WriteFile` → `os.ReadFile/WriteFile`.
- `rand.Seed(time.Now().UnixNano())` removed (Go 1.20+ auto-seeds the
  global source).

### Files added

- `html_extract.go` — `golang.org/x/net/html` tokenizer-based extractor.
- `csp.go` — Content-Security-Policy origin parser.
- `sourcemap.go` — sourceMappingURL fetch + JSON parse + sourcesContent walk.
- `cache.go` — `DiskCache` with ETag/Last-Modified revalidation.
- `robots.go` — RFC 9309 subset parser.
- `concurrent_verify.go` — bounded worker pool for liveness probes.

## [v0.6 — outputs, suppressions, AWS pair, registry CLI] — 2026-05-08

The "make it ship-ready" iteration. Output formats for CI, persistent
suppressions, registry introspection, alternative inputs, and the AWS pair
verifier that closes the verification gap left in the previous slice.

### AWS pair verifier (SigV4)

When the registry detects an Access Key ID and a Secret Access Key in the
**same source**, JSHunter pairs them and runs `sts:GetCallerIdentity` via
SigV4 — pure-stdlib HMAC-SHA256 signing, no aws-sdk dependency. A live
response sets `verified=true` on **both** findings and surfaces the IAM
ARN as `verify.account`. Strict pairing: same-source-only with single
AKID + single secret per source — multi-of-either is left to manual
triage to avoid mis-attribution.

### Output formats

| Flag       | Format                                              | Use case                          |
|------------|-----------------------------------------------------|-----------------------------------|
| `--sarif`  | SARIF 2.1.0 envelope                                | GitHub code-scanning upload       |
| `--ndjson` | One Finding per line, `json.Encoder` (no HTML escape) | jq, mlr, SIEM streaming         |

When either is set, per-source console output is suppressed so the
structured stream stays parseable.

### Suppressions

`--ignore-file PATH` — `.jshunterignore` syntax:

```
hash:<value_hash_hex>           # single finding by hash
rule:<rule_id_or_glob>          # entire rule or family
source:<glob>                   # all findings from matching source
rule_value:<rule>:<value-glob>  # rule + value-glob combo
```

Globs use `filepath.Match`. Applied at `recordFinding` so suppressions
work across both registry and legacy paths.

`--diff PREVIOUS.json` — reads a previous schema-v2 envelope, computes
the set of `value_hash` values already reported, and reports only
findings NOT in that set. Schema-version mismatch is a hard error.

### Registry introspection / selection

| Flag                  | Effect                                                 |
|-----------------------|--------------------------------------------------------|
| `--list-rules`        | Tabular dump of `rule_id severity provider name [flags]` |
| `--explain RULE_ID`   | Full rule JSON (incl. TP/FP fixtures)                  |
| `--only-rules a,b,*c` | Run only matching rules (glob suffix supported)        |
| `--disable-rule x,y`  | Drop matching rules from the registry                  |

Selection is applied **before** `--list-rules` / `--explain` /
`--self-test`, so an operator can scope CI gates to specific rule families.

### Alternative inputs

`--har FILE` — ingest a Chrome DevTools HAR archive directly, skipping
the fetcher. Only entries with JS-typed Content-Type (or `.js` URL
suffix) and 2xx/3xx status are scanned. base64-encoded response bodies
are decoded automatically (std/URL/raw variants tolerated).

### Quality of life

`--no-color` disables ANSI color; if stdout is not a TTY,
`disableColors()` runs automatically so piping to a file produces clean
text.

### Files added

- `aws_pair.go` — SigV4 + pair verifier.
- `sarif.go` — SARIF 2.1.0 envelope builder.
- `ndjson.go` — streaming output.
- `har.go` — HAR ingestion.
- `ignore.go` — `.jshunterignore` loader and matcher.
- `diff.go` — previous-envelope baseline.
- `rules_cli.go` — `--list-rules`, `--explain`, registry selection.
- `.jshunterignore.example` — operator template.

## [v0.6 — verifier + observability + crawler hardening] — 2026-05-08

The "trust the output" iteration on top of the v0.6 FP pipeline.

### Live verification (`--verify`)

Off-by-default, opt-in liveness probes against documented read-only
endpoints. A verified secret carries `verified=true` and confidence is
elevated to `1.0`. Per-host limiter + bounded timeout per probe; secrets
are never leaked into transport-error strings (sanitized).

| Provider     | Endpoint                                      | Auth                          |
|--------------|-----------------------------------------------|-------------------------------|
| Stripe       | `GET /v1/balance`                             | `Authorization: Bearer …`     |
| GitHub       | `GET /user`                                   | `Authorization: token …`      |
| OpenAI       | `GET /v1/models`                              | `Authorization: Bearer …`     |
| Anthropic    | `GET /v1/models`                              | `x-api-key` + `anthropic-version` |
| Slack        | `GET /api/auth.test`                          | `Authorization: Bearer …`     |
| SendGrid     | `GET /v3/scopes`                              | `Authorization: Bearer …`     |
| Mailgun      | `GET /v3/domains`                             | HTTP Basic `api:<key>`        |
| HuggingFace  | `GET /api/whoami-v2`                          | `Authorization: Bearer …`     |

Citations live next to each verifier in `verify.go`.

### Operator observability (`--stats`)

Per-stage atomic counters with a fresh run-id per process:

- `URLsFetched`, `URLsBlocked`, `BytesParsed`, `BytesTruncated`
- `RegistryHits`, `LegacyMatchesRaw`
- `DroppedVendorNoise`, `DroppedFixture`, `DroppedSourcemap`
- `DroppedLowEntropy`, `DroppedNoContext`, `DroppedBelowConf`, `DroppedRegistryDup`
- `FindingsAfterFilter`, `FindingsAfterDedupe`
- `VerifyAttempts`, `VerifyAlive`, `VerifyDead`, `VerifyError`

Printed to stderr at end of run when `--stats` is set, so stdout pipelines
stay clean.

### Crawler hardening

- Per-host outbound concurrency cap (default 4, configurable via `--per-host`).
- Exponential backoff with ±25% jitter between retries.
- `Retry-After` header (seconds and HTTP-date forms) is honored.
- 429/5xx circuit breaker: after 5 consecutive bad responses on a host, all
  requests to that host are dropped for 30 s (or the longest `Retry-After`
  observed, whichever is greater).

### Output schema

- `Finding` now carries `line`, `column`, and `verify{alive,status,account,note}`.
- `Location[]` carries `line` and `column` per occurrence — operators can
  paste the JSON straight into `vim file:line:col`.

### Console redaction (`--show-secrets`)

By default the console prints redacted values (`AKIA****GHIJ`); the full
value is written to the `-o` output file because that's what the operator
explicitly asked for. `--show-secrets` reverts to v0.6.0 behavior.

### Extensibility (`--rules-file`)

Operators ship custom rules at runtime via JSON pack. Format documented in
`RULES.md`. External rules participate in `--self-test` automatically.
Loader rejects the whole pack on any validation failure (no partial loads).

### Tests

`detection_test.go` ships with:
- Property tests for `shannonEntropy` (bounds, monotonicity).
- Length and middle-mask tests for `redactValue`.
- Round-trip CRC32 base62 test for `validateGitHubToken`.
- Structural tests for `validateJWT`, `validateAWSAccessKeyID`,
  `validateStripeKey`.
- Vendor-noise denylist coverage (canonical AWS docs example).
- Schema-version assertion test (golden-file in spirit).
- Loader contract tests (missing/duplicate id, bad regex, oversized regex).
- Backoff-bounds and `parseRetryAfter` smoke tests.
- `runSelfTest` is invoked by `TestRegistry_AllRules_FixturesPass` so any
  rule whose TP fixture stops matching, or whose FP fixture starts being
  reported, fails CI.

### Documentation

- New `RULES.md` covering the full schema, confidence model, and rule
  authoring contract.
- New `CREDITS.md` honestly naming TruffleHog, Gitleaks, detect-secrets,
  Nosey Parker, secretlint, Semgrep secrets pack as inspirations.

## [v0.6 — initial false-positive surgery] — 2026-05-08

The "kill the false positives" release. Every secret-class match now flows
through a confidence-scoring pipeline before it is reported, the highest-volume
providers get format-and-checksum validators, and the JSON output is
schema-versioned so downstream tools can detect breaking changes.

### Detector additions

- New curated rule registry (`detection.go`) for highest-precision providers:
  AWS access keys (prefix family + 16-char base32 body), AWS secret keys,
  Stripe (`sk/rk/pk_(live|test)_` + clean base62), GitHub PATs (CRC32 base62
  checksum verified), GitHub fine-grained PATs, OpenAI legacy/project/svcacct,
  Anthropic, Google API + ya29 OAuth, Slack token family + app + webhook,
  Discord webhook + bot token, Twilio SK + AC, SendGrid, Mailgun, Mailchimp,
  GitHub App installation tokens, GitLab PAT + pipeline trigger, Vercel,
  Doppler, DigitalOcean, Shopify (access + shared secret), npm, PyPI, JWT
  (with structural decode), private keys (RSA/OpenSSH/EC/PGP), Facebook
  access token, Linear, HuggingFace, Supabase service-role JWT.

### False-positive fixes

- Vendor-noise denylist: canonical AWS docs example
  (`AKIAIOSFODNN7EXAMPLE`), `wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY`,
  Stripe public test fixtures, the canonical 3-part `eyJ...` example JWT.
- Substring denylist for placeholder fragments (`YOUR_API_KEY`,
  `REPLACE_ME`, `PLACEHOLDER`, `XXXXXXXX`, etc.).
- Sourcemap-line skip: any match on a `//# sourceMappingURL=` line is
  dropped — that line is a build artifact, not a secret.
- Vendor/chunk filename gate: matches in `vendor*.js`, `chunk-*.js`,
  `runtime-*.js`, `polyfill*.js`, `framework*.js`, `node_modules/*` paths
  start with a confidence penalty.
- Fixture-context penalty: surrounding text containing `example`,
  `dummy`, `sample`, `placeholder`, `mock_`, `stub_`, `fake_`, `lorem`,
  `FIXME` lowers the score by 0.30.
- Generic-rule context gate: `Generic Api Key`, `Generic Secret`,
  `Quickbooks Api Key`, `Cisco Access Token`, `Sanity Token`,
  `Atlassian Access Token`, `Heroku Api Key 2/3` now require a
  `key|token|secret|auth|bearer|api|password|...` keyword within ±96
  chars and entropy ≥ 3.2 with character-class diversity ≥ 2.
- Provider validators: AWS access key (prefix + base32 body), Stripe
  (prefix family + clean base62), GitHub (CRC32 base62 checksum), Slack
  (hyphen-segment shape), JWT (header decodes to JSON with `alg`),
  Twilio (32-hex body + entropy gate).

### Broken patterns repaired

The following v0.5 patterns could **never match** in a body because they
were anchored with `^...$` (which only match a complete one-line input)
or had escape errors. v0.6 corrects them:

- `Dropbox Access Token` — was `^sl\.…$`, now `\bsl\.…\b`.
- `Twitter Bearer Token` — was `^AAAA…$`, now bounded `\bAAAA…\b`.
- `Username Password Combo` — was `^[a-z]+://…@`, now scoped to body matches.
- `Crowdstrike Api Key` — was `^…$`, now requires a `crowdstrike/cs` keyword.
- `Azure Storage Account Key` — was `^…={0,43}$`, now anchored to
  `AccountKey=`/`azure_storage_key` context.
- `Phone Number` — was `^\+\d{9,14}$`, now matches inside text bodies.
- `Ali Cloud Access Key` / `Tencent Cloud Access Key` — anchors removed.
- `Json Web Token` — was `ey…\.…\.…$` (trailing-only), now `\beyJ…\.eyJ…\.…\b`.
- `Github Access Token` — `com*` typo (matched `co`, `com`, `comm…`)
  replaced with proper `\.com\b`.
- `Password in Url` — broken `\\s` / `\\\\` escapes replaced with valid
  regex character classes.
- `Amazon Mws Auth Token` — `\\.` escape errors replaced with `\.`.
- `Heroku Api Key 3` — unbounded `.*` (ReDoS hazard) replaced with
  bounded `[^\n]{0,80}`.

### High-FP rules tightened

- `Quickbooks Api Key` (`A[0-9a-f]{32}` matched any commit hash starting
  with A) — now requires `quickbooks|qbo|intuit` context keyword.
- `Cisco Api Key` (`cisco[A-Za-z0-9]{30}`) — now requires
  `cisco_api_key=` style assignment.
- `Cisco Access Token` (`access_token=\w+`) — now requires `cisco_`
  keyword to avoid generic OAuth flows.
- `Sanity Token` (`sk[a-zA-Z0-9]{32,}`) — now requires `sanity_token=`
  context.
- `Atlassian Access Token` (`{20,}\.{6,}\.{25,}`) — replaced with the
  documented Atlassian `ATATT3…` token shape, gated by context keyword.
- `Heroku Api Key 2` (`heroku[A-Za-z0-9]{32}`) — replaced with the
  proper `heroku_api_key=UUID` shape.

### CLI additions

- `--min-confidence FLOAT` / `-mc` (default `0.50`) — gate on per-finding
  confidence.
- `--show-confidence` / `-sc` — print `[conf=X.XX]` next to each finding.
- `--no-fp-filter` — disable the FP filter (debug; v0.5-compatible output).
- `--self-test` — run the rule registry against its built-in TP/FP
  fixtures and exit non-zero on regression. Suitable for CI.
- `--max-bytes N` (default 32 MiB) — cap response body reads to defend
  against gzip bombs and pathological streaming.
- `--allow-internal` — permit `localhost`, `127.0.0.0/8`, RFC1918, and
  link-local targets. **Off by default** to prevent SSRF when piping
  untrusted URL lists.

### Output schema

- Top-level `schema_version: 2` field added to all `--json` output.
- New `findings[]` array carries: `rule_id`, `name`, `provider`,
  `secret_type`, `severity`, `value`, `redacted`, `value_hash`,
  `source`, `confidence`, `entropy`, `verified`, `reasons[]`,
  `locations[]`. Same secret seen in N sources collapses to one
  Finding with `locations[]` listing all N.
- The legacy `matches{name: [value, …]}` map is retained for backward
  compatibility within schema v2.

### Operational hardening

- Target URL validation refuses non-HTTP(S) schemes (no `file://`),
  loopback, RFC1918, and link-local hosts unless `--allow-internal` is
  passed. The intent is making JSHunter safe to run against
  user-supplied URL lists.
- Response body reads are now bounded by `--max-bytes` via
  `io.LimitReader`. 

## [v0.5] — 2026-01-22

Pre-release baseline. Single-file `jshunter.go`, ~190 regex patterns,
basic match/no-match output, no confidence scoring.


================================================
FILE: CREDITS.md
================================================
# Credits

JSHunter is a competitive recon tool. Pretending it sprang from nowhere would
be dishonest — the secret-detection space has years of public work that
inspired both the rule shapes and the false-positive techniques baked into
v0.6. This file names them.

## Prior art that shaped the v0.6 detection layer

- **[TruffleHog](https://github.com/trufflesecurity/trufflehog)** — the
  reference for "regex match, then live-verify against the provider." The
  per-provider verifier endpoints (Stripe `/v1/balance`, GitHub `/user`,
  Slack `auth.test`, SendGrid `/v3/scopes`, Mailgun `/v3/domains`) used in
  JSHunter's `--verify` flow are the same lightweight, read-only endpoints
  TruffleHog adopted; we cite them in `verify.go` so a reviewer can audit.
- **[Gitleaks](https://github.com/gitleaks/gitleaks)** — TOML rule pack
  shape, the idea of explicit per-rule TP/FP fixtures, and many of the
  long-tail provider regexes JSHunter inherited. Gitleaks's "rules.toml"
  was the model for JSHunter's external `--rules-file` JSON loader.
- **[Yelp/detect-secrets](https://github.com/Yelp/detect-secrets)** — entropy
  thresholds and the "high-entropy-string" plugin family. The Shannon-entropy
  + character-class-diversity gate in `detection.go::scoreFinding` is in the
  spirit of detect-secrets's filters.
- **[praetorian-inc/noseyparker](https://github.com/praetorian-inc/noseyparker)** —
  performance reference for high-volume scanning; the multi-pattern ideas
  that will land in v0.7 trace back here.
- **[secretlint](https://github.com/secretlint/secretlint)** — provider
  rotation tracking; their issue tracker is the canonical place to learn
  when a provider has changed token format.
- **[Semgrep secrets pack](https://semgrep.dev/p/secrets)** — context-aware
  rule construction, especially the "shape + context window" pattern that
  JSHunter's `RequiresContext` + `ContextKeywords` per-rule fields encode.
- **GitHub Engineering blog: "Behind GitHub's new authentication token formats"**
  ([link](https://github.blog/engineering/platform-security/behind-githubs-new-authentication-token-formats/))
  — source for the CRC32 base62 checksum used in
  `detection.go::validateGitHubToken`.
- **AWS access key bitwise analysis (WithSecure Labs)**
  ([link](https://labs.withsecure.com/publications/a-bitwise-analysis-of-aws-access-key-identifiers))
  — base32 alphabet (A–Z, 2–7) + prefix family encoded in
  `validateAWSAccessKeyID`.

## Vendor & provider documentation cited inline

Every validator in `verify.go` carries a vendor docs link as a comment.
When a provider rotates token format or moves an endpoint, those comments
are the single source of truth for what to update.

## License
 
JSHunter is MIT-licensed. The works above retain their respective licenses;
JSHunter does not vendor source from any of them. Where a regex is similar
to a Gitleaks or detect-secrets pattern, that is convergent design on
provider-published shapes, not a direct copy.

— Hussain Alsharman, JSHunter author


================================================
FILE: LICENSE
================================================
MIT License

Copyright (c) 2024-2026 Hussain Alsharman

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


================================================
FILE: README.md
================================================
# JSHunter

<div align="center">

[![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
[![Go Version](https://img.shields.io/badge/Go-1.22.5+-00ADD8?style=flat&logo=go)](https://golang.org)
[![Release](https://img.shields.io/github/release/cc1a2b/jshunter.svg)](https://github.com/cc1a2b/jshunter/releases)
[![GitHub stars](https://img.shields.io/github/stars/cc1a2b/jshunter)](https://github.com/cc1a2b/jshunter/stargazers)
[![Platform](https://img.shields.io/badge/platform-Linux%20%7C%20macOS%20%7C%20Windows-lightgrey)](https://github.com/cc1a2b/jshunter/releases)

**🔍 Professional JavaScript Security Analysis Tool**

*Complete endpoint discovery, sensitive data detection, and advanced code analysis for security professionals*

</div>

## 📖 About

**JSHunter** is a comprehensive command-line tool for JavaScript security analysis and endpoint discovery. Built for security professionals, penetration testers, and developers, it delivers enterprise-grade analysis capabilities with high accuracy detection algorithms and professional reporting features.

> **Surgical false-positive reduction.** Every secret-class match flows through a confidence-scoring pipeline (entropy gate, character-class diversity, vendor-noise denylist, fixture-context penalty, sourcemap-line skip) before it is reported. Highest-volume providers (AWS, Stripe, GitHub, OpenAI, Slack, JWT) get format-and-checksum validators. Live verification is opt-in via `--verify`. Page-aware crawling, source-map analysis, HAR ingestion, and on-disk response cache are first-class. Output is JSON (`schema_version: 2`), NDJSON, SARIF 2.1.0, CSV, or Burp-compatible. Run `jshunter --self-test` to exercise the rule registry against its built-in TP/FP fixtures.

<div align="center">
<img alt="JSHunter Demo Screenshot" src="https://github.com/user-attachments/assets/f0197c36-c40b-48e9-bec5-c306acd4a613" width="100%">

*JSHunter in action - Professional JavaScript security analysis*
</div>

---

## 📑 Table of Contents

- [About](#-about)
- [Features](#-features)
- [Installation](#-installation)  
- [Quick Start](#-quick-start)
- [Usage Examples](#-usage-examples)
- [Command Reference](#-command-reference)
- [Advanced Usage](#-advanced-usage)
- [Contributing](#-contributing)
- [License](#-license)
- [Support](#-support)

---

## ✨ Features

### 🎯 Core Capabilities
- **🔍 Comprehensive Endpoint Discovery**: Automatically extracts URLs, API endpoints, and hidden parameters from JavaScript files
- **🔐 Advanced Security Analysis**: Identifies API keys, JWT tokens, credentials, and potential vulnerabilities with high accuracy  
- **📥 Flexible Input Methods**: Supports URLs, file lists, local files, stdin piping, and recursive discovery
- **⚡ High-Performance Architecture**: Multi-threaded concurrent processing with intelligent rate limiting
- **🎭 Professional Stealth Features**: Proxy support, custom headers, user-agent rotation, and bypass detection

### 🎯 Intelligent Detection Engine
> **Enterprise-grade accuracy with advanced analysis algorithms**

- **🎯 Smart Base64 Detection**: High-accuracy filtering eliminates false positives from media content and encoded data
- **🏢 Professional Interface**: Enterprise-ready terminology, documentation, and comprehensive reporting formats
- **🧠 Context-Aware Analysis**: Advanced algorithms distinguish real security tokens from encoded media data
- **📊 Entropy Analysis**: Mathematical algorithms identify genuine security tokens and credentials with precision

### 🌐 Professional HTTP & Networking Suite
<details>
<summary><strong>Enterprise-Grade Network Configuration</strong></summary>

**Authentication & Headers:**
- **🔧 Custom Headers** (`-H`): Repeatable authentication headers and custom request headers
- **🍪 Cookie Management** (`-c`): Session cookies for accessing protected resources
- **🎭 User-Agent Control** (`-U`): Custom UA strings or file-based rotation for stealth

**Performance & Reliability:**
- **⏱️ Rate Limiting** (`-R`): Configurable request delays (milliseconds) to avoid detection
- **⏰ Smart Timeouts** (`-T`): Custom timeout settings for different network conditions
- **🔄 Intelligent Retry** (`-y`): Automatic retry mechanism with exponential backoff for failed requests

**Professional Integration:**
- **🔗 Proxy Support** (`-p`): Full Burp Suite and custom proxy integration (HTTP/HTTPS/SOCKS5)
- **🔒 TLS Flexibility** (`-k`): Optional certificate verification bypass for testing environments
- **🎯 Thread Control** (`-t`): Configurable concurrent request handling for optimal performance

> **🔒 Security Professional Features**: Designed for penetration testing and security assessments  
> **Example**: `jshunter -l targets.txt -p 127.0.0.1:8080 -H "Authorization: Bearer token" -R 1000`

</details>

### 📝 Advanced JavaScript Analysis
<details>
<summary><strong>Complete Code Analysis & Deobfuscation Suite</strong></summary>

**Core Analysis Tools:**
- **🧩 Deobfuscation Engine** (`-d`): Unpacks minified and obfuscated JavaScript for deep analysis
- **🗺️ Source Map Parser** (`-m`): Extracts and analyzes original source code from source maps
- **🔍 Obfuscation Detection** (`-z`): Identifies and classifies obfuscation techniques and patterns

**Dynamic Analysis:**
- **⚡ Eval Analysis** (`-e`): Analyzes dynamic code execution (`eval()`, `Function()`, runtime generation)

**Code Intelligence:**
- **🔍 Pattern Recognition**: Identifies common JavaScript frameworks and libraries
- **📊 Code Structure Analysis**: Maps application architecture and data flows
- **🎯 Context-Aware Detection**: Understands code context to reduce false positives

> **💡 Professional Usage**: Combine analysis tools with security detection for maximum coverage  
> **Example**: `jshunter -u target.js -d -m -e -s -g` (full deobfuscation + security analysis)

</details>

### 🔐 Security Analysis Suite
<details>
<summary><strong>Complete Security Assessment Toolkit</strong></summary>

**Core Security Detection:**
- **🔑 Secrets Detection** (`-s`): API keys, access tokens, passwords, and hardcoded credentials
- **🎫 JWT Token Analysis** (`-x`): Authentication token extraction, validation, and payload inspection
- **🔥 Firebase Security** (`-F`): Configuration analysis, API keys, and database URL detection

**Advanced Analysis:**
- **📋 Parameter Discovery** (`-P`): Hidden form parameters, variables, and configuration keys
- **🔗 URL Parameter Extraction** (`-PU`): Advanced parameter analysis with full URL context
- **📊 GraphQL Analysis** (`-g`): Schema detection, query extraction, and endpoint discovery
- **🛡️ WAF Bypass Detection** (`-B`): Security bypass patterns and evasion techniques

**Scope & Context:**
- **🏠 Internal Endpoint Filtering** (`-i`): Private/internal resource identification and classification
- **🌐 Link Analysis** (`-L`): Comprehensive URL extraction and relationship mapping

> **🎯 Professional Tip**: Combine flags for comprehensive analysis (e.g., `jshunter -u target.js -s -x -F -g`)

</details>

### 🎯 Scope & Discovery
<details>
<summary><strong>Intelligent Crawling & Targeting</strong></summary>

- **🔍 Recursive Discovery**: Multi-depth JavaScript file crawling
- **🌍 Domain Scoping**: Focus analysis on specific domains
- **📂 Extension Filtering**: Target specific JavaScript file types

</details>

### 📤 Professional Reporting & Export Suite
<details>
<summary><strong>Enterprise-Grade Output & Integration</strong></summary>

**Core Output Formats:**
- **🖥️ Console Display**: Color-coded terminal output with professional formatting and clear categorization
- **📄 File Export** (`-o`): Save comprehensive results to custom file locations
- **📊 JSON Export** (`-j`): Structured data format for automation and programmatic processing
- **📈 CSV Export** (`-C`): Spreadsheet-compatible format for executive reporting and analysis

**Professional Integration:**
- **🔴 Burp Suite Export** (`-n`): Direct integration with Burp Suite Professional for immediate testing
- **🎯 Regex Filtering** (`-r`): Custom pattern matching for targeted result filtering
- **🔍 Verbose Analysis** (`-v`): Detailed analysis output with debugging information and context

**Result Management:**
- **✨ Clean Mode** (`--found-only`): Hide empty results for focused security reporting
- **🤫 Quiet Mode** (`-q`): Suppress banner for automated scripting and CI/CD integration

> **📋 Reporting Workflow**: Use JSON for automation, CSV for management reports, Burp export for immediate testing  
> **Example**: `jshunter -l targets.txt -s -j -o security-findings.json` (structured security report)

</details>

---

## 📦 Installation

### Go Install (Recommended)
```bash
# Install JSHunter
go install -v github.com/cc1a2b/jshunter/cmd/jshunter@latest

# Verify installation
jshunter --help
```

### Build from Source
```bash
git clone https://github.com/cc1a2b/jshunter.git
cd jshunter
go build -o jshunter ./cmd/jshunter
```

### System Requirements
- **Go 1.22.5+** (for building from source)
- **Linux, macOS, or Windows** (64-bit architecture)
- **Network connectivity** for remote JavaScript analysis

---

## 🚀 Quick Start

### Basic Analysis
```bash
# Analyze a single JavaScript file
jshunter -u "https://example.com/app.js"

# Scan multiple URLs from file
jshunter -l urls.txt

# Analyze local JavaScript file
jshunter -f app.js
```

### Complete Security Analysis
```bash
# Find API keys, secrets, and credentials
jshunter -u "https://target.com/app.js" -s

# Full analysis with deobfuscation, GraphQL, and Firebase detection
jshunter -u "https://target.com/app.js" -d -s -g -F -x -L

# Professional security assessment with all tools
jshunter -u "https://target.com/app.js" -d -m -e -s -x -P -g -F -B -L

# Export comprehensive results for reporting
jshunter -l targets.txt -s -g -F -j -o security_findings.json
```

---

## 💡 Usage Examples

```bash
# Analyze single URL
jshunter -u "https://example.com/app.js"

# Analyze multiple URLs from file
jshunter -l urls.txt

# Pipe URLs from stdin
cat urls.txt | grep "\.js" | jshunter

# Complete security analysis - find secrets, API keys, and credentials
jshunter -u "https://example.com/app.js" -s -x -F

# Full analysis suite with deobfuscation and all security tools
jshunter -u "https://target.com/app.js" -d -m -e -s -x -P -g -F -B -L

# Professional assessment with source map analysis
jshunter -u "https://target.com/bundle.js" -d -m -s -g -F

# Export comprehensive results to structured formats
jshunter -l targets.txt -s -x -F -g -j -o security_findings.json

# Stealth scanning with Burp Suite integration
jshunter -l targets.txt -p 127.0.0.1:8080 -s -g -F -n -o burp_findings.txt

# Scanning through SOCKS5 proxy (Tor, SSH tunnel, etc.)
jshunter -l targets.txt -p socks5://127.0.0.1:9050 -s -x -F

# Rate-limited professional scanning with authentication
jshunter -l urls.txt -R 2000 -H "Authorization: Bearer token" -s -x -F -g -q

# Complete endpoint and parameter discovery
jshunter -l urls.txt -ep -P -PU -L -w 2

# Advanced obfuscation analysis with context detection
jshunter -f obfuscated.js -d -z -e -s -v
```

---

## 📋 Command Reference

Get the complete help anytime with `jshunter --help`

```
Usage:
  -u,  --url URL                Input a URL
  -l,  --list FILE.txt          Input a file with URLs (.txt)
  -f,  --file FILE.js           Path to JavaScript file
       --har FILE               Ingest a Chrome DevTools HAR archive

Basic Options:
  -t,  --threads INT            Number of concurrent threads (default: 5)
  -c,  --cookies <cookies>      Authentication cookies for protected resources
  -p,  --proxy host:port        HTTP/SOCKS5 proxy (e.g., 127.0.0.1:8080 for Burp Suite)
  -q,  --quiet                  Suppress ASCII art output
       --no-color               Disable ANSI color (auto-off when not a TTY)
  -o,  --output FILENAME        Output file path
  -r,  --regex <pattern>        RegEx for filtering results
       --update, --up           Update the tool to latest version
  -ep, --end-point              Extract endpoints from JavaScript files
  -k,  --skip-tls               Skip TLS certificate verification
  -fo, --found-only             Only show results when sensitive data is found

HTTP Configuration:
  -H,  --header "Key: Value"    Custom HTTP headers (repeatable, including Auth)
  -U,  --user-agent UA          Custom User-Agent string or file path
  -R,  --rate-limit MS          Request rate limiting delay (milliseconds)
  -T,  --timeout SEC            HTTP request timeout (seconds)
  -y,  --retry INT              Retry attempts for failed requests (default: 2)
       --per-host INT           Per-host outbound concurrency cap (default: 4)
       --max-bytes N            Cap response body read in bytes (default: 32MiB)
       --allow-internal         Permit localhost / RFC1918 / link-local targets
       --cache-dir DIR          Persist responses on disk; revalidate via ETag

JavaScript Analysis:
  -d,  --deobfuscate            Deobfuscate minified and obfuscated JavaScript
  -m,  --sourcemap              Fetch and parse source maps + sourcesContent[]
  -e,  --eval                   Analyze dynamic code execution (eval, Function)
  -z,  --obfs-detect            Detect code obfuscation patterns and techniques
       --inline-html            Scan inline <script> tags + SRI/CSP in HTML responses
       --csp-origins            Emit CSP-allowed origins as candidate endpoints

Security Analysis:
  -s,  --secrets                Detect API keys, tokens, and credentials
  -x,  --tokens                 Extract JWT and authentication tokens
  -P,  --params                 Discover hidden parameters and variables
  -PU, --param-urls             Advanced parameter extraction with URL context
  -i,  --internal               Filter for internal/private endpoints
  -g,  --graphql                Analyze GraphQL endpoints and queries
  -B,  --bypass                 Detect WAF bypass patterns and techniques
  -F,  --firebase               Analyze Firebase configurations and keys
  -L,  --links                  Extract and analyze all embedded links

Detection Tuning:
  -mc, --min-confidence FLOAT   Minimum confidence (0.0-1.0) for a finding (default: 0.50)
  -sc, --show-confidence        Print [conf=X.XX] alongside each finding
       --no-fp-filter           Disable the false-positive filter (debug)
       --ignore-file FILE       Permanent suppressions (.jshunterignore)
       --diff PREVIOUS.json     Report only NEW findings vs previous JSON envelope
       --rules-file FILE.json   Load an external JSON rule pack
       --only-rules id,glob     Run only matching rules (supports * glob)
       --disable-rule id,glob   Disable matching rules (supports * glob)

Verification:
       --verify                 Probe findings against provider read-only endpoints
       --verify-timeout SEC     Timeout per verification probe (default: 10)
       --verify-workers INT     Concurrent verifier worker pool (default: 8)

Scope & Discovery:
  -w,  --crawl DEPTH            Recursive JavaScript discovery depth (default: 1)
  -D,  --domain DOMAIN          Limit analysis to specific domain
  -E,  --ext                    Filter by JavaScript file extensions
       --robots                 Fetch /robots.txt for each input host and exit

Output Formats:
  -j,  --json                   Structured JSON output (schema_version 2)
       --ndjson                 Newline-delimited JSON (jq / SIEM streaming)
       --sarif                  SARIF 2.1.0 (GitHub code-scanning compatible)
  -C,  --csv                    CSV format for spreadsheet analysis
  -v,  --verbose                Detailed analysis and debug output
  -n,  --burp                   Burp Suite compatible export format
       --stats                  Per-stage counters on stderr at end of run

Registry:
       --list-rules             Print the rule registry as a table and exit
       --explain RULE_ID        Print full rule details and exit
       --self-test              Run rule registry against built-in TP/FP fixtures

  -h,  --help                   Display this help message
```

### Confidence model

Every secret-class match is scored in `[0.0, 1.0]`. The score starts from a per-rule prior and is adjusted by:

| Signal                                         | Effect                       |
|------------------------------------------------|------------------------------|
| Source path looks like a vendor/chunk bundle   | −0.15                        |
| Surrounding context contains fixture wording   | −0.30                        |
| Provider-specific validator passed             | +0.10                        |
| Required context keyword present (generic rule)| +0.05                        |
| Shannon entropy ≥ 4.5                          | +0.05                        |
| Character-class diversity ≥ 3                  | +0.05                        |
| Match in the vendor-noise denylist             | dropped before scoring       |
| Length / entropy below rule floor              | dropped before scoring       |
| Line is a `//# sourceMappingURL=` marker       | dropped before scoring       |

The default `--min-confidence 0.50` filters out the long tail of pattern-only matches. Use `--min-confidence 0.80` for high-precision triage, `--no-fp-filter` for raw, unfiltered output.

### Provider validators

| Provider | Validator                                                                |
|----------|--------------------------------------------------------------------------|
| AWS      | Prefix family (`AKIA/ASIA/A3T…`) + 16-char base32 body                   |
| Stripe   | Prefix family (`sk/rk/pk_live/test_`) + clean base62 body                |
| GitHub   | CRC32 base62 checksum verified against random body                        |
| OpenAI   | Family prefix + length window (`sk-/sk-proj-/sk-svcacct-`)                |
| Slack    | Hyphen-segment shape (numeric inner segments, alphanumeric tail)         |
| JWT      | base64url-decoded JSON header with `alg` field + JSON payload            |
| Twilio   | 32-hex body + entropy gate                                               |

---

## 🔧 Advanced Usage

### Professional Security Assessment
```bash
# Complete security analysis with all tools
jshunter -l targets.txt -d -m -e -z -s -x -P -PU -g -F -B -L -j -v -o complete_assessment.json

# Advanced deobfuscation and analysis pipeline
jshunter -l targets.txt -d -m -z -e -s -g -F --found-only -o deobfuscated_findings.json

# Stealth reconnaissance with rate limiting and custom headers
jshunter -l targets.txt -R 2000 -U "Mozilla/5.0..." -H "X-Forwarded-For: 1.1.1.1" -s -x -F -q

# Professional penetration testing through proxy
jshunter -l targets.txt -p 127.0.0.1:8080 -s -x -g -F -B -n -o burp_comprehensive.txt

# Deep parameter and endpoint discovery
jshunter -l targets.txt -ep -P -PU -L -w 3 -i -j -o endpoint_discovery.json
```

### Enterprise & Automation Integration
```bash
# CI/CD Security Pipeline Integration
jshunter -f dist/bundle.js -d -s -x -F -j --found-only > security-scan.json

# Comprehensive automated security reporting
jshunter -l production-js.txt -d -s -x -P -g -F -B -C -o enterprise-security-report.csv

# Source map analysis for development security
jshunter -f app.js -m -s -x -F -v -o sourcemap-analysis.json

# Firebase and GraphQL focused assessment
jshunter -l targets.txt -g -F -L -j -o api_security_findings.json
```

---

## 🤝 Contributing

We welcome contributions! Here's how you can help:

- **🐛 Report bugs** via [GitHub Issues](https://github.com/cc1a2b/jshunter/issues)
- **💡 Suggest features** or improvements
- **📝 Improve documentation** 
- **🔧 Submit pull requests** with enhancements

### Development Setup
```bash
git clone https://github.com/cc1a2b/jshunter.git
cd jshunter
go mod tidy
go build -o jshunter ./cmd/jshunter
```

---

## 📄 License

JSHunter is released under the **MIT License**. See [LICENSE](https://github.com/cc1a2b/jshunter/blob/master/LICENSE) for details.

```
Copyright (c) 2024-2026 Hussain Alsharman
Licensed under MIT License - free for commercial and personal use
```

---

##  Support

If JSHunter helps with your security research or professional work:

<div align="center">

[![Buy Me A Coffee](https://cdn.buymeacoffee.com/buttons/default-orange.png)](https://www.buymeacoffee.com/cc1a2b)

**⭐ Star this repo** • **🐦 Follow [@cc1a2b](https://twitter.com/cc1a2b)** • **📢 Share with others**

</div>

---

<div align="center">

**🔍 JSHunter - Professional JavaScript Security Analysis**

*Built with ❤️ by [cc1a2b](https://github.com/cc1a2b) for the security community*

</div>


================================================
FILE: RULES.md
================================================
# Rule schema

JSHunter v0.6 ships with two rule sources: the **built-in registry** (Go code
in `detection.go`) and **external rule packs** loaded at runtime via
`--rules-file` (JSON). This doc covers both, plus the contract every new rule
must meet to ship.

## Mental model
 
```
fetch  →  parse  →  rule match  →  scoreFinding  →  recordFinding  →  output
                                       │
                       (vendor-noise gate, entropy gate, context gate,
                        provider validator, fixture-context penalty,
                        sourcemap-line skip, vendor-chunk penalty)
```

Every rule is just a regex paired with a per-rule **confidence prior** and
a set of **gates** that adjust or reject the score. The gates are the same
for built-in and external rules; the only thing external rules can't do is
register a Go-coded `Validate` function (we don't run user-supplied code).

## Rule fields

| Field              | Type        | Required | Notes                                                     |
|--------------------|-------------|----------|-----------------------------------------------------------|
| `id`               | string      | yes      | Stable, namespaced (`provider.subtype`). Used for dedupe. |
| `name`             | string      | yes      | Human label shown in output.                              |
| `provider`         | string      | no       | Vendor name (`AWS`, `Stripe`, …).                         |
| `secret_type`      | string      | no       | `api_key`, `pat`, `webhook`, `private_key`, …             |
| `severity`         | enum        | yes      | `critical|high|medium|low|info`.                          |
| `pattern`          | regex       | yes      | RE2 syntax. ≤ 4096 bytes.                                 |
| `group`            | int         | no       | Capture-group index (default 0 = full match).             |
| `confidence_prior` | float [0,1] | no       | Default 0.55.                                             |
| `requires_context` | bool        | no       | If true, drops match when no context keyword in ±96 chars.|
| `context_keywords` | []string    | no       | Keywords required if `requires_context: true`.            |
| `min_entropy`      | float       | no       | Drop match when Shannon entropy < this.                   |
| `min_len`          | int         | no       | Drop match shorter than this.                             |
| `max_len`          | int         | no       | Drop match longer than this.                              |
| `high_fp_prone`    | bool        | no       | Apply stricter entropy + char-class gates.                |
| `tp_examples`      | []string    | yes¹     | Example values the rule MUST match.                       |
| `fp_examples`      | []string    | yes¹     | Example values the rule MUST NOT match.                   |

¹ Required *contractually* (R6). `--self-test` walks every rule's TP and FP
fixtures; CI gates merges on `--self-test` exit code.

## Confidence model

Score starts at `confidence_prior`, then is adjusted:

| Adjustment                                     | Delta  |
|------------------------------------------------|--------|
| Source path matches `vendor/chunk/runtime/…`   | −0.15  |
| Provider validator passed                      | +0.10  |
| Context keyword present (`requires_context`)   | +0.05  |
| Surrounding text contains fixture keywords     | −0.30  |
| Shannon entropy ≥ 4.5                          | +0.05  |
| Character-class diversity ≥ 3                  | +0.05  |

Hard rejects (no score, drop the match):

- Match in `vendorNoiseExact` or contains a `vendorNoiseSubstr` fragment.
- Length below `min_len` or above `max_len`.
- Entropy below `min_entropy`.
- `high_fp_prone` rule with character-class diversity < 2 or entropy < 3.0.
- `requires_context` rule with no keyword hit.
- `Validate` function returns false (built-in rules only).
- Line is a `//# sourceMappingURL=` marker.

`--min-confidence` (default 0.50) gates the final score.

## External rule pack format

A pack is a JSON file containing an array of rule objects:

```json
[
  {
    "id": "acme.api_key",
    "name": "Acme API Key",
    "provider": "Acme",
    "secret_type": "api_key",
    "severity": "high",
    "pattern": "\\bacme_[A-Za-z0-9]{32}\\b",
    "confidence_prior": 0.85,
    "min_len": 37,
    "max_len": 37,
    "tp_examples": ["acme_aBcDeFgHiJkLmNoPqRsTuVwXyZ012345"],
    "fp_examples": ["acme_placeholder_____xxxxxxxxxxxxxx"]
  }
]
```

Load with:

```bash
jshunter --rules-file /path/to/rules.json -u https://target.com/app.js
```

Validation is strict: any rule that fails to compile or misses a required
field rejects the whole pack. That keeps "why didn't my rule fire?" out of
the support queue.

## Adding a built-in rule

1. Add a `Rule{}` literal to `registerRules()` in `detection.go`.
2. If the provider has a stable read-only liveness endpoint, register a
   verifier in `registerVerifiers()` in `verify.go` keyed by the rule ID.
3. Add `TPExamples` and `FPExamples` lists. **No rule ships without an FP
   fixture**; one of the most common failure modes is shipping a rule that
   flags a famous open-source bundle.
4. If your rule is `high_fp_prone` (matches a generic shape like 32-hex),
   either set `requires_context: true` with provider-specific
   `context_keywords`, or set `min_entropy` ≥ 3.5 and `max_len` to the
   actual provider format. **Both are usually correct.**
5. Run `go test ./...` and `./jshunter --self-test`. CI must be green.

## Provider validator contract

Validators are pure Go functions of signature
`func(value string) (ok bool, reasons []string)`. They MUST be deterministic
and offline; never call the network from a validator (use the verifier in
`verify.go` for that). They MAY be slow per call (e.g., CRC32) — they run on
matched candidates only, not every byte of the body.

Examples:

- `validateAWSAccessKeyID` — prefix family + base32 alphabet check.
- `validateGitHubToken` — CRC32 base62 trailing checksum verification.
- `validateJWT` — base64url-decode header, parse JSON, require `alg` field.
- `validateStripeKey` — prefix family + clean base62 body (no `_`).

## Anti-patterns

- ❌ A rule whose pattern is `(?i).*key.*=.*[A-Za-z0-9]{8,}.*`. This will
  flood the operator. Be specific. Use the provider's documented prefix.
- ❌ A rule with no FP fixture. You have a regex that flagged a vendor
  bundle once; that's the FP fixture. Add it.
- ❌ Calling `regexp.MustCompile` inside a hot loop. Compile once at
  registration time.
- ❌ A rule whose `severity` is `critical` for a publishable key. Public
  keys aren't credentials; they're configuration. `low` or `info` is right.


================================================
FILE: cmd/jshunter/main.go
================================================
package main

import "github.com/cc1a2b/jshunter/internal/jshunter"

func main() {
	jshunter.Run()
}


================================================
FILE: go.mod
================================================
module github.com/cc1a2b/jshunter

go 1.24.0

toolchain go1.24.5

require golang.org/x/net v0.49.0


================================================
FILE: go.sum
================================================
golang.org/x/net v0.49.0 h1:eeHFmOGUTtaaPSGNmjBKpbng9MulQsJURQUAfUwY++o=
golang.org/x/net v0.49.0/go.mod h1:/ysNB2EvaqvesRkuLAyjI1ycPZlQHM3q01F02UY/MV8=


================================================
FILE: internal/jshunter/aws_pair.go
================================================
package jshunter

import (
	"context"
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// AWS SigV4 verifier for sts:GetCallerIdentity.
//
// AWS credentials come in pairs (Access Key ID + Secret Access Key); a single
// AKID can't be verified alone — the signing process requires both. When
// JSHunter detects both in the same source, this verifier signs a minimal
// read-only POST to sts.amazonaws.com and reports back account/ARN.
//
// Region is fixed at us-east-1 because the global STS endpoint is
// regionalized as us-east-1; service is "sts". No SDK dependency.

const (
	awsService = "sts"
	awsRegion  = "us-east-1"
	awsHost    = "sts.amazonaws.com"
)

// AWSPair is a (AKID, SecretKey) tuple discovered in the same source.
type AWSPair struct {
	AccessKeyID     string
	SecretAccessKey string
	Source          string
	Line            int
	Column          int
}

// pairAWSCredentials walks the dedupe map and returns pairs that share a
// Source. The pairing is conservative: same source, both findings present.
// Cross-source pairing risks attaching the wrong secret to the wrong AKID.
func pairAWSCredentials() []AWSPair {
	findingsMutex.Lock()
	defer findingsMutex.Unlock()

	bySource := map[string]struct {
		akids   []*Finding
		secrets []*Finding
	}{}
	for _, f := range findingsByHash {
		s := bySource[f.Source]
		switch f.RuleID {
		case "aws.access_key_id":
			s.akids = append(s.akids, f)
		case "aws.secret_access_key":
			s.secrets = append(s.secrets, f)
		}
		bySource[f.Source] = s
	}

	pairs := []AWSPair{}
	for src, s := range bySource {
		// Single AKID + single secret in the same source is the only case
		// we can pair with confidence. Multiple of either yield ambiguity;
		// skip those — operator can run --no-fp-filter and triage manually.
		if len(s.akids) == 1 && len(s.secrets) == 1 {
			a, sec := s.akids[0], s.secrets[0]
			pairs = append(pairs, AWSPair{
				AccessKeyID:     a.Value,
				SecretAccessKey: sec.Value,
				Source:          src,
				Line:            a.Line,
				Column:          a.Column,
			})
		}
	}
	return pairs
}

// verifyAWSPair calls sts:GetCallerIdentity with SigV4. Returns alive=true
// and the ARN of the caller on success, alive=false on any 4xx, error string
// on transport failure. Sanitizes any leaked secret from the error.
func verifyAWSPair(ctx context.Context, client *http.Client, p AWSPair) VerifyResult {
	body := "Action=GetCallerIdentity&Version=2011-06-15"
	now := time.Now().UTC()
	dateStr := now.Format("20060102")
	timeStr := now.Format("20060102T150405Z")

	bodyHash := sha256Hex([]byte(body))
	canonicalReq := strings.Join([]string{
		"POST",
		"/",
		"",
		"content-type:application/x-www-form-urlencoded; charset=utf-8",
		"host:" + awsHost,
		"x-amz-content-sha256:" + bodyHash,
		"x-amz-date:" + timeStr,
		"",
		"content-type;host;x-amz-content-sha256;x-amz-date",
		bodyHash,
	}, "\n")

	credScope := dateStr + "/" + awsRegion + "/" + awsService + "/aws4_request"
	stringToSign := strings.Join([]string{
		"AWS4-HMAC-SHA256",
		timeStr,
		credScope,
		sha256Hex([]byte(canonicalReq)),
	}, "\n")

	signingKey := awsDeriveSigningKey(p.SecretAccessKey, dateStr, awsRegion, awsService)
	signature := hmacHex(signingKey, []byte(stringToSign))

	authHeader := fmt.Sprintf(
		"AWS4-HMAC-SHA256 Credential=%s/%s, SignedHeaders=content-type;host;x-amz-content-sha256;x-amz-date, Signature=%s",
		p.AccessKeyID, credScope, signature,
	)

	req, err := http.NewRequestWithContext(ctx, "POST", "https://"+awsHost+"/", strings.NewReader(body))
	if err != nil {
		return VerifyResult{Error: err.Error()}
	}
	req.Host = awsHost
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded; charset=utf-8")
	req.Header.Set("X-Amz-Date", timeStr)
	req.Header.Set("X-Amz-Content-Sha256", bodyHash)
	req.Header.Set("Authorization", authHeader)

	host := req.URL.Host
	release := verifyHostLimiter.acquire(host)
	defer release()

	resp, err := client.Do(req)
	if err != nil {
		return VerifyResult{Error: sanitizeNetErr(err.Error())}
	}
	defer resp.Body.Close()
	respBody, _ := io.ReadAll(&capReader{r: resp.Body, max: 32 * 1024})

	res := VerifyResult{Status: resp.StatusCode}
	if resp.StatusCode == http.StatusOK {
		res.Alive = true
		s := string(respBody)
		// STS returns XML with <Arn>arn:aws:iam::123456789012:user/Foo</Arn>.
		// We use substring extraction rather than a full XML decoder — the
		// expected response shape is stable and small.
		if i := strings.Index(s, "<Arn>"); i != -1 {
			j := strings.Index(s[i+5:], "</Arn>")
			if j != -1 {
				res.Account = s[i+5 : i+5+j]
			}
		}
		res.Note = "sts:GetCallerIdentity returned 200"
	} else {
		// Don't leak the response body — STS error replies can echo the
		// AKID. Sanitize aggressively.
		res.Note = fmt.Sprintf("sts returned %d", resp.StatusCode)
	}
	return res
}

// awsDeriveSigningKey computes the SigV4 signing key:
//
//	kDate    = HMAC("AWS4" + secret, dateStr)
//	kRegion  = HMAC(kDate, region)
//	kService = HMAC(kRegion, service)
//	kSigning = HMAC(kService, "aws4_request")
func awsDeriveSigningKey(secret, dateStr, region, service string) []byte {
	k := hmacBytes([]byte("AWS4"+secret), []byte(dateStr))
	k = hmacBytes(k, []byte(region))
	k = hmacBytes(k, []byte(service))
	return hmacBytes(k, []byte("aws4_request"))
}

func sha256Hex(b []byte) string {
	h := sha256.Sum256(b)
	return hex.EncodeToString(h[:])
}

func hmacBytes(key, msg []byte) []byte {
	m := hmac.New(sha256.New, key)
	m.Write(msg)
	return m.Sum(nil)
}

func hmacHex(key, msg []byte) string {
	return hex.EncodeToString(hmacBytes(key, msg))
}


================================================
FILE: internal/jshunter/cache.go
================================================
package jshunter

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"path/filepath"
	"sync"
	"time"
)

// On-disk cache for HTTP responses, keyed by URL hash. The point is twofold:
//
//  1. Re-running JSHunter on the same target during triage shouldn't
//     re-pull megabytes of bundles we already saw.
//  2. With ETag / Last-Modified, the second run becomes mostly 304s on
//     the wire — kinder to targets, faster on the operator's laptop.
//
// On disk:
//   <cache-dir>/<sha256(url)>.body  — response body
//   <cache-dir>/<sha256(url)>.meta  — JSON metadata (Etag, Last-Modified, status, fetchedAt, contentType)
//
// We do NOT cache responses with set-cookie or auth headers — those are
// session-specific and caching them is a security hazard.

type cacheMeta struct {
	URL          string    `json:"url"`
	Status       int       `json:"status"`
	ContentType  string    `json:"content_type,omitempty"`
	ETag         string    `json:"etag,omitempty"`
	LastModified string    `json:"last_modified,omitempty"`
	FetchedAt    time.Time `json:"fetched_at"`
	Size         int       `json:"size"`
}

type DiskCache struct {
	dir string
	mu  sync.Mutex
}

func NewDiskCache(dir string) (*DiskCache, error) {
	if dir == "" {
		return nil, nil
	}
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return nil, fmt.Errorf("create cache dir: %w", err)
	}
	return &DiskCache{dir: dir}, nil
}

func (c *DiskCache) keyFor(u string) string {
	h := sha256.Sum256([]byte(u))
	return hex.EncodeToString(h[:])
}

func (c *DiskCache) bodyPath(u string) string {
	return filepath.Join(c.dir, c.keyFor(u)+".body")
}

func (c *DiskCache) metaPath(u string) string {
	return filepath.Join(c.dir, c.keyFor(u)+".meta")
}

// Lookup returns the cached entry if present. Caller decides whether to
// short-circuit (use as-is) or revalidate via If-None-Match.
func (c *DiskCache) Lookup(u string) (body []byte, meta *cacheMeta, ok bool) {
	if c == nil {
		return nil, nil, false
	}
	c.mu.Lock()
	defer c.mu.Unlock()

	rawMeta, err := os.ReadFile(c.metaPath(u))
	if err != nil {
		return nil, nil, false
	}
	var m cacheMeta
	if err := json.Unmarshal(rawMeta, &m); err != nil {
		return nil, nil, false
	}
	body, err = os.ReadFile(c.bodyPath(u))
	if err != nil {
		return nil, nil, false
	}
	return body, &m, true
}

// Store writes body + metadata. Skipped silently when the response carries
// `Set-Cookie` or `Authorization` (security hazard) or when `body` is empty.
func (c *DiskCache) Store(u string, resp *http.Response, body []byte) error {
	if c == nil || len(body) == 0 {
		return nil
	}
	if resp.Header.Get("Set-Cookie") != "" {
		return nil
	}
	c.mu.Lock()
	defer c.mu.Unlock()

	meta := cacheMeta{
		URL:          u,
		Status:       resp.StatusCode,
		ContentType:  resp.Header.Get("Content-Type"),
		ETag:         resp.Header.Get("ETag"),
		LastModified: resp.Header.Get("Last-Modified"),
		FetchedAt:    time.Now().UTC(),
		Size:         len(body),
	}
	rawMeta, err := json.Marshal(meta)
	if err != nil {
		return err
	}
	if err := os.WriteFile(c.metaPath(u), rawMeta, 0o600); err != nil {
		return err
	}
	if err := os.WriteFile(c.bodyPath(u), body, 0o600); err != nil {
		return err
	}
	return nil
}

// AttachConditional sets If-None-Match / If-Modified-Since on a request when
// we have a cached entry. The caller observes a 304 in makeRequestWithRetry
// and substitutes the cached body.
func (c *DiskCache) AttachConditional(req *http.Request) {
	if c == nil {
		return
	}
	_, m, ok := c.Lookup(req.URL.String())
	if !ok {
		return
	}
	if m.ETag != "" {
		req.Header.Set("If-None-Match", m.ETag)
	}
	if m.LastModified != "" {
		req.Header.Set("If-Modified-Since", m.LastModified)
	}
}


================================================
FILE: internal/jshunter/concurrent_verify.go
================================================
package jshunter

import (
	"context"
	"net/http"
	"sync"
	"time"
)

// VerifyAllConcurrent runs liveness probes against every Finding that has
// a registered verifier, using a bounded worker pool. Per-host concurrency
// is still capped by `verifyHostLimiter`; the worker pool here is a
// global ceiling on simultaneous outbound HTTP calls so a scan with 200
// findings doesn't open 200 sockets at once.
//
// Each probe is bounded by `timeout`. Findings missing a verifier are
// silently skipped. Mutates findings in place (sets Verified, Verify,
// Confidence).
func VerifyAllConcurrent(findings []*Finding, client *http.Client, timeout time.Duration, workers int) {
	if workers <= 0 {
		workers = 8
	}
	if timeout <= 0 {
		timeout = 10 * time.Second
	}
	registerVerifiers()

	jobs := make(chan *Finding)
	var wg sync.WaitGroup
	wg.Add(workers)
	for i := 0; i < workers; i++ {
		go func() {
			defer wg.Done()
			for f := range jobs {
				v, ok := verifierRegistry[f.RuleID]
				if !ok {
					continue
				}
				if globalStats != nil {
					statInc(&globalStats.VerifyAttempts)
				}
				ctx, cancel := context.WithTimeout(context.Background(), timeout)
				res := v(ctx, client, f.Value)
				cancel()
				findingsMutex.Lock()
				f.Verify = &res
				if res.Alive {
					f.Verified = true
					f.Confidence = 1.0
				}
				findingsMutex.Unlock()
				switch {
				case res.Alive && globalStats != nil:
					statInc(&globalStats.VerifyAlive)
				case res.Error != "" && globalStats != nil:
					statInc(&globalStats.VerifyError)
				case globalStats != nil:
					statInc(&globalStats.VerifyDead)
				}
			}
		}()
	}
	for _, f := range findings {
		jobs <- f
	}
	close(jobs)
	wg.Wait()
}


================================================
FILE: internal/jshunter/crawler.go
================================================
package jshunter

import (
	"fmt"
	"math/rand"
	"net/http"
	"net/url"
	"strconv"
	"sync"
	"time"
)

// Per-host concurrency cap. Recon tools that hit one host with N parallel
// goroutines get banned. The default is intentionally conservative; operators
// who own the target can raise it via --threads (which becomes a global cap)
// without changing the per-host floor.
const (
	defaultPerHostConcurrency = 4
	defaultBreakerThreshold   = 5
	defaultBreakerCooldown    = 30 * time.Second
)

// hostController bounds outbound concurrency per host AND tracks consecutive
// 429/5xx responses for a circuit breaker. When the breaker trips, all
// subsequent requests to that host are dropped until cooldown elapses.
type hostController struct {
	perHost int
	mu      sync.Mutex
	state   map[string]*hostState
}

type hostState struct {
	sem        chan struct{}
	failStreak int
	tripUntil  time.Time
}

var (
	globalHostController *hostController
	hostControllerOnce   sync.Once
)

func getHostController() *hostController {
	hostControllerOnce.Do(func() {
		globalHostController = &hostController{
			perHost: defaultPerHostConcurrency,
			state:   map[string]*hostState{},
		}
	})
	return globalHostController
}

// host returns or creates the per-host bookkeeping struct.
func (c *hostController) host(h string) *hostState {
	c.mu.Lock()
	defer c.mu.Unlock()
	s, ok := c.state[h]
	if !ok {
		s = &hostState{sem: make(chan struct{}, c.perHost)}
		c.state[h] = s
	}
	return s
}

// acquire blocks until a token is available for the host. Returns a release
// closure and a bool — false means the breaker is tripped and the caller
// should NOT make the request.
func (c *hostController) acquire(host string) (release func(), allowed bool) {
	if host == "" {
		return func() {}, true
	}
	s := c.host(host)
	c.mu.Lock()
	if !s.tripUntil.IsZero() && time.Now().Before(s.tripUntil) {
		c.mu.Unlock()
		return func() {}, false
	}
	c.mu.Unlock()
	s.sem <- struct{}{}
	return func() { <-s.sem }, true
}

// recordOutcome teaches the circuit breaker. 200/2xx clears the streak; 429
// or 5xx increments it; once we've crossed the threshold the host is benched
// for the cooldown duration.
func (c *hostController) recordOutcome(host string, status int, retryAfter time.Duration) {
	if host == "" {
		return
	}
	s := c.host(host)
	c.mu.Lock()
	defer c.mu.Unlock()
	if status >= 200 && status < 400 {
		s.failStreak = 0
		return
	}
	if status == http.StatusTooManyRequests || status >= 500 {
		s.failStreak++
		if s.failStreak >= defaultBreakerThreshold {
			cd := defaultBreakerCooldown
			if retryAfter > cd {
				cd = retryAfter
			}
			s.tripUntil = time.Now().Add(cd)
			s.failStreak = 0
		}
	}
}

// parseRetryAfter returns the duration the server asked us to wait. Honors
// both seconds-form ("Retry-After: 30") and HTTP-date form. Returns 0 when
// absent or unparseable.
func parseRetryAfter(h http.Header) time.Duration {
	v := h.Get("Retry-After")
	if v == "" {
		return 0
	}
	if secs, err := strconv.Atoi(v); err == nil && secs >= 0 {
		return time.Duration(secs) * time.Second
	}
	if t, err := http.ParseTime(v); err == nil {
		d := time.Until(t)
		if d > 0 {
			return d
		}
	}
	return 0
}

// backoffWithJitter returns the v0.6 retry sleep — exponential base with
// ±25% jitter to avoid thundering-herd when many concurrent crawlers hit the
// same backoff schedule on the same host.
func backoffWithJitter(attempt int) time.Duration {
	if attempt < 0 {
		attempt = 0
	}
	if attempt > 6 {
		attempt = 6
	}
	base := time.Duration(1<<uint(attempt)) * time.Second
	jitter := time.Duration(rand.Int63n(int64(base) / 2))
	return base + jitter - time.Duration(int64(base)/4)
}

// hostOf is a tiny helper for the controller; tolerates malformed URLs.
func hostOf(rawURL string) string {
	u, err := url.Parse(rawURL)
	if err != nil || u == nil {
		return ""
	}
	return u.Host
}

// describeBreaker is used by --verbose to explain why a request was dropped.
func describeBreaker(host string) string {
	c := getHostController()
	c.mu.Lock()
	defer c.mu.Unlock()
	s, ok := c.state[host]
	if !ok || s.tripUntil.IsZero() {
		return ""
	}
	left := time.Until(s.tripUntil).Round(time.Second)
	return fmt.Sprintf("breaker tripped for %s, %s remaining", host, left)
}


================================================
FILE: internal/jshunter/csp.go
================================================
package jshunter

import (
	"strings"
)

// ParseCSPOrigins extracts host origins from a Content-Security-Policy
// header (or http-equiv meta value). Recon use-case: the allow-list of
// hosts a site loads from is a fast list of subdomains and third-party
// vendors to investigate. We only return scheme://host[:port] tokens —
// keywords (`'self'`, `'unsafe-inline'`), data:, blob:, mediastream:,
// filesystem: are filtered out.
func ParseCSPOrigins(policy string) []string {
	if policy == "" {
		return nil
	}
	seen := map[string]struct{}{}
	out := []string{}
	for _, dir := range strings.Split(policy, ";") {
		dir = strings.TrimSpace(dir)
		if dir == "" {
			continue
		}
		fields := strings.Fields(dir)
		if len(fields) < 2 {
			continue
		}
		// Skip directive name (default-src, script-src, …); iterate sources.
		for _, src := range fields[1:] {
			src = strings.Trim(src, "\"'")
			if src == "" {
				continue
			}
			if strings.HasPrefix(src, "'") || strings.HasPrefix(src, "*") {
				continue
			}
			low := strings.ToLower(src)
			if strings.HasPrefix(low, "data:") || strings.HasPrefix(low, "blob:") ||
				strings.HasPrefix(low, "mediastream:") || strings.HasPrefix(low, "filesystem:") ||
				strings.HasPrefix(low, "ws:") || strings.HasPrefix(low, "wss:") ||
				strings.HasPrefix(low, "self") || strings.HasPrefix(low, "none") ||
				strings.HasPrefix(low, "nonce-") || strings.HasPrefix(low, "sha256-") ||
				strings.HasPrefix(low, "sha384-") || strings.HasPrefix(low, "sha512-") ||
				strings.HasPrefix(low, "strict-dynamic") || strings.HasPrefix(low, "report-sample") ||
				strings.HasPrefix(low, "unsafe-") {
				continue
			}
			if _, ok := seen[src]; !ok {
				seen[src] = struct{}{}
				out = append(out, src)
			}
		}
	}
	return out
}


================================================
FILE: internal/jshunter/detection.go
================================================
package jshunter

import (
	"crypto/sha256"
	"encoding/base64"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"hash/crc32"
	"math"
	"regexp"
	"sort"
	"strings"
	"sync"
)

// SchemaVersion tags every JSON finding so downstream tools can detect
// breaking changes in the JSHunter output contract.
const (
	SchemaVersion        = 2
	DefaultMinConfidence = 0.50
	DefaultMaxBytes      = 32 * 1024 * 1024
	contextWindow        = 96
)

type Severity string

const (
	SevCritical Severity = "critical"
	SevHigh     Severity = "high"
	SevMedium   Severity = "medium"
	SevLow      Severity = "low"
	SevInfo     Severity = "info"
)

// Rule is a single secret-class detector with all signals the FP pipeline needs.
type Rule struct {
	ID              string
	Name            string
	Provider        string
	SecretType      string
	Severity        Severity
	Pattern         *regexp.Regexp
	Group           int
	ConfidencePrior float64
	RequiresContext bool
	ContextKeywords []string
	MinEntropy      float64
	MinLen          int
	MaxLen          int
	HighFPProne     bool
	Validate        func(string) (bool, []string)
	TPExamples      []string
	FPExamples      []string
}

// Location records every distinct site at which the same secret value was seen.
type Location struct {
	Source string `json:"source"`
	Line   int    `json:"line,omitempty"`
	Column int    `json:"column,omitempty"`
}

// Finding is the v0.6 unit of output: scored, deduped, and self-describing.
// Line/Column carry the position of the *first* occurrence; Locations[]
// mirrors all subsequent occurrences after dedupe.
type Finding struct {
	SchemaVersion int           `json:"schema_version"`
	RuleID        string        `json:"rule_id,omitempty"`
	Name          string        `json:"name"`
	Provider      string        `json:"provider,omitempty"`
	SecretType    string        `json:"secret_type,omitempty"`
	Severity      Severity      `json:"severity,omitempty"`
	Value         string        `json:"value,omitempty"`
	Redacted      string        `json:"redacted"`
	ValueHash     string        `json:"value_hash"`
	Source        string        `json:"source"`
	Line          int           `json:"line,omitempty"`
	Column        int           `json:"column,omitempty"`
	Confidence    float64       `json:"confidence"`
	Entropy       float64       `json:"entropy"`
	Verified      bool          `json:"verified"`
	Verify        *VerifyResult `json:"verify,omitempty"`
	Reasons       []string      `json:"reasons,omitempty"`
	Locations     []Location    `json:"locations,omitempty"`
}

var (
	rulesRegistry  []Rule
	rulesIndex     = make(map[string]*Rule)
	rulesOnce      sync.Once
	findingsByHash = make(map[string]*Finding)
	findingsMutex  sync.Mutex

	// Operator-managed suppression hooks. Set from main() once at startup,
	// consulted on every recordFinding. Both are nil-safe.
	activeIgnoreList *IgnoreList
	activeDiffSeen   map[string]bool
)

// Famous false positives the recon community has burned cycles on.
// Exact-match denylist; never a true positive.
//
// Provider sample values are split into prefix + body fragments so this
// source file does not itself trigger upstream secret-scanning systems
// (GitHub Push Protection, etc.). The runtime value is identical — Go
// folds the constant concatenation at compile time.
var vendorNoiseExact = map[string]struct{}{
	"AKIA" + "IOSFODNN7EXAMPLE":                  {},
	"wJalrXUtnFEMI/K7MDENG/bPxRfi" + "CYEXAMPLEKEY": {},
	"sk_" + "test_" + "BQokikJOvBiI2HlWgH4olfQ2": {},
	"pk_" + "test_" + "TYooMQauvdEDq54NiTphI7jx": {},
	"sk_" + "test_" + "4eC39HqLyjWDarjtT1zdp7dc": {},
	"pk_" + "test_" + "6pRNAsCfBOKtIshFeQd4XMUh": {},
	"eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9" +
		".eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ" +
		".SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c": {},
}

// Substring denylist: any match containing one of these is sample/placeholder.
var vendorNoiseSubstr = []string{
	"EXAMPLEKEY", "EXAMPLEEXAMPLE", "YOUR_API_KEY", "YOURAPIKEY",
	"REPLACEME", "REPLACE_ME", "PLACEHOLDER", "XXXXXXXX",
	"INSERT_KEY_HERE", "PUT_KEY_HERE", "ENTER_YOUR_KEY",
	"my_secret_key", "test-secret-key",
	"'password'", `"password"`, "'PASSWORD'", `"PASSWORD"`, "'Password'", `"Password"`,
	"'passwd'", `"passwd"`, "'pwd'", `"pwd"`,
	"'changeme'", `"changeme"`, "'CHANGEME'", `"CHANGEME"`,
	"'admin'", `"admin"`, "'admin123'", `"admin123"`,
	"'12345678'", `"12345678"`, "'123456789'", `"123456789"`,
	"'qwerty'", `"qwerty"`, "'qwerty123'", `"qwerty123"`,
	"'letmein'", `"letmein"`, "'test123'", `"test123"`,
	"'secret'", `"secret"`, "'default'", `"default"`,
}

// Surrounding-context tokens that lower the score (looks like a fixture/sample).
var fixtureKeywords = []string{
	"example", "fixture", "dummy", "sample",
	"placeholder", "fake_", "mock_", "stub_", "lorem",
	"FIXME", "TODO", "// e.g.", "for example",
}

// Generic-rule context: at least one of these must appear within ±contextWindow
// chars when a rule is flagged RequiresContext=true.
var contextKeywordsGeneric = []string{
	"key", "token", "secret", "auth", "bearer", "api",
	"private", "credential", "password", "pwd", "session", "access",
}

// Sourcemap signature on the line is an instant skip - it's a build artifact.
var sourcemapMarkerRe = regexp.MustCompile(`(?i)//[#@]\s*source(?:mapping)?URL=`)

// Vendor chunk filename hint; raises the FP threshold.
var vendorChunkRe = regexp.MustCompile(`(?i)(?:vendor|chunk|runtime|polyfill|framework|webpack|node_modules)[-_./~]`)

// registerRules wires the curated provider registry. Called lazily so users
// who only consume legacy regexPatterns pay nothing.
func registerRules() {
	rulesOnce.Do(func() {
		rulesRegistry = append(rulesRegistry, []Rule{
			{
				ID:              "aws.access_key_id",
				Name:            "AWS Access Key ID",
				Provider:        "AWS",
				SecretType:      "access_key_id",
				Severity:        SevCritical,
				Pattern:         regexp.MustCompile(`\b(?:AKIA|ASIA|A3T[A-Z0-9]|AGPA|AIDA|AROA|AIPA|ANPA|ANVA)[A-Z2-7]{16}\b`),
				ConfidencePrior: 0.85,
				MinLen:          20, MaxLen: 20,
				Validate:   validateAWSAccessKeyID,
				TPExamples: []string{"AKIA2OGYBAH6STMMNXWG"},
				FPExamples: []string{"AKIAIOSFODNN7EXAMPLE"},
			},
			{
				ID:              "aws.secret_access_key",
				Name:            "AWS Secret Access Key",
				Provider:        "AWS",
				SecretType:      "secret_key",
				Severity:        SevCritical,
				Pattern:         regexp.MustCompile(`(?i)\b(?:aws[_-]?(?:secret|sk))[_-]?(?:access[_-]?)?key\s*[:=]\s*["']([A-Za-z0-9/+=]{40})["']`),
				Group:           1,
				ConfidencePrior: 0.80,
				MinEntropy:      4.2,
				MinLen:          40, MaxLen: 40,
				HighFPProne: true,
				Validate:    validateAWSSecretKey,
			},
			{
				ID:              "stripe.secret_key",
				Name:            "Stripe Secret Key",
				Provider:        "Stripe",
				SecretType:      "api_key",
				Severity:        SevCritical,
				Pattern:         regexp.MustCompile(`\b(?:sk|rk)_live_[0-9A-Za-z]{20,247}\b`),
				ConfidencePrior: 0.95,
				MinLen:          28,
				Validate:        validateStripeKey,
				TPExamples:      []string{"sk_" + "live_" + "51HVFjkJK29bs8Hjk39MeOpqRsTuVwXyZ"},
				FPExamples:      []string{"sk_" + "test_" + "BQokikJOvBiI2HlWgH4olfQ2"},
			},
			{
				ID:              "stripe.restricted_key",
				Name:            "Stripe Restricted Key",
				Provider:        "Stripe",
				SecretType:      "api_key",
				Severity:        SevCritical,
				Pattern:         regexp.MustCompile(`\brk_live_[0-9A-Za-z]{20,247}\b`),
				ConfidencePrior: 0.95,
				MinLen:          28,
				Validate:        validateStripeKey,
			},
			{
				ID:              "stripe.publishable_key",
				Name:            "Stripe Publishable Key",
				Provider:        "Stripe",
				SecretType:      "publishable_key",
				Severity:        SevLow,
				Pattern:         regexp.MustCompile(`\bpk_live_[0-9A-Za-z]{20,247}\b`),
				ConfidencePrior: 0.95,
				MinLen:          28,
				Validate:        validateStripeKey,
			},
			{
				ID:              "github.pat_classic",
				Name:            "GitHub Personal Access Token",
				Provider:        "GitHub",
				SecretType:      "pat",
				Severity:        SevCritical,
				Pattern:         regexp.MustCompile(`\bgh[oprsu]_[A-Za-z0-9]{36,251}\b`),
				ConfidencePrior: 0.85,
				MinLen:          40,
				Validate:        validateGitHubToken,
			},
			{
				ID:              "github.fine_grained_pat",
				Name:            "GitHub Fine-Grained PAT",
				Provider:        "GitHub",
				SecretType:      "pat",
				Severity:        SevCritical,
				Pattern:         regexp.MustCompile(`\bgithub_pat_[0-9A-Za-z_]{82}\b`),
				ConfidencePrior: 0.95,
				MinLen:          93, MaxLen: 93,
			},
			{
				ID:              "openai.legacy_key",
				Name:            "OpenAI API Key (legacy)",
				Provider:        "OpenAI",
				SecretType:      "api_key",
				Severity:        SevCritical,
				Pattern:         regexp.MustCompile(`\bsk-[A-Za-z0-9]{20}T3BlbkFJ[A-Za-z0-9]{20}\b`),
				ConfidencePrior: 0.95,
				MinLen:          51, MaxLen: 51,
			},
			{
				ID:              "openai.project_key",
				Name:            "OpenAI Project Key",
				Provider:        "OpenAI",
				SecretType:      "api_key",
				Severity:        SevCritical,
				Pattern:         regexp.MustCompile(`\bsk-proj-[A-Za-z0-9_\-]{40,200}\b`),
				ConfidencePrior: 0.92,
				MinLen:          48,
			},
			{
				ID:              "openai.svcacct_key",
				Name:            "OpenAI Service Account Key",
				Provider:        "OpenAI",
				SecretType:      "api_key",
				Severity:        SevCritical,
				Pattern:         regexp.MustCompile(`\bsk-svcacct-[A-Za-z0-9_\-]{40,200}\b`),
				ConfidencePrior: 0.92,
				MinLen:          48,
			},
			{
				ID:              "anthropic.api_key",
				Name:            "Anthropic API Key",
				Provider:        "Anthropic",
				SecretType:      "api_key",
				Severity:        SevCritical,
				Pattern:         regexp.MustCompile(`\bsk-ant-(?:api|admin)\d{2}-[A-Za-z0-9_\-]{86,200}\b`),
				ConfidencePrior: 0.95,
				MinLen:          93,
			},
			{
				ID:              "google.api_key",
				Name:            "Google API Key",
				Provider:        "Google",
				SecretType:      "api_key",
				Severity:        SevHigh,
				Pattern:         regexp.MustCompile(`\bAIza[0-9A-Za-z_\-]{35}\b`),
				ConfidencePrior: 0.85,
				MinLen:          39, MaxLen: 39,
			},
			{
				ID:              "google.oauth_token",
				Name:            "Google OAuth Access Token",
				Provider:        "Google",
				SecretType:      "oauth",
				Severity:        SevHigh,
				Pattern:         regexp.MustCompile(`\bya29\.[0-9A-Za-z_\-]{40,200}\b`),
				ConfidencePrior: 0.90,
				MinLen:          45,
			},
			{
				ID:              "slack.user_or_bot_token",
				Name:            "Slack Token",
				Provider:        "Slack",
				SecretType:      "api_token",
				Severity:        SevHigh,
				Pattern:         regexp.MustCompile(`\bxox[bopsare]-(?:\d+-){1,4}[A-Za-z0-9]{16,40}\b`),
				ConfidencePrior: 0.92,
				MinLen:          24,
				Validate:        validateSlackToken,
			},
			{
				ID:              "slack.app_token",
				Name:            "Slack App Token",
				Provider:        "Slack",
				SecretType:      "api_token",
				Severity:        SevHigh,
				Pattern:         regexp.MustCompile(`\bxapp-\d-[A-Z0-9]+-\d+-[A-Za-z0-9]{40,80}\b`),
				ConfidencePrior: 0.92,
				MinLen:          50,
			},
			{
				ID:              "slack.webhook",
				Name:            "Slack Incoming Webhook",
				Provider:        "Slack",
				SecretType:      "webhook",
				Severity:        SevHigh,
				Pattern:         regexp.MustCompile(`\bhttps://hooks\.slack\.com/services/T[A-Z0-9]{8,12}/B[A-Z0-9]{8,12}/[A-Za-z0-9]{20,40}\b`),
				ConfidencePrior: 0.97,
			},
			{
				ID:              "discord.webhook",
				Name:            "Discord Webhook",
				Provider:        "Discord",
				SecretType:      "webhook",
				Severity:        SevHigh,
				Pattern:         regexp.MustCompile(`\bhttps://(?:discord|discordapp)\.com/api/webhooks/\d{17,20}/[A-Za-z0-9_\-]{60,80}\b`),
				ConfidencePrior: 0.97,
			},
			{
				ID:              "discord.bot_token",
				Name:            "Discord Bot Token",
				Provider:        "Discord",
				SecretType:      "bot_token",
				Severity:        SevHigh,
				Pattern:         regexp.MustCompile(`\b[MN][A-Za-z\d]{23,25}\.[\w-]{6}\.[\w-]{27,38}\b`),
				ConfidencePrior: 0.85,
				MinLen:          59,
				HighFPProne:     true,
			},
			{
				ID:              "twilio.api_key",
				Name:            "Twilio API Key (SK)",
				Provider:        "Twilio",
				SecretType:      "api_key",
				Severity:        SevHigh,
				Pattern:         regexp.MustCompile(`\bSK[0-9a-fA-F]{32}\b`),
				ConfidencePrior: 0.75,
				MinLen:          34, MaxLen: 34,
				HighFPProne:     true,
				Validate:        validateTwilioSK,
			},
			{
				ID:              "twilio.account_sid",
				Name:            "Twilio Account SID (AC)",
				Provider:        "Twilio",
				SecretType:      "account_id",
				Severity:        SevMedium,
				Pattern:         regexp.MustCompile(`\bAC[a-f0-9]{32}\b`),
				ConfidencePrior: 0.70,
				MinLen:          34, MaxLen: 34,
				HighFPProne:     true,
			},
			{
				ID:              "sendgrid.api_key",
				Name:            "SendGrid API Key",
				Provider:        "SendGrid",
				SecretType:      "api_key",
				Severity:        SevHigh,
				Pattern:         regexp.MustCompile(`\bSG\.[A-Za-z0-9_\-]{22}\.[A-Za-z0-9_\-]{43}\b`),
				ConfidencePrior: 0.97,
				MinLen:          69, MaxLen: 69,
			},
			{
				ID:              "mailgun.api_key",
				Name:            "Mailgun API Key",
				Provider:        "Mailgun",
				SecretType:      "api_key",
				Severity:        SevHigh,
				Pattern:         regexp.MustCompile(`\bkey-[0-9a-zA-Z]{32}\b`),
				ConfidencePrior: 0.85,
				MinLen:          36, MaxLen: 36,
			},
			{
				ID:              "mailchimp.api_key",
				Name:            "Mailchimp API Key",
				Provider:        "Mailchimp",
				SecretType:      "api_key",
				Severity:        SevHigh,
				Pattern:         regexp.MustCompile(`\b[0-9a-f]{32}-us[0-9]{1,3}\b`),
				ConfidencePrior: 0.85,
				MinLen:          35,
			},
			{
				ID:              "github.app_install_token",
				Name:            "GitHub App Installation Token",
				Provider:        "GitHub",
				SecretType:      "installation_token",
				Severity:        SevCritical,
				Pattern:         regexp.MustCompile(`\bv1\.[a-f0-9]{40,}\b`),
				ConfidencePrior: 0.55,
				MinLen:          43,
				HighFPProne:     true,
				RequiresContext: true,
				ContextKeywords: []string{"github", "token", "install", "app"},
			},
			{
				ID:              "gitlab.pat",
				Name:            "GitLab Personal Access Token",
				Provider:        "GitLab",
				SecretType:      "pat",
				Severity:        SevHigh,
				Pattern:         regexp.MustCompile(`\bglpat-[0-9A-Za-z_\-]{20,40}\b`),
				ConfidencePrior: 0.95,
				MinLen:          26,
			},
			{
				ID:              "gitlab.pipeline_token",
				Name:            "GitLab Pipeline Trigger Token",
				Provider:        "GitLab",
				SecretType:      "pipeline_token",
				Severity:        SevHigh,
				Pattern:         regexp.MustCompile(`\bglptt-[a-f0-9]{40}\b`),
				ConfidencePrior: 0.97,
				MinLen:          46, MaxLen: 46,
			},
			{
				ID:              "vercel.token",
				Name:            "Vercel Token",
				Provider:        "Vercel",
				SecretType:      "token",
				Severity:        SevHigh,
				Pattern:         regexp.MustCompile(`\b(?:vercel_)?[A-Za-z0-9]{24}\b`),
				ConfidencePrior: 0.45,
				HighFPProne:     true,
				RequiresContext: true,
				ContextKeywords: []string{"vercel", "VERCEL_TOKEN"},
			},
			{
				ID:              "doppler.token",
				Name:            "Doppler Token",
				Provider:        "Doppler",
				SecretType:      "token",
				Severity:        SevHigh,
				Pattern:         regexp.MustCompile(`\bdp\.(?:pt|st|sa|ct|scim|audit)\.[A-Za-z0-9]{40,44}\b`),
				ConfidencePrior: 0.97,
				MinLen:          47,
			},
			{
				ID:              "digitalocean.token",
				Name:            "DigitalOcean Token",
				Provider:        "DigitalOcean",
				SecretType:      "token",
				Severity:        SevHigh,
				Pattern:         regexp.MustCompile(`\bdop_v1_[a-f0-9]{64}\b`),
				ConfidencePrior: 0.97,
				MinLen:          71, MaxLen: 71,
			},
			{
				ID:              "shopify.access_token",
				Name:            "Shopify Access Token",
				Provider:        "Shopify",
				SecretType:      "access_token",
				Severity:        SevHigh,
				Pattern:         regexp.MustCompile(`\bshpat_[a-fA-F0-9]{32}\b`),
				ConfidencePrior: 0.97,
				MinLen:          38, MaxLen: 38,
			},
			{
				ID:              "shopify.shared_secret",
				Name:            "Shopify Shared Secret",
				Provider:        "Shopify",
				SecretType:      "shared_secret",
				Severity:        SevHigh,
				Pattern:         regexp.MustCompile(`\bshpss_[a-fA-F0-9]{32}\b`),
				ConfidencePrior: 0.97,
				MinLen:          38, MaxLen: 38,
			},
			{
				ID:              "npm.token",
				Name:            "npm Access Token",
				Provider:        "npm",
				SecretType:      "token",
				Severity:        SevHigh,
				Pattern:         regexp.MustCompile(`\bnpm_[A-Za-z0-9]{36}\b`),
				ConfidencePrior: 0.95,
				MinLen:          40, MaxLen: 40,
			},
			{
				ID:              "pypi.token",
				Name:            "PyPI Token",
				Provider:        "PyPI",
				SecretType:      "token",
				Severity:        SevHigh,
				Pattern:         regexp.MustCompile(`\bpypi-AgEIcHlwaS5vcmc[A-Za-z0-9_\-]{50,200}\b`),
				ConfidencePrior: 0.97,
				MinLen:          70,
			},
			{
				ID:              "jwt.token",
				Name:            "JSON Web Token",
				Provider:        "JWT",
				SecretType:      "jwt",
				Severity:        SevMedium,
				Pattern:         regexp.MustCompile(`\beyJ[A-Za-z0-9_\-]{8,}\.eyJ[A-Za-z0-9_\-]{8,}\.[A-Za-z0-9_\-]{8,}\b`),
				ConfidencePrior: 0.70,
				MinLen:          40,
				Validate:        validateJWT,
			},
			{
				ID:              "rsa.private_key",
				Name:            "RSA Private Key",
				Provider:        "PKI",
				SecretType:      "private_key",
				Severity:        SevCritical,
				Pattern:         regexp.MustCompile(`-----BEGIN RSA PRIVATE KEY-----`),
				ConfidencePrior: 0.99,
			},
			{
				ID:              "openssh.private_key",
				Name:            "OpenSSH Private Key",
				Provider:        "PKI",
				SecretType:      "private_key",
				Severity:        SevCritical,
				Pattern:         regexp.MustCompile(`-----BEGIN OPENSSH PRIVATE KEY-----`),
				ConfidencePrior: 0.99,
			},
			{
				ID:              "ec.private_key",
				Name:            "EC Private Key",
				Provider:        "PKI",
				SecretType:      "private_key",
				Severity:        SevCritical,
				Pattern:         regexp.MustCompile(`-----BEGIN EC PRIVATE KEY-----`),
				ConfidencePrior: 0.99,
			},
			{
				ID:              "pgp.private_key",
				Name:            "PGP Private Key",
				Provider:        "PKI",
				SecretType:      "private_key",
				Severity:        SevCritical,
				Pattern:         regexp.MustCompile(`-----BEGIN PGP PRIVATE KEY BLOCK-----`),
				ConfidencePrior: 0.99,
			},
			{
				ID:              "facebook.access_token",
				Name:            "Facebook Access Token",
				Provider:        "Meta",
				SecretType:      "access_token",
				Severity:        SevHigh,
				Pattern:         regexp.MustCompile(`\bEAA[A-Za-z0-9]{20,200}\b`),
				ConfidencePrior: 0.65,
				MinLen:          25,
				HighFPProne:     true,
				RequiresContext: true,
				ContextKeywords: []string{"facebook", "fb", "meta", "graph.facebook"},
			},
			{
				ID:              "linear.api_key",
				Name:            "Linear API Key",
				Provider:        "Linear",
				SecretType:      "api_key",
				Severity:        SevHigh,
				Pattern:         regexp.MustCompile(`\blin_(?:api|oauth)_[A-Za-z0-9]{40,80}\b`),
				ConfidencePrior: 0.97,
				MinLen:          47,
			},
			{
				ID:              "huggingface.token",
				Name:            "HuggingFace Token",
				Provider:        "HuggingFace",
				SecretType:      "token",
				Severity:        SevHigh,
				Pattern:         regexp.MustCompile(`\bhf_[A-Za-z0-9]{34,80}\b`),
				ConfidencePrior: 0.92,
				MinLen:          37,
			},
			{
				ID:              "supabase.service_role",
				Name:            "Supabase Service Role JWT",
				Provider:        "Supabase",
				SecretType:      "service_role",
				Severity:        SevCritical,
				Pattern:         regexp.MustCompile(`\beyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9\.[A-Za-z0-9_\-]+\.[A-Za-z0-9_\-]+\b`),
				ConfidencePrior: 0.75,
				MinLen:          60,
				Validate:        validateJWT,
			},
		}...)

		for i := range rulesRegistry {
			r := &rulesRegistry[i]
			rulesIndex[r.ID] = r
		}
	})
}

// shannonEntropy is the standard bits-per-symbol entropy. Vendored here to
// avoid pulling another dep; the math/log2 path is hot-cache safe.
func shannonEntropy(s string) float64 {
	if s == "" {
		return 0
	}
	freq := make(map[rune]int, len(s))
	for _, r := range s {
		freq[r]++
	}
	length := float64(len(s))
	entropy := 0.0
	for _, c := range freq {
		p := float64(c) / length
		entropy -= p * math.Log2(p)
	}
	return entropy
}

// charClassDiversity counts how many of the four shape classes the string uses.
// Real secrets tend to use 3+; minified identifiers use 1-2.
func charClassDiversity(s string) int {
	var hasLower, hasUpper, hasDigit, hasSym bool
	for _, r := range s {
		switch {
		case r >= 'a' && r <= 'z':
			hasLower = true
		case r >= 'A' && r <= 'Z':
			hasUpper = true
		case r >= '0' && r <= '9':
			hasDigit = true
		case r == '_' || r == '-' || r == '.' || r == '+' || r == '/' || r == '=':
			hasSym = true
		}
	}
	d := 0
	if hasLower {
		d++
	}
	if hasUpper {
		d++
	}
	if hasDigit {
		d++
	}
	if hasSym {
		d++
	}
	return d
}

// redactValue masks the middle of a secret while keeping enough head/tail to
// disambiguate findings without leaking the secret to logs.
func redactValue(v string) string {
	n := len(v)
	switch {
	case n <= 8:
		return strings.Repeat("*", n)
	case n <= 16:
		return v[:2] + strings.Repeat("*", n-4) + v[n-2:]
	default:
		return v[:4] + strings.Repeat("*", n-8) + v[n-4:]
	}
}

// hashValue gives a stable, short identifier for dedup that can't be reversed
// to the secret itself when emitted in summary output.
func hashValue(v string) string {
	h := sha256.Sum256([]byte(v))
	return hex.EncodeToString(h[:8])
}

// looksLikeFixture returns true if the surrounding context smells like docs/example code.
func looksLikeFixture(context string) bool {
	low := strings.ToLower(context)
	for _, kw := range fixtureKeywords {
		if strings.Contains(low, strings.ToLower(kw)) {
			return true
		}
	}
	return false
}

// hasContextKeyword checks whether at least one of `kws` appears (case-insensitive)
// in the given context window.
func hasContextKeyword(context string, kws []string) bool {
	if len(kws) == 0 {
		return true
	}
	low := strings.ToLower(context)
	for _, kw := range kws {
		if strings.Contains(low, strings.ToLower(kw)) {
			return true
		}
	}
	return false
}

// isInVendorNoise screens canonical sample/placeholder values.
func isInVendorNoise(v string) (bool, string) {
	if _, ok := vendorNoiseExact[v]; ok {
		return true, "exact-match in vendor-noise corpus"
	}
	for _, sub := range vendorNoiseSubstr {
		if strings.Contains(v, sub) {
			return true, "contains placeholder fragment '" + sub + "'"
		}
	}
	return false, ""
}

// extractContextWindow returns the slice of body around `start..end` for context analysis.
func extractContextWindow(body string, start, end int) string {
	a := start - contextWindow
	if a < 0 {
		a = 0
	}
	b := end + contextWindow
	if b > len(body) {
		b = len(body)
	}
	return body[a:b]
}

// scoreFinding runs the FP pipeline against (rule, value, context) and returns
// (keep, score, reasons). Score is in [0,1]. Caller uses the configured
// minimum-confidence threshold to gate output. Counters are incremented when
// a match is dropped at a known stage so --stats can audit the pipeline.
func scoreFinding(rule *Rule, value, context, source string) (bool, float64, []string) {
	reasons := []string{}
	score := rule.ConfidencePrior
	if score == 0 {
		score = 0.5
	}

	if vendorChunkRe.MatchString(source) {
		score -= 0.15
		reasons = append(reasons, "source looks like a vendor/chunk bundle")
	}

	if drop, why := isInVendorNoise(value); drop {
		if globalStats != nil {
			statInc(&globalStats.DroppedVendorNoise)
		}
		return false, 0, []string{why}
	}

	if rule.MinLen > 0 && len(value) < rule.MinLen {
		return false, 0, []string{fmt.Sprintf("length %d < MinLen %d", len(value), rule.MinLen)}
	}
	if rule.MaxLen > 0 && len(value) > rule.MaxLen {
		return false, 0, []string{fmt.Sprintf("length %d > MaxLen %d", len(value), rule.MaxLen)}
	}

	entropy := shannonEntropy(value)
	if rule.MinEntropy > 0 && entropy < rule.MinEntropy {
		if globalStats != nil {
			statInc(&globalStats.DroppedLowEntropy)
		}
		return false, 0, []string{fmt.Sprintf("entropy %.2f < required %.2f", entropy, rule.MinEntropy)}
	}

	if rule.HighFPProne {
		diversity := charClassDiversity(value)
		if diversity < 2 {
			if globalStats != nil {
				statInc(&globalStats.DroppedLowEntropy)
			}
			return false, 0, []string{"insufficient character-class diversity for high-FP rule"}
		}
		if entropy < 3.0 {
			if globalStats != nil {
				statInc(&globalStats.DroppedLowEntropy)
			}
			return false, 0, []string{fmt.Sprintf("entropy %.2f too low for high-FP rule", entropy)}
		}
	}

	if rule.RequiresContext {
		kws := rule.ContextKeywords
		if len(kws) == 0 {
			kws = contextKeywordsGeneric
		}
		if !hasContextKeyword(context, kws) {
			if globalStats != nil {
				statInc(&globalStats.DroppedNoContext)
			}
			return false, 0, []string{"missing required context keyword(s)"}
		}
		score += 0.05
		reasons = append(reasons, "context keyword present")
	}

	if rule.Validate != nil {
		ok, vReasons := rule.Validate(value)
		if !ok {
			return false, 0, append([]string{"provider validator rejected"}, vReasons...)
		}
		score += 0.10
		reasons = append(reasons, vReasons...)
	}

	if looksLikeFixture(context) {
		score -= 0.30
		if globalStats != nil {
			statInc(&globalStats.DroppedFixture)
		}
		reasons = append(reasons, "surrounded by fixture/example wording")
	}

	if entropy >= 4.5 {
		score += 0.05
		reasons = append(reasons, fmt.Sprintf("high entropy %.2f", entropy))
	}

	if charClassDiversity(value) >= 3 {
		score += 0.05
		reasons = append(reasons, "diverse character classes")
	}

	if score < 0 {
		score = 0
	}
	if score > 1 {
		score = 1
	}

	return true, score, reasons
}

// recordFinding inserts or merges a finding into the dedupe map keyed by
// (value_hash, secret_type). Same secret seen in many sources collapses to a
// single Finding with a Locations[] list, each location carrying its own
// line:column pair so an operator can `vim file:line:col` directly.
//
// Returns nil when the finding is suppressed by the active ignore list or
// the active diff baseline. Callers must check for nil.
func recordFinding(f *Finding) *Finding {
	if activeIgnoreList != nil && activeIgnoreList.ShouldIgnore(f) {
		return nil
	}
	if activeDiffSeen != nil && activeDiffSeen[f.ValueHash] {
		return nil
	}
	findingsMutex.Lock()
	defer findingsMutex.Unlock()
	key := f.ValueHash + "|" + f.SecretType
	loc := Location{Source: f.Source, Line: f.Line, Column: f.Column}
	if existing, ok := findingsByHash[key]; ok {
		existing.Locations = append(existing.Locations, loc)
		if f.Confidence > existing.Confidence {
			existing.Confidence = f.Confidence
			existing.Reasons = f.Reasons
		}
		if f.Verified && !existing.Verified {
			existing.Verified = true
			existing.Verify = f.Verify
		}
		return existing
	}
	f.Locations = []Location{loc}
	findingsByHash[key] = f
	if globalStats != nil {
		statInc(&globalStats.FindingsAfterDedupe)
	}
	return f
}

// flushFindings returns a snapshot of all unique findings, sorted by severity then confidence.
func flushFindings() []*Finding {
	findingsMutex.Lock()
	defer findingsMutex.Unlock()
	out := make([]*Finding, 0, len(findingsByHash))
	for _, f := range findingsByHash {
		out = append(out, f)
	}
	sevRank := map[Severity]int{
		SevCritical: 5, SevHigh: 4, SevMedium: 3, SevLow: 2, SevInfo: 1,
	}
	sort.Slice(out, func(i, j int) bool {
		ri, rj := sevRank[out[i].Severity], sevRank[out[j].Severity]
		if ri != rj {
			return ri > rj
		}
		if out[i].Confidence != out[j].Confidence {
			return out[i].Confidence > out[j].Confidence
		}
		return out[i].Name < out[j].Name
	})
	return out
}

// resetFindings clears the dedupe state between independent runs.
func resetFindings() {
	findingsMutex.Lock()
	findingsByHash = make(map[string]*Finding)
	findingsMutex.Unlock()
}

// analyzeBody runs the curated registry against `body` and returns scored findings.
// Source is the URL or filepath. minConfidence gates which findings pass.
// Each Finding records the byte offset, line, and column of the match so
// downstream tools can anchor results back to the exact source location.
func analyzeBody(source string, body []byte, minConfidence float64) []*Finding {
	registerRules()
	bodyStr := string(body)
	out := []*Finding{}

	for i := range rulesRegistry {
		rule := &rulesRegistry[i]
		matches := rule.Pattern.FindAllStringSubmatchIndex(bodyStr, -1)
		for _, m := range matches {
			start, end := m[0], m[1]
			value := bodyStr[start:end]
			if rule.Group > 0 && len(m) > 2*rule.Group+1 {
				gs, ge := m[2*rule.Group], m[2*rule.Group+1]
				if gs >= 0 && ge >= 0 {
					start, end = gs, ge
					value = bodyStr[start:end]
				}
			}

			lineCtx := bodyStr[lineStartIndex(bodyStr, start):lineEndIndex(bodyStr, end)]
			if sourcemapMarkerRe.MatchString(lineCtx) {
				if globalStats != nil {
					statInc(&globalStats.DroppedSourcemap)
				}
				continue
			}

			ctx := extractContextWindow(bodyStr, start, end)
			keep, score, reasons := scoreFinding(rule, value, ctx, source)
			if !keep {
				continue
			}
			if score < minConfidence {
				if globalStats != nil {
					statInc(&globalStats.DroppedBelowConf)
				}
				continue
			}

			line, col := positionAt(bodyStr, start)
			f := &Finding{
				SchemaVersion: SchemaVersion,
				RuleID:        rule.ID,
				Name:          rule.Name,
				Provider:      rule.Provider,
				SecretType:    rule.SecretType,
				Severity:      rule.Severity,
				Value:         value,
				Redacted:      redactValue(value),
				ValueHash:     hashValue(value),
				Source:        source,
				Confidence:    score,
				Entropy:       shannonEntropy(value),
				Reasons:       reasons,
				Line:          line,
				Column:        col,
			}
			if globalStats != nil {
				statInc(&globalStats.RegistryHits)
				statInc(&globalStats.FindingsAfterFilter)
			}
			if rec := recordFinding(f); rec != nil {
				out = append(out, rec)
			}
		}
	}
	return out
}

// positionAt returns the 1-indexed (line, column) of byte offset `idx` in s.
// Cheap O(idx) scan; called once per finding so the cost is negligible
// relative to the regex evaluation that produced the offset.
func positionAt(s string, idx int) (line, col int) {
	if idx < 0 {
		idx = 0
	}
	if idx > len(s) {
		idx = len(s)
	}
	line, col = 1, 1
	for i := 0; i < idx; i++ {
		if s[i] == '\n' {
			line++
			col = 1
		} else {
			col++
		}
	}
	return line, col
}

func lineStartIndex(s string, idx int) int {
	if idx <= 0 {
		return 0
	}
	if idx >= len(s) {
		idx = len(s) - 1
	}
	for i := idx; i > 0; i-- {
		if s[i-1] == '\n' {
			return i
		}
	}
	return 0
}

func lineEndIndex(s string, idx int) int {
	if idx >= len(s) {
		return len(s)
	}
	for i := idx; i < len(s); i++ {
		if s[i] == '\n' {
			return i
		}
	}
	return len(s)
}

// applyLegacyFPFilter wraps the existing regexPatterns dictionary so that
// every legacy hit is also scored. Returns (keep, confidence, reasons) so the
// caller can decide to print or drop, and to optionally show the score.
//
// This is the bridge that brings v0.6 quality to rules we have not yet
// migrated into the curated registry.
func applyLegacyFPFilter(name, value, body, source string, start, end int) (bool, float64, []string) {
	if globalStats != nil {
		statInc(&globalStats.LegacyMatchesRaw)
	}
	if value == "" {
		return false, 0, []string{"empty match"}
	}
	if drop, why := isInVendorNoise(value); drop {
		if globalStats != nil {
			statInc(&globalStats.DroppedVendorNoise)
		}
		return false, 0, []string{why}
	}

	ctx := extractContextWindow(body, start, end)
	if sourcemapMarkerRe.MatchString(body[lineStartIndex(body, start):lineEndIndex(body, end)]) {
		if globalStats != nil {
			statInc(&globalStats.DroppedSourcemap)
		}
		return false, 0, []string{"line is a //# sourceMappingURL marker"}
	}

	score := 0.55
	reasons := []string{}

	low := strings.ToLower(name)
	highFP := strings.HasPrefix(low, "generic ") || strings.Contains(low, "quickbooks") ||
		strings.Contains(low, "cisco access") || strings.Contains(low, "sanity") ||
		strings.Contains(low, "atlassian access") || strings.Contains(low, "heroku")

	entropy := shannonEntropy(value)
	diversity := charClassDiversity(value)

	if highFP {
		if diversity < 2 {
			if globalStats != nil {
				statInc(&globalStats.DroppedLowEntropy)
			}
			return false, 0, []string{"high-FP-prone rule: low character-class diversity"}
		}
		if entropy < 3.2 {
			if globalStats != nil {
				statInc(&globalStats.DroppedLowEntropy)
			}
			return false, 0, []string{fmt.Sprintf("high-FP-prone rule: entropy %.2f too low", entropy)}
		}
		if !hasContextKeyword(ctx, contextKeywordsGeneric) {
			if globalStats != nil {
				statInc(&globalStats.DroppedNoContext)
			}
			return false, 0, []string{"high-FP-prone rule: no key/token/secret context keyword"}
		}
		reasons = append(reasons, "context keyword present (generic-rule gate)")
	}

	if vendorChunkRe.MatchString(source) {
		score -= 0.15
		reasons = append(reasons, "vendor/chunk bundle")
	}

	if looksLikeFixture(ctx) {
		score -= 0.30
		if globalStats != nil {
			statInc(&globalStats.DroppedFixture)
		}
		reasons = append(reasons, "context looks like a fixture/example")
	}

	if entropy >= 4.5 {
		score += 0.05
		reasons = append(reasons, fmt.Sprintf("high entropy %.2f", entropy))
	}

	if score < 0 {
		score = 0
	}
	if score > 1 {
		score = 1
	}
	return true, score, reasons
}

// validateAWSAccessKeyID enforces the documented prefix family and pure-base32
// body alphabet (A-Z, 2-7). Excludes 0,1,8,9 which AWS deliberately omits.
func validateAWSAccessKeyID(v string) (bool, []string) {
	if len(v) != 20 {
		return false, []string{"length != 20"}
	}
	prefix := v[:4]
	switch prefix {
	case "AKIA", "ASIA", "AGPA", "AIDA", "AROA", "AIPA", "ANPA", "ANVA":
	default:
		if !(strings.HasPrefix(v, "A3T") && v[3] >= 'A' && v[3] <= 'Z') {
			return false, []string{"unknown AWS access key prefix"}
		}
	}
	for i := 4; i < 20; i++ {
		c := v[i]
		ok := (c >= 'A' && c <= 'Z') || (c >= '2' && c <= '7')
		if !ok {
			return false, []string{"non-base32 character in body"}
		}
	}
	return true, []string{"AWS prefix + base32 body OK"}
}

// validateAWSSecretKey enforces 40-char body and high entropy on the captured group.
func validateAWSSecretKey(v string) (bool, []string) {
	if len(v) != 40 {
		return false, []string{"length != 40"}
	}
	if shannonEntropy(v) < 4.0 {
		return false, []string{"entropy below 4.0"}
	}
	if charClassDiversity(v) < 3 {
		return false, []string{"low character-class diversity"}
	}
	return true, []string{"40-char base64 body, high entropy"}
}

// validateStripeKey verifies the prefix family and that the body is clean base62
// (no underscores), which Stripe uses to avoid colliding with random hashes.
func validateStripeKey(v string) (bool, []string) {
	prefixes := []string{"sk_live_", "sk_test_", "rk_live_", "rk_test_", "pk_live_", "pk_test_"}
	matched := ""
	for _, p := range prefixes {
		if strings.HasPrefix(v, p) {
			matched = p
			break
		}
	}
	if matched == "" {
		return false, []string{"unknown Stripe key prefix"}
	}
	body := v[len(matched):]
	if len(body) < 20 {
		return false, []string{"Stripe body too short"}
	}
	for _, c := range body {
		if !((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')) {
			return false, []string{"non-base62 char in Stripe key body"}
		}
	}
	return true, []string{"Stripe " + matched + " key, base62 body"}
}

// validateGitHubToken implements the documented CRC32-base62 tail checksum.
// GitHub tokens of the form ghp_/gho_/ghu_/ghs_/ghr_ embed a 6-char checksum
// computed over the random body. Verifying it is one of the highest-precision
// signals available without a network call.
func validateGitHubToken(v string) (bool, []string) {
	if len(v) < 40 {
		return false, []string{"too short for GitHub token"}
	}
	if !strings.HasPrefix(v, "ghp_") && !strings.HasPrefix(v, "gho_") &&
		!strings.HasPrefix(v, "ghu_") && !strings.HasPrefix(v, "ghs_") &&
		!strings.HasPrefix(v, "ghr_") {
		return false, []string{"unknown GitHub token prefix"}
	}
	body := v[4:]
	if len(body) < 6 {
		return false, []string{"body too short for checksum"}
	}
	random := body[:len(body)-6]
	checksum := body[len(body)-6:]
	want := base62EncodeCRC32(crc32.ChecksumIEEE([]byte(random)))
	if !strings.EqualFold(want, checksum) {
		return false, []string{"CRC32 checksum mismatch (cannot verify; treat as suspect)"}
	}
	return true, []string{"GitHub CRC32 checksum verified"}
}

// base62EncodeCRC32 encodes a uint32 as 6-character base62, left-padded with '0'.
func base62EncodeCRC32(n uint32) string {
	const alphabet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
	if n == 0 {
		return "000000"
	}
	buf := make([]byte, 0, 6)
	for n > 0 {
		buf = append([]byte{alphabet[n%62]}, buf...)
		n /= 62
	}
	for len(buf) < 6 {
		buf = append([]byte{'0'}, buf...)
	}
	return string(buf)
}

// validateSlackToken enforces the hyphenated segment shape used across the
// Slack token family (xoxb/xoxp/xoxa/xoxr/xoxs/xoxe/xapp).
func validateSlackToken(v string) (bool, []string) {
	parts := strings.Split(v, "-")
	if len(parts) < 3 {
		return false, []string{"too few hyphen-segments for Slack token"}
	}
	for _, p := range parts[1 : len(parts)-1] {
		for _, c := range p {
			if c < '0' || c > '9' {
				return false, []string{"non-numeric inner segment"}
			}
		}
	}
	tail := parts[len(parts)-1]
	if len(tail) < 16 {
		return false, []string{"tail segment too short"}
	}
	return true, []string{"Slack hyphen-segment structure OK"}
}

// validateTwilioSK enforces the 32-hex body and rejects pure-zero or repeating runs.
func validateTwilioSK(v string) (bool, []string) {
	if len(v) != 34 || !strings.HasPrefix(v, "SK") {
		return false, []string{"bad Twilio SK shape"}
	}
	body := v[2:]
	for _, c := range body {
		if !((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F')) {
			return false, []string{"non-hex char in Twilio SK body"}
		}
	}
	if shannonEntropy(body) < 3.5 {
		return false, []string{"Twilio SK entropy too low"}
	}
	return true, []string{"32-hex Twilio SK body"}
}

// validateJWT decodes the header and payload as base64url-encoded JSON and
// requires that the header carries an `alg` field. Catches the very common
// "JWT-shaped strings that aren't JWTs" FP class.
func validateJWT(v string) (bool, []string) {
	parts := strings.Split(v, ".")
	if len(parts) != 3 {
		return false, []string{"JWT must have 3 dot-separated segments"}
	}
	headerBytes, err := base64.RawURLEncoding.DecodeString(parts[0])
	if err != nil {
		// some JWT libs emit padded base64; tolerate it
		headerBytes, err = base64.URLEncoding.DecodeString(parts[0])
		if err != nil {
			return false, []string{"JWT header is not base64url"}
		}
	}
	var header map[string]any
	if err := json.Unmarshal(headerBytes, &header); err != nil {
		return false, []string{"JWT header is not JSON"}
	}
	if _, ok := header["alg"]; !ok {
		return false, []string{"JWT header missing alg"}
	}
	payloadBytes, err := base64.RawURLEncoding.DecodeString(parts[1])
	if err != nil {
		payloadBytes, err = base64.URLEncoding.DecodeString(parts[1])
		if err != nil {
			return false, []string{"JWT payload is not base64url"}
		}
	}
	var payload map[string]any
	if err := json.Unmarshal(payloadBytes, &payload); err != nil {
		return false, []string{"JWT payload is not JSON"}
	}
	return true, []string{"JWT structurally valid (alg present, JSON header+payload)"}
}

// SelfTestResult is the per-rule outcome of `--self-test`.
type SelfTestResult struct {
	RuleID   string `json:"rule_id"`
	Name     string `json:"name"`
	TPPassed int    `json:"tp_passed"`
	TPTotal  int    `json:"tp_total"`
	FPCaught int    `json:"fp_caught"`
	FPTotal  int    `json:"fp_total"`
	OK       bool   `json:"ok"`
	Notes    []string `json:"notes,omitempty"`
}

// runSelfTest exercises every registered rule against its embedded TP/FP
// fixtures and reports a precision/recall summary. A rule is OK when it
// catches all TPs and rejects all FPs.
func runSelfTest() []SelfTestResult {
	registerRules()
	out := make([]SelfTestResult, 0, len(rulesRegistry))
	for i := range rulesRegistry {
		r := &rulesRegistry[i]
		res := SelfTestResult{RuleID: r.ID, Name: r.Name, OK: true}
		for _, tp := range r.TPExamples {
			res.TPTotal++
			fakeBody := "const apiKey = \"" + tp + "\";"
			fs := analyzeBody("self-test://"+r.ID, []byte(fakeBody), 0.0)
			ok := false
			for _, f := range fs {
				if f.RuleID == r.ID && f.Value == tp {
					ok = true
					break
				}
			}
			if ok {
				res.TPPassed++
			} else {
				res.OK = false
				res.Notes = append(res.Notes, "missed TP: "+redactValue(tp))
			}
			resetFindings()
		}
		for _, fp := range r.FPExamples {
			res.FPTotal++
			fakeBody := "const apiKey = \"" + fp + "\";"
			fs := analyzeBody("self-test://"+r.ID, []byte(fakeBody), 0.0)
			caught := true
			for _, f := range fs {
				if f.RuleID == r.ID && f.Value == fp {
					caught = false
					break
				}
			}
			if caught {
				res.FPCaught++
			} else {
				res.OK = false
				res.Notes = append(res.Notes, "leaked FP: "+redactValue(fp))
			}
			resetFindings()
		}
		out = append(out, res)
	}
	return out
}


================================================
FILE: internal/jshunter/diff.go
================================================
package jshunter

import (
	"encoding/json"
	"fmt"
	"os"
)

// DiffPrevious reads a previous schema-v2 envelope and returns the set of
// value_hashes already reported. Operators run:
//
//	jshunter ... -j -o yesterday.json
//	jshunter ... --diff yesterday.json -j -o today-new.json
//
// and only see findings that weren't there yesterday. Anchored on
// value_hash because secrets that move between sources or that match
// different rules across releases must still dedupe consistently.
func DiffPrevious(path string) (map[string]bool, error) {
	if path == "" {
		return nil, nil
	}
	raw, err := os.ReadFile(path)
	if err != nil {
		return nil, fmt.Errorf("read previous: %w", err)
	}
	var env struct {
		SchemaVersion int       `json:"schema_version"`
		Findings      []Finding `json:"findings"`
	}
	if err := json.Unmarshal(raw, &env); err != nil {
		return nil, fmt.Errorf("parse previous: %w", err)
	}
	if env.SchemaVersion != SchemaVersion {
		return nil, fmt.Errorf("--diff requires schema_version=%d; previous file has %d", SchemaVersion, env.SchemaVersion)
	}
	seen := make(map[string]bool, len(env.Findings))
	for _, f := range env.Findings {
		if f.ValueHash != "" {
			seen[f.ValueHash] = true
		}
	}
	return seen, nil
}


================================================
FILE: internal/jshunter/har.go
================================================
package jshunter

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
	"os"
	"strings"
)

// HAR (HTTP Archive) ingestion. Operators who already have a Burp/Chrome
// devtools archive don't need JSHunter to re-fetch — feeding the HAR
// directly is faster and reproducible.

type harFile struct {
	Log struct {
		Entries []harEntry `json:"entries"`
	} `json:"log"`
}

type harEntry struct {
	Request struct {
		URL string `json:"url"`
	} `json:"request"`
	Response struct {
		Status  int `json:"status"`
		Content struct {
			MimeType string `json:"mimeType"`
			Text     string `json:"text"`
			Encoding string `json:"encoding"`
		} `json:"content"`
	} `json:"response"`
}

// IngestHAR reads a HAR file and runs the v0.6 detection pipeline against
// every JS-typed response within it. Non-JS entries are silently skipped.
// Returns the number of entries scanned (useful for --stats and CI gating).
func IngestHAR(path string, config *Config) (int, error) {
	raw, err := os.ReadFile(path)
	if err != nil {
		return 0, fmt.Errorf("read har: %w", err)
	}
	var h harFile
	if err := json.Unmarshal(raw, &h); err != nil {
		return 0, fmt.Errorf("parse har: %w", err)
	}

	scanned := 0
	for _, e := range h.Log.Entries {
		if e.Response.Status < 200 || e.Response.Status >= 400 {
			continue
		}
		mt := strings.ToLower(e.Response.Content.MimeType)
		urlLower := strings.ToLower(e.Request.URL)
		isJS := strings.Contains(mt, "javascript") ||
			strings.Contains(mt, "ecmascript") ||
			strings.HasSuffix(urlLower, ".js") ||
			strings.Contains(urlLower, ".js?")
		if !isJS {
			continue
		}

		body := []byte(e.Response.Content.Text)
		if e.Response.Content.Encoding == "base64" {
			if dec, derr := harBase64Decode(body); derr == nil {
				body = dec
			}
		}
		if config.MaxBytes > 0 && int64(len(body)) > config.MaxBytes {
			body = body[:config.MaxBytes]
			if globalStats != nil {
				statInc(&globalStats.BytesTruncated)
			}
		}
		if globalStats != nil {
			statInc(&globalStats.URLsFetched)
			statAdd(&globalStats.BytesParsed, int64(len(body)))
		}
		processed := processJSAnalysis(body, config)
		reportMatchesWithConfig(e.Request.URL, processed, config)
		scanned++
	}
	return scanned, nil
}

// harBase64Decode is tolerant of std/URL/raw base64 variants (HAR exporters
// disagree). We try each in order and return the first decode that succeeds.
func harBase64Decode(b []byte) ([]byte, error) {
	s := strings.TrimSpace(string(b))
	for _, dec := range []func(string) ([]byte, error){
		base64.StdEncoding.DecodeString,
		base64.URLEncoding.DecodeString,
		base64.RawStdEncoding.DecodeString,
		base64.RawURLEncoding.DecodeString,
	} {
		if out, err := dec(s); err == nil {
			return out, nil
		}
	}
	return nil, fmt.Errorf("har: not decodable as any base64 variant")
}


================================================
FILE: internal/jshunter/html_extract.go
================================================
package jshunter

import (
	"bytes"
	"fmt"
	"io"
	"strings"

	"golang.org/x/net/html"
)

// HTMLArtifacts is the structured slice of recon-relevant payloads
// extracted from one HTML response. Inline scripts and SRI hashes are not
// available in JS-only crawls — operators routinely miss secrets that live
// in the homepage's `<script>` tag rather than an external bundle.
type HTMLArtifacts struct {
	InlineScripts []InlineScript
	ExternalJS    []ExternalJS
	CSPOrigins    []string
	Sourcemaps    []string
}

type InlineScript struct {
	// Index is the zero-based position of the script tag in the document
	// so we can synthesize a stable per-script source id (`page#script[3]`).
	Index    int
	Body     string
	Type     string // "module" | "" | "application/json" | …
	Nonce    string
	IsLDJSON bool
}

type ExternalJS struct {
	URL       string
	Integrity string // SRI: "sha384-..."
	Async     bool
	Defer     bool
	Type      string
}

// ExtractFromHTML parses an HTML body and returns the extractable
// artifacts. Robust to malformed input — `golang.org/x/net/html` recovers
// from broken markup the way browsers do.
func ExtractFromHTML(body []byte) (*HTMLArtifacts, error) {
	out := &HTMLArtifacts{}
	z := html.NewTokenizer(bytes.NewReader(body))
	scriptIdx := 0

	for {
		tt := z.Next()
		switch tt {
		case html.ErrorToken:
			if err := z.Err(); err != nil && err != io.EOF {
				return out, fmt.Errorf("html tokenizer: %w", err)
			}
			return out, nil

		case html.StartTagToken, html.SelfClosingTagToken:
			t := z.Token()
			switch strings.ToLower(t.Data) {
			case "script":
				attrs := tagAttrs(t)
				src := attrs["src"]
				if src != "" {
					out.ExternalJS = append(out.ExternalJS, ExternalJS{
						URL:       src,
						Integrity: attrs["integrity"],
						Async:     hasAttr(t, "async"),
						Defer:     hasAttr(t, "defer"),
						Type:      attrs["type"],
					})
				} else if tt == html.StartTagToken {
					// Capture inline script body up to </script>.
					body, err := readUntilEndTag(z, "script")
					if err == nil && strings.TrimSpace(body) != "" {
						script := InlineScript{
							Index: scriptIdx,
							Body:  body,
							Type:  attrs["type"],
							Nonce: attrs["nonce"],
						}
						script.IsLDJSON = strings.EqualFold(script.Type, "application/ld+json")
						out.InlineScripts = append(out.InlineScripts, script)
					}
					scriptIdx++
				}

			case "meta":
				// CSP via http-equiv (some sites prefer this over header).
				if strings.EqualFold(tagAttrs(t)["http-equiv"], "Content-Security-Policy") {
					content := tagAttrs(t)["content"]
					out.CSPOrigins = append(out.CSPOrigins, ParseCSPOrigins(content)...)
				}

			case "link":
				attrs := tagAttrs(t)
				rel := strings.ToLower(attrs["rel"])
				href := attrs["href"]
				if href != "" {
					switch rel {
					case "preload", "modulepreload", "prefetch":
						if strings.EqualFold(attrs["as"], "script") || rel == "modulepreload" {
							out.ExternalJS = append(out.ExternalJS, ExternalJS{
								URL:       href,
								Integrity: attrs["integrity"],
								Type:      "module",
							})
						}
					}
				}
			}
		}
	}
}

// readUntilEndTag consumes tokens up to and including the closing tag,
// returning the concatenated text content. Used to capture inline script
// bodies which the tokenizer reports as a separate Text token.
func readUntilEndTag(z *html.Tokenizer, tag string) (string, error) {
	var buf bytes.Buffer
	for {
		tt := z.Next()
		switch tt {
		case html.ErrorToken:
			return buf.String(), z.Err()
		case html.TextToken:
			buf.Write(z.Text())
		case html.EndTagToken:
			t := z.Token()
			if strings.EqualFold(t.Data, tag) {
				return buf.String(), nil
			}
		}
	}
}

func tagAttrs(t html.Token) map[string]string {
	m := make(map[string]string, len(t.Attr))
	for _, a := range t.Attr {
		m[strings.ToLower(a.Key)] = a.Val
	}
	return m
}

func hasAttr(t html.Token, name string) bool {
	for _, a := range t.Attr {
		if strings.EqualFold(a.Key, name) {
			return true
		}
	}
	return false
}

// looksLikeHTML returns true when the response body is HTML rather than JS.
// We use a tiny prefix sniff rather than the full encoding/sniff implementation
// because the body is already bounded by --max-bytes.
func looksLikeHTML(body []byte, contentType string) bool {
	if strings.Contains(strings.ToLower(contentType), "html") {
		return true
	}
	head := body
	if len(head) > 512 {
		head = head[:512]
	}
	low := strings.ToLower(string(head))
	low = strings.TrimSpace(low)
	return strings.HasPrefix(low, "<!doctype html") ||
		strings.HasPrefix(low, "<html") ||
		strings.HasPrefix(low, "<head") ||
		strings.HasPrefix(low, "<body")
}


================================================
FILE: internal/jshunter/ignore.go
================================================
package jshunter

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

// .jshunterignore is the operator's permanent suppression list. Format is
// one entry per line, blank/`#` lines ignored. Supported kinds:
//
//	hash:<value_hash_hex>           # suppress one specific finding
//	rule:<rule_id|rule_id_glob>     # suppress an entire rule (or family)
//	source:<glob>                   # suppress all findings whose source matches
//	rule_value:<rule>:<value_glob>  # suppress findings where rule matches and
//	                                # value matches the glob (after rule)
//
// Globs use the standard filepath.Match syntax (`*`, `?`, `[abc]`).

type IgnoreEntry struct {
	Kind string
	A    string
	B    string
}

type IgnoreList struct {
	Entries []IgnoreEntry
}

// LoadIgnoreFile reads and parses an ignore file. A missing file is NOT an
// error — operators expect to run with or without one — but a malformed
// file is, because silently ignoring bad rules invites "why didn't my
// suppression work?" tickets.
func LoadIgnoreFile(path string) (*IgnoreList, error) {
	if path == "" {
		return nil, nil
	}
	f, err := os.Open(path)
	if err != nil {
		if os.IsNotExist(err) {
			return nil, nil
		}
		return nil, fmt.Errorf("open ignore file: %w", err)
	}
	defer f.Close()
	return parseIgnoreReader(f)
}

func parseIgnoreReader(r io.Reader) (*IgnoreList, error) {
	il := &IgnoreList{}
	sc := bufio.NewScanner(r)
	lineNo := 0
	for sc.Scan() {
		lineNo++
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		idx := strings.Index(line, ":")
		if idx == -1 {
			return nil, fmt.Errorf("ignore line %d: missing ':' separator", lineNo)
		}
		kind := strings.TrimSpace(line[:idx])
		rest := strings.TrimSpace(line[idx+1:])
		if rest == "" {
			return nil, fmt.Errorf("ignore line %d: empty value", lineNo)
		}
		switch kind {
		case "hash", "rule", "source":
			il.Entries = append(il.Entries, IgnoreEntry{Kind: kind, A: rest})
		case "rule_value":
			parts := strings.SplitN(rest, ":", 2)
			if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
				return nil, fmt.Errorf("ignore line %d: rule_value needs <rule>:<value-glob>", lineNo)
			}
			il.Entries = append(il.Entries, IgnoreEntry{Kind: "rule_value", A: parts[0], B: parts[1]})
		default:
			return nil, fmt.Errorf("ignore line %d: unknown kind %q (want hash|rule|source|rule_value)", lineNo, kind)
		}
	}
	if err := sc.Err(); err != nil {
		return nil, err
	}
	return il, nil
}

// ShouldIgnore returns true if any entry matches this finding.
func (il *IgnoreList) ShouldIgnore(f *Finding) bool {
	if il == nil {
		return false
	}
	for _, e := range il.Entries {
		switch e.Kind {
		case "hash":
			if f.ValueHash == e.A {
				return true
			}
		case "rule":
			if f.RuleID == e.A || globMatch(e.A, f.RuleID) {
				return true
			}
		case "source":
			if globMatch(e.A, f.Source) {
				return true
			}
		case "rule_value":
			if (f.RuleID == e.A || globMatch(e.A, f.RuleID)) && globMatch(e.B, f.Value) {
				return true
			}
		}
	}
	return false
}

func globMatch(pattern, s string) bool {
	if pattern == "*" {
		return true
	}
	ok, _ := filepath.Match(pattern, s)
	return ok
}


================================================
FILE: internal/jshunter/jshunter.go
================================================
package jshunter

import (
    "bufio"
    "context"
    "crypto/tls"
    "encoding/csv"
    "encoding/json"
    "flag"
    "fmt"
    "io"
    "math"
    "net"
    "net/http"
    "net/url"
    "os"
    "regexp"
    "runtime"
    "sort"
    "strings"
    "sync"
    "time"
    "math/rand"

    "golang.org/x/net/proxy"
)


var (
    version = "v0.7.5"
    colors = map[string]string{
        "RED":    "\033[0;31m",
        "GREEN":  "\033[0;32m",
        "BLUE":   "\033[0;34m",
        "YELLOW": "\033[0;33m",
        "CYAN":   "\033[0;36m",
        "PURPLE": "\033[0;35m",
        "NC":     "\033[0m",
    }
    // Global deduplication for all outputs
    globalSeenParams = make(map[string]bool)
    globalSeenAll    = make(map[string]bool)
    globalSeenMutex  sync.Mutex
    globalFoundAny   = false // Track if any findings were made across all files
    missingMessages  = make([]string, 0) // Buffer for MISSING messages
    missingMutex     sync.Mutex
)



var (
    //regex-cc1a2b
    regexPatterns = map[string]*regexp.Regexp{
	"Google API":                    regexp.MustCompile(`AIza[0-9A-Za-z-_]{35}`),
	"Firebase":                      regexp.MustCompile(`AAAA[A-Za-z0-9_-]{7}:[A-Za-z0-9_-]{140}(?:\s|$|[^A-Za-z0-9_-])`),
	"Amazon Aws Access Key ID":      regexp.MustCompile(`A[SK]IA[0-9A-Z]{16}`),
	"Amazon Mws Auth Token":         regexp.MustCompile(`\bamzn\.mws\.[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b`),
	"Amazon Aws Url":                regexp.MustCompile(`s3\.amazonaws.com[/]+|[a-zA-Z0-9_-]*\.s3\.amazonaws.com`),
	"Amazon Aws Url2":               regexp.MustCompile(`([a-zA-Z0-9-._]+\.s3\.amazonaws\.com|s3://[a-zA-Z0-9-._]+|s3-[a-z]{2}-[a-z]+-[0-9]+\.amazonaws\.com|s3.amazonaws.com/[a-zA-Z0-9-._]+|s3.console.aws.amazon.com/s3/buckets/[a-zA-Z0-9-._]+)`),
	"Facebook Access Token":         regexp.MustCompile(`EAACEdEose0cBA[0-9A-Za-z]+`),
	"Authorization Basic":           regexp.MustCompile(`(?i)\bauthorization\s*:\s*basic\s+[a-zA-Z0-9=:_\+\/-]{20,100}`),
	"Authorization Bearer":          regexp.MustCompile(`(?i)\bauthorization\s*:\s*bearer\s+[a-zA-Z0-9_\-\.=:_\+\/]{20,100}`),
    "Authorization Api":             regexp.MustCompile(`(?i)\bapi[_-]?key\s*[:=]\s*["']?[a-zA-Z0-9_\-]{20,100}["']?`),
	"Twilio Api Key":                regexp.MustCompile(`SK[0-9a-fA-F]{32}`),
	"Twilio Account Sid":            regexp.MustCompile(`(?i)\b(?:twilio|tw)\s*[_-]?account[_-]?sid\s*[:=]\s*["']?AC[a-zA-Z0-9_\-]{32}["']?`),
	"Twilio App Sid":                regexp.MustCompile(`\bAP[a-fA-F0-9]{32}\b`),
	"Paypal Braintre Access Token":  regexp.MustCompile(`access_token\$production\$[0-9a-z]{16}\$[0-9a-f]{32}`),
	"Square Oauth Secret":           regexp.MustCompile(`sq0csp-[0-9A-Za-z\-_]{43}|sq0[a-z]{3}-[0-9A-Za-z\-_]{22,43}`),
	"Square Access Token":           regexp.MustCompile(`sqOatp-[0-9A-Za-z\-_]{22}`),
	"Stripe Standard Api":           regexp.MustCompile(`sk_live_[0-9a-zA-Z]{24}`),
	"Stripe Restricted Api":         regexp.MustCompile(`rk_live_[0-9a-zA-Z]{24}`),
	"Authorization Github Token":    regexp.MustCompile(`\bghp_[a-zA-Z0-9]{36}\b`),
	"Github Access Token":           regexp.MustCompile(`[a-zA-Z0-9_-]+:[a-zA-Z0-9_\-]{20,}@github\.com\b`),
	"Rsa Private Key":               regexp.MustCompile(`-----BEGIN RSA PRIVATE KEY-----`),
	"Ssh Dsa Private Key":           regexp.MustCompile(`-----BEGIN DSA PRIVATE KEY-----`),
	"Ssh Dc Private Key":            regexp.MustCompile(`-----BEGIN EC PRIVATE KEY-----`),
	"Pgp Private Block":             regexp.MustCompile(`-----BEGIN PGP PRIVATE KEY BLOCK-----`),
	"Ssh Private Key":               regexp.MustCompile(`(?s)-----BEGIN OPENSSH PRIVATE KEY-----[a-zA-Z0-9+\/=\n]+-----END OPENSSH PRIVATE KEY-----`),
	"Json Web Token":                regexp.MustCompile(`\beyJ[A-Za-z0-9_\-]{8,}\.eyJ[A-Za-z0-9_\-]{8,}\.[A-Za-z0-9_\-]{8,}\b`),
    "Putty Private Key":             regexp.MustCompile(`(?s)PuTTY-User-Key-File-2.*?-----END`),
    "Ssh2 Encrypted Private Key":    regexp.MustCompile(`(?s)-----BEGIN SSH2 ENCRYPTED PRIVATE KEY-----[a-zA-Z0-9+\/=\n]+-----END SSH2 ENCRYPTED PRIVATE KEY-----`),
    "Generic Private Key":           regexp.MustCompile(`(?s)-----BEGIN.*PRIVATE KEY-----[a-zA-Z0-9+\/=\n]+-----END.*PRIVATE KEY-----`),
    "Username Password Combo":       regexp.MustCompile(`(?i)\b[a-z]+://[^/\s:@"']{1,64}:[^/\s:@"']{1,128}@[a-zA-Z0-9.\-]{3,255}`),
    "Facebook Oauth":                regexp.MustCompile(`(?i)(?:facebook|fb)[_\-]?(?:app[_\-]?)?(?:secret|client[_\-]?secret|oauth)\s*[:=]\s*['\"]?[0-9a-f]{32}['\"]?`),
    "Twitter Oauth":                 regexp.MustCompile(`(?i)\b(?:twitter|tw)\s*[_-]?oauth[_-]?token\s*[:=]\s*["']?[0-9a-zA-Z]{35,44}["']?`),
    "Github Token":                  regexp.MustCompile(`(?i)\b(gh[pousr]_[0-9a-zA-Z]{36})\b`),
    "Google Oauth Client Secret":    regexp.MustCompile(`\"client_secret\":\"[a-zA-Z0-9-_]{24}\"`),
    "Aws Api Key":                   regexp.MustCompile(`\bAKIA[0-9A-Z]{16}\b`),
	"Slack Token":                   regexp.MustCompile(`\"api_token\":\"(xox[a-zA-Z]-[a-zA-Z0-9-]+)\"`),
	"Ssh Priv Key":                  regexp.MustCompile(`([-]+BEGIN [^\s]+ PRIVATE KEY[-]+[\s]*[^-]*[-]+END [^\s]+ PRIVATE KEY[-]+)`),
	"Slack Webhook Url":             regexp.MustCompile(`https://hooks.slack.com/services/[A-Za-z0-9]+/[A-Za-z0-9]+/[A-Za-z0-9]+`),
	"Heroku Api Key 2":              regexp.MustCompile(`(?i)\bheroku[_-]?(?:api[_-]?)?key\s*[:=]\s*["']?[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}["']?`),
	"Dropbox Access Token":          regexp.MustCompile(`\bsl\.[A-Za-z0-9_-]{64,200}\b`),
	"Salesforce Access Token":       regexp.MustCompile(`00D[0-9A-Za-z]{15,18}![A-Za-z0-9]{40}`),
	"Twitter Bearer Token":          regexp.MustCompile(`\bAAAAAAAAAAAAAAAAAAAAA[A-Za-z0-9%]{30,80}\b`),
	"Firebase Url":                  regexp.MustCompile(`https://[a-z0-9-]+\.firebaseio\.com`),
	"Pem Private Key":               regexp.MustCompile(`-----BEGIN (?:[A-Z ]+ )?PRIVATE KEY-----`),
	"Google Cloud Sa Key":           regexp.MustCompile(`"type": "service_account"`),
	"Stripe Publishable Key":        regexp.MustCompile(`pk_live_[0-9a-zA-Z]{24}`),
	"Azure Storage Account Key":     regexp.MustCompile(`(?i)\b(?:AccountKey|azure[_-]?storage[_-]?key)\s*[:=]\s*["']?[A-Za-z0-9+/]{86}==["']?`),
	"Instagram Access Token":        regexp.MustCompile(`IGQV[A-Za-z0-9._-]{10,}`),
	"Stripe Test Publishable Key":   regexp.MustCompile(`pk_test_[0-9a-zA-Z]{24}`),
	"Stripe Test Secret Key":        regexp.MustCompile(`sk_test_[0-9a-zA-Z]{24}`),
	"Slack Bot Token":               regexp.MustCompile(`xoxb-[A-Za-z0-9-]{24,34}`),
	"Slack User Token":              regexp.MustCompile(`xoxp-[A-Za-z0-9-]{24,34}`),
    "Google Gmail Api Key":          regexp.MustCompile(`\bAIza[0-9A-Za-z_\-]{35}\b`),
    "Google Gmail Oauth":            regexp.MustCompile(`\b[0-9]+-[0-9A-Za-z_]{32}\.apps\.googleusercontent\.com\b`),
    "Google Oauth Access Token":     regexp.MustCompile(`\bya29\.[0-9A-Za-z_\-]{40,}\b`),
    "Mailchimp Api Key":             regexp.MustCompile(`[0-9a-f]{32}-us[0-9]{1,2}`),
    "Mailgun Api Key":               regexp.MustCompile(`key-[0-9a-zA-Z]{32}`),
    "Google Drive Oauth":            regexp.MustCompile(`\b[0-9]+-[0-9A-Za-z_]{32}\.apps\.googleusercontent\.com\b`),
    "Paypal Braintree Access Token": regexp.MustCompile(`access_token\$production\$[0-9a-z]{16}\$[0-9a-f]{32}`),
    "Picatic Api Key":               regexp.MustCompile(`sk_live_[0-9a-z]{32}`),
    "Stripe Api Key":                regexp.MustCompile(`sk_live_[0-9a-zA-Z]{24}`),
    "Stripe Restricted Api Key":     regexp.MustCompile(`rk_live_[0-9a-zA-Z]{24}`),
    "Square Access Token 2":         regexp.MustCompile(`\bsq0atp-[0-9A-Za-z_\-]{22}\b`),
    "Square Oauth Secret 2":         regexp.MustCompile(`\bsq0csp-[0-9A-Za-z_\-]{43}\b`),
    "Twitter Access Token":          regexp.MustCompile(`(?i)\b(?:twitter|tw)\s*[_-]?access[_-]?token\s*[:=]\s*["']?[0-9]+-[0-9a-zA-Z]{40}["']?`),
	"Heroku Api Key 3":              regexp.MustCompile(`(?i)\bheroku\b[^\n]{0,80}\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b`),
    "Generic Api Key":               regexp.MustCompile(`(?i)\bapi[_-]?key\s*[:=]\s*['\"]?[0-9a-zA-Z]{32,45}['\"]?`),
    "Generic Secret":                regexp.MustCompile(`(?i)\bsecret\s*[:=]\s*['\"]?[0-9a-zA-Z]{32,45}['\"]?`),
    "Slack Webhook":                 regexp.MustCompile(`https://hooks[.]slack[.]com/services/T[a-zA-Z0-9_]{8}/B[a-zA-Z0-9_]{8}/[a-zA-Z0-9_]{24}`),
    "Gcp Service Account":           regexp.MustCompile(`\"type\": \"service_account\"`),
    "Password in Url":               regexp.MustCompile(`[a-zA-Z]{3,10}://[^/\s:@"']{3,32}:[^/\s:@"']{3,128}@[a-zA-Z0-9.\-]{3,200}`),
	"Discord Webhook url":           regexp.MustCompile(`https://discord(?:app)?\.com/api/webhooks/[0-9]{18,20}/[A-Za-z0-9_-]{64,}`),
	"Discord bot Token":             regexp.MustCompile(`[MN][A-Za-z\d]{23}\.[\w-]{6}\.[\w-]{27}`),
	"Okta Api Token":                regexp.MustCompile(`00[a-zA-Z0-9]{30}\.[a-zA-Z0-9\-_]{30,}\.[a-zA-Z0-9\-_]{30,}`),
	"Sendgrid Api Key":              regexp.MustCompile(`SG\.[A-Za-z0-9_-]{22}\.[A-Za-z0-9_-]{43}`),
	"Mapbox Access Token":           regexp.MustCompile(`pk\.[a-zA-Z0-9]{60}\.[a-zA-Z0-9]{22}`),
	"Gitlab Personal Access token":  regexp.MustCompile(`glpat-[A-Za-z0-9\-]{20}`),
	"Datadog Api Key":               regexp.MustCompile(`ddapi_[a-zA-Z0-9]{32}`),
	"shopify Access Token":          regexp.MustCompile(`shpat_[A-Za-z0-9]{32}`),
    "Atlassian Access Token":        regexp.MustCompile(`(?i)\b(?:atlassian|jira|confluence)[_-]?(?:api[_-]?)?token\s*[:=]\s*["']?ATATT3[A-Za-z0-9_\-]{180,250}["']?`),
	"Crowdstrike Api Key":           regexp.MustCompile(`(?i)\b(?:crowdstrike|cs)[_-]?(?:api[_-]?)?(?:key|token)\s*[:=]\s*["']?[A-Za-z0-9]{32}\.[A-Za-z0-9]{16}["']?`),
	"Quickbooks Api Key":            regexp.MustCompile(`(?i)\b(?:quickbooks|qbo|intuit)[_-]?(?:api[_-]?)?(?:key|token)\s*[:=]\s*["']?A[0-9a-f]{32}["']?`),
	"Cisco Api Key":                 regexp.MustCompile(`(?i)\bcisco[_-]?(?:api[_-]?)?key\s*[:=]\s*["']?[A-Za-z0-9]{30,}["']?`),
	"Cisco Access Token":            regexp.MustCompile(`(?i)\bcisco[_-]?access[_-]?token\s*[:=]\s*["']?[A-Za-z0-9_\-]{20,}["']?`),
	"Segment Write Key":             regexp.MustCompile(`(?i)\b(?:segment[_-]?)?writeKey\s*[:=]\s*["']?[A-Za-z0-9]{32}["']?`),
	"Tiktok Access Token":           regexp.MustCompile(`\btiktok_access_token=[a-zA-Z0-9_]{20,}\b`),
	"Slack Client Secret":           regexp.MustCompile(`xoxs-[0-9]{1,9}.[0-9A-Za-z]{1,12}.[0-9A-Za-z]{24,64}`),
    "Phone Number":                  regexp.MustCompile(`(?:^|[\s"'<>:,;(\[])\+\d{9,14}(?:[\s"'<>,;.!?)\]]|$)`),
    "Email":                         regexp.MustCompile(`[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}`),
	"Ali Cloud Access Key":		     regexp.MustCompile(`\bLTAI[A-Za-z0-9]{12,20}\b`),
	"Tencent Cloud Access Key":	     regexp.MustCompile(`\bAKID[A-Za-z0-9]{13,20}\b`),
        "OpenAI API Key":                regexp.MustCompile(`sk-[a-zA-Z0-9]{20}T3BlbkFJ[a-zA-Z0-9]{20}`),
        "OpenAI API Key Project":        regexp.MustCompile(`sk-proj-[a-zA-Z0-9]{48,}`),
        "OpenAI API Key Svc":            regexp.MustCompile(`sk-svcacct-[a-zA-Z0-9_-]{80,}`),
        "Anthropic API Key":             regexp.MustCompile(`sk-ant-api[a-zA-Z0-9-]{37,}`),
        "HuggingFace Token":             regexp.MustCompile(`hf_[a-zA-Z0-9]{34,}`),
        "Cohere API Key":                regexp.MustCompile(`(?i)cohere[_-]?api[_-]?key\s*[:=]\s*["']?[a-zA-Z0-9]{40}["']?`),
        "Replicate API Token":           regexp.MustCompile(`r8_[a-zA-Z0-9]{40}`),
        "Google AI API Key":             regexp.MustCompile(`(?i)(?:gemini|palm|bard)[_-]?api[_-]?key\s*[:=]\s*["']?AIza[a-zA-Z0-9_-]{35}["']?`),
        "AWS Secret Access Key":         regexp.MustCompile(`(?i)(?:aws)?[_-]?secret[_-]?(?:access)?[_-]?key\s*[:=]\s*["']?[A-Za-z0-9/+=]{40}["']?`),
        "AWS Session Token":             regexp.MustCompile(`(?i)aws[_-]?session[_-]?token\s*[:=]\s*["']?[A-Za-z0-9/+=]{100,}["']?`),
        "MongoDB Connection String":     regexp.MustCompile(`mongodb(?:\+srv)?://[a-zA-Z0-9._-]+:[^@\s"']+@[a-zA-Z0-9._-]+`),
        "PostgreSQL Connection String":  regexp.MustCompile(`postgres(?:ql)?://[a-zA-Z0-9._-]+:[^@\s"']+@[a-zA-Z0-9._-]+`),
        "MySQL Connection String":       regexp.MustCompile(`mysql://[a-zA-Z0-9._-]+:[^@\s"']+@[a-zA-Z0-9._-]+`),
        "Redis Connection String":       regexp.MustCompile(`redis://[a-zA-Z0-9._-]+:[^@\s"']+@[a-zA-Z0-9._-]+`),
        "MSSQL Connection String":       regexp.MustCompile(`(?i)(?:server|data source)=[^;]+;.*(?:password|pwd)=[^;]+`),
        "Database URL Generic":          regexp.MustCompile(`(?i)(?:database|db)[_-]?url\s*[:=]\s*["']?[a-z]+://[^:]+:[^@]+@[^\s"']+["']?`),
        "Azure Client Secret":           regexp.MustCompile(`(?i)(?:azure|ad)[_-]?(?:client)?[_-]?secret\s*[:=]\s*["']?[a-zA-Z0-9~._-]{34,}["']?`),
        "Azure Storage Connection":      regexp.MustCompile(`DefaultEndpointsProtocol=https?;AccountName=[^;]+;AccountKey=[a-zA-Z0-9+/=]{86,}`),
        "Azure SAS Token":               regexp.MustCompile(`(?i)[?&]sig=[a-zA-Z0-9%]{43,}`),
        "Azure SQL Connection":          regexp.MustCompile(`(?i)Server=tcp:[^;]+;.*Password=[^;]+`),
        "DigitalOcean Token":            regexp.MustCompile(`dop_v1_[a-f0-9]{64}`),
        "DigitalOcean OAuth":            regexp.MustCompile(`doo_v1_[a-f0-9]{64}`),
        "DigitalOcean Refresh":          regexp.MustCompile(`dor_v1_[a-f0-9]{64}`),
        "Linode API Token":              regexp.MustCompile(`(?i)linode[_-]?(?:api)?[_-]?token\s*[:=]\s*["']?[a-f0-9]{64}["']?`),
        "Vultr API Key":                 regexp.MustCompile(`(?i)vultr[_-]?api[_-]?key\s*[:=]\s*["']?[A-Z0-9]{36}["']?`),
        "Hetzner API Token":             regexp.MustCompile(`(?i)hetzner[_-]?(?:api)?[_-]?token\s*[:=]\s*["']?[a-zA-Z0-9]{64}["']?`),
        "Oracle Cloud API Key":          regexp.MustCompile(`(?i)oci[_-]?api[_-]?key\s*[:=]\s*["']?-----BEGIN (?:RSA )?PRIVATE KEY-----`),
        "IBM Cloud API Key":             regexp.MustCompile(`(?i)ibm[_-]?(?:cloud)?[_-]?api[_-]?key\s*[:=]\s*["']?[a-zA-Z0-9_-]{44}["']?`),
        "NPM Access Token":              regexp.MustCompile(`npm_[a-zA-Z0-9]{36}`),
        "PyPI API Token":                regexp.MustCompile(`pypi-[a-zA-Z0-9_-]{100,}`),
        "NuGet API Key":                 regexp.MustCompile(`oy2[a-z0-9]{43}`),
        "RubyGems API Key":              regexp.MustCompile(`rubygems_[a-f0-9]{48}`),
        "CircleCI Token":                regexp.MustCompile(`(?i)circle[_-]?(?:ci)?[_-]?token\s*[:=]\s*["']?[a-f0-9]{40}["']?`),
        "Travis CI Token":               regexp.MustCompile(`(?i)travis[_-]?(?:ci)?[_-]?token\s*[:=]\s*["']?[a-zA-Z0-9]{22}["']?`),
        "Jenkins API Token":             regexp.MustCompile(`(?i)jenkins[_-]?(?:api)?[_-]?token\s*[:=]\s*["']?[a-f0-9]{32,}["']?`),
        "Bitbucket App Password":        regexp.MustCompile(`(?i)bitbucket[_-]?(?:app)?[_-]?(?:password|secret)\s*[:=]\s*["']?[a-zA-Z0-9]{18,}["']?`),
        "Codecov Token":                 regexp.MustCompile(`(?i)codecov[_-]?token\s*[:=]\s*["']?[a-f0-9-]{36}["']?`),
        "Vercel Token":                  regexp.MustCompile(`(?i)vercel[_-]?token\s*[:=]\s*["']?[a-zA-Z0-9]{24}["']?`),
        "Netlify Token":                 regexp.MustCompile(`(?i)netlify[_-]?(?:auth)?[_-]?token\s*[:=]\s*["']?[a-zA-Z0-9_-]{40,}["']?`),
        "Vault Token":                   regexp.MustCompile(`(?i)(?:vault[_-]?token|hvs)\s*[:=]?\s*["']?(?:hvs\.)?[a-zA-Z0-9_-]{24,}["']?`),
        "Kubernetes Token":              regexp.MustCompile(`eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9\.[a-zA-Z0-9_-]+\.[a-zA-Z0-9_-]+`),
        "Docker Registry Password":      regexp.MustCompile(`(?i)docker[_-]?(?:registry)?[_-]?(?:password|pass|pwd)\s*[:=]\s*["']?[^\s"']{8,}["']?`),
        "Terraform Cloud Token":         regexp.MustCompile(`(?i)(?:tfe|terraform)[_-]?token\s*[:=]\s*["']?[a-zA-Z0-9]{14}\.[a-zA-Z0-9_-]{67}["']?`),
        "Pulumi Access Token":           regexp.MustCompile(`pul-[a-f0-9]{40}`),
        "Adyen API Key":                 regexp.MustCompile(`(?i)adyen[_-]?api[_-]?key\s*[:=]\s*["']?AQE[a-zA-Z0-9_-]{50,}["']?`),
        "Klarna API Key":                regexp.MustCompile(`(?i)klarna[_-]?api[_-]?(?:key|secret)\s*[:=]\s*["']?[a-zA-Z0-9_-]{30,}["']?`),
        "Razorpay Key":                  regexp.MustCompile(`rzp_(?:live|test)_[a-zA-Z0-9]{14}`),
        "Coinbase API Secret":           regexp.MustCompile(`(?i)coinbase[_-]?(?:api)?[_-]?secret\s*[:=]\s*["']?[a-zA-Z0-9]{64}["']?`),
        "Binance API Secret":            regexp.MustCompile(`(?i)binance[_-]?(?:api)?[_-]?secret\s*[:=]\s*["']?[a-zA-Z0-9]{64}["']?`),
        "Twilio Auth Token":             regexp.MustCompile(`(?i)twilio[_-]?auth[_-]?token\s*[:=]\s*["']?[a-f0-9]{32}["']?`),
        "Pusher Secret":                 regexp.MustCompile(`(?i)pusher[_-]?(?:app)?[_-]?secret\s*[:=]\s*["']?[a-f0-9]{20}["']?`),
        "Vonage API Secret":             regexp.MustCompile(`(?i)(?:vonage|nexmo)[_-]?(?:api)?[_-]?secret\s*[:=]\s*["']?[a-zA-Z0-9]{16}["']?`),
        "Plivo Auth Token":              regexp.MustCompile(`(?i)plivo[_-]?auth[_-]?(?:token|id)\s*[:=]\s*["']?[a-zA-Z0-9]{40,}["']?`),
        "MessageBird API Key":           regexp.MustCompile(`(?i)messagebird[_-]?(?:api)?[_-]?key\s*[:=]\s*["']?[a-zA-Z0-9]{25}["']?`),
        "Intercom Access Token":         regexp.MustCompile(`(?i)intercom[_-]?(?:access)?[_-]?token\s*[:=]\s*["']?[a-zA-Z0-9=_-]{60,}["']?`),
        "Zendesk API Token":             regexp.MustCompile(`(?i)zendesk[_-]?(?:api)?[_-]?token\s*[:=]\s*["']?[a-zA-Z0-9]{40}["']?`),
        "Algolia Admin API Key":         regexp.MustCompile(`(?i)algolia[_-]?(?:admin)?[_-]?(?:api)?[_-]?key\s*[:=]\s*["']?[a-f0-9]{32}["']?`),
        "Elasticsearch API Key":         regexp.MustCompile(`(?i)(?:elastic|es)[_-]?(?:api)?[_-]?key\s*[:=]\s*["']?[a-zA-Z0-9_-]{50,}["']?`),
        "Mixpanel API Secret":           regexp.MustCompile(`(?i)mixpanel[_-]?(?:api)?[_-]?secret\s*[:=]\s*["']?[a-f0-9]{32}["']?`),
        "Amplitude API Key":             regexp.MustCompile(`(?i)amplitude[_-]?(?:api)?[_-]?key\s*[:=]\s*["']?[a-f0-9]{32}["']?`),
        "Segment Write Key Alt":         regexp.MustCompile(`(?i)segment[_-]?(?:write)?[_-]?key\s*[:=]\s*["']?[a-zA-Z0-9]{32}["']?`),
        "New Relic License Key":         regexp.MustCompile(`(?i)new[_-]?relic[_-]?license[_-]?key\s*[:=]\s*["']?[a-f0-9]{40}["']?`),
        "New Relic API Key":             regexp.MustCompile(`NRAK-[A-Z0-9]{27}`),
        "New Relic Insights Key":        regexp.MustCompile(`NRI[IQ]-[a-zA-Z0-9_-]{32}`),
        "Loggly Token":                  regexp.MustCompile(`(?i)loggly[_-]?(?:customer)?[_-]?token\s*[:=]\s*["']?[a-f0-9-]{36}["']?`),
        "Splunk HEC Token":              regexp.MustCompile(`(?i)splunk[_-]?(?:hec)?[_-]?token\s*[:=]\s*["']?[a-f0-9-]{36}["']?`),
        "Sumo Logic Access Key":         regexp.MustCompile(`(?i)sumo[_-]?logic[_-]?(?:access)?[_-]?(?:key|id)\s*[:=]\s*["']?su[a-zA-Z0-9]{12}["']?`),
        "Grafana API Key":               regexp.MustCompile(`eyJr[a-zA-Z0-9_-]{50,}={0,2}`),
        "PagerDuty API Key":             regexp.MustCompile(`(?i)pagerduty[_-]?(?:api)?[_-]?key\s*[:=]\s*["']?[a-zA-Z0-9+/=_-]{20}["']?`),
        "Supabase Service Role Key":     regexp.MustCompile(`eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9\.[a-zA-Z0-9_-]+\.[a-zA-Z0-9_-]+`),
        "Firebase Admin SDK Key":        regexp.MustCompile(`(?i)firebase[_-]?(?:admin)?[_-]?sdk[_-]?key\s*[:=]\s*["']?[a-zA-Z0-9_-]{100,}["']?`),
        "Auth0 Client Secret":           regexp.MustCompile(`(?i)auth0[_-]?(?:client)?[_-]?secret\s*[:=]\s*["']?[a-zA-Z0-9_-]{64,}["']?`),
        "Okta API Token Alt":            regexp.MustCompile(`(?i)okta[_-]?(?:api)?[_-]?token\s*[:=]\s*["']?00[a-zA-Z0-9_-]{40}["']?`),
        "Cloudinary Secret":             regexp.MustCompile(`(?i)cloudinary[_-]?(?:api)?[_-]?secret\s*[:=]\s*["']?[a-zA-Z0-9_-]{27}["']?`),
        "Cloudinary URL":                regexp.MustCompile(`cloudinary://[0-9]+:[a-zA-Z0-9_-]+@[a-z]+`),
        "Backblaze Application Key":     regexp.MustCompile(`(?i)b2[_-]?(?:application)?[_-]?key\s*[:=]\s*["']?K[a-zA-Z0-9]{30,}["']?`),
        "Wasabi Access Key":             regexp.MustCompile(`(?i)wasabi[_-]?(?:access)?[_-]?key\s*[:=]\s*["']?[A-Z0-9]{20}["']?`),
        "LaunchDarkly SDK Key":          regexp.MustCompile(`(?i)(?:ld)?[_-]?sdk[_-]?key\s*[:=]\s*["']?sdk-[a-f0-9-]{36}["']?`),
        "LaunchDarkly API Key":          regexp.MustCompile(`(?i)launchdarkly[_-]?(?:api)?[_-]?key\s*[:=]\s*["']?api-[a-f0-9-]{36}["']?`),
        "Split.io API Key":              regexp.MustCompile(`(?i)split[_-]?(?:io)?[_-]?(?:api)?[_-]?key\s*[:=]\s*["']?[a-zA-Z0-9]{50,}["']?`),
        "Statsig Secret":                regexp.MustCompile(`(?i)statsig[_-]?(?:secret)?[_-]?key\s*[:=]\s*["']?secret-[a-zA-Z0-9]{50,}["']?`),
        "GitLab Pipeline Token":         regexp.MustCompile(`glptt-[a-f0-9]{40}`),
        "GitLab Runner Token":           regexp.MustCompile(`GR1348941[a-zA-Z0-9_-]{20}`),
        "GitHub App Private Key":        regexp.MustCompile(`-----BEGIN RSA PRIVATE KEY-----[\s\S]+?-----END RSA PRIVATE KEY-----`),
        "Bitbucket OAuth Secret":        regexp.MustCompile(`(?i)bitbucket[_-]?(?:oauth)?[_-]?secret\s*[:=]\s*["']?[a-zA-Z0-9]{32,}["']?`),
        "Contentful Management Token":   regexp.MustCompile(`CFPAT-[a-zA-Z0-9_-]{43}`),
        "Contentful Delivery Token":     regexp.MustCompile(`(?i)contentful[_-]?(?:delivery)?[_-]?token\s*[:=]\s*["']?[a-zA-Z0-9_-]{43}["']?`),
        "Sanity Token":                  regexp.MustCompile(`(?i)\bsanity[_-]?(?:api[_-]?)?token\s*[:=]\s*["']?sk[a-zA-Z0-9]{32,}["']?`),
        "Strapi API Token":              regexp.MustCompile(`(?i)strapi[_-]?(?:api)?[_-]?token\s*[:=]\s*["']?[a-f0-9]{256}["']?`),
        "Postmark Server Token":         regexp.MustCompile(`(?i)postmark[_-]?(?:server)?[_-]?token\s*[:=]\s*["']?[a-f0-9-]{36}["']?`),
        "SparkPost API Key":             regexp.MustCompile(`(?i)sparkpost[_-]?(?:api)?[_-]?key\s*[:=]\s*["']?[a-f0-9]{40}["']?`),
        "Mailjet API Secret":            regexp.MustCompile(`(?i)mailjet[_-]?(?:api)?[_-]?secret\s*[:=]\s*["']?[a-f0-9]{32}["']?`),
        "Mandrill API Key":              regexp.MustCompile(`(?i)mandrill[_-]?(?:api)?[_-]?key\s*[:=]\s*["']?[a-zA-Z0-9_-]{22}["']?`),
        "Customer.io API Key":           regexp.MustCompile(`(?i)customer[_-]?io[_-]?(?:api)?[_-]?key\s*[:=]\s*["']?[a-f0-9]{32}["']?`),
        "Mapbox Secret Token":           regexp.MustCompile(`sk\.[a-zA-Z0-9]{60,}\.[a-zA-Z0-9_-]{22,}`),
        "Here API Key":                  regexp.MustCompile(`(?i)here[_-]?(?:api)?[_-]?key\s*[:=]\s*["']?[a-zA-Z0-9_-]{43}["']?`),
        "TomTom API Key":                regexp.MustCompile(`(?i)tomtom[_-]?(?:api)?[_-]?key\s*[:=]\s*["']?[a-zA-Z0-9]{32}["']?`),
        "LinkedIn Client Secret":        regexp.MustCompile(`(?i)linkedin[_-]?(?:client)?[_-]?secret\s*[:=]\s*["']?[a-zA-Z0-9]{16}["']?`),
        "Spotify Client Secret":         regexp.MustCompile(`(?i)spotify[_-]?(?:client)?[_-]?secret\s*[:=]\s*["']?[a-f0-9]{32}["']?`),
        "Dropbox App Secret":            regexp.MustCompile(`(?i)dropbox[_-]?(?:app)?[_-]?secret\s*[:=]\s*["']?[a-z0-9]{15}["']?`),
        "Private Key Inline":            regexp.MustCompile(`(?i)(?:private[_-]?key|priv[_-]?key)\s*[:=]\s*["'][a-zA-Z0-9+/=\n]{100,}["']`),
        "Password Hardcoded":            regexp.MustCompile(`(?i)(?:password|passwd|pwd)\s*[:=]\s*["'][^"']{8,50}["']`),
        "Secret Key Hardcoded":          regexp.MustCompile(`(?i)(?:secret[_-]?key|signing[_-]?key|encryption[_-]?key)\s*[:=]\s*["'][a-zA-Z0-9+/=_-]{20,}["']`),
    }

    asciiArt = `
         ________             __         
     __ / / __/ /  __ _____  / /____ ____
    / // /\ \/ _ \/ // / _ \/ __/ -_) __/
    \___/___/_//_/\_,_/_//_/\__/\__/_/  

     ` + version + `                         Created by cc1a2b
    `
)

// progressReader wraps an io.Reader to track download progress
type progressReader struct {
    reader     io.Reader
    total      int64
    current    int64
    lastUpdate time.Time
    onProgress func(int64)
}

func (pr *progressReader) Read(p []byte) (int, error) {
    n, err := pr.reader.Read(p)
    pr.current += int64(n)
    
    // Only update progress every 100ms to avoid too many updates
    if pr.onProgress != nil && time.Since(pr.lastUpdate) > 100*time.Millisecond {
        pr.onProgress(pr.current)
        pr.lastUpdate = time.Now()
    }
    
    return n, err
}

// flagList is a custom type for handling multiple header flags
type flagList []string

func (f *flagList) String() string {
    return strings.Join(*f, ", ")
}

func (f *flagList) Set(value string) error {
    *f = append(*f, value)
    return nil
}

// Config holds all configuration options
type Config struct {
    // Basic options
    URL, List, JSFile, Output, Regex, Cookies, Proxy string
    Threads                                           int
    Quiet, Help, Update, ExtractEndpoints, SkipTLS, FoundOnly bool
    
    // Advanced HTTP
    Headers    []string // Custom HTTP headers
    UserAgent  string   // Custom User-Agent (single string or randomly selected from file)
    UserAgents []string // List of User-Agents (when loaded from file)
    RateLimit int      // Delay between requests (ms)
    Timeout    int      // Request timeout (seconds)
    Retry      int      // Retry failed requests
    
    // JS Analysis
    Deobfuscate, SourceMap, Eval, ObfsDetect bool
    
    // Security Analysis
    Secrets, Tokens, Params, ParamURLs, Internal, GraphQL, Bypass, Firebase, Links bool
    
    // Crawling & Scope
    CrawlDepth int    // Recursive JS crawling depth
    Domain     string // Scope to specific domain
    Ext        string // Match specific JS file extensions
    
    // Output
    JSON, CSV, Verbose, Burp bool

    // v0.6: false-positive pipeline controls.
    // MinConfidence gates findings; ShowConfidence prints the score inline.
    // NoFPFilter disables the legacy FP filter (debug only). SelfTest runs
    // the rule registry against its TP/FP fixtures and exits. MaxBytes caps
    // response body reads (gzip-bomb defense). AllowInternal opts in to
    // file://, localhost, and RFC1918 targets — off by default to avoid SSRF.
    MinConfidence  float64
    ShowConfidence bool
    NoFPFilter     bool
    SelfTest       bool
    MaxBytes       int64
    AllowInternal  bool

    // v0.6+: live verification + observability + extensibility.
    // Verify enables read-only liveness probes against provider endpoints
    // (Stripe /v1/balance, GitHub /user, OpenAI /v1/models, Slack auth.test, etc.).
    // VerifyTimeout bounds each probe; PerHost caps outbound concurrency per host.
    // Stats prints per-stage counters at end of run.
    // RulesFile loads an external JSON rule pack at startup.
    Verify         bool
    VerifyTimeout  int
    PerHost        int
    Stats          bool
    RulesFile      string

    // v0.6++: I/O formats, suppressions, registry introspection, deltas.
    // SARIF and NDJSON are alternative output modes; IgnoreFile is a
    // permanent suppression list; DiffFile takes a previous JSON envelope
    // and reports only new findings; OnlyRules/DisableRule apply a registry
    // filter; HARFile bypasses the fetcher and reads from a Chrome HAR
    // archive directly; NoColor disables ANSI color (also auto when stdout
    // is not a TTY).
    SARIF          bool
    NDJSON         bool
    IgnoreFile     string
    DiffFile       string
    OnlyRules      string
    DisableRule    string
    HARFile        string
    NoColor        bool
    IgnoreSet      *IgnoreList
    DiffSeen       map[string]bool

    // v0.6+++: page-aware crawling, source maps, cache, robots, concurrent verify.
    CacheDir       string
    Robots         bool
    InlineHTML     bool
    CSPOrigins     bool
    VerifyWorkers  int
    Cache          *DiskCache
}

func Run() {
    var (
        url, list, jsFile, output, regex, cookies, proxy string
        threads                                           int
        quiet, help, update, extractEndpoints, skipTLS, foundOnly bool
    )
    
    // Advanced HTTP
    var headers flagList
    var userAgent string
    var rateLimit, timeout, retry int
    
    // JS Analysis
    var deobfuscate, sourceMap, eval, obfsDetect bool
    
    // Security Analysis
    var secrets, tokens, params, paramURLs, internal, graphql, bypass, firebase, links bool
    
    // Crawling & Scope
    var crawlDepth int
    var domain, ext string
    
    // Output
    var jsonOut, csvOut, verbose, burp bool

    flag.StringVar(&url, "u", "", "Input a URL")
    flag.StringVar(&url, "url", "", "Input a URL")
    flag.StringVar(&list, "l", "", "Input a file with URLs (.txt)")
    flag.StringVar(&list, "list", "", "Input a file with URLs (.txt)")
    flag.StringVar(&jsFile, "f", "", "Path to JavaScript file")
    flag.StringVar(&jsFile, "file", "", "Path to JavaScript file")
    flag.StringVar(&output, "o", "", "Output file path")
    flag.StringVar(&output, "output", "", "Output file path")
    flag.StringVar(&regex, "r", "", "RegEx for filtering results (endpoints and sensitive data)")
    flag.StringVar(&regex, "regex", "", "RegEx for filtering results (endpoints and sensitive data)")
    flag.StringVar(&cookies, "c", "", "Cookies for authenticated JS files")
    flag.StringVar(&cookies, "cookies", "", "Cookies for authenticated JS files")
    flag.StringVar(&proxy, "p", "", "Set proxy (host:port)")
    flag.StringVar(&proxy, "proxy", "", "Set proxy (host:port)")
    flag.IntVar(&threads, "t", 5, "Number of concurrent threads")
    flag.IntVar(&threads, "threads", 5, "Number of concurrent threads")
    flag.BoolVar(&quiet, "q", false, "Quiet mode: suppress ASCII art output")
    flag.BoolVar(&quiet, "quiet", false, "Quiet mode: suppress ASCII art output")
    flag.BoolVar(&help, "h", false, "Display help message")
    flag.BoolVar(&help, "help", false, "Display help message")
    flag.BoolVar(&update, "update", false, "Update the tool with latest patterns")
    flag.BoolVar(&update, "up", false, "Update the tool to latest version")
    flag.BoolVar(&extractEndpoints, "ep", false, "Extract endpoints from JavaScript files")
    flag.BoolVar(&extractEndpoints, "end-point", false, "Extract endpoints from JavaScript files")
    flag.BoolVar(&skipTLS, "k", false, "Skip TLS certificate verification")
    flag.BoolVar(&skipTLS, "skip-tls", false, "Skip TLS certificate verification")
    flag.BoolVar(&foundOnly, "fo", false, "Only show results when sensitive data is found (hide MISSING messages)")
    flag.BoolVar(&foundOnly, "found-only", false, "Only show results when sensitive data is found (hide MISSING messages)")
    
    // Advanced HTTP flags
    flag.Var(&headers, "H", "Custom HTTP headers (repeatable, format: 'Key: Value')")
    flag.Var(&headers, "header", "Custom HTTP headers (repeatable, format: 'Key: Value')")
    flag.StringVar(&userAgent, "U", "", "Custom User-Agent string or path to file containing user agents (one per line)")
    flag.StringVar(&userAgent, "user-agent", "", "Custom User-Agent string or path to file containing user agents (one per line)")
    flag.IntVar(&rateLimit, "R", 0, "Delay between requests (ms)")
    flag.IntVar(&rateLimit, "rate-limit", 0, "Delay between requests (ms)")
    flag.IntVar(&timeout, "T", 30, "Request timeout (seconds)")
    flag.IntVar(&timeout, "timeout", 30, "Request timeout (seconds)")
    flag.IntVar(&retry, "y", 2, "Retry failed requests")
    flag.IntVar(&retry, "retry", 2, "Retry failed requests")
    
    // JS Analysis flags
    flag.BoolVar(&deobfuscate, "d", false, "Deobfuscate minified/obfuscated code")
    flag.BoolVar(&deobfuscate, "deobfuscate", false, "Deobfuscate minified/obfuscated code")
    flag.BoolVar(&sourceMap, "m", false, "Parse source maps for original JS")
    flag.BoolVar(&sourceMap, "sourcemap", false, "Parse source maps for original JS")
    flag.BoolVar(&eval, "e", false, "Analyze eval() & dynamic code")
    flag.BoolVar(&eval, "eval", false, "Analyze eval() & dynamic code")
    flag.BoolVar(&obfsDetect, "z", false, "Detect obfuscation techniques")
    flag.BoolVar(&obfsDetect, "obfs-detect", false, "Detect obfuscation techniques")
    
    // Security Analysis flags
    flag.BoolVar(&secrets, "s", false, "API keys, tokens, credentials detection")
    flag.BoolVar(&secrets, "secrets", false, "API keys, tokens, credentials detection")
    flag.BoolVar(&tokens, "x", false, "JWT/auth tokens extraction")
    flag.BoolVar(&tokens, "tokens", false, "JWT/auth tokens extraction")
    flag.BoolVar(&params, "P", false, "Hidden parameters discovery")
    flag.BoolVar(&params, "params", false, "Hidden parameters discovery")
    flag.BoolVar(&paramURLs, "PU", false, "Advanced URL parameter extraction with base URLs")
    flag.BoolVar(&paramURLs, "param-urls", false, "Advanced URL parameter extraction with base URLs")
    flag.BoolVar(&internal, "i", false, "Internal/private endpoints only")
    flag.BoolVar(&internal, "internal", false, "Internal/private endpoints only")
    flag.BoolVar(&graphql, "g", false, "GraphQL endpoints & queries")
    flag.BoolVar(&graphql, "graphql", false, "GraphQL endpoints & queries")
    flag.BoolVar(&bypass, "B", false, "WAF bypass patterns detection")
    flag.BoolVar(&bypass, "bypass", false, "WAF bypass patterns detection")
    flag.BoolVar(&firebase, "F", false, "Firebase config/secrets detection")
    flag.BoolVar(&firebase, "firebase", false, "Firebase config/secrets detection")
    flag.BoolVar(&links, "L", false, "Extract all links/URLs from JS")
    flag.BoolVar(&links, "links", false, "Extract all links/URLs from JS")
    
    // Crawling & Scope flags
    flag.IntVar(&crawlDepth, "w", 1, "Recursive JS crawling depth")
    flag.IntVar(&crawlDepth, "crawl", 1, "Recursive JS crawling depth")
    flag.StringVar(&domain, "D", "", "Scope to specific domain")
    flag.StringVar(&domain, "domain", "", "Scope to specific domain")
    flag.StringVar(&ext, "E", "", "Match specific JS file extensions (comma-separated)")
    flag.StringVar(&ext, "ext", "", "Match specific JS file extensions (comma-separated)")
    
    // Output flags
    flag.BoolVar(&jsonOut, "j", false, "Structured JSON output")
    flag.BoolVar(&jsonOut, "json", false, "Structured JSON output")
    flag.BoolVar(&csvOut, "C", false, "CSV for Excel/Sheets import")
    flag.BoolVar(&csvOut, "csv", false, "CSV for Excel/Sheets import")
    flag.BoolVar(&verbose, "v", false, "Detailed analysis output")
    flag.BoolVar(&verbose, "verbose", false, "Detailed analysis output")
    flag.BoolVar(&burp, "n", false, "Burp Suite export format")
    flag.BoolVar(&burp, "burp", false, "Burp Suite export format")

    // v0.6 — FP pipeline controls
    var minConfidence float64
    var showConfidence, noFPFilter, selfTest, allowInternal bool
    var maxBytes int64
    flag.Float64Var(&minConfidence, "mc", DefaultMinConfidence, "Minimum confidence (0.0-1.0) for a finding to be reported")
    flag.Float64Var(&minConfidence, "min-confidence", DefaultMinConfidence, "Minimum confidence (0.0-1.0) for a finding to be reported")
    flag.BoolVar(&showConfidence, "sc", false, "Show confidence score on each printed finding")
    flag.BoolVar(&showConfidence, "show-confidence", false, "Show confidence score on each printed finding")
    flag.BoolVar(&noFPFilter, "no-fp-filter", false, "Disable false-positive filter (debug; keep all matches)")
    flag.BoolVar(&selfTest, "self-test", false, "Run the rule registry against its built-in TP/FP fixtures and exit")
    flag.Int64Var(&maxBytes, "max-bytes", DefaultMaxBytes, "Cap response body read size in bytes (gzip-bomb defense)")
    flag.BoolVar(&allowInternal, "allow-internal", false, "Allow file://, localhost, and RFC1918 targets (off by default to prevent SSRF)")

    // v0.6+ — verifier, stats, extensibility
    var verify, stats bool
    var verifyTimeout, perHost int
    var rulesFile string
    flag.BoolVar(&verify, "verify", false, "Probe each finding against the provider's read-only endpoint (off by default; opt-in)")
    flag.IntVar(&verifyTimeout, "verify-timeout", 10, "Timeout in seconds for each verification probe")
    flag.IntVar(&perHost, "per-host", defaultPerHostConcurrency, "Per-host outbound concurrency cap (avoids getting banned)")
    flag.BoolVar(&stats, "stats", false, "Print per-stage counters (URLs fetched, FP-drops by reason, findings) on stderr at end of run")
    flag.StringVar(&rulesFile, "rules-file", "", "Load an external JSON rule pack at startup (additive to built-in registry)")

    // v0.6++ — I/O formats, suppressions, deltas, registry introspection
    var sarifOut, ndjsonOut, listRules, noColor bool
    var explainID, ignoreFile, diffFile, onlyRules, disableRule, harFile string
    flag.BoolVar(&sarifOut, "sarif", false, "Emit SARIF 2.1.0 (suitable for GitHub code-scanning)")
    flag.BoolVar(&ndjsonOut, "ndjson", false, "Stream findings as newline-delimited JSON")
    flag.StringVar(&ignoreFile, "ignore-file", "", "Path to .jshunterignore for permanent suppression")
    flag.StringVar(&diffFile, "diff", "", "Diff against a previous JSON envelope; only NEW findings reported")
    flag.BoolVar(&listRules, "list-rules", false, "Print the rule registry as a table and exit")
    flag.StringVar(&explainID, "explain", "", "Print the full rule definition (incl. TP/FP fixtures) and exit")
    flag.StringVar(&onlyRules, "only-rules", "", "Comma-separated rule_id patterns; only matching rules run (supports * glob)")
    flag.StringVar(&disableRule, "disable-rule", "", "Comma-separated rule_id patterns to disable (supports * glob)")
    flag.StringVar(&harFile, "har", "", "Ingest a Chrome DevTools HAR file instead of fetching URLs")
    flag.BoolVar(&noColor, "no-color", false, "Disable ANSI color (auto-disabled when stdout is not a TTY)")

    // v0.6+++ — page-aware crawling, sourcemaps, cache, robots, concurrent verify
    var cacheDir string
    var robotsMode, inlineHTML, cspOrigins bool
    var verifyWorkers int
    flag.StringVar(&cacheDir, "cache-dir", "", "Persist HTTP responses on disk for ETag-based revalidation")
    flag.BoolVar(&robotsMode, "robots", false, "Fetch /robots.txt for the target host(s) and print Disallow paths")
    flag.BoolVar(&inlineHTML, "inline-html", false, "Scan inline <script> tags and SRI/CSP from HTML responses")
    flag.BoolVar(&cspOrigins, "csp-origins", false, "Extract Content-Security-Policy origins as candidate endpoints")
    flag.IntVar(&verifyWorkers, "verify-workers", 8, "Worker pool size for concurrent --verify probes")

    flag.Parse()

    // Apply rule-registry selection BEFORE any subcommand that depends on
    // the rule set (--list-rules, --explain, --self-test).
    if onlyRules != "" || disableRule != "" {
        kept := applyRuleSelection(onlyRules, disableRule)
        if !quiet {
            fmt.Fprintf(os.Stderr, "[%sINFO%s] rule selection applied: %d rules active\n",
                colors["CYAN"], colors["NC"], kept)
        }
    }

    if listRules {
        runListRules()
        return
    }
    if explainID != "" {
        runExplainRule(explainID)
        return
    }

    // TTY autodetect: disable colors when stdout is not a terminal so piped
    // output stays clean. --no-color forces disable in any case.
    if noColor || !isStdoutTTY() {
        disableColors()
    }

    if rulesFile != "" {
        n, err := LoadRulesFile(rulesFile)
        if err != nil {
            fmt.Fprintf(os.Stderr, "[%sERROR%s] rules file: %v\n", colors["RED"], colors["NC"], err)
            os.Exit(2)
        }
        if !quiet {
            fmt.Fprintf(os.Stderr, "[%sINFO%s] loaded %d external rules from %s\n", colors["CYAN"], colors["NC"], n, rulesFile)
        }
    }

    if perHost > 0 {
        getHostController().perHost = perHost
    }

    if selfTest {
        runSelfTestCLI()
        return
    }

    // Process User-Agent: check if it's a file path or a string
    var userAgentsList []string
    finalUserAgent := userAgent
    if userAgent != "" {
        // Check if it looks like a file path (contains path separators or common file extensions)
        if strings.Contains(userAgent, "/") || strings.Contains(userAgent, "\\") || 
           strings.HasSuffix(userAgent, ".txt") || strings.HasSuffix(userAgent, ".list") {
            // Try to read as file
            if fileInfo, err := os.Stat(userAgent); err == nil && !fileInfo.IsDir() {
                // It's a file, read user agents from it
                file, err := os.Open(userAgent)
                if err == nil {
                    defer file.Close()
                    scanner := bufio.NewScanner(file)
                    for scanner.Scan() {
                        line := strings.TrimSpace(scanner.Text())
                        if line != "" && !strings.HasPrefix(line, "#") {
                            userAgentsList = append(userAgentsList, line)
                        }
                    }
                    if len(userAgentsList) > 0 {
                        // Select a random user agent from the list
                        finalUserAgent = userAgentsList[rand.Intn(len(userAgentsList))]
                        if !quiet {
                            fmt.Printf("[%sINFO%s] Loaded %d user agents from file, using: %s\n", 
                                colors["CYAN"], colors["NC"], len(userAgentsList), finalUserAgent)
                        }
                    } else {
                        if !quiet {
                            fmt.Printf("[%sWARN%s] User-Agent file is empty or contains no valid entries, using as string\n", 
                                colors["YELLOW"], colors["NC"])
                        }
                    }
                } else {
                    if !quiet {
                        fmt.Printf("[%sWARN%s] Could not read User-Agent file, using as string: %v\n", 
                            colors["YELLOW"], colors["NC"], err)
                    }
                }
            }
        }
    }

    // Create config object
    config := Config{
        URL: url, List: list, JSFile: jsFile, Output: output, Regex: regex,
        Cookies: cookies, Proxy: proxy, Threads: threads,
        Quiet: quiet, Help: help, Update: update, ExtractEndpoints: extractEndpoints,
        SkipTLS: skipTLS, FoundOnly: foundOnly,
        Headers: headers, UserAgent: finalUserAgent, UserAgents: userAgentsList, RateLimit: rateLimit,
        Timeout: timeout, Retry: retry,
        Deobfuscate: deobfuscate, SourceMap: sourceMap, Eval: eval, ObfsDetect: obfsDetect,
        Secrets: secrets, Tokens: tokens, Params: params, ParamURLs: paramURLs, Internal: internal,
        GraphQL: graphql, Bypass: bypass, Firebase: firebase, Links: links,
        CrawlDepth: crawlDepth, Domain: domain, Ext: ext,
        JSON: jsonOut, CSV: csvOut, Verbose: verbose, Burp: burp,
        MinConfidence: minConfidence, ShowConfidence: showConfidence,
        NoFPFilter: noFPFilter, SelfTest: selfTest,
        MaxBytes: maxBytes, AllowInternal: allowInternal,
        Verify: verify, VerifyTimeout: verifyTimeout, PerHost: perHost,
        Stats: stats, RulesFile: rulesFile,
        SARIF: sarifOut, NDJSON: ndjsonOut,
        IgnoreFile: ignoreFile, DiffFile: diffFile,
        OnlyRules: onlyRules, DisableRule: disableRule,
        HARFile: harFile, NoColor: noColor,
        CacheDir: cacheDir, Robots: robotsMode,
        InlineHTML: inlineHTML, CSPOrigins: cspOrigins,
        VerifyWorkers: verifyWorkers,
    }

    // Initialize the run-wide stats struct lazily; counters are no-op when
    // --stats isn't requested but Inc() calls remain cheap and uniform.
    initStats()

    // Disk cache: enabled when --cache-dir is set. Failure to mkdir is
    // a hard error — silent fallback would surprise operators expecting
    // 304s and finding full re-downloads instead.
    if config.CacheDir != "" {
        dc, err := NewDiskCache(config.CacheDir)
        if err != nil {
            fmt.Fprintf(os.Stderr, "[%sERROR%s] cache-dir: %v\n", colors["RED"], colors["NC"], err)
            os.Exit(2)
        }
        config.Cache = dc
        if !quiet {
            fmt.Fprintf(os.Stderr, "[%sINFO%s] disk cache active at %s\n",
                colors["CYAN"], colors["NC"], config.CacheDir)
        }
    }

    // --robots: opt-in fetch of /robots.txt for each unique host in the
    // input. Prints disallowed paths and sitemap references; does NOT
    // make JSHunter respect them on subsequent fetches.
    if config.Robots {
        runRobotsCLI(&config)
        return
    }

    // Load .jshunterignore if specified — operator-managed permanent
    // suppression of known-noise findings.
    if config.IgnoreFile != "" {
        il, err := LoadIgnoreFile(config.IgnoreFile)
        if err != nil {
            fmt.Fprintf(os.Stderr, "[%sERROR%s] ignore file: %v\n", colors["RED"], colors["NC"], err)
            os.Exit(2)
        }
        config.IgnoreSet = il
        activeIgnoreList = il
        if !quiet && il != nil {
            fmt.Fprintf(os.Stderr, "[%sINFO%s] loaded %d ignore entries\n",
                colors["CYAN"], colors["NC"], len(il.Entries))
        }
    }

    // Load --diff baseline. New findings only — anything in the previous
    // envelope (by value_hash) is suppressed so CI gates can fail on
    // genuine regressions.
    if config.DiffFile != "" {
        seen, err := DiffPrevious(config.DiffFile)
        if err != nil {
            fmt.Fprintf(os.Stderr, "[%sERROR%s] diff: %v\n", colors["RED"], colors["NC"], err)
            os.Exit(2)
        }
        config.DiffSeen = seen
        activeDiffSeen = seen
        if !quiet {
            fmt.Fprintf(os.Stderr, "[%sINFO%s] diff baseline carries %d known finding hashes\n",
                colors["CYAN"], colors["NC"], len(seen))
        }
    }

    // HAR ingestion is mutually exclusive with URL/list/file fetch paths;
    // when --har is set we shortcut the dispatch entirely.
    if config.HARFile != "" {
        n, err := IngestHAR(config.HARFile, &config)
        if err != nil {
            fmt.Fprintf(os.Stderr, "[%sERROR%s] har: %v\n", colors["RED"], colors["NC"], err)
            os.Exit(2)
        }
        if !quiet {
            fmt.Fprintf(os.Stderr, "[%sINFO%s] HAR scan complete: %d JS entries\n",
                colors["CYAN"], colors["NC"], n)
        }
        emitFinalOutput(&config)
        return
    }

    if help {
        customHelp()
        return
    }

    if update {
        updateTool()
        return
    }

    if config.URL == "" && config.List == "" && config.JSFile == "" {
        if isInputFromStdin() {
            // Show ASCII art before processing stdin if not quiet
            if !config.Quiet {
                time.Sleep(100 * time.Millisecond)
                displayAsciiArt()
            }
            
            // Read all stdin content
            stdinContent, err := io.ReadAll(os.Stdin)
            if err != nil {
                if !config.Quiet {
                    fmt.Fprintf(os.Stderr, "Error reading from stdin: %v\n", err)
                }
                return
            }
            
            content := string(stdinContent)
            
            // Check if it looks like a list of URLs (each line is a URL)
            lines := strings.Split(content, "\n")
            urlCount := 0
            jsLineCount := 0
            totalLines := 0
            
            for _, line := range lines {
                line = strings.TrimSpace(line)
                if line == "" {
                    continue
                }
                totalLines++
                
                // Check if line looks like a URL
                if strings.HasPrefix(line, "http://") || strings.HasPrefix(line, "https://") {
                    urlCount++
                }
                // Check if line looks like JavaScript
                if strings.Contains(line, "function") || 
                   strings.Contains(line, "const ") || 
                   strings.Contains(line, "let ") || 
                   strings.Contains(line, "var ") ||
                   strings.Contains(line, "URLSearchParams") ||
                   strings.Contains(line, ".get(") ||
                   strings.Contains(line, "fetch(") ||
                   strings.Contains(line, "axios.") ||
                   strings.Contains(line, "//") ||
                   strings.Contains(line, "/*") {
                    jsLineCount++
                }
            }
            
            // Determine if it's JavaScript or URL list
            // Priority: If most lines are URLs, always treat as URL list (process each URL)
            isJavaScript := false
            
            if totalLines > 0 {
                urlRatio := float64(urlCount) / float64(totalLines)
                
                // If more than 50% are URLs, treat as URL list (process each URL individually)
                if urlRatio > 0.5 {
                    isJavaScript = false
                } else if config.ParamURLs || config.Params {
                    // Using -PU/-P flags, check if it's actually JS code
                    if jsLineCount > 5 || 
                       strings.Contains(content, "function ") || 
                       strings.Contains(content, "const urlParams") ||
                       strings.Contains(content, "new URLSearchParams") ||
                       strings.Contains(content, "URLSearchParams.get") {
                        // Clear JavaScript patterns
                        isJavaScript = true
                    } else {
                        // Default: treat as JavaScript when using -PU/-P
                        isJavaScript = true
                    }
                } else {
                    // Without -PU/-P, check if it's JavaScript
                    if jsLineCount > 5 || strings.Contains(content, "function ") {
                        isJavaScript = true
                    }
                }
            }
            
            if isJavaScript {
                // Process as JavaScript content directly
                source := "stdin"
                bodyBytes := []byte(content)
                
                if config.ParamURLs {
                    paramURLs := extractURLParamsWithBaseURLs(content, source)
                    if len(paramURLs) > 0 {
                        globalSeenMutex.Lock()
                        globalFoundAny = true // Mark that we found something
                        for _, paramURL := range paramURLs {
                            if !globalSeenAll[paramURL] {
                                globalSeenAll[paramURL] = true
                                fmt.Println(paramURL)
                            }
                        }
                        globalSeenMutex.Unlock()
                    }
                } else if config.ExtractEndpoints {
                    endpoints := extractEndpointsFromContent(content, config.Regex, "")
                    displayEndpoints(endpoints, source)
                } else {
                    // Process as sensitive data search - use reportMatchesWithConfig directly
                    reportMatchesWithConfig(source, bodyBytes, &config)
                }
            } else {
                // Treat each line as URL/file path (old behavior)
                scanner := bufio.NewScanner(strings.NewReader(content))
                for scanner.Scan() {
                    inputURL := strings.TrimSpace(scanner.Text())
                    if inputURL == "" {
                        continue
                    }
                    
                    if config.ExtractEndpoints {
                        processInputsForEndpointsWithConfig(inputURL, &config)
                    } else {
                        processInputsWithConfig(inputURL, &config)
                    }
                }
                if err := scanner.Err(); err != nil {
                    if !config.Quiet {
                        fmt.Fprintln(os.Stderr, "Error reading from stdin:", err)
                    }
                }
            }
            return
        }
        customHelp()
        os.Exit(1)
    }

    if !config.Quiet {
        time.Sleep(100 * time.Millisecond)
        displayAsciiArt()
    }

    if config.Quiet {
        disableColors()
    }

    if config.JSFile != "" {
        if config.ExtractEndpoints {
            processJSFileForEndpointsWithConfig(config.JSFile, &config)
        } else {
            processJSFileWithConfig(config.JSFile, &config)
        }
        return 
    }

    if config.ExtractEndpoints && (config.URL != "" || config.List != "") {
        processInputsForEndpointsWithConfig(config.URL, &config)
    } else {
        processInputsWithConfig(config.URL, &config)
    }
}


// runRobotsCLI fetches /robots.txt for each unique host in --url/--list
// input and prints the parsed Disallow / Allow / Sitemap lines. This is a
// pure recon helper: JSHunter does not honor robots.txt for its own fetch
// path — operators who want compliance can pipe these paths back in.
func runRobotsCLI(config *Config) {
    client := createHTTPClientWithConfig(config)
    hosts := collectHostsFromInput(config)
    if len(hosts) == 0 {
        fmt.Fprintf(os.Stderr, "[%sWARN%s] --robots: no input URLs to inspect (use -u or -l)\n",
            colors["YELLOW"], colors["NC"])
        return
    }
    for _, h := range hosts {
        r, err := FetchRobots(client, h, config.UserAgent)
        if err != nil {
            fmt.Fprintf(os.Stderr, "[%sROBOTS%s] %s: %v\n", colors["YELLOW"], colors["NC"], h, err)
            continue
        }
        if r == nil {
            fmt.Printf("# %s — no robots.txt\n", h)
            continue
        }
        fmt.Printf("# %s\n", r.URL)
        for _, p := range r.Disallow {
            fmt.Printf("Disallow %s%s\n", strings.TrimRight(h, "/"), p)
        }
        for _, p := range r.Allow {
            fmt.Printf("Allow    %s%s\n", strings.TrimRight(h, "/"), p)
        }
        for _, s := range r.Sitemaps {
            fmt.Printf("Sitemap  %s\n", s)
        }
    }
}

// collectHostsFromInput dedupes the scheme://host roots from --url and --list
// so each host is asked for /robots.txt exactly once.
func collectHostsFromInput(config *Config) []string {
    seen := map[string]struct{}{}
    out := []string{}
    add := func(u string) {
        u = strings.TrimSpace(u)
        if !strings.HasPrefix(u, "http://") && !strings.HasPrefix(u, "https://") {
            return
        }
        parsed, err := url.Parse(u)
        if err != nil {
            return
        }
        root := parsed.Scheme + "://" + parsed.Host
        if _, ok := seen[root]; !ok {
            seen[root] = struct{}{}
            out = append(out, root)
        }
    }
    if config.URL != "" {
        add(config.URL)
    }
    if config.List != "" {
        f, err := os.Open(config.List)
        if err == nil {
            defer f.Close()
            sc := bufio.NewScanner(f)
            for sc.Scan() {
                add(sc.Text())
            }
        }
    }
    return out
}

// emitFinalOutput is the run-shutdown hook. Order matters:
//  1. AWS pair verification (if --verify) — needs the dedupe table to be
//     fully populated so AKID + secret in the same source can be paired.
//  2. SARIF output (if --sarif).
//  3. NDJSON output (if --ndjson).
//  4. Stats summary (if --stats).
// SARIF and NDJSON are mutually permissive — both can be emitted in the
// same run, but operators typically pick one.
func emitFinalOutput(config *Config) {
    if config.Verify {
        verifyClient := createHTTPClientWithConfig(config)
        timeout := time.Duration(config.VerifyTimeout) * time.Second
        if timeout <= 0 {
            timeout = 10 * time.Second
        }

        // Per-finding verifiers run concurrently across a bounded worker
        // pool. Per-host limits still apply via verifyHostLimiter.
        all := flushFindings()
        if len(all) > 0 {
            VerifyAllConcurrent(all, verifyClient, timeout, config.VerifyWorkers)
        }

        // AWS pair verification is separate: the SigV4 path is paired
        // (AKID + secret) and lives outside the per-finding verifier map.
        for _, p := range pairAWSCredentials() {
            if globalStats != nil {
                statInc(&globalStats.VerifyAttempts)
            }
            ctx, cancel := context.WithTimeout(context.Background(), timeout)
            res := verifyAWSPair(ctx, verifyClient, p)
            cancel()
            findingsMutex.Lock()
            for _, f := range findingsByHash {
                if (f.RuleID == "aws.access_key_id" && f.Value == p.AccessKeyID) ||
                    (f.RuleID == "aws.secret_access_key" && f.Value == p.SecretAccessKey) {
                    f.Verify = &res
                    if res.Alive {
                        f.Verified = true
                        f.Confidence = 1.0
                    }
                }
            }
            findingsMutex.Unlock()
            switch {
            case res.Alive && globalStats != nil:
                statInc(&globalStats.VerifyAlive)
            case res.Error != "" && globalStats != nil:
                statInc(&globalStats.VerifyError)
            case globalStats != nil:
                statInc(&globalStats.VerifyDead)
            }
        }
    }
    if config.SARIF {
        outputSARIF()
    }
    if config.NDJSON {
        outputNDJSON()
    }
    if config.Stats {
        printStats(globalStats)
    }
}

// runSelfTestCLI exercises the curated rule registry against its embedded
// TP/FP fixtures and prints a per-rule precision/recall summary. Exits with
// non-zero status if any rule fails — useful in CI to gate detector regressions.
func runSelfTestCLI() {
    results := runSelfTest()
    overallOK := true
    fmt.Printf("[%sSELF-TEST%s] JShunter %s rule registry\n", colors["BLUE"], colors["NC"], version)
    for _, r := range results {
        statusColor := colors["GREEN"]
        statusText := "PASS"
        if !r.OK {
            statusColor = colors["RED"]
            statusText = "FAIL"
            overallOK = false
        }
        fmt.Printf("  [%s%s%s] %-40s  TP %d/%d  FP %d/%d\n",
            statusColor, statusText, colors["NC"],
            r.Name, r.TPPassed, r.TPTotal, r.FPCaught, r.FPTotal)
        for _, n := range r.Notes {
            fmt.Printf("        %s%s%s %s\n", colors["YELLOW"], "·", colors["NC"], n)
        }
    }
    if !overallOK {
        os.Exit(1)
    }
}

// looksLikeHTMLContentType is the content-type-only sibling of
// looksLikeHTML; used pre-body-read so we can decide whether to take the
// inline-HTML path before allocating the body slice.
func looksLikeHTMLContentType(ct string) bool {
    if ct == "" {
        return false
    }
    low := strings.ToLower(ct)
    if i := strings.Index(low, ";"); i != -1 {
        low = low[:i]
    }
    low = strings.TrimSpace(low)
    return low == "text/html" || low == "application/xhtml+xml"
}

// scanHTMLArtifacts pulls inline <script> bodies and SRI/CSP metadata out
// of an HTML response and feeds each through reportMatchesWithConfig under
// a synthetic source label so operators can locate the exact tag.
func scanHTMLArtifacts(pageURL string, body []byte, config *Config) {
    arts, err := ExtractFromHTML(body)
    if err != nil {
        if config.Verbose {
            fmt.Printf("[%sHTML%s] %s: %v\n", colors["YELLOW"], colors["NC"], pageURL, err)
        }
        return
    }
    for _, sc := range arts.InlineScripts {
        // application/ld+json is structured data; still scan because it
        // sometimes carries access tokens or webhook URLs.
        src := fmt.Sprintf("%s#inline[%d]", pageURL, sc.Index)
        if globalStats != nil {
            statAdd(&globalStats.BytesParsed, int64(len(sc.Body)))
        }
        processed := processJSAnalysis([]byte(sc.Body), config)
        reportMatchesWithConfig(src, processed, config)
    }
    if config.CSPOrigins && len(arts.CSPOrigins) > 0 {
        emitCSPOrigins(pageURL+"#meta-csp", arts.CSPOrigins)
    }
    if config.Verbose && len(arts.ExternalJS) > 0 {
        fmt.Printf("[%sHTML%s] %s: %d external scripts referenced\n",
            colors["CYAN"], colors["NC"], pageURL, len(arts.ExternalJS))
    }
}

// emitCSPOrigins prints CSP-allowed origins one per line, prefixed with
// the source for grep-friendly piping into the URL queue of a follow-up
// scan. Concise and pipeline-friendly.
func emitCSPOrigins(source string, origins []string) {
    for _, o := range origins {
        fmt.Printf("[CSP] %s\t%s\n", source, o)
    }
}

// validateTargetURL refuses internal/loopback/private targets unless the user
// explicitly opts in. Recon tools that follow links are SSRF-prone if they
// blindly fetch any URL the input feeds them; this gate is the smallest
// useful guard. Only http/https are allowed; file:// is always rejected.
func validateTargetURL(urlStr string, allowInternal bool) error {
    if !strings.HasPrefix(urlStr, "http://") && !strings.HasPrefix(urlStr, "https://") {
        return fmt.Errorf("only http(s) URLs are permitted; got %q", urlStr)
    }
    parsed, err := url.Parse(urlStr)
    if err != nil {
        return fmt.Errorf("malformed URL: %w", err)
    }
    host := parsed.Hostname()
    if host == "" {
        return fmt.Errorf("URL has no host")
    }
    if allowInternal {
        return nil
    }
    lowHost := strings.ToLower(host)
    if lowHost == "localhost" || lowHost == "ip6-localhost" || lowHost == "ip6-loopback" {
        return fmt.Errorf("internal target %q blocked (use --allow-internal to override)", host)
    }
    if ip := net.ParseIP(host); ip != nil {
        if ip.IsLoopback() || ip.IsPrivate() || ip.IsLinkLocalUnicast() ||
            ip.IsLinkLocalMulticast() || ip.IsUnspecified() {
            return fmt.Errorf("internal IP %q blocked (use --allow-internal to override)", host)
        }
    }
    return nil
}

func displayAsciiArt() {
    versionStatus := getVersionStatus()
    var statusColor string
    var statusText string
    
    switch versionStatus {
    case "latest":
        statusColor = colors["GREEN"]
        statusText = "latest"
    case "outdated":
        statusColor = colors["RED"]
        statusText = "outdated"
    default:
        statusColor = colors["YELLOW"]
        statusText = "Unknown"
    }
    
    fmt.Printf(`
         ________             __         
     __ / / __/ /  __ _____  / /____ ____
    / // /\ \/ _ \/ // / _ \/ __/ -_) __/
    \___/___/_//_/\_,_/_//_/\__/\__/_/  

     %s (%s%s%s%s)                         Created by cc1a2b
`, version, statusColor, statusText, colors["NC"], "")
}

func customHelp() {
    displayAsciiArt()
    fmt.Println("Usage:")
    fmt.Println("  -u,  --url URL                Input a URL")
    fmt.Println("  -l,  --list FILE.txt          Input a file with URLs (.txt)")
    fmt.Println("  -f,  --file FILE.js           Path to JavaScript file")
    fmt.Println("       --har FILE               Ingest a Chrome DevTools HAR archive")
    fmt.Println()
    fmt.Println("Basic Options:")
    fmt.Println("  -t,  --threads INT            Number of concurrent threads (default: 5)")
    fmt.Println("  -c,  --cookies <cookies>      Authentication cookies for protected resources")
    fmt.Println("  -p,  --proxy host:port        HTTP/SOCKS5 proxy (e.g., 127.0.0.1:8080 for Burp Suite)")
    fmt.Println("  -q,  --quiet                  Suppress ASCII art output")
    fmt.Println("       --no-color               Disable ANSI color (auto-off when not a TTY)")
    fmt.Println("  -o,  --output FILENAME        Output file path (full values, not redacted)")
    fmt.Println("  -r,  --regex <pattern>        RegEx for filtering results")
    fmt.Println("       --update, --up           Update the tool to latest version")
    fmt.Println("  -ep, --end-point              Extract endpoints from JavaScript files")
    fmt.Println("  -k,  --skip-tls               Skip TLS certificate verification")
    fmt.Println("  -fo, --found-only             Only show results when sensitive data is found")
    fmt.Println()
    fmt.Println("HTTP Configuration:")
    fmt.Println("  -H,  --header \"Key: Value\"    Custom HTTP headers (repeatable, including Auth)")
    fmt.Println("  -U,  --user-agent UA          Custom User-Agent string or file path")
    fmt.Println("  -R,  --rate-limit MS          Request rate limiting delay (milliseconds)")
    fmt.Println("  -T,  --timeout SEC            HTTP request timeout (seconds)")
    fmt.Println("  -y,  --retry INT              Retry attempts for failed requests (default: 2)")
    fmt.Println("       --per-host INT           Per-host outbound concurrency cap (default: 4)")
    fmt.Println("       --max-bytes N            Cap response body read in bytes (default: 32MiB)")
    fmt.Println("       --allow-internal         Permit localhost / RFC1918 / link-local targets")
    fmt.Println("       --cache-dir DIR          Persist responses on disk; revalidate via ETag")
    fmt.Println()
    fmt.Println("JavaScript Analysis:")
    fmt.Println("  -d,  --deobfuscate            Deobfuscate minified and obfuscated JavaScript")
    fmt.Println("  -m,  --sourcemap              Fetch and parse source maps + sourcesContent[]")
    fmt.Println("  -e,  --eval                   Analyze dynamic code execution (eval, Function)")
    fmt.Println("  -z,  --obfs-detect            Detect code obfuscation patterns and techniques")
    fmt.Println("       --inline-html            Scan inline <script> tags + SRI/CSP in HTML responses")
    fmt.Println("       --csp-origins            Emit CSP-allowed origins as candidate endpoints")
    fmt.Println()
    fmt.Println("Security Analysis:")
    fmt.Println("  -s,  --secrets                Detect API keys, tokens, and credentials")
    fmt.Println("  -x,  --tokens                 Extract JWT and authentication tokens")
    fmt.Println("  -P,  --params                 Discover hidden parameters and variables")
    fmt.Println("  -PU, --param-urls             Advanced parameter extraction with URL context")
    fmt.Println("  -i,  --internal               Filter for internal/private endpoints")
    fmt.Println("  -g,  --graphql                Analyze GraphQL endpoints and queries")
    fmt.Println("  -B,  --bypass                 Detect WAF bypass patterns and techniques")
    fmt.Println("  -F,  --firebase               Analyze Firebase configurations and keys")
    fmt.Println("  -L,  --links                  Extract and analyze all embedded links")
    fmt.Println()
    fmt.Println("Detection Tuning:")
    fmt.Println("  -mc, --min-confidence FLOAT   Minimum confidence (0.0-1.0) for a finding (default: 0.50)")
    fmt.Println("  -sc, --show-confidence        Print [conf=X.XX] alongside each finding")
    fmt.Println("       --no-fp-filter           Disable the false-positive filter (debug)")
    fmt.Println("       --ignore-file FILE       Permanent suppressions (.jshunterignore)")
    fmt.Println("       --diff PREVIOUS.json     Report only NEW findings vs previous JSON envelope")
    fmt.Println("       --rules-file FILE.json   Load an external JSON rule pack")
    fmt.Println("       --only-rules id,glob     Run only matching rules (supports * glob)")
    fmt.Println("       --disable-rule id,glob   Disable matching rules (supports * glob)")
    fmt.Println()
    fmt.Println("Verification:")
    fmt.Println("       --verify                 Probe findings against provider read-only endpoints")
    fmt.Println("       --verify-timeout SEC     Timeout per verification probe (default: 10)")
    fmt.Println("       --verify-workers INT     Concurrent verifier worker pool (default: 8)")
    fmt.Println()
    fmt.Println("Scope & Discovery:")
    fmt.Println("  -w,  --crawl DEPTH            Recursive JavaScript discovery depth (default: 1)")
    fmt.Println("  -D,  --domain DOMAIN          Limit analysis to specific domain")
    fmt.Println("  -E,  --ext                    Filter by JavaScript file extensions")
    fmt.Println("       --robots                 Fetch /robots.txt for each input host and exit")
    fmt.Println()
    fmt.Println("Output Formats:")
    fmt.Println("  -j,  --json                   Structured JSON output (schema_version 2)")
    fmt.Println("       --ndjson                 Newline-delimited JSON (jq / SIEM streaming)")
    fmt.Println("       --sarif                  SARIF 2.1.0 (GitHub code-scanning compatible)")
    fmt.Println("  -C,  --csv                    CSV format for spreadsheet analysis")
    fmt.Println("  -v,  --verbose                Detailed analysis and debug output")
    fmt.Println("  -n,  --burp                   Burp Suite compatible export format")
    fmt.Println("       --stats                  Per-stage counters on stderr at end of run")
    fmt.Println()
    fmt.Println("Registry:")
    fmt.Println("       --list-rules             Print the rule registry as a table and exit")
    fmt.Println("       --explain RULE_ID        Print full rule details and exit")
    fmt.Println("       --self-test              Run rule registry against built-in TP/FP fixtures")
    fmt.Println()
    fmt.Println("  -h,  --help                   Display this help message")
}

func processStdin(output, regex, cookies, proxy string, threads int) {
    scanner := bufio.NewScanner(os.Stdin)
    for scanner.Scan() {
        line := scanner.Text()
        fmt.Println("Processing line from stdin:", line)

    }
    if err := scanner.Err(); err != nil {
        fmt.Fprintln(os.Stderr, "Error reading from stdin:", err)
    }
}


func isInputFromStdin() bool {
    fi, err := os.Stdin.Stat()
    if err != nil {
        fmt.Println("Error checking stdin:", err)
        return false
    }
    return fi.Mode()&os.ModeCharDevice == 0
}

// isStdoutTTY returns true when stdout is a terminal — used to auto-disable
// ANSI color when the operator is piping or redirecting output. The
// `os.ModeCharDevice` check is the same heuristic POSIX `isatty(3)` exposes.
func isStdoutTTY() bool {
    fi, err := os.Stdout.Stat()
    if err != nil {
        return false
    }
    return fi.Mode()&os.ModeCharDevice != 0
}

func disableColors() {
    for k := range colors {
        colors[k] = ""
    }
}


func processJSFile(jsFile, regex string) {
    // Create minimal config for backward compatibility
    config := &Config{
        Regex: regex,
    }
    processJSFileWithConfig(jsFile, config)
}


func processInputs(url, list, output, regex, cookie, proxy string, threads int, skipTLS, foundOnly bool) {
    // Create config for backward compatibility
    config := &Config{
        URL: url, List: list, Output: output, Regex: regex,
        Cookies: cookie, Proxy: proxy, Threads: threads,
        SkipTLS: skipTLS, FoundOnly: foundOnly,
        Timeout: 30, Retry: 2,
    }
    processInputsWithConfig(url, config)
    return
}

func processInputsOld(url, list, output, regex, cookie, proxy string, threads int, skipTLS, foundOnly bool) {
    var wg sync.WaitGroup
    urlChannel := make(chan string)

    var fileWriter *os.File
    if output != "" {
        var err error
        fileWriter, err = os.Create(output)
        if err != nil {
            fmt.Printf("Error creating output file: %v\n", err)
            return
        }
        defer fileWriter.Close()
    }

    for i := 0; i < threads; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for u := range urlChannel {
                // Create minimal config for each request
                config := &Config{
                    Regex: regex, Cookies: cookie, Proxy: proxy,
                    SkipTLS: skipTLS, FoundOnly: foundOnly,
                    Timeout: 30, Retry: 1,
                }
                _, sensitiveData := searchForSensitiveDataWithConfig(u, config)

                // Don't print sensitive data if ParamURLs flag is set (user only wants URL params)
                if !config.ParamURLs {
                    if fileWriter != nil {
                        fmt.Fprintln(fileWriter, "URL:", u)
                        for name, matches := range sensitiveData {
                            for _, match := range matches {
                                fmt.Fprintf(fileWriter, "Sensitive Data [%s%s%s]: %s\n", colors["YELLOW"], name, colors["NC"], match)
                            }
                        }
                    } else {
                        for name, matches := range sensitiveData {
                            for _, match := range matches {
                                fmt.Printf("Sensitive Data [%s%s%s]: %s\n", colors["YELLOW"], name, colors["NC"], match)
                            }
                        }
                    }
                }
            }
        }()
    }

    if err := enqueueURLs(url, list, urlChannel, regex); err != nil {
        fmt.Printf("Error in input processing: %v\n", err)
        close(urlChannel)
        return
    }

    close(urlChannel)
    wg.Wait()
    
    // Print buffered MISSING messages only if no findings were made
    // This is for the old/legacy function - always clear buffer
    globalSeenMutex.Lock()
    foundAny := globalFoundAny
    globalSeenMutex.Unlock()
    
    if !foundAny && !foundOnly {
        missingMutex.Lock()
        for _, msg := range missingMessages {
            fmt.Printf("[%sMISSING%s] No sensitive data found at: %s\n", colors["BLUE"], colors["NC"], msg)
        }
        missingMessages = missingMessages[:0] // Clear the buffer
        missingMutex.Unlock()
    } else {
        // Clear the buffer if findings were made
        missingMutex.Lock()
        missingMessages = missingMessages[:0]
        missingMutex.Unlock()
    }
}


func enqueueURLs(url, list string, urlChannel chan<- string, regex string) error {
    if list != "" {
        return enqueueFromFile(list, urlChannel)
    } else if url != "" {
        enqueueSingleURL(url, urlChannel, regex)
    } else {
        enqueueFromStdin(urlChannel)
    }
    return nil
}

func enqueueFromFile(filename string, urlChannel chan<- string) error {
    file, err := os.Open(filename)
    if err != nil {
        return fmt.Errorf("Error opening file: %w", err)
    }
    defer file.Close()

    scanner := bufio.NewScanner(file)
    for scanner.Scan() {
        urlChannel <- scanner.Text()
    }
    return scanner.Err()
}

func enqueueSingleURL(url string, urlChannel chan<- string, regex string) {
    if strings.HasPrefix(url, "http://") || strings.HasPrefix(url, "https://") {
        urlChannel <- url
    } else {
        processJSFile(url, regex)
    }
}

func enqueueFromStdin(urlChannel chan<- string) {
    scanner := bufio.NewScanner(os.Stdin)
    for scanner.Scan() {
        urlChannel <- scanner.Text()
    }
    if err := scanner.Err(); err != nil {
        fmt.Printf("Error reading from stdin: %v\n", err)
    }
}


// isTLSCanceledError checks if an error is a TLS cancellation error (common with proxy interception)
func isTLSCanceledError(err error) bool {
    if err == nil {
        return false
    }
    errStr := strings.ToLower(err.Error())
    // Check for various TLS and connection errors that can occur with proxy interception
    return strings.Contains(errStr, "tls: user canceled") || 
           strings.Contains(errStr, "user canceled") ||
           strings.Cont

Download .txt

gitextract_f1fvidta/

├── .gitignore
├── .jshunterignore.example
├── CHANGELOG.md
├── CREDITS.md
├── LICENSE
├── README.md
├── RULES.md
├── cmd/
│   └── jshunter/
│       └── main.go
├── go.mod
├── go.sum
├── internal/
│   └── jshunter/
│       ├── aws_pair.go
│       ├── cache.go
│       ├── concurrent_verify.go
│       ├── crawler.go
│       ├── csp.go
│       ├── detection.go
│       ├── diff.go
│       ├── har.go
│       ├── html_extract.go
│       ├── ignore.go
│       ├── jshunter.go
│       ├── ndjson.go
│       ├── robots.go
│       ├── rules_cli.go
│       ├── rules_loader.go
│       ├── sarif.go
│       ├── sourcemap.go
│       ├── stats.go
│       └── verify.go
└── patterns.json

Download .txt

SYMBOL INDEX (235 symbols across 20 files)

FILE: cmd/jshunter/main.go
  function main (line 5) | func main() {

FILE: internal/jshunter/aws_pair.go
  constant awsService (line 26) | awsService = "sts"
  constant awsRegion (line 27) | awsRegion  = "us-east-1"
  constant awsHost (line 28) | awsHost    = "sts.amazonaws.com"
  type AWSPair (line 32) | type AWSPair struct
  function pairAWSCredentials (line 43) | func pairAWSCredentials() []AWSPair {
  function verifyAWSPair (line 84) | func verifyAWSPair(ctx context.Context, client *http.Client, p AWSPair) ...
  function awsDeriveSigningKey (line 169) | func awsDeriveSigningKey(secret, dateStr, region, service string) []byte {
  function sha256Hex (line 176) | func sha256Hex(b []byte) string {
  function hmacBytes (line 181) | func hmacBytes(key, msg []byte) []byte {
  function hmacHex (line 187) | func hmacHex(key, msg []byte) string {

FILE: internal/jshunter/cache.go
  type cacheMeta (line 29) | type cacheMeta struct
  type DiskCache (line 39) | type DiskCache struct
    method keyFor (line 54) | func (c *DiskCache) keyFor(u string) string {
    method bodyPath (line 59) | func (c *DiskCache) bodyPath(u string) string {
    method metaPath (line 63) | func (c *DiskCache) metaPath(u string) string {
    method Lookup (line 69) | func (c *DiskCache) Lookup(u string) (body []byte, meta *cacheMeta, ok...
    method Store (line 93) | func (c *DiskCache) Store(u string, resp *http.Response, body []byte) ...
    method AttachConditional (line 128) | func (c *DiskCache) AttachConditional(req *http.Request) {
  function NewDiskCache (line 44) | func NewDiskCache(dir string) (*DiskCache, error) {

FILE: internal/jshunter/concurrent_verify.go
  function VerifyAllConcurrent (line 19) | func VerifyAllConcurrent(findings []*Finding, client *http.Client, timeo...

FILE: internal/jshunter/crawler.go
  constant defaultPerHostConcurrency (line 18) | defaultPerHostConcurrency = 4
  constant defaultBreakerThreshold (line 19) | defaultBreakerThreshold   = 5
  constant defaultBreakerCooldown (line 20) | defaultBreakerCooldown    = 30 * time.Second
  type hostController (line 26) | type hostController struct
    method host (line 54) | func (c *hostController) host(h string) *hostState {
    method acquire (line 68) | func (c *hostController) acquire(host string) (release func(), allowed...
    method recordOutcome (line 86) | func (c *hostController) recordOutcome(host string, status int, retryA...
  type hostState (line 32) | type hostState struct
  function getHostController (line 43) | func getHostController() *hostController {
  function parseRetryAfter (line 113) | func parseRetryAfter(h http.Header) time.Duration {
  function backoffWithJitter (line 133) | func backoffWithJitter(attempt int) time.Duration {
  function hostOf (line 146) | func hostOf(rawURL string) string {
  function describeBreaker (line 155) | func describeBreaker(host string) string {

FILE: internal/jshunter/csp.go
  function ParseCSPOrigins (line 13) | func ParseCSPOrigins(policy string) []string {

FILE: internal/jshunter/detection.go
  constant SchemaVersion (line 20) | SchemaVersion        = 2
  constant DefaultMinConfidence (line 21) | DefaultMinConfidence = 0.50
  constant DefaultMaxBytes (line 22) | DefaultMaxBytes      = 32 * 1024 * 1024
  constant contextWindow (line 23) | contextWindow        = 96
  type Severity (line 26) | type Severity
  constant SevCritical (line 29) | SevCritical Severity = "critical"
  constant SevHigh (line 30) | SevHigh     Severity = "high"
  constant SevMedium (line 31) | SevMedium   Severity = "medium"
  constant SevLow (line 32) | SevLow      Severity = "low"
  constant SevInfo (line 33) | SevInfo     Severity = "info"
  type Rule (line 37) | type Rule struct
  type Location (line 58) | type Location struct
  type Finding (line 67) | type Finding struct
  function registerRules (line 158) | func registerRules() {
  function shannonEntropy (line 614) | func shannonEntropy(s string) float64 {
  function charClassDiversity (line 633) | func charClassDiversity(s string) int {
  function redactValue (line 665) | func redactValue(v string) string {
  function hashValue (line 679) | func hashValue(v string) string {
  function looksLikeFixture (line 685) | func looksLikeFixture(context string) bool {
  function hasContextKeyword (line 697) | func hasContextKeyword(context string, kws []string) bool {
  function isInVendorNoise (line 711) | func isInVendorNoise(v string) (bool, string) {
  function extractContextWindow (line 724) | func extractContextWindow(body string, start, end int) string {
  function scoreFinding (line 740) | func scoreFinding(rule *Rule, value, context, source string) (bool, floa...
  function recordFinding (line 849) | func recordFinding(f *Finding) *Finding {
  function flushFindings (line 881) | func flushFindings() []*Finding {
  function resetFindings (line 905) | func resetFindings() {
  function analyzeBody (line 915) | func analyzeBody(source string, body []byte, minConfidence float64) []*F...
  function positionAt (line 987) | func positionAt(s string, idx int) (line, col int) {
  function lineStartIndex (line 1006) | func lineStartIndex(s string, idx int) int {
  function lineEndIndex (line 1021) | func lineEndIndex(s string, idx int) int {
  function applyLegacyFPFilter (line 1039) | func applyLegacyFPFilter(name, value, body, source string, start, end in...
  function validateAWSAccessKeyID (line 1123) | func validateAWSAccessKeyID(v string) (bool, []string) {
  function validateAWSSecretKey (line 1146) | func validateAWSSecretKey(v string) (bool, []string) {
  function validateStripeKey (line 1161) | func validateStripeKey(v string) (bool, []string) {
  function validateGitHubToken (line 1189) | func validateGitHubToken(v string) (bool, []string) {
  function base62EncodeCRC32 (line 1212) | func base62EncodeCRC32(n uint32) string {
  function validateSlackToken (line 1230) | func validateSlackToken(v string) (bool, []string) {
  function validateTwilioSK (line 1250) | func validateTwilioSK(v string) (bool, []string) {
  function validateJWT (line 1269) | func validateJWT(v string) (bool, []string) {
  type SelfTestResult (line 1304) | type SelfTestResult struct
  function runSelfTest (line 1318) | func runSelfTest() []SelfTestResult {

FILE: internal/jshunter/diff.go
  function DiffPrevious (line 18) | func DiffPrevious(path string) (map[string]bool, error) {

FILE: internal/jshunter/har.go
  type harFile (line 15) | type harFile struct
  type harEntry (line 21) | type harEntry struct
  function IngestHAR (line 38) | func IngestHAR(path string, config *Config) (int, error) {
  function harBase64Decode (line 88) | func harBase64Decode(b []byte) ([]byte, error) {

FILE: internal/jshunter/html_extract.go
  type HTMLArtifacts (line 16) | type HTMLArtifacts struct
  type InlineScript (line 23) | type InlineScript struct
  type ExternalJS (line 33) | type ExternalJS struct
  function ExtractFromHTML (line 44) | func ExtractFromHTML(body []byte) (*HTMLArtifacts, error) {
  function readUntilEndTag (line 119) | func readUntilEndTag(z *html.Tokenizer, tag string) (string, error) {
  function tagAttrs (line 137) | func tagAttrs(t html.Token) map[string]string {
  function hasAttr (line 145) | func hasAttr(t html.Token, name string) bool {
  function looksLikeHTML (line 157) | func looksLikeHTML(body []byte, contentType string) bool {

FILE: internal/jshunter/ignore.go
  type IgnoreEntry (line 23) | type IgnoreEntry struct
  type IgnoreList (line 29) | type IgnoreList struct
    method ShouldIgnore (line 91) | func (il *IgnoreList) ShouldIgnore(f *Finding) bool {
  function LoadIgnoreFile (line 37) | func LoadIgnoreFile(path string) (*IgnoreList, error) {
  function parseIgnoreReader (line 52) | func parseIgnoreReader(r io.Reader) (*IgnoreList, error) {
  function globMatch (line 118) | func globMatch(pattern, s string) bool {

FILE: internal/jshunter/jshunter.go
  type progressReader (line 261) | type progressReader struct
    method Read (line 269) | func (pr *progressReader) Read(p []byte) (int, error) {
  type flagList (line 283) | type flagList
    method String (line 285) | func (f *flagList) String() string {
    method Set (line 289) | func (f *flagList) Set(value string) error {
  type Config (line 295) | type Config struct
  function Run (line 375) | func Run() {
  function runRobotsCLI (line 904) | func runRobotsCLI(config *Config) {
  function collectHostsFromInput (line 937) | func collectHostsFromInput(config *Config) []string {
  function emitFinalOutput (line 979) | func emitFinalOutput(config *Config) {
  function runSelfTestCLI (line 1039) | func runSelfTestCLI() {
  function looksLikeHTMLContentType (line 1066) | func looksLikeHTMLContentType(ct string) bool {
  function scanHTMLArtifacts (line 1081) | func scanHTMLArtifacts(pageURL string, body []byte, config *Config) {
  function emitCSPOrigins (line 1111) | func emitCSPOrigins(source string, origins []string) {
  function validateTargetURL (line 1121) | func validateTargetURL(urlStr string, allowInternal bool) error {
  function displayAsciiArt (line 1149) | func displayAsciiArt() {
  function customHelp (line 1176) | func customHelp() {
  function processStdin (line 1265) | func processStdin(output, regex, cookies, proxy string, threads int) {
  function isInputFromStdin (line 1278) | func isInputFromStdin() bool {
  function isStdoutTTY (line 1290) | func isStdoutTTY() bool {
  function disableColors (line 1298) | func disableColors() {
  function processJSFile (line 1305) | func processJSFile(jsFile, regex string) {
  function processInputs (line 1314) | func processInputs(url, list, output, regex, cookie, proxy string, threa...
  function processInputsOld (line 1326) | func processInputsOld(url, list, output, regex, cookie, proxy string, th...
  function enqueueURLs (line 1406) | func enqueueURLs(url, list string, urlChannel chan<- string, regex strin...
  function enqueueFromFile (line 1417) | func enqueueFromFile(filename string, urlChannel chan<- string) error {
  function enqueueSingleURL (line 1431) | func enqueueSingleURL(url string, urlChannel chan<- string, regex string) {
  function enqueueFromStdin (line 1439) | func enqueueFromStdin(urlChannel chan<- string) {
  function isTLSCanceledError (line 1451) | func isTLSCanceledError(err error) bool {
  function isJavaScriptContentType (line 1466) | func isJavaScriptContentType(contentType string) bool {
  function isValidStatusCode (line 1496) | func isValidStatusCode(statusCode int) bool {
  function isNonJavaScriptContentType (line 1502) | func isNonJavaScriptContentType(contentType string) bool {
  function shouldProcessResponse (line 1576) | func shouldProcessResponse(resp *http.Response, urlStr string, config *C...
  function searchForSensitiveData (line 1606) | func searchForSensitiveData(urlStr, regex, cookie, proxyStr string, skip...
  function isUnwantedEmail (line 1744) | func isUnwantedEmail(email string) bool {
  function reportMatches (line 1781) | func reportMatches(source string, body []byte, regexPatterns map[string]...
  function getVersionStatus (line 1840) | func getVersionStatus() string {
  function updateTool (line 1876) | func updateTool() {
  function processJSFileForEndpoints (line 2058) | func processJSFileForEndpoints(jsFile, regex, output string) {
  function processInputsForEndpoints (line 2076) | func processInputsForEndpoints(url, list, output, regex, cookie, proxy s...
  function processInputsForEndpointsOld (line 2088) | func processInputsForEndpointsOld(url, list, output, regex, cookie, prox...
  function extractEndpointsFromFile (line 2141) | func extractEndpointsFromFile(filePath, regex string) []string {
  function extractEndpointsFromURL (line 2151) | func extractEndpointsFromURL(urlStr, regex, cookie, proxy string, skipTL...
  function extractEndpointsFromContent (line 2163) | func extractEndpointsFromContent(content, regex, targetDomain string) []...
  function cleanEndpoint (line 2292) | func cleanEndpoint(endpoint string) string {
  function isValidEndpoint (line 2318) | func isValidEndpoint(endpoint string) bool {
  function displayEndpoints (line 2443) | func displayEndpoints(endpoints []string, source string) {
  function writeEndpointsToFile (line 2452) | func writeEndpointsToFile(endpoints []string, outputFile, source string) {
  function contains (line 2470) | func contains(slice []string, item string) bool {
  function createHTTPClientWithConfig (line 2480) | func createHTTPClientWithConfig(config *Config) *http.Client {
  function makeRequestWithRetry (line 2557) | func makeRequestWithRetry(client *http.Client, req *http.Request, config...
  function searchForSensitiveDataWithConfig (line 2615) | func searchForSensitiveDataWithConfig(urlStr string, config *Config) (st...
  function stripJSComments (line 2800) | func stripJSComments(body []byte) []byte {
  function processJSAnalysis (line 2861) | func processJSAnalysis(body []byte, config *Config) []byte {
  function basicDeobfuscate (line 2891) | func basicDeobfuscate(content string) string {
  function extractSourceMap (line 2899) | func extractSourceMap(content string) string {
  function extractEvalContent (line 2909) | func extractEvalContent(content string) string {
  function isObfuscated (line 2921) | func isObfuscated(content string) bool {
  function extractURLParamsWithBaseURLs (line 2933) | func extractURLParamsWithBaseURLs(content, source string) []string {
  function groupParamsByContext (line 3460) | func groupParamsByContext(content string, paramSet map[string]bool) [][]...
  function cleanURL (line 3530) | func cleanURL(urlStr string) string {
  function isValidURL (line 3544) | func isValidURL(urlStr string) bool {
  function isPlaceholderURL (line 3579) | func isPlaceholderURL(urlStr string) bool {
  function isURLInComment (line 3606) | func isURLInComment(context, match string) bool {
  function isMatchInBase64DataURI (line 3649) | func isMatchInBase64DataURI(context, match string) bool {
  function isLikelyBase64MediaData (line 3697) | func isLikelyBase64MediaData(context, match string) bool {
  function looksLikeBase64 (line 3751) | func looksLikeBase64(s string) bool {
  function hasHighBase64Entropy (line 3771) | func hasHighBase64Entropy(s string) bool {
  function isPartOfLargerBase64String (line 3798) | func isPartOfLargerBase64String(context string, matchPos, matchLen int) ...
  function extractDomain (line 3820) | func extractDomain(urlStr string) string {
  function extractBaseDomain (line 3840) | func extractBaseDomain(domain string) string {
  function isSameBaseDomain (line 3872) | func isSameBaseDomain(domain1, domain2 string) bool {
  function isMatchInURL (line 3884) | func isMatchInURL(context, match, sourceDomain string) bool {
  function filterMatchesByDomain (line 3916) | func filterMatchesByDomain(matches []string, sourceURL string) []string {
  function reportMatchesWithConfig (line 3974) | func reportMatchesWithConfig(source string, body []byte, config *Config)...
  function outputJSON (line 4598) | func outputJSON(source string, matchesMap map[string][]string) {
  function outputCSV (line 4611) | func outputCSV(source string, matchesMap map[string][]string) {
  function outputBurp (line 4622) | func outputBurp(source string, matchesMap map[string][]string) {
  function processInputsWithConfig (line 4632) | func processInputsWithConfig(url string, config *Config) {
  function processInputsForEndpointsWithConfig (line 4721) | func processInputsForEndpointsWithConfig(url string, config *Config) {
  function processJSFileWithConfig (line 4769) | func processJSFileWithConfig(jsFile string, config *Config) {
  function processJSFileForEndpointsWithConfig (line 4837) | func processJSFileForEndpointsWithConfig(jsFile string, config *Config) {
  function extractEndpointsFromURLWithConfig (line 4856) | func extractEndpointsFromURLWithConfig(urlStr string, config *Config) []...
  function crawlAndProcessJS (line 4933) | func crawlAndProcessJS(initialURL string, config *Config, depth int, vis...

FILE: internal/jshunter/ndjson.go
  function outputNDJSON (line 12) | func outputNDJSON() {

FILE: internal/jshunter/robots.go
  type RobotsResult (line 24) | type RobotsResult struct
  function FetchRobots (line 36) | func FetchRobots(client *http.Client, baseURL string, ua string) (*Robot...
  function parseRobots (line 73) | func parseRobots(target, ua string, body []byte) *RobotsResult {

FILE: internal/jshunter/rules_cli.go
  function runListRules (line 14) | func runListRules() {
  function runExplainRule (line 46) | func runExplainRule(id string) {
  function applyRuleSelection (line 106) | func applyRuleSelection(only, disable string) int {

FILE: internal/jshunter/rules_loader.go
  type ExternalRule (line 16) | type ExternalRule struct
  function LoadRulesFile (line 41) | func LoadRulesFile(path string) (int, error) {
  function validateAndCompileExternalRules (line 70) | func validateAndCompileExternalRules(ext []ExternalRule) ([]Rule, error) {
  function normalizeSeverity (line 140) | func normalizeSeverity(s string) Severity {

FILE: internal/jshunter/sarif.go
  type SARIFEnvelope (line 13) | type SARIFEnvelope struct
  type SARIFRun (line 19) | type SARIFRun struct
  type SARIFTool (line 24) | type SARIFTool struct
  type SARIFDriver (line 28) | type SARIFDriver struct
  type SARIFRule (line 35) | type SARIFRule struct
  type SARIFText (line 45) | type SARIFText struct
  type SARIFRuleConfiguration (line 49) | type SARIFRuleConfiguration struct
  type SARIFResult (line 53) | type SARIFResult struct
  type SARIFLocation (line 62) | type SARIFLocation struct
  type SARIFPhysicalLocation (line 66) | type SARIFPhysicalLocation struct
  type SARIFArtifactLocation (line 71) | type SARIFArtifactLocation struct
  type SARIFRegion (line 75) | type SARIFRegion struct
  function severityToSARIFLevel (line 83) | func severityToSARIFLevel(sev Severity) string {
  function ToSARIF (line 98) | func ToSARIF() *SARIFEnvelope {
  function outputSARIF (line 177) | func outputSARIF() {

FILE: internal/jshunter/sourcemap.go
  type sourceMap (line 28) | type sourceMap struct
  function FetchAndScanSourceMap (line 47) | func FetchAndScanSourceMap(client *http.Client, baseURL string, body []b...
  function sourceLabel (line 93) | func sourceLabel(baseURL string, sm *sourceMap, idx int) string {
  function fetchSourceMapPayload (line 108) | func fetchSourceMapPayload(client *http.Client, baseURL, ref string, con...
  function decodeDataURI (line 157) | func decodeDataURI(uri string) ([]byte, error) {

FILE: internal/jshunter/stats.go
  type Stats (line 16) | type Stats struct
  function initStats (line 45) | func initStats() *Stats {
  function newRunID (line 56) | func newRunID() string {
  function statInc (line 66) | func statInc(p *int64) {
  function statAdd (line 72) | func statAdd(p *int64, n int64) {
  function printStats (line 81) | func printStats(s *Stats) {

FILE: internal/jshunter/verify.go
  type Verifier (line 17) | type Verifier
  type VerifyResult (line 20) | type VerifyResult struct
  function registerVerifiers (line 39) | func registerVerifiers() {
  type hostLimiter (line 82) | type hostLimiter struct
    method acquire (line 97) | func (h *hostLimiter) acquire(host string) func() {
  function newHostLimiter (line 89) | func newHostLimiter(per int, cooldown time.Duration) *hostLimiter {
  function runVerify (line 114) | func runVerify(ruleID, value string, client *http.Client, timeout time.D...
  function doVerifyRequest (line 128) | func doVerifyRequest(ctx context.Context, client *http.Client, req *http...
  function sanitizeNetErr (line 145) | func sanitizeNetErr(msg string) string {
  type capReader (line 157) | type capReader struct
    method Read (line 163) | func (c *capReader) Read(p []byte) (int, error) {
  function stripeVerify (line 177) | func stripeVerify(ctx context.Context, client *http.Client, value string...
  function githubVerify (line 201) | func githubVerify(ctx context.Context, client *http.Client, value string...
  function openaiVerify (line 231) | func openaiVerify(ctx context.Context, client *http.Client, value string...
  function anthropicVerify (line 254) | func anthropicVerify(ctx context.Context, client *http.Client, value str...
  function slackVerify (line 279) | func slackVerify(ctx context.Context, client *http.Client, value string)...
  function sendgridVerify (line 316) | func sendgridVerify(ctx context.Context, client *http.Client, value stri...
  function mailgunVerify (line 339) | func mailgunVerify(ctx context.Context, client *http.Client, value strin...
  function huggingfaceVerify (line 362) | func huggingfaceVerify(ctx context.Context, client *http.Client, value s...

Download .json

Condensed preview — 30 files, each showing path, character count, and a content snippet. Download the .json file or copy for the full structured content (401K chars).

[
  {
    "path": ".gitignore",
    "chars": 243,
    "preview": "# Compiled binary (rebuilt locally; release artifacts attached on GitHub)\n/jshunter\n/jshunter.exe\n/jshunter_*\n/dist/\n\n# "
  },
  {
    "path": ".jshunterignore.example",
    "chars": 838,
    "preview": "# JSHunter ignore file.\n# One entry per line. Blank lines and lines starting with `#` are skipped.\n#\n# Supported kinds:\n"
  },
  {
    "path": "CHANGELOG.md",
    "chars": 17493,
    "preview": "# Changelog\n\nAll notable changes to JSHunter are tracked here. Dates are ISO-8601.\n\n## [v0.6 — page-aware crawling, sour"
  },
  {
    "path": "CREDITS.md",
    "chars": 3020,
    "preview": "# Credits\n\nJSHunter is a competitive recon tool. Pretending it sprang from nowhere would\nbe dishonest — the secret-detec"
  },
  {
    "path": "LICENSE",
    "chars": 1079,
    "preview": "MIT License\n\nCopyright (c) 2024-2026 Hussain Alsharman\n\nPermission is hereby granted, free of charge, to any person obta"
  },
  {
    "path": "README.md",
    "chars": 20612,
    "preview": "# JSHunter\n\n<div align=\"center\">\n\n[![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)\n[![Go Version"
  },
  {
    "path": "RULES.md",
    "chars": 6719,
    "preview": "# Rule schema\n\nJSHunter v0.6 ships with two rule sources: the **built-in registry** (Go code\nin `detection.go`) and **ex"
  },
  {
    "path": "cmd/jshunter/main.go",
    "chars": 101,
    "preview": "package main\n\nimport \"github.com/cc1a2b/jshunter/internal/jshunter\"\n\nfunc main() {\n\tjshunter.Run()\n}\n"
  },
  {
    "path": "go.mod",
    "chars": 99,
    "preview": "module github.com/cc1a2b/jshunter\n\ngo 1.24.0\n\ntoolchain go1.24.5\n\nrequire golang.org/x/net v0.49.0\n"
  },
  {
    "path": "go.sum",
    "chars": 153,
    "preview": "golang.org/x/net v0.49.0 h1:eeHFmOGUTtaaPSGNmjBKpbng9MulQsJURQUAfUwY++o=\ngolang.org/x/net v0.49.0/go.mod h1:/ysNB2Evaqve"
  },
  {
    "path": "internal/jshunter/aws_pair.go",
    "chars": 5624,
    "preview": "package jshunter\n\nimport (\n\t\"context\"\n\t\"crypto/hmac\"\n\t\"crypto/sha256\"\n\t\"encoding/hex\"\n\t\"fmt\"\n\t\"io\"\n\t\"net/http\"\n\t\"strings"
  },
  {
    "path": "internal/jshunter/cache.go",
    "chars": 3723,
    "preview": "package jshunter\n\nimport (\n\t\"crypto/sha256\"\n\t\"encoding/hex\"\n\t\"encoding/json\"\n\t\"fmt\"\n\t\"net/http\"\n\t\"os\"\n\t\"path/filepath\"\n\t"
  },
  {
    "path": "internal/jshunter/concurrent_verify.go",
    "chars": 1692,
    "preview": "package jshunter\n\nimport (\n\t\"context\"\n\t\"net/http\"\n\t\"sync\"\n\t\"time\"\n)\n\n// VerifyAllConcurrent runs liveness probes against"
  },
  {
    "path": "internal/jshunter/crawler.go",
    "chars": 4258,
    "preview": "package jshunter\n\nimport (\n\t\"fmt\"\n\t\"math/rand\"\n\t\"net/http\"\n\t\"net/url\"\n\t\"strconv\"\n\t\"sync\"\n\t\"time\"\n)\n\n// Per-host concurre"
  },
  {
    "path": "internal/jshunter/csp.go",
    "chars": 1765,
    "preview": "package jshunter\n\nimport (\n\t\"strings\"\n)\n\n// ParseCSPOrigins extracts host origins from a Content-Security-Policy\n// head"
  },
  {
    "path": "internal/jshunter/detection.go",
    "chars": 42185,
    "preview": "package jshunter\n\nimport (\n\t\"crypto/sha256\"\n\t\"encoding/base64\"\n\t\"encoding/hex\"\n\t\"encoding/json\"\n\t\"fmt\"\n\t\"hash/crc32\"\n\t\"m"
  },
  {
    "path": "internal/jshunter/diff.go",
    "chars": 1235,
    "preview": "package jshunter\n\nimport (\n\t\"encoding/json\"\n\t\"fmt\"\n\t\"os\"\n)\n\n// DiffPrevious reads a previous schema-v2 envelope and retu"
  },
  {
    "path": "internal/jshunter/har.go",
    "chars": 2778,
    "preview": "package jshunter\n\nimport (\n\t\"encoding/base64\"\n\t\"encoding/json\"\n\t\"fmt\"\n\t\"os\"\n\t\"strings\"\n)\n\n// HAR (HTTP Archive) ingestio"
  },
  {
    "path": "internal/jshunter/html_extract.go",
    "chars": 4686,
    "preview": "package jshunter\n\nimport (\n\t\"bytes\"\n\t\"fmt\"\n\t\"io\"\n\t\"strings\"\n\n\t\"golang.org/x/net/html\"\n)\n\n// HTMLArtifacts is the structu"
  },
  {
    "path": "internal/jshunter/ignore.go",
    "chars": 3218,
    "preview": "package jshunter\n\nimport (\n\t\"bufio\"\n\t\"fmt\"\n\t\"io\"\n\t\"os\"\n\t\"path/filepath\"\n\t\"strings\"\n)\n\n// .jshunterignore is the operator"
  },
  {
    "path": "internal/jshunter/jshunter.go",
    "chars": 212519,
    "preview": "package jshunter\n\nimport (\n    \"bufio\"\n    \"context\"\n    \"crypto/tls\"\n    \"encoding/csv\"\n    \"encoding/json\"\n    \"flag\"\n"
  },
  {
    "path": "internal/jshunter/ndjson.go",
    "chars": 512,
    "preview": "package jshunter\n\nimport (\n\t\"encoding/json\"\n\t\"fmt\"\n\t\"os\"\n)\n\n// outputNDJSON streams the dedupe-table snapshot one findin"
  },
  {
    "path": "internal/jshunter/robots.go",
    "chars": 3562,
    "preview": "package jshunter\n\nimport (\n\t\"bufio\"\n\t\"context\"\n\t\"fmt\"\n\t\"io\"\n\t\"net/http\"\n\t\"strings\"\n\t\"time\"\n)\n\n// robots.txt is recon gol"
  },
  {
    "path": "internal/jshunter/rules_cli.go",
    "chars": 4427,
    "preview": "package jshunter\n\nimport (\n\t\"encoding/json\"\n\t\"fmt\"\n\t\"os\"\n\t\"sort\"\n\t\"strings\"\n)\n\n// runListRules prints every registered r"
  },
  {
    "path": "internal/jshunter/rules_loader.go",
    "chars": 4990,
    "preview": "package jshunter\n\nimport (\n\t\"encoding/json\"\n\t\"fmt\"\n\t\"os\"\n\t\"regexp\"\n\t\"strings\"\n)\n\n// ExternalRule is the JSON-friendly se"
  },
  {
    "path": "internal/jshunter/sarif.go",
    "chars": 5630,
    "preview": "package jshunter\n\nimport (\n\t\"encoding/json\"\n\t\"fmt\"\n\t\"os\"\n)\n\n// SARIF 2.1.0 output. Lets JSHunter feed GitHub Code Scanni"
  },
  {
    "path": "internal/jshunter/sourcemap.go",
    "chars": 5644,
    "preview": "package jshunter\n\nimport (\n\t\"context\"\n\t\"encoding/json\"\n\t\"fmt\"\n\t\"io\"\n\t\"net/http\"\n\t\"net/url\"\n\t\"regexp\"\n\t\"strings\"\n\t\"time\"\n"
  },
  {
    "path": "internal/jshunter/stats.go",
    "chars": 3762,
    "preview": "package jshunter\n\nimport (\n\t\"crypto/rand\"\n\t\"encoding/hex\"\n\t\"fmt\"\n\t\"os\"\n\t\"sync/atomic\"\n\t\"time\"\n)\n\n// Stats is the operato"
  },
  {
    "path": "internal/jshunter/verify.go",
    "chars": 12618,
    "preview": "package jshunter\n\nimport (\n\t\"context\"\n\t\"encoding/json\"\n\t\"fmt\"\n\t\"io\"\n\t\"net/http\"\n\t\"strings\"\n\t\"sync\"\n\t\"time\"\n)\n\n// Verifie"
  },
  {
    "path": "patterns.json",
    "chars": 545,
    "preview": "{\n  \"ajax_url\": \"\\\\.ajax\\\\s*\\\\(\\\\s*[\\\"'][^\\\"']*[\\\"']\",\n  \"api_endpoint\": \"[\\\"']/api/[a-zA-Z0-9._~:/?#[\\\\]@!$\\u0026'()*+,"
  }
]

About this extraction

This page contains the full source code of the cc1a2b/JShunter GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 30 files (366.9 KB), approximately 96.3k tokens, and a symbol index with 235 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.

Extract another repo