Repository: laramies/theHarvester
Branch: master
Commit: 53e13662409e
Files: 127
Total size: 2.1 MB

Directory structure:
gitextract_7wyx50xx/

├── .dockerignore
├── .git-blame-ignore-revs
├── .gitattributes
├── .github/
│   ├── FUNDING.yml
│   ├── ISSUE_TEMPLATE/
│   │   └── issue-template.md
│   ├── dependabot.yml
│   └── workflows/
│       ├── codeql-analysis.yml
│       ├── docker-build-push.yml
│       ├── dockerci.yml
│       └── theHarvester.yml
├── .gitignore
├── CHANGELOG.md
├── Dockerfile
├── README/
│   ├── CONTRIBUTING.md
│   ├── COPYING
│   └── LICENSES
├── README.md
├── bin/
│   ├── restfulHarvest
│   └── theHarvester
├── docker-compose.yml
├── pyproject.toml
├── tests/
│   ├── __init__.py
│   ├── discovery/
│   │   ├── __init__.py
│   │   ├── test_baidusearch.py
│   │   ├── test_censys.py
│   │   ├── test_certspotter.py
│   │   ├── test_criminalip.py
│   │   ├── test_githubcode.py
│   │   ├── test_githubcode_additions.py
│   │   ├── test_otx.py
│   │   ├── test_rocketreach.py
│   │   ├── test_shodan_engine.py
│   │   └── test_thc.py
│   ├── lib/
│   │   ├── test_core.py
│   │   └── test_output.py
│   ├── test_hackertarget_apikey.py
│   ├── test_mojeek.py
│   ├── test_myparser.py
│   └── test_security.py
└── theHarvester/
    ├── __init__.py
    ├── __main__.py
    ├── data/
    │   ├── proxies.yaml
    │   └── wordlists/
    │       ├── api_endpoints.txt
    │       ├── dns-big.txt
    │       ├── dns-names.txt
    │       ├── dorks.txt
    │       ├── general/
    │       │   └── common.txt
    │       └── names_small.txt
    ├── discovery/
    │   ├── __init__.py
    │   ├── additional_apis.py
    │   ├── api_endpoints.py
    │   ├── baidusearch.py
    │   ├── bevigil.py
    │   ├── bitbucket.py
    │   ├── bravesearch.py
    │   ├── bufferoverun.py
    │   ├── builtwith.py
    │   ├── censysearch.py
    │   ├── certspottersearch.py
    │   ├── chaos.py
    │   ├── commoncrawl.py
    │   ├── constants.py
    │   ├── criminalip.py
    │   ├── crtsh.py
    │   ├── dnssearch.py
    │   ├── duckduckgosearch.py
    │   ├── fofa.py
    │   ├── fullhuntsearch.py
    │   ├── githubcode.py
    │   ├── gitlabsearch.py
    │   ├── hackertarget.py
    │   ├── haveibeenpwned.py
    │   ├── hudsonrocksearch.py
    │   ├── huntersearch.py
    │   ├── intelxsearch.py
    │   ├── leakix.py
    │   ├── leaklookup.py
    │   ├── mojeek.py
    │   ├── netlas.py
    │   ├── onyphe.py
    │   ├── otxsearch.py
    │   ├── pentesttools.py
    │   ├── projectdiscovery.py
    │   ├── rapiddns.py
    │   ├── robtex.py
    │   ├── rocketreach.py
    │   ├── search_dehashed.py
    │   ├── search_dnsdumpster.py
    │   ├── searchhunterhow.py
    │   ├── securityscorecard.py
    │   ├── securitytrailssearch.py
    │   ├── shodansearch.py
    │   ├── subdomaincenter.py
    │   ├── subdomainfinderc99.py
    │   ├── takeover.py
    │   ├── thc.py
    │   ├── threatcrowd.py
    │   ├── tombasearch.py
    │   ├── urlscan.py
    │   ├── venacussearch.py
    │   ├── virustotal.py
    │   ├── waybackarchive.py
    │   ├── whoisxml.py
    │   ├── windvane.py
    │   ├── yahoosearch.py
    │   └── zoomeyesearch.py
    ├── lib/
    │   ├── __init__.py
    │   ├── api/
    │   │   ├── __init__.py
    │   │   ├── additional_endpoints.py
    │   │   ├── api.py
    │   │   ├── api_example.py
    │   │   ├── auth.py
    │   │   └── static/
    │   │       └── .gitkeep
    │   ├── core.py
    │   ├── hostchecker.py
    │   ├── output.py
    │   ├── resolvers.txt
    │   └── stash.py
    ├── parsers/
    │   ├── __init__.py
    │   ├── intelxparser.py
    │   ├── myparser.py
    │   ├── securitytrailsparser.py
    │   └── venacusparser.py
    ├── restfulHarvest.py
    ├── screenshot/
    │   ├── __init__.py
    │   └── screenshot.py
    └── theHarvester.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .dockerignore
================================================
.github/*
.gitattributes
.git-blame-ignore-revs
.idea/
.pytest_cache
.mypy_cache
tests/*
README/
bin/
theHarvester-logo.png
theHarvester-logo.webp
CHANGELOG.md


================================================
FILE: .git-blame-ignore-revs
================================================
# #1492 run `black .` and `isort .`
c13843ec0d513ac7f9c35b7bd0501fa46e356415

================================================
FILE: .gitattributes
================================================
# Set the default behavior, which is to have git automatically determine
# whether a file is a text or binary, unless otherwise specified.

* text=auto

# Basic .gitattributes for a python repo.

# Source files
# ============
*.pxd       text diff=python
*.py        text diff=python
*.py3       text diff=python
*.pyw       text diff=python
*.pyx       text diff=python

# Binary files
# ============
*.db        binary
*.p         binary
*.pkl       binary
*.pyc       binary
*.pyd       binary
*.pyo       binary

# Note: .db, .p, and .pkl files are associated with the python modules
# ``pickle``, ``dbm.*``, ``shelve``, ``marshal``, ``anydbm``, & ``bsddb``
# (among others).


================================================
FILE: .github/FUNDING.yml
================================================
# These are supported funding model platforms

github: [L1ghtn1ng, NotoriousRebel]
open_collective: # Replace with a single Open Collective username
ko_fi: #
tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel
community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry
liberapay: # Replace with a single Liberapay username
issuehunt: # Replace with a single IssueHunt username
otechie: # Replace with a single Otechie username
custom: # Replace with up to 4 custom sponsorship URLs e.g., ['link1', 'link2']


================================================
FILE: .github/ISSUE_TEMPLATE/issue-template.md
================================================
---
name: Issue Template
about: A template for new issues.
title: "[Bug|Feature Request|Other] Short Description of Issue"
labels: ''

---

## Note: we do not support installing theHarvester on Android

**Feature Request, Bug, or Other**
Feature Request | Bug | Other

**Describe the feature request, bug, or other request**
A clear and concise description of what the bug, feature request,
or other request is.

**To Reproduce**
Steps to reproduce the behaviour:
1. Run tool like this: '...'
2. See error

**Expected behaviour**
A clear and concise description of what you expected to happen.

**Screenshots**
If possible please add screenshots to help explain your problem.

**System Information (System that tool is running on):**
 - OS: [e.g. Windows10]
 - Version [e.g. 2.7]

**Additional context**
Add any other context about the problem here.


================================================
FILE: .github/dependabot.yml
================================================
version: 2
updates:
- package-ecosystem: github-actions
  directory: "/"
  schedule:
    interval: daily
    timezone: Europe/London
- package-ecosystem: uv
  directory: "/"
  schedule:
    interval: daily
    timezone: Europe/London
  open-pull-requests-limit: 10
  target-branch: master
  allow:
  - dependency-type: direct
  - dependency-type: indirect


================================================
FILE: .github/workflows/codeql-analysis.yml
================================================
# For most projects, this workflow file will not need changing; you simply need
# to commit it to your repository.
#
# You may wish to alter this file to override the set of languages analyzed,
# or to provide custom queries or build logic.
#
# ******** NOTE ********
# We have attempted to detect the languages in your repository. Please check
# the `language` matrix defined below to confirm you have the correct set of
# supported CodeQL languages.
#
name: "CodeQL"

on:
  push:
    branches: [ master, dev ]
  pull_request:
    # The branches below must be a subset of the branches above
    branches: [ master, dev ]
  schedule:
    - cron: '19 11 * * 4'

jobs:
  analyze:
    name: Analyze
    runs-on: ubuntu-latest

    strategy:
      fail-fast: false
      matrix:
        language: [ 'python' ]
        # CodeQL supports [ 'cpp', 'csharp', 'go', 'java', 'javascript', 'python' ]
        # Learn more:
        # https://docs.github.com/en/free-pro-team@latest/github/finding-security-vulnerabilities-and-errors-in-your-code/configuring-code-scanning#changing-the-languages-that-are-analyzed

    steps:
    - name: Checkout repository
      uses: actions/checkout@v6

    # Initializes the CodeQL tools for scanning.
    - name: Initialize CodeQL
      uses: github/codeql-action/init@v4
      with:
        languages: ${{ matrix.language }}
        # If you wish to specify custom queries, you can do so here or in a config file.
        # By default, queries listed here will override any specified in a config file.
        # Prefix the list here with "+" to use these queries and those in the config file.
        # queries: ./path/to/local/query, your-org/your-repo/queries@main

    # Autobuild attempts to build any compiled languages  (C/C++, C#, or Java).
    # If this step fails, then you should remove it and run the build manually (see below)
    - name: Autobuild
      uses: github/codeql-action/autobuild@v4

    # ℹ️ Command-line programs to run using the OS shell.
    # 📚 https://git.io/JvXDl

    # ✏️ If the Autobuild fails above, remove it and uncomment the following three lines
    #    and modify them (or add more) to build your code if your project
    #    uses a compiled language

    #- run: |
    #   make bootstrap
    #   make release

    - name: Perform CodeQL Analysis
      uses: github/codeql-action/analyze@v4


================================================
FILE: .github/workflows/docker-build-push.yml
================================================
name: Build and Push Docker Image

on:
  push:
    branches:
      - master

permissions:
  contents: read
  packages: write

jobs:
  build-and-push:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v6

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v4

      - name: Log in to GitHub Container Registry
        uses: docker/login-action@v4
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata for Docker
        id: meta
        uses: docker/metadata-action@v6
        with:
          images: ghcr.io/${{ github.repository_owner }}/theharvester
          tags: |
            latest
            type=ref,event=branch
            type=sha

      - name: Build and push Docker image
        uses: docker/build-push-action@v7
        with:
          context: .
          file: Dockerfile
          push: true
          platforms: linux/amd64,linux/arm64
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}


================================================
FILE: .github/workflows/dockerci.yml
================================================
name: TheHarvester Docker Image CI

on: [push, pull_request]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
      - name: Build the Docker image
        run: docker build --tag theharvester .
      - name: Smoke test
        run: docker run --rm theharvester --help | grep restfulHarvest


================================================
FILE: .github/workflows/theHarvester.yml
================================================
name: TheHarvester Python CI

on:
  push:
    branches:
      - '*'

  pull_request:
    branches:
      - '*'

jobs:
  Python:
    runs-on: ${{ matrix.os }}
    strategy:
      max-parallel: 10
      matrix:
        os: [ ubuntu-latest ]
        python-version: [ '3.12', '3.13', '3.14' ]

    steps:
      - uses: actions/checkout@v6
      - name: Install uv
        uses: astral-sh/setup-uv@v7
        with:
          python-version: ${{ matrix.python-version }}
          enable-cache: true
          cache-dependency-glob: "uv.lock"

      - name: Install dependencies
        run: |
          sudo mkdir -p /usr/local/etc/theHarvester
          sudo cp theHarvester/data/*.yaml /usr/local/etc/theHarvester/
          sudo chown -R runner:runner /usr/local/etc/theHarvester/
          uv sync --all-groups --frozen
          echo "$GITHUB_WORKSPACE/.venv/bin" >> $GITHUB_PATH

      - name: Lint with ruff
        uses: astral-sh/ruff-action@v3
        with:
          args: check --fix

      - name: Format with ruff
        uses: astral-sh/ruff-action@v3
        with:
          args: format

      - name: Commit changes for ruff formatting and linting
        if: github.event_name == 'push'
        run: |
          git config user.name github-actions
          git config user.email github-actions@github.com
          git add .
          git commit -m "Apply ruff fixes and formatting" || true # Use || true to prevent failure if no changes
          git push origin HEAD:${{ github.ref_name }}
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

      - name: Test with pytest
        run: |
          pytest tests/**

      - name: Run theHarvester module Baidu
        run: |
          theHarvester -d yale.edu -b baidu

      - name: Run theHarvester module CertSpotter
        run: |
          theHarvester -d yale.edu -b certspotter

      - name: Run theHarvester module Crtsh
        run: |
          theHarvester -d hcl.com -b crtsh

      - name: Run theHarvester module DuckDuckGo
        run: |
          theHarvester -d yale.edu -b duckduckgo

      - name: Run theHarvester module HackerTarget
        run: |
          theHarvester -d yale.edu -b hackertarget

      - name: Run theHarvester module Otx
        run: |
          theHarvester -d yale.edu -b otx

      - name: Run theHarvester module RapidDns
        run: |
          theHarvester -d yale.edu -b rapiddns

      - name: Run theHarvester module Urlscan
        run: |
          theHarvester -d yale.edu -b urlscan

      - name: Run theHarvester module Yahoo
        run: |
          theHarvester -d yale.edu -b yahoo

      - name: Run theHarvester module DNS brute force
        run: |
          theHarvester -d yale.edu -c


================================================
FILE: .gitignore
================================================
*.idea
*.pyc
*.sqlite
*.html
*.htm
*.vscode
*.xml
*.json
debug_results.txt
venv
.mypy_cache
.pytest_cache
build/
dist/
theHarvester.egg-info
api-keys.yaml
.DS_Store
.venv
.venv/**
.pyre
.junie

================================================
FILE: CHANGELOG.md
================================================
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

## [4.10.1] - 2026-02-22

### Changed
- Updated Censys integration to align with current API documentation ([67419190](https://github.com/laramies/theHarvester/commit/67419190)).
- Updated RocketReach integration to align with latest API documentation and tests ([ffc7420d](https://github.com/laramies/theHarvester/commit/ffc7420d)).
- Refactored async file handling in CLI paths: replace blocking path calls with awaited operations and improve path sanitization ([e98bf5bb](https://github.com/laramies/theHarvester/commit/e98bf5bb), [607016a1](https://github.com/laramies/theHarvester/commit/607016a1)).
- Migrated packaging/build configuration to `flit-core` and updated entrypoint/version wiring ([d2cae0be](https://github.com/laramies/theHarvester/commit/d2cae0be)).
- Refactored and standardized output utilities, with new regression tests for output formatting and dedup helpers ([fa2dedd3](https://github.com/laramies/theHarvester/commit/fa2dedd3)).
- Updated dependencies: bump `fastapi`, `playwright`, `ruff`, `ty`, and `uvicorn` ([1dfa6e98](https://github.com/laramies/theHarvester/commit/1dfa6e98), [46865337](https://github.com/laramies/theHarvester/commit/46865337), [c1ac137d](https://github.com/laramies/theHarvester/commit/c1ac137d), [7eaec4da](https://github.com/laramies/theHarvester/commit/7eaec4da)).
- Updated packaging dependency `wheel` to `0.46.3` ([46865337](https://github.com/laramies/theHarvester/commit/46865337)).

### Fixed
- Fixed CriminalIP integration for current API behavior, including safer scan/report handling and hostname normalization (issue #2229) ([06c2fbd9](https://github.com/laramies/theHarvester/commit/06c2fbd9)).
- Fixed Shodan engine processing to return hostnames consistently and avoid worker processing errors (issue #2227) ([419291a3](https://github.com/laramies/theHarvester/commit/419291a3)).
- Fixed Bitbucket search flow so discovery runs successfully ([a1968f71](https://github.com/laramies/theHarvester/commit/a1968f71)).
- Improved module API key error messages for clearer diagnostics ([e1b775e3](https://github.com/laramies/theHarvester/commit/e1b775e3)).
- Improved BuiltWith URL handling logic in CLI processing ([15872350](https://github.com/laramies/theHarvester/commit/15872350)).

## [4.10.0] - 2026-01-18

### Added
- LeakIX API key support and improved request header configuration ([31861c19](https://github.com/laramies/theHarvester/commit/31861c19)).
- Bitbucket API key entry in `theHarvester/data/api-keys.yaml` ([6be673fa](https://github.com/laramies/theHarvester/commit/6be673fa)).
- Fix issue #469 Add socks proxy support ([e38bb8fb](https://github.com/laramies/theHarvester/commit/e38bb8fb)).

### Changed
- CI: switch GitHub workflow to `ruff-action` for linting and formatting ([8ddcd1a8](https://github.com/laramies/theHarvester/commit/8ddcd1a8)).
- Dockerfile: add `apt-get update/upgrade` and clean up apt cache layers ([3a5d504b](https://github.com/laramies/theHarvester/commit/3a5d504b)).
- Dependencies updated: bump `aiodns`, `ruff`, `ty`, `filelock`, and `librt` ([40759146](https://github.com/laramies/theHarvester/commit/40759146)).
- Codebase formatting and lint fixes applied (Ruff) ([7c6dec53](https://github.com/laramies/theHarvester/commit/7c6dec53)).
- Tests: expand proxy parameter default structure to include both `http` and `socks5` fields ([bc2fce07](https://github.com/laramies/theHarvester/commit/bc2fce07)).
- `api-keys.yaml` synchronized with `Core` API key references; add consistency test coverage ([ffe1f3a8](https://github.com/laramies/theHarvester/commit/ffe1f3a8)).

### Removed
- `Core.bing_key()` removed ([814c7811](https://github.com/laramies/theHarvester/commit/814c7811)).

### Fixed
- Fix mypy type-checking errors ([0991356b](https://github.com/laramies/theHarvester/commit/0991356b)).

### Security
- Improve input sanitization and add security-focused tests ([3d7489c9](https://github.com/laramies/theHarvester/commit/3d7489c9)).

[Unreleased]: https://github.com/laramies/theHarvester/compare/06520b40...master
[4.10.1]: https://github.com/laramies/theHarvester/compare/4.10.0...06520b40
[4.10.0]: https://github.com/laramies/theHarvester/compare/4.9.2...4.10.0


================================================
FILE: Dockerfile
================================================
FROM python:3.14-slim-trixie

LABEL maintainer="@jay_townsend1 & @NotoriousRebel1"

RUN useradd -m -u 1000 -s /bin/bash theharvester

RUN apt-get update && apt-get upgrade -yqq && apt-get clean && \
    rm -rf /var/lib/apt/lists/*
# Set workdir and copy project files
WORKDIR /app
COPY . /app

# Create and sync environment using uv
# Compile bytecode for faster startup and install to system site-packages
RUN --mount=from=ghcr.io/astral-sh/uv,source=/uv,target=/bin/uv \
    UV_PROJECT_ENVIRONMENT=/usr/local uv sync --locked --no-dev --no-cache --compile-bytecode

# Use non-root user
USER theharvester

# Expose port 80, the port passed to restfulHarvest in the ENTRYPOINT below
EXPOSE 80

# Run the application as theharvester user
ENTRYPOINT ["restfulHarvest", "-H", "0.0.0.0", "-p", "80"]


================================================
FILE: README/CONTRIBUTING.md
================================================
# Contributing to theHarvester Project
Welcome to theHarvester project. If you would like to contribute,
the following requirements must be met for a contribution to be accepted.

# CI
Make sure all CI checks pass and that you do not introduce any alerts from ruff.

# Unit Tests
Every new module requires a unit test for that module. We use pytest.

# Coding Standards
* No single-letter variables; variable names must describe the action they perform
* Use static typing on functions, etc.
* Make sure no errors are reported from mypy
* No issues reported with ruff
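
As a minimal sketch of the standards above, the function below uses descriptive variable names and full type annotations. `deduplicate_hostnames` is a hypothetical helper written only for illustration; it is not part of theHarvester.

```python
def deduplicate_hostnames(discovered_hostnames: list[str]) -> list[str]:
    """Return lowercased hostnames with duplicates removed, keeping first-seen order."""
    unique_hostnames: list[str] = []
    seen_hostnames: set[str] = set()
    for hostname in discovered_hostnames:
        # Normalize before comparing so "WWW.example.com" and "www.example.com " match.
        normalized_hostname = hostname.strip().lower()
        if normalized_hostname and normalized_hostname not in seen_hostnames:
            seen_hostnames.add(normalized_hostname)
            unique_hostnames.append(normalized_hostname)
    return unique_hostnames
```

Every parameter, return value, and local container carries an annotation, which gives mypy something to check, and no name is shorter than the concept it holds.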
 
# Submitting Bugs
If you find a bug in a module, want to submit an issue for it, and know how to write Python code,
please create a unit test for that bug (if possible) and submit a fix for it, as it would be a big help to the project.
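
A regression test accompanying a bug fix could look like the following sketch. Both `extract_emails` and the bug it guards against (trailing punctuation being captured as part of an address) are invented for illustration; in a real contribution the test would import the affected theHarvester module rather than define a stand-in.

```python
import re

# Hypothetical stand-in for the module function being fixed (not theHarvester code).
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_emails(page_text: str, domain: str) -> set[str]:
    """Return lowercased email addresses in page_text that belong to domain."""
    target_domain = domain.lower()
    matching_addresses: set[str] = set()
    for address in EMAIL_PATTERN.findall(page_text):
        normalized_address = address.lower()
        address_domain = normalized_address.split("@", 1)[1]
        # Accept the domain itself and any of its subdomains.
        if address_domain == target_domain or address_domain.endswith("." + target_domain):
            matching_addresses.add(normalized_address)
    return matching_addresses


def test_extract_emails_ignores_trailing_punctuation() -> None:
    page_text = "Contact admin@example.com. Or Sales@Example.com, not someone@other.org"
    assert extract_emails(page_text, "example.com") == {
        "admin@example.com",
        "sales@example.com",
    }
```

pytest discovers the function through its `test_` prefix, so placing a file like this under `tests/` is enough for it to run with the rest of the suite.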


================================================
FILE: README/COPYING
================================================
                   GNU GENERAL PUBLIC LICENSE
                       Version 2, June 1991

 Copyright (C) 1989, 1991 Free Software Foundation, Inc.
 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
 Everyone is permitted to copy and distribute verbatim copies
 of this license document, but changing it is not allowed.

                            Preamble

  The licenses for most software are designed to take away your
freedom to share and change it.  By contrast, the GNU General Public
License is intended to guarantee your freedom to share and change free
software--to make sure the software is free for all its users.  This
General Public License applies to most of the Free Software
Foundation's software and to any other program whose authors commit to
using it.  (Some other Free Software Foundation software is covered by
the GNU Library General Public License instead.)  You can apply it to
your programs, too.

  When we speak of free software, we are referring to freedom, not
price.  Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
this service if you wish), that you receive source code or can get it
if you want it, that you can change the software or use pieces of it
in new free programs; and that you know you can do these things.

  To protect your rights, we need to make restrictions that forbid
anyone to deny you these rights or to ask you to surrender the rights.
These restrictions translate to certain responsibilities for you if you
distribute copies of the software, or if you modify it.

  For example, if you distribute copies of such a program, whether
gratis or for a fee, you must give the recipients all the rights that
you have.  You must make sure that they, too, receive or can get the
source code.  And you must show them these terms so they know their
rights.

  We protect your rights with two steps: (1) copyright the software, and
(2) offer you this license which gives you legal permission to copy,
distribute and/or modify the software.

  Also, for each author's protection and ours, we want to make certain
that everyone understands that there is no warranty for this free
software.  If the software is modified by someone else and passed on, we
want its recipients to know that what they have is not the original, so
that any problems introduced by others will not reflect on the original
authors' reputations.

  Finally, any free program is threatened constantly by software
patents.  We wish to avoid the danger that redistributors of a free
program will individually obtain patent licenses, in effect making the
program proprietary.  To prevent this, we have made it clear that any
patent must be licensed for everyone's free use or not licensed at all.

  The precise terms and conditions for copying, distribution and
modification follow.

                    GNU GENERAL PUBLIC LICENSE
   TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION

  0. This License applies to any program or other work which contains
a notice placed by the copyright holder saying it may be distributed
under the terms of this General Public License.  The "Program", below,
refers to any such program or work, and a "work based on the Program"
means either the Program or any derivative work under copyright law:
that is to say, a work containing the Program or a portion of it,
either verbatim or with modifications and/or translated into another
language.  (Hereinafter, translation is included without limitation in
the term "modification".)  Each licensee is addressed as "you".

Activities other than copying, distribution and modification are not
covered by this License; they are outside its scope.  The act of
running the Program is not restricted, and the output from the Program
is covered only if its contents constitute a work based on the
Program (independent of having been made by running the Program).
Whether that is true depends on what the Program does.

  1. You may copy and distribute verbatim copies of the Program's
source code as you receive it, in any medium, provided that you
conspicuously and appropriately publish on each copy an appropriate
copyright notice and disclaimer of warranty; keep intact all the
notices that refer to this License and to the absence of any warranty;
and give any other recipients of the Program a copy of this License
along with the Program.

You may charge a fee for the physical act of transferring a copy, and
you may at your option offer warranty protection in exchange for a fee.

  2. You may modify your copy or copies of the Program or any portion
of it, thus forming a work based on the Program, and copy and
distribute such modifications or work under the terms of Section 1
above, provided that you also meet all of these conditions:

    a) You must cause the modified files to carry prominent notices
    stating that you changed the files and the date of any change.

    b) You must cause any work that you distribute or publish, that in
    whole or in part contains or is derived from the Program or any
    part thereof, to be licensed as a whole at no charge to all third
    parties under the terms of this License.

    c) If the modified program normally reads commands interactively
    when run, you must cause it, when started running for such
    interactive use in the most ordinary way, to print or display an
    announcement including an appropriate copyright notice and a
    notice that there is no warranty (or else, saying that you provide
    a warranty) and that users may redistribute the program under
    these conditions, and telling the user how to view a copy of this
    License.  (Exception: if the Program itself is interactive but
    does not normally print such an announcement, your work based on
    the Program is not required to print an announcement.)

These requirements apply to the modified work as a whole.  If
identifiable sections of that work are not derived from the Program,
and can be reasonably considered independent and separate works in
themselves, then this License, and its terms, do not apply to those
sections when you distribute them as separate works.  But when you
distribute the same sections as part of a whole which is a work based
on the Program, the distribution of the whole must be on the terms of
this License, whose permissions for other licensees extend to the
entire whole, and thus to each and every part regardless of who wrote it.

Thus, it is not the intent of this section to claim rights or contest
your rights to work written entirely by you; rather, the intent is to
exercise the right to control the distribution of derivative or
collective works based on the Program.

In addition, mere aggregation of another work not based on the Program
with the Program (or with a work based on the Program) on a volume of
a storage or distribution medium does not bring the other work under
the scope of this License.

  3. You may copy and distribute the Program (or a work based on it,
under Section 2) in object code or executable form under the terms of
Sections 1 and 2 above provided that you also do one of the following:

    a) Accompany it with the complete corresponding machine-readable
    source code, which must be distributed under the terms of Sections
    1 and 2 above on a medium customarily used for software interchange; or,

    b) Accompany it with a written offer, valid for at least three
    years, to give any third party, for a charge no more than your
    cost of physically performing source distribution, a complete
    machine-readable copy of the corresponding source code, to be
    distributed under the terms of Sections 1 and 2 above on a medium
    customarily used for software interchange; or,

    c) Accompany it with the information you received as to the offer
    to distribute corresponding source code.  (This alternative is
    allowed only for noncommercial distribution and only if you
    received the program in object code or executable form with such
    an offer, in accord with Subsection b above.)

The source code for a work means the preferred form of the work for
making modifications to it.  For an executable work, complete source
code means all the source code for all modules it contains, plus any
associated interface definition files, plus the scripts used to
control compilation and installation of the executable.  However, as a
special exception, the source code distributed need not include
anything that is normally distributed (in either source or binary
form) with the major components (compiler, kernel, and so on) of the
operating system on which the executable runs, unless that component
itself accompanies the executable.

If distribution of executable or object code is made by offering
access to copy from a designated place, then offering equivalent
access to copy the source code from the same place counts as
distribution of the source code, even though third parties are not
compelled to copy the source along with the object code.

  4. You may not copy, modify, sublicense, or distribute the Program
except as expressly provided under this License.  Any attempt
otherwise to copy, modify, sublicense or distribute the Program is
void, and will automatically terminate your rights under this License.
However, parties who have received copies, or rights, from you under
this License will not have their licenses terminated so long as such
parties remain in full compliance.

  5. You are not required to accept this License, since you have not
signed it.  However, nothing else grants you permission to modify or
distribute the Program or its derivative works.  These actions are
prohibited by law if you do not accept this License.  Therefore, by
modifying or distributing the Program (or any work based on the
Program), you indicate your acceptance of this License to do so, and
all its terms and conditions for copying, distributing or modifying
the Program or works based on it.

  6. Each time you redistribute the Program (or any work based on the
Program), the recipient automatically receives a license from the
original licensor to copy, distribute or modify the Program subject to
these terms and conditions.  You may not impose any further
restrictions on the recipients' exercise of the rights granted herein.
You are not responsible for enforcing compliance by third parties to
this License.

  7. If, as a consequence of a court judgment or allegation of patent
infringement or for any other reason (not limited to patent issues),
conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License.  If you cannot
distribute so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you
may not distribute the Program at all.  For example, if a patent
license would not permit royalty-free redistribution of the Program by
all those who receive copies directly or indirectly through you, then
the only way you could satisfy both it and this License would be to
refrain entirely from distribution of the Program.

If any portion of this section is held invalid or unenforceable under
any particular circumstance, the balance of the section is intended to
apply and the section as a whole is intended to apply in other
circumstances.

It is not the purpose of this section to induce you to infringe any
patents or other property right claims or to contest validity of any
such claims; this section has the sole purpose of protecting the
integrity of the free software distribution system, which is
implemented by public license practices.  Many people have made
generous contributions to the wide range of software distributed
through that system in reliance on consistent application of that
system; it is up to the author/donor to decide if he or she is willing
to distribute software through any other system and a licensee cannot
impose that choice.

This section is intended to make thoroughly clear what is believed to
be a consequence of the rest of this License.

  8. If the distribution and/or use of the Program is restricted in
certain countries either by patents or by copyrighted interfaces, the
original copyright holder who places the Program under this License
may add an explicit geographical distribution limitation excluding
those countries, so that distribution is permitted only in or among
countries not thus excluded.  In such case, this License incorporates
the limitation as if written in the body of this License.

  9. The Free Software Foundation may publish revised and/or new versions
of the General Public License from time to time.  Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.

Each version is given a distinguishing version number.  If the Program
specifies a version number of this License which applies to it and "any
later version", you have the option of following the terms and conditions
either of that version or of any later version published by the Free
Software Foundation.  If the Program does not specify a version number of
this License, you may choose any version ever published by the Free Software
Foundation.

  10. If you wish to incorporate parts of the Program into other free
programs whose distribution conditions are different, write to the author
to ask for permission.  For software which is copyrighted by the Free
Software Foundation, write to the Free Software Foundation; we sometimes
make exceptions for this.  Our decision will be guided by the two goals
of preserving the free status of all derivatives of our free software and
of promoting the sharing and reuse of software generally.

                            NO WARRANTY

  11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW.  EXCEPT WHEN
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.  THE ENTIRE RISK AS
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU.  SHOULD THE
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
REPAIR OR CORRECTION.

  12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.

                     END OF TERMS AND CONDITIONS


================================================
FILE: README/LICENSES
================================================
Released under the GPL v 2.0.

If you did not receive a copy of the GPL, try http://www.gnu.org/.

Copyright 2011 Christian Martorella 

theHarvester is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation version 2 of the License.

theHarvester is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.
Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA


================================================
FILE: README.md
================================================
![theHarvester](https://github.com/laramies/theHarvester/blob/master/theHarvester-logo.webp)

![TheHarvester CI](https://github.com/laramies/theHarvester/workflows/TheHarvester%20Python%20CI/badge.svg) ![TheHarvester Docker Image CI](https://github.com/laramies/theHarvester/workflows/TheHarvester%20Docker%20Image%20CI/badge.svg)
[![Rawsec's CyberSecurity Inventory](https://inventory.raw.pm/img/badges/Rawsec-inventoried-FF5050_flat_without_logo.svg)](https://inventory.raw.pm/)

[![Packaging status](https://repology.org/badge/vertical-allrepos/theharvester.svg)](https://repology.org/project/theharvester/versions)

About
-----
theHarvester is a simple-to-use yet powerful tool designed for the reconnaissance stage of a red
team assessment or penetration test. It performs open source intelligence (OSINT) gathering to help determine
a domain's external threat landscape. The tool gathers names, emails, IPs, subdomains, and URLs from
multiple public resources, which are listed under "Passive modules" below.

Install and dependencies
------------------------
* Python 3.12 or higher.
* https://github.com/laramies/theHarvester/wiki/Installation

Install uv:
   ```bash
   curl -LsSf https://astral.sh/uv/install.sh | sh
   ```

Clone the repository:
   ```bash
   git clone https://github.com/laramies/theHarvester
   cd theHarvester
   ```

Install dependencies and create a virtual environment:
   ```bash
   uv sync
   ```

Run theHarvester:
   ```bash
   uv run theHarvester
   ```

## Development

To install development dependencies:
```bash
uv sync --all-groups
```

To run tests:
```bash
uv run pytest
```

To run linting and formatting:
```bash
uv run ruff check
```
```bash
uv run ruff format
```

Passive modules
---------------

* baidu: Baidu search engine (https://www.baidu.com)

* bevigil: CloudSEK BeVigil scans mobile applications for OSINT assets (https://bevigil.com/osint-api)

* brave: Brave search engine - now uses official Brave Search API (https://api-dashboard.search.brave.com)

* bufferoverun: Fast domain name lookups for TLS certificates in IPv4 space (https://tls.bufferover.run)

* builtwith: Find out what websites are built with (https://builtwith.com)

* censys: Uses certificates searches to enumerate subdomains and gather emails (https://censys.io)

* certspotter: Cert Spotter monitors Certificate Transparency logs (https://sslmate.com/certspotter)

* criminalip: Specialized Cyber Threat Intelligence (CTI) search engine (https://www.criminalip.io)

* crtsh: Comodo Certificate search (https://crt.sh)

* dehashed: Take your data security to the next level (https://dehashed.com)

* dnsdumpster: Domain research tool that can discover hosts related to a domain (https://dnsdumpster.com)

* duckduckgo: DuckDuckGo search engine (https://duckduckgo.com)

* fofa: FOFA search engine (https://en.fofa.info)

* fullhunt: Next-generation attack surface security platform (https://fullhunt.io)

* github-code: GitHub code search engine (https://www.github.com)

* hackertarget: Online vulnerability scanners and network intelligence to help organizations (https://hackertarget.com)

* haveibeenpwned: Check if your email address is in a data breach (https://haveibeenpwned.com)

* hunter: Hunter search engine (https://hunter.io)

* hunterhow: Internet search engines for security researchers (https://hunter.how)

* intelx: Intelx search engine (https://intelx.io)

* leakix: LeakIX search engine (https://leakix.net)

* leaklookup: Data breach search engine (https://leak-lookup.com)

* mojeek: Mojeek search engine (https://www.mojeek.com)

* netlas: A Shodan or Censys competitor (https://app.netlas.io)

* onyphe: Cyber defense search engine (https://www.onyphe.io)

* otx: AlienVault open threat exchange (https://otx.alienvault.com)

* pentesttools: Cloud-based toolkit for offensive security testing, focused on web applications and network penetration testing (https://pentest-tools.com)

* projectdiscovery: Actively collects and maintains internet-wide assets data, to enhance research and analyse changes around DNS for better insights (https://chaos.projectdiscovery.io)

* rapiddns: DNS query tool that makes it easy to query subdomains or sites sharing the same IP (https://rapiddns.io)

* rocketreach: Access real-time verified personal/professional emails, phone numbers, and social media links (https://rocketreach.co)

* securityscorecard: Helps TPRM and SOC teams detect, prioritize, and remediate vendor risk across their entire supplier ecosystem at scale (https://securityscorecard.com)

* securityTrails: Security Trails search engine, the world's largest repository of historical DNS data (https://securitytrails.com)

* -s, --shodan: Shodan search engine will search for ports and banners from discovered hosts (https://shodan.io)

* subdomaincenter: A subdomain finder tool used to find subdomains of a given domain (https://www.subdomain.center)

* subdomainfinderc99: A subdomain finder is a tool used to find the subdomains of a given domain (https://subdomainfinder.c99.nl)

* thc: Free subdomain enumeration service with no API key required (https://ip.thc.org)

* threatminer: Data mining for threat intelligence (https://www.threatminer.org)

* tomba: Tomba search engine (https://tomba.io)

* urlscan: A sandbox for the web that is a URL and website scanner (https://urlscan.io)

* venacus: Venacus search engine (https://venacus.com)

* virustotal: Domain search (https://www.virustotal.com)

* whoisxml: Subdomain search (https://subdomains.whoisxmlapi.com/api/pricing)

* yahoo: Yahoo search engine (https://www.yahoo.com)

* windvane: Windvane search engine (https://windvane.lichoin.com)

* zoomeye: China's version of Shodan (https://www.zoomeye.org)
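
Whatever the source, a passive module ultimately has to pick out the email addresses and hostnames that belong to the target domain from the text a source returns. The sketch below illustrates that step with two regular expressions. The function names `extract_emails` and `extract_hostnames` are hypothetical, and this is not theHarvester's own parser, only a minimal illustration that assumes plain-text input.

```python
import re


def extract_emails(text: str, domain: str) -> set[str]:
    """Return addresses whose host part is `domain` or one of its subdomains."""
    pattern = re.compile(
        r"[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)*"
        + re.escape(domain)
        + r"(?![A-Za-z0-9-])",  # do not stop in the middle of a longer label
        re.IGNORECASE,
    )
    return {match.lower() for match in pattern.findall(text)}


def extract_hostnames(text: str, domain: str) -> set[str]:
    """Return subdomains of `domain` that appear on their own in `text`."""
    pattern = re.compile(
        r"(?<![A-Za-z0-9@._-])"      # must start at a boundary, and not right after '@'
        r"(?:[A-Za-z0-9-]+\.)+"      # one or more subdomain labels
        + re.escape(domain)
        + r"(?![A-Za-z0-9-])",
        re.IGNORECASE,
    )
    return {match.lower() for match in pattern.findall(text)}


sample = (
    "Contact foo@example.com on a.example.com \n"
    " bar@sub.example.com is here and www.example.com appears \n"
    " Visit sub.a.example.com. baz@example.com \n"
)
emails = extract_emails(sample, "example.com")
hosts = extract_hostnames(sample, "example.com")
```

Hostnames that only appear as the host part of an email address are deliberately left out of `extract_hostnames` in this sketch; a real parser may choose differently.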

Active modules
--------------
* DNS brute force: dictionary brute force enumeration
* Screenshots: Take screenshots of subdomains that were found

Modules that require an API key
-------------------------------
Documentation for setting up API keys can be found at https://github.com/laramies/theHarvester/wiki/Installation#api-keys

* bevigil - 50 free queries/month. 1k queries/month $50
* brave - free plan available. Pro plans for higher limits
* bufferoverun - 100 free queries/month. 10k/month $25
* builtwith - 50 free queries ever. $2950/yr
* censys - 500 credits $100
* criminalip - 100 free queries/month. 700k/month $59
* dehashed - 500 credits $15, 5k credits $150
* dnsdumpster - 50 free queries/day, $49
* fofa - query credits 10,000/month. 100k results/month $25
* fullhunt - 50 free queries. 200 queries $29/month, 500 queries $59 
* github-code
* haveibeenpwned - 10 email searches/min $4.50, 50 email searches/min $22
* hunter - 50 free credits/month. 12k credits/yr $34
* hunterhow - 10k free API results per 30 days. 50k API results per 30 days $10
* intelx - free account is very limited. Business account $2900
* leakix - free 25 results pages, 3000 API requests/month. Bounty Hunter $29
* leaklookup - 20 credits $10, 50 credits $20, 140 credits $50, 300 credits $100
* mojeek - 5000 free credits $6.50, $1.30 CPM (Personal), $2.60 CPM (Startup), $3.90 CPM (Business)
* netlas - 50 free requests/day. 1k requests $49, 10k requests $249
* onyphe - 10M results/month $587
* pentesttools - 5 assets netsec $95/month, 5 assets webnetsec $140/month
* projectdiscovery - requires work email. Free monthly discovery and vulnerability scans on sign-up email domain, enterprise $
* rocketreach - 100 email lookups/month $48, 250 email lookups/month $108
* securityscorecard - requires a work email
* securityTrails - 50 free queries/month. 20k queries/month $500
* shodan - Freelancer $69 month, Small Business $359 month
* tomba - 25 free searches/month. 1k searches/month $39, 5k searches/month $89
* venacus - 1 free search/day. 10 searches/day $12, 30 searches/day $36
* virustotal - 500 free lookups/day, 15.5k lookups/month. Business accounts require a work email
* whoisxml - 2k queries $50, 5k queries $105
* windvane - 100 free queries
* zoomeye - 5 free results/day. 30 results/day $190/yr

## Package versions
[![Packaging status](https://repology.org/badge/vertical-allrepos/theharvester.svg)](https://repology.org/project/theharvester/versions)

Comments, bugs, and requests
----------------------------
* [![Twitter Follow](https://img.shields.io/twitter/follow/laramies.svg?style=social&label=Follow)](https://twitter.com/laramies) Christian Martorella @laramies
  cmartorella@edge-security.com
* [![Twitter Follow](https://img.shields.io/twitter/follow/NotoriousRebel1.svg?style=social&label=Follow)](https://twitter.com/NotoriousRebel1) Matthew Brown @NotoriousRebel1
* [![Twitter Follow](https://img.shields.io/twitter/follow/jay_townsend1.svg?style=social&label=Follow)](https://twitter.com/jay_townsend1) Jay "L1ghtn1ng" Townsend @jay_townsend1

Main contributors
-----------------
* [![Twitter Follow](https://img.shields.io/twitter/follow/NotoriousRebel1.svg?style=social&label=Follow)](https://twitter.com/NotoriousRebel1) Matthew Brown @NotoriousRebel1
* [![Twitter Follow](https://img.shields.io/twitter/follow/jay_townsend1.svg?style=social&label=Follow)](https://twitter.com/jay_townsend1) Jay "L1ghtn1ng" Townsend @jay_townsend1
* [![Twitter Follow](https://img.shields.io/twitter/follow/discoverscripts.svg?style=social&label=Follow)](https://twitter.com/discoverscripts) Lee Baird @discoverscripts


Thanks
------
* John Matherly - Shodan project
* Ahmed Aboul Ela - subdomain names dictionaries (big and small)


================================================
FILE: bin/restfulHarvest
================================================
#!/usr/bin/env python3
from theHarvester.restfulHarvest import main

if __name__ == '__main__':
    main()


================================================
FILE: bin/theHarvester
================================================
#!/usr/bin/env python3
# Note: This script runs theHarvester
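import sys

# Check the interpreter version before importing the package, which may use newer syntax.
# The tuple comparison is correct across major versions; pyproject.toml requires Python >= 3.12.
if sys.version_info < (3, 12):
    print('[!] Make sure you have Python 3.12+ installed, quitting.\n\n')
    sys.exit(1)

from theHarvester.theHarvester import main  # noqa: E402

if __name__ == '__main__':
    main()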


================================================
FILE: docker-compose.yml
================================================
services:
  theharvester.svc.local:
    container_name: theHarvester
    volumes:
      - ./theHarvester/data/api-keys.yaml:/root/.theHarvester/api-keys.yaml
      - ./theHarvester/data/api-keys.yaml:/etc/theHarvester/api-keys.yaml
      - ./theHarvester/data/proxies.yaml:/etc/theHarvester/proxies.yaml
      - ./theHarvester/data/proxies.yaml:/root/.theHarvester/proxies.yaml
    build: .
    ports:
      - "5000:80"

networks:
  default:
    name: app_theHarvester_network


================================================
FILE: pyproject.toml
================================================
[project]
name = "theHarvester"
description = "theHarvester is a very simple, yet effective tool designed to be used in the early stages of a penetration test"
readme = "README.md"
license = "GPL-2.0-only"
authors = [
    { name = "Christian Martorella", email = "cmartorella@edge-security.com" },
    { name = "Jay Townsend", email = "jay@cybermon.uk" },
    { name = "Matthew Brown", email = "36310667+NotoriousRebel@users.noreply.github.com" },
]
requires-python = ">=3.12"
urls.Homepage = "https://github.com/laramies/theHarvester"
classifiers = [
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.12",
    "Programming Language :: Python :: 3.13",
    "Programming Language :: Python :: 3.14",
    "Operating System :: OS Independent",
]
dynamic = ["version"]
dependencies = [
    "aiodns==4.0.0",
    "aiofiles==25.1.0",
    "aiohttp==3.13.3",
    "aiohttp-socks==0.11.0",
    "aiomultiprocess==0.9.1",
    "aiosqlite==0.22.1",
    "beautifulsoup4==4.14.3",
    "censys==2.2.19",
    "certifi==2026.2.25",
    "dnspython==2.8.0",
    "fastapi==0.135.1",
    "lxml==6.0.2",
    "netaddr==1.3.0",
    "playwright==1.58.0",
    "PyYAML==6.0.3",
    "python-dateutil==2.9.0.post0",
    "httpx==0.28.1",
    "retrying==1.4.2",
    "shodan==1.31.0",
    "slowapi==0.1.9",
    "ujson==5.12.0",
    "uvicorn==0.41.0",
    "uvloop==0.22.1; platform_system != 'Windows'",
    "winloop==0.4.0; platform_system == 'Windows'",
]

[dependency-groups]
dev = [
    "mypy==1.19.1",
    "mypy-extensions==1.1.0",
    "pytest==9.0.2",
    "pytest-asyncio==1.3.0",
    "types-certifi==2021.10.8.3",
    "types-chardet==5.0.4.6",
    "types-python-dateutil==2.9.0.20260305",
    "types-PyYAML==6.0.12.20250915",
    "ruff==0.15.5",
    "types-ujson==5.10.0.20250822",
    "wheel==0.46.3",
    "ty==0.0.21",
]

[project.scripts]
theHarvester = "theHarvester.theHarvester:main"
restfulHarvest = "theHarvester.restfulHarvest:main"

[tool.pytest.ini_options]
minversion = "8.3.3"
asyncio_mode = "auto"
asyncio_default_fixture_loop_scope = "function"
addopts = "--no-header"
testpaths = ["tests"]

[build-system]
requires = ["flit_core >=3.11,<4"]
build-backend = "flit_core.buildapi"

[tool.mypy]
python_version = "3.13"
warn_unused_configs = true
ignore_missing_imports = true
show_traceback = true
show_error_codes = true
namespace_packages = true
check_untyped_defs = true

[tool.uv]
python-preference = "managed"

[tool.uv.pip]
python-version = "3.13"

[tool.ty.src]
respect-ignore-files = false
exclude = [
    ".venv/**",
    "tests/**",
    ".github/*"
]

[tool.ruff]
# Exclude a variety of commonly ignored directories.
exclude = [
    "tests",
    ".eggs",
    ".git",
    ".git-rewrite",
    ".mypy_cache",
    ".pyenv",
    ".pytest_cache",
    ".pytype",
    ".ruff_cache",
    ".github",
    ".venv",
    ".vscode",
    ".idea",
    "__pypackages__",
    "build",
    "dist",
    "site-packages",
    "venv",
]

line-length = 130
target-version = "py313"
show-fixes = true

[tool.ruff.lint]
select = ["E",
    "F",
    "N",
    "I",
    "UP",
    "TCH",
    "FA",
    "RUF",
    "PT",
    "TC",
    "ASYNC"
    ]
ignore = [
    "E501",
    "ASYNC230",
    "N999",
    "PLR0915"
    ]

# Allow fix for all enabled rules (when `--fix` is provided).
fixable = ["ALL"]
unfixable = []

# Allow unused variables when underscore-prefixed.
dummy-variable-rgx = "^(_+|(_+[a-zA-Z0-9_]*[a-zA-Z0-9]+?))$"

[tool.ruff.format]
# Unlike Black, use single quotes for strings.
quote-style = "single"
indent-style = "space"

# Like Black, respect magic trailing commas.
skip-magic-trailing-comma = false

# Like Black, automatically detect the appropriate line ending.
line-ending = "auto"


================================================
FILE: tests/__init__.py
================================================


================================================
FILE: tests/discovery/__init__.py
================================================


================================================
FILE: tests/discovery/test_baidusearch.py
================================================
import pytest

from theHarvester.discovery import baidusearch


class TestBaiduSearch:
    @pytest.mark.asyncio
    async def test_process_and_parsing(self, monkeypatch):
        called = {}

        async def fake_fetch_all(urls, headers=None, proxy=False):
            called["urls"] = urls
            called["headers"] = headers
            called["proxy"] = proxy
            return [
                "Contact foo@example.com on a.example.com \n",
                " bar@sub.example.com is here and www.example.com appears \n",
                " Visit sub.a.example.com. baz@example.com \n",
            ]

        # Patch the AsyncFetcher.fetch_all to avoid network I/O
        import theHarvester.lib.core as core_module

        monkeypatch.setattr(core_module.AsyncFetcher, "fetch_all", fake_fetch_all)
        # Make user agent deterministic (not strictly necessary, but stable)
        monkeypatch.setattr(core_module.Core, "get_user_agent", staticmethod(lambda: "UA"), raising=True)

        search = baidusearch.SearchBaidu(word="example.com", limit=21)
        await search.process(proxy=True)

        expected_urls = [
            "https://www.baidu.com/s?wd=%40example.com&pn=0&oq=example.com",
            "https://www.baidu.com/s?wd=%40example.com&pn=10&oq=example.com",
            "https://www.baidu.com/s?wd=%40example.com&pn=20&oq=example.com",
        ]
        assert called["urls"] == expected_urls
        assert called["proxy"] is True

        emails = await search.get_emails()
        hosts = await search.get_hostnames()

        # Ensure our expected values are present
        assert "foo@example.com" in emails
        assert "bar@sub.example.com" in emails
        assert "baz@example.com" in emails

        assert {"a.example.com", "www.example.com", "sub.a.example.com"} <= set(hosts)

    @pytest.mark.asyncio
    async def test_pagination_limit_exclusive(self, monkeypatch):
        captured = {}

        async def fake_fetch_all(urls, headers=None, proxy=False):
            captured["urls"] = urls
            return [""] * len(urls)

        import theHarvester.lib.core as core_module

        monkeypatch.setattr(core_module.AsyncFetcher, "fetch_all", fake_fetch_all)
        monkeypatch.setattr(core_module.Core, "get_user_agent", staticmethod(lambda: "UA"), raising=True)

        search = baidusearch.SearchBaidu(word="example.com", limit=20)
        await search.process()

        # For limit=20, range(0, 20, 10) yields 0 and 10 only (20 is excluded)
        assert captured["urls"] == [
            "https://www.baidu.com/s?wd=%40example.com&pn=0&oq=example.com",
            "https://www.baidu.com/s?wd=%40example.com&pn=10&oq=example.com",
        ]


================================================
FILE: tests/discovery/test_censys.py
================================================
import sys
import types

import pytest

if 'aiohttp_socks' not in sys.modules:
    aiohttp_socks_stub = types.ModuleType('aiohttp_socks')

    class _ProxyConnector:
        @staticmethod
        def from_url(*_args, **_kwargs):
            return None

    setattr(aiohttp_socks_stub, 'ProxyConnector', _ProxyConnector)
    sys.modules['aiohttp_socks'] = aiohttp_socks_stub

from theHarvester.discovery import censysearch
from theHarvester.discovery.constants import MissingKey


class _FakeQuery:
    def __init__(self, pages):
        self.pages = pages

    def __iter__(self):
        return iter(self.pages)


@pytest.mark.asyncio
async def test_missing_key_raises(monkeypatch) -> None:
    monkeypatch.setattr(censysearch.Core, 'censys_key', lambda: (None, None))

    with pytest.raises(MissingKey):
        censysearch.SearchCensys('example.com')


@pytest.mark.asyncio
async def test_search_uses_documented_pagination_and_fields(monkeypatch) -> None:
    monkeypatch.setattr(censysearch.Core, 'censys_key', lambda: ('id', 'secret'))

    calls = {}

    class _FakeCensysCerts:
        def __init__(self, api_id, api_secret, user_agent):
            calls['init'] = {'api_id': api_id, 'api_secret': api_secret, 'user_agent': user_agent}

        def search(self, **kwargs):
            calls['search'] = kwargs
            return _FakeQuery(
                [
                    [
                        {'names': ['a.example.com'], 'parsed': {'subject': {'email_address': 'admin@example.com'}}},
                        {'names': ['b.example.com'], 'parsed': {'subject': {'email_address': ['ops@example.com']}}},
                    ],
                    [
                        {'names': ['c.example.com'], 'parsed': {'subject': {'email_address': None}}},
                    ],
                ]
            )

    monkeypatch.setattr(censysearch, 'CensysCerts', _FakeCensysCerts)

    search = censysearch.SearchCensys('example.com', limit=250)
    await search.process()

    assert calls['init']['api_id'] == 'id'
    assert calls['init']['api_secret'] == 'secret'
    assert calls['search']['query'] == 'names: example.com'
    assert calls['search']['per_page'] == 100
    assert calls['search']['pages'] == 3
    assert calls['search']['fields'] == ['names', 'parsed.subject.email_address']
    assert await search.get_hostnames() == {'a.example.com', 'b.example.com', 'c.example.com'}
    assert await search.get_emails() == {'admin@example.com', 'ops@example.com'}


@pytest.mark.asyncio
async def test_search_respects_limit_across_page_data(monkeypatch) -> None:
    monkeypatch.setattr(censysearch.Core, 'censys_key', lambda: ('id', 'secret'))

    class _FakeCensysCerts:
        def __init__(self, api_id, api_secret, user_agent):
            del api_id, api_secret, user_agent

        def search(self, **kwargs):
            del kwargs
            return _FakeQuery(
                [
                    [
                        {'names': ['1.example.com']},
                        {'names': ['2.example.com']},
                        {'names': ['3.example.com']},
                        {'names': ['4.example.com']},
                        {'names': ['5.example.com']},
                    ]
                ]
            )

    monkeypatch.setattr(censysearch, 'CensysCerts', _FakeCensysCerts)

    search = censysearch.SearchCensys('example.com', limit=3)
    await search.process()

    assert await search.get_hostnames() == {'1.example.com', '2.example.com', '3.example.com'}


================================================
FILE: tests/discovery/test_certspotter.py
================================================
#!/usr/bin/env python3
# coding=utf-8
import os
from typing import Optional

import pytest
import httpx

from theHarvester.discovery import certspottersearch
from theHarvester.lib.core import Core

github_ci: Optional[str] = os.getenv(
    "GITHUB_ACTIONS"
)  # GitHub Actions sets this to the string "true", not the Python boolean True


class TestCertspotter(object):
    @staticmethod
    def domain() -> str:
        return "metasploit.com"


@pytest.mark.skipif(github_ci == 'true', reason="Skipping this test for now")
class TestCertspotterSearch(object):
    @pytest.mark.asyncio
    async def test_api(self) -> None:
        base_url = f"https://api.certspotter.com/v1/issuances?domain={TestCertspotter.domain()}&expand=dns_names"
        headers = {"User-Agent": Core.get_user_agent()}
        request = httpx.get(base_url, headers=headers)
        assert request.status_code == 200

    @pytest.mark.asyncio
    async def test_search(self) -> None:
        search = certspottersearch.SearchCertspoter(TestCertspotter.domain())
        await search.process()
        assert isinstance(await search.get_hostnames(), set)


if __name__ == "__main__":
    pytest.main()


================================================
FILE: tests/discovery/test_criminalip.py
================================================
#!/usr/bin/env python3
# coding=utf-8
import pytest

from theHarvester.discovery import criminalip


@pytest.mark.asyncio
async def test_parser_handles_missing_legacy_fields(monkeypatch) -> None:
    monkeypatch.setattr(criminalip.Core, 'criminalip_key', lambda: 'test-key')

    search = criminalip.SearchCriminalIP('example.com')
    payload = {
        'data': {
            'certificates': [{'subject': 'www.example.com'}],
            'connected_domain_subdomain': [{'main_domain': {'domain': 'example.com'}, 'subdomains': [{'domain': 'api.example.com'}]}],
            'connected_ip': [{'ip': '93.184.216.34'}],
            'connected_ip_info': [
                {
                    'asn': '15133',
                    'ip': '93.184.216.34',
                    'domain_list': [{'domain': 'mail.example.com'}],
                }
            ],
            'cookies': [{'domain': '.portal.example.com'}],
            'dns_record': {
                'dns_record_type_a': {'ipv4': [{'ip': '93.184.216.34'}], 'ipv6': []},
                'dns_record_type_ns': ['ns1.example.com.'],
            },
            'html_page_link_domains': [{'domain': 'www.iana.org', 'mapped_ips': [{'ip': '192.0.33.8'}]}],
            'links': [{'url': 'https://docs.example.com/guide'}],
            'mapped_ip': [{'ip': '203.0.113.10'}],
            'network_logs': {
                'data': [{'url': 'https://cdn.example.com/script.js', 'as_number': '64500', 'ip_port': '198.51.100.10:443'}]
            },
            'page_redirections': [[{'url': 'https://login.example.com'}]],
            'subdomains': [{'subdomain_name': 'blog.example.com'}],
        }
    }

    await search.parser(payload)

    hostnames = await search.get_hostnames()
    ips = await search.get_ips()
    asns = await search.get_asns()

    assert {'api.example.com', 'blog.example.com', 'cdn.example.com', 'docs.example.com', 'login.example.com'}.issubset(hostnames)
    assert {'93.184.216.34', '198.51.100.10', '203.0.113.10'}.issubset(ips)
    assert {'15133', '64500'}.issubset(asns)


@pytest.mark.asyncio
async def test_do_search_uses_v2_report_endpoint(monkeypatch) -> None:
    monkeypatch.setattr(criminalip.Core, 'criminalip_key', lambda: 'test-key')
    monkeypatch.setattr(criminalip.Core, 'get_user_agent', lambda: 'test-agent')

    called_urls = []

    async def fake_post_fetch(url, **kwargs):
        assert url == 'https://api.criminalip.io/v1/domain/scan'
        return {'status': 200, 'data': {'scan_id': 12345}}

    async def fake_fetch_all(urls, **kwargs):
        called_urls.append(urls[0])
        if '/v1/domain/status/' in urls[0]:
            return [{'status': 200, 'data': {'scan_percentage': 100}}]
        if '/v2/domain/report/' in urls[0]:
            return [
                {
                    'status': 200,
                    'data': {
                        'certificates': [],
                        'connected_domain_subdomain': [],
                        'connected_ip': [],
                        'connected_ip_info': [],
                        'cookies': [],
                        'dns_record': {},
                        'html_page_link_domains': [],
                        'links': [],
                        'mapped_ip': [],
                        'network_logs': {'data': []},
                        'page_redirections': [],
                        'subdomains': [],
                    },
                }
            ]
        return [{'status': 500}]

    monkeypatch.setattr(criminalip.AsyncFetcher, 'post_fetch', fake_post_fetch)
    monkeypatch.setattr(criminalip.AsyncFetcher, 'fetch_all', fake_fetch_all)

    search = criminalip.SearchCriminalIP('example.com')
    await search.process()

    assert any('/v2/domain/report/12345' in url for url in called_urls)
    assert all('/v1/domain/report/' not in url for url in called_urls)


================================================
FILE: tests/discovery/test_githubcode.py
================================================
from unittest.mock import MagicMock
import pytest
from httpx import Response
from theHarvester.discovery import githubcode
from theHarvester.discovery.constants import MissingKey
from theHarvester.lib.core import Core


class TestSearchGithubCode:
    class OkResponse:
        # Replace json() with a mock that returns two well-formed search items.
        def __init__(self):
            self.response = Response(status_code=200)
            object.__setattr__(
                self.response,
                "json",
                MagicMock(
                    return_value={
                        "items": [
                            {"text_matches": [{"fragment": "test1"}]},
                            {"text_matches": [{"fragment": "test2"}]},
                        ]
                    }
                ),
            )

    class FailureResponse:
        def __init__(self):
            self.response = Response(status_code=401)
            object.__setattr__(self.response, "json", MagicMock(return_value={}))

    class RetryResponse:
        def __init__(self):
            self.response = Response(status_code=403)
            object.__setattr__(self.response, "json", MagicMock(return_value={}))

    class MalformedResponse:
        def __init__(self):
            self.response = Response(status_code=200)
            object.__setattr__(
                self.response,
                "json",
                MagicMock(
                    return_value={
                        "items": [
                            {"fail": True},
                            {"text_matches": []},
                            {"text_matches": [{"weird": "result"}]},
                        ]
                    }
                ),
            )

    @pytest.mark.asyncio
    async def test_missing_key(self):
        # Patch the key lookup first so only the constructor runs inside the raises block.
        Core.github_key = MagicMock(return_value=None)  # type: ignore[method-assign]
        with pytest.raises(MissingKey):
            githubcode.SearchGithubCode(word="test", limit=500)

    @pytest.mark.asyncio
    async def test_fragments_from_response(self):
        Core.github_key = MagicMock(return_value="test_key")  # type: ignore[method-assign]
        test_class_instance = githubcode.SearchGithubCode(word="test", limit=500)
        test_result = await test_class_instance.fragments_from_response(
            self.OkResponse().response.json()
        )
        assert test_result == ["test1", "test2"]

    @pytest.mark.asyncio
    async def test_invalid_fragments_from_response(self):
        Core.github_key = MagicMock(return_value="test_key")  # type: ignore[method-assign]
        test_class_instance = githubcode.SearchGithubCode(word="test", limit=500)
        test_result = await test_class_instance.fragments_from_response(
            self.MalformedResponse().response.json()
        )
        assert test_result == []

    @pytest.mark.asyncio
    async def test_next_page(self):
        Core.github_key = MagicMock(return_value="test_key")  # type: ignore[method-assign]
        test_class_instance = githubcode.SearchGithubCode(word="test", limit=500)
        test_result = githubcode.SuccessResult([], next_page=2, last_page=4)
        assert await test_class_instance.next_page_or_end(test_result) == 2

    @pytest.mark.asyncio
    async def test_last_page(self):
        Core.github_key = MagicMock(return_value="test_key")  # type: ignore[method-assign]
        test_class_instance = githubcode.SearchGithubCode(word="test", limit=500)
        test_result = githubcode.SuccessResult(list(), 0, 0)
        assert await test_class_instance.next_page_or_end(test_result) == 0

    @pytest.mark.asyncio
    async def test_infinite_loop_fix_page_zero(self):
        """Test that the loop condition properly exits when page becomes 0"""
        Core.github_key = MagicMock(return_value="test_key")  # type: ignore[method-assign]
        test_class_instance = githubcode.SearchGithubCode(word="test", limit=500)

        # Test the fixed condition: page != 0
        page = 0
        counter = 0
        limit = 10

        # The condition should be False when page is 0, preventing an infinite loop
        condition_result = counter <= limit and page != 0
        assert condition_result is False, "Loop should exit when page is 0"

    @pytest.mark.asyncio
    async def test_infinite_loop_fix_page_nonzero(self):
        """Test that the loop condition continues when page is non-zero"""
        Core.github_key = MagicMock(return_value="test_key")  # type: ignore[method-assign]
        test_class_instance = githubcode.SearchGithubCode(word="test", limit=500)

        # Test with non-zero page values
        for page in [1, 2, 3, 10]:
            counter = 0
            limit = 10

            # The condition should be True when page is non-zero
            condition_result = counter <= limit and page != 0
            assert condition_result is True, f"Loop should continue when page is {page}"

    @pytest.mark.asyncio
    async def test_infinite_loop_fix_old_vs_new_condition(self):
        """Demonstrate that the old loop condition never exits at page 0, while the new one does."""
        Core.github_key = MagicMock(return_value="test_key")  # type: ignore[method-assign]
        test_class_instance = githubcode.SearchGithubCode(word="test", limit=500)

        page = 0
        counter = 0
        limit = 10

        # Old problematic condition (would cause an infinite loop)
        old_condition = counter <= limit and page is not None

        # New fixed condition (exits properly)
        new_condition = counter <= limit and page != 0

        # The old condition stays True at page 0, so the loop would never exit
        assert old_condition is True, "Old condition would cause infinite loop when page=0"

        # The new condition is False at page 0, so the loop exits
        assert new_condition is False, "New condition properly exits when page=0"


if __name__ == "__main__":
    pytest.main()


================================================
FILE: tests/discovery/test_githubcode_additions.py
================================================
from unittest.mock import MagicMock, AsyncMock
import asyncio
import pytest
from theHarvester.discovery import githubcode
from theHarvester.lib.core import Core


class TestSearchGithubCodeProcess:
    @pytest.mark.asyncio
    async def test_process_stops_after_max_retries(self, monkeypatch):
        Core.github_key = MagicMock(return_value="test_key")  # type: ignore[method-assign]
        inst = githubcode.SearchGithubCode(word="test", limit=10)

        # Speed up by avoiding actual sleeps
        monkeypatch.setattr(githubcode, "get_delay", lambda: 0, raising=False)
        monkeypatch.setattr(asyncio, "sleep", AsyncMock(return_value=None))

        # Force RetryResult every time
        monkeypatch.setattr(
            inst,
            "handle_response",
            AsyncMock(return_value=githubcode.RetryResult(0)),
        )
        monkeypatch.setattr(
            inst,
            "do_search",
            AsyncMock(return_value=("", {}, 403, {})),
        )

        inst.max_retries = 2
        await inst.process()
        assert inst.page == 0, "Process should stop after exceeding max retries"
        assert inst.retry_count == 3, "Retry count should exceed max_retries before stopping"

    @pytest.mark.asyncio
    async def test_process_stops_on_error_result(self, monkeypatch):
        Core.github_key = MagicMock(return_value="test_key")  # type: ignore[method-assign]
        inst = githubcode.SearchGithubCode(word="test", limit=10)

        monkeypatch.setattr(githubcode, "get_delay", lambda: 0, raising=False)
        monkeypatch.setattr(asyncio, "sleep", AsyncMock(return_value=None))

        # Force ErrorResult
        monkeypatch.setattr(
            inst,
            "handle_response",
            AsyncMock(return_value=githubcode.ErrorResult(500, "err")),
        )
        monkeypatch.setattr(
            inst,
            "do_search",
            AsyncMock(return_value=("", {}, 500, {})),
        )

        await inst.process()
        assert inst.page == 0, "Process should stop on error result to avoid infinite loop"

    @pytest.mark.asyncio
    async def test_process_breaks_on_same_page_pagination(self, monkeypatch):
        Core.github_key = MagicMock(return_value="test_key")  # type: ignore[method-assign]
        inst = githubcode.SearchGithubCode(word="test", limit=10)

        monkeypatch.setattr(githubcode, "get_delay", lambda: 0, raising=False)
        monkeypatch.setattr(asyncio, "sleep", AsyncMock(return_value=None))

        # Force SuccessResult that does not advance the page
        monkeypatch.setattr(
            inst,
            "handle_response",
            AsyncMock(return_value=githubcode.SuccessResult([], next_page=1, last_page=0)),
        )
        monkeypatch.setattr(
            inst,
            "do_search",
            AsyncMock(return_value=("", {"items": []}, 200, {})),
        )

        await inst.process()
        assert inst.page == 0, "Process should stop when pagination does not advance"


================================================
FILE: tests/discovery/test_otx.py
================================================
#!/usr/bin/env python3
# coding=utf-8
import os
from typing import Optional
import httpx
import pytest

from theHarvester.discovery import otxsearch
from theHarvester.lib.core import *

github_ci: Optional[str] = os.getenv(
    "GITHUB_ACTIONS"
)  # GitHub Actions sets this to the lowercase string "true", not "True"


class TestOtx:
    @staticmethod
    def domain() -> str:
        return "apple.com"

    @pytest.mark.asyncio
    async def test_search(self) -> None:
        search = otxsearch.SearchOtx(TestOtx.domain())
        try:
            await search.process()
        except (httpx.TimeoutException, httpx.RequestError):
            pytest.skip("Skipping OTX search due to network error")
        assert isinstance(await search.get_hostnames(), set)
        assert isinstance(await search.get_ips(), set)


if __name__ == "__main__":
    pytest.main()


================================================
FILE: tests/discovery/test_rocketreach.py
================================================
import sys
import types

import pytest

if 'aiohttp_socks' not in sys.modules:
    aiohttp_socks_stub = types.ModuleType('aiohttp_socks')

    class _ProxyConnector:
        @staticmethod
        def from_url(*_args, **_kwargs):
            return None

    setattr(aiohttp_socks_stub, 'ProxyConnector', _ProxyConnector)
    sys.modules['aiohttp_socks'] = aiohttp_socks_stub

from theHarvester.discovery import rocketreach
from theHarvester.discovery.constants import MissingKey


@pytest.mark.asyncio
async def test_missing_key_raises(monkeypatch) -> None:
    monkeypatch.setattr(rocketreach.Core, 'rocketreach_key', lambda: None)
    with pytest.raises(MissingKey):
        rocketreach.SearchRocketReach('example.com', 10)


@pytest.mark.asyncio
async def test_do_search_uses_people_data_endpoint_and_start_pagination(monkeypatch) -> None:
    monkeypatch.setattr(rocketreach.Core, 'rocketreach_key', lambda: 'test-key')
    monkeypatch.setattr(rocketreach.Core, 'get_user_agent', lambda: 'test-agent')
    monkeypatch.setattr(rocketreach, 'get_delay', lambda: 0)

    async def fake_sleep(_seconds):
        return None

    monkeypatch.setattr(rocketreach.asyncio, 'sleep', fake_sleep)

    calls = []

    async def fake_post_fetch(url, headers=None, data=None, json=False, **kwargs):
        calls.append((url, headers, data, json, kwargs))
        if len(calls) == 1:
            first_page_profiles = []
            for index in range(100):
                first_page_profiles.append(
                    {
                        'linkedin_url': f'https://linkedin.com/in/user{index}',
                        'emails': [{'email': f'user{index}@example.com'}],
                    }
                )
            return {
                'profiles': first_page_profiles,
                'pagination': {'page': 1, 'total': 150},
            }

        second_page_profiles = []
        for index in range(100, 150):
            second_page_profiles.append(
                {
                    'linkedin_url': f'https://linkedin.com/in/user{index}',
                    'emails': [{'email': f'user{index}@example.com'}],
                }
            )
        return {
            'profiles': second_page_profiles,
            'pagination': {'page': 2, 'total': 150},
        }

    monkeypatch.setattr(rocketreach.AsyncFetcher, 'post_fetch', fake_post_fetch)

    search = rocketreach.SearchRocketReach('example.com', 150)
    await search.process()

    assert len(calls) == 2
    first_url, first_headers, first_data, first_json, _ = calls[0]
    second_url, _, second_data, _, _ = calls[1]

    assert first_url == 'https://api.rocketreach.co/api/v2/person/search'
    assert second_url == 'https://api.rocketreach.co/api/v2/person/search'
    assert first_headers['Api-Key'] == 'test-key'
    assert first_headers['User-Agent'] == 'test-agent'
    assert first_json is True
    assert first_data == {'query': {'current_employer_domain': ['example.com']}, 'start': 0, 'page_size': 100}
    assert second_data == {'query': {'current_employer_domain': ['example.com']}, 'start': 100, 'page_size': 50}

    links = await search.get_links()
    emails = await search.get_emails()
    assert len(links) == 150
    assert len(emails) == 150
    assert 'https://linkedin.com/in/user0' in links
    assert 'https://linkedin.com/in/user149' in links
    assert 'user0@example.com' in emails
    assert 'user149@example.com' in emails


@pytest.mark.asyncio
async def test_do_search_stops_on_throttling_message(monkeypatch) -> None:
    monkeypatch.setattr(rocketreach.Core, 'rocketreach_key', lambda: 'test-key')
    monkeypatch.setattr(rocketreach.Core, 'get_user_agent', lambda: 'test-agent')
    monkeypatch.setattr(rocketreach, 'get_delay', lambda: 0)

    async def fake_sleep(_seconds):
        return None

    monkeypatch.setattr(rocketreach.asyncio, 'sleep', fake_sleep)

    calls = []

    async def fake_post_fetch(url, headers=None, data=None, json=False, **kwargs):
        calls.append((url, data))
        return {'detail': 'Request was throttled. Credits will become available in 10 seconds.'}

    monkeypatch.setattr(rocketreach.AsyncFetcher, 'post_fetch', fake_post_fetch)

    search = rocketreach.SearchRocketReach('example.com', 10)
    await search.process()

    assert len(calls) == 1


================================================
FILE: tests/discovery/test_shodan_engine.py
================================================
import socket
import sys
from collections import OrderedDict

import pytest


class TestShodanEngine:
    @pytest.mark.asyncio
    async def test_shodan_engine_processes_without_work_item_error_and_yields_hostnames(self, monkeypatch, capsys):
        # Import inside the test so monkeypatching affects the already-imported module namespace.
        import theHarvester.__main__ as main_module

        # Make DNS resolution deterministic and offline.
        monkeypatch.setattr(socket, "gethostbyname", lambda _domain: "1.2.3.4", raising=True)

        # Avoid filesystem/sqlite side effects.
        class DummyStashManager:
            async def do_init(self) -> None:
                return None

            async def store_all(self, domain, all, res_type, source) -> None:  # noqa: A002
                return None

        monkeypatch.setattr(main_module.stash, "StashManager", DummyStashManager, raising=True)

        # Stub Shodan search to avoid network and API key requirements.
        class DummySearchShodan:
            async def search_ip(self, ip):
                return OrderedDict({ip: {"hostnames": ["a.example.com", "b.example.com"]}})

        monkeypatch.setattr(main_module.shodansearch, "SearchShodan", DummySearchShodan, raising=True)

        # Run the CLI path that uses the engine queue/worker (`-b shodan`).
        monkeypatch.setattr(sys, "argv", ["theHarvester", "-d", "example.com", "-b", "shodan"], raising=True)

        with pytest.raises(SystemExit) as excinfo:
            await main_module.start()
        assert excinfo.value.code == 0

        out = capsys.readouterr().out
        assert 'A error occurred while processing a "work item"' not in out
        assert "a.example.com" in out
        assert "b.example.com" in out


================================================
FILE: tests/discovery/test_thc.py
================================================
#!/usr/bin/env python3
# coding=utf-8
"""
Tests for THC (ip.thc.org) discovery module.

THC provides multiple endpoints:
- Subdomain enumeration
- CNAME lookup
- Reverse DNS lookup

API Documentation: https://ip.thc.org/docs/
"""
import os
from typing import Optional

import httpx
import pytest

from theHarvester.discovery import thc
from theHarvester.lib.core import Core

github_ci: Optional[str] = os.getenv('GITHUB_ACTIONS')


# =============================================================================
# 1. Direct API Tests (Endpoint Validation)
# =============================================================================
class TestThcApi:
    """Tests to validate that the THC API responds correctly."""

    @pytest.mark.asyncio
    async def test_api_subdomains_download_endpoint_responds(self) -> None:
        """Verify that the subdomain download endpoint responds."""
        url = 'https://ip.thc.org/api/v1/subdomains/download?domain=google.com&limit=10&hide_header=true'
        headers = {'User-Agent': Core.get_user_agent()}
        try:
            response = httpx.get(url, headers=headers, timeout=30)
            assert response.status_code == 200
        except (httpx.TimeoutException, httpx.RequestError):
            pytest.skip('Skipping due to network error')

    @pytest.mark.asyncio
    async def test_api_subdomains_returns_text_format(self) -> None:
        """Verify that the response is text-like, or at minimum that the request succeeds."""
        url = 'https://ip.thc.org/api/v1/subdomains/download?domain=google.com&limit=5&hide_header=true'
        headers = {'User-Agent': Core.get_user_agent()}
        try:
            response = httpx.get(url, headers=headers, timeout=30)
            content_type = response.headers.get('content-type', '')
            assert 'text' in content_type or 'octet-stream' in content_type or response.status_code == 200
        except (httpx.TimeoutException, httpx.RequestError):
            pytest.skip('Skipping due to network error')

    @pytest.mark.asyncio
    async def test_api_cli_subdomain_endpoint(self) -> None:
        """Verify CLI endpoint /sb/{domain}."""
        url = 'https://ip.thc.org/sb/google.com?l=5&noheader'
        headers = {'User-Agent': Core.get_user_agent()}
        try:
            response = httpx.get(url, headers=headers, timeout=30)
            assert response.status_code == 200
        except (httpx.TimeoutException, httpx.RequestError):
            pytest.skip('Skipping due to network error')

    @pytest.mark.asyncio
    async def test_api_returns_rate_limit_headers(self) -> None:
        """Verify that the API returns rate limit headers."""
        url = 'https://ip.thc.org/api/v1/subdomains/download?domain=example.com&limit=1&hide_header=true'
        headers = {'User-Agent': Core.get_user_agent()}
        try:
            response = httpx.get(url, headers=headers, timeout=30)
            assert 'x-ratelimit-limit' in response.headers
            assert 'x-ratelimit-remaining' in response.headers
        except (httpx.TimeoutException, httpx.RequestError):
            pytest.skip('Skipping due to network error')


# =============================================================================
# 2. Subdomain Search Tests (Main Functionality)
# =============================================================================
class TestThcSubdomainSearch:
    """Tests for subdomain search functionality."""

    @staticmethod
    def domain() -> str:
        return 'tesla.com'

    @staticmethod
    def small_domain() -> str:
        return 'thc.org'

    @pytest.mark.asyncio
    async def test_search_returns_set(self) -> None:
        """Verify that get_hostnames() returns a set."""
        search = thc.SearchThc(self.domain())
        try:
            await search.process()
        except (httpx.TimeoutException, httpx.RequestError):
            pytest.skip('Skipping due to network error')
        result = await search.get_hostnames()
        assert isinstance(result, set)

    @pytest.mark.asyncio
    async def test_search_finds_subdomains(self) -> None:
        """Verify that it finds subdomains for a known domain."""
        search = thc.SearchThc(self.domain())
        try:
            await search.process()
        except (httpx.TimeoutException, httpx.RequestError):
            pytest.skip('Skipping due to network error')
        result = await search.get_hostnames()
        assert len(result) > 0, 'Should find at least one subdomain for tesla.com'

    @pytest.mark.asyncio
    async def test_search_results_contain_target_domain(self) -> None:
        """Verify that all results contain the target domain."""
        search = thc.SearchThc(self.small_domain())
        try:
            await search.process()
        except (httpx.TimeoutException, httpx.RequestError):
            pytest.skip('Skipping due to network error')
        result = await search.get_hostnames()
        for hostname in result:
            assert self.small_domain() in hostname, f'{hostname} should contain {self.small_domain()}'

    @pytest.mark.asyncio
    async def test_search_no_duplicates(self) -> None:
        """Verify that there are no duplicates in the results."""
        search = thc.SearchThc(self.domain())
        try:
            await search.process()
        except (httpx.TimeoutException, httpx.RequestError):
            pytest.skip('Skipping due to network error')
        result = await search.get_hostnames()
        result_list = list(result)
        assert len(result_list) == len(set(result_list))


# =============================================================================
# 3. Edge Case Tests
# =============================================================================
class TestThcEdgeCases:
    """Tests for edge cases and error handling."""

    @pytest.mark.asyncio
    async def test_search_nonexistent_domain(self) -> None:
        """Verify behavior with non-existent domain."""
        search = thc.SearchThc('this-domain-definitely-does-not-exist-12345.com')
        try:
            await search.process()
        except (httpx.TimeoutException, httpx.RequestError):
            pytest.skip('Skipping due to network error')
        except Exception:
            pass
        result = await search.get_hostnames()
        assert isinstance(result, set)

    @pytest.mark.asyncio
    async def test_search_empty_domain(self) -> None:
        """Verify behavior with empty domain."""
        search = thc.SearchThc('')
        try:
            await search.process()
        except (httpx.TimeoutException, httpx.RequestError):
            pytest.skip('Skipping due to network error')
        except Exception:
            pass
        result = await search.get_hostnames()
        assert isinstance(result, set)

    @pytest.mark.asyncio
    async def test_search_special_characters_domain(self) -> None:
        """Verify behavior with special characters."""
        search = thc.SearchThc('example.com; DROP TABLE domains;--')
        try:
            await search.process()
        except (httpx.TimeoutException, httpx.RequestError):
            pytest.skip('Skipping due to network error')
        except Exception:
            pass
        result = await search.get_hostnames()
        assert isinstance(result, set)

    @pytest.mark.asyncio
    async def test_search_unicode_domain(self) -> None:
        """Verify behavior with an IDN supplied in Punycode (ASCII) form."""
        search = thc.SearchThc('xn--mnchen-3ya.de')
        try:
            await search.process()
        except (httpx.TimeoutException, httpx.RequestError):
            pytest.skip('Skipping due to network error')
        except Exception:
            pass
        result = await search.get_hostnames()
        assert isinstance(result, set)

    @pytest.mark.asyncio
    async def test_search_subdomain_as_input(self) -> None:
        """Verify behavior when a subdomain is passed as input."""
        search = thc.SearchThc('www.google.com')
        try:
            await search.process()
        except (httpx.TimeoutException, httpx.RequestError):
            pytest.skip('Skipping due to network error')
        result = await search.get_hostnames()
        assert isinstance(result, set)


# =============================================================================
# 4. Proxy Tests
# =============================================================================
class TestThcProxy:
    """Tests for proxy functionality."""

    @staticmethod
    def domain() -> str:
        return 'example.com'

    @pytest.mark.asyncio
    async def test_process_accepts_proxy_parameter(self) -> None:
        """Verify that process() accepts proxy parameter."""
        search = thc.SearchThc(self.domain())
        try:
            await search.process(proxy=False)
        except (httpx.TimeoutException, httpx.RequestError):
            pytest.skip('Skipping due to network error')
        result = await search.get_hostnames()
        assert isinstance(result, set)

    @pytest.mark.asyncio
    async def test_proxy_attribute_is_set(self) -> None:
        """Verify that the proxy attribute is set correctly."""
        search = thc.SearchThc(self.domain())
        assert search.proxy is False


# =============================================================================
# 5. Initialization and Attributes Tests
# =============================================================================
class TestThcInitialization:
    """Tests for class initialization and structure."""

    def test_init_sets_word(self) -> None:
        """Verify that __init__ sets the domain."""
        domain = 'test.com'
        search = thc.SearchThc(domain)
        assert search.word == domain

    def test_init_creates_empty_results(self) -> None:
        """Verify that results is initialized empty."""
        search = thc.SearchThc('test.com')
        assert hasattr(search, 'results')
        assert len(search.results) == 0

    def test_init_proxy_default_false(self) -> None:
        """Verify that proxy is False by default."""
        search = thc.SearchThc('test.com')
        assert search.proxy is False

    def test_init_has_rate_limit_settings(self) -> None:
        """Verify that rate limit settings are initialized."""
        search = thc.SearchThc('test.com')
        assert hasattr(search, 'max_retries')
        assert hasattr(search, 'base_delay')
        assert search.max_retries == 3
        assert search.base_delay == 2

    def test_class_has_required_methods(self) -> None:
        """Verify that the class has the required methods."""
        search = thc.SearchThc('test.com')
        assert hasattr(search, 'do_search')
        assert hasattr(search, 'get_hostnames')
        assert hasattr(search, 'process')
        assert callable(search.do_search)
        assert callable(search.get_hostnames)
        assert callable(search.process)


# =============================================================================
# 6. Response Format Tests
# =============================================================================
class TestThcResponseFormat:
    """Tests to verify response format."""

    @staticmethod
    def domain() -> str:
        return 'github.com'

    @pytest.mark.asyncio
    async def test_hostnames_are_strings(self) -> None:
        """Verify that all hostnames are strings."""
        search = thc.SearchThc(self.domain())
        try:
            await search.process()
        except (httpx.TimeoutException, httpx.RequestError):
            pytest.skip('Skipping due to network error')
        result = await search.get_hostnames()
        for hostname in result:
            assert isinstance(hostname, str)

    @pytest.mark.asyncio
    async def test_hostnames_are_valid_format(self) -> None:
        """Verify that hostnames have valid format."""
        search = thc.SearchThc(self.domain())
        try:
            await search.process()
        except (httpx.TimeoutException, httpx.RequestError):
            pytest.skip('Skipping due to network error')
        result = await search.get_hostnames()
        for hostname in result:
            assert ' ' not in hostname
            assert '\n' not in hostname
            assert '\t' not in hostname

    @pytest.mark.asyncio
    async def test_hostnames_are_lowercase(self) -> None:
        """Verify that hostnames are lowercase."""
        search = thc.SearchThc(self.domain())
        try:
            await search.process()
        except (httpx.TimeoutException, httpx.RequestError):
            pytest.skip('Skipping due to network error')
        result = await search.get_hostnames()
        for hostname in result:
            assert hostname == hostname.lower()


# =============================================================================
# 7. Integration Tests with theHarvester
# =============================================================================
@pytest.mark.skipif(github_ci == 'true', reason='Skip integration tests in CI')
class TestThcIntegration:
    """Integration tests with theHarvester framework."""

    @pytest.mark.asyncio
    async def test_module_can_be_imported(self) -> None:
        """Verify that the module can be imported."""
        from theHarvester.discovery import thc as thc_module
        assert thc_module is not None

    @pytest.mark.asyncio
    async def test_search_class_exists(self) -> None:
        """Verify that SearchThc class exists."""
        from theHarvester.discovery import thc as thc_module
        assert hasattr(thc_module, 'SearchThc')

    @pytest.mark.asyncio
    async def test_compatible_with_store_function(self) -> None:
        """Verify compatibility with store function from __main__.py."""
        search = thc.SearchThc('example.com')
        assert hasattr(search, 'process')
        assert hasattr(search, 'get_hostnames')


if __name__ == '__main__':
    pytest.main()


================================================
FILE: tests/lib/test_core.py
================================================
from __future__ import annotations

from pathlib import Path
from typing import Any
from unittest import mock

import pytest
import yaml

import theHarvester.lib.core as core_module
from theHarvester.lib.core import CONFIG_DIRS, DATA_DIR, AsyncFetcher, Core


@pytest.fixture(autouse=True)
def mock_environ(monkeypatch, tmp_path: Path):
    monkeypatch.setenv("HOME", str(tmp_path))


def mock_read_text(mocked: dict[Path, str | Exception]):
    read_text = Path.read_text

    def _read_text(self: Path, *args, **kwargs):
        # Check membership rather than truthiness so an empty mocked file is still honoured.
        if self in mocked:
            result = mocked[self]
            if isinstance(result, Exception):
                raise result
            return result
        return read_text(self, *args, **kwargs)

    return _read_text


@pytest.mark.parametrize(
    ("name", "contents", "expected"),
    [
        ("api-keys", "apikeys: {}", {}),
        ("proxies", "http: [localhost:8080]", {"http": ["http://localhost:8080"], "socks5": []}),
    ],
)
@pytest.mark.parametrize("dir", CONFIG_DIRS)
def test_read_config_searches_config_dirs(
    name: str, contents: str, expected: Any, dir: Path, capsys
):
    file = dir.expanduser() / f"{name}.yaml"
    config_files = [d.expanduser() / file.name for d in CONFIG_DIRS]
    side_effect = mock_read_text(
        {f: contents if f == file else FileNotFoundError() for f in config_files}
    )

    with mock.patch("pathlib.Path.read_text", autospec=True, side_effect=side_effect):
        got = Core.api_keys() if name == "api-keys" else Core.proxy_list()

    assert got == expected
    assert f"Read {file.name} from {file}" in capsys.readouterr().out


@pytest.mark.parametrize("name", ("api-keys", "proxies"))
def test_read_config_copies_default_to_home(name: str, capsys):
    file = Path(f"~/.theHarvester/{name}.yaml").expanduser()
    config_files = [d.expanduser() / file.name for d in CONFIG_DIRS]
    side_effect = mock_read_text({f: FileNotFoundError() for f in config_files})

    with mock.patch("pathlib.Path.read_text", autospec=True, side_effect=side_effect):
        got = Core.api_keys() if name == "api-keys" else Core.proxy_list()

    default = yaml.safe_load((DATA_DIR / file.name).read_text())
    expected = (
        default["apikeys"]
        if name == "api-keys"
        else {
            "http": [f"http://{h}" for h in default["http"]] if default.get("http") else [],
            "socks5": [f"socks5://{h}" for h in default["socks5"]] if default.get("socks5") else [],
        }
    )
    assert got == expected
    assert f"Created default {file.name} at {file}" in capsys.readouterr().out
    assert file.exists()


class DummyResponse:
    def __init__(self, text_value: str = 'response-text', json_value: Any = None):
        self.text_value = text_value
        self.json_value = {'ok': True} if json_value is None else json_value

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        return False

    async def text(self):
        return self.text_value

    async def json(self):
        return self.json_value


class DummySession:
    instances: list['DummySession'] = []

    def __init__(self, *, headers=None, timeout=None, connector=None):
        self.headers = headers
        self.timeout = timeout
        self.connector = connector
        self.closed = False
        self.requests: list[tuple[str, str, dict[str, Any]]] = []
        DummySession.instances.append(self)

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        await self.close()
        return False

    def request(self, method: str, url: str, **kwargs):
        self.requests.append((method, url, kwargs))
        return DummyResponse()

    def get(self, url: str, **kwargs):
        self.requests.append(('GET', url, kwargs))
        return DummyResponse()

    def post(self, url: str, **kwargs):
        self.requests.append(('POST', url, kwargs))
        return DummyResponse(json_value={'posted': True})

    async def close(self):
        self.closed = True


def reset_dummy_sessions() -> None:
    DummySession.instances.clear()


async def fake_sleep(_seconds: float) -> None:
    return None


def test_api_keys_yaml_is_in_sync_with_core_accessors():
    required = core_module.Core._API_KEY_FIELDS
    assert required, "No API-key references were detected in `Core`"

    config = yaml.safe_load((DATA_DIR / "api-keys.yaml").read_text(encoding="utf-8"))
    apikeys = config["apikeys"]

    missing_providers = sorted(set(required) - set(apikeys))
    assert not missing_providers, f"Missing providers in api-keys.yaml: {missing_providers}"

    missing_fields: dict[str, list[str]] = {}
    for provider, fields in required.items():
        for field in sorted(fields):
            if field not in apikeys[provider]:
                missing_fields.setdefault(provider, []).append(field)

    assert not missing_fields, f"Missing fields in api-keys.yaml: {missing_fields}"


@pytest.mark.parametrize(
    ("accessor_name", "expected"),
    [
        ("bevigil_key", "bevigil-key"),
        ("censys_key", ("censys-id", "censys-secret")),
        ("fofa_key", ("fofa-key", "fofa-email")),
        ("tomba_key", ("tomba-key", "tomba-secret")),
    ],
)
def test_api_key_accessors_delegate_to_shared_mapping(monkeypatch, accessor_name: str, expected: Any):
    monkeypatch.setattr(
        Core,
        'api_keys',
        staticmethod(
            lambda: {
                'bevigil': {'key': 'bevigil-key'},
                'censys': {'id': 'censys-id', 'secret': 'censys-secret'},
                'fofa': {'key': 'fofa-key', 'email': 'fofa-email'},
                'tomba': {'key': 'tomba-key', 'secret': 'tomba-secret'},
            }
        ),
    )

    accessor = getattr(Core, accessor_name)
    assert accessor() == expected


@pytest.mark.asyncio
async def test_fetch_creates_session_with_default_headers(monkeypatch) -> None:
    reset_dummy_sessions()
    monkeypatch.setattr(core_module.aiohttp, 'ClientSession', DummySession)
    monkeypatch.setattr(core_module.ssl, 'create_default_context', lambda cafile=None: 'ssl-context')
    monkeypatch.setattr(core_module.certifi, 'where', lambda: '/tmp/cacert.pem')
    monkeypatch.setattr(core_module.asyncio, 'sleep', fake_sleep)
    monkeypatch.setattr(Core, 'get_user_agent', staticmethod(lambda: 'test-agent'))

    result = await AsyncFetcher.fetch(url='https://example.com', follow_redirects=False)

    assert result == 'response-text'
    assert len(DummySession.instances) == 1
    session = DummySession.instances[0]
    assert session.headers == {'User-Agent': 'test-agent'}
    assert session.closed is True
    assert session.requests == [
        ('GET', 'https://example.com', {'ssl': 'ssl-context', 'allow_redirects': False})
    ]


@pytest.mark.asyncio
async def test_fetch_uses_http_proxy_when_enabled(monkeypatch) -> None:
    reset_dummy_sessions()
    monkeypatch.setattr(core_module.aiohttp, 'ClientSession', DummySession)
    monkeypatch.setattr(core_module.ssl, 'create_default_context', lambda cafile=None: 'ssl-context')
    monkeypatch.setattr(core_module.certifi, 'where', lambda: '/tmp/cacert.pem')
    monkeypatch.setattr(core_module.asyncio, 'sleep', fake_sleep)
    monkeypatch.setattr(AsyncFetcher, '_get_random_proxy', staticmethod(lambda proxy_dict: ('http://proxy.local:8080', 'http')))

    async def fake_create_connector(proxy_url, proxy_type, ssl_context=None):
        return 'connector'

    monkeypatch.setattr(AsyncFetcher, '_create_connector', fake_create_connector)

    result = await AsyncFetcher.fetch(url='https://example.com', proxy=True)

    assert result == 'response-text'
    session = DummySession.instances[0]
    assert session.connector == 'connector'
    assert session.requests == [
        ('GET', 'https://example.com', {'ssl': 'ssl-context', 'proxy': 'http://proxy.local:8080'})
    ]


@pytest.mark.asyncio
async def test_post_fetch_decodes_string_payload_and_posts_params(monkeypatch) -> None:
    reset_dummy_sessions()
    monkeypatch.setattr(core_module.aiohttp, 'ClientSession', DummySession)
    monkeypatch.setattr(core_module.asyncio, 'sleep', fake_sleep)
    monkeypatch.setattr(core_module.ssl, 'create_default_context', lambda cafile=None: 'ssl-context')
    monkeypatch.setattr(core_module.certifi, 'where', lambda: '/tmp/cacert.pem')
    monkeypatch.setattr(Core, 'get_user_agent', staticmethod(lambda: 'test-agent'))

    result = await AsyncFetcher.post_fetch(
        'https://example.com/api',
        data='{"query": "example"}',
        params={'page': 2},
        json=True,
    )

    assert result == {'ok': True}
    session = DummySession.instances[0]
    assert session.headers == {'User-Agent': 'test-agent'}
    assert session.requests == [
        ('POST', 'https://example.com/api', {'data': {'query': 'example'}, 'ssl': 'ssl-context', 'params': {'page': 2}})
    ]


@pytest.mark.asyncio
async def test_post_fetch_proxy_branch_uses_get_with_http_proxy(monkeypatch) -> None:
    reset_dummy_sessions()
    created_connectors = []
    monkeypatch.setattr(core_module.aiohttp, 'ClientSession', DummySession)
    monkeypatch.setattr(core_module.asyncio, 'sleep', fake_sleep)
    monkeypatch.setattr(core_module.ssl, 'create_default_context', lambda cafile=None: 'ssl-context')
    monkeypatch.setattr(core_module.certifi, 'where', lambda: '/tmp/cacert.pem')
    monkeypatch.setattr(AsyncFetcher, '_get_random_proxy', staticmethod(lambda proxy_dict: ('http://proxy.local:8080', 'http')))

    async def fake_create_connector(proxy_url, proxy_type, ssl_context=None):
        created_connectors.append((proxy_url, proxy_type, ssl_context))
        return 'connector'

    monkeypatch.setattr(AsyncFetcher, '_create_connector', fake_create_connector)

    result = await AsyncFetcher.post_fetch('https://example.com/resource', proxy=True)

    assert result == 'response-text'
    assert created_connectors == [('http://proxy.local:8080', 'http', 'ssl-context')]
    session = DummySession.instances[0]
    assert session.connector == 'connector'
    assert session.requests == [
        ('GET', 'https://example.com/resource', {'proxy': 'http://proxy.local:8080'})
    ]


================================================
FILE: tests/lib/test_output.py
================================================
from __future__ import annotations


from theHarvester.lib.output import print_linkedin_sections, sorted_unique


def test_sorted_unique_sorts_and_deduplicates() -> None:
    assert sorted_unique(["b", "a", "b"]) == ["a", "b"]


def test_print_linkedin_sections_prints_links_when_present(capsys) -> None:
    # Regression coverage: the CLI previously never printed LinkedIn links when the list was non-empty.
    print_linkedin_sections(
        engines=["linkedin"],
        people=[],
        links=["https://b.example", "https://a.example", "https://a.example"],
    )

    out = capsys.readouterr().out
    assert "No LinkedIn users found" in out
    assert "LinkedIn Links found: 3" in out
    assert "https://a.example" in out
    assert "https://b.example" in out


def test_print_linkedin_sections_prints_people_and_links(capsys) -> None:
    print_linkedin_sections(
        engines=["rocketreach"],
        people=["bob", "alice", "bob"],
        links=["https://z.example", "https://z.example"],
    )

    out = capsys.readouterr().out
    assert "LinkedIn Users found: 3" in out
    assert "alice" in out
    assert "bob" in out
    assert "LinkedIn Links found: 2" in out
    assert "https://z.example" in out


================================================
FILE: tests/test_hackertarget_apikey.py
================================================
import pytest
from theHarvester.discovery import hackertarget as ht_mod
from theHarvester.lib.core import Core


class TestHackerTargetApiKey:

    @pytest.mark.asyncio
    async def test_do_search_with_apikey(self, monkeypatch):
        # make Core.hackertarget_key return a known key
        monkeypatch.setattr(Core, "hackertarget_key", staticmethod(lambda: "TESTKEY"))

        # monkeypatch AsyncFetcher.fetch_all to capture requested URLs
        async def fake_fetch_all(urls, headers=None, proxy=False):
            # ensure apikey present in each URL
            assert all("apikey=TESTKEY" in u for u in urls)
            return ["1.2.3.4,host.example.com\n", "No PTR records found\n"]

        monkeypatch.setattr(ht_mod.AsyncFetcher, "fetch_all", fake_fetch_all)

        s = ht_mod.SearchHackerTarget("example.com")
        await s.do_search()

        # after do_search, total_results should include our fake response (commas replaced by colons)
        assert "1.2.3.4:host.example.com" in s.total_results

    @pytest.mark.asyncio
    async def test_do_search_without_apikey(self, monkeypatch):
        monkeypatch.setattr(Core, "hackertarget_key", staticmethod(lambda: None))

        async def fake_fetch_all(urls, headers=None, proxy=False):
            assert all("apikey=" not in u for u in urls)
            return ["1.2.3.4,host.example.com\n"]

        monkeypatch.setattr(ht_mod.AsyncFetcher, "fetch_all", fake_fetch_all)

        s = ht_mod.SearchHackerTarget("example.com")
        await s.do_search()
        assert "1.2.3.4:host.example.com" in s.total_results


================================================
FILE: tests/test_mojeek.py
================================================
import pytest
from theHarvester.discovery import mojeek

class TestMojeekSearch:

    @pytest.mark.asyncio
    async def test_process_and_parsing(self, monkeypatch):
        called = {}

        async def fake_fetch_all(urls, headers=None, proxy=False):
            called["urls"] = urls
            called["headers"] = headers
            called["proxy"] = proxy
            return [
                "Contact admin@exemple.com sur www.exemple.com \n",
                " dev@exemple.com est présent sur api.exemple.com \n"
            ]

        import theHarvester.lib.core as core_module
        monkeypatch.setattr(core_module.AsyncFetcher, "fetch_all", fake_fetch_all)
        monkeypatch.setattr(core_module.Core, "get_user_agent", staticmethod(lambda: "UA"), raising=True)

        search = mojeek.SearchMojeek(word="exemple.com", limit=20)
        await search.process(proxy=True)

        expected_urls = [
            "https://www.mojeek.com/search?q=%40exemple.com&s=0",
            "https://www.mojeek.com/search?q=%40exemple.com&s=10"
        ]

        assert any("mojeek.com" in url for url in called["urls"])

        emails = await search.get_emails()
        hosts = await search.get_hostnames()

        assert "admin@exemple.com" in emails
        assert "dev@exemple.com" in emails
        assert "www.exemple.com" in hosts
        assert "api.exemple.com" in hosts

    @pytest.mark.asyncio
    async def test_pagination_limit(self, monkeypatch):
        captured = {}

        async def fake_fetch_all(urls, headers=None, proxy=False):
            captured["urls"] = urls
            return [""] * len(urls)

        import theHarvester.lib.core as core_module
        monkeypatch.setattr(core_module.AsyncFetcher, "fetch_all", fake_fetch_all)
        monkeypatch.setattr(core_module.Core, "get_user_agent", staticmethod(lambda: "UA"), raising=True)

        search = mojeek.SearchMojeek(word="exemple.com", limit=10)
        await search.process()

        assert len(captured["urls"]) == 1


================================================
FILE: tests/test_myparser.py
================================================
#!/usr/bin/env python3
# coding=utf-8

import pytest

from theHarvester.parsers import myparser


class TestMyParser:
    @pytest.mark.asyncio
    async def test_emails(self) -> None:
        word = "domain.com"
        results = "@domain.com***a@domain***banotherdomain.com***c@domain.com***d@sub.domain.com***"
        parse = myparser.Parser(results, word)
        emails = sorted(await parse.emails())
        # Compare with ==; `assert emails, [...]` only checked that the list was non-empty.
        assert emails == ["c@domain.com", "d@sub.domain.com"]


if __name__ == "__main__":
    pytest.main()


================================================
FILE: tests/test_security.py
================================================
import os
import re
import tempfile

import pytest
from fastapi.testclient import TestClient

from theHarvester.__main__ import sanitize_filename, sanitize_for_xml


class TestCORSConfiguration:
    """Test CORS security configuration."""

    def test_cors_does_not_allow_credentials_with_wildcard_origins(self):
        """
        Security Test: CORS should not allow credentials with wildcard origins.

        This prevents credential theft attacks where any origin can make
        authenticated requests to the API.
        """
        from theHarvester.lib.api.api import app

        # Find CORS middleware in the app
        cors_middleware = None
        for middleware in app.user_middleware:
            if 'CORSMiddleware' in str(middleware.cls):
                cors_middleware = middleware
                break

        assert cors_middleware is not None, 'CORS middleware should be configured'

        # Check that if allow_origins contains '*', allow_credentials must be False
        # Access kwargs from the middleware
        options = cors_middleware.kwargs
        allow_origins = options.get('allow_origins', [])
        allow_credentials = options.get('allow_credentials', False)

        if isinstance(allow_origins, (list, tuple, set)) and '*' in allow_origins:
            assert (
                allow_credentials is False
            ), 'CRITICAL: CORS must not allow credentials with wildcard origins (CVE risk)'

    def test_cors_restricts_http_methods(self):
        """
        Security Test: CORS should restrict HTTP methods to only what's needed.

        Reduces attack surface by limiting available methods.
        """
        from theHarvester.lib.api.api import app

        cors_middleware = None
        for middleware in app.user_middleware:
            if 'CORSMiddleware' in str(middleware.cls):
                cors_middleware = middleware
                break

        assert cors_middleware is not None

        options = cors_middleware.kwargs
        allow_methods = options.get('allow_methods', [])

        # Should not allow all methods
        assert allow_methods != ['*'], 'CORS should restrict HTTP methods, not allow all (*)'

        # Should only allow necessary methods (GET, POST for this API)
        if isinstance(allow_methods, list):
            dangerous_methods = {'DELETE', 'PUT', 'PATCH', 'TRACE', 'CONNECT'}
            allowed_set = {m.upper() for m in allow_methods}
            assert not (
                allowed_set & dangerous_methods
            ), f'Unnecessary HTTP methods detected: {allowed_set & dangerous_methods}'


class TestXMLInjectionPrevention:
    """Test XML injection prevention."""

    def test_sanitize_for_xml_escapes_special_characters(self):
        """
        Security Test: Verify XML special characters are properly escaped.

        Prevents XML injection attacks.
        """
        # Test all XML special characters
        test_cases = [
            ('&', '&amp;'),
            ('<', '&lt;'),
            ('>', '&gt;'),
            ('"', '&quot;'),
            ("'", '&apos;'),
            ('<script>alert("XSS")</script>', '&lt;script&gt;alert(&quot;XSS&quot;)&lt;/script&gt;'),
            ('user@example.com & <test>', 'user@example.com &amp; &lt;test&gt;'),
            ('Normal text', 'Normal text'),
        ]

        for input_text, expected_output in test_cases:
            result = sanitize_for_xml(input_text)
            assert result == expected_output, f'Failed to properly escape: {input_text}'

    def test_sanitize_for_xml_prevents_xml_entity_injection(self):
        """
        Security Test: Prevent XML entity injection attempts.
        """
        malicious_inputs = [
            '<?xml version="1.0"?><!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>',
            '<!ENTITY xxe SYSTEM "file:///dev/random">',
            '<![CDATA[malicious]]>',
            '&#x3C;script&#x3E;',
        ]

        for malicious_input in malicious_inputs:
            result = sanitize_for_xml(malicious_input)
            # Ensure dangerous characters are escaped
            assert '&lt;' in result or '&amp;' in result, f'Failed to sanitize: {malicious_input}'
            assert '<' not in result or result == malicious_input.replace('<', '&lt;'), f'XML tags not escaped: {malicious_input}'

    def test_command_line_args_are_sanitized_in_xml_output(self):
        """
        Security Test: Command line arguments must be sanitized before XML output.

        This test is a conceptual check - in real usage, ensure the XML writing
        code uses sanitize_for_xml() on all user-controlled data.
        """
        # Simulate dangerous command line arguments
        dangerous_args = [
            '--domain=test.com',
            "--source='<script>alert(1)</script>'",
            '--output="; rm -rf /',
            '--domain=example.com&param=<injection>',
        ]

        for arg in dangerous_args:
            sanitized = sanitize_for_xml(arg)
            # Verify no unescaped XML special characters remain
            assert '<script>' not in sanitized, f'Script tag not escaped in: {arg}'
            assert '&param=' not in sanitized or '&amp;' in sanitized, f'Ampersand not escaped in: {arg}'


class TestInformationDisclosure:
    """Test information disclosure prevention."""

    @pytest.fixture
    def client(self):
        """Create a test client for API testing."""
        from theHarvester.lib.api.api import app

        return TestClient(app)

    def test_api_does_not_expose_traceback_in_error_responses(self, client):
        """
        Security Test: API should never expose stack traces to clients.

        Stack traces can reveal sensitive information about the system.
        """
        # Test the /sources endpoint with a simulated error condition
        response = client.get('/sources')

        # Even if there's an error, traceback should not be in response
        if response.status_code >= 400:
            response_data = response.json()
            assert 'traceback' not in response_data, 'Traceback exposed in error response'
            assert 'Traceback' not in str(response_data), 'Traceback text found in response'
            assert 'File "' not in str(response_data), 'File paths exposed in response'

    def test_error_responses_do_not_leak_internal_paths(self, client):
        """
        Security Test: Error messages should not reveal internal file paths.
        """
        # Try various endpoints
        endpoints = ['/sources', '/dnsbrute?domain=test', '/query?domain=test&source=baidu']

        for endpoint in endpoints:
            response = client.get(endpoint)
            response_text = str(response.json() if response.status_code != 200 else {})

            # Check for common path leakage patterns
            path_patterns = [
                r'/home/\w+/',
                r'/usr/local/',
                r'C:\\Users\\',
                r'/var/www/',
                r'site-packages/',
                r'\.py:\d+',  # filename.py:123
            ]

            for pattern in path_patterns:
                matches = re.findall(pattern, response_text)
                assert not matches, f'Internal path leaked in {endpoint}: {matches}'

    def test_debug_mode_does_not_expose_sensitive_info(self, client, monkeypatch):
        """
        Security Test: Even with DEBUG=1, sensitive info should not be exposed to clients.
        """
        # Set DEBUG environment variable
        monkeypatch.setenv('DEBUG', '1')

        # Make request that might trigger an error
        response = client.get('/dnsbrute?domain=')  # Invalid request

        if response.status_code >= 400:
            response_data = response.json()
            # Even with DEBUG=1, traceback should NOT be sent to client
            assert 'traceback' not in response_data, 'DEBUG mode exposes tracebacks to clients'


class TestPathTraversalPrevention:
    """Test path traversal prevention."""

    def test_sanitize_filename_removes_path_components(self):
        """
        Security Test: Filenames should not contain path traversal sequences.
        """
        dangerous_filenames = [
            '../../../etc/passwd',
            '..\\..\\..\\windows\\system32\\config\\sam',
            '/etc/passwd',
            'C:\\Windows\\System32\\config\\sam',
            '../../sensitive_file.txt',
            './../hidden_file',
            'subdir/../../../etc/passwd',
        ]

        for dangerous_filename in dangerous_filenames:
            result = sanitize_filename(dangerous_filename)

            # Should not contain any path separators
            assert '/' not in result, f'Path separator found in sanitized filename: {result}'
            assert '\\' not in result, f'Windows path separator found: {result}'

            # Should not start with .. (parent directory reference at the beginning is most dangerous)
            assert not result.startswith('..'), f'Parent directory reference at start: {result}'

            # Should only be the basename
            assert os.path.dirname(result) == '', f'Path component remains: {result}'

    def test_sanitize_filename_removes_dangerous_characters(self):
        """
        Security Test: Filenames should only contain safe characters.
        """
        test_cases = [
            'file; rm -rf /',
            'file`whoami`.txt',
            'file$(malicious).txt',
            'file|cmd.txt',
            'file&background.txt',
            'normal-file_123.txt',
        ]

        for input_filename in test_cases:
            result = sanitize_filename(input_filename)

            # Should not be empty
            assert len(result) > 0, f'Sanitized filename is empty for: {input_filename}'

            # Should not contain shell special characters
            dangerous_chars = [';', '|', '&', '$', '`', '(', ')', '{', '}', '[', ']', '<', '>']
            for char in dangerous_chars:
                assert char not in result, f'Dangerous character {char} found in: {result}'

            # Should only contain alphanumeric, dash, underscore, and dot
            assert re.match(r'^[a-zA-Z0-9._-]+$', result), f'Invalid characters in sanitized filename: {result}'

    def test_sanitize_filename_prevents_hidden_files(self):
        """
        Security Test: Prevent creation of hidden files.
        """
        hidden_files = ['.bashrc', '.ssh_config', '.env', '..hidden', '.']

        for hidden_file in hidden_files:
            result = sanitize_filename(hidden_file)

            # Should not start with a dot (except for allowed extensions)
            if result:  # If not empty
                assert not result.startswith('.'), f'Hidden file not prevented: {result}'

    def test_filename_sanitization_preserves_safe_filenames(self):
        """
        Security Test: Safe filenames should remain mostly unchanged.
        """
        safe_filenames = [
            'report.json',
            'results_2024-01-17.xml',
            'scan-output.txt',
            'data_file_v2.csv',
        ]

        for safe_filename in safe_filenames:
            result = sanitize_filename(safe_filename)

            # Safe filenames should be preserved (possibly with minor changes)
            assert len(result) > 0, 'Safe filename was completely removed'
            assert '.' in result if '.' in safe_filename else True, 'File extension removed incorrectly'

    def test_path_traversal_in_file_operations(self):
        """
        Integration Test: Verify file operations don't allow path traversal.
        """
        # This tests the actual usage in the code
        from theHarvester.__main__ import sanitize_filename

        # Simulate user input
        user_input = '../../../etc/passwd'
        sanitized = sanitize_filename(user_input)

        # Try to create a file with sanitized name
        with tempfile.TemporaryDirectory() as tmpdir:
            safe_path = os.path.join(tmpdir, sanitized)

            # Ensure the resolved path is still within tmpdir
            assert os.path.commonpath([tmpdir, safe_path]) == tmpdir, 'Path traversal detected!'

            # Verify we can't escape the directory
            assert tmpdir in os.path.abspath(safe_path), 'File path escaped temporary directory'


class TestSecurityBestPractices:
    """Additional security best practices tests."""

    def test_no_hardcoded_secrets_in_code(self):
        """
        Security Test: Ensure no hardcoded secrets in main code files.
        """
        # Check main application files for common secret patterns
        files_to_check = [
            'theHarvester/__main__.py',
            'theHarvester/lib/api/api.py',
            'theHarvester/lib/core.py',
        ]

        # Patterns that might indicate hardcoded secrets
        secret_patterns = [
            r'password\s*=\s*["\'][^"\']+["\']',
            r'api_key\s*=\s*["\'][a-zA-Z0-9]{20,}["\']',
            r'secret\s*=\s*["\'][^"\']+["\']',
            r'token\s*=\s*["\'][a-zA-Z0-9]{20,}["\']',
        ]

        for file_path in files_to_check:
            if os.path.exists(file_path):
                with open(file_path) as f:
                    content = f.read()

                for pattern in secret_patterns:
                    matches = re.findall(pattern, content, re.IGNORECASE)
                    # Filter out obvious non-secrets (like example values, empty strings, variable names)
                    real_matches = [
                        m
                        for m in matches
                        if 'example' not in m.lower()
                        and 'your_' not in m.lower()
                        and '""' not in m
                        and "''" not in m
                    ]
                    assert not real_matches, f'Potential hardcoded secret in {file_path}: {real_matches}'

    def test_api_has_rate_limiting(self):
        """
        Security Test: Verify API endpoints have rate limiting enabled.
        """
        from theHarvester.lib.api.api import app

        # Check that rate limiting is configured
        assert hasattr(app.state, 'limiter'), 'Rate limiter not configured'
        assert app.state.limiter is not None, 'Rate limiter is None'

    def test_sensitive_endpoints_require_validation(self):
        """
        Security Test: Ensure sensitive endpoints validate input.
        """
        from theHarvester.lib.api.api import app

        client = TestClient(app)

        # Test that endpoints reject invalid input
        # Note: The /query endpoint requires 'source' as a list parameter
        test_cases = [
            ('/dnsbrute?domain=', 400),  # Empty domain should be rejected
        ]

        for endpoint, expected_status in test_cases:
            response = client.get(endpoint)
            assert (
                response.status_code >= 400
            ), f'Endpoint {endpoint} should reject invalid input (got {response.status_code})'

        # Test query endpoint with proper parameter format but invalid domain
        response = client.get('/query?domain=a&source=baidu')  # Too short domain
        # This may or may not fail depending on validation, but we check it doesn't crash
        assert response.status_code in [200, 400, 422, 500], 'Unexpected status code'


if __name__ == '__main__':
    pytest.main([__file__, '-v'])


================================================
FILE: theHarvester/__init__.py
================================================
__version__ = '4.10.1'


================================================
FILE: theHarvester/__main__.py
================================================
import argparse
import asyncio
import os
import re
import secrets
import string
import sys
import time
import traceback
from typing import TYPE_CHECKING, Any

import anyio
import netaddr
import ujson
from aiomultiprocess import Pool

from theHarvester.discovery import (
    api_endpoints,
    baidusearch,
    bevigil,
    bitbucket,
    bravesearch,
    bufferoverun,
    builtwith,
    censysearch,
    certspottersearch,
    chaos,
    commoncrawl,
    criminalip,
    crtsh,
    dnssearch,
    duckduckgosearch,
    fofa,
    fullhuntsearch,
    githubcode,
    gitlabsearch,
    hackertarget,
    haveibeenpwned,
    hudsonrocksearch,
    huntersearch,
    intelxsearch,
    leakix,
    leaklookup,
    mojeek,
    netlas,
    onyphe,
    otxsearch,
    pentesttools,
    projectdiscovery,
    rapiddns,
    robtex,
    rocketreach,
    search_dehashed,
    search_dnsdumpster,
    searchhunterhow,
    securityscorecard,
    securitytrailssearch,
    shodansearch,
    subdomaincenter,
    subdomainfinderc99,
    takeover,
    thc,
    threatcrowd,
    tombasearch,
    urlscan,
    venacussearch,
    virustotal,
    waybackarchive,
    whoisxml,
    windvane,
    yahoosearch,
    zoomeyesearch,
)
from theHarvester.discovery.constants import MissingKey
from theHarvester.lib import hostchecker, stash
from theHarvester.lib.core import DATA_DIR, Core, show_default_error_message
from theHarvester.lib.output import print_linkedin_sections, print_section, sorted_unique
from theHarvester.screenshot.screenshot import ScreenShotter

if TYPE_CHECKING:
    from collections.abc import Awaitable


def sanitize_for_xml(text: str) -> str:
    """Sanitize text for safe inclusion in XML documents."""
    text = text.replace('&', '&amp;')
    text = text.replace('<', '&lt;')
    text = text.replace('>', '&gt;')
    text = text.replace('"', '&quot;')
    text = text.replace("'", '&apos;')
    return text


def sanitize_filename(filename: str) -> str:
    """Reduce a user-supplied name to a safe basename containing only [a-zA-Z0-9._-]."""
    filename = os.path.basename(filename)
    filename = re.sub(r'[^a-zA-Z0-9._-]', '_', filename)
    # Remove consecutive underscores
    filename = re.sub(r'_+', '_', filename)
    # Strip leading/trailing underscores and dots, so the result can never be a hidden file
    filename = filename.strip('_.')
    # Ensure we have a valid filename
    if not filename:
        filename = 'sanitized_file'
    return filename


async def start(rest_args: argparse.Namespace | None = None):
    """Main program function"""
    parser = argparse.ArgumentParser(
        description='theHarvester is used to gather open source intelligence (OSINT) on a company or domain.'
    )
    parser.add_argument('-d', '--domain', help='Company name or domain to search.', required=True)
    parser.add_argument(
        '-l',
        '--limit',
        help='Limit the number of search results, default=500.',
        default=500,
        type=int,
    )
    parser.add_argument(
        '-S',
        '--start',
        help='Start with result number X, default=0.',
        default=0,
        type=int,
    )
    parser.add_argument(
        '-p',
        '--proxies',
        help='Use proxies for requests, enter proxies in proxies.yaml.',
        default=False,
        action='store_true',
    )
    parser.add_argument(
        '-s',
        '--shodan',
        help='Use Shodan to query discovered hosts.',
        default=False,
        action='store_true',
    )
    parser.add_argument(
        '--screenshot',
        help='Take screenshots of resolved domains; specify the output directory: --screenshot output_directory',
        default='',
        type=str,
    )

    parser.add_argument('-e', '--dns-server', help='DNS server to use for lookup.')
    parser.add_argument(
        '-t',
        '--take-over',
        help='Check for takeovers.',
        default=False,
        action='store_true',
    )
    parser.add_argument(
        '-r',
        '--dns-resolve',
        help='Perform DNS resolution on subdomains with a resolver list or passed in resolvers, default False.',
        default='',
        type=str,
        nargs='?',
    )
    parser.add_argument(
        '-n',
        '--dns-lookup',
        help='Enable DNS server lookup, default False.',
        default=False,
        action='store_true',
    )
    parser.add_argument(
        '-c',
        '--dns-brute',
        help='Perform a DNS brute force on the domain.',
        default=False,
        action='store_true',
    )
    parser.add_argument(
        '-f',
        '--filename',
        help='Save the results to an XML and JSON file.',
        default='',
        type=str,
    )
    parser.add_argument('-w', '--wordlist', help='Specify a wordlist for API endpoint scanning.', default='')
    parser.add_argument('-a', '--api-scan', help='Scan for API endpoints.', action='store_true')
    parser.add_argument(
        '-q',
        '--quiet',
        help='Suppress missing API key warnings and reading the api-keys file.',
        default=False,
        action='store_true',
    )
    parser.add_argument(
        '-b',
        '--source',
        help="""baidu, bevigil, bitbucket, brave, bufferoverun,
                            builtwith, censys, certspotter, chaos, commoncrawl, criminalip, crtsh, dehashed, dnsdumpster, duckduckgo, fofa, fullhunt, github-code,
                            gitlab, hackertarget, haveibeenpwned, hudsonrock, hunter, hunterhow, intelx, leakix, leaklookup, mojeek, netlas, onyphe, otx, pentesttools,
                            projectdiscovery, rapiddns, robtex, rocketreach, securityscorecard, securityTrails, shodan, subdomaincenter,
                            subdomainfinderc99, thc, threatcrowd, tomba, urlscan, venacus, virustotal, waybackarchive, whoisxml, windvane, yahoo, zoomeye""",
    )

    # Randomised filename, only populated when the request comes from the REST API
    rest_filename = ''
    # rest_args is only provided when start() is invoked from the REST API
    if rest_args:
        if rest_args.source and rest_args.source == 'getsources':
            return sorted(Core.get_supportedengines())
        elif rest_args.dns_brute:
            args = rest_args
            # filename is read unconditionally further down, so bind it in this branch too
            filename: str = getattr(args, 'filename', '') or ''
            dnsbrute = (rest_args.dns_brute, True)
        else:
            args = rest_args
            # Bind dnsbrute in this branch as well so it is always defined for REST requests
            dnsbrute = (rest_args.dns_brute, True)
            # We need to make sure the filename is random as to not overwrite other files
            filename = args.filename
            alphabet = string.ascii_letters + string.digits
            rest_filename += f'{"".join(secrets.choice(alphabet) for _ in range(32))}_{filename}' if len(filename) != 0 else ''
    else:
        args = parser.parse_args()
        filename = args.filename
        dnsbrute = (args.dns_brute, False)
    Core.quiet = getattr(args, 'quiet', False)
    try:
        db = stash.StashManager()
        await db.do_init()
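    except (AttributeError, OSError, RuntimeError, ValueError) as init_error:
        # Core.quiet was populated via getattr above, so it is safe even if args lacks a quiet attribute
        if not Core.quiet:
            print(f'Error initializing StashManager: {init_error}')
        raise ValueError('Failed to initialize StashManager') from init_error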

    if len(filename) > 0:
        if filename.startswith('~/'):
            # Allow home directory expansion but sanitize the rest
            base_path = await anyio.Path('~').expanduser()
            sanitized = sanitize_filename(filename[2:])
            filename = str(base_path.joinpath(sanitized))
        elif os.path.isabs(filename):
            # For absolute paths, sanitize just the filename component
            dirname = os.path.dirname(filename)
            basename = sanitize_filename(os.path.basename(filename))
            filename = os.path.join(dirname, basename)
        else:
            # For relative paths, sanitize the entire filename
            filename = sanitize_filename(filename)

    all_emails: list = []
    all_hosts: list = []
    all_ip: list = []
    all_people: list[dict[str, str]] = []
    dnslookup = args.dns_lookup
    dnsserver = args.dns_server  # TODO arg is not used anywhere replace with resolvers wordlist arg dnsresolve
    dnsresolve: str | None = args.dns_resolve
    final_dns_resolver_list = []
    if dnsresolve is not None and len(dnsresolve) > 0:
        # Three scenarios:
        # 8.8.8.8
        # 1.1.1.1,8.8.8.8 or 1.1.1.1, 8.8.8.8
        # resolvers.txt
        if await anyio.Path(dnsresolve).exists():
            with open(dnsresolve, encoding='UTF-8') as fp:
                for line in fp:
                    line = line.strip()
                    if len(line) == 0:
                        continue
                    try:
                        _ = netaddr.IPAddress(line)
                        final_dns_resolver_list.append(line)
                    except (netaddr.core.AddrFormatError, ValueError, TypeError) as e:
                        print(f'An exception has occurred while reading from: {dnsresolve}, {e}')
                        print(f'Current line: {line}')
        else:
            cleaned = dnsresolve.replace(' ', '')
            resolver_candidates = cleaned.split(',') if ',' in cleaned else [cleaned]
            for item in resolver_candidates:
                if len(item) == 0:
                    continue
                try:
                    # Verify user passed in an IP; this does not validate resolver behavior
                    _ = netaddr.IPAddress(item)
                    final_dns_resolver_list.append(item)
                except (netaddr.core.AddrFormatError, ValueError, TypeError) as e:
                    print(f'Passed DNS resolver is invalid, skipping: {item} ({e})')

        # if for some reason, there are duplicates
        final_dns_resolver_list = list(set(final_dns_resolver_list))
        if len(final_dns_resolver_list) == 0:
            print('No valid DNS resolvers were parsed from --dns-resolve; continuing without custom resolvers.')

    engines: list = []
    # If the user specifies
    full: list = []
    ips: list = []
    host_ip: list = []
    limit: int = args.limit
    shodan = args.shodan
    start: int = args.start
    all_urls: list = []
    vhost: list = []
    word: str = args.domain.rstrip('\n')
    takeover_status = args.take_over
    use_proxy = args.proxies
    linkedin_people_list_tracker: list = []
    linkedin_links_tracker: list = []
    twitter_people_list_tracker: list = []
    interesting_urls: list = []
    total_asns: list = []

    async def store(
        search_engine: Any,
        source: str,
        process_param: Any = None,
        store_host: bool = False,
        store_emails: bool = False,
        store_ip: bool = False,
        store_people: bool = False,
        store_links: bool = False,
        store_results: bool = False,
        store_interestingurls: bool = False,
        store_asns: bool = False,
    ) -> None:
        """
        Persist details into the database.
        The details to be stored are controlled by the parameters passed to the method.

        :param search_engine: search engine to fetch details from
        :param source: source against which the details (corresponding to the search engine) need to be persisted
        :param process_param: any parameters to be passed to the search engine eg: Google needs google_dorking
        :param store_host: whether to store hosts
        :param store_emails: whether to store emails
        :param store_ip: whether to store IP address
        :param store_people: whether to store user details
        :param store_links: whether to store links
        :param store_results: whether to fetch details from get_results() and persist
        :param store_interestingurls: whether to store interesting urls
        :param store_asns: whether to store asns
        """
        (
            await search_engine.process(use_proxy)
            if process_param is None
            else await search_engine.process(process_param, use_proxy)
        )
        db_stash = stash.StashManager()

        if source:
            print(f'[*] Searching {source[0].upper() + source[1:]}. ')

        if store_host:
            host_names = list({host for host in await search_engine.get_hostnames() if f'.{word}' in host})
            if source != 'hackertarget' and source != 'pentesttools' and source != 'rapiddns':
                # If a source is inside this conditional, it means the hosts returned must be resolved to obtain ip
                # dnsresolve is None when -r was passed without a value (argparse nargs='?');
                # resolve in that case, or when custom resolvers were successfully parsed
                if dnsresolve is None or len(final_dns_resolver_list) > 0:
                    full_hosts_checker = hostchecker.Checker(host_names, final_dns_resolver_list)
                    # If full, this is only getting resolved hosts
                    (
                        resolved_pair,
                        _temp_hosts,
                        temp_ips,
                    ) = await full_hosts_checker.check()
                    all_ip.extend(temp_ips)
                    full.extend(resolved_pair)
                    # full.extend(temp_hosts)
                else:
                    full.extend(host_names)
            else:
                full.extend(host_names)
            all_hosts.extend(host_names)
            await db_stash.store_all(word, all_hosts, 'host', source)

        if store_emails:
            email_list = await search_engine.get_emails()
            all_emails.extend(email_list)
            await db_stash.store_all(word, email_list, 'email', source)

        if store_ip:
            ips_list = await search_engine.get_ips()
            all_ip.extend(ips_list)
            await db_stash.store_all(word, all_ip, 'ip', source)

        if store_results:
            email_list, host_names, urls = await search_engine.get_results()
            all_emails.extend(email_list)
            host_names = list({host for host in host_names if f'.{word}' in host})
            all_urls.extend(urls)
            all_hosts.extend(host_names)
            await db.store_all(word, all_hosts, 'host', source)
            await db.store_all(word, all_emails, 'email', source)

        if store_people:
            people_list = await search_engine.get_people()
            all_people.extend(people_list)
            await db_stash.store_all(word, people_list, 'people', source)

        if store_links:
            links = await search_engine.get_links()
            linkedin_links_tracker.extend(links)
            if len(links) > 0:
                await db.store_all(word, links, 'linkedinlinks', source)

        if store_interestingurls:
            iurls = await search_engine.get_interestingurls()
            interesting_urls.extend(iurls)
            if len(iurls) > 0:
                await db.store_all(word, iurls, 'interestingurls', source)

        if store_asns:
            fasns = await search_engine.get_asns()
            total_asns.extend(fasns)
            if len(fasns) > 0:
                await db.store_all(word, fasns, 'asns', source)

    stor_lst = []
    if args.source is not None:
        if args.source.lower() != 'all':
            engines = sorted(set(map(str.strip, args.source.split(','))))
        else:
            engines = Core.get_supportedengines()
        # Iterate through search engines in order
        if set(engines).issubset(Core.get_supportedengines()):
            print(f'\n[*] Target: {word} \n')

            for engineitem in engines:
                if engineitem == 'baidu':
                    try:
                        baidu_search = baidusearch.SearchBaidu(word, limit)
                        stor_lst.append(
                            store(
                                baidu_search,
                                engineitem,
                                store_host=True,
                                store_emails=True,
                            )
                        )
                    except Exception as e:
                        show_default_error_message(engineitem, word, e)

                elif engineitem == 'bevigil':
                    try:
                        bevigil_search = bevigil.SearchBeVigil(word)
                        stor_lst.append(
                            store(
                                bevigil_search,
                                engineitem,
                                store_host=True,
                                store_interestingurls=True,
                            )
                        )
                    except Exception as e:
                        show_default_error_message(engineitem, word, error=e)

                elif engineitem == 'bitbucket':
                    try:
                        bitbucket_search = bitbucket.SearchBitBucket(word, limit)
                        stor_lst.append(
                            store(
                                bitbucket_search,
                                engineitem,
                                store_host=True,
                                store_emails=True,
                            )
                        )
                    except Exception as ex:
                        if isinstance(ex, MissingKey):
                            print(MissingKey('Bitbucket'))
                        else:
                            show_default_error_message(engineitem, word, ex)

                elif engineitem == 'brave':
                    try:
                        brave_search = bravesearch.SearchBrave(word, limit)
                        stor_lst.append(
                            store(
                                brave_search,
                                engineitem,
                                store_host=True,
                                store_emails=True,
                            )
                        )
                    except Exception as e:
                        show_default_error_message(engineitem, word, error=e)

                elif engineitem == 'bufferoverun':
                    try:
                        bufferoverun_search = bufferoverun.SearchBufferover(word)
                        stor_lst.append(
                            store(
                                bufferoverun_search,
                                engineitem,
                                store_host=True,
                                store_ip=True,
                            )
                        )
                    except Exception as e:
                        show_default_error_message(engineitem, word, e)

                elif engineitem == 'builtwith':
                    try:
                        builtwith_search = builtwith.SearchBuiltWith(word)
                        stor_lst.append(store(builtwith_search, engineitem, store_host=True, store_interestingurls=True))
                    except Exception as e:
                        if isinstance(e, MissingKey):
                            print(f"Failed to perform BuiltWith search for word: '{word}'")
                            print(f'A Missing Key Error occurred in builtwith: {e}')
                        else:
                            show_default_error_message(engineitem, word, e)

                elif engineitem == 'censys':
                    try:
                        censys_search = censysearch.SearchCensys(word, limit)
                        stor_lst.append(
                            store(
                                censys_search,
                                engineitem,
                                store_host=True,
                                store_emails=True,
                            )
                        )
                    except MissingKey as mk:
                        if not args.quiet:
                            print(f'Censys API key is missing or invalid: {mk}')
                    except ConnectionError as ce:
                        if not args.quiet:
                            print(f'Network error while querying Censys: {ce}')
                    except TimeoutError as te:
                        if not args.quiet:
                            print(f'Timeout occurred while contacting Censys: {te}')
                    except ValueError as ve:
                        if not args.quiet:
                            print(f'Censys returned unexpected data: {ve}')
                    except Exception as e:
                        if not args.quiet:
                            print(f'Unexpected error occurred in Censys module: {e}')

                elif engineitem == 'certspotter':
                    try:
                        certspotter_search = certspottersearch.SearchCertspoter(word)
                        stor_lst.append(store(certspotter_search, engineitem, None, store_host=True))
                    except ConnectionError as ce:
                        if not args.quiet:
                            print(f'Network connection error while accessing Certspotter: {ce}')
                    except TimeoutError as te:
                        if not args.quiet:
                            print(f'Request to Certspotter timed out: {te}')
                    except ValueError as ve:
                        if not args.quiet:
                            print(f'Certspotter returned invalid data: {ve}')
                    except MissingKey as mk:
                        if not args.quiet:
                            print(f'A Missing Key error occurred in Certspotter: {mk}')
                    except Exception as e:
                        if not args.quiet:
                            print(f'Unexpected error occurred in Certspotter module: {e}')

                elif engineitem == 'chaos':
                    try:
                        chaos_search = chaos.SearchChaos(word)
                        stor_lst.append(
                            store(
                                chaos_search,
                                engineitem,
                                store_host=True,
                            )
                        )
                    except Exception as e:
                        if isinstance(e, MissingKey):
                            if not args.quiet:
                                print(f'A Missing Key error occurred in Chaos: {e}')
                        else:
                            show_default_error_message(engineitem, word, e)

                elif engineitem == 'commoncrawl':
                    try:
                        commoncrawl_search = commoncrawl.SearchCommoncrawl(word)
                        stor_lst.append(
                            store(
                                commoncrawl_search,
                                engineitem,
                                store_host=True,
                            )
                        )
                    except Exception as e:
                        show_default_error_message(engineitem, word, e)

                elif engineitem == 'criminalip':
                    try:
                        criminalip_search = criminalip.SearchCriminalIP(word)
                        stor_lst.append(
                            store(
                                criminalip_search,
                                engineitem,
                                store_host=True,
                                store_ip=True,
                                store_asns=True,
                            )
                        )
                    except Exception as e:
                        if isinstance(e, MissingKey):
                            if not args.quiet:
                                print(f'A Missing key error occurred in criminalip: {e}')
                        else:
                            show_default_error_message(engineitem, word, e)

                elif engineitem == 'crtsh':
                    try:
                        crtsh_search = crtsh.SearchCrtsh(word)
                        stor_lst.append(store(crtsh_search, 'CRTsh', store_host=True))
                    except Exception as e:
                        # Any exception lands here, not only timeouts, so keep the message generic
                        print(f'[!] An exception has occurred in crtsh search, cannot find {args.domain}\n {e}')

                elif engineitem == 'dehashed':
                    try:
                        dehashed_search = search_dehashed.SearchDehashed(word)
                        stor_lst.append(
                            store(
                                dehashed_search,
                                engineitem,
                                store_host=False,
                                store_ip=True,
                            )
                        )
                    except Exception as e:
                        if isinstance(e, MissingKey):
                            if not args.quiet:
                                print(f'A Missing Key error occurred in dehashed: {e}')
                        else:
                            show_default_error_message(engineitem, word, e)

                elif engineitem == 'dnsdumpster':
                    try:
                        dnsdumpster_search = search_dnsdumpster.SearchDNSDumpster(word)
                        stor_lst.append(
                            store(
                                dnsdumpster_search,
                                engineitem,
                                store_host=True,
                                store_ip=True,
                            )
                        )
                    except MissingKey as e:
                        if not args.quiet:
                            print(e)
                    except Exception as e:
                        show_default_error_message(engineitem, word, e)

                elif engineitem == 'duckduckgo':
                    duckduckgo_search = duckduckgosearch.SearchDuckDuckGo(word, limit)
                    stor_lst.append(
                        store(
                            duckduckgo_search,
                            engineitem,
                            store_host=True,
                            store_emails=True,
                        )
                    )

                elif engineitem == 'fofa':
                    try:
                        fofa_search = fofa.SearchFofa(word)
                        stor_lst.append(
                            store(
                                fofa_search,
                                engineitem,
                                store_host=True,
                                store_ip=True,
                            )
                        )
                    except Exception as e:
                        if isinstance(e, MissingKey):
                            if not args.quiet:
                                print(f'A Missing Key error occurred in Fofa: {e}')
                        else:
                            show_default_error_message(engineitem, word, e)

                elif engineitem == 'fullhunt':
                    try:
                        fullhunt_search = fullhuntsearch.SearchFullHunt(word)
                        stor_lst.append(store(fullhunt_search, engineitem, store_host=True))
                    except Exception as e:
                        if isinstance(e, MissingKey):
                            if not args.quiet:
                                print(f'A Missing Key error occurred in fullhunt: {e}')
                        else:
                            # Do not silently swallow unrelated errors
                            show_default_error_message(engineitem, word, e)

                elif engineitem == 'github-code':
                    try:
                        github_search = githubcode.SearchGithubCode(word, limit)
                        stor_lst.append(
                            store(
                                github_search,
                                engineitem,
                                store_host=True,
                                store_emails=True,
                            )
                        )
                    except MissingKey as ex:
                        if not args.quiet:
                            print(f'A Missing Key error occurred in github-code: {ex}')

                elif engineitem == 'gitlab':
                    try:
                        gitlab_search = gitlabsearch.SearchGitlab(word)
                        stor_lst.append(
                            store(
                                gitlab_search,
                                engineitem,
                                store_host=True,
                                store_emails=True,
                            )
                        )
                    except Exception as e:
                        show_default_error_message(engineitem, word, e)

                elif engineitem == 'hackertarget':
                    try:
                        hackertarget_search = hackertarget.SearchHackerTarget(word)
                        stor_lst.append(store(hackertarget_search, engineitem, store_host=True))
                    except Exception as e:
                        show_default_error_message(engineitem, word, e)

                elif engineitem == 'haveibeenpwned':
                    try:
                        haveibeenpwned_search = haveibeenpwned.SearchHaveIBeenPwned(word)
                        stor_lst.append(
                            store(
                                haveibeenpwned_search,
                                engineitem,
                                store_emails=True,
                            )
                        )
                    except Exception as e:
                        if isinstance(e, MissingKey):
                            if not args.quiet:
                                print(MissingKey('HaveIBeenPwned'))
                        else:
                            print(f'An exception has occurred in HaveIBeenPwned search: {e}')

                elif engineitem == 'hudsonrock':
                    try:
                        hudsonrock_search = hudsonrocksearch.SearchHudsonRock(word)
                        stor_lst.append(
                            store(
                                hudsonrock_search,
                                engineitem,
                                store_host=True,
                                store_emails=True,
                                store_ip=True,
                            )
                        )
                    except Exception as e:
                        print(f'An exception has occurred in Hudson Rock search: {e}')

                elif engineitem == 'hunter':
                    try:
                        hunter_search = huntersearch.SearchHunter(word, limit, start)
                        stor_lst.append(
                            store(
                                hunter_search,
                                engineitem,
                                store_host=True,
                                store_emails=True,
                            )
                        )
                    except Exception as e:
                        if isinstance(e, MissingKey):
                            if not args.quiet:
                                print(f'A Missing Key error occurred in Hunter: {e}')
                        else:
                            # Do not silently swallow unrelated errors
                            show_default_error_message(engineitem, word, e)

                elif engineitem == 'hunterhow':
                    try:
                        hunterhow_search = searchhunterhow.SearchHunterHow(word)
                        stor_lst.append(store(hunterhow_search, engineitem, store_host=True))
                    except Exception as e:
                        if isinstance(e, MissingKey):
                            if not args.quiet:
                                print(f'A Missing Key error occurred in Hunter How: {e}')
                        else:
                            print(f'An exception has occurred in hunterhow search: {e}')

                elif engineitem == 'intelx':
                    try:
                        intelx_search = intelxsearch.SearchIntelx(word)
                        stor_lst.append(
                            store(
                                intelx_search,
                                engineitem,
                                store_interestingurls=True,
                                store_emails=True,
                            )
                        )
                    except Exception as e:
                        if isinstance(e, MissingKey):
                            if not args.quiet:
                                print(f'A Missing Key error occurred in intelx: {e}')
                        else:
                            print(f'An exception has occurred in Intelx search: {e}')

                elif engineitem == 'leakix':
                    try:
                        leakix_search = leakix.SearchLeakix(word)
                        stor_lst.append(
                            store(
                                leakix_search,
                                engineitem,
                                store_host=True,
                                store_emails=True,
                            )
                        )
                    except Exception as e:
                        show_default_error_message(engineitem, word, e)

                elif engineitem == 'leaklookup':
                    try:
                        leaklookup_search = leaklookup.SearchLeakLookup(word)
                        stor_lst.append(
                            store(
                                leaklookup_search,
                                engineitem,
                                store_emails=True,
                            )
                        )
                    except Exception as e:
                        if isinstance(e, MissingKey):
                            print(f'A Missing Key error occurred in LeakLookup: {e}')
                        else:
                            print(f'An exception has occurred in LeakLookup search: {e}')

                elif engineitem == 'mojeek':
                    try:
                        mojeek_search = mojeek.SearchMojeek(word, limit)
                        stor_lst.append(
                            store(
                                mojeek_search,
                                engineitem,
                                store_host=True,
                                store_emails=True,
                            )
                        )
                    except Exception as e:
                        if isinstance(e, MissingKey):
                            print(f'A Missing Key error occurred in Mojeek: {e}')
                        else:
                            print(f'An exception has occurred in Mojeek search: {e}')

                elif engineitem == 'netlas':
                    try:
                        netlas_search = netlas.SearchNetlas(word, limit)
                        stor_lst.append(
                            store(
                                netlas_search,
                                engineitem,
                                store_host=True,
                                store_ip=True,
                            )
                        )
                    except Exception as e:
                        if isinstance(e, MissingKey):
                            if not args.quiet:
                                print(f'A Missing Key error occurred in Netlas: {e}')
                        else:
                            print(f'An exception has occurred in Netlas search: {e}')

                elif engineitem == 'onyphe':
                    try:
                        onyphe_search = onyphe.SearchOnyphe(word)
                        stor_lst.append(
                            store(
                                onyphe_search,
                                engineitem,
                                store_host=True,
                                store_ip=True,
                                store_asns=True,
                            )
                        )
                    except ConnectionError as ce:
                        if not args.quiet:
                            print(f'Network connection error while accessing Onyphe: {ce}')
                    except TimeoutError as te:
                        if not args.quiet:
                            print(f'Request to Onyphe timed out: {te}')
                    except ValueError as ve:
                        if not args.quiet:
                            print(f'Onyphe returned invalid or unexpected data: {ve}')
                    except KeyError as ke:
                        if not args.quiet:
                            print(f'Unexpected response structure from Onyphe (missing key): {ke}')
                    except Exception as e:
                        if not args.quiet:
                            print(f'Unexpected error occurred in Onyphe module: {e}')

                elif engineitem == 'otx':
                    try:
                        otxsearch_search = otxsearch.SearchOtx(word)
                        stor_lst.append(
                            store(
                                otxsearch_search,
                                engineitem,
                                store_host=True,
                                store_ip=True,
                            )
                        )
                    except ConnectionError as ce:
                        if not args.quiet:
                            print(f'Network connection error while accessing OTX: {ce}')
                    except TimeoutError as te:
                        if not args.quiet:
                            print(f'Request to OTX timed out: {te}')
                    except ValueError as ve:
                        if not args.quiet:
                            print(f'OTX returned invalid or unexpected data: {ve}')
                    except KeyError as ke:
                        if not args.quiet:
                            print(f'Unexpected response structure from OTX (missing key): {ke}')
                    except Exception as e:
                        if not args.quiet:
                            print(f'Unexpected error occurred in OTX module: {e}')

                elif engineitem == 'pentesttools':
                    try:
                        pentesttools_search = pentesttools.SearchPentestTools(word)
                        stor_lst.append(store(pentesttools_search, engineitem, store_host=True))
                    except Exception as e:
                        if isinstance(e, MissingKey):
                            if not args.quiet:
                                print(f'A Missing Key error occurred in PentestTools search: {e}')
                        else:
                            print(f'An exception has occurred in PentestTools search: {e}')

                elif engineitem == 'projectdiscovery':
                    try:
                        projectdiscovery_search = projectdiscovery.SearchDiscovery(word)
                        stor_lst.append(store(projectdiscovery_search, engineitem, store_host=True))
                    except Exception as e:
                        if isinstance(e, MissingKey):
                            if not args.quiet:
                                print(f'A Missing Key error occurred in ProjectDiscovery: {e}')
                        else:
                            print(f'An exception has occurred in ProjectDiscovery: {e}')

                elif engineitem == 'rapiddns':
                    try:
                        rapiddns_search = rapiddns.SearchRapidDns(word)
                        stor_lst.append(store(rapiddns_search, engineitem, store_host=True))
                    except ConnectionError as ce:
                        if not args.quiet:
                            print(f'Network connection error while accessing RapidDNS: {ce}')
                    except TimeoutError as te:
                        if not args.quiet:
                            print(f'Request to RapidDNS timed out: {te}')
                    except ValueError as ve:
                        if not args.quiet:
                            print(f'RapidDNS returned invalid or unexpected data: {ve}')
                    except KeyError as ke:
                        if not args.quiet:
                            print(f'Unexpected response structure from RapidDNS (missing key): {ke}')
                    except Exception as e:
                        if not args.quiet:
                            print(f'Unexpected error occurred in RapidDNS module: {e}')

                elif engineitem == 'robtex':
                    try:
                        robtex_search = robtex.SearchRobtex(word)
                        stor_lst.append(
                            store(
                                robtex_search,
                                engineitem,
                                store_host=True,
                                store_ip=True,
                            )
                        )
                    except Exception as e:
                        show_default_error_message(engineitem, word, e)

                elif engineitem == 'rocketreach':
                    try:
                        rocketreach_search = rocketreach.SearchRocketReach(word, limit)
                        stor_lst.append(store(rocketreach_search, engineitem, store_links=True, store_emails=True))
                    except Exception as e:
                        if isinstance(e, MissingKey):
                            if not args.quiet:
                                print(f'A Missing Key error occurred in RocketReach: {e}')
                        else:
                            print(f'An exception has occurred in RocketReach: {e}')

                elif engineitem == 'securityscorecard':
                    try:
                        securityscorecard_search = securityscorecard.SearchSecurityScorecard(word)
                        stor_lst.append(
                            store(
                                securityscorecard_search,
                                engineitem,
                                store_host=True,
                                store_ip=True,
                                store_interestingurls=True,
                                store_asns=True,
                            )
                        )
                    except Exception as e:
                        if isinstance(e, MissingKey):
                            print(f'A Missing Key error occurred in SecurityScorecard: {e}')
                        else:
                            print(f'An exception has occurred in SecurityScorecard search: {e}')

                elif engineitem == 'securityTrails':
                    try:
                        securitytrails_search = securitytrailssearch.SearchSecuritytrail(word)
                        stor_lst.append(
                            store(
                                securitytrails_search,
                                engineitem,
                                store_host=True,
                                store_ip=True,
                            )
                        )
                    except Exception as e:
                        if isinstance(e, MissingKey):
                            if not args.quiet:
                                print(f'A Missing Key error occurred in Security Trails: {e}')
                        else:
                            print(f'An exception has occurred in Security Trails search: {e}')

                elif engineitem == 'shodan':
                    try:
                        shodan_search = shodansearch.SearchShodan()

                        # For normal module usage, we need to create a wrapper that works with the store function
                        class ShodanWrapper:
                            def __init__(self, domain):
                                self.word = domain
                                self.hosts = set()
                                self.shodan = shodan_search

                            async def process(self, use_proxy: bool = False):
                                import socket

                                try:
                                    # Resolve domain to IP in a worker thread (gethostbyname blocks), then search in Shodan
                                    ip = await asyncio.to_thread(socket.gethostbyname, self.word)
                                    print(f'\tSearching Shodan for {ip}')
                                    result = await self.shodan.search_ip(ip)
                                    if ip in result and isinstance(result[ip], dict):
                                        # Add the IP as a host for consistency with other modules
                                        self.hosts.add(ip)

                                        for host in result[ip].get('hostnames', []):
                                            self.hosts.add(host)

                                        print(f'Found Shodan data for {ip}')
                                    elif ip in result and isinstance(result[ip], str):
                                        print(f'{ip}: {result[ip]}')
                                except Exception as e:
                                    print(f'Error in Shodan search: {e}')

                            async def get_hostnames(self):
                                return list(self.hosts)

                        shodan_wrapper = ShodanWrapper(word)
                        stor_lst.append(store(shodan_wrapper, engineitem, store_host=True))
                    except Exception as e:
                        if isinstance(e, MissingKey):
                            if not args.quiet:
                                print(f'A Missing Key error occurred in Shodan search: {e}')
                        else:
                            print(f'An exception has occurred in Shodan search: {e}')

                elif engineitem == 'subdomaincenter':
                    try:
                        subdomaincenter_search = subdomaincenter.SubdomainCenter(word)
                        stor_lst.append(store(subdomaincenter_search, engineitem, store_host=True))
                    except ConnectionError as ce:
                        if not args.quiet:
                            print(f'Network connection error while accessing SubdomainCenter: {ce}')
                    except TimeoutError as te:
                        if not args.quiet:
                            print(f'Request to SubdomainCenter timed out: {te}')
                    except ValueError as ve:
                        if not args.quiet:
                            print(f'SubdomainCenter returned invalid or unexpected data: {ve}')
                    except KeyError as ke:
                        if not args.quiet:
                            print(f'Unexpected response structure from SubdomainCenter (missing key): {ke}')
                    except Exception as e:
                        if not args.quiet:
                            print(f'Unexpected error occurred in SubdomainCenter module: {e}')

                elif engineitem == 'subdomainfinderc99':
                    try:
                        subdomainfinderc99_search = subdomainfinderc99.SearchSubdomainfinderc99(word)
                        stor_lst.append(store(subdomainfinderc99_search, engineitem, store_host=True))
                    except Exception as e:
                        if isinstance(e, MissingKey):
                            if not args.quiet:
                                print(f'A Missing Key error occurred in Subdomainfinderc99 search: {e}')
                        else:
                            print(f'An exception has occurred in Subdomainfinderc99 search: {e}')

                elif engineitem == 'thc':
                    try:
                        thc_search = thc.SearchThc(word)
                        stor_lst.append(store(thc_search, engineitem, store_host=True))
                    except Exception as e:
                        show_default_error_message(engineitem, word, e)

                elif engineitem == 'threatcrowd':
                    try:
                        threatcrowd_search = threatcrowd.SearchThreatcrowd(word)
                        stor_lst.append(
                            store(
                                threatcrowd_search,
                                engineitem,
                                store_host=True,
                                store_ip=True,
                            )
                        )
                    except Exception as e:
                        show_default_error_message(engineitem, word, e)

                elif engineitem == 'tomba':
                    try:
                        tomba_search = tombasearch.SearchTomba(word, limit, start)
                        stor_lst.append(
                            store(
                                tomba_search,
                                engineitem,
                                store_host=True,
                                store_emails=True,
                            )
                        )
                    except Exception as e:
                        if isinstance(e, MissingKey):
                            if not args.quiet:
                                print(f'A Missing Key error occurred in Tomba: {e}')
                        else:
                            print(f'An exception has occurred in Tomba search: {e}')

                elif engineitem == 'urlscan':
                    try:
                        urlscan_search = urlscan.SearchUrlscan(word)
                        stor_lst.append(
                            store(
                                urlscan_search,
                                engineitem,
                                store_host=True,
                                store_ip=True,
                                store_interestingurls=True,
                                store_asns=True,
                            )
                        )
                    except ConnectionError as ce:
                        if not args.quiet:
                            print(f'Network connection error while accessing Urlscan: {ce}')
                    except TimeoutError as te:
                        if not args.quiet:
                            print(f'Request to Urlscan timed out: {te}')
                    except ValueError as ve:
                        if not args.quiet:
                            print(f'Urlscan returned invalid or unexpected data: {ve}')
                    except KeyError as ke:
                        if not args.quiet:
                            print(f'Unexpected response structure from Urlscan (missing key): {ke}')
                    except Exception as e:
                        if not args.quiet:
                            print(f'Unexpected error occurred in Urlscan module: {e}')

                elif engineitem == 'venacus':
                    try:
                        venacus_search = venacussearch.SearchVenacus(word=word, limit=limit, offset_doc=start)
                        stor_lst.append(
                            store(
                                venacus_search,
                                engineitem,
                                store_emails=True,
                                store_ip=True,
                                store_people=True,
                                store_interestingurls=True,
                            )
                        )
                    except Exception as e:
                        if isinstance(e, MissingKey):
                            if not args.quiet:
                                print(f'A Missing Key error occurred in venacus search: {e}')
                        else:
                            print(f'An exception has occurred in venacus search: {e}')

                elif engineitem == 'virustotal':
                    try:
                        virustotal_search = virustotal.SearchVirustotal(word)
                        stor_lst.append(store(virustotal_search, engineitem, store_host=True))
                    except Exception as e:
                        if isinstance(e, MissingKey):
                            if not args.quiet:
                                print(f'A Missing Key error occurred in virustotal search: {e}')
                        else:
                            print(f'An exception has occurred in virustotal search: {e}')

                elif engineitem == 'waybackarchive':
                    try:
                        waybackarchive_search = waybackarchive.SearchWaybackarchive(word)
                        stor_lst.append(
                            store(
                                waybackarchive_search,
                                engineitem,
                                store_host=True,
                            )
                        )
                    except Exception as e:
                        show_default_error_message(engineitem, word, e)

                elif engineitem == 'whoisxml':
                    try:
                        whoisxml_search = whoisxml.SearchWhoisXML(word)
                        stor_lst.append(store(whoisxml_search, engineitem, store_host=True))
                    except Exception as e:
                        if isinstance(e, MissingKey):
                            if not args.quiet:
                                print(f'A Missing Key error occurred in whoisxml search: {e}')
                        else:
                            print(f'An exception has occurred in WhoisXML search: {e}')

                elif engineitem == 'windvane':
                    try:
                        windvane_search = windvane.SearchWindvane(word)
                        stor_lst.append(
                            store(
                                windvane_search,
                                engineitem,
                                store_host=True,
                                store_ip=True,
                                store_emails=True,
                            )
                        )
                    except Exception as e:
                        show_default_error_message(engineitem, word, e)

                elif engineitem == 'yahoo':
                    try:
                        yahoo_search = yahoosearch.SearchYahoo(word, limit)
                        stor_lst.append(
                            store(
                                yahoo_search,
                                engineitem,
                                store_host=True,
                                store_emails=True,
                            )
                        )
                    except ConnectionError as ce:
                        if not args.quiet:
                            print(f'Network connection error while accessing Yahoo: {ce}')
                    except TimeoutError as te:
                        if not args.quiet:
                            print(f'Request to Yahoo timed out: {te}')
                    except ValueError as ve:
                        if not args.quiet:
                            print(f'Yahoo returned invalid or unexpected data: {ve}')
                    except KeyError as ke:
                        if not args.quiet:
                            print(f'Unexpected response structure from Yahoo (missing key): {ke}')
                    except Exception as e:
                        if not args.quiet:
                            print(f'Unexpected error occurred in Yahoo module: {e}')

                elif engineitem == 'zoomeye':
                    try:
                        zoomeye_search = zoomeyesearch.SearchZoomEye(word, limit)
                        stor_lst.append(
                            store(
                                zoomeye_search,
                                engineitem,
                                store_host=True,
                                store_emails=True,
                                store_ip=True,
                                store_interestingurls=True,
                                store_asns=True,
                            )
                        )
                    except Exception as e:
                        if isinstance(e, MissingKey):
                            if not args.quiet:
                                print(f'A Missing Key error occurred in zoomeye: {e}')
                        else:
                            print(f'An exception has occurred in zoomeye search: {e}')

        elif rest_args is not None:
            # REST callers must supply a dns_brute attribute; treat its absence as an invalid source.
            if not hasattr(rest_args, 'dns_brute'):
                print('\n[!] Invalid source.\n')
                sys.exit(1)
        else:
            # Print which engines aren't supported
            unsupported_engines = set(engines) - set(Core.get_supportedengines())
            if unsupported_engines:
                print(f'The following engines are not supported: {unsupported_engines}')
            print('\n[!] Invalid source.\n')
            sys.exit(1)

    async def worker(queue):
        while True:
            # Get a "work item" out of the queue.
            stor = await queue.get()
            try:
                await stor
            except Exception as e:
                print(f'\n An error occurred while processing a "work item": {e}\n')
            finally:
                # Notify the queue that the "work item" has been processed.
                queue.task_done()

    async def handler(lst):
        queue: asyncio.Queue[Awaitable[Any]] = asyncio.Queue()
        for stor_method in lst:
            # enqueue the coroutines
            queue.put_nowait(stor_method)
        # Create three worker tasks to process the queue concurrently.
        tasks = []
        for _ in range(3):
            task = asyncio.create_task(worker(queue))
            tasks.append(task)

        # Wait until the queue is fully processed.
        await queue.join()

        # Cancel our worker tasks.
        for task in tasks:
            task.cancel()
        # Wait until all worker tasks are cancelled.
        await asyncio.gather(*tasks, return_exceptions=True)

    await handler(lst=stor_lst)
    return_ips: list = []
    if rest_args is not None and len(rest_filename) == 0 and rest_args.dns_brute is False:
        # Indicates user is using REST api but not wanting output to be saved to a file
        # cast to string so Rest API can understand the type
        return_ips.extend([str(ip) for ip in sorted([netaddr.IPAddress(ip.strip()) for ip in set(all_ip)])])
        # return list(set(all_emails)), return_ips, full, '', ''
        all_hosts = [host.replace('www.', '') for host in all_hosts if host.replace('www.', '') in all_hosts]
        all_hosts = sorted(set(all_hosts))
        return (
            total_asns,
            interesting_urls,
            twitter_people_list_tracker,
            linkedin_people_list_tracker,
            linkedin_links_tracker,
            all_urls,
            all_ip,
            all_emails,
            all_hosts,
        )
    # Check to see if all_emails and all_hosts are defined.
    try:
        all_emails
    except NameError:
        print('\n\n[!] No emails found because all_emails is not defined.\n\n ')
        sys.exit(1)
    try:
        all_hosts
    except NameError:
        print('\n\n[!] No hosts found because all_hosts is not defined.\n\n ')
        sys.exit(1)

    # Results
    # De-duplicate before printing so the reported counts match the listed items.
    if len(total_asns) > 0:
        total_asns = sorted_unique(total_asns)
        print_section(f'\n[*] ASNS found: {len(total_asns)}', total_asns, '--------------------')

    if len(interesting_urls) > 0:
        interesting_urls = sorted_unique(interesting_urls)
        print_section(f'\n[*] Interesting Urls found: {len(interesting_urls)}', interesting_urls, '--------------------')

    if len(twitter_people_list_tracker) == 0 and 'twitter' in engines:
        print('\n[*] No Twitter users found.\n\n')
    elif len(twitter_people_list_tracker) >= 1:
        print_section(
            '\n[*] Twitter Users found: ' + str(len(twitter_people_list_tracker)),
            twitter_people_list_tracker,
            '---------------------',
        )
        twitter_people_list_tracker = sorted_unique(twitter_people_list_tracker)

    print_linkedin_sections(engines, linkedin_people_list_tracker, linkedin_links_tracker)
    linkedin_people_list_tracker = sorted_unique(linkedin_people_list_tracker)
    linkedin_links_tracker = sorted_unique(linkedin_links_tracker)

    length_urls = len(all_urls)
    if length_urls == 0:
        if len(engines) >= 1 and 'trello' in engines:
            print('\n[*] No Trello URLs found.')
    else:
        total = length_urls
        print_section('\n[*] Trello URLs found: ' + str(total), all_urls, '--------------------')
        all_urls = sorted_unique(all_urls)

    if len(all_ip) == 0:
        print('\n[*] No IPs found.')
    else:
        print('\n[*] IPs found: ' + str(len(all_ip)))
        print('-------------------')
        # use netaddr as the list may contain ipv4 and ipv6 addresses
        ip_list = []
        for ip in set(all_ip):
            try:
                ip = ip.strip()
                if len(ip) > 0:
                    if '/' in ip:
                        ip_list.append(str(netaddr.IPNetwork(ip)))
                    else:
                        ip_list.append(str(netaddr.IPAddress(ip)))
            except (netaddr.core.AddrFormatError, ValueError, TypeError) as e:
                print(f'An exception has occurred while adding: {ip} to ip_list: {e}')
                continue
        ip_list = sorted(ip_list)
        print('\n'.join(ip_list))
        # Populate host_ip from ip_list for DNS lookup, virtual hosts search, and Shodan search
        host_ip = ip_list

    if len(all_emails) == 0:
        print('\n[*] No emails found.')
    else:
        # De-duplicate first so the reported count matches the printed list
        all_emails = sorted(set(all_emails))
        print('\n[*] Emails found: ' + str(len(all_emails)))
        print('----------------------')
        print('\n'.join(all_emails))

    if len(all_people) == 0:
        print('\n[*] No people found.')
    else:
        print('\n[*] People found: ' + str(len(all_people)))
        print('----------------------')
        for person in all_people:
            print(person)

    if len(all_hosts) == 0:
        print('\n[*] No hosts found.\n\n')
    else:
        db = stash.StashManager()
        if dnsresolve is None or len(final_dns_resolver_list) > 0:
            temp = set()
            for host in full:
                if ':' in host:
                    # TODO parse addresses and sort them as they are IPs
                    subdomain, addr = host.split(':', 1)
                    if subdomain.endswith(word):
                        temp.add(subdomain + ':' + addr)
                        continue
                if host.endswith(word):
                    if host[:4] == 'www.':
                        if host[4:] in all_hosts or host[4:] in full:
                            temp.add(host[4:])
                            continue
                    temp.add(host)
            full = list(sorted(temp))
            full.sort(key=lambda el: el.split(':')[0])
            print('\n[*] Hosts found: ' + str(len(full)))
            print('---------------------')
            for host in full:
                print(host)
                try:
                    if ':' in host:
                        _, addr = host.split(':', 1)
                        await db.store(word, addr, 'ip', 'DNS-resolver')
                except (OSError, RuntimeError, ValueError, TypeError) as e:
                    print(f'An exception has occurred while attempting to insert: {host} IP into DB: {e}')
                    continue
        else:
            all_hosts = [host.replace('www.', '') for host in all_hosts if host.replace('www.', '') in all_hosts]
            all_hosts = sorted(set(all_hosts))
            print('\n[*] Hosts found: ' + str(len(all_hosts)))
            print('---------------------')
            for host in all_hosts:
                print(host)

    # DNS brute force
    if dnsbrute and dnsbrute[0] is True:
        print('\n[*] Starting DNS brute force.')
        dns_force = dnssearch.DnsForce(word, final_dns_resolver_list, verbose=True)
        resolved_pair, hosts, ips = await dns_force.run()
        # Check if Rest API is being used if so return found hosts
        if dnsbrute[1]:
            return resolved_pair
        db = stash.StashManager()
        temp = set()
        for host in resolved_pair:
            if ':' in host:
                # TODO parse addresses and sort them as they are IPs
                subdomain, addr = host.split(':', 1)
                if subdomain.endswith(word):
                    # Append to full, so it's within JSON/XML at the end if output file is requested
                    if host not in full:
                        full.append(host)
                        temp.add(subdomain + ':' + addr)
                    if host not in all_hosts:
                        all_hosts.append(host)
                    continue
            if host.endswith(word):
                if host[:4] == 'www.':
                    if host[4:] in all_hosts or host[4:] in full:
                        continue
                if host not in full:
                    full.append(host)
                    temp.add(host)
                if host not in all_hosts:
                    all_hosts.append(host)
        print('\n[*] Hosts found after DNS brute force:')
        for sub in temp:
            print(sub)
        await db.store_all(word, sorted(temp), 'host', 'dns_bruteforce')

    takeover_results = dict()
    # TakeOver Checking
    if takeover_status:
        print('\n[*] Performing subdomain takeover check')
        print('\n[*] Subdomain Takeover checking IS ACTIVE RECON')
        search_take = takeover.TakeOver(all_hosts)
        await search_take.populate_fingerprints()
        await search_take.process(proxy=use_proxy)
        takeover_results = await search_take.get_takeover_results()
    # DNS reverse lookup
    dnsrev: list = []
    # print(f'DNSlookup: {dnslookup}')
    if dnslookup is True:
        print('\n[*] Starting active queries for DNSLookup.')

        # reverse each iprange in a separate task
        __reverse_dns_tasks: dict = {}
        for entry in host_ip:
            __ip_range = dnssearch.serialize_ip_range(ip=entry, netmask='24')
            if __ip_range and __ip_range not in __reverse_dns_tasks:
                print('\n[*] Performing reverse lookup on ' + __ip_range)
                __reverse_dns_tasks[__ip_range] = asyncio.create_task(
                    dnssearch.reverse_all_ips_in_range(
                        iprange=__ip_range,
                        callback=dnssearch.generate_postprocessing_callback(
                            target=word, local_results=dnsrev, overall_results=full
                        ),
                        nameservers=(final_dns_resolver_list if len(final_dns_resolver_list) > 0 else None),
                    )
                )
                # nameservers=list(map(str, dnsserver.split(','))) if dnsserver else None))

        # run all the reversing tasks concurrently
        await asyncio.gather(*__reverse_dns_tasks.values())
        print('\n[*] Hosts found after reverse lookup (in target domain):')
        print('--------------------------------------------------------')
        for xh in dnsrev:
            print(xh)

    # Screenshots
    screenshot_tups = []
    if len(args.screenshot) > 0:
        screen_shotter = ScreenShotter(args.screenshot)
        path_exists = screen_shotter.verify_path()
        # verify_path() checks that the output path exists; if it does not exist and the user
        # does not create it, screenshots are skipped
        if path_exists:
            await screen_shotter.verify_installation()
            print(f'\nScreenshots can be found in: {screen_shotter.output}{screen_shotter.slash}')
            start_time = time.perf_counter()
            print('Filtering domains for ones we can reach')
            if dnsresolve is None or len(final_dns_resolver_list) > 0:
                unique_resolved_domains = {url.split(':')[0] for url in full if ':' in url and 'www.' not in url}
            else:
                # Technically not resolved in this case, which is not ideal
                # You should always use dns resolve when doing screenshotting
                print('NOTE for future use cases you should only use screenshotting in tandem with DNS resolving')
                unique_resolved_domains = set(all_hosts)
            if len(unique_resolved_domains) > 0:
                # First filter out ones that didn't resolve
                print('Attempting to visit unique resolved domains, this is ACTIVE RECON')
                async with Pool(10) as pool:
                    results = await pool.map(screen_shotter.visit, list(unique_resolved_domains))
                    # Filter out domains that we couldn't connect to
                    unique_resolved_domains_list = list(sorted({tup[0] for tup in results if len(tup[1]) > 0}))
                async with Pool(3) as pool:
                    print(f'Length of unique resolved domains: {len(unique_resolved_domains_list)} chunking now!\n')
                    # If you have the resources, you could make the function faster by increasing the chunk number
                    chunk_number = 14
                    for chunk in screen_shotter.chunk_list(unique_resolved_domains_list, chunk_number):
                        try:
                            screenshot_tups.extend(await pool.map(screen_shotter.take_screenshot, chunk))
                        except Exception as ee:
                            print(f'An exception has occurred while mapping: {ee}')
            end = time.perf_counter()
            # Convert the elapsed time into hours, minutes, and seconds
            total = int(end - start_time)
            mins, sec = divmod(total, 60)
            hr, mins = divmod(mins, 60)
            total_time = f'{hr:02d}:{mins:02d}:{sec:02d}'
            print(f'Finished taking screenshots in {total_time} (HH:MM:SS)')
            print('[+] Note there may be leftover chrome processes you may have to kill manually\n')

    # Shodan
    shodanres = []
    if shodan is True:
        print('[*] Searching Shodan. ')
        try:
            for ip in host_ip:
                try:
                    print('\tSearching for ' + ip)
                    shodan_search = shodansearch.SearchShodan()
                    shodandict = await shodan_search.search_ip(ip)
                    await asyncio.sleep(5)

                    # Check if the result is a string (error message)
                    if isinstance(shodandict[ip], str):
                        print(f'{ip}: {shodandict[ip]}')
                        continue

                    # Process the results if it's a dictionary
                    if isinstance(shodandict[ip], dict):
                        rowdata = []
                        for key, value in shodandict[ip].items():
                            if isinstance(value, int):
                                value = str(value)
                            if isinstance(value, list):
                                value = ', '.join(map(str, value))
                            rowdata.append(value)
                        shodanres.append(rowdata)
                        print(ujson.dumps(shodandict[ip], indent=4, sort_keys=True))
                        print('\n')
                except Exception as ip_error:
                    print(f'[SHODAN-error] Error searching {ip}: {ip_error}')
                    continue
        except Exception as e:
            print(f'[!] An error occurred with Shodan: {e} ')

    if filename != '':
        print('\n[*] Reporting started.')
        try:
            if len(rest_filename) == 0:
                filename = filename.rsplit('.', 1)[0] + '.xml'
            else:
                filename = 'theHarvester/app/static/' + rest_filename.rsplit('.', 1)[0] + '.xml'
            # TODO use aiofiles if user is using rest api
            # XML REPORT SECTION
            with open(filename, 'w+') as file:
                file.write('<?xml version="1.0" encoding="UTF-8"?><theHarvester>')
                sanitized_args = [sanitize_for_xml(f'"{arg}"' if ' ' in arg else arg) for arg in sys.argv[1:]]
                file.write('<cmd>' + ' '.join(sanitized_args) + '</cmd>')
                for x in all_emails:
                    file.write('<email>' + sanitize_for_xml(x) + '</email>')
                for x in full:
                    host, ip = x.split(':', 1) if ':' in x else (x, '')
                    if ip and len(ip) > 3:
                        file.write(f'<host><ip>{sanitize_for_xml(ip)}</ip><hostname>{sanitize_for_xml(host)}</hostname></host>')
                    else:
                        file.write(f'<host>{sanitize_for_xml(host)}</host>')
                for x in vhost:
                    host, ip = x.split(':', 1) if ':' in x else (x, '')
                    if ip and len(ip) > 3:
                        file.write(
                            f'<vhost><ip>{sanitize_for_xml(ip)}</ip><hostname>{sanitize_for_xml(host)}</hostname></vhost>'
                        )
                    else:
                        file.write(f'<vhost>{sanitize_for_xml(host)}</vhost>')
                # TODO add Shodan output into XML report
                file.write('</theHarvester>')
                print('[*] XML File saved.')
        except (OSError, ValueError, TypeError, UnicodeEncodeError) as error:
            print(f'[!] An error occurred while saving the XML file: {error}')

        try:
            # JSON REPORT SECTION
            filename = filename.rsplit('.', 1)[0] + '.json'
            # create dict with values for JSON output
            json_dict: dict = dict()
            # start by adding the command line arguments
            json_dict['cmd'] = ' '.join([f'"{arg}"' if ' ' in arg else arg for arg in sys.argv[1:]])
            # ip_list should always be defined by this point;
            # checking locals() is just a defensive validation
            if 'ip_list' in locals():
                if all_ip and len(all_ip) >= 1 and ip_list and len(ip_list) > 0:
                    json_dict['ips'] = ip_list

            if len(all_emails) > 0:
                json_dict['emails'] = all_emails

            if dnsresolve is None or (len(final_dns_resolver_list) > 0 and len(full) > 0):
                json_dict['hosts'] = full
            elif len(all_hosts) > 0:
                json_dict['hosts'] = all_hosts
            else:
                json_dict['hosts'] = []

            if vhost and len(vhost) > 0:
                json_dict['vhosts'] = vhost

            if len(interesting_urls) > 0:
                json_dict['interesting_urls'] = interesting_urls

            if len(all_urls) > 0:
                json_dict['trello_urls'] = all_urls

            if len(total_asns) > 0:
                json_dict['asns'] = total_asns

            if len(twitter_people_list_tracker) > 0:
                json_dict['twitter_people'] = twitter_people_list_tracker

            if len(linkedin_people_list_tracker) > 0:
                json_dict['linkedin_people'] = linkedin_people_list_tracker

            if len(linkedin_links_tracker) > 0:
                json_dict['linkedin_links'] = linkedin_links_tracker

            if len(all_people) > 0:
                json_dict['people'] = all_people

            if takeover_status and len(takeover_results) > 0:
                json_dict['takeover_results'] = takeover_results

            json_dict['shodan'] = shodanres
            with open(filename, 'w+') as fp:
                dumped_json = ujson.dumps(json_dict, sort_keys=True)
                fp.write(dumped_json)
            print('[*] JSON File saved.')
        except (OSError, ValueError, TypeError, UnicodeEncodeError) as er:
            print(f'[!] An error occurred while saving the JSON file: {er} ')
        print('\n\n')

    # Enhanced code block for API Endpoint scanning feature
    if args.api_scan or 'api_endpoints' in engines:
        try:
            # Define a default wordlist if none is specified
            wordlist = args.wordlist if args.wordlist else str(DATA_DIR / 'wordlists' / 'api_endpoints.txt')

            if not await anyio.Path(wordlist).exists():
                print(f'\n[!] Wordlist not found: {wordlist}')
                print('Creating a basic API wordlist for scanning...')
                # Create a default simple API endpoint list
                basic_endpoints = [
                    '/api',
                    '/api/v1',
                    '/api/v2',
                    '/api/v3',
                    '/graphql',
                    '/swagger',
                    '/docs',
                    '/redoc',
                    '/swagger-ui',
                    '/openapi.json',
                    '/api-docs',
                    '/rest',
                    '/ws',
                    '/swagger-ui.html',
                    '/health',
                    '/status',
                    '/metrics',
                    '/actuator',
                    '/debug',
                ]
                temp_wordlist = str(DAT

SYMBOL INDEX (777 symbols across 89 files)

FILE: tests/discovery/test_baidusearch.py
  class TestBaiduSearch (line 6) | class TestBaiduSearch:
    method test_process_and_parsing (line 8) | async def test_process_and_parsing(self, monkeypatch):
    method test_pagination_limit_exclusive (line 50) | async def test_pagination_limit_exclusive(self, monkeypatch):

FILE: tests/discovery/test_censys.py
  class _ProxyConnector (line 9) | class _ProxyConnector:
    method from_url (line 11) | def from_url(*_args, **_kwargs):
  class _FakeQuery (line 21) | class _FakeQuery:
    method __init__ (line 22) | def __init__(self, pages):
    method __iter__ (line 25) | def __iter__(self):
  function test_missing_key_raises (line 30) | async def test_missing_key_raises(monkeypatch) -> None:
  function test_search_uses_documented_pagination_and_fields (line 38) | async def test_search_uses_documented_pagination_and_fields(monkeypatch)...
  function test_search_respects_limit_across_page_data (line 77) | async def test_search_respects_limit_across_page_data(monkeypatch) -> None:

FILE: tests/discovery/test_certspotter.py
  class TestCertspotter (line 17) | class TestCertspotter(object):
    method domain (line 19) | def domain() -> str:
  class TestCertspotterSearch (line 24) | class TestCertspotterSearch(object):
    method test_api (line 26) | async def test_api(self) -> None:
    method test_search (line 33) | async def test_search(self) -> None:

FILE: tests/discovery/test_criminalip.py
  function test_parser_handles_missing_legacy_fields (line 9) | async def test_parser_handles_missing_legacy_fields(monkeypatch) -> None:
  function test_do_search_uses_v2_report_endpoint (line 53) | async def test_do_search_uses_v2_report_endpoint(monkeypatch) -> None:

FILE: tests/discovery/test_githubcode.py
  class TestSearchGithubCode (line 9) | class TestSearchGithubCode:
    class OkResponse (line 10) | class OkResponse:
      method __init__ (line 14) | def __init__(self):
    class FailureResponse (line 29) | class FailureResponse:
      method __init__ (line 30) | def __init__(self):
    class RetryResponse (line 34) | class RetryResponse:
      method __init__ (line 35) | def __init__(self):
    class MalformedResponse (line 39) | class MalformedResponse:
      method __init__ (line 40) | def __init__(self):
    method test_missing_key (line 57) | async def test_missing_key(self):
    method test_fragments_from_response (line 63) | async def test_fragments_from_response(self):
    method test_invalid_fragments_from_response (line 73) | async def test_invalid_fragments_from_response(self):
    method test_next_page (line 82) | async def test_next_page(self):
    method test_last_page (line 89) | async def test_last_page(self):
    method test_infinite_loop_fix_page_zero (line 96) | async def test_infinite_loop_fix_page_zero(self):
    method test_infinite_loop_fix_page_nonzero (line 111) | async def test_infinite_loop_fix_page_nonzero(self):
    method test_infinite_loop_fix_old_vs_new_condition (line 126) | async def test_infinite_loop_fix_old_vs_new_condition(self):

FILE: tests/discovery/test_githubcode_additions.py
  class TestSearchGithubCodeProcess (line 8) | class TestSearchGithubCodeProcess:
    method test_process_stops_after_max_retries (line 10) | async def test_process_stops_after_max_retries(self, monkeypatch):
    method test_process_stops_on_error_result (line 36) | async def test_process_stops_on_error_result(self, monkeypatch):
    method test_process_breaks_on_same_page_pagination (line 59) | async def test_process_breaks_on_same_page_pagination(self, monkeypatch):

FILE: tests/discovery/test_otx.py
  class TestOtx (line 16) | class TestOtx(object):
    method domain (line 18) | def domain() -> str:
    method test_search (line 22) | async def test_search(self) -> None:

FILE: tests/discovery/test_rocketreach.py
  class _ProxyConnector (line 9) | class _ProxyConnector:
    method from_url (line 11) | def from_url(*_args, **_kwargs):
  function test_missing_key_raises (line 22) | async def test_missing_key_raises(monkeypatch) -> None:
  function test_do_search_uses_people_data_endpoint_and_start_pagination (line 29) | async def test_do_search_uses_people_data_endpoint_and_start_pagination(...
  function test_do_search_stops_on_throttling_message (line 98) | async def test_do_search_stops_on_throttling_message(monkeypatch) -> None:

FILE: tests/discovery/test_shodan_engine.py
  class TestShodanEngine (line 8) | class TestShodanEngine:
    method test_shodan_engine_processes_without_work_item_error_and_yields_hostnames (line 10) | async def test_shodan_engine_processes_without_work_item_error_and_yie...

FILE: tests/discovery/test_thc.py
  class TestThcApi (line 28) | class TestThcApi:
    method test_api_subdomains_download_endpoint_responds (line 32) | async def test_api_subdomains_download_endpoint_responds(self) -> None:
    method test_api_subdomains_returns_text_format (line 43) | async def test_api_subdomains_returns_text_format(self) -> None:
    method test_api_cli_subdomain_endpoint (line 55) | async def test_api_cli_subdomain_endpoint(self) -> None:
    method test_api_returns_rate_limit_headers (line 66) | async def test_api_returns_rate_limit_headers(self) -> None:
  class TestThcSubdomainSearch (line 81) | class TestThcSubdomainSearch:
    method domain (line 85) | def domain() -> str:
    method small_domain (line 89) | def small_domain() -> str:
    method test_search_returns_set (line 93) | async def test_search_returns_set(self) -> None:
    method test_search_finds_subdomains (line 104) | async def test_search_finds_subdomains(self) -> None:
    method test_search_results_contain_target_domain (line 115) | async def test_search_results_contain_target_domain(self) -> None:
    method test_search_no_duplicates (line 127) | async def test_search_no_duplicates(self) -> None:
  class TestThcEdgeCases (line 142) | class TestThcEdgeCases:
    method test_search_nonexistent_domain (line 146) | async def test_search_nonexistent_domain(self) -> None:
    method test_search_empty_domain (line 159) | async def test_search_empty_domain(self) -> None:
    method test_search_special_characters_domain (line 172) | async def test_search_special_characters_domain(self) -> None:
    method test_search_unicode_domain (line 185) | async def test_search_unicode_domain(self) -> None:
    method test_search_subdomain_as_input (line 198) | async def test_search_subdomain_as_input(self) -> None:
  class TestThcProxy (line 212) | class TestThcProxy:
    method domain (line 216) | def domain() -> str:
    method test_process_accepts_proxy_parameter (line 220) | async def test_process_accepts_proxy_parameter(self) -> None:
    method test_proxy_attribute_is_set (line 231) | async def test_proxy_attribute_is_set(self) -> None:
  class TestThcInitialization (line 240) | class TestThcInitialization:
    method test_init_sets_word (line 243) | def test_init_sets_word(self) -> None:
    method test_init_creates_empty_results (line 249) | def test_init_creates_empty_results(self) -> None:
    method test_init_proxy_default_false (line 255) | def test_init_proxy_default_false(self) -> None:
    method test_init_has_rate_limit_settings (line 260) | def test_init_has_rate_limit_settings(self) -> None:
    method test_class_has_required_methods (line 268) | def test_class_has_required_methods(self) -> None:
  class TestThcResponseFormat (line 282) | class TestThcResponseFormat:
    method domain (line 286) | def domain() -> str:
    method test_hostnames_are_strings (line 290) | async def test_hostnames_are_strings(self) -> None:
    method test_hostnames_are_valid_format (line 302) | async def test_hostnames_are_valid_format(self) -> None:
    method test_hostnames_are_lowercase (line 316) | async def test_hostnames_are_lowercase(self) -> None:
  class TestThcIntegration (line 332) | class TestThcIntegration:
    method test_module_can_be_imported (line 336) | async def test_module_can_be_imported(self) -> None:
    method test_search_class_exists (line 342) | async def test_search_class_exists(self) -> None:
    method test_compatible_with_store_function (line 348) | async def test_compatible_with_store_function(self) -> None:

FILE: tests/lib/test_core.py
  function mock_environ (line 15) | def mock_environ(monkeypatch, tmp_path: Path):
  function mock_read_text (line 19) | def mock_read_text(mocked: dict[Path, str | Exception]):
  function test_read_config_searches_config_dirs (line 40) | def test_read_config_searches_config_dirs(
  function test_read_config_copies_default_to_home (line 57) | def test_read_config_copies_default_to_home(name: str, capsys):
  class DummyResponse (line 79) | class DummyResponse:
    method __init__ (line 80) | def __init__(self, text_value: str = 'response-text', json_value: Any ...
    method __aenter__ (line 84) | async def __aenter__(self):
    method __aexit__ (line 87) | async def __aexit__(self, exc_type, exc, tb):
    method text (line 90) | async def text(self):
    method json (line 93) | async def json(self):
  class DummySession (line 97) | class DummySession:
    method __init__ (line 100) | def __init__(self, *, headers=None, timeout=None, connector=None):
    method __aenter__ (line 108) | async def __aenter__(self):
    method __aexit__ (line 111) | async def __aexit__(self, exc_type, exc, tb):
    method request (line 115) | def request(self, method: str, url: str, **kwargs):
    method get (line 119) | def get(self, url: str, **kwargs):
    method post (line 123) | def post(self, url: str, **kwargs):
    method close (line 127) | async def close(self):
  function reset_dummy_sessions (line 131) | def reset_dummy_sessions() -> None:
  function fake_sleep (line 135) | async def fake_sleep(_seconds: float) -> None:
  function test_api_keys_yaml_is_in_sync_with_core_accessors (line 139) | def test_api_keys_yaml_is_in_sync_with_core_accessors():
  function test_api_key_accessors_delegate_to_shared_mapping (line 167) | def test_api_key_accessors_delegate_to_shared_mapping(monkeypatch, acces...
  function test_fetch_creates_session_with_default_headers (line 186) | async def test_fetch_creates_session_with_default_headers(monkeypatch) -...
  function test_fetch_uses_http_proxy_when_enabled (line 207) | async def test_fetch_uses_http_proxy_when_enabled(monkeypatch) -> None:
  function test_post_fetch_decodes_string_payload_and_posts_params (line 231) | async def test_post_fetch_decodes_string_payload_and_posts_params(monkey...
  function test_post_fetch_proxy_branch_uses_get_with_http_proxy (line 255) | async def test_post_fetch_proxy_branch_uses_get_with_http_proxy(monkeypa...

FILE: tests/lib/test_output.py
  function test_sorted_unique_sorts_and_deduplicates (line 7) | def test_sorted_unique_sorts_and_deduplicates() -> None:
  function test_print_linkedin_sections_prints_links_when_present (line 11) | def test_print_linkedin_sections_prints_links_when_present(capsys) -> None:
  function test_print_linkedin_sections_prints_people_and_links (line 26) | def test_print_linkedin_sections_prints_people_and_links(capsys) -> None:

FILE: tests/test_hackertarget_apikey.py
  class TestHackerTargetApiKey (line 6) | class TestHackerTargetApiKey:
    method test_do_search_with_apikey (line 9) | async def test_do_search_with_apikey(self, monkeypatch):
    method test_do_search_without_apikey (line 28) | async def test_do_search_without_apikey(self, monkeypatch):

FILE: tests/test_mojeek.py
  class TestMojeekSearch (line 4) | class TestMojeekSearch:
    method test_process_and_parsing (line 7) | async def test_process_and_parsing(self, monkeypatch):
    method test_pagination_limit (line 42) | async def test_pagination_limit(self, monkeypatch):

FILE: tests/test_myparser.py
  class TestMyParser (line 9) | class TestMyParser(object):
    method test_emails (line 11) | async def test_emails(self) -> None:

FILE: tests/test_security.py
  class TestCORSConfiguration (line 12) | class TestCORSConfiguration:
    method test_cors_does_not_allow_credentials_with_wildcard_origins (line 15) | def test_cors_does_not_allow_credentials_with_wildcard_origins(self):
    method test_cors_restricts_http_methods (line 44) | def test_cors_restricts_http_methods(self):
  class TestXMLInjectionPrevention (line 75) | class TestXMLInjectionPrevention:
    method test_sanitize_for_xml_escapes_special_characters (line 78) | def test_sanitize_for_xml_escapes_special_characters(self):
    method test_sanitize_for_xml_prevents_xml_entity_injection (line 100) | def test_sanitize_for_xml_prevents_xml_entity_injection(self):
    method test_command_line_args_are_sanitized_in_xml_output (line 117) | def test_command_line_args_are_sanitized_in_xml_output(self):
  class TestInformationDisclosure (line 139) | class TestInformationDisclosure:
    method client (line 143) | def client(self):
    method test_api_does_not_expose_traceback_in_error_responses (line 149) | def test_api_does_not_expose_traceback_in_error_responses(self, client):
    method test_error_responses_do_not_leak_internal_paths (line 165) | def test_error_responses_do_not_leak_internal_paths(self, client):
    method test_debug_mode_does_not_expose_sensitive_info (line 190) | def test_debug_mode_does_not_expose_sensitive_info(self, client, monke...
  class TestPathTraversalPrevention (line 206) | class TestPathTraversalPrevention:
    method test_sanitize_filename_removes_path_components (line 209) | def test_sanitize_filename_removes_path_components(self):
    method test_sanitize_filename_removes_dangerous_characters (line 236) | def test_sanitize_filename_removes_dangerous_characters(self):
    method test_sanitize_filename_prevents_hidden_files (line 263) | def test_sanitize_filename_prevents_hidden_files(self):
    method test_filename_sanitization_preserves_safe_filenames (line 276) | def test_filename_sanitization_preserves_safe_filenames(self):
    method test_path_traversal_in_file_operations (line 294) | def test_path_traversal_in_file_operations(self):
  class TestSecurityBestPractices (line 316) | class TestSecurityBestPractices:
    method test_no_hardcoded_secrets_in_code (line 319) | def test_no_hardcoded_secrets_in_code(self):
    method test_api_has_rate_limiting (line 356) | def test_api_has_rate_limiting(self):
    method test_sensitive_endpoints_require_validation (line 366) | def test_sensitive_endpoints_require_validation(self):

FILE: theHarvester/__main__.py
  function sanitize_for_xml (line 84) | def sanitize_for_xml(text: str) -> str:
  function sanitize_filename (line 94) | def sanitize_filename(filename: str) -> str:
  function start (line 108) | async def start(rest_args: argparse.Namespace | None = None):
  function entry_point (line 1881) | async def entry_point() -> None:

FILE: theHarvester/discovery/additional_apis.py
  class AdditionalAPIs (line 11) | class AdditionalAPIs:
    method __init__ (line 14) | def __init__(self, domain: str, api_keys: dict[str, str] | None = None):
    method process (line 41) | async def process(self, proxy: bool = False) -> dict[str, Any]:
    method _process_haveibeenpwned (line 60) | async def _process_haveibeenpwned(self, proxy: bool = False):
    method _process_leaklookup (line 70) | async def _process_leaklookup(self, proxy: bool = False):
    method _process_securityscorecard (line 80) | async def _process_securityscorecard(self, proxy: bool = False):
    method _process_builtwith (line 94) | async def _process_builtwith(self, proxy: bool = False):
    method _process_shodan (line 110) | async def _process_shodan(self, proxy: bool = False):
    method _is_valid_ip (line 161) | def _is_valid_ip(ip_str: str) -> bool:
    method get_hosts (line 171) | async def get_hosts(self) -> set[str]:
    method get_emails (line 175) | async def get_emails(self) -> set[str]:

FILE: theHarvester/discovery/api_endpoints.py
  class EndpointResult (line 26) | class EndpointResult:
    method to_dict (line 45) | def to_dict(self) -> dict[str, Any]:
  class SearchApiEndpoints (line 50) | class SearchApiEndpoints:
    method __init__ (line 55) | def __init__(
    method do_search (line 392) | async def do_search(self) -> None:
    method _detect_schema (line 434) | async def _detect_schema(self) -> str:
    method _check_endpoint_with_semaphore (line 453) | async def _check_endpoint_with_semaphore(self, url: str) -> EndpointRe...
    method _load_wordlist (line 458) | def _load_wordlist(self) -> list[str]:
    method _check_endpoint (line 482) | async def _check_endpoint(self, url: str) -> EndpointResult | None:
    method _get_headers (line 529) | def _get_headers(self) -> dict[str, str]:
    method _process_response (line 544) | def _process_response(self, url: str, method: str, response, response_...
    method _post_scan_analysis (line 711) | async def _post_scan_analysis(self) -> None:
    method get_results_summary (line 729) | def get_results_summary(self) -> dict[str, Any]:
    method _get_tech_stack_summary (line 750) | def _get_tech_stack_summary(self) -> dict[str, int]:
    method get_detailed_results (line 758) | def get_detailed_results(self) -> list[dict[str, Any]]:
    method get_hostnames (line 767) | def get_hostnames(self) -> set[str]:
    method get_endpoints (line 771) | def get_endpoints(self) -> set[str]:
    method get_found_endpoints (line 775) | def get_found_endpoints(self) -> dict[str, EndpointResult]:
    method get_interesting_endpoints (line 779) | def get_interesting_endpoints(self) -> dict[str, EndpointResult]:
    method get_auth_required (line 783) | def get_auth_required(self) -> dict[str, EndpointResult]:
    method get_api_versions (line 787) | def get_api_versions(self) -> set[str]:
    method get_rate_limits (line 791) | def get_rate_limits(self) -> dict[str, EndpointResult]:
    method get_methods (line 795) | def get_methods(self) -> set[str]:
    method get_status_codes (line 799) | def get_status_codes(self) -> set[int]:
    method get_response_sizes (line 803) | def get_response_sizes(self) -> dict[str, int]:
    method get_tech_stack (line 807) | def get_tech_stack(self) -> dict[str, list[str]]:
    method get_schema_detected (line 811) | def get_schema_detected(self) -> dict[str, dict[str, Any]]:
    method export_results (line 815) | def export_results(self, output_file: str | None = None, format: str =...

FILE: theHarvester/discovery/baidusearch.py
  class SearchBaidu (line 5) | class SearchBaidu:
    method __init__ (line 6) | def __init__(self, word, limit) -> None:
    method do_search (line 14) | async def do_search(self) -> None:
    method process (line 22) | async def process(self, proxy: bool = False) -> None:
    method get_emails (line 26) | async def get_emails(self):
    method get_hostnames (line 30) | async def get_hostnames(self):

FILE: theHarvester/discovery/bevigil.py
  class SearchBeVigil (line 5) | class SearchBeVigil:
    method __init__ (line 6) | def __init__(self, word) -> None:
    method do_search (line 16) | async def do_search(self) -> None:
    method get_hostnames (line 31) | async def get_hostnames(self) -> set:
    method get_interestingurls (line 34) | async def get_interestingurls(self) -> set:
    method process (line 37) | async def process(self, proxy: bool = False) -> None:

FILE: theHarvester/discovery/bitbucket.py
  class RetryResult (line 14) | class RetryResult(NamedTuple):
  class SuccessResult (line 18) | class SuccessResult(NamedTuple):
  class ErrorResult (line 24) | class ErrorResult(NamedTuple):
  class SearchBitBucket (line 29) | class SearchBitBucket:
    method __init__ (line 30) | def __init__(self, word, limit) -> None:
    method fragments_from_response (line 52) | async def fragments_from_response(json_data: dict) -> list[str]:
    method page_from_response (line 65) | async def page_from_response(page: str, links) -> int | None:
    method handle_response (line 76) | async def handle_response(self, response: tuple[str, dict, int, Any]) ...
    method next_page_or_end (line 93) | async def next_page_or_end(result: SuccessResult) -> int | None:
    method do_search (line 99) | async def do_search(self, page: int) -> tuple[str, dict, int, Any]:
    method process (line 109) | async def process(self, proxy: bool = False) -> None:
    method get_emails (line 151) | async def get_emails(self):
    method get_hostnames (line 159) | async def get_hostnames(self):

FILE: theHarvester/discovery/bravesearch.py
  class SearchBrave (line 9) | class SearchBrave:
    method __init__ (line 10) | def __init__(self, word, limit):
    method do_search (line 22) | async def do_search(self):
    method get_emails (line 120) | async def get_emails(self):
    method get_hostnames (line 124) | async def get_hostnames(self):
    method process (line 128) | async def process(self, proxy=False):

FILE: theHarvester/discovery/bufferoverun.py
  class SearchBufferover (line 7) | class SearchBufferover:
    method __init__ (line 8) | def __init__(self, word) -> None:
    method do_search (line 17) | async def do_search(self) -> None:
    method get_hostnames (line 40) | async def get_hostnames(self) -> set:
    method get_ips (line 43) | async def get_ips(self) -> set:
    method process (line 46) | async def process(self, proxy: bool = False) -> None:

FILE: theHarvester/discovery/builtwith.py
  class SearchBuiltWith (line 9) | class SearchBuiltWith:
    method __init__ (line 10) | def __init__(self, word: str):
    method process (line 26) | async def process(self, proxy: bool = False) -> None:
    method _extract_data (line 46) | def _extract_data(self) -> None:
    method get_hostnames (line 68) | async def get_hostnames(self) -> set[str]:
    method get_tech_stack (line 71) | async def get_tech_stack(self) -> dict:
    method get_interesting_urls (line 74) | async def get_interesting_urls(self) -> set[str]:
    method get_frameworks (line 77) | async def get_frameworks(self) -> set[str]:
    method get_languages (line 80) | async def get_languages(self) -> set[str]:
    method get_servers (line 83) | async def get_servers(self) -> set[str]:
    method get_cms (line 86) | async def get_cms(self) -> set[str]:
    method get_analytics (line 89) | async def get_analytics(self) -> set[str]:

FILE: theHarvester/discovery/censysearch.py
  class SearchCensys (line 15) | class SearchCensys:
    method __init__ (line 18) | def __init__(self, domain, limit: int = 500) -> None:
    method _normalize_emails (line 29) | def _normalize_emails(email_address: object) -> set[str]:
    method do_search (line 36) | async def do_search(self) -> None:
    method get_hostnames (line 69) | async def get_hostnames(self) -> set:
    method get_emails (line 72) | async def get_emails(self) -> set:
    method process (line 75) | async def process(self, proxy: bool = False) -> None:

FILE: theHarvester/discovery/certspottersearch.py
  class SearchCertspoter (line 4) | class SearchCertspoter:
    method __init__ (line 5) | def __init__(self, word) -> None:
    method do_search (line 10) | async def do_search(self) -> None:
    method get_hostnames (line 35) | async def get_hostnames(self) -> set:
    method process (line 38) | async def process(self, proxy: bool = False) -> None:

FILE: theHarvester/discovery/chaos.py
  class SearchChaos (line 18) | class SearchChaos:
    method __init__ (line 23) | def __init__(self, word) -> None:
    method _get_api_key (line 30) | def _get_api_key(self) -> str:
    method _safe_parse_json (line 38) | def _safe_parse_json(payload: object) -> dict:
    method do_search (line 49) | async def do_search(self) -> None:
    method get_hostnames (line 110) | async def get_hostnames(self) -> set:
    method process (line 113) | async def process(self, proxy: bool = False) -> None:

FILE: theHarvester/discovery/commoncrawl.py
  class SearchCommoncrawl (line 18) | class SearchCommoncrawl:
    method __init__ (line 23) | def __init__(self, word) -> None:
    method _safe_parse_json_lines (line 30) | def _safe_parse_json_lines(payload: str) -> list:
    method _extract_domain_from_url (line 44) | def _extract_domain_from_url(self, url: str) -> str:
    method do_search (line 60) | async def do_search(self) -> None:
    method get_hostnames (line 123) | async def get_hostnames(self) -> set:
    method process (line 126) | async def process(self, proxy: bool = False) -> None:

FILE: theHarvester/discovery/constants.py
  function splitter (line 6) | async def splitter(links):
  function filter (line 31) | def filter(lst):
  function get_delay (line 50) | def get_delay() -> float:
  function search (line 55) | async def search(text: str) -> bool:
  function google_workaround (line 71) | async def google_workaround(visit_url: str) -> bool | str:
  class MissingKeyError (line 111) | class MissingKeyError(Exception):
    method __init__ (line 116) | def __init__(self, source: str | None) -> None:
    method __str__ (line 122) | def __str__(self) -> str:

FILE: theHarvester/discovery/criminalip.py
  class SearchCriminalIP (line 9) | class SearchCriminalIP:
    method __init__ (line 10) | def __init__(self, word) -> None:
    method _normalize_host (line 20) | def _normalize_host(self, hostname: str | None) -> str | None:
    method _add_host (line 40) | def _add_host(self, hostname: str | None, include_root: bool = True) -...
    method _add_host_from_url (line 49) | def _add_host_from_url(self, url: str | None) -> None:
    method _add_ip (line 60) | def _add_ip(self, ip: str | None) -> None:
    method _add_asn (line 64) | def _add_asn(self, asn: str | int | None) -> None:
    method _collect_hosts_from_value (line 71) | def _collect_hosts_from_value(self, value) -> None:
    method do_search (line 86) | async def do_search(self) -> None:
    method parser (line 187) | async def parser(self, jlines):
    method get_asns (line 323) | async def get_asns(self) -> set:
    method get_hostnames (line 326) | async def get_hostnames(self) -> set:
    method get_ips (line 329) | async def get_ips(self) -> set:
    method process (line 332) | async def process(self, proxy: bool = False) -> None:

FILE: theHarvester/discovery/crtsh.py
  class SearchCrtsh (line 4) | class SearchCrtsh:
    method __init__ (line 5) | def __init__(self, word) -> None:
    method do_search (line 10) | async def do_search(self) -> list:
    method process (line 31) | async def process(self, proxy: bool = False) -> None:
    method get_hostnames (line 36) | async def get_hostnames(self) -> list:

FILE: theHarvester/discovery/dnssearch.py
  class DnsForce (line 27) | class DnsForce:
    method __init__ (line 28) | def __init__(self, domain, dnsserver, verbose: bool = False) -> None:
    method run (line 40) | async def run(self):
  function serialize_ip_range (line 58) | def serialize_ip_range(ip: str, netmask: str = '24') -> str:
  function list_ips_in_network_range (line 89) | def list_ips_in_network_range(iprange: str) -> list[str]:
  function reverse_single_ip (line 111) | async def reverse_single_ip(ip: str, resolver: DNSResolver) -> str:
  function reverse_all_ips_in_range (line 130) | async def reverse_all_ips_in_range(iprange: str, callback: Callable, nam...
  function log_query (line 164) | def log_query(ip: str) -> None:
  function log_result (line 182) | def log_result(host: str) -> None:
  function generate_postprocessing_callback (line 199) | def generate_postprocessing_callback(target: str, **allhosts: list[str])...

FILE: theHarvester/discovery/duckduckgosearch.py
  class SearchDuckDuckGo (line 7) | class SearchDuckDuckGo:
    method __init__ (line 8) | def __init__(self, word, limit) -> None:
    method do_search (line 20) | async def do_search(self) -> None:
    method crawl (line 32) | async def crawl(self, text: str) -> set[str]:
    method get_emails (line 77) | async def get_emails(self):
    method get_hostnames (line 81) | async def get_hostnames(self):
    method process (line 85) | async def process(self, proxy: bool = False) -> None:

FILE: theHarvester/discovery/fofa.py
  class SearchFofa (line 19) | class SearchFofa:
    method __init__ (line 25) | def __init__(self, word) -> None:
    method _get_api_credentials (line 33) | def _get_api_credentials(self) -> tuple[str, str]:
    method _safe_parse_json (line 41) | def _safe_parse_json(payload: object) -> dict:
    method do_search (line 51) | async def do_search(self) -> None:
    method get_hostnames (line 118) | async def get_hostnames(self) -> set:
    method get_ips (line 121) | async def get_ips(self) -> set:
    method process (line 124) | async def process(self, proxy: bool = False) -> None:

FILE: theHarvester/discovery/fullhuntsearch.py
  class SearchFullHunt (line 8) | class SearchFullHunt:
    method __init__ (line 113) | def __init__(self, word) -> None:
    method _get_headers (line 136) | def _get_headers(self) -> dict[str, str]:
    method _fetch_data (line 140) | async def _fetch_data(self, endpoint: str) -> dict[str, Any]:
    method add_filter (line 151) | def add_filter(self, filter_name: str, filter_value: str) -> None:
    method add_filters (line 167) | def add_filters(self, filters: dict[str, str]) -> None:
    method clear_filters (line 179) | def clear_filters(self) -> None:
    method _build_query_string (line 183) | def _build_query_string(self) -> str:
    method advanced_search (line 198) | async def advanced_search(self) -> dict[str, Any]:
    method get_domain_details (line 212) | async def get_domain_details(self) -> dict[str, Any]:
    method get_subdomains (line 217) | async def get_subdomains(self) -> dict[str, Any]:
    method get_host_details (line 222) | async def get_host_details(self, host: str) -> dict[str, Any]:
    method search_tech (line 227) | async def search_tech(self, tech_name: str) -> dict[str, Any]:
    method search_service (line 232) | async def search_service(self, service_name: str) -> dict[str, Any]:
    method search_port (line 237) | async def search_port(self, port: int) -> dict[str, Any]:
    method search_country (line 242) | async def search_country(self, country_code: str) -> dict[str, Any]:
    method search_cloud_provider (line 247) | async def search_cloud_provider(self, provider: str) -> dict[str, Any]:
    method search_http_status (line 252) | async def search_http_status(self, status_code: int) -> dict[str, Any]:
    method search_certificate (line 257) | async def search_certificate(self, filter_name: str, value: str) -> di...
    method search_with_dns (line 266) | async def search_with_dns(self, dns_type: str, value: str) -> dict[str...
    method extract_data_from_domain_details (line 277) | async def extract_data_from_domain_details(self, details: dict[str, An...
    method extract_data_from_search_results (line 360) | async def extract_data_from_search_results(self, results: dict[str, An...
    method do_search (line 371) | async def do_search(self) -> None:
    method get_hostnames (line 393) | async def get_hostnames(self) -> list[str]:
    method get_ips (line 397) | async def get_ips(self) -> list[str]:
    method get_ports (line 401) | async def get_ports(self) -> list[int]:
    method get_technologies (line 405) | async def get_technologies(self) -> list[str]:
    method get_tags (line 409) | async def get_tags(self) -> list[str]:
    method get_dns_records (line 413) | async def get_dns_records(self) -> dict[str, dict[str, list[str]]]:
    method get_http_info (line 417) | async def get_http_info(self) -> dict[str, dict[str, Any]]:
    method get_geo_info (line 421) | async def get_geo_info(self) -> dict[str, dict[str, Any]]:
    method get_cloud_info (line 425) | async def get_cloud_info(self) -> dict[str, dict[str, Any]]:
    method get_certificate_info (line 429) | async def get_certificate_info(self) -> list[dict[str, Any]]:
    method get_all_results (line 433) | async def get_all_results(self) -> dict[str, Any]:
    method process (line 437) | async def process(self, proxy: bool = False, filters: dict[str, str] |...

FILE: theHarvester/discovery/githubcode.py
  class RetryResult (line 13) | class RetryResult(NamedTuple):
  class SuccessResult (line 17) | class SuccessResult(NamedTuple):
  class ErrorResult (line 23) | class ErrorResult(NamedTuple):
  class SearchGithubCode (line 28) | class SearchGithubCode:
    method __init__ (line 29) | def __init__(self, word, limit) -> None:
    method fragments_from_response (line 56) | async def fragments_from_response(json_data: dict) -> list[str]:
    method page_from_response (line 69) | async def page_from_response(page: str, links) -> int | None:
    method handle_response (line 80) | async def handle_response(self, response: tuple[str, dict, int, Any]) ...
    method next_page_or_end (line 97) | async def next_page_or_end(result: SuccessResult) -> int | None:
    method do_search (line 103) | async def do_search(self, page: int) -> tuple[str, dict, int, Any]:
    method process (line 113) | async def process(self, proxy: bool = False) -> None:
    method get_emails (line 155) | async def get_emails(self):
    method get_hostnames (line 163) | async def get_hostnames(self):

FILE: theHarvester/discovery/gitlabsearch.py
  class SearchGitlab (line 18) | class SearchGitlab:
    method __init__ (line 23) | def __init__(self, word) -> None:
    method _safe_parse_json (line 32) | def _safe_parse_json(payload: object) -> dict:
    method _extract_domains_from_text (line 43) | def _extract_domains_from_text(self, text: str) -> set:
    method _extract_emails_from_text (line 61) | def _extract_emails_from_text(self, text: str) -> set:
    method search_projects (line 77) | async def search_projects(self) -> None:
    method search_users (line 138) | async def search_users(self) -> None:
    method do_search (line 189) | async def do_search(self) -> None:
    method get_hostnames (line 193) | async def get_hostnames(self) -> set:
    method get_emails (line 196) | async def get_emails(self) -> set:
    method get_urls (line 199) | async def get_urls(self) -> set:
    method process (line 202) | async def process(self, proxy: bool = False) -> None:

FILE: theHarvester/discovery/hackertarget.py
  class SearchHackerTarget (line 5) | class SearchHackerTarget:
    method __init__ (line 13) | def __init__(self, word) -> None:
    method do_search (line 21) | async def do_search(self) -> None:
    method process (line 44) | async def process(self, proxy: bool = False) -> None:
    method get_hostnames (line 48) | async def get_hostnames(self) -> list:

FILE: theHarvester/discovery/haveibeenpwned.py
  class SearchHaveIBeenPwned (line 7) | class SearchHaveIBeenPwned:
    method __init__ (line 8) | def __init__(self, word: str):
    method process (line 23) | async def process(self, proxy: bool = False) -> None:
    method _extract_data (line 42) | def _extract_data(self) -> None:
    method get_hostnames (line 54) | async def get_hostnames(self) -> set[str]:
    method get_emails (line 57) | async def get_emails(self) -> set[str]:
    method get_breaches (line 60) | async def get_breaches(self) -> list[dict]:
    method get_pastes (line 63) | async def get_pastes(self) -> list[dict]:
    method get_breach_dates (line 66) | async def get_breach_dates(self) -> set[str]:
    method get_breach_types (line 69) | async def get_breach_types(self) -> set[str]:
    method get_affected_data (line 72) | async def get_affected_data(self) -> set[str]:

FILE: theHarvester/discovery/hudsonrocksearch.py
  class SearchHudsonRock (line 8) | class SearchHudsonRock:
    method __init__ (line 15) | def __init__(self, word: str) -> None:
    method do_search (line 35) | async def do_search(self) -> None:
    method _is_valid_email (line 67) | def _is_valid_email(self, email: str) -> bool:
    method _search_domain (line 81) | async def _search_domain(self, domain: str) -> None:
    method _search_email (line 107) | async def _search_email(self, email: str) -> None:
    method _process_domain_response (line 133) | def _process_domain_response(self, response: dict) -> None:
    method _extract_hosts_from_urls (line 176) | def _extract_hosts_from_urls(self, urls_data: list[dict], source_type:...
    method _extract_emails_from_data (line 203) | def _extract_emails_from_data(self, data: dict) -> None:
    method _process_email_response (line 226) | def _process_email_response(self, response: dict) -> None:
    method _is_valid_ip (line 273) | def _is_valid_ip(self, ip: str) -> bool:
    method _extract_hosts_from_services (line 293) | def _extract_hosts_from_services(self, services: list[dict]) -> None:
    method get_hostnames (line 319) | async def get_hostnames(self) -> set[str]:
    method get_ips (line 327) | async def get_ips(self) -> set[str]:
    method get_emails (line 335) | async def get_emails(self) -> set[str]:
    method get_infostealers (line 343) | async def get_infostealers(self) -> list[dict]:
    method get_compromised_data (line 351) | async def get_compromised_data(self) -> dict:
    method get_summary (line 359) | def get_summary(self) -> dict:
    method process (line 377) | async def process(self, proxy: bool = False) -> None:

FILE: theHarvester/discovery/huntersearch.py
  class SearchHunter (line 7) | class SearchHunter:
    method __init__ (line 8) | def __init__(self, word, limit, start) -> None:
    method do_search (line 23) | async def do_search(self) -> None:
    method parse_resp (line 65) | async def parse_resp(self, json_resp):
    method process (line 79) | async def process(self, proxy: bool = False) -> None:
    method get_emails (line 83) | async def get_emails(self):
    method get_hostnames (line 86) | async def get_hostnames(self):

FILE: theHarvester/discovery/intelxsearch.py
  class SearchIntelx (line 12) | class SearchIntelx:
    method __init__ (line 13) | def __init__(self, word) -> None:
    method do_search (line 25) | async def do_search(self) -> None:
    method process (line 64) | async def process(self, proxy: bool = False):
    method get_emails (line 70) | async def get_emails(self) -> list[str]:
    method get_interestingurls (line 73) | async def get_interestingurls(self) -> tuple[list[str], list[str]]:

FILE: theHarvester/discovery/leakix.py
  class SearchLeakix (line 17) | class SearchLeakix:
    method __init__ (line 23) | def __init__(self, word) -> None:
    method _safe_parse_json (line 31) | def _safe_parse_json(payload: object) -> list:
    method do_search (line 43) | async def do_search(self) -> None:
    method get_hostnames (line 108) | async def get_hostnames(self) -> set:
    method get_emails (line 111) | async def get_emails(self) -> set:
    method process (line 114) | async def process(self, proxy: bool = False) -> None:

FILE: theHarvester/discovery/leaklookup.py
  class SearchLeakLookup (line 7) | class SearchLeakLookup:
    method __init__ (line 8) | def __init__(self, word: str):
    method process (line 20) | async def process(self, proxy: bool = False) -> None:
    method _extract_data (line 45) | def _extract_data(self) -> None:
    method get_hostnames (line 59) | async def get_hostnames(self) -> set[str]:
    method get_emails (line 62) | async def get_emails(self) -> set[str]:
    method get_leaks (line 65) | async def get_leaks(self) -> list[dict]:
    method get_passwords (line 68) | async def get_passwords(self) -> set[str]:
    method get_sources (line 71) | async def get_sources(self) -> set[str]:
    method get_leak_dates (line 74) | async def get_leak_dates(self) -> set[str]:

FILE: theHarvester/discovery/mojeek.py
  class SearchMojeek (line 5) | class SearchMojeek:
    method __init__ (line 6) | def __init__(self, word, limit) -> None:
    method do_search (line 24) | async def do_search(self) -> None:
    method process (line 63) | async def process(self, proxy: bool = False) -> None:
    method get_emails (line 67) | async def get_emails(self):
    method get_hostnames (line 71) | async def get_hostnames(self):

FILE: theHarvester/discovery/netlas.py
  class SearchNetlas (line 7) | class SearchNetlas:
    method __init__ (line 8) | def __init__(self, word, limit: int) -> None:
    method do_count (line 18) | async def do_count(self) -> None:
    method do_search (line 29) | async def do_search(self) -> None:
    method get_hostnames (line 57) | async def get_hostnames(self) -> list:
    method process (line 60) | async def process(self, proxy: bool = False) -> None:

FILE: theHarvester/discovery/onyphe.py
  class SearchOnyphe (line 9) | class SearchOnyphe:
    method __init__ (line 10) | def __init__(self, word) -> None:
    method do_search (line 21) | async def do_search(self) -> None:
    method parse_onyphe_resp_json (line 35) | async def parse_onyphe_resp_json(self):
    method get_asns (line 85) | async def get_asns(self) -> set:
    method get_hostnames (line 88) | async def get_hostnames(self) -> set:
    method get_ips (line 91) | async def get_ips(self) -> set:
    method process (line 94) | async def process(self, proxy: bool = False) -> None:

FILE: theHarvester/discovery/otxsearch.py
  class SearchOtx (line 7) | class SearchOtx:
    method __init__ (line 8) | def __init__(self, word) -> None:
    method do_search (line 14) | async def do_search(self) -> None:
    method get_hostnames (line 51) | async def get_hostnames(self) -> set:
    method get_ips (line 54) | async def get_ips(self) -> set:
    method process (line 57) | async def process(self, proxy: bool = False) -> None:

FILE: theHarvester/discovery/pentesttools.py
  class SearchPentestTools (line 9) | class SearchPentestTools:
    method __init__ (line 10) | def __init__(self, word) -> None:
    method poll (line 20) | async def poll(self, scan_id):
    method parse_json (line 44) | async def parse_json(json_results):
    method get_hostnames (line 53) | async def get_hostnames(self) -> list:
    method do_search (line 56) | async def do_search(self) -> None:
    method process (line 72) | async def process(self, proxy: bool = False) -> None:

FILE: theHarvester/discovery/projectdiscovery.py
  class SearchDiscovery (line 5) | class SearchDiscovery:
    method __init__ (line 6) | def __init__(self, word) -> None:
    method do_search (line 14) | async def do_search(self):
    method get_hostnames (line 24) | async def get_hostnames(self):
    method process (line 27) | async def process(self, proxy: bool = False) -> None:

FILE: theHarvester/discovery/rapiddns.py
  class SearchRapidDns (line 7) | class SearchRapidDns:
    method __init__ (line 8) | def __init__(self, word) -> None:
    method do_search (line 13) | async def do_search(self):
    method process (line 47) | async def process(self, proxy: bool = False) -> None:
    method get_hostnames (line 51) | async def get_hostnames(self):

FILE: theHarvester/discovery/robtex.py
  class SearchRobtex (line 19) | class SearchRobtex:
    method __init__ (line 24) | def __init__(self, word) -> None:
    method _safe_parse_json_lines (line 32) | def _safe_parse_json_lines(payload: str) -> list:
    method do_search (line 46) | async def do_search(self) -> None:
    method get_hostnames (line 109) | async def get_hostnames(self) -> set:
    method get_ips (line 112) | async def get_ips(self) -> set:
    method process (line 115) | async def process(self, proxy: bool = False) -> None:

FILE: theHarvester/discovery/rocketreach.py
  class SearchRocketReach (line 7) | class SearchRocketReach:
    method __init__ (line 8) | def __init__(self, word, limit) -> None:
    method do_search (line 21) | async def do_search(self) -> None:
    method get_links (line 86) | async def get_links(self):
    method get_emails (line 89) | async def get_emails(self):
    method process (line 92) | async def process(self, proxy: bool = False) -> None:

FILE: theHarvester/discovery/search_dehashed.py
  class SearchDehashed (line 10) | class SearchDehashed:
    method __init__ (line 11) | def __init__(self, word) -> None:
    method do_search (line 26) | async def do_search(self) -> None:
    method print_csv_results (line 74) | async def print_csv_results(self) -> None:
    method process (line 93) | async def process(self, proxy: bool = False) -> None:
    method get_emails (line 98) | async def get_emails(self) -> set:
    method get_hostnames (line 105) | async def get_hostnames(self) -> set:
    method get_ips (line 108) | async def get_ips(self) -> set:

FILE: theHarvester/discovery/search_dnsdumpster.py
  class SearchDNSDumpster (line 6) | class SearchDNSDumpster:
    method __init__ (line 7) | def __init__(self, word) -> None:
    method do_search (line 16) | async def do_search(self) -> None:
    method process (line 44) | async def process(self, proxy: bool = False) -> None:
    method get_hostnames (line 47) | async def get_hostnames(self) -> set:
    method get_ips (line 50) | async def get_ips(self) -> set:

FILE: theHarvester/discovery/searchhunterhow.py
  class SearchHunterHow (line 10) | class SearchHunterHow:
    method __init__ (line 11) | def __init__(self, word) -> None:
    method do_search (line 19) | async def do_search(self) -> None:
    method get_hostnames (line 54) | async def get_hostnames(self) -> set:
    method process (line 57) | async def process(self, proxy: bool = False) -> None:

FILE: theHarvester/discovery/securityscorecard.py
  class SearchSecurityScorecard (line 7) | class SearchSecurityScorecard:
    method __init__ (line 8) | def __init__(self, word: str):
    method process (line 23) | async def process(self, proxy: bool = False) -> None:
    method _extract_data (line 41) | def _extract_data(self, data: dict) -> None:
    method get_hostnames (line 73) | async def get_hostnames(self) -> set[str]:
    method get_ips (line 76) | async def get_ips(self) -> list[str]:
    method get_score (line 79) | async def get_score(self) -> int:
    method get_grades (line 82) | async def get_grades(self) -> dict:
    method get_issues (line 85) | async def get_issues(self) -> list[dict]:
    method get_recommendations (line 88) | async def get_recommendations(self) -> list[dict]:
    method get_history (line 91) | async def get_history(self) -> list[dict]:

FILE: theHarvester/discovery/securitytrailssearch.py
  class SearchSecuritytrail (line 8) | class SearchSecuritytrail:
    method __init__ (line 9) | def __init__(self, word) -> None:
    method authenticate (line 23) | async def authenticate(self) -> None:
    method do_search (line 33) | async def do_search(self) -> None:
    method process (line 68) | async def process(self, proxy: bool = False) -> None:
    method get_ips (line 88) | async def get_ips(self) -> set:
    method get_hostnames (line 91) | async def get_hostnames(self) -> set:

FILE: theHarvester/discovery/shodansearch.py
  class SearchShodan (line 9) | class SearchShodan:
    method __init__ (line 10) | def __init__(self) -> None:
    method search_ip (line 18) | async def search_ip(self, ip) -> OrderedDict:

FILE: theHarvester/discovery/subdomaincenter.py
  class SubdomainCenter (line 4) | class SubdomainCenter:
    method __init__ (line 5) | def __init__(self, word):
    method do_search (line 11) | async def do_search(self):
    method get_hostnames (line 21) | async def get_hostnames(self):
    method process (line 24) | async def process(self, proxy=False):

FILE: theHarvester/discovery/subdomainfinderc99.py
  class SearchSubdomainfinderc99 (line 12) | class SearchSubdomainfinderc99:
    method __init__ (line 13) | def __init__(self, word) -> None:
    method do_search (line 21) | async def do_search(self) -> None:
    method get_hostnames (line 44) | async def get_hostnames(self):
    method process (line 48) | async def process(self, proxy: bool = False) -> None:
    method get_csrf_params (line 53) | async def get_csrf_params(data):

FILE: theHarvester/discovery/takeover.py
  class TakeOver (line 10) | class TakeOver:
    method __init__ (line 11) | def __init__(self, hosts) -> None:
    method populate_fingerprints (line 19) | async def populate_fingerprints(self):
    method check (line 61) | async def check(self, url, resp) -> None:
    method do_take (line 76) | async def do_take(self) -> None:
    method process (line 101) | async def process(self, proxy: bool = False) -> None:
    method get_takeover_results (line 105) | async def get_takeover_results(self):

FILE: theHarvester/discovery/thc.py
  class SearchThc (line 8) | class SearchThc:
    method __init__ (line 11) | def __init__(self, word: str) -> None:
    method do_search (line 18) | async def do_search(self) -> None:
    method get_hostnames (line 56) | async def get_hostnames(self) -> set:
    method process (line 59) | async def process(self, proxy: bool = False) -> None:

FILE: theHarvester/discovery/threatcrowd.py
  class SearchThreatcrowd (line 17) | class SearchThreatcrowd:
    method __init__ (line 23) | def __init__(self, word) -> None:
    method _safe_parse_json (line 31) | def _safe_parse_json(payload: object) -> dict:
    method do_search (line 41) | async def do_search(self) -> None:
    method get_hostnames (line 93) | async def get_hostnames(self) -> set:
    method get_ips (line 96) | async def get_ips(self) -> set:
    method process (line 99) | async def process(self, proxy: bool = False) -> None:

FILE: theHarvester/discovery/tombasearch.py
  class SearchTomba (line 7) | class SearchTomba:
    method __init__ (line 8) | def __init__(self, word, limit, start) -> None:
    method do_search (line 23) | async def do_search(self) -> None:
    method parse_resp (line 74) | async def parse_resp(self, json_resp):
    method process (line 88) | async def process(self, proxy: bool = False) -> None:
    method get_emails (line 92) | async def get_emails(self):
    method get_hostnames (line 95) | async def get_hostnames(self):

FILE: theHarvester/discovery/urlscan.py
  class SearchUrlscan (line 4) | class SearchUrlscan:
    method __init__ (line 5) | def __init__(self, word) -> None:
    method do_search (line 13) | async def do_search(self) -> None:
    method get_hostnames (line 24) | async def get_hostnames(self) -> set:
    method get_ips (line 27) | async def get_ips(self) -> set:
    method get_interestingurls (line 30) | async def get_interestingurls(self) -> set:
    method get_asns (line 33) | async def get_asns(self) -> set:
    method process (line 36) | async def process(self, proxy: bool = False) -> None:

FILE: theHarvester/discovery/venacussearch.py
  class SearchVenacus (line 10) | class SearchVenacus:
    method __init__ (line 11) | def __init__(self, word: str, limit=1000, offset_doc=0) -> None:
    method do_search (line 26) | async def do_search(self) -> None:
    method process (line 69) | async def process(self, proxy: bool = False):
    method get_people (line 75) | async def get_people(self) -> list[dict[str, str]]:
    method get_emails (line 80) | async def get_emails(self) -> set[str]:
    method get_ips (line 85) | async def get_ips(self) -> set[str]:
    method get_interestingurls (line 90) | async def get_interestingurls(self) -> set[str]:

FILE: theHarvester/discovery/virustotal.py
  class SearchVirustotal (line 7) | class SearchVirustotal:
    method __init__ (line 8) | def __init__(self, word) -> None:
    method do_search (line 16) | async def do_search(self) -> None:
    method get_hostnames (line 63) | async def get_hostnames(self) -> list:
    method parse_hostnames (line 67) | async def parse_hostnames(data, word):
    method process (line 100) | async def process(self, proxy: bool = False) -> None:

FILE: theHarvester/discovery/waybackarchive.py
  class SearchWaybackarchive (line 6) | class SearchWaybackarchive:
    method __init__ (line 11) | def __init__(self, word) -> None:
    method _extract_domain_from_url (line 17) | def _extract_domain_from_url(self, url: str) -> str:
    method do_search (line 33) | async def do_search(self) -> None:
    method get_hostnames (line 72) | async def get_hostnames(self) -> set:
    method process (line 75) | async def process(self, proxy: bool = False) -> None:

FILE: theHarvester/discovery/whoisxml.py
  class SearchWhoisXML (line 5) | class SearchWhoisXML:
    method __init__ (line 6) | def __init__(self, word) -> None:
    method do_search (line 14) | async def do_search(self):
    method get_hostnames (line 34) | async def get_hostnames(self):
    method process (line 37) | async def process(self, proxy: bool = False) -> None:

FILE: theHarvester/discovery/windvane.py
  class SearchWindvane (line 17) | class SearchWindvane:
    method __init__ (line 37) | def __init__(self, word) -> None:
    method _get_api_key (line 46) | def _get_api_key(self) -> str | None:
    method _safe_parse_json (line 54) | def _safe_parse_json(payload: object) -> dict:
    method do_search (line 65) | async def do_search(self) -> None:
    method _search_subdomains (line 86) | async def _search_subdomains(self, headers: dict) -> None:
    method _search_dns_history (line 126) | async def _search_dns_history(self, headers: dict) -> None:
    method _search_emails (line 170) | async def _search_emails(self, headers: dict) -> None:
    method _search_subdomains_limited (line 203) | async def _search_subdomains_limited(self, headers: dict) -> None:
    method _fallback_search (line 245) | async def _fallback_search(self) -> None:
    method set_api_key (line 307) | def set_api_key(self, api_key: str) -> None:
    method _is_valid_ip (line 315) | def _is_valid_ip(self, ip: str) -> bool:
    method get_hostnames (line 323) | async def get_hostnames(self) -> set:
    method get_ips (line 326) | async def get_ips(self) -> set:
    method get_emails (line 329) | async def get_emails(self) -> set:
    method process (line 332) | async def process(self, proxy: bool = False) -> None:

FILE: theHarvester/discovery/yahoosearch.py
  class SearchYahoo (line 5) | class SearchYahoo:
    method __init__ (line 6) | def __init__(self, word, limit) -> None:
    method do_search (line 13) | async def do_search(self) -> None:
    method process (line 21) | async def process(self, proxy: bool = False) -> None:
    method get_emails (line 25) | async def get_emails(self):
    method get_hostnames (line 38) | async def get_hostnames(self, proxy: bool = False):

FILE: theHarvester/discovery/zoomeyesearch.py
  class SearchZoomEye (line 12) | class SearchZoomEye:
    method __init__ (line 13) | def __init__(self, word, limit) -> None:
    method _build_headers (line 64) | def _build_headers(self) -> dict[str, str]:
    method _is_success (line 69) | def _is_success(resp: dict[str, Any]) -> bool:
    method _unwrap_data (line 86) | def _unwrap_data(resp: dict[str, Any]) -> dict[str, Any]:
    method _page_total_from_payload (line 92) | def _page_total_from_payload(payload: dict[str, Any], page_size: int) ...
    method _safe_add_hostname (line 116) | def _safe_add_hostname(container: set, value: str | None) -> None:
    method fetch_subdomains (line 126) | async def fetch_subdomains(self) -> None:
    method do_search (line 181) | async def do_search(self) -> None:
    method parse_matches (line 261) | async def parse_matches(self, matches):
    method process (line 358) | async def process(self, proxy: bool = False) -> None:
    method parse_emails (line 362) | async def parse_emails(self, content):
    method parse_hostnames (line 366) | async def parse_hostnames(self, content):
    method get_hostnames (line 370) | async def get_hostnames(self):
    method get_emails (line 373) | async def get_emails(self):
    method get_ips (line 376) | async def get_ips(self):
    method get_asns (line 379) | async def get_asns(self):
    method get_interestingurls (line 382) | async def get_interestingurls(self):

FILE: theHarvester/lib/api/additional_endpoints.py
  class DomainRequest (line 10) | class DomainRequest(BaseModel):
  function get_breaches (line 16) | async def get_breaches(request: DomainRequest, api_key: str = Depends(ge...
  function get_leaks (line 28) | async def get_leaks(request: DomainRequest, api_key: str = Depends(get_a...
  function get_security_score (line 40) | async def get_security_score(request: DomainRequest, api_key: str = Depe...
  function get_tech_stack (line 52) | async def get_tech_stack(request: DomainRequest, api_key: str = Depends(...
  function get_all_info (line 64) | async def get_all_info(request: DomainRequest, api_key: str = Depends(ge...

FILE: theHarvester/lib/api/api.py
  class QueryResponse (line 22) | class QueryResponse(BaseModel):
  class ErrorResponse (line 34) | class ErrorResponse(BaseModel):
  function root (line 78) | async def root(*, user_agent: str = Header(None)) -> Response:
  class BotResponse (line 134) | class BotResponse(BaseModel):
  function bot (line 139) | async def bot() -> Response:
  class SourcesResponse (line 149) | class SourcesResponse(BaseModel):
  function getsources (line 162) | async def getsources(request: Request) -> Response:
  class DnsBruteResponse (line 187) | class DnsBruteResponse(BaseModel):
  function dnsbrute (line 201) | async def dnsbrute(
  function query (line 274) | async def query(

FILE: theHarvester/lib/api/api_example.py
  function fetch_json (line 13) | async def fetch_json(session, url):
  function fetch (line 23) | async def fetch(session, url):
  function main (line 33) | async def main() -> None:

FILE: theHarvester/lib/api/auth.py
  function get_api_key (line 4) | def get_api_key(x_api_key: str | None = Header(None)) -> str:

FILE: theHarvester/lib/core.py
  class Core (line 31) | class Core:
    method _read_config (line 71) | def _read_config(filename: str) -> str:
    method api_keys (line 90) | def api_keys() -> dict:
    method _api_key_value (line 95) | def _api_key_value(provider: str) -> Any:
    method bevigil_key (line 102) | def bevigil_key() -> str:
    method bitbucket_key (line 106) | def bitbucket_key() -> str:
    method brave_key (line 110) | def brave_key() -> str:
    method bufferoverun_key (line 114) | def bufferoverun_key() -> str:
    method builtwith_key (line 118) | def builtwith_key() -> str:
    method censys_key (line 122) | def censys_key() -> tuple:
    method criminalip_key (line 126) | def criminalip_key() -> str:
    method dehashed_key (line 130) | def dehashed_key() -> str:
    method dnsdumpster_key (line 134) | def dnsdumpster_key() -> str:
    method fofa_key (line 138) | def fofa_key() -> tuple[str, str]:
    method fullhunt_key (line 142) | def fullhunt_key() -> str:
    method github_key (line 146) | def github_key() -> str:
    method hackertarget_key (line 150) | def hackertarget_key() -> str:
    method haveibeenpwned_key (line 154) | def haveibeenpwned_key() -> str:
    method hunter_key (line 158) | def hunter_key() -> str:
    method hunterhow_key (line 162) | def hunterhow_key() -> str:
    method intelx_key (line 166) | def intelx_key() -> str:
    method leaklookup_key (line 170) | def leaklookup_key() -> str:
    method mojeek_key (line 174) | def mojeek_key() -> str:
    method leakix_key (line 178) | def leakix_key() -> str:
    method netlas_key (line 182) | def netlas_key() -> str:
    method onyphe_key (line 186) | def onyphe_key() -> str:
    method pentest_tools_key (line 190) | def pentest_tools_key() -> str:
    method projectdiscovery_key (line 194) | def projectdiscovery_key() -> str:
    method rocketreach_key (line 198) | def rocketreach_key() -> str:
    method securityscorecard_key (line 202) | def securityscorecard_key() -> str:
    method security_trails_key (line 206) | def security_trails_key() -> str:
    method shodan_key (line 210) | def shodan_key() -> str:
    method tomba_key (line 214) | def tomba_key() -> tuple[str, str]:
    method venacus_key (line 218) | def venacus_key() -> str:
    method virustotal_key (line 222) | def virustotal_key() -> str:
    method whoisxml_key (line 226) | def whoisxml_key() -> str:
    method windvane_key (line 230) | def windvane_key() -> str:
    method zoomeye_key (line 234) | def zoomeye_key() -> str:
    method _proxy_urls (line 238) | def _proxy_urls(config: dict[str, list[str] | None], proxy_type: str) ...
    method proxy_list (line 243) | def proxy_list() -> dict:
    method banner (line 251) | def banner() -> None:
    method get_supportedengines (line 267) | def get_supportedengines() -> list[str]:
    method get_user_agent (line 334) | def get_user_agent() -> str:
  class AsyncFetcher (line 401) | class AsyncFetcher:
    method _default_headers (line 405) | def _default_headers(headers: dict[str, str] | None = None) -> dict[st...
    method _ssl_context (line 409) | def _ssl_context(verify: bool | None = True) -> ssl.SSLContext | bool:
    method _request_timeout (line 415) | def _request_timeout(total: int | None) -> aiohttp.ClientTimeout | None:
    method _normalize_data (line 419) | def _normalize_data(data: str | dict[str, Any]) -> str | dict[str, Any]:
    method _resolve_proxy (line 423) | def _resolve_proxy(cls, proxy: str | bool | None) -> tuple[str | None,...
    method _build_session (line 434) | async def _build_session(
    method _read_response (line 448) | async def _read_response(response: aiohttp.ClientResponse, *, json: bo...
    method _request (line 453) | async def _request(
    method _get_random_proxy (line 473) | def _get_random_proxy(proxy_dict: dict) -> tuple[str | None, str | None]:
    method _create_connector (line 490) | async def _create_connector(
    method post_fetch (line 507) | async def post_fetch(
    method fetch (line 572) | async def fetch(
    method takeover_fetch (line 635) | async def takeover_fetch(session, url: str, proxy: str | None = None) ...
    method fetch_all (line 667) | async def fetch_all(
  function show_default_error_message (line 738) | def show_default_error_message(engine_name: str, word: str, error) -> None:

FILE: theHarvester/lib/hostchecker.py
  class Checker (line 20) | class Checker:
    method __init__ (line 21) | def __init__(self, hosts: list[str], nameservers: list[str]) -> None:
    method resolve_host (line 40) | async def resolve_host(host: str, resolver: aiodns.DNSResolver) -> str:
    method chunks (line 55) | def chunks(lst: list[str], n: int) -> Iterator[list[str]]:
    method query_all (line 60) | async def query_all(self, resolver: aiodns.DNSResolver, hosts: list[st...
    method check (line 65) | async def check(self) -> tuple[list[str], list[str], list[str]]:

FILE: theHarvester/lib/output.py
  function sorted_unique (line 9) | def sorted_unique[T: Hashable](items: Iterable[T]) -> list[T]:
  function print_section (line 15) | def print_section(header: str, items: Iterable[str], separator: str) -> ...
  function print_linkedin_sections (line 22) | def print_linkedin_sections(

FILE: theHarvester/lib/stash.py
  class StashManager (line 14) | class StashManager:
    method __init__ (line 15) | def __init__(self) -> None:
    method _col0_int (line 27) | def _col0_int(row: Row | None) -> int:
    method _col0_value (line 35) | def _col0_value(row: Row | None):
    method do_init (line 38) | async def do_init(self) -> None:
    method store (line 45) | async def store(self, domain, resource, res_type, source) -> None:
    method store_all (line 61) | async def store_all(self, domain, all, res_type, source) -> None:
    method generatedashboardcode (line 82) | async def generatedashboardcode(self, domain):
    method getlatestscanresults (line 170) | async def getlatestscanresults(self, domain, previousday: bool = False...
    method getscanboarddata (line 239) | async def getscanboarddata(self):
    method getscanhistorydomain (line 264) | async def getscanhistorydomain(self, domain):
    method getpluginscanstatistics (line 311) | async def getpluginscanstatistics(self) -> Iterable[Row] | None:
    method latestscanchartdata (line 327) | async def latestscanchartdata(self, domain):

FILE: theHarvester/parsers/intelxparser.py
  class Parser (line 1) | class Parser:
    method __init__ (line 2) | def __init__(self) -> None:
    method parse_dictionaries (line 6) | async def parse_dictionaries(self, results: dict) -> tuple:

FILE: theHarvester/parsers/myparser.py
  class Parser (line 5) | class Parser:
    method __init__ (line 6) | def __init__(self, results, word) -> None:
    method generic_clean (line 11) | async def generic_clean(self) -> None:
    method url_clean (line 40) | async def url_clean(self) -> None:
    method emails (line 45) | async def emails(self):
    method fileurls (line 63) | async def fileurls(self, file) -> list:
    method hostnames (line 75) | async def hostnames(self):
    method hostnames_all (line 90) | async def hostnames_all(self):
    method set (line 102) | async def set(self):
    method urls (line 112) | async def urls(self) -> Set[str]:
    method unique (line 117) | async def unique(self) -> list:

FILE: theHarvester/parsers/securitytrailsparser.py
  class Parser (line 4) | class Parser:
    method __init__ (line 5) | def __init__(self, word, text) -> None:
    method parse_text (line 11) | async def parse_text(self) -> tuple[set, set]:

FILE: theHarvester/parsers/venacusparser.py
  class TokenTypesEnum (line 6) | class TokenTypesEnum(enum.StrEnum):
  class Parser (line 33) | class Parser:
    method __init__ (line 34) | def __init__(self) -> None:
    method parse_text_tokens (line 38) | async def parse_text_tokens(self, results: list[dict[str, Any]]) -> Ma...

FILE: theHarvester/restfulHarvest.py
  function main (line 7) | def main():

FILE: theHarvester/screenshot/screenshot.py
  class ScreenShotter (line 18) | class ScreenShotter:
    method __init__ (line 19) | def __init__(self, output) -> None:
    method verify_path (line 24) | def verify_path(self) -> bool:
    method verify_installation (line 39) | async def verify_installation() -> None:
    method chunk_list (line 50) | def chunk_list(items: Collection, chunk_size: int) -> list:
    method visit (line 55) | async def visit(url: str, proxy: str | None = None) -> tuple[str, str]:
    method take_screenshot (line 91) | async def take_screenshot(self, url: str) -> None:

FILE: theHarvester/theHarvester.py
  function main (line 7) | def main():


Condensed preview: 127 files, each showing path, character count, and a content snippet (2,363K chars in the full structured content).

[
  {
    "path": ".dockerignore",
    "chars": 160,
    "preview": ".github/*\n.gitattributes\n.git-blame-ignore-revs\n.idea/\n.pytest_cache\n.mypy_cache\ntests/*\nREADME/\nbin/\ntheHarvester-logo."
  },
  {
    "path": ".git-blame-ignore-revs",
    "chars": 76,
    "preview": "# #1492 run `black .` and `isort .`\nc13843ec0d513ac7f9c35b7bd0501fa46e356415"
  },
  {
    "path": ".gitattributes",
    "chars": 682,
    "preview": "# Set the default behavior, which is to have git automatically determine\n# whether a file is a text or binary, unless ot"
  },
  {
    "path": ".github/FUNDING.yml",
    "chars": 574,
    "preview": "# These are supported funding model platforms\n\ngithub: [L1ghtn1ng, NotoriousRebel]\nopen_collective: # Replace with a sin"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/issue-template.md",
    "chars": 845,
    "preview": "---\nname: Issue Template\nabout: A template for new issues.\ntitle: \"[Bug|Feature Request|Other] Short Description of Issu"
  },
  {
    "path": ".github/dependabot.yml",
    "chars": 356,
    "preview": "version: 2\nupdates:\n- package-ecosystem: github-actions\n  directory: \"/\"\n  schedule:\n    interval: daily\n    timezone: E"
  },
  {
    "path": ".github/workflows/codeql-analysis.yml",
    "chars": 2360,
    "preview": "# For most projects, this workflow file will not need changing; you simply need\n# to commit it to your repository.\n#\n# Y"
  },
  {
    "path": ".github/workflows/docker-build-push.yml",
    "chars": 1142,
    "preview": "name: Build and Push Docker Image\n\non:\n  push:\n    branches:\n      - master\n\npermissions:\n  contents: read\n  packages: w"
  },
  {
    "path": ".github/workflows/dockerci.yml",
    "chars": 329,
    "preview": "name: TheHarvester Docker Image CI\n\non: [push, pull_request]\n\njobs:\n  build:\n    runs-on: ubuntu-latest\n    steps:\n     "
  },
  {
    "path": ".github/workflows/theHarvester.yml",
    "chars": 2730,
    "preview": "name: TheHarvester Python CI\n\non:\n  push:\n    branches:\n      - '*'\n\n  pull_request:\n    branches:\n      - '*'\n\njobs:\n  "
  },
  {
    "path": ".gitignore",
    "chars": 192,
    "preview": "*.idea\n*.pyc\n*.sqlite\n*.html\n*.htm\n*.vscode\n*.xml\n*.json\ndebug_results.txt\nvenv\n.mypy_cache\n.pytest_cache\nbuild/\ndist/\nt"
  },
  {
    "path": "CHANGELOG.md",
    "chars": 4456,
    "preview": "# Changelog\n\nAll notable changes to this project will be documented in this file.\n\nThe format is based on [Keep a Change"
  },
  {
    "path": "Dockerfile",
    "chars": 763,
    "preview": "FROM python:3.14-slim-trixie\n\nLABEL maintainer=\"@jay_townsend1 & @NotoriousRebel1\"\n\nRUN useradd -m -u 1000 -s /bin/bash "
  },
  {
    "path": "README/CONTRIBUTING.md",
    "chars": 790,
    "preview": "# Contributing to theHarvester Project\nWelcome to theHarvester project, so you would like to contribute.\nThe following b"
  },
  {
    "path": "README/COPYING",
    "chars": 15216,
    "preview": "                   GNU GENERAL PUBLIC LICENSE\n                       Version 2, June 1991\n\n Copyright (C) 1989, 1991 Fre"
  },
  {
    "path": "README/LICENSES",
    "chars": 640,
    "preview": "Released under the GPL v 2.0.\n\nIf you did not receive a copy of the GPL, try http://www.gnu.org/.\n\nCopyright 2011 Christ"
  },
  {
    "path": "README.md",
    "chars": 9461,
    "preview": "![theHarvester](https://github.com/laramies/theHarvester/blob/master/theHarvester-logo.webp)\n\n![TheHarvester CI](https:/"
  },
  {
    "path": "bin/restfulHarvest",
    "chars": 107,
    "preview": "#!/usr/bin/env python3\nfrom theHarvester.restfulHarvest import main\n\nif __name__ == '__main__':\n    main()\n"
  },
  {
    "path": "bin/theHarvester",
    "chars": 308,
    "preview": "#!/usr/bin/env python3\n# Note: This script runs theHarvester\nimport sys\n\nfrom theHarvester.theHarvester import main\n\nif "
  },
  {
    "path": "docker-compose.yml",
    "chars": 477,
    "preview": "services:\n  theharvester.svc.local:\n    container_name: theHarvester\n    volumes:\n      - ./theHarvester/data/api-keys.y"
  },
  {
    "path": "pyproject.toml",
    "chars": 3698,
    "preview": "[project]\nname = \"theHarvester\"\ndescription = \"theHarvester is a very simple, yet effective tool designed to be used in "
  },
  {
    "path": "tests/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "tests/discovery/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "tests/discovery/test_baidusearch.py",
    "chars": 2705,
    "preview": "import pytest\n\nfrom theHarvester.discovery import baidusearch\n\n\nclass TestBaiduSearch:\n    @pytest.mark.asyncio\n    asyn"
  },
  {
    "path": "tests/discovery/test_censys.py",
    "chars": 3520,
    "preview": "import sys\nimport types\n\nimport pytest\n\nif 'aiohttp_socks' not in sys.modules:\n    aiohttp_socks_stub = types.ModuleType"
  },
  {
    "path": "tests/discovery/test_certspotter.py",
    "chars": 1156,
    "preview": "#!/usr/bin/env python3\n# coding=utf-8\nimport os\nfrom typing import Optional\n\nimport pytest\nimport httpx\n\nfrom theHarvest"
  },
  {
    "path": "tests/discovery/test_criminalip.py",
    "chars": 3875,
    "preview": "#!/usr/bin/env python3\n# coding=utf-8\nimport pytest\n\nfrom theHarvester.discovery import criminalip\n\n\n@pytest.mark.asynci"
  },
  {
    "path": "tests/discovery/test_githubcode.py",
    "chars": 5987,
    "preview": "from unittest.mock import MagicMock\nimport pytest\nfrom httpx import Response\nfrom theHarvester.discovery import githubco"
  },
  {
    "path": "tests/discovery/test_githubcode_additions.py",
    "chars": 2996,
    "preview": "from unittest.mock import MagicMock, AsyncMock\nimport asyncio\nimport pytest\nfrom theHarvester.discovery import githubcod"
  },
  {
    "path": "tests/discovery/test_otx.py",
    "chars": 865,
    "preview": "#!/usr/bin/env python3\n# coding=utf-8\nimport os\nfrom typing import Optional\nimport httpx\nimport pytest\n\nfrom theHarveste"
  },
  {
    "path": "tests/discovery/test_rocketreach.py",
    "chars": 4317,
    "preview": "import sys\nimport types\n\nimport pytest\n\nif 'aiohttp_socks' not in sys.modules:\n    aiohttp_socks_stub = types.ModuleType"
  },
  {
    "path": "tests/discovery/test_shodan_engine.py",
    "chars": 1770,
    "preview": "import socket\nimport sys\nfrom collections import OrderedDict\n\nimport pytest\n\n\nclass TestShodanEngine:\n    @pytest.mark.a"
  },
  {
    "path": "tests/discovery/test_thc.py",
    "chars": 13901,
    "preview": "#!/usr/bin/env python3\n# coding=utf-8\n\"\"\"\nTests for THC (ip.thc.org) discovery module.\n\nTHC provides multiple endpoints:"
  },
  {
    "path": "tests/lib/test_core.py",
    "chars": 10216,
    "preview": "from __future__ import annotations\n\nfrom pathlib import Path\nfrom typing import Any\nfrom unittest import mock\n\nimport py"
  },
  {
    "path": "tests/lib/test_output.py",
    "chars": 1224,
    "preview": "from __future__ import annotations\n\n\nfrom theHarvester.lib.output import print_linkedin_sections, sorted_unique\n\n\ndef te"
  },
  {
    "path": "tests/test_hackertarget_apikey.py",
    "chars": 1563,
    "preview": "import pytest\nfrom theHarvester.discovery import hackertarget as ht_mod\nfrom theHarvester.lib.core import Core\n\n\nclass "
  },
  {
    "path": "tests/test_mojeek.py",
    "chars": 2038,
    "preview": "import pytest\nfrom theHarvester.discovery import mojeek\n\nclass TestMojeekSearch:\n\n    @pytest.mark.asyncio\n    async def"
  },
  {
    "path": "tests/test_myparser.py",
    "chars": 521,
    "preview": "#!/usr/bin/env python3\n# coding=utf-8\n\nimport pytest\n\nfrom theHarvester.parsers import myparser\n\n\nclass TestMyParser(obj"
  },
  {
    "path": "tests/test_security.py",
    "chars": 15479,
    "preview": "import os\nimport re\nimport tempfile\nfrom pathlib import Path\n\nimport pytest\nfrom fastapi.testclient import TestClient\n\nf"
  },
  {
    "path": "theHarvester/__init__.py",
    "chars": 23,
    "preview": "__version__ = '4.10.1'\n"
  },
  {
    "path": "theHarvester/__main__.py",
    "chars": 82996,
    "preview": "import argparse\nimport asyncio\nimport os\nimport re\nimport secrets\nimport string\nimport sys\nimport time\nimport traceback\n"
  },
  {
    "path": "theHarvester/data/proxies.yaml",
    "chars": 42,
    "preview": "http:\n    - ip:port\nsocks5:\n    - ip:port\n"
  },
  {
    "path": "theHarvester/data/wordlists/api_endpoints.txt",
    "chars": 21190,
    "preview": "# Common API endpoints - Most frequently found in web services\n/api\n/api/v1\n/api/v2\n/api/v3\n/api/latest\n/rest\n/restapi\n/"
  },
  {
    "path": "theHarvester/data/wordlists/dns-big.txt",
    "chars": 1115633,
    "preview": "www\nmail\nftp\nlocalhost\nwebmail\nsmtp\nwebdisk\npop\ncpanel\nwhm\nns1\nns2\nautodiscover\nautoconfig\nns\ntest\nm\nblog\ndev\nwww2\nns3\np"
  },
  {
    "path": "theHarvester/data/wordlists/dns-names.txt",
    "chars": 33565,
    "preview": "www\nmail\nftp\nlocalhost\nwebmail\nsmtp\nwebdisk\npop\ncpanel\nwhm\nns1\nns2\nautodiscover\nautoconfig\nns\ntest\nm\nblog\ndev\nwww2\nns3\np"
  },
  {
    "path": "theHarvester/data/wordlists/dorks.txt",
    "chars": 374,
    "preview": "inurl:\"contact\"\nintext:email filetype:log\n\"Index of /mail\"\n\"admin account info\" filetype:log\nintext:@\nadministrator acco"
  },
  {
    "path": "theHarvester/data/wordlists/general/common.txt",
    "chars": 42,
    "preview": "admin\ntest\nhello\nuk\nlogin\nbook\nrobots.txt\n"
  },
  {
    "path": "theHarvester/data/wordlists/names_small.txt",
    "chars": 406279,
    "preview": "www\n_tcp\n_tls\n_udp\n_domainkey\n_pkixrep._tcp\n_aix._tcp\n_afpovertcp._tcp\n_autodiscover._tcp\n_caldav._tcp\n_certificates._tc"
  },
  {
    "path": "theHarvester/discovery/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "theHarvester/discovery/additional_apis.py",
    "chars": 6821,
    "preview": "import asyncio\nfrom typing import Any\n\nfrom theHarvester.discovery.builtwith import SearchBuiltWith\nfrom theHarvester.di"
  },
  {
    "path": "theHarvester/discovery/api_endpoints.py",
    "chars": 29017,
    "preview": "\"\"\"\nAPI endpoint scanner module.\nThis module contains the SearchApiEndpoints class that performs comprehensive API endpo"
  },
  {
    "path": "theHarvester/discovery/baidusearch.py",
    "chars": 1220,
    "preview": "from theHarvester.lib.core import AsyncFetcher, Core\nfrom theHarvester.parsers import myparser\n\n\nclass SearchBaidu:\n    "
  },
  {
    "path": "theHarvester/discovery/bevigil.py",
    "chars": 1422,
    "preview": "from theHarvester.discovery.constants import MissingKey\nfrom theHarvester.lib.core import AsyncFetcher, Core\n\n\nclass Sea"
  },
  {
    "path": "theHarvester/discovery/bitbucket.py",
    "chars": 6991,
    "preview": "import asyncio\nimport random\nimport re\nimport urllib.parse as urlparse\nfrom typing import Any, NamedTuple\n\nimport aiohtt"
  },
  {
    "path": "theHarvester/discovery/bravesearch.py",
    "chars": 5802,
    "preview": "import asyncio\nfrom urllib.parse import quote\n\nfrom theHarvester.discovery.constants import MissingKey, get_delay\nfrom t"
  },
  {
    "path": "theHarvester/discovery/bufferoverun.py",
    "chars": 1521,
    "preview": "import re\n\nfrom theHarvester.discovery.constants import MissingKey\nfrom theHarvester.lib.core import AsyncFetcher, Core\n"
  },
  {
    "path": "theHarvester/discovery/builtwith.py",
    "chars": 3395,
    "preview": "from typing import Any\n\nimport aiohttp\n\nfrom theHarvester.discovery.constants import MissingKey\nfrom theHarvester.lib.co"
  },
  {
    "path": "theHarvester/discovery/censysearch.py",
    "chars": 2728,
    "preview": "from math import ceil\n\nfrom censys.common import __version__\nfrom censys.common.exceptions import (\n    CensysRateLimitE"
  },
  {
    "path": "theHarvester/discovery/certspottersearch.py",
    "chars": 1677,
    "preview": "from theHarvester.lib.core import AsyncFetcher\n\n\nclass SearchCertspoter:\n    def __init__(self, word) -> None:\n        s"
  },
  {
    "path": "theHarvester/discovery/chaos.py",
    "chars": 4309,
    "preview": "import json as _stdlib_json\nfrom types import ModuleType\n\nfrom theHarvester.discovery.constants import MissingKey\nfrom t"
  },
  {
    "path": "theHarvester/discovery/commoncrawl.py",
    "chars": 4576,
    "preview": "import json as _stdlib_json\nimport re\nfrom types import ModuleType\n\nfrom theHarvester.lib.core import AsyncFetcher, Core"
  },
  {
    "path": "theHarvester/discovery/constants.py",
    "chars": 4348,
    "preview": "import random\n\nfrom theHarvester.lib.core import AsyncFetcher, Core\n\n\nasync def splitter(links):\n    \"\"\"\n    Method that"
  },
  {
    "path": "theHarvester/discovery/criminalip.py",
    "chars": 13816,
    "preview": "import asyncio\nfrom typing import Any\nfrom urllib.parse import urlparse\n\nfrom theHarvester.discovery.constants import Mi"
  },
  {
    "path": "theHarvester/discovery/crtsh.py",
    "chars": 1354,
    "preview": "from theHarvester.lib.core import AsyncFetcher\n\n\nclass SearchCrtsh:\n    def __init__(self, word) -> None:\n        self.w"
  },
  {
    "path": "theHarvester/discovery/dnssearch.py",
    "chars": 6232,
    "preview": "\"\"\"\n============\nDNS Browsing\n============\n\nExplore the space around known hosts & ips for extra catches.\n\"\"\"\n\nimport as"
  },
  {
    "path": "theHarvester/discovery/duckduckgosearch.py",
    "chars": 3351,
    "preview": "import ujson\n\nfrom theHarvester.lib.core import AsyncFetcher, Core\nfrom theHarvester.parsers import myparser\n\n\nclass Sea"
  },
  {
    "path": "theHarvester/discovery/fofa.py",
    "chars": 4438,
    "preview": "import base64\nimport json as _stdlib_json\nfrom types import ModuleType\n\nfrom theHarvester.discovery.constants import Mis"
  },
  {
    "path": "theHarvester/discovery/fullhuntsearch.py",
    "chars": 17808,
    "preview": "from typing import Any, ClassVar\nfrom urllib.parse import quote\n\nfrom theHarvester.discovery.constants import MissingKey"
  },
  {
    "path": "theHarvester/discovery/githubcode.py",
    "chars": 6865,
    "preview": "import asyncio\nimport random\nimport urllib.parse as urlparse\nfrom typing import Any, NamedTuple\n\nimport aiohttp\n\nfrom th"
  },
  {
    "path": "theHarvester/discovery/gitlabsearch.py",
    "chars": 7890,
    "preview": "import json as _stdlib_json\nimport re\nfrom types import ModuleType\n\nfrom theHarvester.lib.core import AsyncFetcher, Core"
  },
  {
    "path": "theHarvester/discovery/hackertarget.py",
    "chars": 1809,
    "preview": "# theHarvester/discovery/hackertarget.py\nfrom theHarvester.lib.core import AsyncFetcher, Core\n\n\nclass SearchHackerTarget"
  },
  {
    "path": "theHarvester/discovery/haveibeenpwned.py",
    "chars": 2808,
    "preview": "import aiohttp\n\nfrom theHarvester.discovery.constants import MissingKey\nfrom theHarvester.lib.core import AsyncFetcher, "
  },
  {
    "path": "theHarvester/discovery/hudsonrocksearch.py",
    "chars": 15034,
    "preview": "import asyncio\nimport logging\nfrom urllib.parse import urlparse\n\nfrom theHarvester.lib.core import AsyncFetcher\n\n\nclass "
  },
  {
    "path": "theHarvester/discovery/huntersearch.py",
    "chars": 4269,
    "preview": "import asyncio\n\nfrom theHarvester.discovery.constants import MissingKey\nfrom theHarvester.lib.core import AsyncFetcher, "
  },
  {
    "path": "theHarvester/discovery/intelxsearch.py",
    "chars": 3007,
    "preview": "import asyncio\nfrom typing import Any\nfrom urllib.parse import urlparse\n\nimport aiohttp\n\nfrom theHarvester.discovery.con"
  },
  {
    "path": "theHarvester/discovery/leakix.py",
    "chars": 4302,
    "preview": "import json as _stdlib_json\nfrom types import ModuleType\n\nfrom theHarvester.lib.core import AsyncFetcher, Core\n\njson: Mo"
  },
  {
    "path": "theHarvester/discovery/leaklookup.py",
    "chars": 2854,
    "preview": "import aiohttp\n\nfrom theHarvester.discovery.constants import MissingKey\nfrom theHarvester.lib.core import AsyncFetcher, "
  },
  {
    "path": "theHarvester/discovery/mojeek.py",
    "chars": 2704,
    "preview": "from theHarvester.lib.core import AsyncFetcher, Core\nfrom theHarvester.parsers import myparser\n\n\nclass SearchMojeek:\n   "
  },
  {
    "path": "theHarvester/discovery/netlas.py",
    "chars": 2088,
    "preview": "import json\n\nfrom theHarvester.discovery.constants import MissingKey\nfrom theHarvester.lib.core import AsyncFetcher, Cor"
  },
  {
    "path": "theHarvester/discovery/onyphe.py",
    "chars": 4453,
    "preview": "from urllib.parse import urlparse\n\nfrom theHarvester.discovery.constants import MissingKey\nfrom theHarvester.lib.core im"
  },
  {
    "path": "theHarvester/discovery/otxsearch.py",
    "chars": 1968,
    "preview": "import re\nfrom typing import Any\n\nfrom theHarvester.lib.core import AsyncFetcher\n\n\nclass SearchOtx:\n    def __init__(sel"
  },
  {
    "path": "theHarvester/discovery/pentesttools.py",
    "chars": 2985,
    "preview": "import asyncio\n\nimport ujson\n\nfrom theHarvester.discovery.constants import MissingKey\nfrom theHarvester.lib.core import "
  },
  {
    "path": "theHarvester/discovery/projectdiscovery.py",
    "chars": 1007,
    "preview": "from theHarvester.discovery.constants import MissingKey\nfrom theHarvester.lib.core import AsyncFetcher, Core\n\n\nclass Sea"
  },
  {
    "path": "theHarvester/discovery/rapiddns.py",
    "chars": 2099,
    "preview": "from bs4 import BeautifulSoup\nfrom bs4.element import Tag\n\nfrom theHarvester.lib.core import AsyncFetcher, Core\n\n\nclass "
  },
  {
    "path": "theHarvester/discovery/robtex.py",
    "chars": 4609,
    "preview": "import json as _stdlib_json\nfrom types import ModuleType\n\nimport aiohttp\n\nfrom theHarvester.lib.core import AsyncFetcher"
  },
  {
    "path": "theHarvester/discovery/rocketreach.py",
    "chars": 3233,
    "preview": "import asyncio\n\nfrom theHarvester.discovery.constants import MissingKey, get_delay\nfrom theHarvester.lib.core import Asy"
  },
  {
    "path": "theHarvester/discovery/search_dehashed.py",
    "chars": 3980,
    "preview": "import asyncio\nimport random\n\nimport aiohttp\n\nfrom theHarvester.discovery.constants import MissingKey\nfrom theHarvester."
  },
  {
    "path": "theHarvester/discovery/search_dnsdumpster.py",
    "chars": 1772,
    "preview": "#!/usr/bin/env python3\nfrom theHarvester.discovery.constants import MissingKey\nfrom theHarvester.lib.core import AsyncFe"
  },
  {
    "path": "theHarvester/discovery/searchhunterhow.py",
    "chars": 2408,
    "preview": "import base64\nfrom datetime import datetime\n\nfrom dateutil.relativedelta import relativedelta\n\nfrom theHarvester.discove"
  },
  {
    "path": "theHarvester/discovery/securityscorecard.py",
    "chars": 3412,
    "preview": "import aiohttp\n\nfrom theHarvester.discovery.constants import MissingKey\nfrom theHarvester.lib.core import AsyncFetcher, "
  },
  {
    "path": "theHarvester/discovery/securitytrailssearch.py",
    "chars": 4241,
    "preview": "import asyncio\n\nfrom theHarvester.discovery.constants import MissingKey\nfrom theHarvester.lib.core import AsyncFetcher, "
  },
  {
    "path": "theHarvester/discovery/shodansearch.py",
    "chars": 4425,
    "preview": "from collections import OrderedDict\n\nfrom shodan import Shodan, exception\n\nfrom theHarvester.discovery.constants import "
  },
  {
    "path": "theHarvester/discovery/subdomaincenter.py",
    "chars": 925,
    "preview": "from theHarvester.lib.core import AsyncFetcher, Core\n\n\nclass SubdomainCenter:\n    def __init__(self, word):\n        self"
  },
  {
    "path": "theHarvester/discovery/subdomainfinderc99.py",
    "chars": 2438,
    "preview": "import asyncio\n\nimport ujson\nfrom bs4 import BeautifulSoup\nfrom bs4.element import Tag\n\nfrom theHarvester.discovery.cons"
  },
  {
    "path": "theHarvester/discovery/takeover.py",
    "chars": 5496,
    "preview": "import re\nfrom collections import defaultdict\nfrom random import shuffle\n\nimport ujson\n\nfrom theHarvester.lib.core impor"
  },
  {
    "path": "theHarvester/discovery/thc.py",
    "chars": 2467,
    "preview": "import asyncio\n\nimport aiohttp\n\nfrom theHarvester.lib.core import Core\n\n\nclass SearchThc:\n    \"\"\"Class to search for sub"
  },
  {
    "path": "theHarvester/discovery/threatcrowd.py",
    "chars": 3721,
    "preview": "import json as _stdlib_json\nfrom types import ModuleType\n\nfrom theHarvester.lib.core import AsyncFetcher, Core\n\njson: Mo"
  },
  {
    "path": "theHarvester/discovery/tombasearch.py",
    "chars": 4448,
    "preview": "import asyncio\n\nfrom theHarvester.discovery.constants import MissingKey\nfrom theHarvester.lib.core import AsyncFetcher, "
  },
  {
    "path": "theHarvester/discovery/urlscan.py",
    "chars": 1404,
    "preview": "from theHarvester.lib.core import AsyncFetcher\n\n\nclass SearchUrlscan:\n    def __init__(self, word) -> None:\n        self"
  },
  {
    "path": "theHarvester/discovery/venacussearch.py",
    "chars": 3213,
    "preview": "from typing import Any\n\nimport aiohttp\n\nfrom theHarvester.discovery.constants import MissingKey\nfrom theHarvester.lib.co"
  },
  {
    "path": "theHarvester/discovery/virustotal.py",
    "chars": 4538,
    "preview": "import asyncio\n\nfrom theHarvester.discovery.constants import MissingKey\nfrom theHarvester.lib.core import AsyncFetcher, "
  },
  {
    "path": "theHarvester/discovery/waybackarchive.py",
    "chars": 2699,
    "preview": "import re\n\nfrom theHarvester.lib.core import AsyncFetcher, Core\n\n\nclass SearchWaybackarchive:\n    \"\"\"\n    Class uses Int"
  },
  {
    "path": "theHarvester/discovery/whoisxml.py",
    "chars": 1569,
    "preview": "from theHarvester.discovery.constants import MissingKey\nfrom theHarvester.lib.core import AsyncFetcher, Core\n\n\nclass Sea"
  },
  {
    "path": "theHarvester/discovery/windvane.py",
    "chars": 13223,
    "preview": "import json as _stdlib_json\nfrom types import ModuleType\n\nfrom theHarvester.lib.core import AsyncFetcher, Core\n\njson: Mo"
  },
  {
    "path": "theHarvester/discovery/yahoosearch.py",
    "chars": 1650,
    "preview": "from theHarvester.lib.core import AsyncFetcher, Core\nfrom theHarvester.parsers import myparser\n\n\nclass SearchYahoo:\n    "
  },
  {
    "path": "theHarvester/discovery/zoomeyesearch.py",
    "chars": 15807,
    "preview": "import asyncio\nimport math\nimport re\nfrom collections.abc import Iterable\nfrom typing import Any\n\nfrom theHarvester.disc"
  },
  {
    "path": "theHarvester/lib/__init__.py",
    "chars": 26,
    "preview": "__all__ = ['hostchecker']\n"
  },
  {
    "path": "theHarvester/lib/api/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "theHarvester/lib/api/additional_endpoints.py",
    "chars": 2685,
    "preview": "from fastapi import APIRouter, Depends, HTTPException\nfrom pydantic import BaseModel\n\nfrom theHarvester.discovery.additi"
  },
  {
    "path": "theHarvester/lib/api/api.py",
    "chars": 13686,
    "preview": "import argparse\nimport os\nimport traceback\nfrom typing import Any, cast\n\nfrom fastapi import FastAPI, Header, HTTPExcept"
  },
  {
    "path": "theHarvester/lib/api/api_example.py",
    "chars": 4615,
    "preview": "\"\"\"\nExample script to query theHarvester rest API, obtain results, and write out to stdout as well as an html\n\"\"\"\n\nimpor"
  },
  {
    "path": "theHarvester/lib/api/auth.py",
    "chars": 444,
    "preview": "from fastapi import Header\n\n\ndef get_api_key(x_api_key: str | None = Header(None)) -> str:\n    \"\"\"\n    Simple API key au"
  },
  {
    "path": "theHarvester/lib/api/static/.gitkeep",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "theHarvester/lib/core.py",
    "chars": 31198,
    "preview": "from __future__ import annotations\n\nimport asyncio\nimport contextlib\nimport random\nimport ssl\nfrom pathlib import Path\nf"
  },
  {
    "path": "theHarvester/lib/hostchecker.py",
    "chars": 3462,
    "preview": "#!/usr/bin/env python\n\"\"\"\nCreated by laramies on 2008-08-21.\nRevised to use aiodns & asyncio on 2019-09-23\n\"\"\"\n\n# Suppor"
  },
  {
    "path": "theHarvester/lib/output.py",
    "chars": 1166,
    "preview": "from __future__ import annotations\n\nfrom collections.abc import Hashable, Iterable, Sequence\nfrom typing import TypeVar\n"
  },
  {
    "path": "theHarvester/lib/resolvers.txt",
    "chars": 29331,
    "preview": "1.0.0.1\n1.1.1.1\n141.1.27.249\n194.190.225.2 \n194.225.16.5 \n91.185.6.10 \n194.2.0.50 \n66.187.16.5 \n83.222.161.130 \n69.60.16"
  },
  {
    "path": "theHarvester/lib/stash.py",
    "chars": 19249,
    "preview": "import datetime\nimport os\nfrom collections.abc import Iterable\nfrom sqlite3.dbapi2 import Row\n\nimport aiosqlite\n\ndb_path"
  },
  {
    "path": "theHarvester/parsers/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "theHarvester/parsers/intelxparser.py",
    "chars": 1023,
    "preview": "class Parser:\n    def __init__(self) -> None:\n        self.emails: set = set()\n        self.hosts: set = set()\n\n    asyn"
  },
  {
    "path": "theHarvester/parsers/myparser.py",
    "chars": 4285,
    "preview": "import re\nfrom collections.abc import Set\n\n\nclass Parser:\n    def __init__(self, results, word) -> None:\n        self.re"
  },
  {
    "path": "theHarvester/parsers/securitytrailsparser.py",
    "chars": 4460,
    "preview": "import ipaddress\n\n\nclass Parser:\n    def __init__(self, word, text) -> None:\n        self.word = word\n        self.text "
  },
  {
    "path": "theHarvester/parsers/venacusparser.py",
    "chars": 4562,
    "preview": "import enum\nfrom collections.abc import Mapping\nfrom typing import Any\n\n\nclass TokenTypesEnum(enum.StrEnum):\n    ID = 'i"
  },
  {
    "path": "theHarvester/restfulHarvest.py",
    "chars": 1346,
    "preview": "import argparse\nimport os\n\nimport uvicorn\n\n\ndef main():\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\n"
  },
  {
    "path": "theHarvester/screenshot/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "theHarvester/screenshot/screenshot.py",
    "chars": 4700,
    "preview": "\"\"\"\nScreenshot module that utilizes playwright to asynchronously\ntake screenshots\n\"\"\"\n\nimport os\nimport ssl\nimport sys\nf"
  },
  {
    "path": "theHarvester/theHarvester.py",
    "chars": 1035,
    "preview": "import asyncio\nimport sys\n\nfrom theHarvester import __main__\n\n\ndef main():\n    platform = sys.platform\n    if platform ="
  }
]

About this extraction

This page contains the full source code of the laramies/theHarvester GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 127 files (2.1 MB), approximately 551.8k tokens, and a symbol index with 777 extracted functions, classes, methods, constants, and types.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.
