Repository: laramies/theHarvester
Branch: master
Commit: 53e13662409e
Files: 127
Total size: 2.1 MB
Directory structure:
gitextract_7wyx50xx/
├── .dockerignore
├── .git-blame-ignore-revs
├── .gitattributes
├── .github/
│   ├── FUNDING.yml
│   ├── ISSUE_TEMPLATE/
│   │   └── issue-template.md
│   ├── dependabot.yml
│   └── workflows/
│       ├── codeql-analysis.yml
│       ├── docker-build-push.yml
│       ├── dockerci.yml
│       └── theHarvester.yml
├── .gitignore
├── CHANGELOG.md
├── Dockerfile
├── README/
│   ├── CONTRIBUTING.md
│   ├── COPYING
│   └── LICENSES
├── README.md
├── bin/
│   ├── restfulHarvest
│   └── theHarvester
├── docker-compose.yml
├── pyproject.toml
├── tests/
│   ├── __init__.py
│   ├── discovery/
│   │   ├── __init__.py
│   │   ├── test_baidusearch.py
│   │   ├── test_censys.py
│   │   ├── test_certspotter.py
│   │   ├── test_criminalip.py
│   │   ├── test_githubcode.py
│   │   ├── test_githubcode_additions.py
│   │   ├── test_otx.py
│   │   ├── test_rocketreach.py
│   │   ├── test_shodan_engine.py
│   │   └── test_thc.py
│   ├── lib/
│   │   ├── test_core.py
│   │   └── test_output.py
│   ├── test_hackertarget_apikey.py
│   ├── test_mojeek.py
│   ├── test_myparser.py
│   └── test_security.py
└── theHarvester/
    ├── __init__.py
    ├── __main__.py
    ├── data/
    │   ├── proxies.yaml
    │   └── wordlists/
    │       ├── api_endpoints.txt
    │       ├── dns-big.txt
    │       ├── dns-names.txt
    │       ├── dorks.txt
    │       ├── general/
    │       │   └── common.txt
    │       └── names_small.txt
    ├── discovery/
    │   ├── __init__.py
    │   ├── additional_apis.py
    │   ├── api_endpoints.py
    │   ├── baidusearch.py
    │   ├── bevigil.py
    │   ├── bitbucket.py
    │   ├── bravesearch.py
    │   ├── bufferoverun.py
    │   ├── builtwith.py
    │   ├── censysearch.py
    │   ├── certspottersearch.py
    │   ├── chaos.py
    │   ├── commoncrawl.py
    │   ├── constants.py
    │   ├── criminalip.py
    │   ├── crtsh.py
    │   ├── dnssearch.py
    │   ├── duckduckgosearch.py
    │   ├── fofa.py
    │   ├── fullhuntsearch.py
    │   ├── githubcode.py
    │   ├── gitlabsearch.py
    │   ├── hackertarget.py
    │   ├── haveibeenpwned.py
    │   ├── hudsonrocksearch.py
    │   ├── huntersearch.py
    │   ├── intelxsearch.py
    │   ├── leakix.py
    │   ├── leaklookup.py
    │   ├── mojeek.py
    │   ├── netlas.py
    │   ├── onyphe.py
    │   ├── otxsearch.py
    │   ├── pentesttools.py
    │   ├── projectdiscovery.py
    │   ├── rapiddns.py
    │   ├── robtex.py
    │   ├── rocketreach.py
    │   ├── search_dehashed.py
    │   ├── search_dnsdumpster.py
    │   ├── searchhunterhow.py
    │   ├── securityscorecard.py
    │   ├── securitytrailssearch.py
    │   ├── shodansearch.py
    │   ├── subdomaincenter.py
    │   ├── subdomainfinderc99.py
    │   ├── takeover.py
    │   ├── thc.py
    │   ├── threatcrowd.py
    │   ├── tombasearch.py
    │   ├── urlscan.py
    │   ├── venacussearch.py
    │   ├── virustotal.py
    │   ├── waybackarchive.py
    │   ├── whoisxml.py
    │   ├── windvane.py
    │   ├── yahoosearch.py
    │   └── zoomeyesearch.py
    ├── lib/
    │   ├── __init__.py
    │   ├── api/
    │   │   ├── __init__.py
    │   │   ├── additional_endpoints.py
    │   │   ├── api.py
    │   │   ├── api_example.py
    │   │   ├── auth.py
    │   │   └── static/
    │   │       └── .gitkeep
    │   ├── core.py
    │   ├── hostchecker.py
    │   ├── output.py
    │   ├── resolvers.txt
    │   └── stash.py
    ├── parsers/
    │   ├── __init__.py
    │   ├── intelxparser.py
    │   ├── myparser.py
    │   ├── securitytrailsparser.py
    │   └── venacusparser.py
    ├── restfulHarvest.py
    ├── screenshot/
    │   ├── __init__.py
    │   └── screenshot.py
    └── theHarvester.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .dockerignore
================================================
.github/*
.gitattributes
.git-blame-ignore-revs
.idea/
.pytest_cache
.mypy_cache
tests/*
README/
bin/
theHarvester-logo.png
theHarvester-logo.webp
CHANGELOG.md
================================================
FILE: .git-blame-ignore-revs
================================================
# #1492 run `black .` and `isort .`
c13843ec0d513ac7f9c35b7bd0501fa46e356415
================================================
FILE: .gitattributes
================================================
# Set the default behavior, which is to have git automatically determine
# whether a file is a text or binary, unless otherwise specified.
* text=auto
# Basic .gitattributes for a python repo.
# Source files
# ============
*.pxd text diff=python
*.py text diff=python
*.py3 text diff=python
*.pyw text diff=python
*.pyx text diff=python
# Binary files
# ============
*.db binary
*.p binary
*.pkl binary
*.pyc binary
*.pyd binary
*.pyo binary
# Note: .db, .p, and .pkl files are associated with the python modules
# ``pickle``, ``dbm.*``, ``shelve``, ``marshal``, ``anydbm``, & ``bsddb``
# (among others).
================================================
FILE: .github/FUNDING.yml
================================================
# These are supported funding model platforms
github: [L1ghtn1ng, NotoriousRebel]
open_collective: # Replace with a single Open Collective username
ko_fi: #
tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel
community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry
liberapay: # Replace with a single Liberapay username
issuehunt: # Replace with a single IssueHunt username
otechie: # Replace with a single Otechie username
custom: # Replace with up to 4 custom sponsorship URLs e.g., ['link1', 'link2']
================================================
FILE: .github/ISSUE_TEMPLATE/issue-template.md
================================================
---
name: Issue Template
about: A template for new issues.
title: "[Bug|Feature Request|Other] Short Description of Issue"
labels: ''
---
## Note: we do not support installing theHarvester on Android
**Feature Request, Bug, or Other**
Feature Request | Bug | Other
**Describe the feature request, bug, or other request**
A clear and concise description of what the bug, feature request,
or other request is.
**To Reproduce**
Steps to reproduce the behaviour:
1. Run tool like this: '...'
2. See error
**Expected behaviour**
A clear and concise description of what you expected to happen.
**Screenshots**
If possible please add screenshots to help explain your problem.
**System Information (System that tool is running on):**
- OS: [e.g. Windows10]
- Version [e.g. 2.7]
**Additional context**
Add any other context about the problem here.
================================================
FILE: .github/dependabot.yml
================================================
version: 2
updates:
  - package-ecosystem: github-actions
    directory: "/"
    schedule:
      interval: daily
      timezone: Europe/London
  - package-ecosystem: uv
    directory: "/"
    schedule:
      interval: daily
      timezone: Europe/London
    open-pull-requests-limit: 10
    target-branch: master
    allow:
      - dependency-type: direct
      - dependency-type: indirect
================================================
FILE: .github/workflows/codeql-analysis.yml
================================================
# For most projects, this workflow file will not need changing; you simply need
# to commit it to your repository.
#
# You may wish to alter this file to override the set of languages analyzed,
# or to provide custom queries or build logic.
#
# ******** NOTE ********
# We have attempted to detect the languages in your repository. Please check
# the `language` matrix defined below to confirm you have the correct set of
# supported CodeQL languages.
#
name: "CodeQL"
on:
  push:
    branches: [ master, dev ]
  pull_request:
    # The branches below must be a subset of the branches above
    branches: [ master, dev ]
  schedule:
    - cron: '19 11 * * 4'
jobs:
  analyze:
    name: Analyze
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        language: [ 'python' ]
        # CodeQL supports [ 'cpp', 'csharp', 'go', 'java', 'javascript', 'python' ]
        # Learn more:
        # https://docs.github.com/en/free-pro-team@latest/github/finding-security-vulnerabilities-and-errors-in-your-code/configuring-code-scanning#changing-the-languages-that-are-analyzed
    steps:
      - name: Checkout repository
        uses: actions/checkout@v6
      # Initializes the CodeQL tools for scanning.
      - name: Initialize CodeQL
        uses: github/codeql-action/init@v4
        with:
          languages: ${{ matrix.language }}
          # If you wish to specify custom queries, you can do so here or in a config file.
          # By default, queries listed here will override any specified in a config file.
          # Prefix the list here with "+" to use these queries and those in the config file.
          # queries: ./path/to/local/query, your-org/your-repo/queries@main
      # Autobuild attempts to build any compiled languages (C/C++, C#, or Java).
      # If this step fails, then you should remove it and run the build manually (see below)
      - name: Autobuild
        uses: github/codeql-action/autobuild@v4
      # ℹ️ Command-line programs to run using the OS shell.
      # 📚 https://git.io/JvXDl
      # ✏️ If the Autobuild fails above, remove it and uncomment the following three lines
      #    and modify them (or add more) to build your code if your project
      #    uses a compiled language
      #- run: |
      #   make bootstrap
      #   make release
      - name: Perform CodeQL Analysis
        uses: github/codeql-action/analyze@v4
================================================
FILE: .github/workflows/docker-build-push.yml
================================================
name: Build and Push Docker Image
on:
  push:
    branches:
      - master
permissions:
  contents: read
  packages: write
jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v6
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v4
      - name: Log in to GitHub Container Registry
        uses: docker/login-action@v4
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Extract metadata for Docker
        id: meta
        uses: docker/metadata-action@v6
        with:
          images: ghcr.io/${{ github.repository_owner }}/theharvester
          tags: |
            latest
            type=ref,event=branch
            type=sha
      - name: Build and push Docker image
        uses: docker/build-push-action@v7
        with:
          context: .
          file: Dockerfile
          push: true
          platforms: linux/amd64,linux/arm64
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
================================================
FILE: .github/workflows/dockerci.yml
================================================
name: TheHarvester Docker Image CI
on: [push, pull_request]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
      - name: Build the Docker image
        run: docker build --tag theharvester .
      - name: Smoke test
        run: docker run --rm theharvester --help | grep restfulHarvest
================================================
FILE: .github/workflows/theHarvester.yml
================================================
name: TheHarvester Python CI
on:
  push:
    branches:
      - '*'
  pull_request:
    branches:
      - '*'
jobs:
  Python:
    runs-on: ${{ matrix.os }}
    strategy:
      max-parallel: 10
      matrix:
        os: [ ubuntu-latest ]
        python-version: [ '3.12', '3.13', '3.14' ]
    steps:
      - uses: actions/checkout@v6
      - name: Install uv
        uses: astral-sh/setup-uv@v7
        with:
          python-version: ${{ matrix.python-version }}
          enable-cache: true
          cache-dependency-glob: "uv.lock"
      - name: Install dependencies
        run: |
          sudo mkdir -p /usr/local/etc/theHarvester
          sudo cp theHarvester/data/*.yaml /usr/local/etc/theHarvester/
          sudo chown -R runner:runner /usr/local/etc/theHarvester/
          uv sync --all-groups --frozen
          echo "$GITHUB_WORKSPACE/.venv/bin" >> $GITHUB_PATH
      - name: Lint with ruff
        uses: astral-sh/ruff-action@v3
        with:
          args: check --fix
      - name: Format with ruff
        uses: astral-sh/ruff-action@v3
        with:
          args: format
      - name: Commit changes for ruff formatting and linting
        if: github.event_name == 'push'
        run: |
          git config user.name github-actions
          git config user.email github-actions@github.com
          git add .
          git commit -m "Apply ruff fixes and formatting" || true # Use || true to prevent failure if no changes
          git push origin HEAD:${{ github.ref_name }}
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      - name: Test with pytest
        run: |
          pytest tests/**
      - name: Run theHarvester module Baidu
        run: |
          theHarvester -d yale.edu -b baidu
      - name: Run theHarvester module CertSpotter
        run: |
          theHarvester -d yale.edu -b certspotter
      - name: Run theHarvester module Crtsh
        run: |
          theHarvester -d hcl.com -b crtsh
      - name: Run theHarvester module DuckDuckGo
        run: |
          theHarvester -d yale.edu -b duckduckgo
      - name: Run theHarvester module HackerTarget
        run: |
          theHarvester -d yale.edu -b hackertarget
      - name: Run theHarvester module Otx
        run: |
          theHarvester -d yale.edu -b otx
      - name: Run theHarvester module RapidDns
        run: |
          theHarvester -d yale.edu -b rapiddns
      - name: Run theHarvester module Urlscan
        run: |
          theHarvester -d yale.edu -b urlscan
      - name: Run theHarvester module Yahoo
        run: |
          theHarvester -d yale.edu -b yahoo
      - name: Run theHarvester module DNS brute force
        run: |
          theHarvester -d yale.edu -c
================================================
FILE: .gitignore
================================================
*.idea
*.pyc
*.sqlite
*.html
*.htm
*.vscode
*.xml
*.json
debug_results.txt
venv
.mypy_cache
.pytest_cache
build/
dist/
theHarvester.egg-info
api-keys.yaml
.DS_Store
.venv
.venv/**
.pyre
.junie
================================================
FILE: CHANGELOG.md
================================================
# Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [Unreleased]
## [4.10.1] - 2026-02-22
### Changed
- Updated Censys integration to align with current API documentation ([67419190](https://github.com/laramies/theHarvester/commit/67419190)).
- Updated RocketReach integration to align with latest API documentation and tests ([ffc7420d](https://github.com/laramies/theHarvester/commit/ffc7420d)).
- Refactored async file handling in CLI paths: replace blocking path calls with awaited operations and improve path sanitization ([e98bf5bb](https://github.com/laramies/theHarvester/commit/e98bf5bb), [607016a1](https://github.com/laramies/theHarvester/commit/607016a1)).
- Migrated packaging/build configuration to `flit-core` and updated entrypoint/version wiring ([d2cae0be](https://github.com/laramies/theHarvester/commit/d2cae0be)).
- Refactored and standardized output utilities, with new regression tests for output formatting and dedup helpers ([fa2dedd3](https://github.com/laramies/theHarvester/commit/fa2dedd3)).
- Updated dependencies: bump `fastapi`, `playwright`, `ruff`, `ty`, and `uvicorn` ([1dfa6e98](https://github.com/laramies/theHarvester/commit/1dfa6e98), [46865337](https://github.com/laramies/theHarvester/commit/46865337), [c1ac137d](https://github.com/laramies/theHarvester/commit/c1ac137d), [7eaec4da](https://github.com/laramies/theHarvester/commit/7eaec4da)).
- Updated packaging dependency `wheel` to `0.46.3` ([46865337](https://github.com/laramies/theHarvester/commit/46865337)).
### Fixed
- Fixed CriminalIP integration for current API behavior, including safer scan/report handling and hostname normalization (issue #2229) ([06c2fbd9](https://github.com/laramies/theHarvester/commit/06c2fbd9)).
- Fixed Shodan engine processing to return hostnames consistently and avoid worker processing errors (issue #2227) ([419291a3](https://github.com/laramies/theHarvester/commit/419291a3)).
- Fixed Bitbucket search flow so discovery runs successfully ([a1968f71](https://github.com/laramies/theHarvester/commit/a1968f71)).
- Improved module API key error messages for clearer diagnostics ([e1b775e3](https://github.com/laramies/theHarvester/commit/e1b775e3)).
- Improved BuiltWith URL handling logic in CLI processing ([15872350](https://github.com/laramies/theHarvester/commit/15872350)).
## [4.10.0] - 2026-01-18
### Added
- LeakIX API key support and improved request header configuration ([31861c19](https://github.com/laramies/theHarvester/commit/31861c19)).
- Bitbucket API key entry in `theHarvester/data/api-keys.yaml` ([6be673fa](https://github.com/laramies/theHarvester/commit/6be673fa)).
- SOCKS proxy support (issue #469) ([e38bb8fb](https://github.com/laramies/theHarvester/commit/e38bb8fb)).
### Changed
- CI: switch GitHub workflow to `ruff-action` for linting and formatting ([8ddcd1a8](https://github.com/laramies/theHarvester/commit/8ddcd1a8)).
- Dockerfile: add `apt-get update/upgrade` and clean up apt cache layers ([3a5d504b](https://github.com/laramies/theHarvester/commit/3a5d504b)).
- Dependencies updated: bump `aiodns`, `ruff`, `ty`, `filelock`, and `librt` ([40759146](https://github.com/laramies/theHarvester/commit/40759146)).
- Codebase formatting and lint fixes applied (Ruff) ([7c6dec53](https://github.com/laramies/theHarvester/commit/7c6dec53)).
- Tests: expand proxy parameter default structure to include both `http` and `socks5` fields ([bc2fce07](https://github.com/laramies/theHarvester/commit/bc2fce07)).
- `api-keys.yaml` synchronized with `Core` API key references; add consistency test coverage ([ffe1f3a8](https://github.com/laramies/theHarvester/commit/ffe1f3a8)).
### Removed
- `Core.bing_key()` removed ([814c7811](https://github.com/laramies/theHarvester/commit/814c7811)).
### Fixed
- Fix mypy type-checking errors ([0991356b](https://github.com/laramies/theHarvester/commit/0991356b)).
### Security
- Improve input sanitization and add security-focused tests ([3d7489c9](https://github.com/laramies/theHarvester/commit/3d7489c9)).
[Unreleased]: https://github.com/laramies/theHarvester/compare/06520b40...master
[4.10.1]: https://github.com/laramies/theHarvester/compare/4.10.0...06520b40
[4.10.0]: https://github.com/laramies/theHarvester/compare/4.9.2...4.10.0
================================================
FILE: Dockerfile
================================================
FROM python:3.14-slim-trixie
LABEL maintainer="@jay_townsend1 & @NotoriousRebel1"
RUN useradd -m -u 1000 -s /bin/bash theharvester
RUN apt-get update && apt-get upgrade -yqq && apt-get clean && \
rm -rf /var/lib/apt/lists/*
# Set workdir and copy project files
WORKDIR /app
COPY . /app
# Create and sync environment using uv
# Compile bytecode for faster startup and install to system site-packages
RUN --mount=from=ghcr.io/astral-sh/uv,source=/uv,target=/bin/uv \
UV_PROJECT_ENVIRONMENT=/usr/local uv sync --locked --no-dev --no-cache --compile-bytecode
# Use non-root user
USER theharvester
# Expose port 80, which restfulHarvest listens on (see ENTRYPOINT below)
EXPOSE 80
# Run the application as theharvester user
ENTRYPOINT ["restfulHarvest", "-H", "0.0.0.0", "-p", "80"]
================================================
FILE: README/CONTRIBUTING.md
================================================
# Contributing to theHarvester Project
Welcome to theHarvester project, and thank you for your interest in contributing.
A contribution must meet the following requirements to be accepted.
# CI
Make sure all CI checks pass and that your changes do not introduce any alerts from ruff.
# Unit Tests
Every new module requires a unit test for that module. We use pytest.
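
As a rough illustration of what such a test might look like, here is a minimal pytest-style sketch. The helper `extract_hostnames` and both test names are hypothetical and are not part of theHarvester; a real test would import the new discovery module instead of defining the helper inline.

```python
import re


def extract_hostnames(raw_text: str, domain: str) -> set[str]:
    """Hypothetical helper: collect hostnames under `domain` found in raw text."""
    # Match zero or more sub-labels followed by the (escaped) target domain.
    pattern = re.compile(rf"\b(?:[a-z0-9-]+\.)*{re.escape(domain)}", re.IGNORECASE)
    return {match.group(0).lower() for match in pattern.finditer(raw_text)}


def test_extract_hostnames_finds_subdomains() -> None:
    raw_text = "see www.example.com and MAIL.example.com, not example.org"
    assert extract_hostnames(raw_text, "example.com") == {"www.example.com", "mail.example.com"}


def test_extract_hostnames_returns_empty_set_without_matches() -> None:
    assert extract_hostnames("nothing to see here", "example.com") == set()
```

Saved under `tests/`, a file like this would be collected by `pytest` because its functions start with `test_`.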
# Coding Standards
* No single-letter variable names; a variable's name must describe what it holds or does
* Use static typing on functions, etc.
* Make sure mypy reports no errors
* Make sure ruff reports no issues
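
The standards above could be read as asking for code along these lines. This is only a sketch: `deduplicate_hostnames` is a hypothetical function written for illustration, not something taken from the codebase.

```python
def deduplicate_hostnames(hostnames: list[str]) -> list[str]:
    """Return the hostnames lower-cased, de-duplicated, and sorted."""
    # Descriptive names and explicit annotations instead of single letters.
    unique_hostnames: set[str] = {
        hostname.strip().lower() for hostname in hostnames if hostname.strip()
    }
    return sorted(unique_hostnames)
```

The annotated signature is the kind of thing mypy can check, and the comprehension variable is spelled out as `hostname` rather than a single letter.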
# Submitting Bugs
If you find a bug in a module, want to submit an issue for it, and know how to write Python code,
please create a unit test for that bug (if possible) and submit a fix for it, as that would be a big help to the project.
================================================
FILE: README/COPYING
================================================
GNU GENERAL PUBLIC LICENSE
Version 2, June 1991
Copyright (C) 1989, 1991 Free Software Foundation, Inc.
59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
Preamble
The licenses for most software are designed to take away your
freedom to share and change it. By contrast, the GNU General Public
License is intended to guarantee your freedom to share and change free
software--to make sure the software is free for all its users. This
General Public License applies to most of the Free Software
Foundation's software and to any other program whose authors commit to
using it. (Some other Free Software Foundation software is covered by
the GNU Library General Public License instead.) You can apply it to
your programs, too.
When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
this service if you wish), that you receive source code or can get it
if you want it, that you can change the software or use pieces of it
in new free programs; and that you know you can do these things.
To protect your rights, we need to make restrictions that forbid
anyone to deny you these rights or to ask you to surrender the rights.
These restrictions translate to certain responsibilities for you if you
distribute copies of the software, or if you modify it.
For example, if you distribute copies of such a program, whether
gratis or for a fee, you must give the recipients all the rights that
you have. You must make sure that they, too, receive or can get the
source code. And you must show them these terms so they know their
rights.
We protect your rights with two steps: (1) copyright the software, and
(2) offer you this license which gives you legal permission to copy,
distribute and/or modify the software.
Also, for each author's protection and ours, we want to make certain
that everyone understands that there is no warranty for this free
software. If the software is modified by someone else and passed on, we
want its recipients to know that what they have is not the original, so
that any problems introduced by others will not reflect on the original
authors' reputations.
Finally, any free program is threatened constantly by software
patents. We wish to avoid the danger that redistributors of a free
program will individually obtain patent licenses, in effect making the
program proprietary. To prevent this, we have made it clear that any
patent must be licensed for everyone's free use or not licensed at all.
The precise terms and conditions for copying, distribution and
modification follow.
GNU GENERAL PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
0. This License applies to any program or other work which contains
a notice placed by the copyright holder saying it may be distributed
under the terms of this General Public License. The "Program", below,
refers to any such program or work, and a "work based on the Program"
means either the Program or any derivative work under copyright law:
that is to say, a work containing the Program or a portion of it,
either verbatim or with modifications and/or translated into another
language. (Hereinafter, translation is included without limitation in
the term "modification".) Each licensee is addressed as "you".
Activities other than copying, distribution and modification are not
covered by this License; they are outside its scope. The act of
running the Program is not restricted, and the output from the Program
is covered only if its contents constitute a work based on the
Program (independent of having been made by running the Program).
Whether that is true depends on what the Program does.
1. You may copy and distribute verbatim copies of the Program's
source code as you receive it, in any medium, provided that you
conspicuously and appropriately publish on each copy an appropriate
copyright notice and disclaimer of warranty; keep intact all the
notices that refer to this License and to the absence of any warranty;
and give any other recipients of the Program a copy of this License
along with the Program.
You may charge a fee for the physical act of transferring a copy, and
you may at your option offer warranty protection in exchange for a fee.
2. You may modify your copy or copies of the Program or any portion
of it, thus forming a work based on the Program, and copy and
distribute such modifications or work under the terms of Section 1
above, provided that you also meet all of these conditions:
a) You must cause the modified files to carry prominent notices
stating that you changed the files and the date of any change.
b) You must cause any work that you distribute or publish, that in
whole or in part contains or is derived from the Program or any
part thereof, to be licensed as a whole at no charge to all third
parties under the terms of this License.
c) If the modified program normally reads commands interactively
when run, you must cause it, when started running for such
interactive use in the most ordinary way, to print or display an
announcement including an appropriate copyright notice and a
notice that there is no warranty (or else, saying that you provide
a warranty) and that users may redistribute the program under
these conditions, and telling the user how to view a copy of this
License. (Exception: if the Program itself is interactive but
does not normally print such an announcement, your work based on
the Program is not required to print an announcement.)
These requirements apply to the modified work as a whole. If
identifiable sections of that work are not derived from the Program,
and can be reasonably considered independent and separate works in
themselves, then this License, and its terms, do not apply to those
sections when you distribute them as separate works. But when you
distribute the same sections as part of a whole which is a work based
on the Program, the distribution of the whole must be on the terms of
this License, whose permissions for other licensees extend to the
entire whole, and thus to each and every part regardless of who wrote it.
Thus, it is not the intent of this section to claim rights or contest
your rights to work written entirely by you; rather, the intent is to
exercise the right to control the distribution of derivative or
collective works based on the Program.
In addition, mere aggregation of another work not based on the Program
with the Program (or with a work based on the Program) on a volume of
a storage or distribution medium does not bring the other work under
the scope of this License.
3. You may copy and distribute the Program (or a work based on it,
under Section 2) in object code or executable form under the terms of
Sections 1 and 2 above provided that you also do one of the following:
a) Accompany it with the complete corresponding machine-readable
source code, which must be distributed under the terms of Sections
1 and 2 above on a medium customarily used for software interchange; or,
b) Accompany it with a written offer, valid for at least three
years, to give any third party, for a charge no more than your
cost of physically performing source distribution, a complete
machine-readable copy of the corresponding source code, to be
distributed under the terms of Sections 1 and 2 above on a medium
customarily used for software interchange; or,
c) Accompany it with the information you received as to the offer
to distribute corresponding source code. (This alternative is
allowed only for noncommercial distribution and only if you
received the program in object code or executable form with such
an offer, in accord with Subsection b above.)
The source code for a work means the preferred form of the work for
making modifications to it. For an executable work, complete source
code means all the source code for all modules it contains, plus any
associated interface definition files, plus the scripts used to
control compilation and installation of the executable. However, as a
special exception, the source code distributed need not include
anything that is normally distributed (in either source or binary
form) with the major components (compiler, kernel, and so on) of the
operating system on which the executable runs, unless that component
itself accompanies the executable.
If distribution of executable or object code is made by offering
access to copy from a designated place, then offering equivalent
access to copy the source code from the same place counts as
distribution of the source code, even though third parties are not
compelled to copy the source along with the object code.
4. You may not copy, modify, sublicense, or distribute the Program
except as expressly provided under this License. Any attempt
otherwise to copy, modify, sublicense or distribute the Program is
void, and will automatically terminate your rights under this License.
However, parties who have received copies, or rights, from you under
this License will not have their licenses terminated so long as such
parties remain in full compliance.
5. You are not required to accept this License, since you have not
signed it. However, nothing else grants you permission to modify or
distribute the Program or its derivative works. These actions are
prohibited by law if you do not accept this License. Therefore, by
modifying or distributing the Program (or any work based on the
Program), you indicate your acceptance of this License to do so, and
all its terms and conditions for copying, distributing or modifying
the Program or works based on it.
6. Each time you redistribute the Program (or any work based on the
Program), the recipient automatically receives a license from the
original licensor to copy, distribute or modify the Program subject to
these terms and conditions. You may not impose any further
restrictions on the recipients' exercise of the rights granted herein.
You are not responsible for enforcing compliance by third parties to
this License.
7. If, as a consequence of a court judgment or allegation of patent
infringement or for any other reason (not limited to patent issues),
conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot
distribute so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you
may not distribute the Program at all. For example, if a patent
license would not permit royalty-free redistribution of the Program by
all those who receive copies directly or indirectly through you, then
the only way you could satisfy both it and this License would be to
refrain entirely from distribution of the Program.
If any portion of this section is held invalid or unenforceable under
any particular circumstance, the balance of the section is intended to
apply and the section as a whole is intended to apply in other
circumstances.
It is not the purpose of this section to induce you to infringe any
patents or other property right claims or to contest validity of any
such claims; this section has the sole purpose of protecting the
integrity of the free software distribution system, which is
implemented by public license practices. Many people have made
generous contributions to the wide range of software distributed
through that system in reliance on consistent application of that
system; it is up to the author/donor to decide if he or she is willing
to distribute software through any other system and a licensee cannot
impose that choice.
This section is intended to make thoroughly clear what is believed to
be a consequence of the rest of this License.
8. If the distribution and/or use of the Program is restricted in
certain countries either by patents or by copyrighted interfaces, the
original copyright holder who places the Program under this License
may add an explicit geographical distribution limitation excluding
those countries, so that distribution is permitted only in or among
countries not thus excluded. In such case, this License incorporates
the limitation as if written in the body of this License.
9. The Free Software Foundation may publish revised and/or new versions
of the General Public License from time to time. Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.
Each version is given a distinguishing version number. If the Program
specifies a version number of this License which applies to it and "any
later version", you have the option of following the terms and conditions
either of that version or of any later version published by the Free
Software Foundation. If the Program does not specify a version number of
this License, you may choose any version ever published by the Free Software
Foundation.
10. If you wish to incorporate parts of the Program into other free
programs whose distribution conditions are different, write to the author
to ask for permission. For software which is copyrighted by the Free
Software Foundation, write to the Free Software Foundation; we sometimes
make exceptions for this. Our decision will be guided by the two goals
of preserving the free status of all derivatives of our free software and
of promoting the sharing and reuse of software generally.
NO WARRANTY
11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
REPAIR OR CORRECTION.
12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.
END OF TERMS AND CONDITIONS
================================================
FILE: README/LICENSES
================================================
Released under the GPL v 2.0.
If you did not receive a copy of the GPL, try http://www.gnu.org/.
Copyright 2011 Christian Martorella
theHarvester is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation version 2 of the License.
theHarvester is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
================================================
FILE: README.md
================================================

 
About
-----
theHarvester is a simple-to-use yet powerful tool designed for the reconnaissance stage of a red
team assessment or penetration test. It performs open source intelligence (OSINT) gathering to help determine
a domain's external threat landscape. The tool gathers names, emails, IPs, subdomains, and URLs from
multiple public resources, listed under the passive modules below.
Install and dependencies
------------------------
* Python 3.12 or higher.
* https://github.com/laramies/theHarvester/wiki/Installation
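
The Python requirement above can also be checked programmatically. The following is a minimal sketch, not part of theHarvester itself; `meets_minimum` and `MINIMUM` are hypothetical names introduced for illustration. It compares version tuples rather than testing the major and minor numbers separately, which would wrongly reject a release such as 4.0.

```python
import sys

MINIMUM = (3, 12)  # taken from the "Python 3.12 or higher" requirement above


def meets_minimum(version: tuple[int, ...], minimum: tuple[int, int] = MINIMUM) -> bool:
    """Return True when `version` is at least `minimum`.

    Tuples compare element by element, so (4, 0) >= (3, 12) holds,
    while a check like `minor < 12` on its own would reject 4.0.
    """
    return tuple(version[:2]) >= minimum


# sys.version_info behaves like a tuple, so it can be passed directly.
current_is_supported = meets_minimum(sys.version_info)
```

A launcher script could call `meets_minimum(sys.version_info)` and exit with a message when it returns `False`.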
Install uv:
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```
Clone the repository:
```bash
git clone https://github.com/laramies/theHarvester
cd theHarvester
```
Install dependencies and create a virtual environment:
```bash
uv sync
```
Run theHarvester:
```bash
uv run theHarvester
```
## Development
To install development dependencies:
```bash
uv sync --all-groups
```
To run tests:
```bash
uv run pytest
```
To run linting and formatting:
```bash
uv run ruff check
```
```bash
uv run ruff format
```
Passive modules
---------------
* baidu: Baidu search engine (https://www.baidu.com)
* bevigil: CloudSEK BeVigil scans mobile applications for OSINT assets (https://bevigil.com/osint-api)
* brave: Brave search engine - now uses official Brave Search API (https://api-dashboard.search.brave.com)
* bufferoverun: Fast domain name lookups for TLS certificates in IPv4 space (https://tls.bufferover.run)
* builtwith: Find out what websites are built with (https://builtwith.com)
* censys: Uses certificates searches to enumerate subdomains and gather emails (https://censys.io)
* certspotter: Cert Spotter monitors Certificate Transparency logs (https://sslmate.com/certspotter)
* criminalip: Specialized Cyber Threat Intelligence (CTI) search engine (https://www.criminalip.io)
* crtsh: Comodo Certificate search (https://crt.sh)
* dehashed: Take your data security to the next level (https://dehashed.com)
* dnsdumpster: Domain research tool that can discover hosts related to a domain (https://dnsdumpster.com)
* duckduckgo: DuckDuckGo search engine (https://duckduckgo.com)
* fofa: FOFA search engine (https://en.fofa.info)
* fullhunt: Next-generation attack surface security platform (https://fullhunt.io)
* github-code: GitHub code search engine (https://www.github.com)
* hackertarget: Online vulnerability scanners and network intelligence to help organizations (https://hackertarget.com)
* haveibeenpwned: Check if your email address is in a data breach (https://haveibeenpwned.com)
* hunter: Hunter search engine (https://hunter.io)
* hunterhow: Internet search engines for security researchers (https://hunter.how)
* intelx: Intelx search engine (https://intelx.io)
* leakix: LeakIX search engine (https://leakix.net)
* leaklookup: Data breach search engine (https://leak-lookup.com)
* mojeek: Mojeek search engine (https://www.mojeek.com)
* netlas: A Shodan or Censys competitor (https://app.netlas.io)
* onyphe: Cyber defense search engine (https://www.onyphe.io)
* otx: AlienVault open threat exchange (https://otx.alienvault.com)
* pentesttools: Cloud-based toolkit for offensive security testing, focused on web applications and network penetration testing (https://pentest-tools.com)
* projectdiscovery: Actively collects and maintains internet-wide asset data, to enhance research and analyse changes around DNS for better insights (https://chaos.projectdiscovery.io)
* rapiddns: DNS query tool that makes it easy to query subdomains or sites sharing the same IP (https://rapiddns.io)
* rocketreach: Access real-time verified personal/professional emails, phone numbers, and social media links (https://rocketreach.co)
* securityscorecard: helps TPRM and SOC teams detect, prioritize, and remediate vendor risk across their entire supplier ecosystem at scale (https://securityscorecard.com)
* securityTrails: Security Trails search engine, the world's largest repository of historical DNS data (https://securitytrails.com)
* -s, --shodan: Shodan search engine will search for ports and banners from discovered hosts (https://shodan.io)
* subdomaincenter: A subdomain finder tool used to find subdomains of a given domain (https://www.subdomain.center)
* subdomainfinderc99: A subdomain finder is a tool used to find the subdomains of a given domain (https://subdomainfinder.c99.nl)
* thc: Free subdomain enumeration service with no API key required (https://ip.thc.org)
* threatminer: Data mining for threat intelligence (https://www.threatminer.org)
* tomba: Tomba search engine (https://tomba.io)
* urlscan: A sandbox for the web that is a URL and website scanner (https://urlscan.io)
* venacus: Venacus search engine (https://venacus.com)
* virustotal: Domain search (https://www.virustotal.com)
* whoisxml: Subdomain search (https://subdomains.whoisxmlapi.com/api/pricing)
* yahoo: Yahoo search engine (https://www.yahoo.com)
* windvane: Windvane search engine (https://windvane.lichoin.com)
* zoomeye: China's version of Shodan (https://www.zoomeye.org)
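
Whatever the source, a passive module ultimately has to pull the target's emails and hostnames out of the text a service returns. The sketch below illustrates that extraction step with plain regular expressions. It is an assumption-laden illustration, not theHarvester's parser: `extract_emails` and `extract_hostnames` are hypothetical helpers, and the patterns are deliberately simple.

```python
import re


def extract_emails(text: str, domain: str) -> set[str]:
    """Find addresses whose host part is `domain` or one of its subdomains."""
    pattern = re.compile(
        r'[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)*' + re.escape(domain) + r'(?![A-Za-z0-9-])',
        re.IGNORECASE,
    )
    return {match.lower() for match in pattern.findall(text)}


def extract_hostnames(text: str, domain: str) -> set[str]:
    """Find `domain` and its subdomains, skipping the host part of emails."""
    pattern = re.compile(
        # The lookbehind rejects matches that start right after '@' or mid-label.
        r'(?<![@A-Za-z0-9.-])(?:[A-Za-z0-9-]+\.)*' + re.escape(domain) + r'(?![A-Za-z0-9-])',
        re.IGNORECASE,
    )
    return {match.lower() for match in pattern.findall(text)}


sample = (
    'Contact foo@example.com on a.example.com\n'
    'bar@sub.example.com is here and www.example.com appears\n'
    'Visit sub.a.example.com. baz@example.com\n'
)
emails = extract_emails(sample, 'example.com')
hosts = extract_hostnames(sample, 'example.com')
```

Returning sets keeps the results deduplicated, which matters when several sources report the same host.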
Active modules
--------------
* DNS brute force: dictionary brute force enumeration
* Screenshots: Take screenshots of subdomains that were found
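
Dictionary brute forcing can be sketched in a few lines: prepend each word from a wordlist to the target domain and keep the names that resolve. The example below is a simplified, synchronous illustration using only the standard library; it is not the project's DNS module. `candidate_hosts` and `resolve_existing` are hypothetical names, and the one-label-per-line wordlist format is an assumption.

```python
import socket
from collections.abc import Iterable


def candidate_hosts(words: Iterable[str], domain: str) -> list[str]:
    """Build candidate names such as 'www.example.com' from a wordlist."""
    seen: set[str] = set()
    candidates: list[str] = []
    for word in words:
        label = word.strip().lower()
        # Skip blank lines and comment lines (assumed wordlist conventions).
        if not label or label.startswith('#'):
            continue
        host = f'{label}.{domain}'
        if host not in seen:
            seen.add(host)
            candidates.append(host)
    return candidates


def resolve_existing(hosts: Iterable[str]) -> dict[str, str]:
    """Keep only candidates that resolve; maps host to its first IPv4 address."""
    found: dict[str, str] = {}
    for host in hosts:
        try:
            found[host] = socket.gethostbyname(host)
        except OSError:
            # socket.gaierror (name not found) is a subclass of OSError.
            continue
    return found
```

Because this sends one DNS query per candidate, it is an active technique: run it only against domains you are authorised to test.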
Modules that require an API key
-------------------------------
Documentation for setting up API keys can be found at https://github.com/laramies/theHarvester/wiki/Installation#api-keys
* bevigil - 50 free queries/month. 1k queries/month $50
* brave - free plan available. Pro plans for higher limits
* bufferoverun - 100 free queries/month. 10k/month $25
* builtwith - 50 free queries ever. $2950/yr
* censys - 500 credits $100
* criminalip - 100 free queries/month. 700k/month $59
* dehashed - 500 credits $15, 5k credits $150
* dnsdumpster - 50 free queries/day, $49
* fofa - query credits 10,000/month. 100k results/month $25
* fullhunt - 50 free queries. 200 queries $29/month, 500 queries $59
* github-code
* haveibeenpwned - 10 email searches/min $4.50, 50 email searches/min $22
* hunter - 50 free credits/month. 12k credits/yr $34
* hunterhow - 10k free API results per 30 days. 50k API results per 30 days $10
* intelx - free account is very limited. Business account $2900
* leakix - free 25 results pages, 3000 API requests/month. Bounty Hunter $29
* leaklookup - 20 credits $10, 50 credits $20, 140 credits $50, 300 credits $100
* mojeek - 5000 free credits $6.50, $1.30 CPM (Personal), $2.60 CPM (Startup), $3.90 CPM (Business)
* netlas - 50 free requests/day. 1k requests $49, 10k requests $249
* onyphe - 10M results/month $587
* pentesttools - 5 assets netsec $95/month, 5 assets webnetsec $140/month
* projecdiscovery - requires work email. Free monthly discovery and vulnerability scans on sign-up email domain, enterprise $
* rocketreach - 100 email lookups/month $48, 250 email lookups/month $108
* securityscorecard - requires a work email
* securityTrails - 50 free queries/month. 20k queries/month $500
* shodan - Freelancer $69 month, Small Business $359 month
* tomba - 25 free searches/month. 1k searches/month $39, 5k searches/month $89
* venacus - 1 free search/day. 10 searches/day $12, 30 searches/day $36
* virustotal - 500 free lookups/day, 15.5k lookups/month. Business accounts require a work email
* whoisxml - 2k queries $50, 5k queries $105
* windvane - 100 free queries
* zoomeye - 5 free results/day. 30 results/day $190/yr
## Package versions
[](https://repology.org/project/theharvester/versions)
Comments, bugs, and requests
----------------------------
* Christian Martorella [@laramies](https://twitter.com/laramies)
  cmartorella@edge-security.com
* Matthew Brown [@NotoriousRebel1](https://twitter.com/NotoriousRebel1)
* Jay "L1ghtn1ng" Townsend [@jay_townsend1](https://twitter.com/jay_townsend1)
Main contributors
-----------------
* Matthew Brown [@NotoriousRebel1](https://twitter.com/NotoriousRebel1)
* Jay "L1ghtn1ng" Townsend [@jay_townsend1](https://twitter.com/jay_townsend1)
* Lee Baird [@discoverscripts](https://twitter.com/discoverscripts)
Thanks
------
* John Matherly - Shodan project
* Ahmed Aboul Ela - subdomain names dictionaries (big and small)
================================================
FILE: bin/restfulHarvest
================================================
#!/usr/bin/env python3
from theHarvester.restfulHarvest import main

if __name__ == '__main__':
    main()
================================================
FILE: bin/theHarvester
================================================
#!/usr/bin/env python3
# Note: This script runs theHarvester
import sys

from theHarvester.theHarvester import main

# pyproject.toml declares requires-python = ">=3.12"; a tuple comparison also handles future major versions.
if sys.version_info < (3, 12):
    print('[!] Make sure you have Python 3.12+ installed, quitting.\n\n')
    sys.exit(1)

if __name__ == '__main__':
    main()
================================================
FILE: docker-compose.yml
================================================
services:
  theharvester.svc.local:
    container_name: theHarvester
    volumes:
      - ./theHarvester/data/api-keys.yaml:/root/.theHarvester/api-keys.yaml
      - ./theHarvester/data/api-keys.yaml:/etc/theHarvester/api-keys.yaml
      - ./theHarvester/data/proxies.yaml:/etc/theHarvester/proxies.yaml
      - ./theHarvester/data/proxies.yaml:/root/.theHarvester/proxies.yaml
    build: .
    ports:
      - "5000:80"

networks:
  default:
    name: app_theHarvester_network
================================================
FILE: pyproject.toml
================================================
[project]
name = "theHarvester"
description = "theHarvester is a very simple, yet effective tool designed to be used in the early stages of a penetration test"
readme = "README.md"
license = "GPL-2.0-only"
authors = [
{ name = "Christian Martorella", email = "cmartorella@edge-security.com" },
{ name = "Jay Townsend", email = "jay@cybermon.uk" },
{ name = "Matthew Brown", email = "36310667+NotoriousRebel@users.noreply.github.com" },
]
requires-python = ">=3.12"
urls.Homepage = "https://github.com/laramies/theHarvester"
classifiers = [
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.12",
"Programming Language :: Python :: 3.13",
"Programming Language :: Python :: 3.14",
"Operating System :: OS Independent",
]
dynamic = ["version"]
dependencies = [
"aiodns==4.0.0",
"aiofiles==25.1.0",
"aiohttp==3.13.3",
"aiohttp-socks==0.11.0",
"aiomultiprocess==0.9.1",
"aiosqlite==0.22.1",
"beautifulsoup4==4.14.3",
"censys==2.2.19",
"certifi==2026.2.25",
"dnspython==2.8.0",
"fastapi==0.135.1",
"lxml==6.0.2",
"netaddr==1.3.0",
"playwright==1.58.0",
"PyYAML==6.0.3",
"python-dateutil==2.9.0.post0",
"httpx==0.28.1",
"retrying==1.4.2",
"shodan==1.31.0",
"slowapi==0.1.9",
"ujson==5.12.0",
"uvicorn==0.41.0",
"uvloop==0.22.1; platform_system != 'Windows'",
"winloop==0.4.0; platform_system == 'Windows'",
]
[dependency-groups]
dev = [
"mypy==1.19.1",
"mypy-extensions==1.1.0",
"pytest==9.0.2",
"pytest-asyncio==1.3.0",
"types-certifi==2021.10.8.3",
"types-chardet==5.0.4.6",
"types-python-dateutil==2.9.0.20260305",
"types-PyYAML==6.0.12.20250915",
"ruff==0.15.5",
"types-ujson==5.10.0.20250822",
"wheel==0.46.3",
"ty==0.0.21",
]
[project.scripts]
theHarvester = "theHarvester.theHarvester:main"
restfulHarvest = "theHarvester.restfulHarvest:main"
[tool.pytest.ini_options]
minversion = "8.3.3"
asyncio_mode = "auto"
asyncio_default_fixture_loop_scope = "function"
addopts = "--no-header"
testpaths = ["tests"]
[build-system]
requires = ["flit_core >=3.11,<4"]
build-backend = "flit_core.buildapi"
[tool.mypy]
python_version = "3.13"
warn_unused_configs = true
ignore_missing_imports = true
show_traceback = true
show_error_codes = true
namespace_packages = true
check_untyped_defs = true
[tool.uv]
python-preference = "managed"
[tool.uv.pip]
python-version = "3.13"
[tool.ty.src]
respect-ignore-files = false
exclude = [
".venv/**",
"tests/**",
".github/*"
]
[tool.ruff]
# Exclude a variety of commonly ignored directories.
exclude = [
"tests",
".eggs",
".git",
".git-rewrite",
".mypy_cache",
".pyenv",
".pytest_cache",
".pytype",
".ruff_cache",
".github",
".venv",
".vscode",
".idea",
"__pypackages__",
"build",
"dist",
"site-packages",
"venv",
]
line-length = 130
target-version = "py313"
show-fixes = true
[tool.ruff.lint]
select = ["E",
"F",
"N",
"I",
"UP",
"TCH",
"FA",
"RUF",
"PT",
"TC",
"ASYNC"
]
ignore = [
"E501",
"ASYNC230",
"N999",
"PLR0915"
]
# Allow fix for all enabled rules (when `--fix` is provided).
fixable = ["ALL"]
unfixable = []
# Allow unused variables when underscore-prefixed.
dummy-variable-rgx = "^(_+|(_+[a-zA-Z0-9_]*[a-zA-Z0-9]+?))$"
[tool.ruff.format]
# Unlike Black, use single quotes for strings.
quote-style = "single"
indent-style = "space"
# Like Black, respect magic trailing commas.
skip-magic-trailing-comma = false
# Like Black, automatically detect the appropriate line ending.
line-ending = "auto"
================================================
FILE: tests/__init__.py
================================================
================================================
FILE: tests/discovery/__init__.py
================================================
================================================
FILE: tests/discovery/test_baidusearch.py
================================================
import pytest

from theHarvester.discovery import baidusearch


class TestBaiduSearch:
    @pytest.mark.asyncio
    async def test_process_and_parsing(self, monkeypatch):
        called = {}

        async def fake_fetch_all(urls, headers=None, proxy=False):
            called["urls"] = urls
            called["headers"] = headers
            called["proxy"] = proxy
            return [
                "Contact foo@example.com on a.example.com \n",
                " bar@sub.example.com is here and www.example.com appears \n",
                " Visit sub.a.example.com. baz@example.com \n",
            ]

        # Patch the AsyncFetcher.fetch_all to avoid network I/O
        import theHarvester.lib.core as core_module

        monkeypatch.setattr(core_module.AsyncFetcher, "fetch_all", fake_fetch_all)
        # Make user agent deterministic (not strictly necessary, but stable)
        monkeypatch.setattr(core_module.Core, "get_user_agent", staticmethod(lambda: "UA"), raising=True)

        search = baidusearch.SearchBaidu(word="example.com", limit=21)
        await search.process(proxy=True)

        expected_urls = [
            "https://www.baidu.com/s?wd=%40example.com&pn=0&oq=example.com",
            "https://www.baidu.com/s?wd=%40example.com&pn=10&oq=example.com",
            "https://www.baidu.com/s?wd=%40example.com&pn=20&oq=example.com",
        ]
        assert called["urls"] == expected_urls
        assert called["proxy"] is True

        emails = await search.get_emails()
        hosts = await search.get_hostnames()

        # Ensure our expected values are present
        assert "foo@example.com" in emails
        assert "bar@sub.example.com" in emails
        assert "baz@example.com" in emails
        assert {"a.example.com", "www.example.com", "sub.a.example.com"} <= set(hosts)

    @pytest.mark.asyncio
    async def test_pagination_limit_exclusive(self, monkeypatch):
        captured = {}

        async def fake_fetch_all(urls, headers=None, proxy=False):
            captured["urls"] = urls
            return [""] * len(urls)

        import theHarvester.lib.core as core_module

        monkeypatch.setattr(core_module.AsyncFetcher, "fetch_all", fake_fetch_all)
        monkeypatch.setattr(core_module.Core, "get_user_agent", staticmethod(lambda: "UA"), raising=True)

        search = baidusearch.SearchBaidu(word="example.com", limit=20)
        await search.process()

        # For limit=20, range(0, 20, 10) yields 0 and 10 only (20 is excluded)
        assert captured["urls"] == [
            "https://www.baidu.com/s?wd=%40example.com&pn=0&oq=example.com",
            "https://www.baidu.com/s?wd=%40example.com&pn=10&oq=example.com",
        ]
================================================
FILE: tests/discovery/test_censys.py
================================================
import sys
import types

import pytest

if 'aiohttp_socks' not in sys.modules:
    aiohttp_socks_stub = types.ModuleType('aiohttp_socks')

    class _ProxyConnector:
        @staticmethod
        def from_url(*_args, **_kwargs):
            return None

    setattr(aiohttp_socks_stub, 'ProxyConnector', _ProxyConnector)
    sys.modules['aiohttp_socks'] = aiohttp_socks_stub

from theHarvester.discovery import censysearch
from theHarvester.discovery.constants import MissingKey


class _FakeQuery:
    def __init__(self, pages):
        self.pages = pages

    def __iter__(self):
        return iter(self.pages)


@pytest.mark.asyncio
async def test_missing_key_raises(monkeypatch) -> None:
    monkeypatch.setattr(censysearch.Core, 'censys_key', lambda: (None, None))
    with pytest.raises(MissingKey):
        censysearch.SearchCensys('example.com')


@pytest.mark.asyncio
async def test_search_uses_documented_pagination_and_fields(monkeypatch) -> None:
    monkeypatch.setattr(censysearch.Core, 'censys_key', lambda: ('id', 'secret'))
    calls = {}

    class _FakeCensysCerts:
        def __init__(self, api_id, api_secret, user_agent):
            calls['init'] = {'api_id': api_id, 'api_secret': api_secret, 'user_agent': user_agent}

        def search(self, **kwargs):
            calls['search'] = kwargs
            return _FakeQuery(
                [
                    [
                        {'names': ['a.example.com'], 'parsed': {'subject': {'email_address': 'admin@example.com'}}},
                        {'names': ['b.example.com'], 'parsed': {'subject': {'email_address': ['ops@example.com']}}},
                    ],
                    [
                        {'names': ['c.example.com'], 'parsed': {'subject': {'email_address': None}}},
                    ],
                ]
            )

    monkeypatch.setattr(censysearch, 'CensysCerts', _FakeCensysCerts)
    search = censysearch.SearchCensys('example.com', limit=250)
    await search.process()

    assert calls['init']['api_id'] == 'id'
    assert calls['init']['api_secret'] == 'secret'
    assert calls['search']['query'] == 'names: example.com'
    assert calls['search']['per_page'] == 100
    assert calls['search']['pages'] == 3
    assert calls['search']['fields'] == ['names', 'parsed.subject.email_address']
    assert await search.get_hostnames() == {'a.example.com', 'b.example.com', 'c.example.com'}
    assert await search.get_emails() == {'admin@example.com', 'ops@example.com'}


@pytest.mark.asyncio
async def test_search_respects_limit_across_page_data(monkeypatch) -> None:
    monkeypatch.setattr(censysearch.Core, 'censys_key', lambda: ('id', 'secret'))

    class _FakeCensysCerts:
        def __init__(self, api_id, api_secret, user_agent):
            del api_id, api_secret, user_agent

        def search(self, **kwargs):
            del kwargs
            return _FakeQuery(
                [
                    [
                        {'names': ['1.example.com']},
                        {'names': ['2.example.com']},
                        {'names': ['3.example.com']},
                        {'names': ['4.example.com']},
                        {'names': ['5.example.com']},
                    ]
                ]
            )

    monkeypatch.setattr(censysearch, 'CensysCerts', _FakeCensysCerts)
    search = censysearch.SearchCensys('example.com', limit=3)
    await search.process()

    assert await search.get_hostnames() == {'1.example.com', '2.example.com', '3.example.com'}
================================================
FILE: tests/discovery/test_certspotter.py
================================================
#!/usr/bin/env python3
# coding=utf-8
import os
from typing import Optional

import pytest
import httpx

from theHarvester.discovery import certspottersearch
from theHarvester.lib.core import *

github_ci: Optional[str] = os.getenv(
    "GITHUB_ACTIONS"
)  # GitHub sets this to the string "true", not the boolean True


class TestCertspotter(object):
    @staticmethod
    def domain() -> str:
        return "metasploit.com"


@pytest.mark.skipif(github_ci == 'true', reason="Skipping this test for now")
class TestCertspotterSearch(object):
    @pytest.mark.asyncio
    async def test_api(self) -> None:
        base_url = f"https://api.certspotter.com/v1/issuances?domain={TestCertspotter.domain()}&expand=dns_names"
        headers = {"User-Agent": Core.get_user_agent()}
        request = httpx.get(base_url, headers=headers)
        assert request.status_code == 200

    @pytest.mark.asyncio
    async def test_search(self) -> None:
        search = certspottersearch.SearchCertspoter(TestCertspotter.domain())
        await search.process()
        assert isinstance(await search.get_hostnames(), set)


if __name__ == "__main__":
    pytest.main()
================================================
FILE: tests/discovery/test_criminalip.py
================================================
#!/usr/bin/env python3
# coding=utf-8
import pytest

from theHarvester.discovery import criminalip


@pytest.mark.asyncio
async def test_parser_handles_missing_legacy_fields(monkeypatch) -> None:
    monkeypatch.setattr(criminalip.Core, 'criminalip_key', lambda: 'test-key')
    search = criminalip.SearchCriminalIP('example.com')
    payload = {
        'data': {
            'certificates': [{'subject': 'www.example.com'}],
            'connected_domain_subdomain': [{'main_domain': {'domain': 'example.com'}, 'subdomains': [{'domain': 'api.example.com'}]}],
            'connected_ip': [{'ip': '93.184.216.34'}],
            'connected_ip_info': [
                {
                    'asn': '15133',
                    'ip': '93.184.216.34',
                    'domain_list': [{'domain': 'mail.example.com'}],
                }
            ],
            'cookies': [{'domain': '.portal.example.com'}],
            'dns_record': {
                'dns_record_type_a': {'ipv4': [{'ip': '93.184.216.34'}], 'ipv6': []},
                'dns_record_type_ns': ['ns1.example.com.'],
            },
            'html_page_link_domains': [{'domain': 'www.iana.org', 'mapped_ips': [{'ip': '192.0.33.8'}]}],
            'links': [{'url': 'https://docs.example.com/guide'}],
            'mapped_ip': [{'ip': '203.0.113.10'}],
            'network_logs': {
                'data': [{'url': 'https://cdn.example.com/script.js', 'as_number': '64500', 'ip_port': '198.51.100.10:443'}]
            },
            'page_redirections': [[{'url': 'https://login.example.com'}]],
            'subdomains': [{'subdomain_name': 'blog.example.com'}],
        }
    }

    await search.parser(payload)

    hostnames = await search.get_hostnames()
    ips = await search.get_ips()
    asns = await search.get_asns()

    assert {'api.example.com', 'blog.example.com', 'cdn.example.com', 'docs.example.com', 'login.example.com'}.issubset(hostnames)
    assert {'93.184.216.34', '198.51.100.10', '203.0.113.10'}.issubset(ips)
    assert {'15133', '64500'}.issubset(asns)


@pytest.mark.asyncio
async def test_do_search_uses_v2_report_endpoint(monkeypatch) -> None:
    monkeypatch.setattr(criminalip.Core, 'criminalip_key', lambda: 'test-key')
    monkeypatch.setattr(criminalip.Core, 'get_user_agent', lambda: 'test-agent')
    called_urls = []

    async def fake_post_fetch(url, **kwargs):
        assert url == 'https://api.criminalip.io/v1/domain/scan'
        return {'status': 200, 'data': {'scan_id': 12345}}

    async def fake_fetch_all(urls, **kwargs):
        called_urls.append(urls[0])
        if '/v1/domain/status/' in urls[0]:
            return [{'status': 200, 'data': {'scan_percentage': 100}}]
        if '/v2/domain/report/' in urls[0]:
            return [
                {
                    'status': 200,
                    'data': {
                        'certificates': [],
                        'connected_domain_subdomain': [],
                        'connected_ip': [],
                        'connected_ip_info': [],
                        'cookies': [],
                        'dns_record': {},
                        'html_page_link_domains': [],
                        'links': [],
                        'mapped_ip': [],
                        'network_logs': {'data': []},
                        'page_redirections': [],
                        'subdomains': [],
                    },
                }
            ]
        return [{'status': 500}]

    monkeypatch.setattr(criminalip.AsyncFetcher, 'post_fetch', fake_post_fetch)
    monkeypatch.setattr(criminalip.AsyncFetcher, 'fetch_all', fake_fetch_all)

    search = criminalip.SearchCriminalIP('example.com')
    await search.process()

    assert any('/v2/domain/report/12345' in url for url in called_urls)
    assert all('/v1/domain/report/' not in url for url in called_urls)
================================================
FILE: tests/discovery/test_githubcode.py
================================================
from unittest.mock import MagicMock

import pytest
from httpx import Response

from theHarvester.discovery import githubcode
from theHarvester.discovery.constants import MissingKey
from theHarvester.lib.core import Core


class TestSearchGithubCode:
    class OkResponse:
        response = Response(status_code=200)

        # Mocking the json method properly
        def __init__(self):
            self.response = Response(status_code=200)
            object.__setattr__(
                self.response,
                "json",
                MagicMock(
                    return_value={
                        "items": [
                            {"text_matches": [{"fragment": "test1"}]},
                            {"text_matches": [{"fragment": "test2"}]},
                        ]
                    }
                ),
            )

    class FailureResponse:
        def __init__(self):
            self.response = Response(status_code=401)
            object.__setattr__(self.response, "json", MagicMock(return_value={}))

    class RetryResponse:
        def __init__(self):
            self.response = Response(status_code=403)
            object.__setattr__(self.response, "json", MagicMock(return_value={}))

    class MalformedResponse:
        def __init__(self):
            self.response = Response(status_code=200)
            object.__setattr__(
                self.response,
                "json",
                MagicMock(
                    return_value={
                        "items": [
                            {"fail": True},
                            {"text_matches": []},
                            {"text_matches": [{"weird": "result"}]},
                        ]
                    }
                ),
            )

    @pytest.mark.asyncio
    async def test_missing_key(self):
        with pytest.raises(MissingKey):
            Core.github_key = MagicMock(return_value=None)  # type: ignore[method-assign]
            githubcode.SearchGithubCode(word="test", limit=500)

    @pytest.mark.asyncio
    async def test_fragments_from_response(self):
        Core.github_key = MagicMock(return_value="test_key")  # type: ignore[method-assign]
        test_class_instance = githubcode.SearchGithubCode(word="test", limit=500)
        test_result = await test_class_instance.fragments_from_response(
            self.OkResponse().response.json()
        )
        print("test_result: ", test_result)
        assert test_result == ["test1", "test2"]

    @pytest.mark.asyncio
    async def test_invalid_fragments_from_response(self):
        Core.github_key = MagicMock(return_value="test_key")  # type: ignore[method-assign]
        test_class_instance = githubcode.SearchGithubCode(word="test", limit=500)
        test_result = await test_class_instance.fragments_from_response(
            self.MalformedResponse().response.json()
        )
        assert test_result == []

    @pytest.mark.asyncio
    async def test_next_page(self):
        Core.github_key = MagicMock(return_value="test_key")  # type: ignore[method-assign]
        test_class_instance = githubcode.SearchGithubCode(word="test", limit=500)
        test_result = githubcode.SuccessResult(list(), next_page=2, last_page=4)
        assert 2 == await test_class_instance.next_page_or_end(test_result)

    @pytest.mark.asyncio
    async def test_last_page(self):
        Core.github_key = MagicMock(return_value="test_key")  # type: ignore[method-assign]
        test_class_instance = githubcode.SearchGithubCode(word="test", limit=500)
        test_result = githubcode.SuccessResult(list(), 0, 0)
        assert await test_class_instance.next_page_or_end(test_result) == 0

    @pytest.mark.asyncio
    async def test_infinite_loop_fix_page_zero(self):
        """Test that the loop condition properly exits when page becomes 0"""
        Core.github_key = MagicMock(return_value="test_key")  # type: ignore[method-assign]
        test_class_instance = githubcode.SearchGithubCode(word="test", limit=500)
        # Test the fixed condition: page != 0
        page = 0
        counter = 0
        limit = 10
        # The condition should be False when page is 0, preventing infinite loop
        condition_result = counter <= limit and page != 0
        assert condition_result is False, "Loop should exit when page is 0"

    @pytest.mark.asyncio
    async def test_infinite_loop_fix_page_nonzero(self):
        """Test that the loop condition continues when page is non-zero"""
        Core.github_key = MagicMock(return_value="test_key")  # type: ignore[method-assign]
        test_class_instance = githubcode.SearchGithubCode(word="test", limit=500)
        # Test with non-zero page values
        for page in [1, 2, 3, 10]:
            counter = 0
            limit = 10
            # The condition should be True when page is non-zero
            condition_result = counter <= limit and page != 0
            assert condition_result is True, f"Loop should continue when page is {page}"

    @pytest.mark.asyncio
    async def test_infinite_loop_fix_old_vs_new_condition(self):
        """Test that demonstrates the difference between old and new conditions"""
        Core.github_key = MagicMock(return_value="test_key")  # type: ignore[method-assign]
        test_class_instance = githubcode.SearchGithubCode(word="test", limit=500)
        page = 0
        counter = 0
        limit = 10
        # Old problematic condition (would cause infinite loop)
        old_condition = counter <= limit and page is not None
        # New fixed condition (properly exits)
        new_condition = counter <= limit and page != 0
        # Old condition would be True (causing infinite loop)
        assert old_condition is True, "Old condition would cause infinite loop when page=0"
        # New condition is False (properly exits)
        assert new_condition is False, "New condition properly exits when page=0"


if __name__ == "__main__":
    pytest.main()
================================================
FILE: tests/discovery/test_githubcode_additions.py
================================================
from unittest.mock import MagicMock, AsyncMock
import asyncio

import pytest

from theHarvester.discovery import githubcode
from theHarvester.lib.core import Core


class TestSearchGithubCodeProcess:
    @pytest.mark.asyncio
    async def test_process_stops_after_max_retries(self, monkeypatch):
        Core.github_key = MagicMock(return_value="test_key")  # type: ignore[method-assign]
        inst = githubcode.SearchGithubCode(word="test", limit=10)

        # Speed up by avoiding actual sleeps
        monkeypatch.setattr(githubcode, "get_delay", lambda: 0, raising=False)
        monkeypatch.setattr(asyncio, "sleep", AsyncMock(return_value=None))

        # Force RetryResult every time
        monkeypatch.setattr(
            inst,
            "handle_response",
            AsyncMock(return_value=githubcode.RetryResult(0)),
        )
        monkeypatch.setattr(
            inst,
            "do_search",
            AsyncMock(return_value=("", {}, 403, {})),
        )

        inst.max_retries = 2
        await inst.process()

        assert inst.page == 0, "Process should stop after exceeding max retries"
        assert inst.retry_count == 3, "Retry count should exceed max_retries before stopping"

    @pytest.mark.asyncio
    async def test_process_stops_on_error_result(self, monkeypatch):
        Core.github_key = MagicMock(return_value="test_key")  # type: ignore[method-assign]
        inst = githubcode.SearchGithubCode(word="test", limit=10)

        monkeypatch.setattr(githubcode, "get_delay", lambda: 0, raising=False)
        monkeypatch.setattr(asyncio, "sleep", AsyncMock(return_value=None))

        # Force ErrorResult
        monkeypatch.setattr(
            inst,
            "handle_response",
            AsyncMock(return_value=githubcode.ErrorResult(500, "err")),
        )
        monkeypatch.setattr(
            inst,
            "do_search",
            AsyncMock(return_value=("", {}, 500, {})),
        )

        await inst.process()

        assert inst.page == 0, "Process should stop on error result to avoid infinite loop"

    @pytest.mark.asyncio
    async def test_process_breaks_on_same_page_pagination(self, monkeypatch):
        Core.github_key = MagicMock(return_value="test_key")  # type: ignore[method-assign]
        inst = githubcode.SearchGithubCode(word="test", limit=10)

        monkeypatch.setattr(githubcode, "get_delay", lambda: 0, raising=False)
        monkeypatch.setattr(asyncio, "sleep", AsyncMock(return_value=None))

        # Force SuccessResult that does not advance the page
        monkeypatch.setattr(
            inst,
            "handle_response",
            AsyncMock(return_value=githubcode.SuccessResult([], next_page=1, last_page=0)),
        )
        monkeypatch.setattr(
            inst,
            "do_search",
            AsyncMock(return_value=("", {"items": []}, 200, {})),
        )

        await inst.process()

        assert inst.page == 0, "Process should stop when pagination does not advance"
================================================
FILE: tests/discovery/test_otx.py
================================================
#!/usr/bin/env python3
# coding=utf-8
import os
from typing import Optional
import httpx
import pytest
from theHarvester.discovery import otxsearch
from theHarvester.lib.core import *
github_ci: Optional[str] = os.getenv(
"GITHUB_ACTIONS"
)  # GitHub Actions sets this to the string "true", not the Python boolean True
class TestOtx(object):
@staticmethod
def domain() -> str:
return "apple.com"
@pytest.mark.asyncio
async def test_search(self) -> None:
search = otxsearch.SearchOtx(TestOtx.domain())
try:
await search.process()
except (httpx.TimeoutException, httpx.RequestError):
pytest.skip("Skipping OTX search due to network error")
assert isinstance(await search.get_hostnames(), set)
assert isinstance(await search.get_ips(), set)
if __name__ == "__main__":
pytest.main()
================================================
FILE: tests/discovery/test_rocketreach.py
================================================
import sys
import types
import pytest
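# Register a stub aiohttp_socks module if none has been imported yet,
# so importing rocketreach below does not require the real dependency.
if 'aiohttp_socks' not in sys.modules: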
aiohttp_socks_stub = types.ModuleType('aiohttp_socks')
class _ProxyConnector:
@staticmethod
def from_url(*_args, **_kwargs):
return None
setattr(aiohttp_socks_stub, 'ProxyConnector', _ProxyConnector)
sys.modules['aiohttp_socks'] = aiohttp_socks_stub
from theHarvester.discovery import rocketreach
from theHarvester.discovery.constants import MissingKey
@pytest.mark.asyncio
async def test_missing_key_raises(monkeypatch) -> None:
monkeypatch.setattr(rocketreach.Core, 'rocketreach_key', lambda: None)
with pytest.raises(MissingKey):
rocketreach.SearchRocketReach('example.com', 10)
@pytest.mark.asyncio
async def test_do_search_uses_people_data_endpoint_and_start_pagination(monkeypatch) -> None:
monkeypatch.setattr(rocketreach.Core, 'rocketreach_key', lambda: 'test-key')
monkeypatch.setattr(rocketreach.Core, 'get_user_agent', lambda: 'test-agent')
monkeypatch.setattr(rocketreach, 'get_delay', lambda: 0)
async def fake_sleep(_seconds):
return None
monkeypatch.setattr(rocketreach.asyncio, 'sleep', fake_sleep)
calls = []
async def fake_post_fetch(url, headers=None, data=None, json=False, **kwargs):
calls.append((url, headers, data, json, kwargs))
if len(calls) == 1:
first_page_profiles = []
for index in range(100):
first_page_profiles.append(
{
'linkedin_url': f'https://linkedin.com/in/user{index}',
'emails': [{'email': f'user{index}@example.com'}],
}
)
return {
'profiles': first_page_profiles,
'pagination': {'page': 1, 'total': 150},
}
second_page_profiles = []
for index in range(100, 150):
second_page_profiles.append(
{
'linkedin_url': f'https://linkedin.com/in/user{index}',
'emails': [{'email': f'user{index}@example.com'}],
}
)
return {
'profiles': second_page_profiles,
'pagination': {'page': 2, 'total': 150},
}
monkeypatch.setattr(rocketreach.AsyncFetcher, 'post_fetch', fake_post_fetch)
search = rocketreach.SearchRocketReach('example.com', 150)
await search.process()
assert len(calls) == 2
first_url, first_headers, first_data, first_json, _ = calls[0]
second_url, _, second_data, _, _ = calls[1]
assert first_url == 'https://api.rocketreach.co/api/v2/person/search'
assert second_url == 'https://api.rocketreach.co/api/v2/person/search'
assert first_headers['Api-Key'] == 'test-key'
assert first_headers['User-Agent'] == 'test-agent'
assert first_json is True
assert first_data == {'query': {'current_employer_domain': ['example.com']}, 'start': 0, 'page_size': 100}
assert second_data == {'query': {'current_employer_domain': ['example.com']}, 'start': 100, 'page_size': 50}
links = await search.get_links()
emails = await search.get_emails()
assert len(links) == 150
assert len(emails) == 150
assert 'https://linkedin.com/in/user0' in links
assert 'https://linkedin.com/in/user149' in links
assert 'user0@example.com' in emails
assert 'user149@example.com' in emails
@pytest.mark.asyncio
async def test_do_search_stops_on_throttling_message(monkeypatch) -> None:
monkeypatch.setattr(rocketreach.Core, 'rocketreach_key', lambda: 'test-key')
monkeypatch.setattr(rocketreach.Core, 'get_user_agent', lambda: 'test-agent')
monkeypatch.setattr(rocketreach, 'get_delay', lambda: 0)
async def fake_sleep(_seconds):
return None
monkeypatch.setattr(rocketreach.asyncio, 'sleep', fake_sleep)
calls = []
async def fake_post_fetch(url, headers=None, data=None, json=False, **kwargs):
calls.append((url, data))
return {'detail': 'Request was throttled. Credits will become available in 10 seconds.'}
monkeypatch.setattr(rocketreach.AsyncFetcher, 'post_fetch', fake_post_fetch)
search = rocketreach.SearchRocketReach('example.com', 10)
await search.process()
assert len(calls) == 1
================================================
FILE: tests/discovery/test_shodan_engine.py
================================================
import socket
import sys
from collections import OrderedDict
import pytest
class TestShodanEngine:
@pytest.mark.asyncio
async def test_shodan_engine_processes_without_work_item_error_and_yields_hostnames(self, monkeypatch, capsys):
# Import inside the test so monkeypatching affects the already-imported module namespace.
import theHarvester.__main__ as main_module
# Make DNS resolution deterministic and offline.
monkeypatch.setattr(socket, "gethostbyname", lambda _domain: "1.2.3.4", raising=True)
# Avoid filesystem/sqlite side effects.
class DummyStashManager:
async def do_init(self) -> None:
return None
async def store_all(self, domain, all, res_type, source) -> None: # noqa: A002
return None
monkeypatch.setattr(main_module.stash, "StashManager", DummyStashManager, raising=True)
# Stub Shodan search to avoid network and API key requirements.
class DummySearchShodan:
async def search_ip(self, ip):
return OrderedDict({ip: {"hostnames": ["a.example.com", "b.example.com"]}})
monkeypatch.setattr(main_module.shodansearch, "SearchShodan", DummySearchShodan, raising=True)
# Run the CLI path that uses the engine queue/worker (`-b shodan`).
monkeypatch.setattr(sys, "argv", ["theHarvester", "-d", "example.com", "-b", "shodan"], raising=True)
with pytest.raises(SystemExit) as excinfo:
await main_module.start()
assert excinfo.value.code == 0
out = capsys.readouterr().out
assert 'A error occurred while processing a "work item"' not in out
assert "a.example.com" in out
assert "b.example.com" in out
================================================
FILE: tests/discovery/test_thc.py
================================================
#!/usr/bin/env python3
# coding=utf-8
"""
Tests for THC (ip.thc.org) discovery module.
THC provides multiple endpoints:
- Subdomain enumeration
- CNAME lookup
- Reverse DNS lookup
API Documentation: https://ip.thc.org/docs/
"""
import os
from typing import Optional
import httpx
import pytest
from theHarvester.discovery import thc
from theHarvester.lib.core import Core
github_ci: Optional[str] = os.getenv('GITHUB_ACTIONS')
# =============================================================================
# 1. Direct API Tests (Endpoint Validation)
# =============================================================================
class TestThcApi:
"""Tests to validate that the THC API responds correctly."""
@pytest.mark.asyncio
async def test_api_subdomains_download_endpoint_responds(self) -> None:
"""Verify that the subdomain download endpoint responds."""
url = 'https://ip.thc.org/api/v1/subdomains/download?domain=google.com&limit=10&hide_header=true'
headers = {'User-Agent': Core.get_user_agent()}
try:
response = httpx.get(url, headers=headers, timeout=30)
assert response.status_code == 200
except (httpx.TimeoutException, httpx.RequestError):
pytest.skip('Skipping due to network error')
@pytest.mark.asyncio
async def test_api_subdomains_returns_text_format(self) -> None:
"""Verify that the response is plain text."""
url = 'https://ip.thc.org/api/v1/subdomains/download?domain=google.com&limit=5&hide_header=true'
headers = {'User-Agent': Core.get_user_agent()}
try:
response = httpx.get(url, headers=headers, timeout=30)
content_type = response.headers.get('content-type', '')
assert 'text' in content_type or 'octet-stream' in content_type or response.status_code == 200
except (httpx.TimeoutException, httpx.RequestError):
pytest.skip('Skipping due to network error')
@pytest.mark.asyncio
async def test_api_cli_subdomain_endpoint(self) -> None:
"""Verify CLI endpoint /sb/{domain}."""
url = 'https://ip.thc.org/sb/google.com?l=5&noheader'
headers = {'User-Agent': Core.get_user_agent()}
try:
response = httpx.get(url, headers=headers, timeout=30)
assert response.status_code == 200
except (httpx.TimeoutException, httpx.RequestError):
pytest.skip('Skipping due to network error')
@pytest.mark.asyncio
async def test_api_returns_rate_limit_headers(self) -> None:
"""Verify that the API returns rate limit headers."""
url = 'https://ip.thc.org/api/v1/subdomains/download?domain=example.com&limit=1&hide_header=true'
headers = {'User-Agent': Core.get_user_agent()}
try:
response = httpx.get(url, headers=headers, timeout=30)
assert 'x-ratelimit-limit' in response.headers
assert 'x-ratelimit-remaining' in response.headers
except (httpx.TimeoutException, httpx.RequestError):
pytest.skip('Skipping due to network error')
# =============================================================================
# 2. Subdomain Search Tests (Main Functionality)
# =============================================================================
class TestThcSubdomainSearch:
"""Tests for subdomain search functionality."""
@staticmethod
def domain() -> str:
return 'tesla.com'
@staticmethod
def small_domain() -> str:
return 'thc.org'
@pytest.mark.asyncio
async def test_search_returns_set(self) -> None:
"""Verify that get_hostnames() returns a set."""
search = thc.SearchThc(self.domain())
try:
await search.process()
except (httpx.TimeoutException, httpx.RequestError):
pytest.skip('Skipping due to network error')
result = await search.get_hostnames()
assert isinstance(result, set)
@pytest.mark.asyncio
async def test_search_finds_subdomains(self) -> None:
"""Verify that it finds subdomains for a known domain."""
search = thc.SearchThc(self.domain())
try:
await search.process()
except (httpx.TimeoutException, httpx.RequestError):
pytest.skip('Skipping due to network error')
result = await search.get_hostnames()
assert len(result) > 0, 'Should find at least one subdomain for tesla.com'
@pytest.mark.asyncio
async def test_search_results_contain_target_domain(self) -> None:
"""Verify that all results contain the target domain."""
search = thc.SearchThc(self.small_domain())
try:
await search.process()
except (httpx.TimeoutException, httpx.RequestError):
pytest.skip('Skipping due to network error')
result = await search.get_hostnames()
for hostname in result:
assert self.small_domain() in hostname, f'{hostname} should contain {self.small_domain()}'
@pytest.mark.asyncio
async def test_search_no_duplicates(self) -> None:
"""Verify that there are no duplicates in the results."""
search = thc.SearchThc(self.domain())
try:
await search.process()
except (httpx.TimeoutException, httpx.RequestError):
pytest.skip('Skipping due to network error')
result = await search.get_hostnames()
result_list = list(result)
assert len(result_list) == len(set(result_list))
# =============================================================================
# 3. Edge Case Tests
# =============================================================================
class TestThcEdgeCases:
"""Tests for edge cases and error handling."""
@pytest.mark.asyncio
async def test_search_nonexistent_domain(self) -> None:
"""Verify behavior with non-existent domain."""
search = thc.SearchThc('this-domain-definitely-does-not-exist-12345.com')
try:
await search.process()
except (httpx.TimeoutException, httpx.RequestError):
pytest.skip('Skipping due to network error')
except Exception:
pass
result = await search.get_hostnames()
assert isinstance(result, set)
@pytest.mark.asyncio
async def test_search_empty_domain(self) -> None:
"""Verify behavior with empty domain."""
search = thc.SearchThc('')
try:
await search.process()
except (httpx.TimeoutException, httpx.RequestError):
pytest.skip('Skipping due to network error')
except Exception:
pass
result = await search.get_hostnames()
assert isinstance(result, set)
@pytest.mark.asyncio
async def test_search_special_characters_domain(self) -> None:
"""Verify behavior with special characters."""
search = thc.SearchThc('example.com; DROP TABLE domains;--')
try:
await search.process()
except (httpx.TimeoutException, httpx.RequestError):
pytest.skip('Skipping due to network error')
except Exception:
pass
result = await search.get_hostnames()
assert isinstance(result, set)
@pytest.mark.asyncio
async def test_search_unicode_domain(self) -> None:
"""Verify behavior with IDN/unicode domain."""
search = thc.SearchThc('xn--mnchen-3ya.de')
try:
await search.process()
except (httpx.TimeoutException, httpx.RequestError):
pytest.skip('Skipping due to network error')
except Exception:
pass
result = await search.get_hostnames()
assert isinstance(result, set)
@pytest.mark.asyncio
async def test_search_subdomain_as_input(self) -> None:
"""Verify behavior when a subdomain is passed as input."""
search = thc.SearchThc('www.google.com')
try:
await search.process()
except (httpx.TimeoutException, httpx.RequestError):
pytest.skip('Skipping due to network error')
result = await search.get_hostnames()
assert isinstance(result, set)
# =============================================================================
# 4. Proxy Tests
# =============================================================================
class TestThcProxy:
"""Tests for proxy functionality."""
@staticmethod
def domain() -> str:
return 'example.com'
@pytest.mark.asyncio
async def test_process_accepts_proxy_parameter(self) -> None:
"""Verify that process() accepts proxy parameter."""
search = thc.SearchThc(self.domain())
try:
await search.process(proxy=False)
except (httpx.TimeoutException, httpx.RequestError):
pytest.skip('Skipping due to network error')
result = await search.get_hostnames()
assert isinstance(result, set)
@pytest.mark.asyncio
async def test_proxy_attribute_is_set(self) -> None:
"""Verify that the proxy attribute is set correctly."""
search = thc.SearchThc(self.domain())
assert search.proxy is False
# =============================================================================
# 5. Initialization and Attributes Tests
# =============================================================================
class TestThcInitialization:
"""Tests for class initialization and structure."""
def test_init_sets_word(self) -> None:
"""Verify that __init__ sets the domain."""
domain = 'test.com'
search = thc.SearchThc(domain)
assert search.word == domain
def test_init_creates_empty_results(self) -> None:
"""Verify that results is initialized empty."""
search = thc.SearchThc('test.com')
assert hasattr(search, 'results')
assert len(search.results) == 0
def test_init_proxy_default_false(self) -> None:
"""Verify that proxy is False by default."""
search = thc.SearchThc('test.com')
assert search.proxy is False
def test_init_has_rate_limit_settings(self) -> None:
"""Verify that rate limit settings are initialized."""
search = thc.SearchThc('test.com')
assert hasattr(search, 'max_retries')
assert hasattr(search, 'base_delay')
assert search.max_retries == 3
assert search.base_delay == 2
def test_class_has_required_methods(self) -> None:
"""Verify that the class has the required methods."""
search = thc.SearchThc('test.com')
assert hasattr(search, 'do_search')
assert hasattr(search, 'get_hostnames')
assert hasattr(search, 'process')
assert callable(search.do_search)
assert callable(search.get_hostnames)
assert callable(search.process)
# =============================================================================
# 6. Response Format Tests
# =============================================================================
class TestThcResponseFormat:
"""Tests to verify response format."""
@staticmethod
def domain() -> str:
return 'github.com'
@pytest.mark.asyncio
async def test_hostnames_are_strings(self) -> None:
"""Verify that all hostnames are strings."""
search = thc.SearchThc(self.domain())
try:
await search.process()
except (httpx.TimeoutException, httpx.RequestError):
pytest.skip('Skipping due to network error')
result = await search.get_hostnames()
for hostname in result:
assert isinstance(hostname, str)
@pytest.mark.asyncio
async def test_hostnames_are_valid_format(self) -> None:
"""Verify that hostnames have valid format."""
search = thc.SearchThc(self.domain())
try:
await search.process()
except (httpx.TimeoutException, httpx.RequestError):
pytest.skip('Skipping due to network error')
result = await search.get_hostnames()
for hostname in result:
assert ' ' not in hostname
assert '\n' not in hostname
assert '\t' not in hostname
@pytest.mark.asyncio
async def test_hostnames_are_lowercase(self) -> None:
"""Verify that hostnames are lowercase."""
search = thc.SearchThc(self.domain())
try:
await search.process()
except (httpx.TimeoutException, httpx.RequestError):
pytest.skip('Skipping due to network error')
result = await search.get_hostnames()
for hostname in result:
assert hostname == hostname.lower()
# =============================================================================
# 7. Integration Tests with theHarvester
# =============================================================================
@pytest.mark.skipif(github_ci == 'true', reason='Skip integration tests in CI')
class TestThcIntegration:
"""Integration tests with theHarvester framework."""
@pytest.mark.asyncio
async def test_module_can_be_imported(self) -> None:
"""Verify that the module can be imported."""
from theHarvester.discovery import thc as thc_module
assert thc_module is not None
@pytest.mark.asyncio
async def test_search_class_exists(self) -> None:
"""Verify that SearchThc class exists."""
from theHarvester.discovery import thc as thc_module
assert hasattr(thc_module, 'SearchThc')
@pytest.mark.asyncio
async def test_compatible_with_store_function(self) -> None:
"""Verify compatibility with store function from __main__.py."""
search = thc.SearchThc('example.com')
assert hasattr(search, 'process')
assert hasattr(search, 'get_hostnames')
if __name__ == '__main__':
pytest.main()
================================================
FILE: tests/lib/test_core.py
================================================
from __future__ import annotations
from pathlib import Path
from typing import Any
from unittest import mock
import pytest
import yaml
import theHarvester.lib.core as core_module
from theHarvester.lib.core import CONFIG_DIRS, DATA_DIR, AsyncFetcher, Core
@pytest.fixture(autouse=True)
def mock_environ(monkeypatch, tmp_path: Path):
monkeypatch.setenv("HOME", str(tmp_path))
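# Build a Path.read_text replacement: paths listed in `mocked` return their
# string (or raise their exception); all other paths use the real read_text.
def mock_read_text(mocked: dict[Path, str | Exception]):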
read_text = Path.read_text
def _read_text(self: Path, *args, **kwargs):
if result := mocked.get(self):
if isinstance(result, Exception):
raise result
return result
return read_text(self, *args, **kwargs)
return _read_text
@pytest.mark.parametrize(
("name", "contents", "expected"),
[
("api-keys", "apikeys: {}", {}),
("proxies", "http: [localhost:8080]", {"http": ["http://localhost:8080"], "socks5": []}),
],
)
@pytest.mark.parametrize("dir", CONFIG_DIRS)
def test_read_config_searches_config_dirs(
name: str, contents: str, expected: Any, dir: Path, capsys
):
file = dir.expanduser() / f"{name}.yaml"
config_files = [d.expanduser() / file.name for d in CONFIG_DIRS]
side_effect = mock_read_text(
{f: contents if f == file else FileNotFoundError() for f in config_files}
)
with mock.patch("pathlib.Path.read_text", autospec=True, side_effect=side_effect):
got = Core.api_keys() if name == "api-keys" else Core.proxy_list()
assert got == expected
assert f"Read {file.name} from {file}" in capsys.readouterr().out
@pytest.mark.parametrize("name", ("api-keys", "proxies"))
def test_read_config_copies_default_to_home(name: str, capsys):
file = Path(f"~/.theHarvester/{name}.yaml").expanduser()
config_files = [d.expanduser() / file.name for d in CONFIG_DIRS]
side_effect = mock_read_text({f: FileNotFoundError() for f in config_files})
with mock.patch("pathlib.Path.read_text", autospec=True, side_effect=side_effect):
got = Core.api_keys() if name == "api-keys" else Core.proxy_list()
default = yaml.safe_load((DATA_DIR / file.name).read_text())
expected = (
default["apikeys"]
if name == "api-keys"
else {
"http": [f"http://{h}" for h in default["http"]] if default.get("http") else [],
"socks5": [f"socks5://{h}" for h in default["socks5"]] if default.get("socks5") else [],
}
)
assert got == expected
assert f"Created default {file.name} at {file}" in capsys.readouterr().out
assert file.exists()
class DummyResponse:
def __init__(self, text_value: str = 'response-text', json_value: Any = None):
self.text_value = text_value
self.json_value = {'ok': True} if json_value is None else json_value
async def __aenter__(self):
return self
async def __aexit__(self, exc_type, exc, tb):
return False
async def text(self):
return self.text_value
async def json(self):
return self.json_value
class DummySession:
instances: list['DummySession'] = []
def __init__(self, *, headers=None, timeout=None, connector=None):
self.headers = headers
self.timeout = timeout
self.connector = connector
self.closed = False
self.requests: list[tuple[str, str, dict[str, Any]]] = []
DummySession.instances.append(self)
async def __aenter__(self):
return self
async def __aexit__(self, exc_type, exc, tb):
await self.close()
return False
def request(self, method: str, url: str, **kwargs):
self.requests.append((method, url, kwargs))
return DummyResponse()
def get(self, url: str, **kwargs):
self.requests.append(('GET', url, kwargs))
return DummyResponse()
def post(self, url: str, **kwargs):
self.requests.append(('POST', url, kwargs))
return DummyResponse(json_value={'posted': True})
async def close(self):
self.closed = True
def reset_dummy_sessions() -> None:
DummySession.instances.clear()
async def fake_sleep(_seconds: float) -> None:
return None
def test_api_keys_yaml_is_in_sync_with_core_accessors():
required = core_module.Core._API_KEY_FIELDS
assert required, "No API-key references were detected in `Core`"
config = yaml.safe_load((DATA_DIR / "api-keys.yaml").read_text(encoding="utf-8"))
apikeys = config["apikeys"]
missing_providers = sorted(set(required) - set(apikeys))
assert not missing_providers, f"Missing providers in api-keys.yaml: {missing_providers}"
missing_fields: dict[str, list[str]] = {}
for provider, fields in required.items():
for field in sorted(fields):
if field not in apikeys[provider]:
missing_fields.setdefault(provider, []).append(field)
assert not missing_fields, f"Missing fields in api-keys.yaml: {missing_fields}"
@pytest.mark.parametrize(
("accessor_name", "expected"),
[
("bevigil_key", "bevigil-key"),
("censys_key", ("censys-id", "censys-secret")),
("fofa_key", ("fofa-key", "fofa-email")),
("tomba_key", ("tomba-key", "tomba-secret")),
],
)
def test_api_key_accessors_delegate_to_shared_mapping(monkeypatch, accessor_name: str, expected: Any):
monkeypatch.setattr(
Core,
'api_keys',
staticmethod(
lambda: {
'bevigil': {'key': 'bevigil-key'},
'censys': {'id': 'censys-id', 'secret': 'censys-secret'},
'fofa': {'key': 'fofa-key', 'email': 'fofa-email'},
'tomba': {'key': 'tomba-key', 'secret': 'tomba-secret'},
}
),
)
accessor = getattr(Core, accessor_name)
assert accessor() == expected
@pytest.mark.asyncio
async def test_fetch_creates_session_with_default_headers(monkeypatch) -> None:
reset_dummy_sessions()
monkeypatch.setattr(core_module.aiohttp, 'ClientSession', DummySession)
monkeypatch.setattr(core_module.ssl, 'create_default_context', lambda cafile=None: 'ssl-context')
monkeypatch.setattr(core_module.certifi, 'where', lambda: '/tmp/cacert.pem')
monkeypatch.setattr(core_module.asyncio, 'sleep', fake_sleep)
monkeypatch.setattr(Core, 'get_user_agent', staticmethod(lambda: 'test-agent'))
result = await AsyncFetcher.fetch(url='https://example.com', follow_redirects=False)
assert result == 'response-text'
assert len(DummySession.instances) == 1
session = DummySession.instances[0]
assert session.headers == {'User-Agent': 'test-agent'}
assert session.closed is True
assert session.requests == [
('GET', 'https://example.com', {'ssl': 'ssl-context', 'allow_redirects': False})
]
@pytest.mark.asyncio
async def test_fetch_uses_http_proxy_when_enabled(monkeypatch) -> None:
reset_dummy_sessions()
monkeypatch.setattr(core_module.aiohttp, 'ClientSession', DummySession)
monkeypatch.setattr(core_module.ssl, 'create_default_context', lambda cafile=None: 'ssl-context')
monkeypatch.setattr(core_module.certifi, 'where', lambda: '/tmp/cacert.pem')
monkeypatch.setattr(core_module.asyncio, 'sleep', fake_sleep)
monkeypatch.setattr(AsyncFetcher, '_get_random_proxy', staticmethod(lambda proxy_dict: ('http://proxy.local:8080', 'http')))
async def fake_create_connector(proxy_url, proxy_type, ssl_context=None):
return 'connector'
monkeypatch.setattr(AsyncFetcher, '_create_connector', fake_create_connector)
result = await AsyncFetcher.fetch(url='https://example.com', proxy=True)
assert result == 'response-text'
session = DummySession.instances[0]
assert session.connector == 'connector'
assert session.requests == [
('GET', 'https://example.com', {'ssl': 'ssl-context', 'proxy': 'http://proxy.local:8080'})
]
@pytest.mark.asyncio
async def test_post_fetch_decodes_string_payload_and_posts_params(monkeypatch) -> None:
reset_dummy_sessions()
monkeypatch.setattr(core_module.aiohttp, 'ClientSession', DummySession)
monkeypatch.setattr(core_module.asyncio, 'sleep', fake_sleep)
monkeypatch.setattr(core_module.ssl, 'create_default_context', lambda cafile=None: 'ssl-context')
monkeypatch.setattr(core_module.certifi, 'where', lambda: '/tmp/cacert.pem')
monkeypatch.setattr(Core, 'get_user_agent', staticmethod(lambda: 'test-agent'))
result = await AsyncFetcher.post_fetch(
'https://example.com/api',
data='{"query": "example"}',
params={'page': 2},
json=True,
)
assert result == {'ok': True}
session = DummySession.instances[0]
assert session.headers == {'User-Agent': 'test-agent'}
assert session.requests == [
('POST', 'https://example.com/api', {'data': {'query': 'example'}, 'ssl': 'ssl-context', 'params': {'page': 2}})
]
@pytest.mark.asyncio
async def test_post_fetch_proxy_branch_uses_get_with_http_proxy(monkeypatch) -> None:
reset_dummy_sessions()
created_connectors = []
monkeypatch.setattr(core_module.aiohttp, 'ClientSession', DummySession)
monkeypatch.setattr(core_module.asyncio, 'sleep', fake_sleep)
monkeypatch.setattr(core_module.ssl, 'create_default_context', lambda cafile=None: 'ssl-context')
monkeypatch.setattr(core_module.certifi, 'where', lambda: '/tmp/cacert.pem')
monkeypatch.setattr(AsyncFetcher, '_get_random_proxy', staticmethod(lambda proxy_dict: ('http://proxy.local:8080', 'http')))
async def fake_create_connector(proxy_url, proxy_type, ssl_context=None):
created_connectors.append((proxy_url, proxy_type, ssl_context))
return 'connector'
monkeypatch.setattr(AsyncFetcher, '_create_connector', fake_create_connector)
result = await AsyncFetcher.post_fetch('https://example.com/resource', proxy=True)
assert result == 'response-text'
assert created_connectors == [('http://proxy.local:8080', 'http', 'ssl-context')]
session = DummySession.instances[0]
assert session.connector == 'connector'
assert session.requests == [
('GET', 'https://example.com/resource', {'proxy': 'http://proxy.local:8080'})
]
================================================
FILE: tests/lib/test_output.py
================================================
from __future__ import annotations
from theHarvester.lib.output import print_linkedin_sections, sorted_unique
def test_sorted_unique_sorts_and_deduplicates() -> None:
assert sorted_unique(["b", "a", "b"]) == ["a", "b"]
def test_print_linkedin_sections_prints_links_when_present(capsys) -> None:
# Regression coverage: the CLI previously never printed LinkedIn links when the list was non-empty.
print_linkedin_sections(
engines=["linkedin"],
people=[],
links=["https://b.example", "https://a.example", "https://a.example"],
)
out = capsys.readouterr().out
assert "No LinkedIn users found" in out
assert "LinkedIn Links found: 3" in out
assert "https://a.example" in out
assert "https://b.example" in out
def test_print_linkedin_sections_prints_people_and_links(capsys) -> None:
print_linkedin_sections(
engines=["rocketreach"],
people=["bob", "alice", "bob"],
links=["https://z.example", "https://z.example"],
)
out = capsys.readouterr().out
assert "LinkedIn Users found: 3" in out
assert "alice" in out
assert "bob" in out
assert "LinkedIn Links found: 2" in out
assert "https://z.example" in out
================================================
FILE: tests/test_hackertarget_apikey.py
================================================
import pytest
from theHarvester.discovery import hackertarget as ht_mod
from theHarvester.lib.core import Core
class TestHackerTargetApiKey:
@pytest.mark.asyncio
async def test_do_search_with_apikey(self, monkeypatch):
# make Core.hackertarget_key return a known key
monkeypatch.setattr(Core, "hackertarget_key", lambda: "TESTKEY")
# monkeypatch AsyncFetcher.fetch_all to capture requested URLs
async def fake_fetch_all(urls, headers=None, proxy=False):
# ensure apikey present in each URL
assert all("apikey=TESTKEY" in u for u in urls)
return ["1.2.3.4,host.example.com\n", "No PTR records found\n"]
monkeypatch.setattr(ht_mod.AsyncFetcher, "fetch_all", fake_fetch_all)
s = ht_mod.SearchHackerTarget("example.com")
await s.do_search()
# after do_search, total_results should include our fake response (commas replaced by colons)
assert "1.2.3.4:host.example.com" in s.total_results
@pytest.mark.asyncio
async def test_do_search_without_apikey(self, monkeypatch):
monkeypatch.setattr(Core, "hackertarget_key", lambda: None)
async def fake_fetch_all(urls, headers=None, proxy=False):
assert all("apikey=" not in u for u in urls)
return ["1.2.3.4,host.example.com\n"]
monkeypatch.setattr(ht_mod.AsyncFetcher, "fetch_all", fake_fetch_all)
s = ht_mod.SearchHackerTarget("example.com")
await s.do_search()
assert "1.2.3.4:host.example.com" in s.total_results
================================================
FILE: tests/test_mojeek.py
================================================
import pytest
from theHarvester.discovery import mojeek
class TestMojeekSearch:
@pytest.mark.asyncio
async def test_process_and_parsing(self, monkeypatch):
called = {}
async def fake_fetch_all(urls, headers=None, proxy=False):
called["urls"] = urls
called["headers"] = headers
called["proxy"] = proxy
return [
"Contact admin@exemple.com sur www.exemple.com \n",
" dev@exemple.com est présent sur api.exemple.com \n"
]
import theHarvester.lib.core as core_module
monkeypatch.setattr(core_module.AsyncFetcher, "fetch_all", fake_fetch_all)
monkeypatch.setattr(core_module.Core, "get_user_agent", staticmethod(lambda: "UA"), raising=True)
search = mojeek.SearchMojeek(word="exemple.com", limit=20)
await search.process(proxy=True)
expected_urls = [
"https://www.mojeek.com/search?q=%40exemple.com&s=0",
"https://www.mojeek.com/search?q=%40exemple.com&s=10"
]
assert any("mojeek.com" in url for url in called["urls"])
emails = await search.get_emails()
hosts = await search.get_hostnames()
assert "admin@exemple.com" in emails
assert "dev@exemple.com" in emails
assert "www.exemple.com" in hosts
assert "api.exemple.com" in hosts
@pytest.mark.asyncio
async def test_pagination_limit(self, monkeypatch):
captured = {}
async def fake_fetch_all(urls, headers=None, proxy=False):
captured["urls"] = urls
return [""] * len(urls)
import theHarvester.lib.core as core_module
monkeypatch.setattr(core_module.AsyncFetcher, "fetch_all", fake_fetch_all)
monkeypatch.setattr(core_module.Core, "get_user_agent", staticmethod(lambda: "UA"), raising=True)
search = mojeek.SearchMojeek(word="exemple.com", limit=10)
await search.process()
assert len(captured["urls"]) == 1
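# Illustrative sketch (not the SearchMojeek implementation): the URLs listed in
# expected_urls above suggest one request per page of 10 results, with the page
# offset in the "s" parameter. The helper name build_mojeek_urls and the constant
# PAGE_SIZE are hypothetical and exist only for this example.

```python
from urllib.parse import quote

PAGE_SIZE = 10  # assumption: inferred from the s=0 / s=10 offsets in expected_urls


def build_mojeek_urls(word: str, limit: int) -> list[str]:
    """Hypothetical helper: one search URL per page of PAGE_SIZE results."""
    query = quote(f"@{word}")  # "@" is percent-encoded as %40
    return [
        f"https://www.mojeek.com/search?q={query}&s={offset}"
        for offset in range(0, limit, PAGE_SIZE)
    ]


urls_20 = build_mojeek_urls("exemple.com", 20)  # two pages
urls_10 = build_mojeek_urls("exemple.com", 10)  # one page
```

# Under this assumption, limit=10 yields a single URL, which is the behaviour
# test_pagination_limit checks for.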
================================================
FILE: tests/test_myparser.py
================================================
#!/usr/bin/env python3
# coding=utf-8
import pytest
from theHarvester.parsers import myparser
class TestMyParser(object):
@pytest.mark.asyncio
async def test_emails(self) -> None:
word = "domain.com"
results = "@domain.com***a@domain***banotherdomain.com***c@domain.com***d@sub.domain.com***"
parse = myparser.Parser(results, word)
emails = sorted(await parse.emails())
assert emails == ["c@domain.com", "d@sub.domain.com"]
if __name__ == "__main__":
pytest.main()
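# Illustrative sketch: in Python, `assert value, other` treats `other` as the
# failure message, so it only checks that `value` is truthy. A comparison needs
# `==`. The sample data below is made up for the demonstration.

```python
emails = ["unexpected@domain.com"]
expected = ["c@domain.com", "d@sub.domain.com"]

# Form 1: `expected` is only the failure message, so this passes for any
# non-empty list, even though the contents differ.
assert emails, expected

# Form 2: an actual comparison; this is the form that detects the mismatch.
try:
    assert emails == expected
except AssertionError:
    mismatch_detected = True
else:
    mismatch_detected = False
```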
================================================
FILE: tests/test_security.py
================================================
import os
import re
import tempfile
from pathlib import Path
import pytest
from fastapi.testclient import TestClient
from theHarvester.__main__ import sanitize_filename, sanitize_for_xml
class TestCORSConfiguration:
"""Test CORS security configuration."""
def test_cors_does_not_allow_credentials_with_wildcard_origins(self):
"""
Security Test: CORS should not allow credentials with wildcard origins.
This prevents credential theft attacks where any origin can make
authenticated requests to the API.
"""
from theHarvester.lib.api.api import app
# Find CORS middleware in the app
cors_middleware = None
for middleware in app.user_middleware:
if 'CORSMiddleware' in str(middleware.cls):
cors_middleware = middleware
break
assert cors_middleware is not None, 'CORS middleware should be configured'
# Check that if allow_origins contains '*', allow_credentials must be False
# Access kwargs from the middleware
options = cors_middleware.kwargs
allow_origins = options.get('allow_origins', [])
allow_credentials = options.get('allow_credentials', False)
if isinstance(allow_origins, (list, tuple, set)) and '*' in allow_origins:
assert (
allow_credentials is False
), 'CRITICAL: CORS must not allow credentials with wildcard origins (CVE risk)'
def test_cors_restricts_http_methods(self):
"""
Security Test: CORS should restrict HTTP methods to only what's needed.
Reduces attack surface by limiting available methods.
"""
from theHarvester.lib.api.api import app
cors_middleware = None
for middleware in app.user_middleware:
if 'CORSMiddleware' in str(middleware.cls):
cors_middleware = middleware
break
assert cors_middleware is not None
options = cors_middleware.kwargs
allow_methods = options.get('allow_methods', [])
# Should not allow all methods
assert allow_methods != ['*'], 'CORS should restrict HTTP methods, not allow all (*)'
# Should only allow necessary methods (GET, POST for this API)
if isinstance(allow_methods, list):
dangerous_methods = {'DELETE', 'PUT', 'PATCH', 'TRACE', 'CONNECT'}
allowed_set = {m.upper() for m in allow_methods}
assert not (
allowed_set & dangerous_methods
), f'Unnecessary HTTP methods detected: {allowed_set & dangerous_methods}'
class TestXMLInjectionPrevention:
"""Test XML injection prevention."""
def test_sanitize_for_xml_escapes_special_characters(self):
"""
Security Test: Verify XML special characters are properly escaped.
Prevents XML injection attacks.
"""
# Test all XML special characters
test_cases = [
('&', '&amp;'),
('<', '&lt;'),
('>', '&gt;'),
('"', '&quot;'),
("'", '&apos;'),
('<script>alert("XSS")</script>', '&lt;script&gt;alert(&quot;XSS&quot;)&lt;/script&gt;'),
('user@example.com & <test>', 'user@example.com &amp; &lt;test&gt;'),
('Normal text', 'Normal text'),
]
for input_text, expected_output in test_cases:
result = sanitize_for_xml(input_text)
assert result == expected_output, f'Failed to properly escape: {input_text}'
def test_sanitize_for_xml_prevents_xml_entity_injection(self):
"""
Security Test: Prevent XML entity injection attempts.
"""
malicious_inputs = [
'<?xml version="1.0"?><!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>',
'<!ENTITY xxe SYSTEM "file:///dev/random">',
'<![CDATA[malicious]]>',
'<script>',
]
for malicious_input in malicious_inputs:
result = sanitize_for_xml(malicious_input)
# Ensure dangerous characters are escaped
assert '&lt;' in result or '&amp;' in result, f'Failed to sanitize: {malicious_input}'
assert '<' not in result or result == malicious_input.replace('<', '&lt;'), f'XML tags not escaped: {malicious_input}'
def test_command_line_args_are_sanitized_in_xml_output(self):
"""
Security Test: Command line arguments must be sanitized before XML output.
This test is a conceptual check - in real usage, ensure the XML writing
code uses sanitize_for_xml() on all user-controlled data.
"""
# Simulate dangerous command line arguments
dangerous_args = [
'--domain=test.com',
"--source='<script>alert(1)</script>'",
'--output="; rm -rf /',
'--domain=example.com&param=<injection>',
]
for arg in dangerous_args:
sanitized = sanitize_for_xml(arg)
# Verify no unescaped XML special characters remain
assert '<script>' not in sanitized, f'Script tag not escaped in: {arg}'
assert '&param=' not in sanitized or '&amp;' in sanitized, f'Ampersand not escaped in: {arg}'
class TestInformationDisclosure:
"""Test information disclosure prevention."""
@pytest.fixture
def client(self):
"""Create a test client for API testing."""
from theHarvester.lib.api.api import app
return TestClient(app)
def test_api_does_not_expose_traceback_in_error_responses(self, client):
"""
Security Test: API should never expose stack traces to clients.
Stack traces can reveal sensitive information about the system.
"""
# Test the /sources endpoint with a simulated error condition
response = client.get('/sources')
# Even if there's an error, traceback should not be in response
if response.status_code >= 400:
response_data = response.json()
assert 'traceback' not in response_data, 'Traceback exposed in error response'
assert 'Traceback' not in str(response_data), 'Traceback text found in response'
assert 'File "' not in str(response_data), 'File paths exposed in response'
def test_error_responses_do_not_leak_internal_paths(self, client):
"""
Security Test: Error messages should not reveal internal file paths.
"""
# Try various endpoints
endpoints = ['/sources', '/dnsbrute?domain=test', '/query?domain=test&source=baidu']
for endpoint in endpoints:
response = client.get(endpoint)
response_text = str(response.json() if response.status_code != 200 else {})
# Check for common path leakage patterns
path_patterns = [
r'/home/\w+/',
r'/usr/local/',
r'C:\\Users\\',
r'/var/www/',
r'site-packages/',
r'\.py:\d+', # filename.py:123
]
for pattern in path_patterns:
matches = re.findall(pattern, response_text)
assert not matches, f'Internal path leaked in {endpoint}: {matches}'
def test_debug_mode_does_not_expose_sensitive_info(self, client, monkeypatch):
"""
Security Test: Even with DEBUG=1, sensitive info should not be exposed to clients.
"""
# Set DEBUG environment variable
monkeypatch.setenv('DEBUG', '1')
# Make request that might trigger an error
response = client.get('/dnsbrute?domain=') # Invalid request
if response.status_code >= 400:
response_data = response.json()
# Even with DEBUG=1, traceback should NOT be sent to client
assert 'traceback' not in response_data, 'DEBUG mode exposes tracebacks to clients'
class TestPathTraversalPrevention:
"""Test path traversal prevention."""
def test_sanitize_filename_removes_path_components(self):
"""
Security Test: Filenames should not contain path traversal sequences.
"""
dangerous_filenames = [
'../../../etc/passwd',
'..\\..\\..\\windows\\system32\\config\\sam',
'/etc/passwd',
'C:\\Windows\\System32\\config\\sam',
'../../sensitive_file.txt',
'./../hidden_file',
'subdir/../../../etc/passwd',
]
for dangerous_filename in dangerous_filenames:
result = sanitize_filename(dangerous_filename)
# Should not contain any path separators
assert '/' not in result, f'Path separator found in sanitized filename: {result}'
assert '\\' not in result, f'Windows path separator found: {result}'
# Should not start with .. (parent directory reference at the beginning is most dangerous)
assert not result.startswith('..'), f'Parent directory reference at start: {result}'
# Should only be the basename
assert os.path.dirname(result) == '', f'Path component remains: {result}'
def test_sanitize_filename_removes_dangerous_characters(self):
"""
Security Test: Filenames should only contain safe characters.
"""
test_cases = [
'file; rm -rf /',
'file`whoami`.txt',
'file$(malicious).txt',
'file|cmd.txt',
'file&background.txt',
'normal-file_123.txt',
]
for input_filename in test_cases:
result = sanitize_filename(input_filename)
# Should not be empty
assert len(result) > 0, f'Sanitized filename is empty for: {input_filename}'
# Should not contain shell special characters
dangerous_chars = [';', '|', '&', '$', '`', '(', ')', '{', '}', '[', ']', '<', '>']
for char in dangerous_chars:
assert char not in result, f'Dangerous character {char} found in: {result}'
# Should only contain alphanumeric, dash, underscore, and dot
assert re.match(r'^[a-zA-Z0-9._-]+$', result), f'Invalid characters in sanitized filename: {result}'
def test_sanitize_filename_prevents_hidden_files(self):
"""
Security Test: Prevent creation of hidden files.
"""
hidden_files = ['.bashrc', '.ssh_config', '.env', '..hidden', '.']
for hidden_file in hidden_files:
result = sanitize_filename(hidden_file)
# Should not start with a dot (except for allowed extensions)
if result: # If not empty
assert not result.startswith('.'), f'Hidden file not prevented: {result}'
def test_filename_sanitization_preserves_safe_filenames(self):
"""
Security Test: Safe filenames should remain mostly unchanged.
"""
safe_filenames = [
'report.json',
'results_2024-01-17.xml',
'scan-output.txt',
'data_file_v2.csv',
]
for safe_filename in safe_filenames:
result = sanitize_filename(safe_filename)
# Safe filenames should be preserved (possibly with minor changes)
assert len(result) > 0, 'Safe filename was completely removed'
assert '.' in result if '.' in safe_filename else True, 'File extension removed incorrectly'
def test_path_traversal_in_file_operations(self):
"""
Integration Test: Verify file operations don't allow path traversal.
"""
# This tests the actual usage in the code
from theHarvester.__main__ import sanitize_filename
# Simulate user input
user_input = '../../../etc/passwd'
sanitized = sanitize_filename(user_input)
# Try to create a file with sanitized name
with tempfile.TemporaryDirectory() as tmpdir:
safe_path = os.path.join(tmpdir, sanitized)
# Ensure the resolved path is still within tmpdir
assert os.path.commonpath([tmpdir, safe_path]) == tmpdir, 'Path traversal detected!'
# Verify we can't escape the directory
assert tmpdir in os.path.abspath(safe_path), 'File path escaped temporary directory'
class TestSecurityBestPractices:
"""Additional security best practices tests."""
def test_no_hardcoded_secrets_in_code(self):
"""
Security Test: Ensure no hardcoded secrets in main code files.
"""
# Check main application files for common secret patterns
files_to_check = [
'theHarvester/__main__.py',
'theHarvester/lib/api/api.py',
'theHarvester/lib/core.py',
]
# Patterns that might indicate hardcoded secrets
secret_patterns = [
r'password\s*=\s*["\'][^"\']+["\']',
r'api_key\s*=\s*["\'][a-zA-Z0-9]{20,}["\']',
r'secret\s*=\s*["\'][^"\']+["\']',
r'token\s*=\s*["\'][a-zA-Z0-9]{20,}["\']',
]
for file_path in files_to_check:
if os.path.exists(file_path):
with open(file_path) as f:
content = f.read()
for pattern in secret_patterns:
matches = re.findall(pattern, content, re.IGNORECASE)
# Filter out obvious non-secrets (like example values, empty strings, variable names)
real_matches = [
m
for m in matches
if 'example' not in m.lower()
and 'your_' not in m.lower()
and '""' not in m
and "''" not in m
]
assert not real_matches, f'Potential hardcoded secret in {file_path}: {real_matches}'
def test_api_has_rate_limiting(self):
"""
Security Test: Verify API endpoints have rate limiting enabled.
"""
from theHarvester.lib.api.api import app
# Check that rate limiting is configured
assert hasattr(app.state, 'limiter'), 'Rate limiter not configured'
assert app.state.limiter is not None, 'Rate limiter is None'
def test_sensitive_endpoints_require_validation(self):
"""
Security Test: Ensure sensitive endpoints validate input.
"""
from fastapi.testclient import TestClient
from theHarvester.lib.api.api import app
client = TestClient(app)
# Test that endpoints reject invalid input
# Note: The /query endpoint requires 'source' as a list parameter
test_cases = [
('/dnsbrute?domain=', 400), # Empty domain should be rejected
]
for endpoint, expected_status in test_cases:
response = client.get(endpoint)
assert (
response.status_code >= 400
), f'Endpoint {endpoint} should reject invalid input (got {response.status_code})'
# Test query endpoint with proper parameter format but invalid domain
response = client.get('/query?domain=a&source=baidu') # Too short domain
# This may or may not fail depending on validation, but we check it doesn't crash
assert response.status_code in [200, 400, 422, 500], 'Unexpected status code'
if __name__ == '__main__':
pytest.main([__file__, '-v'])
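# Illustrative sketch: the escaping expected by TestXMLInjectionPrevention can be
# cross-checked against the standard library. xml.sax.saxutils.escape() handles
# &, < and > by default; quote entities are supplied explicitly. The helper name
# escape_for_xml is hypothetical and is not part of theHarvester.

```python
from xml.sax.saxutils import escape

# Extra replacements applied after the default &, <, > handling.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def escape_for_xml(text: str) -> str:
    """Hypothetical stand-in for an XML sanitizer, built on the stdlib escape()."""
    return escape(text, QUOTE_ENTITIES)


escaped = escape_for_xml('user@example.com & <test a="1">')
script = escape_for_xml("<script>alert('x')</script>")
```

# escape() replaces "&" first, so the entities it emits for the other
# characters are not escaped a second time.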
================================================
FILE: theHarvester/__init__.py
================================================
__version__ = '4.10.1'
================================================
FILE: theHarvester/__main__.py
================================================
import argparse
import asyncio
import os
import re
import secrets
import string
import sys
import time
import traceback
from typing import TYPE_CHECKING, Any
import anyio
import netaddr
import ujson
from aiomultiprocess import Pool
from theHarvester.discovery import (
api_endpoints,
baidusearch,
bevigil,
bitbucket,
bravesearch,
bufferoverun,
builtwith,
censysearch,
certspottersearch,
chaos,
commoncrawl,
criminalip,
crtsh,
dnssearch,
duckduckgosearch,
fofa,
fullhuntsearch,
githubcode,
gitlabsearch,
hackertarget,
haveibeenpwned,
hudsonrocksearch,
huntersearch,
intelxsearch,
leakix,
leaklookup,
mojeek,
netlas,
onyphe,
otxsearch,
pentesttools,
projectdiscovery,
rapiddns,
robtex,
rocketreach,
search_dehashed,
search_dnsdumpster,
searchhunterhow,
securityscorecard,
securitytrailssearch,
shodansearch,
subdomaincenter,
subdomainfinderc99,
takeover,
thc,
threatcrowd,
tombasearch,
urlscan,
venacussearch,
virustotal,
waybackarchive,
whoisxml,
windvane,
yahoosearch,
zoomeyesearch,
)
from theHarvester.discovery.constants import MissingKey
from theHarvester.lib import hostchecker, stash
from theHarvester.lib.core import DATA_DIR, Core, show_default_error_message
from theHarvester.lib.output import print_linkedin_sections, print_section, sorted_unique
from theHarvester.screenshot.screenshot import ScreenShotter
if TYPE_CHECKING:
from collections.abc import Awaitable
def sanitize_for_xml(text: str) -> str:
"""Sanitize text for safe inclusion in XML documents."""
text = text.replace('&', '&amp;')
text = text.replace('<', '&lt;')
text = text.replace('>', '&gt;')
text = text.replace('"', '&quot;')
text = text.replace("'", '&apos;')
return text
def sanitize_filename(filename: str) -> str:
filename = os.path.basename(filename)
filename = re.sub(r'[^a-zA-Z0-9._-]', '_', filename)
# Remove consecutive underscores
filename = re.sub(r'_+', '_', filename)
filename = filename.strip('_.')
if filename.startswith('.'):
filename = '_' + filename
# Ensure we have a valid filename
if not filename:
filename = 'sanitized_file'
return filename
async def start(rest_args: argparse.Namespace | None = None):
"""Main program function"""
parser = argparse.ArgumentParser(
description='theHarvester is used to gather open source intelligence (OSINT) on a company or domain.'
)
parser.add_argument('-d', '--domain', help='Company name or domain to search.', required=True)
parser.add_argument(
'-l',
'--limit',
help='Limit the number of search results, default=500.',
default=500,
type=int,
)
parser.add_argument(
'-S',
'--start',
help='Start with result number X, default=0.',
default=0,
type=int,
)
parser.add_argument(
'-p',
'--proxies',
help='Use proxies for requests, enter proxies in proxies.yaml.',
default=False,
action='store_true',
)
parser.add_argument(
'-s',
'--shodan',
help='Use Shodan to query discovered hosts.',
default=False,
action='store_true',
)
parser.add_argument(
'--screenshot',
help='Take screenshots of resolved domains; specify the output directory: --screenshot output_directory',
default='',
type=str,
)
parser.add_argument('-e', '--dns-server', help='DNS server to use for lookup.')
parser.add_argument(
'-t',
'--take-over',
help='Check for takeovers.',
default=False,
action='store_true',
)
parser.add_argument(
'-r',
'--dns-resolve',
help='Perform DNS resolution on subdomains with a resolver list or passed in resolvers, default False.',
default='',
type=str,
nargs='?',
)
parser.add_argument(
'-n',
'--dns-lookup',
help='Enable DNS server lookup, default False.',
default=False,
action='store_true',
)
parser.add_argument(
'-c',
'--dns-brute',
help='Perform a DNS brute force on the domain.',
default=False,
action='store_true',
)
parser.add_argument(
'-f',
'--filename',
help='Save the results to an XML and JSON file.',
default='',
type=str,
)
parser.add_argument('-w', '--wordlist', help='Specify a wordlist for API endpoint scanning.', default='')
parser.add_argument('-a', '--api-scan', help='Scan for API endpoints.', action='store_true')
parser.add_argument(
'-q',
'--quiet',
help='Suppress missing API key warnings and reading the api-keys file.',
default=False,
action='store_true',
)
parser.add_argument(
'-b',
'--source',
help="""baidu, bevigil, bitbucket, brave, bufferoverun,
builtwith, censys, certspotter, chaos, commoncrawl, criminalip, crtsh, dehashed, dnsdumpster, duckduckgo, fofa, fullhunt, github-code,
gitlab, hackertarget, haveibeenpwned, hudsonrock, hunter, hunterhow, intelx, leakix, leaklookup, mojeek, netlas, onyphe, otx, pentesttools,
projectdiscovery, rapiddns, robtex, rocketreach, securityscorecard, securityTrails, shodan, subdomaincenter,
subdomainfinderc99, thc, threatcrowd, tomba, urlscan, venacus, virustotal, waybackarchive, whoisxml, windvane, yahoo, zoomeye""",
)
# determines if the filename is coming from rest api or user
rest_filename = ''
# rest_args is set when the call comes from the REST API
if rest_args:
if rest_args.source and rest_args.source == 'getsources':
return list(sorted(Core.get_supportedengines()))
elif rest_args.dns_brute:
args = rest_args
dnsbrute = (rest_args.dns_brute, True)
else:
args = rest_args
# Prefix the filename with a random token so it does not overwrite other files
filename: str = args.filename
alphabet = string.ascii_letters + string.digits
rest_filename += f'{"".join(secrets.choice(alphabet) for _ in range(32))}_{filename}' if len(filename) != 0 else ''
else:
args = parser.parse_args()
filename = args.filename
dnsbrute = (args.dns_brute, False)
Core.quiet = getattr(args, 'quiet', False)
try:
db = stash.StashManager()
await db.do_init()
except (AttributeError, OSError, RuntimeError, ValueError) as init_error:
if not args.quiet:
print(f'Error initializing StashManager: {init_error}')
raise ValueError('Failed to initialize StashManager')
if len(filename) > 0:
if filename.startswith('~/'):
# Allow home directory expansion but sanitize the rest
base_path = await anyio.Path('~').expanduser()
sanitized = sanitize_filename(filename[2:])
filename = str(base_path.joinpath(sanitized))
elif os.path.isabs(filename):
# For absolute paths, sanitize just the filename component
dirname = os.path.dirname(filename)
basename = sanitize_filename(os.path.basename(filename))
filename = os.path.join(dirname, basename)
else:
# For relative paths, sanitize the entire filename
filename = sanitize_filename(filename)
all_emails: list = []
all_hosts: list = []
all_ip: list = []
all_people: list[dict[str, str]] = []
dnslookup = args.dns_lookup
dnsserver = args.dns_server  # TODO: this argument is not used anywhere; replace it with the resolvers wordlist argument (dnsresolve)
dnsresolve: str | None = args.dns_resolve
final_dns_resolver_list = []
if dnsresolve is not None and len(dnsresolve) > 0:
# Three scenarios:
# 8.8.8.8
# 1.1.1.1,8.8.8.8 or 1.1.1.1, 8.8.8.8
# resolvers.txt
if await anyio.Path(dnsresolve).exists():
with open(dnsresolve, encoding='UTF-8') as fp:
for line in fp:
line = line.strip()
if len(line) == 0:
continue
try:
_ = netaddr.IPAddress(line)
final_dns_resolver_list.append(line)
except (netaddr.core.AddrFormatError, ValueError, TypeError) as e:
print(f'An exception has occurred while reading from: {dnsresolve}, {e}')
print(f'Current line: {line}')
else:
cleaned = dnsresolve.replace(' ', '')
resolver_candidates = cleaned.split(',') if ',' in cleaned else [cleaned]
for item in resolver_candidates:
if len(item) == 0:
continue
try:
# Verify user passed in an IP; this does not validate resolver behavior
_ = netaddr.IPAddress(item)
final_dns_resolver_list.append(item)
except (netaddr.core.AddrFormatError, ValueError, TypeError) as e:
print(f'Passed DNS resolver is invalid, skipping: {item} ({e})')
# if for some reason, there are duplicates
final_dns_resolver_list = list(set(final_dns_resolver_list))
if len(final_dns_resolver_list) == 0:
print('No valid DNS resolvers were parsed from --dns-resolve; continuing without custom resolvers.')
engines: list = []
# If the user specifies
full: list = []
ips: list = []
host_ip: list = []
limit: int = args.limit
shodan = args.shodan
start: int = args.start
all_urls: list = []
vhost: list = []
word: str = args.domain.rstrip('\n')
takeover_status = args.take_over
use_proxy = args.proxies
linkedin_people_list_tracker: list = []
linkedin_links_tracker: list = []
twitter_people_list_tracker: list = []
interesting_urls: list = []
total_asns: list = []
async def store(
search_engine: Any,
source: str,
process_param: Any = None,
store_host: bool = False,
store_emails: bool = False,
store_ip: bool = False,
store_people: bool = False,
store_links: bool = False,
store_results: bool = False,
store_interestingurls: bool = False,
store_asns: bool = False,
) -> None:
"""
Persist details into the database.
The details to be stored are controlled by the parameters passed to the method.
:param search_engine: search engine to fetch details from
:param source: source against which the details (corresponding to the search engine) need to be persisted
:param process_param: any parameters to be passed to the search engine, e.g. Google needs google_dorking
:param store_host: whether to store hosts
:param store_emails: whether to store emails
:param store_ip: whether to store IP address
:param store_people: whether to store user details
:param store_links: whether to store links
:param store_results: whether to fetch details from get_results() and persist
:param store_interestingurls: whether to store interesting urls
:param store_asns: whether to store asns
"""
(
await search_engine.process(use_proxy)
if process_param is None
else await search_engine.process(process_param, use_proxy)
)
db_stash = stash.StashManager()
if source:
print(f'[*] Searching {source[0].upper() + source[1:]}. ')
if store_host:
host_names = list({host for host in await search_engine.get_hostnames() if f'.{word}' in host})
host_names = list(host_names)
if source != 'hackertarget' and source != 'pentesttools' and source != 'rapiddns':
# If a source is inside this conditional, it means the hosts returned must be resolved to obtain ip
# This should only be checked if --dns-resolve has a wordlist
if dnsresolve is None or len(final_dns_resolver_list) > 0:
# dnsresolve is None when -r was passed without a value (argparse nargs='?' yields None)
full_hosts_checker = hostchecker.Checker(host_names, final_dns_resolver_list)
# If full, this is only getting resolved hosts
(
resolved_pair,
_temp_hosts,
temp_ips,
) = await full_hosts_checker.check()
all_ip.extend(temp_ips)
full.extend(resolved_pair)
# full.extend(temp_hosts)
else:
full.extend(host_names)
else:
full.extend(host_names)
all_hosts.extend(host_names)
await db_stash.store_all(word, all_hosts, 'host', source)
if store_emails:
email_list = await search_engine.get_emails()
all_emails.extend(email_list)
await db_stash.store_all(word, email_list, 'email', source)
if store_ip:
ips_list = await search_engine.get_ips()
all_ip.extend(ips_list)
await db_stash.store_all(word, all_ip, 'ip', source)
if store_results:
email_list, host_names, urls = await search_engine.get_results()
all_emails.extend(email_list)
host_names = list({host for host in host_names if f'.{word}' in host})
all_urls.extend(urls)
all_hosts.extend(host_names)
await db.store_all(word, all_hosts, 'host', source)
await db.store_all(word, all_emails, 'email', source)
if store_people:
people_list = await search_engine.get_people()
all_people.extend(people_list)
await db_stash.store_all(word, people_list, 'people', source)
if store_links:
links = await search_engine.get_links()
linkedin_links_tracker.extend(links)
if len(links) > 0:
await db.store_all(word, links, 'linkedinlinks', source)
if store_interestingurls:
iurls = await search_engine.get_interestingurls()
interesting_urls.extend(iurls)
if len(iurls) > 0:
await db.store_all(word, iurls, 'interestingurls', source)
if store_asns:
fasns = await search_engine.get_asns()
total_asns.extend(fasns)
if len(fasns) > 0:
await db.store_all(word, fasns, 'asns', source)
stor_lst = []
if args.source is not None:
if args.source.lower() != 'all':
engines = sorted(set(map(str.strip, args.source.split(','))))
else:
engines = Core.get_supportedengines()
# Iterate through search engines in order
if set(engines).issubset(Core.get_supportedengines()):
print(f'\n[*] Target: {word} \n')
for engineitem in engines:
if engineitem == 'baidu':
try:
baidu_search = baidusearch.SearchBaidu(word, limit)
stor_lst.append(
store(
baidu_search,
engineitem,
store_host=True,
store_emails=True,
)
)
except Exception as e:
show_default_error_message(engineitem, word, e)
elif engineitem == 'bevigil':
try:
bevigil_search = bevigil.SearchBeVigil(word)
stor_lst.append(
store(
bevigil_search,
engineitem,
store_host=True,
store_interestingurls=True,
)
)
except Exception as e:
show_default_error_message(engineitem, word, error=e)
elif engineitem == 'bitbucket':
try:
bitbucket_search = bitbucket.SearchBitBucket(word, limit)
stor_lst.append(
store(
bitbucket_search,
engineitem,
store_host=True,
store_emails=True,
)
)
except Exception as ex:
if isinstance(ex, MissingKey):
print(MissingKey('Bitbucket'))
else:
show_default_error_message(engineitem, word, ex)
elif engineitem == 'brave':
try:
brave_search = bravesearch.SearchBrave(word, limit)
stor_lst.append(
store(
brave_search,
engineitem,
store_host=True,
store_emails=True,
)
)
except Exception as e:
show_default_error_message(engineitem, word, error=e)
elif engineitem == 'bufferoverun':
try:
bufferoverun_search = bufferoverun.SearchBufferover(word)
stor_lst.append(
store(
bufferoverun_search,
engineitem,
store_host=True,
store_ip=True,
)
)
except Exception as e:
show_default_error_message(engineitem, word, e)
elif engineitem == 'builtwith':
try:
builtwith_search = builtwith.SearchBuiltWith(word)
stor_lst.append(store(builtwith_search, engineitem, store_host=True, store_interestingurls=True))
except Exception as e:
if isinstance(e, MissingKey):
print(f"Failed to perform BuiltWith search for word: '{word}'")
print(f'A Missing Key Error occurred in builtwith: {e}')
else:
show_default_error_message(engineitem, word, e)
elif engineitem == 'censys':
try:
censys_search = censysearch.SearchCensys(word, limit)
stor_lst.append(
store(
censys_search,
engineitem,
store_host=True,
store_emails=True,
)
)
except MissingKey as mk:
if not args.quiet:
print(f'Censys API key is missing or invalid: {mk}')
except ConnectionError as ce:
if not args.quiet:
print(f'Network error while querying Censys: {ce}')
except TimeoutError as te:
if not args.quiet:
print(f'Timeout occurred while contacting Censys: {te}')
except ValueError as ve:
if not args.quiet:
print(f'Censys returned unexpected data: {ve}')
except Exception as e:
if not args.quiet:
print(f'Unexpected error occurred in Censys module: {e}')
elif engineitem == 'certspotter':
try:
certspotter_search = certspottersearch.SearchCertspoter(word)
stor_lst.append(store(certspotter_search, engineitem, None, store_host=True))
except ConnectionError as ce:
if not args.quiet:
print(f'Network connection error while accessing Certspotter: {ce}')
except TimeoutError as te:
if not args.quiet:
print(f'Request to Certspotter timed out: {te}')
except ValueError as ve:
if not args.quiet:
print(f'Certspotter returned invalid data: {ve}')
except MissingKey as mk:
if not args.quiet:
print(f'Unexpected response structure from Certspotter (missing key): {mk}')
except Exception as e:
if not args.quiet:
print(f'Unexpected error occurred in Certspotter module: {e}')
elif engineitem == 'chaos':
try:
chaos_search = chaos.SearchChaos(word)
stor_lst.append(
store(
chaos_search,
engineitem,
store_host=True,
)
)
except Exception as e:
if isinstance(e, MissingKey):
if not args.quiet:
print(f'A Missing Key error occurred in Chaos: {e}')
else:
show_default_error_message(engineitem, word, e)
elif engineitem == 'commoncrawl':
try:
commoncrawl_search = commoncrawl.SearchCommoncrawl(word)
stor_lst.append(
store(
commoncrawl_search,
engineitem,
store_host=True,
)
)
except Exception as e:
show_default_error_message(engineitem, word, e)
elif engineitem == 'criminalip':
try:
criminalip_search = criminalip.SearchCriminalIP(word)
stor_lst.append(
store(
criminalip_search,
engineitem,
store_host=True,
store_ip=True,
store_asns=True,
)
)
except Exception as e:
if isinstance(e, MissingKey):
if not args.quiet:
print(f'A Missing key error occurred in criminalip: {e}')
else:
show_default_error_message(engineitem, word, e)
elif engineitem == 'crtsh':
try:
crtsh_search = crtsh.SearchCrtsh(word)
stor_lst.append(store(crtsh_search, 'CRTsh', store_host=True))
except Exception as e:
print(f'[!] A timeout occurred with crtsh, cannot find {args.domain}\n {e}')
elif engineitem == 'dehashed':
try:
dehashed_search = search_dehashed.SearchDehashed(word)
stor_lst.append(
store(
dehashed_search,
engineitem,
store_host=False,
store_ip=True,
)
)
except Exception as e:
if isinstance(e, MissingKey):
if not args.quiet:
print(f'A Missing Key error occurred in dehashed: {e}')
else:
show_default_error_message(engineitem, word, e)
elif engineitem == 'dnsdumpster':
try:
dnsdumpster_search = search_dnsdumpster.SearchDNSDumpster(word)
stor_lst.append(
store(
dnsdumpster_search,
engineitem,
store_host=True,
store_ip=True,
)
)
except MissingKey as e:
if not args.quiet:
print(e)
except Exception as e:
show_default_error_message(engineitem, word, e)
elif engineitem == 'duckduckgo':
duckduckgo_search = duckduckgosearch.SearchDuckDuckGo(word, limit)
stor_lst.append(
store(
duckduckgo_search,
engineitem,
store_host=True,
store_emails=True,
)
)
elif engineitem == 'fofa':
try:
fofa_search = fofa.SearchFofa(word)
stor_lst.append(
store(
fofa_search,
engineitem,
store_host=True,
store_ip=True,
)
)
except Exception as e:
if isinstance(e, MissingKey):
if not args.quiet:
print(f'A Missing Key error occurred in Fofa: {e}')
else:
show_default_error_message(engineitem, word, e)
elif engineitem == 'fullhunt':
try:
fullhunt_search = fullhuntsearch.SearchFullHunt(word)
stor_lst.append(store(fullhunt_search, engineitem, store_host=True))
except Exception as e:
if isinstance(e, MissingKey):
if not args.quiet:
print(f'A Missing Key error occurred in fullhunt: {e}')
elif engineitem == 'github-code':
try:
github_search = githubcode.SearchGithubCode(word, limit)
stor_lst.append(
store(
github_search,
engineitem,
store_host=True,
store_emails=True,
)
)
except MissingKey as ex:
if not args.quiet:
print(f'A Missing Key error occurred in github-code: {ex}')
elif engineitem == 'gitlab':
try:
gitlab_search = gitlabsearch.SearchGitlab(word)
stor_lst.append(
store(
gitlab_search,
engineitem,
store_host=True,
store_emails=True,
)
)
except Exception as e:
show_default_error_message(engineitem, word, e)
elif engineitem == 'hackertarget':
try:
hackertarget_search = hackertarget.SearchHackerTarget(word)
stor_lst.append(store(hackertarget_search, engineitem, store_host=True))
except Exception as e:
show_default_error_message(engineitem, word, e)
elif engineitem == 'haveibeenpwned':
try:
haveibeenpwned_search = haveibeenpwned.SearchHaveIBeenPwned(word)
stor_lst.append(
store(
haveibeenpwned_search,
engineitem,
store_emails=True,
)
)
except Exception as e:
if isinstance(e, MissingKey):
if not args.quiet:
print(MissingKey('HaveIBeenPwned'))
else:
print(f'An exception has occurred in HaveIBeenPwned search: {e}')
elif engineitem == 'hudsonrock':
try:
hudsonrock_search = hudsonrocksearch.SearchHudsonRock(word)
stor_lst.append(
store(
hudsonrock_search,
engineitem,
store_host=True,
store_emails=True,
store_ip=True,
)
)
except Exception as e:
print(f'An exception has occurred in Hudson Rock search: {e}')
elif engineitem == 'hunter':
try:
hunter_search = huntersearch.SearchHunter(word, limit, start)
stor_lst.append(
store(
hunter_search,
engineitem,
store_host=True,
store_emails=True,
)
)
except Exception as e:
if isinstance(e, MissingKey):
if not args.quiet:
print(f'A Missing Key error occurred in Hunter: {e}')
elif engineitem == 'hunterhow':
try:
hunterhow_search = searchhunterhow.SearchHunterHow(word)
stor_lst.append(store(hunterhow_search, engineitem, store_host=True))
except Exception as e:
if isinstance(e, MissingKey):
if not args.quiet:
print(f'A Missing Key error occurred in Hunter How: {e}')
else:
print(f'An exception has occurred in hunterhow search: {e}')
elif engineitem == 'intelx':
try:
intelx_search = intelxsearch.SearchIntelx(word)
stor_lst.append(
store(
intelx_search,
engineitem,
store_interestingurls=True,
store_emails=True,
)
)
except Exception as e:
if isinstance(e, MissingKey):
if not args.quiet:
print(f'A Missing Key error occurred in intelx: {e}')
else:
print(f'An exception has occurred in Intelx search: {e}')
elif engineitem == 'leakix':
try:
leakix_search = leakix.SearchLeakix(word)
stor_lst.append(
store(
leakix_search,
engineitem,
store_host=True,
store_emails=True,
)
)
except Exception as e:
show_default_error_message(engineitem, word, e)
elif engineitem == 'leaklookup':
try:
leaklookup_search = leaklookup.SearchLeakLookup(word)
stor_lst.append(
store(
leaklookup_search,
engineitem,
store_emails=True,
)
)
except Exception as e:
if isinstance(e, MissingKey):
print(f'A Missing Key error occurred in LeakLookup: {e}')
else:
print(f'An exception has occurred in LeakLookup search: {e}')
elif engineitem == 'mojeek':
try:
mojeek_search = mojeek.SearchMojeek(word, limit)
stor_lst.append(
store(
mojeek_search,
engineitem,
store_host=True,
store_emails=True,
)
)
except Exception as e:
if isinstance(e, MissingKey):
print(f'A Missing Key error occurred in Mojeek: {e}')
else:
print(f'An exception has occurred in Mojeek search: {e}')
elif engineitem == 'netlas':
try:
netlas_search = netlas.SearchNetlas(word, limit)
stor_lst.append(
store(
netlas_search,
engineitem,
store_host=True,
store_ip=True,
)
)
except Exception as e:
if isinstance(e, MissingKey):
if not args.quiet:
print(f'A Missing Key error occurred in Netlas: {e}')
elif engineitem == 'onyphe':
try:
onyphe_search = onyphe.SearchOnyphe(word)
stor_lst.append(
store(
onyphe_search,
engineitem,
store_host=True,
store_ip=True,
store_asns=True,
)
)
except ConnectionError as ce:
if not args.quiet:
print(f'Network connection error while accessing Onyphe: {ce}')
except TimeoutError as te:
if not args.quiet:
print(f'Request to Onyphe timed out: {te}')
except ValueError as ve:
if not args.quiet:
print(f'Onyphe returned invalid or unexpected data: {ve}')
except KeyError as ke:
if not args.quiet:
print(f'Unexpected response structure from Onyphe (missing key): {ke}')
except Exception as e:
if not args.quiet:
print(f'Unexpected error occurred in Onyphe module: {e}')
elif engineitem == 'otx':
try:
otxsearch_search = otxsearch.SearchOtx(word)
stor_lst.append(
store(
otxsearch_search,
engineitem,
store_host=True,
store_ip=True,
)
)
except ConnectionError as ce:
if not args.quiet:
print(f'Network connection error while accessing OTX: {ce}')
except TimeoutError as te:
if not args.quiet:
print(f'Request to OTX timed out: {te}')
except ValueError as ve:
if not args.quiet:
print(f'OTX returned invalid or unexpected data: {ve}')
except KeyError as ke:
if not args.quiet:
print(f'Unexpected response structure from OTX (missing key): {ke}')
except Exception as e:
if not args.quiet:
print(f'Unexpected error occurred in OTX module: {e}')
elif engineitem == 'pentesttools':
try:
pentesttools_search = pentesttools.SearchPentestTools(word)
stor_lst.append(store(pentesttools_search, engineitem, store_host=True))
except Exception as e:
if isinstance(e, MissingKey):
if not args.quiet:
print(f'A Missing Key error occurred in PentestTools search: {e}')
else:
print(f'An exception has occurred in PentestTools search: {e}')
elif engineitem == 'projectdiscovery':
try:
projectdiscovery_search = projectdiscovery.SearchDiscovery(word)
stor_lst.append(store(projectdiscovery_search, engineitem, store_host=True))
except Exception as e:
if isinstance(e, MissingKey):
if not args.quiet:
print(f'A Missing Key error occurred in ProjectDiscovery: {e}')
else:
print(f'An exception has occurred in ProjectDiscovery: {e}')
elif engineitem == 'rapiddns':
try:
rapiddns_search = rapiddns.SearchRapidDns(word)
stor_lst.append(store(rapiddns_search, engineitem, store_host=True))
except ConnectionError as ce:
if not args.quiet:
print(f'Network connection error while accessing RapidDNS: {ce}')
except TimeoutError as te:
if not args.quiet:
print(f'Request to RapidDNS timed out: {te}')
except ValueError as ve:
if not args.quiet:
print(f'RapidDNS returned invalid or unexpected data: {ve}')
except KeyError as ke:
if not args.quiet:
print(f'Unexpected response structure from RapidDNS (missing key): {ke}')
except Exception as e:
if not args.quiet:
print(f'Unexpected error occurred in RapidDNS module: {e}')
elif engineitem == 'robtex':
try:
robtex_search = robtex.SearchRobtex(word)
stor_lst.append(
store(
robtex_search,
engineitem,
store_host=True,
store_ip=True,
)
)
except Exception as e:
show_default_error_message(engineitem, word, e)
elif engineitem == 'rocketreach':
try:
rocketreach_search = rocketreach.SearchRocketReach(word, limit)
stor_lst.append(store(rocketreach_search, engineitem, store_links=True, store_emails=True))
except Exception as e:
if isinstance(e, MissingKey):
if not args.quiet:
print(f'A Missing Key error occurred in RocketReach: {e}')
else:
print(f'An exception has occurred in RocketReach: {e}')
elif engineitem == 'securityscorecard':
try:
securityscorecard_search = securityscorecard.SearchSecurityScorecard(word)
stor_lst.append(
store(
securityscorecard_search,
engineitem,
store_host=True,
store_ip=True,
store_interestingurls=True,
store_asns=True,
)
)
except Exception as e:
if isinstance(e, MissingKey):
print(MissingKey('SecurityScorecard'))
else:
print(f'An exception has occurred in SecurityScorecard search: {e}')
elif engineitem == 'securityTrails':
try:
securitytrails_search = securitytrailssearch.SearchSecuritytrail(word)
stor_lst.append(
store(
securitytrails_search,
engineitem,
store_host=True,
store_ip=True,
)
)
except Exception as e:
if isinstance(e, MissingKey):
if not args.quiet:
print(f'A Missing Key error occurred in Security Trails: {e}')
elif engineitem == 'shodan':
try:
shodan_search = shodansearch.SearchShodan()
# For normal module usage, we need to create a wrapper that works with the store function
class ShodanWrapper:
    def __init__(self, domain):
        self.word = domain
        self.hosts = set()
        self.shodan = shodan_search
    async def process(self, use_proxy: bool = False):
        import socket
        try:
            # Resolve domain to IP and search in Shodan
            ip = socket.gethostbyname(self.word)
            print(f'\tSearching Shodan for {ip}')
            result = await self.shodan.search_ip(ip)
            if ip in result and isinstance(result[ip], dict):
                # Add the IP as a host for consistency with other modules
                self.hosts.add(ip)
                for host in result[ip].get('hostnames', []):
                    self.hosts.add(host)
                print(f'Found Shodan data for {ip}')
            elif ip in result and isinstance(result[ip], str):
                print(f'{ip}: {result[ip]}')
        except Exception as e:
            print(f'Error in Shodan search: {e}')
    async def get_hostnames(self):
        return list(self.hosts)
shodan_wrapper = ShodanWrapper(word)
stor_lst.append(store(shodan_wrapper, engineitem, store_host=True))
except Exception as e:
if isinstance(e, MissingKey):
if not args.quiet:
print(f'A Missing Key error occurred in Shodan search: {e}')
else:
print(f'An exception has occurred in Shodan search: {e}')
elif engineitem == 'subdomaincenter':
try:
subdomaincenter_search = subdomaincenter.SubdomainCenter(word)
stor_lst.append(store(subdomaincenter_search, engineitem, store_host=True))
except ConnectionError as ce:
if not args.quiet:
print(f'Network connection error while accessing SubdomainCenter: {ce}')
except TimeoutError as te:
if not args.quiet:
print(f'Request to SubdomainCenter timed out: {te}')
except ValueError as ve:
if not args.quiet:
print(f'SubdomainCenter returned invalid or unexpected data: {ve}')
except KeyError as ke:
if not args.quiet:
print(f'Unexpected response structure from SubdomainCenter (missing key): {ke}')
except Exception as e:
if not args.quiet:
print(f'Unexpected error occurred in SubdomainCenter module: {e}')
elif engineitem == 'subdomainfinderc99':
try:
subdomainfinderc99_search = subdomainfinderc99.SearchSubdomainfinderc99(word)
stor_lst.append(store(subdomainfinderc99_search, engineitem, store_host=True))
except Exception as e:
if isinstance(e, MissingKey):
if not args.quiet:
print(f'A Missing Key error occurred in Subdomainfinderc99 search: {e}')
else:
print(f'An exception has occurred in Subdomainfinderc99 search: {e}')
elif engineitem == 'thc':
try:
thc_search = thc.SearchThc(word)
stor_lst.append(store(thc_search, engineitem, store_host=True))
except Exception as e:
show_default_error_message(engineitem, word, e)
elif engineitem == 'threatcrowd':
try:
threatcrowd_search = threatcrowd.SearchThreatcrowd(word)
stor_lst.append(
store(
threatcrowd_search,
engineitem,
store_host=True,
store_ip=True,
)
)
except Exception as e:
show_default_error_message(engineitem, word, e)
elif engineitem == 'tomba':
try:
tomba_search = tombasearch.SearchTomba(word, limit, start)
stor_lst.append(
store(
tomba_search,
engineitem,
store_host=True,
store_emails=True,
)
)
except Exception as e:
if isinstance(e, MissingKey):
if not args.quiet:
print(f'A Missing Key error occurred in Tomba: {e}')
elif engineitem == 'urlscan':
try:
urlscan_search = urlscan.SearchUrlscan(word)
stor_lst.append(
store(
urlscan_search,
engineitem,
store_host=True,
store_ip=True,
store_interestingurls=True,
store_asns=True,
)
)
except ConnectionError as ce:
if not args.quiet:
print(f'Network connection error while accessing Urlscan: {ce}')
except TimeoutError as te:
if not args.quiet:
print(f'Request to Urlscan timed out: {te}')
except ValueError as ve:
if not args.quiet:
print(f'Urlscan returned invalid or unexpected data: {ve}')
except KeyError as ke:
if not args.quiet:
print(f'Unexpected response structure from Urlscan (missing key): {ke}')
except Exception as e:
if not args.quiet:
print(f'Unexpected error occurred in Urlscan module: {e}')
elif engineitem == 'venacus':
try:
venacus_search = venacussearch.SearchVenacus(word=word, limit=limit, offset_doc=start)
stor_lst.append(
store(
venacus_search,
engineitem,
store_emails=True,
store_ip=True,
store_people=True,
store_interestingurls=True,
)
)
except Exception as e:
if isinstance(e, MissingKey):
if not args.quiet:
print(f'A Missing Key error occurred in venacus search: {e}')
else:
print(f'An exception has occurred in venacus search: {e}')
elif engineitem == 'virustotal':
try:
virustotal_search = virustotal.SearchVirustotal(word)
stor_lst.append(store(virustotal_search, engineitem, store_host=True))
except Exception as e:
if isinstance(e, MissingKey):
if not args.quiet:
print(f'A Missing Key error occurred in virustotal search: {e}')
elif engineitem == 'waybackarchive':
try:
waybackarchive_search = waybackarchive.SearchWaybackarchive(word)
stor_lst.append(
store(
waybackarchive_search,
engineitem,
store_host=True,
)
)
except Exception as e:
show_default_error_message(engineitem, word, e)
elif engineitem == 'whoisxml':
try:
whoisxml_search = whoisxml.SearchWhoisXML(word)
stor_lst.append(store(whoisxml_search, engineitem, store_host=True))
except Exception as e:
if isinstance(e, MissingKey):
if not args.quiet:
print(f'A Missing Key error occurred in whoisxml search: {e}')
else:
print(f'An exception has occurred in WhoisXML search: {e}')
elif engineitem == 'windvane':
try:
windvane_search = windvane.SearchWindvane(word)
stor_lst.append(
store(
windvane_search,
engineitem,
store_host=True,
store_ip=True,
store_emails=True,
)
)
except Exception as e:
show_default_error_message(engineitem, word, e)
elif engineitem == 'yahoo':
try:
yahoo_search = yahoosearch.SearchYahoo(word, limit)
stor_lst.append(
store(
yahoo_search,
engineitem,
store_host=True,
store_emails=True,
)
)
except ConnectionError as ce:
if not args.quiet:
print(f'Network connection error while accessing Yahoo: {ce}')
except TimeoutError as te:
if not args.quiet:
print(f'Request to Yahoo timed out: {te}')
except ValueError as ve:
if not args.quiet:
print(f'Yahoo returned invalid or unexpected data: {ve}')
except KeyError as ke:
if not args.quiet:
print(f'Unexpected response structure from Yahoo (missing key): {ke}')
except Exception as e:
if not args.quiet:
print(f'Unexpected error occurred in Yahoo module: {e}')
elif engineitem == 'zoomeye':
try:
zoomeye_search = zoomeyesearch.SearchZoomEye(word, limit)
stor_lst.append(
store(
zoomeye_search,
engineitem,
store_host=True,
store_emails=True,
store_ip=True,
store_interestingurls=True,
store_asns=True,
)
)
except Exception as e:
if isinstance(e, MissingKey):
if not args.quiet:
print(f'A Missing Key error occurred in zoomeye: {e}')
elif rest_args is not None:
try:
rest_args.dns_brute
except AttributeError:
print('\n[!] Invalid source.\n')
sys.exit(1)
else:
# Print which engines aren't supported
unsupported_engines = set(engines) - set(Core.get_supportedengines())
if unsupported_engines:
print(f'The following engines are not supported: {unsupported_engines}')
print('\n[!] Invalid source.\n')
sys.exit(1)
async def worker(queue):
    while True:
        # Get a "work item" out of the queue.
        stor = await queue.get()
        try:
            await stor
            # Notify the queue that the "work item" has been processed.
            queue.task_done()
        except Exception:
            print('\n An error occurred while processing a "work item".\n')
            queue.task_done()
async def handler(lst):
    queue: asyncio.Queue[Awaitable[Any]] = asyncio.Queue()
    for stor_method in lst:
        # enqueue the coroutines
        queue.put_nowait(stor_method)
    # Create three worker tasks to process the queue concurrently.
    tasks = []
    for _ in range(3):
        task = asyncio.create_task(worker(queue))
        tasks.append(task)
    # Wait until the queue is fully processed.
    await queue.join()
    # Cancel our worker tasks.
    for task in tasks:
        task.cancel()
    # Wait until all worker tasks are cancelled.
    await asyncio.gather(*tasks, return_exceptions=True)
await handler(lst=stor_lst)
return_ips: list = []
if rest_args is not None and len(rest_filename) == 0 and rest_args.dns_brute is False:
# The user is calling through the REST API and does not want the output saved to a file.
# Cast each IP to str so the REST API can serialize it.
return_ips.extend([str(ip) for ip in sorted([netaddr.IPAddress(ip.strip()) for ip in set(all_ip)])])
# return list(set(all_emails)), return_ips, full, '', ''
all_hosts = [host.replace('www.', '') for host in all_hosts if host.replace('www.', '') in all_hosts]
all_hosts = list(sorted(set(all_hosts)))
return (
total_asns,
interesting_urls,
twitter_people_list_tracker,
linkedin_people_list_tracker,
linkedin_links_tracker,
all_urls,
all_ip,
all_emails,
all_hosts,
)
# Check to see if all_emails and all_hosts are defined.
try:
all_emails
except NameError:
print('\n\n[!] No emails found because all_emails is not defined.\n\n ')
sys.exit(1)
try:
all_hosts
except NameError:
print('\n\n[!] No hosts found because all_hosts is not defined.\n\n ')
sys.exit(1)
# Results
if len(total_asns) > 0:
print_section(f'\n[*] ASNS found: {len(total_asns)}', total_asns, '--------------------')
total_asns = sorted_unique(total_asns)
if len(interesting_urls) > 0:
print_section(f'\n[*] Interesting Urls found: {len(interesting_urls)}', interesting_urls, '--------------------')
interesting_urls = sorted_unique(interesting_urls)
if len(twitter_people_list_tracker) == 0 and 'twitter' in engines:
print('\n[*] No Twitter users found.\n\n')
elif len(twitter_people_list_tracker) >= 1:
print_section(
'\n[*] Twitter Users found: ' + str(len(twitter_people_list_tracker)),
twitter_people_list_tracker,
'---------------------',
)
twitter_people_list_tracker = sorted_unique(twitter_people_list_tracker)
print_linkedin_sections(engines, linkedin_people_list_tracker, linkedin_links_tracker)
linkedin_people_list_tracker = sorted_unique(linkedin_people_list_tracker)
linkedin_links_tracker = sorted_unique(linkedin_links_tracker)
length_urls = len(all_urls)
if length_urls == 0:
if len(engines) >= 1 and 'trello' in engines:
print('\n[*] No Trello URLs found.')
else:
total = length_urls
print_section('\n[*] Trello URLs found: ' + str(total), all_urls, '--------------------')
all_urls = sorted_unique(all_urls)
if len(all_ip) == 0:
print('\n[*] No IPs found.')
else:
print('\n[*] IPs found: ' + str(len(all_ip)))
print('-------------------')
# use netaddr as the list may contain ipv4 and ipv6 addresses
ip_list = []
for ip in set(all_ip):
try:
ip = ip.strip()
if len(ip) > 0:
if '/' in ip:
ip_list.append(str(netaddr.IPNetwork(ip)))
else:
ip_list.append(str(netaddr.IPAddress(ip)))
except (netaddr.core.AddrFormatError, ValueError, TypeError) as e:
print(f'An exception has occurred while adding: {ip} to ip_list: {e}')
continue
ip_list = list(sorted(ip_list))
print('\n'.join(map(str, ip_list)))
# Populate host_ip from ip_list for DNS lookup, virtual hosts search, and Shodan search
host_ip = ip_list
if len(all_emails) == 0:
print('\n[*] No emails found.')
else:
print('\n[*] Emails found: ' + str(len(all_emails)))
print('----------------------')
all_emails = sorted(list(set(all_emails)))
print('\n'.join(all_emails))
if len(all_people) == 0:
print('\n[*] No people found.')
else:
print('\n[*] People found: ' + str(len(all_people)))
print('----------------------')
for person in all_people:
print(person)
if len(all_hosts) == 0:
print('\n[*] No hosts found.\n\n')
else:
db = stash.StashManager()
if dnsresolve is None or len(final_dns_resolver_list) > 0:
temp = set()
for host in full:
if ':' in host:
# TODO parse addresses and sort them as they are IPs
subdomain, addr = host.split(':', 1)
if subdomain.endswith(word):
temp.add(subdomain + ':' + addr)
continue
if host.endswith(word):
if host[:4] == 'www.':
if host[4:] in all_hosts or host[4:] in full:
temp.add(host[4:])
continue
temp.add(host)
full = list(sorted(temp))
full.sort(key=lambda el: el.split(':')[0])
print('\n[*] Hosts found: ' + str(len(full)))
print('---------------------')
for host in full:
print(host)
try:
if ':' in host:
_, addr = host.split(':', 1)
await db.store(word, addr, 'ip', 'DNS-resolver')
except (OSError, RuntimeError, ValueError, TypeError) as e:
print(f'An exception has occurred while attempting to insert: {host} IP into DB: {e}')
continue
else:
all_hosts = [host.replace('www.', '') for host in all_hosts if host.replace('www.', '') in all_hosts]
all_hosts = list(sorted(set(all_hosts)))
print('\n[*] Hosts found: ' + str(len(all_hosts)))
print('---------------------')
for host in all_hosts:
print(host)
# DNS brute force
if dnsbrute and dnsbrute[0] is True:
print('\n[*] Starting DNS brute force.')
dns_force = dnssearch.DnsForce(word, final_dns_resolver_list, verbose=True)
resolved_pair, hosts, ips = await dns_force.run()
# If the REST API is being used, return the found hosts
if dnsbrute[1]:
return resolved_pair
db = stash.StashManager()
temp = set()
for host in resolved_pair:
if ':' in host:
# TODO parse addresses and sort them as they are IPs
subdomain, addr = host.split(':', 1)
if subdomain.endswith(word):
# Append to full, so it's within JSON/XML at the end if output file is requested
if host not in full:
full.append(host)
temp.add(subdomain + ':' + addr)
if host not in all_hosts:
all_hosts.append(host)
continue
if host.endswith(word):
if host[:4] == 'www.':
if host[4:] in all_hosts or host[4:] in full:
continue
if host not in full:
full.append(host)
temp.add(host)
if host not in all_hosts:
all_hosts.append(host)
print('\n[*] Hosts found after DNS brute force:')
for sub in temp:
print(sub)
await db.store_all(word, list(sorted(temp)), 'host', 'dns_bruteforce')
takeover_results = dict()
# TakeOver Checking
if takeover_status:
print('\n[*] Performing subdomain takeover check')
print('\n[*] Subdomain Takeover checking IS ACTIVE RECON')
search_take = takeover.TakeOver(all_hosts)
await search_take.populate_fingerprints()
await search_take.process(proxy=use_proxy)
takeover_results = await search_take.get_takeover_results()
# DNS reverse lookup
dnsrev: list = []
# print(f'DNSlookup: {dnslookup}')
if dnslookup is True:
print('\n[*] Starting active queries for DNSLookup.')
# reverse each iprange in a separate task
__reverse_dns_tasks: dict = {}
for entry in host_ip:
__ip_range = dnssearch.serialize_ip_range(ip=entry, netmask='24')
if __ip_range and __ip_range not in set(__reverse_dns_tasks.keys()):
print('\n[*] Performing reverse lookup on ' + __ip_range)
__reverse_dns_tasks[__ip_range] = asyncio.create_task(
dnssearch.reverse_all_ips_in_range(
iprange=__ip_range,
callback=dnssearch.generate_postprocessing_callback(
target=word, local_results=dnsrev, overall_results=full
),
nameservers=(final_dns_resolver_list if len(final_dns_resolver_list) > 0 else None),
)
)
# nameservers=list(map(str, dnsserver.split(','))) if dnsserver else None))
# run all the reversing tasks concurrently
await asyncio.gather(*__reverse_dns_tasks.values())
print('\n[*] Hosts found after reverse lookup (in target domain):')
print('--------------------------------------------------------')
for xh in dnsrev:
print(xh)
# Screenshots
screenshot_tups = []
if len(args.screenshot) > 0:
screen_shotter = ScreenShotter(args.screenshot)
path_exists = screen_shotter.verify_path()
# Verify the path exists; if it does not, offer to create it, and skip screenshots if the user declines
if path_exists:
await screen_shotter.verify_installation()
print(f'\nScreenshots can be found in: {screen_shotter.output}{screen_shotter.slash}')
start_time = time.perf_counter()
print('Filtering domains for ones we can reach')
if dnsresolve is None or len(final_dns_resolver_list) > 0:
unique_resolved_domains = {url.split(':')[0] for url in full if ':' in url and 'www.' not in url}
else:
# These hosts are not actually resolved in this case, which is not ideal.
# Always use DNS resolving when taking screenshots.
print('NOTE: for future use cases, you should only use screenshotting in tandem with DNS resolving')
unique_resolved_domains = set(all_hosts)
if len(unique_resolved_domains) > 0:
# First filter out ones that didn't resolve
print('Attempting to visit unique resolved domains, this is ACTIVE RECON')
async with Pool(10) as pool:
results = await pool.map(screen_shotter.visit, list(unique_resolved_domains))
# Filter out domains that we couldn't connect to
unique_resolved_domains_list = list(sorted({tup[0] for tup in results if len(tup[1]) > 0}))
async with Pool(3) as pool:
print(f'Length of unique resolved domains: {len(unique_resolved_domains_list)} chunking now!\n')
# If you have the resources, you could make the function faster by increasing the chunk number
chunk_number = 14
for chunk in screen_shotter.chunk_list(unique_resolved_domains_list, chunk_number):
try:
screenshot_tups.extend(await pool.map(screen_shotter.take_screenshot, chunk))
except Exception as ee:
print(f'An exception has occurred while mapping: {ee}')
end = time.perf_counter()
# Format the elapsed time as hh:mm:ss
total = int(end - start_time)
mon, sec = divmod(total, 60)
hr, mon = divmod(mon, 60)
total_time = f'{hr:02d}:{mon:02d}:{sec:02d}'
print(f'Finished taking screenshots in {total_time} (hh:mm:ss)')
print('[+] Note there may be leftover chrome processes you may have to kill manually\n')
# Shodan
shodanres = []
if shodan is True:
print('[*] Searching Shodan. ')
try:
for ip in host_ip:
try:
print('\tSearching for ' + ip)
shodan_search = shodansearch.SearchShodan()
shodandict = await shodan_search.search_ip(ip)
await asyncio.sleep(5)
# Check if the result is a string (error message)
if isinstance(shodandict[ip], str):
print(f'{ip}: {shodandict[ip]}')
continue
# Process the results if it's a dictionary
if isinstance(shodandict[ip], dict):
rowdata = []
for key, value in shodandict[ip].items():
if isinstance(value, int):
value = str(value)
if isinstance(value, list):
value = ', '.join(map(str, value))
rowdata.append(value)
shodanres.append(rowdata)
print(ujson.dumps(shodandict[ip], indent=4, sort_keys=True))
print('\n')
except Exception as ip_error:
print(f'[SHODAN-error] Error searching {ip}: {ip_error}')
continue
except Exception as e:
print(f'[!] An error occurred with Shodan: {e} ')
else:
pass
if filename != '':
print('\n[*] Reporting started.')
try:
if len(rest_filename) == 0:
filename = filename.rsplit('.', 1)[0] + '.xml'
else:
filename = 'theHarvester/app/static/' + rest_filename.rsplit('.', 1)[0] + '.xml'
# TODO use aiofiles if user is using rest api
# XML REPORT SECTION
with open(filename, 'w+') as file:
file.write('<?xml version="1.0" encoding="UTF-8"?><theHarvester>')
sanitized_args = [sanitize_for_xml(f'"{arg}"' if ' ' in arg else arg) for arg in sys.argv[1:]]
file.write('<cmd>' + ' '.join(sanitized_args) + '</cmd>')
for x in all_emails:
file.write('<email>' + sanitize_for_xml(x) + '</email>')
for x in full:
host, ip = x.split(':', 1) if ':' in x else (x, '')
if ip and len(ip) > 3:
file.write(f'<host><ip>{sanitize_for_xml(ip)}</ip><hostname>{sanitize_for_xml(host)}</hostname></host>')
else:
file.write(f'<host>{sanitize_for_xml(host)}</host>')
for x in vhost:
host, ip = x.split(':', 1) if ':' in x else (x, '')
if ip and len(ip) > 3:
file.write(
f'<vhost><ip>{sanitize_for_xml(ip)} </ip><hostname>{sanitize_for_xml(host)}</hostname></vhost>'
)
else:
file.write(f'<vhost>{sanitize_for_xml(host)}</vhost>')
# TODO add Shodan output into XML report
file.write('</theHarvester>')
print('[*] XML File saved.')
except (OSError, ValueError, TypeError, UnicodeEncodeError) as error:
print(f'[!] An error occurred while saving the XML file: {error}')
try:
# JSON REPORT SECTION
filename = filename.rsplit('.', 1)[0] + '.json'
# create dict with values for JSON output
json_dict: dict = dict()
# start by adding the command line arguments
json_dict['cmd'] = ' '.join([f'"{arg}"' if ' ' in arg else arg for arg in sys.argv[1:]])
# ip_list should already exist at this point; the locals() check is only a safeguard
if 'ip_list' in locals():
if all_ip and len(all_ip) >= 1 and ip_list and len(ip_list) > 0:
json_dict['ips'] = ip_list
if len(all_emails) > 0:
json_dict['emails'] = all_emails
if dnsresolve is None or (len(final_dns_resolver_list) > 0 and len(full) > 0):
json_dict['hosts'] = full
elif len(all_hosts) > 0:
json_dict['hosts'] = all_hosts
else:
json_dict['hosts'] = []
if vhost and len(vhost) > 0:
json_dict['vhosts'] = vhost
if len(interesting_urls) > 0:
json_dict['interesting_urls'] = interesting_urls
if len(all_urls) > 0:
json_dict['trello_urls'] = all_urls
if len(total_asns) > 0:
json_dict['asns'] = total_asns
if len(twitter_people_list_tracker) > 0:
json_dict['twitter_people'] = twitter_people_list_tracker
if len(linkedin_people_list_tracker) > 0:
json_dict['linkedin_people'] = linkedin_people_list_tracker
if len(linkedin_links_tracker) > 0:
json_dict['linkedin_links'] = linkedin_links_tracker
if len(all_people) > 0:
json_dict['people'] = all_people
if takeover_status and len(takeover_results) > 0:
json_dict['takeover_results'] = takeover_results
json_dict['shodan'] = shodanres
with open(filename, 'w+') as fp:
dumped_json = ujson.dumps(json_dict, sort_keys=True)
fp.write(dumped_json)
print('[*] JSON File saved.')
except (OSError, ValueError, TypeError, UnicodeEncodeError) as er:
print(f'[!] An error occurred while saving the JSON file: {er} ')
print('\n\n')
# Enhanced code block for API Endpoint scanning feature
if args.api_scan or 'api_endpoints' in engines:
try:
# Define a default wordlist if none is specified
wordlist = args.wordlist if args.wordlist else str(DATA_DIR / 'wordlists' / 'api_endpoints.txt')
if not await anyio.Path(wordlist).exists():
print(f'\n[!] Wordlist not found: {wordlist}')
print('Creating a basic API wordlist for scanning...')
# Create a default simple API endpoint list
basic_endpoints = [
'/api',
'/api/v1',
'/api/v2',
'/api/v3',
'/graphql',
'/swagger',
'/docs',
'/redoc',
'/swagger-ui',
'/openapi.json',
'/api-docs',
'/rest',
'/ws',
'/swagger-ui.html',
'/health',
'/status',
'/metrics',
'/actuator',
'/debug',
]
temp_wordlist = str(DAT
SYMBOL INDEX (777 symbols across 89 files)
FILE: tests/discovery/test_baidusearch.py
class TestBaiduSearch (line 6) | class TestBaiduSearch:
method test_process_and_parsing (line 8) | async def test_process_and_parsing(self, monkeypatch):
method test_pagination_limit_exclusive (line 50) | async def test_pagination_limit_exclusive(self, monkeypatch):
FILE: tests/discovery/test_censys.py
class _ProxyConnector (line 9) | class _ProxyConnector:
method from_url (line 11) | def from_url(*_args, **_kwargs):
class _FakeQuery (line 21) | class _FakeQuery:
method __init__ (line 22) | def __init__(self, pages):
method __iter__ (line 25) | def __iter__(self):
function test_missing_key_raises (line 30) | async def test_missing_key_raises(monkeypatch) -> None:
function test_search_uses_documented_pagination_and_fields (line 38) | async def test_search_uses_documented_pagination_and_fields(monkeypatch)...
function test_search_respects_limit_across_page_data (line 77) | async def test_search_respects_limit_across_page_data(monkeypatch) -> None:
FILE: tests/discovery/test_certspotter.py
class TestCertspotter (line 17) | class TestCertspotter(object):
method domain (line 19) | def domain() -> str:
class TestCertspotterSearch (line 24) | class TestCertspotterSearch(object):
method test_api (line 26) | async def test_api(self) -> None:
method test_search (line 33) | async def test_search(self) -> None:
FILE: tests/discovery/test_criminalip.py
function test_parser_handles_missing_legacy_fields (line 9) | async def test_parser_handles_missing_legacy_fields(monkeypatch) -> None:
function test_do_search_uses_v2_report_endpoint (line 53) | async def test_do_search_uses_v2_report_endpoint(monkeypatch) -> None:
FILE: tests/discovery/test_githubcode.py
class TestSearchGithubCode (line 9) | class TestSearchGithubCode:
class OkResponse (line 10) | class OkResponse:
method __init__ (line 14) | def __init__(self):
class FailureResponse (line 29) | class FailureResponse:
method __init__ (line 30) | def __init__(self):
class RetryResponse (line 34) | class RetryResponse:
method __init__ (line 35) | def __init__(self):
class MalformedResponse (line 39) | class MalformedResponse:
method __init__ (line 40) | def __init__(self):
method test_missing_key (line 57) | async def test_missing_key(self):
method test_fragments_from_response (line 63) | async def test_fragments_from_response(self):
method test_invalid_fragments_from_response (line 73) | async def test_invalid_fragments_from_response(self):
method test_next_page (line 82) | async def test_next_page(self):
method test_last_page (line 89) | async def test_last_page(self):
method test_infinite_loop_fix_page_zero (line 96) | async def test_infinite_loop_fix_page_zero(self):
method test_infinite_loop_fix_page_nonzero (line 111) | async def test_infinite_loop_fix_page_nonzero(self):
method test_infinite_loop_fix_old_vs_new_condition (line 126) | async def test_infinite_loop_fix_old_vs_new_condition(self):
FILE: tests/discovery/test_githubcode_additions.py
class TestSearchGithubCodeProcess (line 8) | class TestSearchGithubCodeProcess:
method test_process_stops_after_max_retries (line 10) | async def test_process_stops_after_max_retries(self, monkeypatch):
method test_process_stops_on_error_result (line 36) | async def test_process_stops_on_error_result(self, monkeypatch):
method test_process_breaks_on_same_page_pagination (line 59) | async def test_process_breaks_on_same_page_pagination(self, monkeypatch):
FILE: tests/discovery/test_otx.py
class TestOtx (line 16) | class TestOtx(object):
method domain (line 18) | def domain() -> str:
method test_search (line 22) | async def test_search(self) -> None:
FILE: tests/discovery/test_rocketreach.py
class _ProxyConnector (line 9) | class _ProxyConnector:
method from_url (line 11) | def from_url(*_args, **_kwargs):
function test_missing_key_raises (line 22) | async def test_missing_key_raises(monkeypatch) -> None:
function test_do_search_uses_people_data_endpoint_and_start_pagination (line 29) | async def test_do_search_uses_people_data_endpoint_and_start_pagination(...
function test_do_search_stops_on_throttling_message (line 98) | async def test_do_search_stops_on_throttling_message(monkeypatch) -> None:
FILE: tests/discovery/test_shodan_engine.py
class TestShodanEngine (line 8) | class TestShodanEngine:
method test_shodan_engine_processes_without_work_item_error_and_yields_hostnames (line 10) | async def test_shodan_engine_processes_without_work_item_error_and_yie...
FILE: tests/discovery/test_thc.py
class TestThcApi (line 28) | class TestThcApi:
method test_api_subdomains_download_endpoint_responds (line 32) | async def test_api_subdomains_download_endpoint_responds(self) -> None:
method test_api_subdomains_returns_text_format (line 43) | async def test_api_subdomains_returns_text_format(self) -> None:
method test_api_cli_subdomain_endpoint (line 55) | async def test_api_cli_subdomain_endpoint(self) -> None:
method test_api_returns_rate_limit_headers (line 66) | async def test_api_returns_rate_limit_headers(self) -> None:
class TestThcSubdomainSearch (line 81) | class TestThcSubdomainSearch:
method domain (line 85) | def domain() -> str:
method small_domain (line 89) | def small_domain() -> str:
method test_search_returns_set (line 93) | async def test_search_returns_set(self) -> None:
method test_search_finds_subdomains (line 104) | async def test_search_finds_subdomains(self) -> None:
method test_search_results_contain_target_domain (line 115) | async def test_search_results_contain_target_domain(self) -> None:
method test_search_no_duplicates (line 127) | async def test_search_no_duplicates(self) -> None:
class TestThcEdgeCases (line 142) | class TestThcEdgeCases:
method test_search_nonexistent_domain (line 146) | async def test_search_nonexistent_domain(self) -> None:
method test_search_empty_domain (line 159) | async def test_search_empty_domain(self) -> None:
method test_search_special_characters_domain (line 172) | async def test_search_special_characters_domain(self) -> None:
method test_search_unicode_domain (line 185) | async def test_search_unicode_domain(self) -> None:
method test_search_subdomain_as_input (line 198) | async def test_search_subdomain_as_input(self) -> None:
class TestThcProxy (line 212) | class TestThcProxy:
method domain (line 216) | def domain() -> str:
method test_process_accepts_proxy_parameter (line 220) | async def test_process_accepts_proxy_parameter(self) -> None:
method test_proxy_attribute_is_set (line 231) | async def test_proxy_attribute_is_set(self) -> None:
class TestThcInitialization (line 240) | class TestThcInitialization:
method test_init_sets_word (line 243) | def test_init_sets_word(self) -> None:
method test_init_creates_empty_results (line 249) | def test_init_creates_empty_results(self) -> None:
method test_init_proxy_default_false (line 255) | def test_init_proxy_default_false(self) -> None:
method test_init_has_rate_limit_settings (line 260) | def test_init_has_rate_limit_settings(self) -> None:
method test_class_has_required_methods (line 268) | def test_class_has_required_methods(self) -> None:
class TestThcResponseFormat (line 282) | class TestThcResponseFormat:
method domain (line 286) | def domain() -> str:
method test_hostnames_are_strings (line 290) | async def test_hostnames_are_strings(self) -> None:
method test_hostnames_are_valid_format (line 302) | async def test_hostnames_are_valid_format(self) -> None:
method test_hostnames_are_lowercase (line 316) | async def test_hostnames_are_lowercase(self) -> None:
class TestThcIntegration (line 332) | class TestThcIntegration:
method test_module_can_be_imported (line 336) | async def test_module_can_be_imported(self) -> None:
method test_search_class_exists (line 342) | async def test_search_class_exists(self) -> None:
method test_compatible_with_store_function (line 348) | async def test_compatible_with_store_function(self) -> None:
FILE: tests/lib/test_core.py
function mock_environ (line 15) | def mock_environ(monkeypatch, tmp_path: Path):
function mock_read_text (line 19) | def mock_read_text(mocked: dict[Path, str | Exception]):
function test_read_config_searches_config_dirs (line 40) | def test_read_config_searches_config_dirs(
function test_read_config_copies_default_to_home (line 57) | def test_read_config_copies_default_to_home(name: str, capsys):
class DummyResponse (line 79) | class DummyResponse:
method __init__ (line 80) | def __init__(self, text_value: str = 'response-text', json_value: Any ...
method __aenter__ (line 84) | async def __aenter__(self):
method __aexit__ (line 87) | async def __aexit__(self, exc_type, exc, tb):
method text (line 90) | async def text(self):
method json (line 93) | async def json(self):
class DummySession (line 97) | class DummySession:
method __init__ (line 100) | def __init__(self, *, headers=None, timeout=None, connector=None):
method __aenter__ (line 108) | async def __aenter__(self):
method __aexit__ (line 111) | async def __aexit__(self, exc_type, exc, tb):
method request (line 115) | def request(self, method: str, url: str, **kwargs):
method get (line 119) | def get(self, url: str, **kwargs):
method post (line 123) | def post(self, url: str, **kwargs):
method close (line 127) | async def close(self):
function reset_dummy_sessions (line 131) | def reset_dummy_sessions() -> None:
function fake_sleep (line 135) | async def fake_sleep(_seconds: float) -> None:
function test_api_keys_yaml_is_in_sync_with_core_accessors (line 139) | def test_api_keys_yaml_is_in_sync_with_core_accessors():
function test_api_key_accessors_delegate_to_shared_mapping (line 167) | def test_api_key_accessors_delegate_to_shared_mapping(monkeypatch, acces...
function test_fetch_creates_session_with_default_headers (line 186) | async def test_fetch_creates_session_with_default_headers(monkeypatch) -...
function test_fetch_uses_http_proxy_when_enabled (line 207) | async def test_fetch_uses_http_proxy_when_enabled(monkeypatch) -> None:
function test_post_fetch_decodes_string_payload_and_posts_params (line 231) | async def test_post_fetch_decodes_string_payload_and_posts_params(monkey...
function test_post_fetch_proxy_branch_uses_get_with_http_proxy (line 255) | async def test_post_fetch_proxy_branch_uses_get_with_http_proxy(monkeypa...
FILE: tests/lib/test_output.py
function test_sorted_unique_sorts_and_deduplicates (line 7) | def test_sorted_unique_sorts_and_deduplicates() -> None:
function test_print_linkedin_sections_prints_links_when_present (line 11) | def test_print_linkedin_sections_prints_links_when_present(capsys) -> None:
function test_print_linkedin_sections_prints_people_and_links (line 26) | def test_print_linkedin_sections_prints_people_and_links(capsys) -> None:
FILE: tests/test_hackertarget_apikey.py
class TestHackerTargetApiKey (line 6) | class TestHackerTargetApiKey:
method test_do_search_with_apikey (line 9) | async def test_do_search_with_apikey(self, monkeypatch):
method test_do_search_without_apikey (line 28) | async def test_do_search_without_apikey(self, monkeypatch):
FILE: tests/test_mojeek.py
class TestMojeekSearch (line 4) | class TestMojeekSearch:
method test_process_and_parsing (line 7) | async def test_process_and_parsing(self, monkeypatch):
method test_pagination_limit (line 42) | async def test_pagination_limit(self, monkeypatch):
FILE: tests/test_myparser.py
class TestMyParser (line 9) | class TestMyParser(object):
method test_emails (line 11) | async def test_emails(self) -> None:
FILE: tests/test_security.py
class TestCORSConfiguration (line 12) | class TestCORSConfiguration:
method test_cors_does_not_allow_credentials_with_wildcard_origins (line 15) | def test_cors_does_not_allow_credentials_with_wildcard_origins(self):
method test_cors_restricts_http_methods (line 44) | def test_cors_restricts_http_methods(self):
class TestXMLInjectionPrevention (line 75) | class TestXMLInjectionPrevention:
method test_sanitize_for_xml_escapes_special_characters (line 78) | def test_sanitize_for_xml_escapes_special_characters(self):
method test_sanitize_for_xml_prevents_xml_entity_injection (line 100) | def test_sanitize_for_xml_prevents_xml_entity_injection(self):
method test_command_line_args_are_sanitized_in_xml_output (line 117) | def test_command_line_args_are_sanitized_in_xml_output(self):
class TestInformationDisclosure (line 139) | class TestInformationDisclosure:
method client (line 143) | def client(self):
method test_api_does_not_expose_traceback_in_error_responses (line 149) | def test_api_does_not_expose_traceback_in_error_responses(self, client):
method test_error_responses_do_not_leak_internal_paths (line 165) | def test_error_responses_do_not_leak_internal_paths(self, client):
method test_debug_mode_does_not_expose_sensitive_info (line 190) | def test_debug_mode_does_not_expose_sensitive_info(self, client, monke...
class TestPathTraversalPrevention (line 206) | class TestPathTraversalPrevention:
method test_sanitize_filename_removes_path_components (line 209) | def test_sanitize_filename_removes_path_components(self):
method test_sanitize_filename_removes_dangerous_characters (line 236) | def test_sanitize_filename_removes_dangerous_characters(self):
method test_sanitize_filename_prevents_hidden_files (line 263) | def test_sanitize_filename_prevents_hidden_files(self):
method test_filename_sanitization_preserves_safe_filenames (line 276) | def test_filename_sanitization_preserves_safe_filenames(self):
method test_path_traversal_in_file_operations (line 294) | def test_path_traversal_in_file_operations(self):
class TestSecurityBestPractices (line 316) | class TestSecurityBestPractices:
method test_no_hardcoded_secrets_in_code (line 319) | def test_no_hardcoded_secrets_in_code(self):
method test_api_has_rate_limiting (line 356) | def test_api_has_rate_limiting(self):
method test_sensitive_endpoints_require_validation (line 366) | def test_sensitive_endpoints_require_validation(self):
FILE: theHarvester/__main__.py
function sanitize_for_xml (line 84) | def sanitize_for_xml(text: str) -> str:
function sanitize_filename (line 94) | def sanitize_filename(filename: str) -> str:
function start (line 108) | async def start(rest_args: argparse.Namespace | None = None):
function entry_point (line 1881) | async def entry_point() -> None:
FILE: theHarvester/discovery/additional_apis.py
class AdditionalAPIs (line 11) | class AdditionalAPIs:
method __init__ (line 14) | def __init__(self, domain: str, api_keys: dict[str, str] | None = None):
method process (line 41) | async def process(self, proxy: bool = False) -> dict[str, Any]:
method _process_haveibeenpwned (line 60) | async def _process_haveibeenpwned(self, proxy: bool = False):
method _process_leaklookup (line 70) | async def _process_leaklookup(self, proxy: bool = False):
method _process_securityscorecard (line 80) | async def _process_securityscorecard(self, proxy: bool = False):
method _process_builtwith (line 94) | async def _process_builtwith(self, proxy: bool = False):
method _process_shodan (line 110) | async def _process_shodan(self, proxy: bool = False):
method _is_valid_ip (line 161) | def _is_valid_ip(ip_str: str) -> bool:
method get_hosts (line 171) | async def get_hosts(self) -> set[str]:
method get_emails (line 175) | async def get_emails(self) -> set[str]:
FILE: theHarvester/discovery/api_endpoints.py
class EndpointResult (line 26) | class EndpointResult:
method to_dict (line 45) | def to_dict(self) -> dict[str, Any]:
class SearchApiEndpoints (line 50) | class SearchApiEndpoints:
method __init__ (line 55) | def __init__(
method do_search (line 392) | async def do_search(self) -> None:
method _detect_schema (line 434) | async def _detect_schema(self) -> str:
method _check_endpoint_with_semaphore (line 453) | async def _check_endpoint_with_semaphore(self, url: str) -> EndpointRe...
method _load_wordlist (line 458) | def _load_wordlist(self) -> list[str]:
method _check_endpoint (line 482) | async def _check_endpoint(self, url: str) -> EndpointResult | None:
method _get_headers (line 529) | def _get_headers(self) -> dict[str, str]:
method _process_response (line 544) | def _process_response(self, url: str, method: str, response, response_...
method _post_scan_analysis (line 711) | async def _post_scan_analysis(self) -> None:
method get_results_summary (line 729) | def get_results_summary(self) -> dict[str, Any]:
method _get_tech_stack_summary (line 750) | def _get_tech_stack_summary(self) -> dict[str, int]:
method get_detailed_results (line 758) | def get_detailed_results(self) -> list[dict[str, Any]]:
method get_hostnames (line 767) | def get_hostnames(self) -> set[str]:
method get_endpoints (line 771) | def get_endpoints(self) -> set[str]:
method get_found_endpoints (line 775) | def get_found_endpoints(self) -> dict[str, EndpointResult]:
method get_interesting_endpoints (line 779) | def get_interesting_endpoints(self) -> dict[str, EndpointResult]:
method get_auth_required (line 783) | def get_auth_required(self) -> dict[str, EndpointResult]:
method get_api_versions (line 787) | def get_api_versions(self) -> set[str]:
method get_rate_limits (line 791) | def get_rate_limits(self) -> dict[str, EndpointResult]:
method get_methods (line 795) | def get_methods(self) -> set[str]:
method get_status_codes (line 799) | def get_status_codes(self) -> set[int]:
method get_response_sizes (line 803) | def get_response_sizes(self) -> dict[str, int]:
method get_tech_stack (line 807) | def get_tech_stack(self) -> dict[str, list[str]]:
method get_schema_detected (line 811) | def get_schema_detected(self) -> dict[str, dict[str, Any]]:
method export_results (line 815) | def export_results(self, output_file: str | None = None, format: str =...
FILE: theHarvester/discovery/baidusearch.py
class SearchBaidu (line 5) | class SearchBaidu:
method __init__ (line 6) | def __init__(self, word, limit) -> None:
method do_search (line 14) | async def do_search(self) -> None:
method process (line 22) | async def process(self, proxy: bool = False) -> None:
method get_emails (line 26) | async def get_emails(self):
method get_hostnames (line 30) | async def get_hostnames(self):
FILE: theHarvester/discovery/bevigil.py
class SearchBeVigil (line 5) | class SearchBeVigil:
method __init__ (line 6) | def __init__(self, word) -> None:
method do_search (line 16) | async def do_search(self) -> None:
method get_hostnames (line 31) | async def get_hostnames(self) -> set:
method get_interestingurls (line 34) | async def get_interestingurls(self) -> set:
method process (line 37) | async def process(self, proxy: bool = False) -> None:
FILE: theHarvester/discovery/bitbucket.py
class RetryResult (line 14) | class RetryResult(NamedTuple):
class SuccessResult (line 18) | class SuccessResult(NamedTuple):
class ErrorResult (line 24) | class ErrorResult(NamedTuple):
class SearchBitBucket (line 29) | class SearchBitBucket:
method __init__ (line 30) | def __init__(self, word, limit) -> None:
method fragments_from_response (line 52) | async def fragments_from_response(json_data: dict) -> list[str]:
method page_from_response (line 65) | async def page_from_response(page: str, links) -> int | None:
method handle_response (line 76) | async def handle_response(self, response: tuple[str, dict, int, Any]) ...
method next_page_or_end (line 93) | async def next_page_or_end(result: SuccessResult) -> int | None:
method do_search (line 99) | async def do_search(self, page: int) -> tuple[str, dict, int, Any]:
method process (line 109) | async def process(self, proxy: bool = False) -> None:
method get_emails (line 151) | async def get_emails(self):
method get_hostnames (line 159) | async def get_hostnames(self):
FILE: theHarvester/discovery/bravesearch.py
class SearchBrave (line 9) | class SearchBrave:
method __init__ (line 10) | def __init__(self, word, limit):
method do_search (line 22) | async def do_search(self):
method get_emails (line 120) | async def get_emails(self):
method get_hostnames (line 124) | async def get_hostnames(self):
method process (line 128) | async def process(self, proxy=False):
FILE: theHarvester/discovery/bufferoverun.py
class SearchBufferover (line 7) | class SearchBufferover:
method __init__ (line 8) | def __init__(self, word) -> None:
method do_search (line 17) | async def do_search(self) -> None:
method get_hostnames (line 40) | async def get_hostnames(self) -> set:
method get_ips (line 43) | async def get_ips(self) -> set:
method process (line 46) | async def process(self, proxy: bool = False) -> None:
FILE: theHarvester/discovery/builtwith.py
class SearchBuiltWith (line 9) | class SearchBuiltWith:
method __init__ (line 10) | def __init__(self, word: str):
method process (line 26) | async def process(self, proxy: bool = False) -> None:
method _extract_data (line 46) | def _extract_data(self) -> None:
method get_hostnames (line 68) | async def get_hostnames(self) -> set[str]:
method get_tech_stack (line 71) | async def get_tech_stack(self) -> dict:
method get_interesting_urls (line 74) | async def get_interesting_urls(self) -> set[str]:
method get_frameworks (line 77) | async def get_frameworks(self) -> set[str]:
method get_languages (line 80) | async def get_languages(self) -> set[str]:
method get_servers (line 83) | async def get_servers(self) -> set[str]:
method get_cms (line 86) | async def get_cms(self) -> set[str]:
method get_analytics (line 89) | async def get_analytics(self) -> set[str]:
FILE: theHarvester/discovery/censysearch.py
class SearchCensys (line 15) | class SearchCensys:
method __init__ (line 18) | def __init__(self, domain, limit: int = 500) -> None:
method _normalize_emails (line 29) | def _normalize_emails(email_address: object) -> set[str]:
method do_search (line 36) | async def do_search(self) -> None:
method get_hostnames (line 69) | async def get_hostnames(self) -> set:
method get_emails (line 72) | async def get_emails(self) -> set:
method process (line 75) | async def process(self, proxy: bool = False) -> None:
FILE: theHarvester/discovery/certspottersearch.py
class SearchCertspoter (line 4) | class SearchCertspoter:
method __init__ (line 5) | def __init__(self, word) -> None:
method do_search (line 10) | async def do_search(self) -> None:
method get_hostnames (line 35) | async def get_hostnames(self) -> set:
method process (line 38) | async def process(self, proxy: bool = False) -> None:
FILE: theHarvester/discovery/chaos.py
class SearchChaos (line 18) | class SearchChaos:
method __init__ (line 23) | def __init__(self, word) -> None:
method _get_api_key (line 30) | def _get_api_key(self) -> str:
method _safe_parse_json (line 38) | def _safe_parse_json(payload: object) -> dict:
method do_search (line 49) | async def do_search(self) -> None:
method get_hostnames (line 110) | async def get_hostnames(self) -> set:
method process (line 113) | async def process(self, proxy: bool = False) -> None:
FILE: theHarvester/discovery/commoncrawl.py
class SearchCommoncrawl (line 18) | class SearchCommoncrawl:
method __init__ (line 23) | def __init__(self, word) -> None:
method _safe_parse_json_lines (line 30) | def _safe_parse_json_lines(payload: str) -> list:
method _extract_domain_from_url (line 44) | def _extract_domain_from_url(self, url: str) -> str:
method do_search (line 60) | async def do_search(self) -> None:
method get_hostnames (line 123) | async def get_hostnames(self) -> set:
method process (line 126) | async def process(self, proxy: bool = False) -> None:
FILE: theHarvester/discovery/constants.py
function splitter (line 6) | async def splitter(links):
function filter (line 31) | def filter(lst):
function get_delay (line 50) | def get_delay() -> float:
function search (line 55) | async def search(text: str) -> bool:
function google_workaround (line 71) | async def google_workaround(visit_url: str) -> bool | str:
class MissingKeyError (line 111) | class MissingKeyError(Exception):
method __init__ (line 116) | def __init__(self, source: str | None) -> None:
method __str__ (line 122) | def __str__(self) -> str:
FILE: theHarvester/discovery/criminalip.py
class SearchCriminalIP (line 9) | class SearchCriminalIP:
method __init__ (line 10) | def __init__(self, word) -> None:
method _normalize_host (line 20) | def _normalize_host(self, hostname: str | None) -> str | None:
method _add_host (line 40) | def _add_host(self, hostname: str | None, include_root: bool = True) -...
method _add_host_from_url (line 49) | def _add_host_from_url(self, url: str | None) -> None:
method _add_ip (line 60) | def _add_ip(self, ip: str | None) -> None:
method _add_asn (line 64) | def _add_asn(self, asn: str | int | None) -> None:
method _collect_hosts_from_value (line 71) | def _collect_hosts_from_value(self, value) -> None:
method do_search (line 86) | async def do_search(self) -> None:
method parser (line 187) | async def parser(self, jlines):
method get_asns (line 323) | async def get_asns(self) -> set:
method get_hostnames (line 326) | async def get_hostnames(self) -> set:
method get_ips (line 329) | async def get_ips(self) -> set:
method process (line 332) | async def process(self, proxy: bool = False) -> None:
FILE: theHarvester/discovery/crtsh.py
class SearchCrtsh (line 4) | class SearchCrtsh:
method __init__ (line 5) | def __init__(self, word) -> None:
method do_search (line 10) | async def do_search(self) -> list:
method process (line 31) | async def process(self, proxy: bool = False) -> None:
method get_hostnames (line 36) | async def get_hostnames(self) -> list:
FILE: theHarvester/discovery/dnssearch.py
class DnsForce (line 27) | class DnsForce:
method __init__ (line 28) | def __init__(self, domain, dnsserver, verbose: bool = False) -> None:
method run (line 40) | async def run(self):
function serialize_ip_range (line 58) | def serialize_ip_range(ip: str, netmask: str = '24') -> str:
function list_ips_in_network_range (line 89) | def list_ips_in_network_range(iprange: str) -> list[str]:
function reverse_single_ip (line 111) | async def reverse_single_ip(ip: str, resolver: DNSResolver) -> str:
function reverse_all_ips_in_range (line 130) | async def reverse_all_ips_in_range(iprange: str, callback: Callable, nam...
function log_query (line 164) | def log_query(ip: str) -> None:
function log_result (line 182) | def log_result(host: str) -> None:
function generate_postprocessing_callback (line 199) | def generate_postprocessing_callback(target: str, **allhosts: list[str])...
FILE: theHarvester/discovery/duckduckgosearch.py
class SearchDuckDuckGo (line 7) | class SearchDuckDuckGo:
method __init__ (line 8) | def __init__(self, word, limit) -> None:
method do_search (line 20) | async def do_search(self) -> None:
method crawl (line 32) | async def crawl(self, text: str) -> set[str]:
method get_emails (line 77) | async def get_emails(self):
method get_hostnames (line 81) | async def get_hostnames(self):
method process (line 85) | async def process(self, proxy: bool = False) -> None:
FILE: theHarvester/discovery/fofa.py
class SearchFofa (line 19) | class SearchFofa:
method __init__ (line 25) | def __init__(self, word) -> None:
method _get_api_credentials (line 33) | def _get_api_credentials(self) -> tuple[str, str]:
method _safe_parse_json (line 41) | def _safe_parse_json(payload: object) -> dict:
method do_search (line 51) | async def do_search(self) -> None:
method get_hostnames (line 118) | async def get_hostnames(self) -> set:
method get_ips (line 121) | async def get_ips(self) -> set:
method process (line 124) | async def process(self, proxy: bool = False) -> None:
FILE: theHarvester/discovery/fullhuntsearch.py
class SearchFullHunt (line 8) | class SearchFullHunt:
method __init__ (line 113) | def __init__(self, word) -> None:
method _get_headers (line 136) | def _get_headers(self) -> dict[str, str]:
method _fetch_data (line 140) | async def _fetch_data(self, endpoint: str) -> dict[str, Any]:
method add_filter (line 151) | def add_filter(self, filter_name: str, filter_value: str) -> None:
method add_filters (line 167) | def add_filters(self, filters: dict[str, str]) -> None:
method clear_filters (line 179) | def clear_filters(self) -> None:
method _build_query_string (line 183) | def _build_query_string(self) -> str:
method advanced_search (line 198) | async def advanced_search(self) -> dict[str, Any]:
method get_domain_details (line 212) | async def get_domain_details(self) -> dict[str, Any]:
method get_subdomains (line 217) | async def get_subdomains(self) -> dict[str, Any]:
method get_host_details (line 222) | async def get_host_details(self, host: str) -> dict[str, Any]:
method search_tech (line 227) | async def search_tech(self, tech_name: str) -> dict[str, Any]:
method search_service (line 232) | async def search_service(self, service_name: str) -> dict[str, Any]:
method search_port (line 237) | async def search_port(self, port: int) -> dict[str, Any]:
method search_country (line 242) | async def search_country(self, country_code: str) -> dict[str, Any]:
method search_cloud_provider (line 247) | async def search_cloud_provider(self, provider: str) -> dict[str, Any]:
method search_http_status (line 252) | async def search_http_status(self, status_code: int) -> dict[str, Any]:
method search_certificate (line 257) | async def search_certificate(self, filter_name: str, value: str) -> di...
method search_with_dns (line 266) | async def search_with_dns(self, dns_type: str, value: str) -> dict[str...
method extract_data_from_domain_details (line 277) | async def extract_data_from_domain_details(self, details: dict[str, An...
method extract_data_from_search_results (line 360) | async def extract_data_from_search_results(self, results: dict[str, An...
method do_search (line 371) | async def do_search(self) -> None:
method get_hostnames (line 393) | async def get_hostnames(self) -> list[str]:
method get_ips (line 397) | async def get_ips(self) -> list[str]:
method get_ports (line 401) | async def get_ports(self) -> list[int]:
method get_technologies (line 405) | async def get_technologies(self) -> list[str]:
method get_tags (line 409) | async def get_tags(self) -> list[str]:
method get_dns_records (line 413) | async def get_dns_records(self) -> dict[str, dict[str, list[str]]]:
method get_http_info (line 417) | async def get_http_info(self) -> dict[str, dict[str, Any]]:
method get_geo_info (line 421) | async def get_geo_info(self) -> dict[str, dict[str, Any]]:
method get_cloud_info (line 425) | async def get_cloud_info(self) -> dict[str, dict[str, Any]]:
method get_certificate_info (line 429) | async def get_certificate_info(self) -> list[dict[str, Any]]:
method get_all_results (line 433) | async def get_all_results(self) -> dict[str, Any]:
method process (line 437) | async def process(self, proxy: bool = False, filters: dict[str, str] |...
FILE: theHarvester/discovery/githubcode.py
class RetryResult (line 13) | class RetryResult(NamedTuple):
class SuccessResult (line 17) | class SuccessResult(NamedTuple):
class ErrorResult (line 23) | class ErrorResult(NamedTuple):
class SearchGithubCode (line 28) | class SearchGithubCode:
method __init__ (line 29) | def __init__(self, word, limit) -> None:
method fragments_from_response (line 56) | async def fragments_from_response(json_data: dict) -> list[str]:
method page_from_response (line 69) | async def page_from_response(page: str, links) -> int | None:
method handle_response (line 80) | async def handle_response(self, response: tuple[str, dict, int, Any]) ...
method next_page_or_end (line 97) | async def next_page_or_end(result: SuccessResult) -> int | None:
method do_search (line 103) | async def do_search(self, page: int) -> tuple[str, dict, int, Any]:
method process (line 113) | async def process(self, proxy: bool = False) -> None:
method get_emails (line 155) | async def get_emails(self):
method get_hostnames (line 163) | async def get_hostnames(self):
FILE: theHarvester/discovery/gitlabsearch.py
class SearchGitlab (line 18) | class SearchGitlab:
method __init__ (line 23) | def __init__(self, word) -> None:
method _safe_parse_json (line 32) | def _safe_parse_json(payload: object) -> dict:
method _extract_domains_from_text (line 43) | def _extract_domains_from_text(self, text: str) -> set:
method _extract_emails_from_text (line 61) | def _extract_emails_from_text(self, text: str) -> set:
method search_projects (line 77) | async def search_projects(self) -> None:
method search_users (line 138) | async def search_users(self) -> None:
method do_search (line 189) | async def do_search(self) -> None:
method get_hostnames (line 193) | async def get_hostnames(self) -> set:
method get_emails (line 196) | async def get_emails(self) -> set:
method get_urls (line 199) | async def get_urls(self) -> set:
method process (line 202) | async def process(self, proxy: bool = False) -> None:
FILE: theHarvester/discovery/hackertarget.py
class SearchHackerTarget (line 5) | class SearchHackerTarget:
method __init__ (line 13) | def __init__(self, word) -> None:
method do_search (line 21) | async def do_search(self) -> None:
method process (line 44) | async def process(self, proxy: bool = False) -> None:
method get_hostnames (line 48) | async def get_hostnames(self) -> list:
FILE: theHarvester/discovery/haveibeenpwned.py
class SearchHaveIBeenPwned (line 7) | class SearchHaveIBeenPwned:
method __init__ (line 8) | def __init__(self, word: str):
method process (line 23) | async def process(self, proxy: bool = False) -> None:
method _extract_data (line 42) | def _extract_data(self) -> None:
method get_hostnames (line 54) | async def get_hostnames(self) -> set[str]:
method get_emails (line 57) | async def get_emails(self) -> set[str]:
method get_breaches (line 60) | async def get_breaches(self) -> list[dict]:
method get_pastes (line 63) | async def get_pastes(self) -> list[dict]:
method get_breach_dates (line 66) | async def get_breach_dates(self) -> set[str]:
method get_breach_types (line 69) | async def get_breach_types(self) -> set[str]:
method get_affected_data (line 72) | async def get_affected_data(self) -> set[str]:
FILE: theHarvester/discovery/hudsonrocksearch.py
class SearchHudsonRock (line 8) | class SearchHudsonRock:
method __init__ (line 15) | def __init__(self, word: str) -> None:
method do_search (line 35) | async def do_search(self) -> None:
method _is_valid_email (line 67) | def _is_valid_email(self, email: str) -> bool:
method _search_domain (line 81) | async def _search_domain(self, domain: str) -> None:
method _search_email (line 107) | async def _search_email(self, email: str) -> None:
method _process_domain_response (line 133) | def _process_domain_response(self, response: dict) -> None:
method _extract_hosts_from_urls (line 176) | def _extract_hosts_from_urls(self, urls_data: list[dict], source_type:...
method _extract_emails_from_data (line 203) | def _extract_emails_from_data(self, data: dict) -> None:
method _process_email_response (line 226) | def _process_email_response(self, response: dict) -> None:
method _is_valid_ip (line 273) | def _is_valid_ip(self, ip: str) -> bool:
method _extract_hosts_from_services (line 293) | def _extract_hosts_from_services(self, services: list[dict]) -> None:
method get_hostnames (line 319) | async def get_hostnames(self) -> set[str]:
method get_ips (line 327) | async def get_ips(self) -> set[str]:
method get_emails (line 335) | async def get_emails(self) -> set[str]:
method get_infostealers (line 343) | async def get_infostealers(self) -> list[dict]:
method get_compromised_data (line 351) | async def get_compromised_data(self) -> dict:
method get_summary (line 359) | def get_summary(self) -> dict:
method process (line 377) | async def process(self, proxy: bool = False) -> None:
FILE: theHarvester/discovery/huntersearch.py
class SearchHunter (line 7) | class SearchHunter:
method __init__ (line 8) | def __init__(self, word, limit, start) -> None:
method do_search (line 23) | async def do_search(self) -> None:
method parse_resp (line 65) | async def parse_resp(self, json_resp):
method process (line 79) | async def process(self, proxy: bool = False) -> None:
method get_emails (line 83) | async def get_emails(self):
method get_hostnames (line 86) | async def get_hostnames(self):
FILE: theHarvester/discovery/intelxsearch.py
class SearchIntelx (line 12) | class SearchIntelx:
method __init__ (line 13) | def __init__(self, word) -> None:
method do_search (line 25) | async def do_search(self) -> None:
method process (line 64) | async def process(self, proxy: bool = False):
method get_emails (line 70) | async def get_emails(self) -> list[str]:
method get_interestingurls (line 73) | async def get_interestingurls(self) -> tuple[list[str], list[str]]:
FILE: theHarvester/discovery/leakix.py
class SearchLeakix (line 17) | class SearchLeakix:
method __init__ (line 23) | def __init__(self, word) -> None:
method _safe_parse_json (line 31) | def _safe_parse_json(payload: object) -> list:
method do_search (line 43) | async def do_search(self) -> None:
method get_hostnames (line 108) | async def get_hostnames(self) -> set:
method get_emails (line 111) | async def get_emails(self) -> set:
method process (line 114) | async def process(self, proxy: bool = False) -> None:
FILE: theHarvester/discovery/leaklookup.py
class SearchLeakLookup (line 7) | class SearchLeakLookup:
method __init__ (line 8) | def __init__(self, word: str):
method process (line 20) | async def process(self, proxy: bool = False) -> None:
method _extract_data (line 45) | def _extract_data(self) -> None:
method get_hostnames (line 59) | async def get_hostnames(self) -> set[str]:
method get_emails (line 62) | async def get_emails(self) -> set[str]:
method get_leaks (line 65) | async def get_leaks(self) -> list[dict]:
method get_passwords (line 68) | async def get_passwords(self) -> set[str]:
method get_sources (line 71) | async def get_sources(self) -> set[str]:
method get_leak_dates (line 74) | async def get_leak_dates(self) -> set[str]:
FILE: theHarvester/discovery/mojeek.py
class SearchMojeek (line 5) | class SearchMojeek:
method __init__ (line 6) | def __init__(self, word, limit) -> None:
method do_search (line 24) | async def do_search(self) -> None:
method process (line 63) | async def process(self, proxy: bool = False) -> None:
method get_emails (line 67) | async def get_emails(self):
method get_hostnames (line 71) | async def get_hostnames(self):
FILE: theHarvester/discovery/netlas.py
class SearchNetlas (line 7) | class SearchNetlas:
method __init__ (line 8) | def __init__(self, word, limit: int) -> None:
method do_count (line 18) | async def do_count(self) -> None:
method do_search (line 29) | async def do_search(self) -> None:
method get_hostnames (line 57) | async def get_hostnames(self) -> list:
method process (line 60) | async def process(self, proxy: bool = False) -> None:
FILE: theHarvester/discovery/onyphe.py
class SearchOnyphe (line 9) | class SearchOnyphe:
method __init__ (line 10) | def __init__(self, word) -> None:
method do_search (line 21) | async def do_search(self) -> None:
method parse_onyphe_resp_json (line 35) | async def parse_onyphe_resp_json(self):
method get_asns (line 85) | async def get_asns(self) -> set:
method get_hostnames (line 88) | async def get_hostnames(self) -> set:
method get_ips (line 91) | async def get_ips(self) -> set:
method process (line 94) | async def process(self, proxy: bool = False) -> None:
FILE: theHarvester/discovery/otxsearch.py
class SearchOtx (line 7) | class SearchOtx:
method __init__ (line 8) | def __init__(self, word) -> None:
method do_search (line 14) | async def do_search(self) -> None:
method get_hostnames (line 51) | async def get_hostnames(self) -> set:
method get_ips (line 54) | async def get_ips(self) -> set:
method process (line 57) | async def process(self, proxy: bool = False) -> None:
FILE: theHarvester/discovery/pentesttools.py
class SearchPentestTools (line 9) | class SearchPentestTools:
method __init__ (line 10) | def __init__(self, word) -> None:
method poll (line 20) | async def poll(self, scan_id):
method parse_json (line 44) | async def parse_json(json_results):
method get_hostnames (line 53) | async def get_hostnames(self) -> list:
method do_search (line 56) | async def do_search(self) -> None:
method process (line 72) | async def process(self, proxy: bool = False) -> None:
FILE: theHarvester/discovery/projectdiscovery.py
class SearchDiscovery (line 5) | class SearchDiscovery:
method __init__ (line 6) | def __init__(self, word) -> None:
method do_search (line 14) | async def do_search(self):
method get_hostnames (line 24) | async def get_hostnames(self):
method process (line 27) | async def process(self, proxy: bool = False) -> None:
FILE: theHarvester/discovery/rapiddns.py
class SearchRapidDns (line 7) | class SearchRapidDns:
method __init__ (line 8) | def __init__(self, word) -> None:
method do_search (line 13) | async def do_search(self):
method process (line 47) | async def process(self, proxy: bool = False) -> None:
method get_hostnames (line 51) | async def get_hostnames(self):
FILE: theHarvester/discovery/robtex.py
class SearchRobtex (line 19) | class SearchRobtex:
method __init__ (line 24) | def __init__(self, word) -> None:
method _safe_parse_json_lines (line 32) | def _safe_parse_json_lines(payload: str) -> list:
method do_search (line 46) | async def do_search(self) -> None:
method get_hostnames (line 109) | async def get_hostnames(self) -> set:
method get_ips (line 112) | async def get_ips(self) -> set:
method process (line 115) | async def process(self, proxy: bool = False) -> None:
FILE: theHarvester/discovery/rocketreach.py
class SearchRocketReach (line 7) | class SearchRocketReach:
method __init__ (line 8) | def __init__(self, word, limit) -> None:
method do_search (line 21) | async def do_search(self) -> None:
method get_links (line 86) | async def get_links(self):
method get_emails (line 89) | async def get_emails(self):
method process (line 92) | async def process(self, proxy: bool = False) -> None:
FILE: theHarvester/discovery/search_dehashed.py
class SearchDehashed (line 10) | class SearchDehashed:
method __init__ (line 11) | def __init__(self, word) -> None:
method do_search (line 26) | async def do_search(self) -> None:
method print_csv_results (line 74) | async def print_csv_results(self) -> None:
method process (line 93) | async def process(self, proxy: bool = False) -> None:
method get_emails (line 98) | async def get_emails(self) -> set:
method get_hostnames (line 105) | async def get_hostnames(self) -> set:
method get_ips (line 108) | async def get_ips(self) -> set:
FILE: theHarvester/discovery/search_dnsdumpster.py
class SearchDNSDumpster (line 6) | class SearchDNSDumpster:
method __init__ (line 7) | def __init__(self, word) -> None:
method do_search (line 16) | async def do_search(self) -> None:
method process (line 44) | async def process(self, proxy: bool = False) -> None:
method get_hostnames (line 47) | async def get_hostnames(self) -> set:
method get_ips (line 50) | async def get_ips(self) -> set:
FILE: theHarvester/discovery/searchhunterhow.py
class SearchHunterHow (line 10) | class SearchHunterHow:
method __init__ (line 11) | def __init__(self, word) -> None:
method do_search (line 19) | async def do_search(self) -> None:
method get_hostnames (line 54) | async def get_hostnames(self) -> set:
method process (line 57) | async def process(self, proxy: bool = False) -> None:
FILE: theHarvester/discovery/securityscorecard.py
class SearchSecurityScorecard (line 7) | class SearchSecurityScorecard:
method __init__ (line 8) | def __init__(self, word: str):
method process (line 23) | async def process(self, proxy: bool = False) -> None:
method _extract_data (line 41) | def _extract_data(self, data: dict) -> None:
method get_hostnames (line 73) | async def get_hostnames(self) -> set[str]:
method get_ips (line 76) | async def get_ips(self) -> list[str]:
method get_score (line 79) | async def get_score(self) -> int:
method get_grades (line 82) | async def get_grades(self) -> dict:
method get_issues (line 85) | async def get_issues(self) -> list[dict]:
method get_recommendations (line 88) | async def get_recommendations(self) -> list[dict]:
method get_history (line 91) | async def get_history(self) -> list[dict]:
FILE: theHarvester/discovery/securitytrailssearch.py
class SearchSecuritytrail (line 8) | class SearchSecuritytrail:
method __init__ (line 9) | def __init__(self, word) -> None:
method authenticate (line 23) | async def authenticate(self) -> None:
method do_search (line 33) | async def do_search(self) -> None:
method process (line 68) | async def process(self, proxy: bool = False) -> None:
method get_ips (line 88) | async def get_ips(self) -> set:
method get_hostnames (line 91) | async def get_hostnames(self) -> set:
FILE: theHarvester/discovery/shodansearch.py
class SearchShodan (line 9) | class SearchShodan:
method __init__ (line 10) | def __init__(self) -> None:
method search_ip (line 18) | async def search_ip(self, ip) -> OrderedDict:
FILE: theHarvester/discovery/subdomaincenter.py
class SubdomainCenter (line 4) | class SubdomainCenter:
method __init__ (line 5) | def __init__(self, word):
method do_search (line 11) | async def do_search(self):
method get_hostnames (line 21) | async def get_hostnames(self):
method process (line 24) | async def process(self, proxy=False):
FILE: theHarvester/discovery/subdomainfinderc99.py
class SearchSubdomainfinderc99 (line 12) | class SearchSubdomainfinderc99:
method __init__ (line 13) | def __init__(self, word) -> None:
method do_search (line 21) | async def do_search(self) -> None:
method get_hostnames (line 44) | async def get_hostnames(self):
method process (line 48) | async def process(self, proxy: bool = False) -> None:
method get_csrf_params (line 53) | async def get_csrf_params(data):
FILE: theHarvester/discovery/takeover.py
class TakeOver (line 10) | class TakeOver:
method __init__ (line 11) | def __init__(self, hosts) -> None:
method populate_fingerprints (line 19) | async def populate_fingerprints(self):
method check (line 61) | async def check(self, url, resp) -> None:
method do_take (line 76) | async def do_take(self) -> None:
method process (line 101) | async def process(self, proxy: bool = False) -> None:
method get_takeover_results (line 105) | async def get_takeover_results(self):
FILE: theHarvester/discovery/thc.py
class SearchThc (line 8) | class SearchThc:
method __init__ (line 11) | def __init__(self, word: str) -> None:
method do_search (line 18) | async def do_search(self) -> None:
method get_hostnames (line 56) | async def get_hostnames(self) -> set:
method process (line 59) | async def process(self, proxy: bool = False) -> None:
FILE: theHarvester/discovery/threatcrowd.py
class SearchThreatcrowd (line 17) | class SearchThreatcrowd:
method __init__ (line 23) | def __init__(self, word) -> None:
method _safe_parse_json (line 31) | def _safe_parse_json(payload: object) -> dict:
method do_search (line 41) | async def do_search(self) -> None:
method get_hostnames (line 93) | async def get_hostnames(self) -> set:
method get_ips (line 96) | async def get_ips(self) -> set:
method process (line 99) | async def process(self, proxy: bool = False) -> None:
FILE: theHarvester/discovery/tombasearch.py
class SearchTomba (line 7) | class SearchTomba:
method __init__ (line 8) | def __init__(self, word, limit, start) -> None:
method do_search (line 23) | async def do_search(self) -> None:
method parse_resp (line 74) | async def parse_resp(self, json_resp):
method process (line 88) | async def process(self, proxy: bool = False) -> None:
method get_emails (line 92) | async def get_emails(self):
method get_hostnames (line 95) | async def get_hostnames(self):
FILE: theHarvester/discovery/urlscan.py
class SearchUrlscan (line 4) | class SearchUrlscan:
method __init__ (line 5) | def __init__(self, word) -> None:
method do_search (line 13) | async def do_search(self) -> None:
method get_hostnames (line 24) | async def get_hostnames(self) -> set:
method get_ips (line 27) | async def get_ips(self) -> set:
method get_interestingurls (line 30) | async def get_interestingurls(self) -> set:
method get_asns (line 33) | async def get_asns(self) -> set:
method process (line 36) | async def process(self, proxy: bool = False) -> None:
FILE: theHarvester/discovery/venacussearch.py
class SearchVenacus (line 10) | class SearchVenacus:
method __init__ (line 11) | def __init__(self, word: str, limit=1000, offset_doc=0) -> None:
method do_search (line 26) | async def do_search(self) -> None:
method process (line 69) | async def process(self, proxy: bool = False):
method get_people (line 75) | async def get_people(self) -> list[dict[str, str]]:
method get_emails (line 80) | async def get_emails(self) -> set[str]:
method get_ips (line 85) | async def get_ips(self) -> set[str]:
method get_interestingurls (line 90) | async def get_interestingurls(self) -> set[str]:
FILE: theHarvester/discovery/virustotal.py
class SearchVirustotal (line 7) | class SearchVirustotal:
method __init__ (line 8) | def __init__(self, word) -> None:
method do_search (line 16) | async def do_search(self) -> None:
method get_hostnames (line 63) | async def get_hostnames(self) -> list:
method parse_hostnames (line 67) | async def parse_hostnames(data, word):
method process (line 100) | async def process(self, proxy: bool = False) -> None:
FILE: theHarvester/discovery/waybackarchive.py
class SearchWaybackarchive (line 6) | class SearchWaybackarchive:
method __init__ (line 11) | def __init__(self, word) -> None:
method _extract_domain_from_url (line 17) | def _extract_domain_from_url(self, url: str) -> str:
method do_search (line 33) | async def do_search(self) -> None:
method get_hostnames (line 72) | async def get_hostnames(self) -> set:
method process (line 75) | async def process(self, proxy: bool = False) -> None:
FILE: theHarvester/discovery/whoisxml.py
class SearchWhoisXML (line 5) | class SearchWhoisXML:
method __init__ (line 6) | def __init__(self, word) -> None:
method do_search (line 14) | async def do_search(self):
method get_hostnames (line 34) | async def get_hostnames(self):
method process (line 37) | async def process(self, proxy: bool = False) -> None:
FILE: theHarvester/discovery/windvane.py
class SearchWindvane (line 17) | class SearchWindvane:
method __init__ (line 37) | def __init__(self, word) -> None:
method _get_api_key (line 46) | def _get_api_key(self) -> str | None:
method _safe_parse_json (line 54) | def _safe_parse_json(payload: object) -> dict:
method do_search (line 65) | async def do_search(self) -> None:
method _search_subdomains (line 86) | async def _search_subdomains(self, headers: dict) -> None:
method _search_dns_history (line 126) | async def _search_dns_history(self, headers: dict) -> None:
method _search_emails (line 170) | async def _search_emails(self, headers: dict) -> None:
method _search_subdomains_limited (line 203) | async def _search_subdomains_limited(self, headers: dict) -> None:
method _fallback_search (line 245) | async def _fallback_search(self) -> None:
method set_api_key (line 307) | def set_api_key(self, api_key: str) -> None:
method _is_valid_ip (line 315) | def _is_valid_ip(self, ip: str) -> bool:
method get_hostnames (line 323) | async def get_hostnames(self) -> set:
method get_ips (line 326) | async def get_ips(self) -> set:
method get_emails (line 329) | async def get_emails(self) -> set:
method process (line 332) | async def process(self, proxy: bool = False) -> None:
FILE: theHarvester/discovery/yahoosearch.py
class SearchYahoo (line 5) | class SearchYahoo:
method __init__ (line 6) | def __init__(self, word, limit) -> None:
method do_search (line 13) | async def do_search(self) -> None:
method process (line 21) | async def process(self, proxy: bool = False) -> None:
method get_emails (line 25) | async def get_emails(self):
method get_hostnames (line 38) | async def get_hostnames(self, proxy: bool = False):
FILE: theHarvester/discovery/zoomeyesearch.py
class SearchZoomEye (line 12) | class SearchZoomEye:
method __init__ (line 13) | def __init__(self, word, limit) -> None:
method _build_headers (line 64) | def _build_headers(self) -> dict[str, str]:
method _is_success (line 69) | def _is_success(resp: dict[str, Any]) -> bool:
method _unwrap_data (line 86) | def _unwrap_data(resp: dict[str, Any]) -> dict[str, Any]:
method _page_total_from_payload (line 92) | def _page_total_from_payload(payload: dict[str, Any], page_size: int) ...
method _safe_add_hostname (line 116) | def _safe_add_hostname(container: set, value: str | None) -> None:
method fetch_subdomains (line 126) | async def fetch_subdomains(self) -> None:
method do_search (line 181) | async def do_search(self) -> None:
method parse_matches (line 261) | async def parse_matches(self, matches):
method process (line 358) | async def process(self, proxy: bool = False) -> None:
method parse_emails (line 362) | async def parse_emails(self, content):
method parse_hostnames (line 366) | async def parse_hostnames(self, content):
method get_hostnames (line 370) | async def get_hostnames(self):
method get_emails (line 373) | async def get_emails(self):
method get_ips (line 376) | async def get_ips(self):
method get_asns (line 379) | async def get_asns(self):
method get_interestingurls (line 382) | async def get_interestingurls(self):
FILE: theHarvester/lib/api/additional_endpoints.py
class DomainRequest (line 10) | class DomainRequest(BaseModel):
function get_breaches (line 16) | async def get_breaches(request: DomainRequest, api_key: str = Depends(ge...
function get_leaks (line 28) | async def get_leaks(request: DomainRequest, api_key: str = Depends(get_a...
function get_security_score (line 40) | async def get_security_score(request: DomainRequest, api_key: str = Depe...
function get_tech_stack (line 52) | async def get_tech_stack(request: DomainRequest, api_key: str = Depends(...
function get_all_info (line 64) | async def get_all_info(request: DomainRequest, api_key: str = Depends(ge...
FILE: theHarvester/lib/api/api.py
class QueryResponse (line 22) | class QueryResponse(BaseModel):
class ErrorResponse (line 34) | class ErrorResponse(BaseModel):
function root (line 78) | async def root(*, user_agent: str = Header(None)) -> Response:
class BotResponse (line 134) | class BotResponse(BaseModel):
function bot (line 139) | async def bot() -> Response:
class SourcesResponse (line 149) | class SourcesResponse(BaseModel):
function getsources (line 162) | async def getsources(request: Request) -> Response:
class DnsBruteResponse (line 187) | class DnsBruteResponse(BaseModel):
function dnsbrute (line 201) | async def dnsbrute(
function query (line 274) | async def query(
FILE: theHarvester/lib/api/api_example.py
function fetch_json (line 13) | async def fetch_json(session, url):
function fetch (line 23) | async def fetch(session, url):
function main (line 33) | async def main() -> None:
FILE: theHarvester/lib/api/auth.py
function get_api_key (line 4) | def get_api_key(x_api_key: str | None = Header(None)) -> str:
FILE: theHarvester/lib/core.py
class Core (line 31) | class Core:
method _read_config (line 71) | def _read_config(filename: str) -> str:
method api_keys (line 90) | def api_keys() -> dict:
method _api_key_value (line 95) | def _api_key_value(provider: str) -> Any:
method bevigil_key (line 102) | def bevigil_key() -> str:
method bitbucket_key (line 106) | def bitbucket_key() -> str:
method brave_key (line 110) | def brave_key() -> str:
method bufferoverun_key (line 114) | def bufferoverun_key() -> str:
method builtwith_key (line 118) | def builtwith_key() -> str:
method censys_key (line 122) | def censys_key() -> tuple:
method criminalip_key (line 126) | def criminalip_key() -> str:
method dehashed_key (line 130) | def dehashed_key() -> str:
method dnsdumpster_key (line 134) | def dnsdumpster_key() -> str:
method fofa_key (line 138) | def fofa_key() -> tuple[str, str]:
method fullhunt_key (line 142) | def fullhunt_key() -> str:
method github_key (line 146) | def github_key() -> str:
method hackertarget_key (line 150) | def hackertarget_key() -> str:
method haveibeenpwned_key (line 154) | def haveibeenpwned_key() -> str:
method hunter_key (line 158) | def hunter_key() -> str:
method hunterhow_key (line 162) | def hunterhow_key() -> str:
method intelx_key (line 166) | def intelx_key() -> str:
method leaklookup_key (line 170) | def leaklookup_key() -> str:
method mojeek_key (line 174) | def mojeek_key() -> str:
method leakix_key (line 178) | def leakix_key() -> str:
method netlas_key (line 182) | def netlas_key() -> str:
method onyphe_key (line 186) | def onyphe_key() -> str:
method pentest_tools_key (line 190) | def pentest_tools_key() -> str:
method projectdiscovery_key (line 194) | def projectdiscovery_key() -> str:
method rocketreach_key (line 198) | def rocketreach_key() -> str:
method securityscorecard_key (line 202) | def securityscorecard_key() -> str:
method security_trails_key (line 206) | def security_trails_key() -> str:
method shodan_key (line 210) | def shodan_key() -> str:
method tomba_key (line 214) | def tomba_key() -> tuple[str, str]:
method venacus_key (line 218) | def venacus_key() -> str:
method virustotal_key (line 222) | def virustotal_key() -> str:
method whoisxml_key (line 226) | def whoisxml_key() -> str:
method windvane_key (line 230) | def windvane_key() -> str:
method zoomeye_key (line 234) | def zoomeye_key() -> str:
method _proxy_urls (line 238) | def _proxy_urls(config: dict[str, list[str] | None], proxy_type: str) ...
method proxy_list (line 243) | def proxy_list() -> dict:
method banner (line 251) | def banner() -> None:
method get_supportedengines (line 267) | def get_supportedengines() -> list[str]:
method get_user_agent (line 334) | def get_user_agent() -> str:
class AsyncFetcher (line 401) | class AsyncFetcher:
method _default_headers (line 405) | def _default_headers(headers: dict[str, str] | None = None) -> dict[st...
method _ssl_context (line 409) | def _ssl_context(verify: bool | None = True) -> ssl.SSLContext | bool:
method _request_timeout (line 415) | def _request_timeout(total: int | None) -> aiohttp.ClientTimeout | None:
method _normalize_data (line 419) | def _normalize_data(data: str | dict[str, Any]) -> str | dict[str, Any]:
method _resolve_proxy (line 423) | def _resolve_proxy(cls, proxy: str | bool | None) -> tuple[str | None,...
method _build_session (line 434) | async def _build_session(
method _read_response (line 448) | async def _read_response(response: aiohttp.ClientResponse, *, json: bo...
method _request (line 453) | async def _request(
method _get_random_proxy (line 473) | def _get_random_proxy(proxy_dict: dict) -> tuple[str | None, str | None]:
method _create_connector (line 490) | async def _create_connector(
method post_fetch (line 507) | async def post_fetch(
method fetch (line 572) | async def fetch(
method takeover_fetch (line 635) | async def takeover_fetch(session, url: str, proxy: str | None = None) ...
method fetch_all (line 667) | async def fetch_all(
function show_default_error_message (line 738) | def show_default_error_message(engine_name: str, word: str, error) -> None:
FILE: theHarvester/lib/hostchecker.py
class Checker (line 20) | class Checker:
method __init__ (line 21) | def __init__(self, hosts: list[str], nameservers: list[str]) -> None:
method resolve_host (line 40) | async def resolve_host(host: str, resolver: aiodns.DNSResolver) -> str:
method chunks (line 55) | def chunks(lst: list[str], n: int) -> Iterator[list[str]]:
method query_all (line 60) | async def query_all(self, resolver: aiodns.DNSResolver, hosts: list[st...
method check (line 65) | async def check(self) -> tuple[list[str], list[str], list[str]]:
FILE: theHarvester/lib/output.py
function sorted_unique (line 9) | def sorted_unique[T: Hashable](items: Iterable[T]) -> list[T]:
function print_section (line 15) | def print_section(header: str, items: Iterable[str], separator: str) -> ...
function print_linkedin_sections (line 22) | def print_linkedin_sections(
FILE: theHarvester/lib/stash.py
class StashManager (line 14) | class StashManager:
method __init__ (line 15) | def __init__(self) -> None:
method _col0_int (line 27) | def _col0_int(row: Row | None) -> int:
method _col0_value (line 35) | def _col0_value(row: Row | None):
method do_init (line 38) | async def do_init(self) -> None:
method store (line 45) | async def store(self, domain, resource, res_type, source) -> None:
method store_all (line 61) | async def store_all(self, domain, all, res_type, source) -> None:
method generatedashboardcode (line 82) | async def generatedashboardcode(self, domain):
method getlatestscanresults (line 170) | async def getlatestscanresults(self, domain, previousday: bool = False...
method getscanboarddata (line 239) | async def getscanboarddata(self):
method getscanhistorydomain (line 264) | async def getscanhistorydomain(self, domain):
method getpluginscanstatistics (line 311) | async def getpluginscanstatistics(self) -> Iterable[Row] | None:
method latestscanchartdata (line 327) | async def latestscanchartdata(self, domain):
FILE: theHarvester/parsers/intelxparser.py
class Parser (line 1) | class Parser:
method __init__ (line 2) | def __init__(self) -> None:
method parse_dictionaries (line 6) | async def parse_dictionaries(self, results: dict) -> tuple:
FILE: theHarvester/parsers/myparser.py
class Parser (line 5) | class Parser:
method __init__ (line 6) | def __init__(self, results, word) -> None:
method generic_clean (line 11) | async def generic_clean(self) -> None:
method url_clean (line 40) | async def url_clean(self) -> None:
method emails (line 45) | async def emails(self):
method fileurls (line 63) | async def fileurls(self, file) -> list:
method hostnames (line 75) | async def hostnames(self):
method hostnames_all (line 90) | async def hostnames_all(self):
method set (line 102) | async def set(self):
method urls (line 112) | async def urls(self) -> Set[str]:
method unique (line 117) | async def unique(self) -> list:
FILE: theHarvester/parsers/securitytrailsparser.py
class Parser (line 4) | class Parser:
method __init__ (line 5) | def __init__(self, word, text) -> None:
method parse_text (line 11) | async def parse_text(self) -> tuple[set, set]:
FILE: theHarvester/parsers/venacusparser.py
class TokenTypesEnum (line 6) | class TokenTypesEnum(enum.StrEnum):
class Parser (line 33) | class Parser:
method __init__ (line 34) | def __init__(self) -> None:
method parse_text_tokens (line 38) | async def parse_text_tokens(self, results: list[dict[str, Any]]) -> Ma...
FILE: theHarvester/restfulHarvest.py
function main (line 7) | def main():
FILE: theHarvester/screenshot/screenshot.py
class ScreenShotter (line 18) | class ScreenShotter:
method __init__ (line 19) | def __init__(self, output) -> None:
method verify_path (line 24) | def verify_path(self) -> bool:
method verify_installation (line 39) | async def verify_installation() -> None:
method chunk_list (line 50) | def chunk_list(items: Collection, chunk_size: int) -> list:
method visit (line 55) | async def visit(url: str, proxy: str | None = None) -> tuple[str, str]:
method take_screenshot (line 91) | async def take_screenshot(self, url: str) -> None:
FILE: theHarvester/theHarvester.py
function main (line 7) | def main():
Condensed preview: 127 files, each showing path, character count, and a content snippet.
[
{
"path": ".dockerignore",
"chars": 160,
"preview": ".github/*\n.gitattributes\n.git-blame-ignore-revs\n.idea/\n.pytest_cache\n.mypy_cache\ntests/*\nREADME/\nbin/\ntheHarvester-logo."
},
{
"path": ".git-blame-ignore-revs",
"chars": 76,
"preview": "# #1492 run `black .` and `isort .`\nc13843ec0d513ac7f9c35b7bd0501fa46e356415"
},
{
"path": ".gitattributes",
"chars": 682,
"preview": "# Set the default behavior, which is to have git automatically determine\n# whether a file is a text or binary, unless ot"
},
{
"path": ".github/FUNDING.yml",
"chars": 574,
"preview": "# These are supported funding model platforms\n\ngithub: [L1ghtn1ng, NotoriousRebel]\nopen_collective: # Replace with a sin"
},
{
"path": ".github/ISSUE_TEMPLATE/issue-template.md",
"chars": 845,
"preview": "---\nname: Issue Template\nabout: A template for new issues.\ntitle: \"[Bug|Feature Request|Other] Short Description of Issu"
},
{
"path": ".github/dependabot.yml",
"chars": 356,
"preview": "version: 2\nupdates:\n- package-ecosystem: github-actions\n directory: \"/\"\n schedule:\n interval: daily\n timezone: E"
},
{
"path": ".github/workflows/codeql-analysis.yml",
"chars": 2360,
"preview": "# For most projects, this workflow file will not need changing; you simply need\n# to commit it to your repository.\n#\n# Y"
},
{
"path": ".github/workflows/docker-build-push.yml",
"chars": 1142,
"preview": "name: Build and Push Docker Image\n\non:\n push:\n branches:\n - master\n\npermissions:\n contents: read\n packages: w"
},
{
"path": ".github/workflows/dockerci.yml",
"chars": 329,
"preview": "name: TheHarvester Docker Image CI\n\non: [push, pull_request]\n\njobs:\n build:\n runs-on: ubuntu-latest\n steps:\n "
},
{
"path": ".github/workflows/theHarvester.yml",
"chars": 2730,
"preview": "name: TheHarvester Python CI\n\non:\n push:\n branches:\n - '*'\n\n pull_request:\n branches:\n - '*'\n\njobs:\n "
},
{
"path": ".gitignore",
"chars": 192,
"preview": "*.idea\n*.pyc\n*.sqlite\n*.html\n*.htm\n*.vscode\n*.xml\n*.json\ndebug_results.txt\nvenv\n.mypy_cache\n.pytest_cache\nbuild/\ndist/\nt"
},
{
"path": "CHANGELOG.md",
"chars": 4456,
"preview": "# Changelog\n\nAll notable changes to this project will be documented in this file.\n\nThe format is based on [Keep a Change"
},
{
"path": "Dockerfile",
"chars": 763,
"preview": "FROM python:3.14-slim-trixie\n\nLABEL maintainer=\"@jay_townsend1 & @NotoriousRebel1\"\n\nRUN useradd -m -u 1000 -s /bin/bash "
},
{
"path": "README/CONTRIBUTING.md",
"chars": 790,
"preview": "# Contributing to theHarvester Project\nWelcome to theHarvester project, so you would like to contribute.\nThe following b"
},
{
"path": "README/COPYING",
"chars": 15216,
"preview": " GNU GENERAL PUBLIC LICENSE\n Version 2, June 1991\n\n Copyright (C) 1989, 1991 Fre"
},
{
"path": "README/LICENSES",
"chars": 640,
"preview": "Released under the GPL v 2.0.\n\nIf you did not receive a copy of the GPL, try http://www.gnu.org/.\n\nCopyright 2011 Christ"
},
{
"path": "README.md",
"chars": 9461,
"preview": "\n\n\n"
},
{
"path": "bin/theHarvester",
"chars": 308,
"preview": "#!/usr/bin/env python3\n# Note: This script runs theHarvester\nimport sys\n\nfrom theHarvester.theHarvester import main\n\nif "
},
{
"path": "docker-compose.yml",
"chars": 477,
"preview": "services:\n theharvester.svc.local:\n container_name: theHarvester\n volumes:\n - ./theHarvester/data/api-keys.y"
},
{
"path": "pyproject.toml",
"chars": 3698,
"preview": "[project]\nname = \"theHarvester\"\ndescription = \"theHarvester is a very simple, yet effective tool designed to be used in "
},
{
"path": "tests/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "tests/discovery/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "tests/discovery/test_baidusearch.py",
"chars": 2705,
"preview": "import pytest\n\nfrom theHarvester.discovery import baidusearch\n\n\nclass TestBaiduSearch:\n @pytest.mark.asyncio\n asyn"
},
{
"path": "tests/discovery/test_censys.py",
"chars": 3520,
"preview": "import sys\nimport types\n\nimport pytest\n\nif 'aiohttp_socks' not in sys.modules:\n aiohttp_socks_stub = types.ModuleType"
},
{
"path": "tests/discovery/test_certspotter.py",
"chars": 1156,
"preview": "#!/usr/bin/env python3\n# coding=utf-8\nimport os\nfrom typing import Optional\n\nimport pytest\nimport httpx\n\nfrom theHarvest"
},
{
"path": "tests/discovery/test_criminalip.py",
"chars": 3875,
"preview": "#!/usr/bin/env python3\n# coding=utf-8\nimport pytest\n\nfrom theHarvester.discovery import criminalip\n\n\n@pytest.mark.asynci"
},
{
"path": "tests/discovery/test_githubcode.py",
"chars": 5987,
"preview": "from unittest.mock import MagicMock\nimport pytest\nfrom httpx import Response\nfrom theHarvester.discovery import githubco"
},
{
"path": "tests/discovery/test_githubcode_additions.py",
"chars": 2996,
"preview": "from unittest.mock import MagicMock, AsyncMock\nimport asyncio\nimport pytest\nfrom theHarvester.discovery import githubcod"
},
{
"path": "tests/discovery/test_otx.py",
"chars": 865,
"preview": "#!/usr/bin/env python3\n# coding=utf-8\nimport os\nfrom typing import Optional\nimport httpx\nimport pytest\n\nfrom theHarveste"
},
{
"path": "tests/discovery/test_rocketreach.py",
"chars": 4317,
"preview": "import sys\nimport types\n\nimport pytest\n\nif 'aiohttp_socks' not in sys.modules:\n aiohttp_socks_stub = types.ModuleType"
},
{
"path": "tests/discovery/test_shodan_engine.py",
"chars": 1770,
"preview": "import socket\nimport sys\nfrom collections import OrderedDict\n\nimport pytest\n\n\nclass TestShodanEngine:\n @pytest.mark.a"
},
{
"path": "tests/discovery/test_thc.py",
"chars": 13901,
"preview": "#!/usr/bin/env python3\n# coding=utf-8\n\"\"\"\nTests for THC (ip.thc.org) discovery module.\n\nTHC provides multiple endpoints:"
},
{
"path": "tests/lib/test_core.py",
"chars": 10216,
"preview": "from __future__ import annotations\n\nfrom pathlib import Path\nfrom typing import Any\nfrom unittest import mock\n\nimport py"
},
{
"path": "tests/lib/test_output.py",
"chars": 1224,
"preview": "from __future__ import annotations\n\n\nfrom theHarvester.lib.output import print_linkedin_sections, sorted_unique\n\n\ndef te"
},
{
"path": "tests/test_hackertarget_apikey.py",
"chars": 1563,
"preview": "import pytest\nfrom theHarvester.discovery import hackertarget as ht_mod\nfrom theHarvester.lib.core import Core\n\n\nclass "
},
{
"path": "tests/test_mojeek.py",
"chars": 2038,
"preview": "import pytest\nfrom theHarvester.discovery import mojeek\n\nclass TestMojeekSearch:\n\n @pytest.mark.asyncio\n async def"
},
{
"path": "tests/test_myparser.py",
"chars": 521,
"preview": "#!/usr/bin/env python3\n# coding=utf-8\n\nimport pytest\n\nfrom theHarvester.parsers import myparser\n\n\nclass TestMyParser(obj"
},
{
"path": "tests/test_security.py",
"chars": 15479,
"preview": "import os\nimport re\nimport tempfile\nfrom pathlib import Path\n\nimport pytest\nfrom fastapi.testclient import TestClient\n\nf"
},
{
"path": "theHarvester/__init__.py",
"chars": 23,
"preview": "__version__ = '4.10.1'\n"
},
{
"path": "theHarvester/__main__.py",
"chars": 82996,
"preview": "import argparse\nimport asyncio\nimport os\nimport re\nimport secrets\nimport string\nimport sys\nimport time\nimport traceback\n"
},
{
"path": "theHarvester/data/proxies.yaml",
"chars": 42,
"preview": "http:\n - ip:port\nsocks5:\n - ip:port\n"
},
{
"path": "theHarvester/data/wordlists/api_endpoints.txt",
"chars": 21190,
"preview": "# Common API endpoints - Most frequently found in web services\n/api\n/api/v1\n/api/v2\n/api/v3\n/api/latest\n/rest\n/restapi\n/"
},
{
"path": "theHarvester/data/wordlists/dns-big.txt",
"chars": 1115633,
"preview": "www\nmail\nftp\nlocalhost\nwebmail\nsmtp\nwebdisk\npop\ncpanel\nwhm\nns1\nns2\nautodiscover\nautoconfig\nns\ntest\nm\nblog\ndev\nwww2\nns3\np"
},
{
"path": "theHarvester/data/wordlists/dns-names.txt",
"chars": 33565,
"preview": "www\nmail\nftp\nlocalhost\nwebmail\nsmtp\nwebdisk\npop\ncpanel\nwhm\nns1\nns2\nautodiscover\nautoconfig\nns\ntest\nm\nblog\ndev\nwww2\nns3\np"
},
{
"path": "theHarvester/data/wordlists/dorks.txt",
"chars": 374,
"preview": "inurl:\"contact\"\nintext:email filetype:log\n\"Index of /mail\"\n\"admin account info\" filetype:log\nintext:@\nadministrator acco"
},
{
"path": "theHarvester/data/wordlists/general/common.txt",
"chars": 42,
"preview": "admin\ntest\nhello\nuk\nlogin\nbook\nrobots.txt\n"
},
{
"path": "theHarvester/data/wordlists/names_small.txt",
"chars": 406279,
"preview": "www\n_tcp\n_tls\n_udp\n_domainkey\n_pkixrep._tcp\n_aix._tcp\n_afpovertcp._tcp\n_autodiscover._tcp\n_caldav._tcp\n_certificates._tc"
},
{
"path": "theHarvester/discovery/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "theHarvester/discovery/additional_apis.py",
"chars": 6821,
"preview": "import asyncio\nfrom typing import Any\n\nfrom theHarvester.discovery.builtwith import SearchBuiltWith\nfrom theHarvester.di"
},
{
"path": "theHarvester/discovery/api_endpoints.py",
"chars": 29017,
"preview": "\"\"\"\nAPI endpoint scanner module.\nThis module contains the SearchApiEndpoints class that performs comprehensive API endpo"
},
{
"path": "theHarvester/discovery/baidusearch.py",
"chars": 1220,
"preview": "from theHarvester.lib.core import AsyncFetcher, Core\nfrom theHarvester.parsers import myparser\n\n\nclass SearchBaidu:\n "
},
{
"path": "theHarvester/discovery/bevigil.py",
"chars": 1422,
"preview": "from theHarvester.discovery.constants import MissingKey\nfrom theHarvester.lib.core import AsyncFetcher, Core\n\n\nclass Sea"
},
{
"path": "theHarvester/discovery/bitbucket.py",
"chars": 6991,
"preview": "import asyncio\nimport random\nimport re\nimport urllib.parse as urlparse\nfrom typing import Any, NamedTuple\n\nimport aiohtt"
},
{
"path": "theHarvester/discovery/bravesearch.py",
"chars": 5802,
"preview": "import asyncio\nfrom urllib.parse import quote\n\nfrom theHarvester.discovery.constants import MissingKey, get_delay\nfrom t"
},
{
"path": "theHarvester/discovery/bufferoverun.py",
"chars": 1521,
"preview": "import re\n\nfrom theHarvester.discovery.constants import MissingKey\nfrom theHarvester.lib.core import AsyncFetcher, Core\n"
},
{
"path": "theHarvester/discovery/builtwith.py",
"chars": 3395,
"preview": "from typing import Any\n\nimport aiohttp\n\nfrom theHarvester.discovery.constants import MissingKey\nfrom theHarvester.lib.co"
},
{
"path": "theHarvester/discovery/censysearch.py",
"chars": 2728,
"preview": "from math import ceil\n\nfrom censys.common import __version__\nfrom censys.common.exceptions import (\n CensysRateLimitE"
},
{
"path": "theHarvester/discovery/certspottersearch.py",
"chars": 1677,
"preview": "from theHarvester.lib.core import AsyncFetcher\n\n\nclass SearchCertspoter:\n def __init__(self, word) -> None:\n s"
},
{
"path": "theHarvester/discovery/chaos.py",
"chars": 4309,
"preview": "import json as _stdlib_json\nfrom types import ModuleType\n\nfrom theHarvester.discovery.constants import MissingKey\nfrom t"
},
{
"path": "theHarvester/discovery/commoncrawl.py",
"chars": 4576,
"preview": "import json as _stdlib_json\nimport re\nfrom types import ModuleType\n\nfrom theHarvester.lib.core import AsyncFetcher, Core"
},
{
"path": "theHarvester/discovery/constants.py",
"chars": 4348,
"preview": "import random\n\nfrom theHarvester.lib.core import AsyncFetcher, Core\n\n\nasync def splitter(links):\n \"\"\"\n Method that"
},
{
"path": "theHarvester/discovery/criminalip.py",
"chars": 13816,
"preview": "import asyncio\nfrom typing import Any\nfrom urllib.parse import urlparse\n\nfrom theHarvester.discovery.constants import Mi"
},
{
"path": "theHarvester/discovery/crtsh.py",
"chars": 1354,
"preview": "from theHarvester.lib.core import AsyncFetcher\n\n\nclass SearchCrtsh:\n def __init__(self, word) -> None:\n self.w"
},
{
"path": "theHarvester/discovery/dnssearch.py",
"chars": 6232,
"preview": "\"\"\"\n============\nDNS Browsing\n============\n\nExplore the space around known hosts & ips for extra catches.\n\"\"\"\n\nimport as"
},
{
"path": "theHarvester/discovery/duckduckgosearch.py",
"chars": 3351,
"preview": "import ujson\n\nfrom theHarvester.lib.core import AsyncFetcher, Core\nfrom theHarvester.parsers import myparser\n\n\nclass Sea"
},
{
"path": "theHarvester/discovery/fofa.py",
"chars": 4438,
"preview": "import base64\nimport json as _stdlib_json\nfrom types import ModuleType\n\nfrom theHarvester.discovery.constants import Mis"
},
{
"path": "theHarvester/discovery/fullhuntsearch.py",
"chars": 17808,
"preview": "from typing import Any, ClassVar\nfrom urllib.parse import quote\n\nfrom theHarvester.discovery.constants import MissingKey"
},
{
"path": "theHarvester/discovery/githubcode.py",
"chars": 6865,
"preview": "import asyncio\nimport random\nimport urllib.parse as urlparse\nfrom typing import Any, NamedTuple\n\nimport aiohttp\n\nfrom th"
},
{
"path": "theHarvester/discovery/gitlabsearch.py",
"chars": 7890,
"preview": "import json as _stdlib_json\nimport re\nfrom types import ModuleType\n\nfrom theHarvester.lib.core import AsyncFetcher, Core"
},
{
"path": "theHarvester/discovery/hackertarget.py",
"chars": 1809,
"preview": "# theHarvester/discovery/hackertarget.py\nfrom theHarvester.lib.core import AsyncFetcher, Core\n\n\nclass SearchHackerTarget"
},
{
"path": "theHarvester/discovery/haveibeenpwned.py",
"chars": 2808,
"preview": "import aiohttp\n\nfrom theHarvester.discovery.constants import MissingKey\nfrom theHarvester.lib.core import AsyncFetcher, "
},
{
"path": "theHarvester/discovery/hudsonrocksearch.py",
"chars": 15034,
"preview": "import asyncio\nimport logging\nfrom urllib.parse import urlparse\n\nfrom theHarvester.lib.core import AsyncFetcher\n\n\nclass "
},
{
"path": "theHarvester/discovery/huntersearch.py",
"chars": 4269,
"preview": "import asyncio\n\nfrom theHarvester.discovery.constants import MissingKey\nfrom theHarvester.lib.core import AsyncFetcher, "
},
{
"path": "theHarvester/discovery/intelxsearch.py",
"chars": 3007,
"preview": "import asyncio\nfrom typing import Any\nfrom urllib.parse import urlparse\n\nimport aiohttp\n\nfrom theHarvester.discovery.con"
},
{
"path": "theHarvester/discovery/leakix.py",
"chars": 4302,
"preview": "import json as _stdlib_json\nfrom types import ModuleType\n\nfrom theHarvester.lib.core import AsyncFetcher, Core\n\njson: Mo"
},
{
"path": "theHarvester/discovery/leaklookup.py",
"chars": 2854,
"preview": "import aiohttp\n\nfrom theHarvester.discovery.constants import MissingKey\nfrom theHarvester.lib.core import AsyncFetcher, "
},
{
"path": "theHarvester/discovery/mojeek.py",
"chars": 2704,
"preview": "from theHarvester.lib.core import AsyncFetcher, Core\nfrom theHarvester.parsers import myparser\n\n\nclass SearchMojeek:\n "
},
{
"path": "theHarvester/discovery/netlas.py",
"chars": 2088,
"preview": "import json\n\nfrom theHarvester.discovery.constants import MissingKey\nfrom theHarvester.lib.core import AsyncFetcher, Cor"
},
{
"path": "theHarvester/discovery/onyphe.py",
"chars": 4453,
"preview": "from urllib.parse import urlparse\n\nfrom theHarvester.discovery.constants import MissingKey\nfrom theHarvester.lib.core im"
},
{
"path": "theHarvester/discovery/otxsearch.py",
"chars": 1968,
"preview": "import re\nfrom typing import Any\n\nfrom theHarvester.lib.core import AsyncFetcher\n\n\nclass SearchOtx:\n def __init__(sel"
},
{
"path": "theHarvester/discovery/pentesttools.py",
"chars": 2985,
"preview": "import asyncio\n\nimport ujson\n\nfrom theHarvester.discovery.constants import MissingKey\nfrom theHarvester.lib.core import "
},
{
"path": "theHarvester/discovery/projectdiscovery.py",
"chars": 1007,
"preview": "from theHarvester.discovery.constants import MissingKey\nfrom theHarvester.lib.core import AsyncFetcher, Core\n\n\nclass Sea"
},
{
"path": "theHarvester/discovery/rapiddns.py",
"chars": 2099,
"preview": "from bs4 import BeautifulSoup\nfrom bs4.element import Tag\n\nfrom theHarvester.lib.core import AsyncFetcher, Core\n\n\nclass "
},
{
"path": "theHarvester/discovery/robtex.py",
"chars": 4609,
"preview": "import json as _stdlib_json\nfrom types import ModuleType\n\nimport aiohttp\n\nfrom theHarvester.lib.core import AsyncFetcher"
},
{
"path": "theHarvester/discovery/rocketreach.py",
"chars": 3233,
"preview": "import asyncio\n\nfrom theHarvester.discovery.constants import MissingKey, get_delay\nfrom theHarvester.lib.core import Asy"
},
{
"path": "theHarvester/discovery/search_dehashed.py",
"chars": 3980,
"preview": "import asyncio\nimport random\n\nimport aiohttp\n\nfrom theHarvester.discovery.constants import MissingKey\nfrom theHarvester."
},
{
"path": "theHarvester/discovery/search_dnsdumpster.py",
"chars": 1772,
"preview": "#!/usr/bin/env python3\nfrom theHarvester.discovery.constants import MissingKey\nfrom theHarvester.lib.core import AsyncFe"
},
{
"path": "theHarvester/discovery/searchhunterhow.py",
"chars": 2408,
"preview": "import base64\nfrom datetime import datetime\n\nfrom dateutil.relativedelta import relativedelta\n\nfrom theHarvester.discove"
},
{
"path": "theHarvester/discovery/securityscorecard.py",
"chars": 3412,
"preview": "import aiohttp\n\nfrom theHarvester.discovery.constants import MissingKey\nfrom theHarvester.lib.core import AsyncFetcher, "
},
{
"path": "theHarvester/discovery/securitytrailssearch.py",
"chars": 4241,
"preview": "import asyncio\n\nfrom theHarvester.discovery.constants import MissingKey\nfrom theHarvester.lib.core import AsyncFetcher, "
},
{
"path": "theHarvester/discovery/shodansearch.py",
"chars": 4425,
"preview": "from collections import OrderedDict\n\nfrom shodan import Shodan, exception\n\nfrom theHarvester.discovery.constants import "
},
{
"path": "theHarvester/discovery/subdomaincenter.py",
"chars": 925,
"preview": "from theHarvester.lib.core import AsyncFetcher, Core\n\n\nclass SubdomainCenter:\n def __init__(self, word):\n self"
},
{
"path": "theHarvester/discovery/subdomainfinderc99.py",
"chars": 2438,
"preview": "import asyncio\n\nimport ujson\nfrom bs4 import BeautifulSoup\nfrom bs4.element import Tag\n\nfrom theHarvester.discovery.cons"
},
{
"path": "theHarvester/discovery/takeover.py",
"chars": 5496,
"preview": "import re\nfrom collections import defaultdict\nfrom random import shuffle\n\nimport ujson\n\nfrom theHarvester.lib.core impor"
},
{
"path": "theHarvester/discovery/thc.py",
"chars": 2467,
"preview": "import asyncio\n\nimport aiohttp\n\nfrom theHarvester.lib.core import Core\n\n\nclass SearchThc:\n \"\"\"Class to search for sub"
},
{
"path": "theHarvester/discovery/threatcrowd.py",
"chars": 3721,
"preview": "import json as _stdlib_json\nfrom types import ModuleType\n\nfrom theHarvester.lib.core import AsyncFetcher, Core\n\njson: Mo"
},
{
"path": "theHarvester/discovery/tombasearch.py",
"chars": 4448,
"preview": "import asyncio\n\nfrom theHarvester.discovery.constants import MissingKey\nfrom theHarvester.lib.core import AsyncFetcher, "
},
{
"path": "theHarvester/discovery/urlscan.py",
"chars": 1404,
"preview": "from theHarvester.lib.core import AsyncFetcher\n\n\nclass SearchUrlscan:\n def __init__(self, word) -> None:\n self"
},
{
"path": "theHarvester/discovery/venacussearch.py",
"chars": 3213,
"preview": "from typing import Any\n\nimport aiohttp\n\nfrom theHarvester.discovery.constants import MissingKey\nfrom theHarvester.lib.co"
},
{
"path": "theHarvester/discovery/virustotal.py",
"chars": 4538,
"preview": "import asyncio\n\nfrom theHarvester.discovery.constants import MissingKey\nfrom theHarvester.lib.core import AsyncFetcher, "
},
{
"path": "theHarvester/discovery/waybackarchive.py",
"chars": 2699,
"preview": "import re\n\nfrom theHarvester.lib.core import AsyncFetcher, Core\n\n\nclass SearchWaybackarchive:\n \"\"\"\n Class uses Int"
},
{
"path": "theHarvester/discovery/whoisxml.py",
"chars": 1569,
"preview": "from theHarvester.discovery.constants import MissingKey\nfrom theHarvester.lib.core import AsyncFetcher, Core\n\n\nclass Sea"
},
{
"path": "theHarvester/discovery/windvane.py",
"chars": 13223,
"preview": "import json as _stdlib_json\nfrom types import ModuleType\n\nfrom theHarvester.lib.core import AsyncFetcher, Core\n\njson: Mo"
},
{
"path": "theHarvester/discovery/yahoosearch.py",
"chars": 1650,
"preview": "from theHarvester.lib.core import AsyncFetcher, Core\nfrom theHarvester.parsers import myparser\n\n\nclass SearchYahoo:\n "
},
{
"path": "theHarvester/discovery/zoomeyesearch.py",
"chars": 15807,
"preview": "import asyncio\nimport math\nimport re\nfrom collections.abc import Iterable\nfrom typing import Any\n\nfrom theHarvester.disc"
},
{
"path": "theHarvester/lib/__init__.py",
"chars": 26,
"preview": "__all__ = ['hostchecker']\n"
},
{
"path": "theHarvester/lib/api/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "theHarvester/lib/api/additional_endpoints.py",
"chars": 2685,
"preview": "from fastapi import APIRouter, Depends, HTTPException\nfrom pydantic import BaseModel\n\nfrom theHarvester.discovery.additi"
},
{
"path": "theHarvester/lib/api/api.py",
"chars": 13686,
"preview": "import argparse\nimport os\nimport traceback\nfrom typing import Any, cast\n\nfrom fastapi import FastAPI, Header, HTTPExcept"
},
{
"path": "theHarvester/lib/api/api_example.py",
"chars": 4615,
"preview": "\"\"\"\nExample script to query theHarvester rest API, obtain results, and write out to stdout as well as an html\n\"\"\"\n\nimpor"
},
{
"path": "theHarvester/lib/api/auth.py",
"chars": 444,
"preview": "from fastapi import Header\n\n\ndef get_api_key(x_api_key: str | None = Header(None)) -> str:\n \"\"\"\n Simple API key au"
},
{
"path": "theHarvester/lib/api/static/.gitkeep",
"chars": 0,
"preview": ""
},
{
"path": "theHarvester/lib/core.py",
"chars": 31198,
"preview": "from __future__ import annotations\n\nimport asyncio\nimport contextlib\nimport random\nimport ssl\nfrom pathlib import Path\nf"
},
{
"path": "theHarvester/lib/hostchecker.py",
"chars": 3462,
"preview": "#!/usr/bin/env python\n\"\"\"\nCreated by laramies on 2008-08-21.\nRevised to use aiodns & asyncio on 2019-09-23\n\"\"\"\n\n# Suppor"
},
{
"path": "theHarvester/lib/output.py",
"chars": 1166,
"preview": "from __future__ import annotations\n\nfrom collections.abc import Hashable, Iterable, Sequence\nfrom typing import TypeVar\n"
},
{
"path": "theHarvester/lib/resolvers.txt",
"chars": 29331,
"preview": "1.0.0.1\n1.1.1.1\n141.1.27.249\n194.190.225.2 \n194.225.16.5 \n91.185.6.10 \n194.2.0.50 \n66.187.16.5 \n83.222.161.130 \n69.60.16"
},
{
"path": "theHarvester/lib/stash.py",
"chars": 19249,
"preview": "import datetime\nimport os\nfrom collections.abc import Iterable\nfrom sqlite3.dbapi2 import Row\n\nimport aiosqlite\n\ndb_path"
},
{
"path": "theHarvester/parsers/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "theHarvester/parsers/intelxparser.py",
"chars": 1023,
"preview": "class Parser:\n def __init__(self) -> None:\n self.emails: set = set()\n self.hosts: set = set()\n\n asyn"
},
{
"path": "theHarvester/parsers/myparser.py",
"chars": 4285,
"preview": "import re\nfrom collections.abc import Set\n\n\nclass Parser:\n def __init__(self, results, word) -> None:\n self.re"
},
{
"path": "theHarvester/parsers/securitytrailsparser.py",
"chars": 4460,
"preview": "import ipaddress\n\n\nclass Parser:\n def __init__(self, word, text) -> None:\n self.word = word\n self.text "
},
{
"path": "theHarvester/parsers/venacusparser.py",
"chars": 4562,
"preview": "import enum\nfrom collections.abc import Mapping\nfrom typing import Any\n\n\nclass TokenTypesEnum(enum.StrEnum):\n ID = 'i"
},
{
"path": "theHarvester/restfulHarvest.py",
"chars": 1346,
"preview": "import argparse\nimport os\n\nimport uvicorn\n\n\ndef main():\n parser = argparse.ArgumentParser()\n parser.add_argument(\n"
},
{
"path": "theHarvester/screenshot/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "theHarvester/screenshot/screenshot.py",
"chars": 4700,
"preview": "\"\"\"\nScreenshot module that utilizes playwright to asynchronously\ntake screenshots\n\"\"\"\n\nimport os\nimport ssl\nimport sys\nf"
},
{
"path": "theHarvester/theHarvester.py",
"chars": 1035,
"preview": "import asyncio\nimport sys\n\nfrom theHarvester import __main__\n\n\ndef main():\n platform = sys.platform\n if platform ="
}
]
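The condensed preview above is a JSON array of objects with `path`, `chars`, and `preview` keys. As a minimal sketch of how such an array could be summarised, the following assumes the array has been saved or pasted as JSON text. The helper name `summarize_preview` and its `top` parameter are hypothetical and not part of theHarvester or the extraction tool. The sample uses three entries from the preview, with two previews left empty for brevity.

```python
import json
from typing import Any


def summarize_preview(entries: list[dict[str, Any]], top: int = 3) -> dict[str, Any]:
    """Summarise a condensed-preview array of {path, chars, preview} objects.

    Hypothetical helper for illustration only.
    """
    total_chars = sum(int(entry["chars"]) for entry in entries)
    # Sort by character count, largest first, and keep the first `top` entries.
    largest = sorted(entries, key=lambda entry: entry["chars"], reverse=True)[:top]
    return {
        "files": len(entries),
        "total_chars": total_chars,
        "largest": [(entry["path"], entry["chars"]) for entry in largest],
    }


# Three entries taken from the preview; a raw string keeps the JSON escape intact.
sample = json.loads(r"""
[
  {"path": ".dockerignore", "chars": 160, "preview": ""},
  {"path": "theHarvester/__init__.py", "chars": 23, "preview": "__version__ = '4.10.1'\n"},
  {"path": "theHarvester/lib/core.py", "chars": 31198, "preview": ""}
]
""")

summary = summarize_preview(sample, top=2)
print(summary)
```

Run against the full array, the same function would show which files dominate the dump (the wordlists and `__main__.py`, judging by the character counts listed), which may help decide what to exclude before passing the text to a tool with a limited context size.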