Repository: benbusby/whoogle-search
Branch: main
Commit: 2949510d682d
Files: 119
Total size: 513.4 KB
Directory structure:
gitextract_kjzxsbiv/
├── .dockerignore
├── .github/
│   ├── FUNDING.yml
│   ├── ISSUE_TEMPLATE/
│   │   ├── bug_report.md
│   │   ├── feature_request.md
│   │   ├── new-theme.md
│   │   └── question.md
│   └── workflows/
│       ├── buildx.yml
│       ├── docker_main.yml
│       ├── docker_tests.yml
│       ├── pypi.yml
│       ├── scan.yml
│       ├── stale.yml
│       └── tests.yml
├── .gitignore
├── .pre-commit-config.yaml
├── .replit
├── Dockerfile
├── LICENSE
├── MANIFEST.in
├── README.md
├── app/
│   ├── __init__.py
│   ├── __main__.py
│   ├── filter.py
│   ├── models/
│   │   ├── __init__.py
│   │   ├── config.py
│   │   ├── endpoint.py
│   │   └── g_classes.py
│   ├── request.py
│   ├── routes.py
│   ├── services/
│   │   ├── __init__.py
│   │   ├── cse_client.py
│   │   ├── http_client.py
│   │   └── provider.py
│   ├── static/
│   │   ├── bangs/
│   │   │   └── 00-whoogle.json
│   │   ├── build/
│   │   │   └── .gitignore
│   │   ├── css/
│   │   │   ├── dark-theme.css
│   │   │   ├── error.css
│   │   │   ├── header.css
│   │   │   ├── input.css
│   │   │   ├── light-theme.css
│   │   │   ├── logo.css
│   │   │   ├── main.css
│   │   │   ├── search.css
│   │   │   └── variables.css
│   │   ├── img/
│   │   │   └── favicon/
│   │   │       ├── browserconfig.xml
│   │   │       └── manifest.json
│   │   ├── js/
│   │   │   ├── autocomplete.js
│   │   │   ├── controller.js
│   │   │   ├── currency.js
│   │   │   ├── header.js
│   │   │   ├── keyboard.js
│   │   │   └── utils.js
│   │   ├── settings/
│   │   │   ├── countries.json
│   │   │   ├── header_tabs.json
│   │   │   ├── languages.json
│   │   │   ├── themes.json
│   │   │   ├── time_periods.json
│   │   │   └── translations.json
│   │   └── widgets/
│   │       └── calculator.html
│   ├── templates/
│   │   ├── display.html
│   │   ├── error.html
│   │   ├── footer.html
│   │   ├── header.html
│   │   ├── imageresults.html
│   │   ├── index.html
│   │   ├── logo.html
│   │   ├── opensearch.xml
│   │   └── search.html
│   ├── utils/
│   │   ├── __init__.py
│   │   ├── bangs.py
│   │   ├── misc.py
│   │   ├── results.py
│   │   ├── search.py
│   │   ├── session.py
│   │   ├── ua_generator.py
│   │   └── widgets.py
│   └── version.py
├── app.json
├── charts/
│   └── whoogle/
│       ├── .helmignore
│       ├── Chart.yaml
│       ├── templates/
│       │   ├── NOTES.txt
│       │   ├── _helpers.tpl
│       │   ├── deployment.yaml
│       │   ├── hpa.yaml
│       │   ├── ingress.yaml
│       │   ├── service.yaml
│       │   ├── serviceaccount.yaml
│       │   └── tests/
│       │       └── test-connection.yaml
│       └── values.yaml
├── docker-compose-traefik.yaml
├── docker-compose.yml
├── heroku.yml
├── letsencrypt/
│   └── acme.json
├── misc/
│   ├── check_google_user_agents.py
│   ├── generate_uas.py
│   ├── heroku-regen.sh
│   ├── instances.txt
│   ├── replit.py
│   ├── tor/
│   │   ├── start-tor.sh
│   │   └── torrc
│   └── update-translations.py
├── pyproject.toml
├── requirements.txt
├── run
├── setup.cfg
├── test/
│   ├── __init__.py
│   ├── conftest.py
│   ├── mock_google.py
│   ├── test_alts.py
│   ├── test_autocomplete.py
│   ├── test_autocomplete_xml.py
│   ├── test_http_client.py
│   ├── test_json.py
│   ├── test_misc.py
│   ├── test_results.py
│   ├── test_routes.py
│   ├── test_routes_json.py
│   └── test_tor.py
└── whoogle.template.env
================================================
FILE CONTENTS
================================================
================================================
FILE: .dockerignore
================================================
.git/
venv/
test/
================================================
FILE: .github/FUNDING.yml
================================================
# These are supported funding model platforms
github: benbusby
ko_fi: benbusby
tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel
community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry
liberapay: # Replace with a single Liberapay username
issuehunt: # Replace with a single IssueHunt username
otechie: # Replace with a single Otechie username
custom: # Replace with up to 4 custom sponsorship URLs e.g., ['link1', 'link2']
================================================
FILE: .github/ISSUE_TEMPLATE/bug_report.md
================================================
---
name: Bug report
about: Create a bug report to help fix an issue with Whoogle
title: "[BUG] <brief bug description>"
labels: bug
assignees: ''
---
**Describe the bug**
A clear and concise description of what the bug is.
**To Reproduce**
Steps to reproduce the behavior:
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error
**Deployment Method**
- [ ] Heroku (one-click deploy)
- [ ] Docker
- [ ] `run` executable
- [ ] pip/pipx
- [ ] Other: [describe setup]
**Version of Whoogle Search**
- [ ] Latest build from [source] (i.e. GitHub, Docker Hub, pip, etc)
- [ ] Version [version number]
- [ ] Not sure
**Desktop (please complete the following information):**
- OS: [e.g. iOS]
- Browser [e.g. chrome, safari]
- Version [e.g. 22]
**Smartphone (please complete the following information):**
- Device: [e.g. iPhone6]
- OS: [e.g. iOS8.1]
- Browser [e.g. stock browser, safari]
- Version [e.g. 22]
**Additional context**
Add any other context about the problem here.
================================================
FILE: .github/ISSUE_TEMPLATE/feature_request.md
================================================
---
name: Feature request
about: Suggest a feature that would improve Whoogle
title: "[FEATURE] <description of feature>"
labels: enhancement
assignees: ''
---
<!--
DO NOT REQUEST UI/THEME/GUI/APPEARANCE IMPROVEMENTS HERE
THESE SHOULD GO IN ISSUE #60
REQUESTING A NEW FEATURE SHOULD BE STRICTLY RELATED TO NEW FUNCTIONALITY
-->
**Describe the feature you'd like to see added**
A short description of the feature, and what it would accomplish.
**Additional context**
Add any other context or screenshots about the feature request here.
================================================
FILE: .github/ISSUE_TEMPLATE/new-theme.md
================================================
---
name: New theme
about: Create a new theme for Whoogle
title: "[THEME] <your theme name>"
labels: theme
assignees: benbusby
---
Use the following template to design your theme, replacing the blank spaces with the colors of your choice.
```css
:root {
/* LIGHT THEME COLORS */
--whoogle-logo: #______;
--whoogle-page-bg: #______;
--whoogle-element-bg: #______;
--whoogle-text: #______;
--whoogle-contrast-text: #______;
--whoogle-secondary-text: #______;
--whoogle-result-bg: #______;
--whoogle-result-title: #______;
--whoogle-result-url: #______;
--whoogle-result-visited: #______;
/* DARK THEME COLORS */
--whoogle-dark-logo: #______;
--whoogle-dark-page-bg: #______;
--whoogle-dark-element-bg: #______;
--whoogle-dark-text: #______;
--whoogle-dark-contrast-text: #______;
--whoogle-dark-secondary-text: #______;
--whoogle-dark-result-bg: #______;
--whoogle-dark-result-title: #______;
--whoogle-dark-result-url: #______;
--whoogle-dark-result-visited: #______;
}
```
================================================
FILE: .github/ISSUE_TEMPLATE/question.md
================================================
---
name: Question
about: Ask a (simple) question about Whoogle
title: "[QUESTION] <question here>"
labels: question
assignees: ''
---
Type out your question here. Please make sure that this is a topic that isn't already covered in the README.
================================================
FILE: .github/workflows/buildx.yml
================================================
name: buildx
on:
  workflow_run:
    workflows: ["docker_main"]
    branches: [main, updates]
    types:
      - completed
  push:
    tags:
      - '*'
  release:
    types:
      - published
jobs:
  on-success:
    runs-on: ubuntu-latest
    steps:
      - name: Wait for tests to succeed
        if: ${{ github.event.workflow_run.conclusion != 'success' && startsWith(github.ref, 'refs/tags') != true }}
        run: exit 1
      - name: Debug workflow context
        run: |
          echo "Event name: ${{ github.event_name }}"
          echo "Ref: ${{ github.ref }}"
          echo "Actor: ${{ github.actor }}"
          echo "Branch: ${{ github.event.workflow_run.head_branch }}"
          echo "Conclusion: ${{ github.event.workflow_run.conclusion }}"
      - name: checkout code
        uses: actions/checkout@v4
      - name: Set up QEMU
        uses: docker/setup-qemu-action@v3
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Login to Docker Hub
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}
      - name: Login to ghcr.io
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      # Disabled: only build on release events now
      # - name: build and push the image
      #   if: startsWith(github.ref, 'refs/heads/main') && (github.actor == 'benbusby' || github.actor == 'Don-Swanson')
      #   run: |
      #     docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
      #     docker buildx ls
      #     docker buildx build --push \
      #       --tag benbusby/whoogle-search:latest \
      #       --platform linux/amd64,linux/arm64 .
      #     docker buildx build --push \
      #       --tag ghcr.io/benbusby/whoogle-search:latest \
      #       --platform linux/amd64,linux/arm64 .
      - name: build and push updates branch (update-testing tag)
        if: github.event_name == 'workflow_run' && github.event.workflow_run.head_branch == 'updates' && github.event.workflow_run.conclusion == 'success' && (github.event.workflow_run.actor.login == 'benbusby' || github.event.workflow_run.actor.login == 'Don-Swanson')
        run: |
          docker buildx build --push \
            --tag benbusby/whoogle-search:update-testing \
            --tag ghcr.io/benbusby/whoogle-search:update-testing \
            --platform linux/amd64,linux/arm64 .
      - name: build and push release (version + latest)
        if: github.event_name == 'release' && github.event.release.prerelease == false && (github.actor == 'benbusby' || github.actor == 'Don-Swanson')
        run: |
          TAG="${{ github.event.release.tag_name }}"
          VERSION="${TAG#v}"
          docker buildx build --push \
            --tag benbusby/whoogle-search:${VERSION} \
            --tag benbusby/whoogle-search:latest \
            --tag ghcr.io/benbusby/whoogle-search:${VERSION} \
            --tag ghcr.io/benbusby/whoogle-search:latest \
            --platform linux/amd64,linux/arm64 .
      - name: build and push pre-release (version only)
        if: github.event_name == 'release' && github.event.release.prerelease == true && (github.actor == 'benbusby' || github.actor == 'Don-Swanson')
        run: |
          TAG="${{ github.event.release.tag_name }}"
          VERSION="${TAG#v}"
          docker buildx build --push \
            --tag benbusby/whoogle-search:${VERSION} \
            --tag ghcr.io/benbusby/whoogle-search:${VERSION} \
            --platform linux/amd64,linux/arm64 .
      - name: build and push tag
        if: startsWith(github.ref, 'refs/tags')
        run: |
          docker buildx build --push \
            --tag benbusby/whoogle-search:${GITHUB_REF#refs/*/v} \
            --tag ghcr.io/benbusby/whoogle-search:${GITHUB_REF#refs/*/v} \
            --platform linux/amd64,linux/arm64 .
================================================
FILE: .github/workflows/docker_main.yml
================================================
name: docker_main
on:
  workflow_run:
    workflows: ["tests"]
    branches: [main, updates]
    types:
      - completed
# TODO: Needs refactoring to use reusable workflows and share w/ docker_tests
jobs:
  on-success:
    runs-on: ubuntu-latest
    if: ${{ github.event.workflow_run.conclusion == 'success' }}
    steps:
      - name: checkout code
        uses: actions/checkout@v4
      - name: build and test (docker)
        run: |
          docker build --tag whoogle-search:test .
          docker run --publish 5000:5000 --detach --name whoogle-search-nocompose whoogle-search:test
          sleep 15
          docker exec whoogle-search-nocompose curl -f http://localhost:5000/healthz || exit 1
      - name: build and test (docker-compose)
        run: |
          docker rm -f whoogle-search-nocompose
          WHOOGLE_IMAGE="whoogle-search:test" docker compose up --detach
          sleep 15
          docker exec whoogle-search curl -f http://localhost:5000/healthz || exit 1
================================================
FILE: .github/workflows/docker_tests.yml
================================================
name: docker_tests
on:
  push:
    branches: main
  pull_request:
    branches: main
jobs:
  docker:
    runs-on: ubuntu-latest
    steps:
      - name: checkout code
        uses: actions/checkout@v2
      - name: build and test (docker)
        run: |
          docker build --tag whoogle-search:test .
          docker run --publish 5000:5000 --detach --name whoogle-search-nocompose whoogle-search:test
          sleep 15
          docker exec whoogle-search-nocompose curl -f http://localhost:5000/healthz || exit 1
      - name: build and test (docker compose)
        run: |
          docker rm -f whoogle-search-nocompose
          WHOOGLE_IMAGE="whoogle-search:test" docker compose up --detach
          sleep 15
          docker exec whoogle-search curl -f http://localhost:5000/healthz || exit 1
================================================
FILE: .github/workflows/pypi.yml
================================================
name: pypi
on:
  push:
    branches: main
    tags: v*
jobs:
  publish-test:
    name: Build and publish to TestPyPI
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python 3.9
        uses: actions/setup-python@v5
        with:
          python-version: 3.9
      - name: Install pypa/build
        run: >-
          python -m
          pip install
          build
          setuptools
          --user
      - name: Set dev timestamp
        run: echo "DEV_BUILD=$(date +%s)" >> $GITHUB_ENV
      - name: Build binary wheel and source tarball
        run: >-
          python -m
          build
          --sdist
          --wheel
          --outdir dist/
          .
      - name: Publish distribution to TestPyPI
        uses: pypa/gh-action-pypi-publish@master
        with:
          password: ${{ secrets.TEST_PYPI_API_TOKEN }}
          repository_url: https://test.pypi.org/legacy/
  publish:
    # Gate real PyPI publishing to stable SemVer tags only
    if: startsWith(github.ref, 'refs/tags/')
    name: Build and publish to PyPI
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Check if stable release
        id: check_tag
        run: |
          TAG="${{ github.ref_name }}"
          if echo "$TAG" | grep -qE '^v?[0-9]+\.[0-9]+\.[0-9]+$'; then
            echo "is_stable=true" >> $GITHUB_OUTPUT
            echo "Tag '$TAG' is a stable release. Will publish to PyPI."
          else
            echo "is_stable=false" >> $GITHUB_OUTPUT
            echo "Tag '$TAG' is not a stable release (contains pre-release suffix). Skipping PyPI publish."
          fi
      - name: Set up Python 3.9
        if: steps.check_tag.outputs.is_stable == 'true'
        uses: actions/setup-python@v5
        with:
          python-version: 3.9
      - name: Install pypa/build
        if: steps.check_tag.outputs.is_stable == 'true'
        run: >-
          python -m
          pip install
          build
          --user
      - name: Build binary wheel and source tarball
        if: steps.check_tag.outputs.is_stable == 'true'
        run: >-
          python -m
          build
          --sdist
          --wheel
          --outdir dist/
          .
      - name: Publish distribution to PyPI
        if: steps.check_tag.outputs.is_stable == 'true'
        uses: pypa/gh-action-pypi-publish@master
        with:
          password: ${{ secrets.PYPI_API_TOKEN }}
================================================
FILE: .github/workflows/scan.yml
================================================
name: scan
on:
  schedule:
    - cron: '0 0 * * *'
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Build the container image
        run: |
          docker build --tag whoogle-search:test .
      - name: Initiate grype scan
        run: |
          curl -sSfL https://raw.githubusercontent.com/anchore/grype/main/install.sh | sh -s -- -b .
          chmod +x ./grype
          ./grype whoogle-search:test --only-fixed
================================================
FILE: .github/workflows/stale.yml
================================================
# This workflow warns and then closes issues and PRs that have had no activity for a specified amount of time.
#
# You can adjust the behavior by modifying this file.
# For more information, see:
# https://github.com/actions/stale
name: Mark stale issues and pull requests
on:
  schedule:
    - cron: '35 10 * * *'
jobs:
  stale:
    runs-on: ubuntu-latest
    permissions:
      issues: write
      pull-requests: write
    steps:
      - uses: actions/stale@v10
        with:
          days-before-stale: 90
          days-before-close: 7
          stale-issue-message: 'This issue has been automatically marked as stale due to inactivity. If it is still valid please comment within 7 days or it will be auto-closed.'
          close-issue-message: 'Closing this issue due to prolonged inactivity.'
          # Disabled PR Closing for now, but pre-staged the settings
          days-before-pr-stale: -1
          days-before-pr-close: -1
          operations-per-run: 100
          stale-pr-message: "This PR appears to be stale. If it is still valid please comment within 14 days or it will be auto-closed."
          close-pr-message: "This PR was closed as stale."
          exempt-issue-labels: 'keep-open,enhancement,critical,dependencies,documentation'
================================================
FILE: .github/workflows/tests.yml
================================================
name: tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.x'
      - name: Install dependencies
        run: pip install --upgrade pip && pip install -r requirements.txt
      - name: Run tests
        run: ./run test
================================================
FILE: .gitignore
================================================
venv/
.venv/
.idea/
__pycache__/
*.pyc
*.pem
*.conf
*.key
config.json
test/static
flask_session/
app/static/config
app/static/custom_config
app/static/bangs/*
!app/static/bangs/00-whoogle.json
# pip stuff
/build/
dist/
*.egg-info/
# env
whoogle.env
# vim
*~
*.swp
================================================
FILE: .pre-commit-config.yaml
================================================
repos:
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.6.9
hooks:
- id: ruff
args: [--fix]
- id: ruff-format
- repo: https://github.com/psf/black
rev: 24.8.0
hooks:
- id: black
args: [--quiet]
================================================
FILE: .replit
================================================
entrypoint = "misc/replit.py"
================================================
FILE: Dockerfile
================================================
# NOTE: ARMv7 support has been dropped due to lack of pre-built cryptography wheels for Alpine/musl.
# To restore ARMv7 support for local builds:
# 1. Change requirements.txt:
# cryptography==3.3.2; platform_machine == 'armv7l'
# cryptography==46.0.1; platform_machine != 'armv7l'
# pyOpenSSL==19.1.0; platform_machine == 'armv7l'
# pyOpenSSL==25.3.0; platform_machine != 'armv7l'
# 2. Add linux/arm/v7 to --platform flag when building:
# docker buildx build --platform linux/amd64,linux/arm/v7,linux/arm64 .
FROM python:3.12-alpine3.22 AS builder
RUN apk --no-cache add \
build-base \
libxml2-dev \
libxslt-dev \
openssl-dev \
libffi-dev
COPY requirements.txt .
RUN pip install --upgrade pip
RUN pip install --prefix /install --no-warn-script-location --no-cache-dir -r requirements.txt
FROM python:3.12-alpine3.22
# Remove bridge package to avoid CVEs (not needed for Docker containers)
RUN apk add --no-cache --no-scripts tor curl openrc libstdc++ && \
apk del --no-cache bridge || true
# git go //for obfs4proxy
# libcurl4-openssl-dev
RUN pip install --upgrade pip
RUN apk --no-cache upgrade && \
apk del --no-cache --rdepends bridge || true
# uncomment to build obfs4proxy
# RUN git clone https://gitlab.com/yawning/obfs4.git
# WORKDIR /obfs4
# RUN go build -o obfs4proxy/obfs4proxy ./obfs4proxy
# RUN cp ./obfs4proxy/obfs4proxy /usr/bin/obfs4proxy
ARG DOCKER_USER=whoogle
ARG DOCKER_USERID=927
ARG config_dir=/config
RUN mkdir -p $config_dir
RUN chmod a+w $config_dir
VOLUME $config_dir
ARG url_prefix=''
ARG username=''
ARG password=''
ARG proxyuser=''
ARG proxypass=''
ARG proxytype=''
ARG proxyloc=''
ARG whoogle_dotenv=''
ARG use_https=''
ARG whoogle_port=5000
ARG twitter_alt='farside.link/nitter'
ARG youtube_alt='farside.link/invidious'
ARG reddit_alt='farside.link/libreddit'
ARG medium_alt='farside.link/scribe'
ARG translate_alt='farside.link/lingva'
ARG imgur_alt='farside.link/rimgo'
ARG wikipedia_alt='farside.link/wikiless'
ARG imdb_alt='farside.link/libremdb'
ARG quora_alt='farside.link/quetre'
ARG so_alt='farside.link/anonymousoverflow'
ENV CONFIG_VOLUME=$config_dir \
WHOOGLE_URL_PREFIX=$url_prefix \
WHOOGLE_USER=$username \
WHOOGLE_PASS=$password \
WHOOGLE_PROXY_USER=$proxyuser \
WHOOGLE_PROXY_PASS=$proxypass \
WHOOGLE_PROXY_TYPE=$proxytype \
WHOOGLE_PROXY_LOC=$proxyloc \
WHOOGLE_DOTENV=$whoogle_dotenv \
HTTPS_ONLY=$use_https \
EXPOSE_PORT=$whoogle_port \
WHOOGLE_ALT_TW=$twitter_alt \
WHOOGLE_ALT_YT=$youtube_alt \
WHOOGLE_ALT_RD=$reddit_alt \
WHOOGLE_ALT_MD=$medium_alt \
WHOOGLE_ALT_TL=$translate_alt \
WHOOGLE_ALT_IMG=$imgur_alt \
WHOOGLE_ALT_WIKI=$wikipedia_alt \
WHOOGLE_ALT_IMDB=$imdb_alt \
WHOOGLE_ALT_QUORA=$quora_alt \
WHOOGLE_ALT_SO=$so_alt
WORKDIR /whoogle
COPY --from=builder /install /usr/local
COPY misc/tor/torrc /etc/tor/torrc
COPY misc/tor/start-tor.sh misc/tor/start-tor.sh
COPY app/ app/
COPY run whoogle.env* ./
# Create user/group to run as
RUN adduser -D -g $DOCKER_USERID -u $DOCKER_USERID $DOCKER_USER
# Fix ownership / permissions
RUN chown -R ${DOCKER_USER}:${DOCKER_USER} /whoogle /var/lib/tor
# Allow writing symlinks to build dir
RUN chown $DOCKER_USERID:$DOCKER_USERID app/static/build
USER $DOCKER_USER:$DOCKER_USER
EXPOSE $EXPOSE_PORT
HEALTHCHECK --interval=30s --timeout=5s \
CMD curl -f http://localhost:${EXPOSE_PORT}/healthz || exit 1
CMD ["/bin/sh", "-c", "misc/tor/start-tor.sh & ./run"]
================================================
FILE: LICENSE
================================================
MIT License
Copyright (c) 2020 Ben Busby
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
================================================
FILE: MANIFEST.in
================================================
graft app/static
graft app/templates
graft app/misc
include requirements.txt
recursive-include test *
global-exclude *.pyc
================================================
FILE: README.md
================================================
>[!WARNING]
>
>Since 16 January, 2025, Google has been attacking the ability to perform search queries without JavaScript enabled. This is a fundamental part of how Whoogle
>works -- Whoogle requests the JavaScript-free search results, then filters out garbage from the results page and proxies all external content for the user.
>
>This is possibly a breaking change that may mean the end for Whoogle. We'll continue fighting back and releasing workarounds until all workarounds are
>exhausted or a better method is found. If you know of a better way, please review and comment in our Way Forward Discussion
___

<table>
<tr>
<td><a href="https://sr.ht/~benbusby/whoogle-search">SourceHut</a></td>
<td><a href="https://github.com/benbusby/whoogle-search">GitHub</a></td>
</tr>
</table>
Get Google search results, but without any ads, JavaScript, AMP links, cookies, or IP address tracking. Easily deployable in one click as a Docker app, and customizable with a single config file. Quick and simple to implement as a primary search engine replacement on both desktop and mobile.
Contents
1. [Features](#features)
3. [Install/Deploy Options](#install)
    1. [Heroku Quick Deploy](#heroku-quick-deploy)
    1. [Render.com](#render)
    1. [Repl.it](#replit)
    1. [Fly.io](#flyio)
    1. [Koyeb](#koyeb)
    1. [pipx](#pipx)
    1. [pip](#pip)
    1. [Manual](#manual)
    1. [Docker](#manual-docker)
    1. [Arch/AUR](#arch-linux--arch-based-distributions)
    1. [Helm/Kubernetes](#helm-chart-for-kubernetes)
4. [Environment Variables and Configuration](#environment-variables)
5. [Google Custom Search (BYOK)](#google-custom-search-byok)
6. [Usage](#usage)
7. [Extra Steps](#extra-steps)
    1. [Set Primary Search Engine](#set-whoogle-as-your-primary-search-engine)
    2. [Custom Redirecting](#custom-redirecting)
    2. [Custom Bangs](#custom-bangs)
    3. [Prevent Downtime (Heroku Only)](#prevent-downtime-heroku-only)
    4. [Manual HTTPS Enforcement](#https-enforcement)
    5. [Using with Firefox Containers](#using-with-firefox-containers)
    6. [Reverse Proxying](#reverse-proxying)
        1. [Nginx](#nginx)
8. [Contributing](#contributing)
9. [FAQ](#faq)
10. [Public Instances](#public-instances)
11. [Screenshots](#screenshots)
## Features
- No ads or sponsored content
- No JavaScript\*
- No cookies\*\*
- No tracking/linking of your personal IP address\*\*\*
- No AMP links
- No URL tracking tags (e.g. utm=%s)
- No referrer header
- Tor and HTTP/SOCKS proxy support
- Autocomplete/search suggestions
- POST request search and suggestion queries (when possible)
- View images at full res without site redirect (currently mobile only)
- Light/Dark/System theme modes (with support for [custom CSS theming](https://github.com/benbusby/whoogle-search/wiki/User-Contributed-CSS-Themes))
- Auto-generated Opera User Agents with random rotation
  - 10 unique Opera-based UAs generated on startup from 115 language variants
  - Randomly rotated for each search request to avoid detection patterns
  - Cached across restarts with configurable refresh options
  - Fallback to safe default UA if generation fails
  - Optional display of current UA in search results footer
- Easy to install/deploy
- DDG-style bang (i.e. `!<tag> <query>`) searches
  - User-defined [custom bangs](#custom-bangs)
- Optional location-based searching (i.e. results near \<city\>)
- Optional NoJS mode to view search results in a separate window with JavaScript blocked
- JSON output for results via content negotiation (see "JSON results (API)")
<sup>*No third party JavaScript. Whoogle can be used with JavaScript disabled, but if enabled, uses JavaScript for things like presenting search suggestions.</sup>
<sup>**No third party cookies. Whoogle uses server side cookies (sessions) to store non-sensitive configuration settings such as theme, language, etc. Just like with JavaScript, cookies can be disabled and not affect Whoogle's search functionality.</sup>
<sup>***If deployed to a remote server, or configured to send requests through a VPN, Tor, proxy, etc.</sup>
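As a rough illustration of the "no URL tracking tags" feature above, the sketch below drops tracking parameters from a result URL. It is not Whoogle's actual filter: the `strip_tracking` name, the `utm_` prefix rule, and the extra `gclid`/`fbclid` keys are assumptions made for this example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: any query key starting with "utm_" is a tracking tag, plus two
# well-known click IDs. Whoogle's real list of filtered keys may differ.
TRACKING_PREFIXES = ("utm_",)
TRACKING_KEYS = {"gclid", "fbclid"}


def strip_tracking(url: str) -> str:
    """Return the URL with tracking query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
        and key.lower() not in TRACKING_KEYS
    ]
    # Rebuild the URL with only the non-tracking parameters.
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_tracking("https://example.com/page?id=7&utm_source=news&fbclid=abc"))
# prints: https://example.com/page?id=7
```

Filtering on the parsed query pairs, rather than with a regular expression on the raw URL, keeps the remaining parameters and their order intact.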
## Install
### Supported Platforms
Official Docker images are built for:
- **linux/amd64** (x86_64)
- **linux/arm64** (ARM 64-bit, Raspberry Pi 3/4/5, Apple Silicon)
**Note**: ARMv7 support (32-bit ARM, Raspberry Pi 2) was dropped in v1.2.0 due to incompatibility with modern security libraries on Alpine Linux. Users with ARMv7 devices can either:
- Use an older version (v1.1.x or earlier)
- Build locally with pinned dependencies (see notes in Dockerfile)
- Upgrade to a 64-bit OS if hardware supports it (Raspberry Pi 3+)
There are a few different ways to begin using the app, depending on your preferences:
___
### [Heroku Quick Deploy](https://heroku.com/about)
[](https://heroku.com/deploy?template=https://github.com/benbusby/whoogle-search/tree/main)
Provides:
- Easy Deployment of App
- An HTTPS URL (https://\<your app name\>.herokuapp.com)
Notes:
- Requires a **PAID** Heroku Account.
- Sometimes has issues with auto-redirecting to `https`. Make sure to navigate to the `https` version of your app before adding as a default search engine.
___
### [Render](https://render.com)
Create an account on [render.com](https://render.com) and import the Whoogle repo with the following settings:
- Runtime: `Python 3`
- Build Command: `pip install -r requirements.txt`
- Run Command: `./run`
___
### [Repl.it](https://repl.it)
[](https://repl.it/github/benbusby/whoogle-search)
*Note: Requires a (free) Replit account*
Provides:
- Free deployment of app
- Free HTTPS url (https://\<app name\>.\<username\>\.repl\.co)
- Supports custom domains
- Downtime after periods of inactivity ([solution](https://repl.it/talk/learn/How-to-use-and-setup-UptimeRobot/9003)\)
___
### [Fly.io](https://fly.io)
You will need a [Fly.io](https://fly.io) account to deploy Whoogle.
#### Install the CLI: https://fly.io/docs/hands-on/installing/
#### Deploy the app
```bash
flyctl auth login
flyctl launch --image benbusby/whoogle-search:latest
```
The first deploy won't succeed because the default `internal_port` is wrong.
To fix this, open the generated `fly.toml` file, set `services.internal_port` to `5000` and run `flyctl launch` again.
Your app is now available at `https://<app-name>.fly.dev`.
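If the first deploy fails, a quick way to confirm the port mismatch is to scan the generated `fly.toml` for `internal_port` values. The sketch below is a naive line scan, not a TOML parser; the `find_internal_ports` helper, the sample file contents, and the `my-whoogle` app name are hypothetical.

```python
import re

WHOOGLE_PORT = 5000  # default port used by Whoogle according to this README


def find_internal_ports(fly_toml_text: str) -> list[int]:
    """Collect every `internal_port = <number>` value found in the text."""
    matches = re.findall(
        r"^\s*internal_port\s*=\s*(\d+)", fly_toml_text, flags=re.MULTILINE
    )
    return [int(port) for port in matches]


# Hypothetical fly.toml contents; a real generated file will contain more keys.
sample = """\
app = "my-whoogle"

[[services]]
  internal_port = 8080
  protocol = "tcp"
"""

wrong = [port for port in find_internal_ports(sample) if port != WHOOGLE_PORT]
print(wrong)  # prints: [8080]
```

Any port listed in `wrong` should be changed to `5000` before launching again.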
Notes:
- Requires a [**PAID**](https://fly.io/docs/about/pricing/#free-allowances) Fly.io Account.
___
### [Koyeb](https://www.koyeb.com)
Use one of the following guides to install Whoogle on Koyeb:
1. Using GitHub: https://www.koyeb.com/docs/quickstart/deploy-with-git
2. Using Docker: https://www.koyeb.com/docs/quickstart/deploy-a-docker-application
___
### [RepoCloud](https://repocloud.io)
[](https://repocloud.io/details/?app_id=309)
1. Sign up for a free [RepoCloud account](https://repocloud.io) and receive free credits to get started.
2. Click "Deploy" to launch the app and access it instantly via your RepoCloud URL.
___
### [pipx](https://github.com/pipxproject/pipx#install-pipx)
Persistent install:
`pipx install https://github.com/benbusby/whoogle-search/archive/refs/heads/main.zip`
Sandboxed temporary instance:
`pipx run --spec git+https://github.com/benbusby/whoogle-search.git whoogle-search`
___
### pip
`pip install whoogle-search`
```bash
$ whoogle-search --help
usage: whoogle-search [-h] [--port <port number>] [--host <ip address>] [--debug] [--https-only] [--userpass <username:password>]
[--proxyauth <username:password>] [--proxytype <socks4|socks5|http>] [--proxyloc <location:port>]
Whoogle Search console runner
optional arguments:
-h, --help Show this help message and exit
--port <port number> Specifies a port to run on (default 5000)
--host <ip address> Specifies the host address to use (default 127.0.0.1)
--debug Activates debug mode for the server (default False)
--https-only Enforces HTTPS redirects for all requests
--userpass <username:password>
Sets a username/password basic auth combo (default None)
--proxyauth <username:password>
Sets a username/password for a HTTP/SOCKS proxy (default None)
--proxytype <socks4|socks5|http>
Sets a proxy type for all connections (default None)
--proxyloc <location:port>
Sets a proxy location for all connections (default None)
```
See the [available environment variables](#environment-variables) for additional configuration.
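For readers who want to see how the options in the help text fit together, here is a minimal `argparse` sketch that mirrors them. It is an illustration only and not copied from Whoogle's source: the `build_parser` function is hypothetical, and restricting `--proxytype` to the three listed values is an assumption based on the usage line.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the same flags as the help output above."""
    parser = argparse.ArgumentParser(
        prog="whoogle-search", description="Whoogle Search console runner"
    )
    parser.add_argument("--port", type=int, default=5000, metavar="<port number>",
                        help="Specifies a port to run on (default 5000)")
    parser.add_argument("--host", default="127.0.0.1", metavar="<ip address>",
                        help="Specifies the host address to use (default 127.0.0.1)")
    parser.add_argument("--debug", action="store_true",
                        help="Activates debug mode for the server (default False)")
    parser.add_argument("--https-only", action="store_true",
                        help="Enforces HTTPS redirects for all requests")
    parser.add_argument("--userpass", default=None, metavar="<username:password>",
                        help="Sets a username/password basic auth combo (default None)")
    parser.add_argument("--proxyauth", default=None, metavar="<username:password>",
                        help="Sets a username/password for a HTTP/SOCKS proxy (default None)")
    # Assumption: only the three proxy types shown in the usage line are valid.
    parser.add_argument("--proxytype", default=None, choices=["socks4", "socks5", "http"],
                        metavar="<socks4|socks5|http>",
                        help="Sets a proxy type for all connections (default None)")
    parser.add_argument("--proxyloc", default=None, metavar="<location:port>",
                        help="Sets a proxy location for all connections (default None)")
    return parser


args = build_parser().parse_args(
    ["--port", "8080", "--proxytype", "socks5", "--proxyloc", "127.0.0.1:9050"]
)
print(args.port, args.proxytype, args.https_only)  # prints: 8080 socks5 False
```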
___
### Manual
*Note: `Content-Security-Policy` headers can be sent by Whoogle if you set `WHOOGLE_CSP`.*
#### Dependencies
- [Python3](https://www.python.org/downloads/)
- `libcurl4-openssl-dev` and `libssl-dev`
  - macOS: `brew install openssl curl-openssl`
  - Ubuntu: `sudo apt-get install -y libcurl4-openssl-dev libssl-dev`
  - Arch: `pacman -S curl openssl`
#### Install
Clone the repo and run the following commands to start the app in a local-only environment:
```bash
git clone https://github.com/benbusby/whoogle-search.git
cd whoogle-search
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
./run
```
See the [available environment variables](#environment-variables) for additional configuration.
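Once `./run` is started, you may want to confirm the app is up before pointing a browser at it. The repository's CI workflows probe the `/healthz` endpoint with `curl`; the sketch below does the same from Python. The `wait_until_healthy` and `fetch_status` helpers, the retry count, and the delay are assumptions for illustration.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for a GET request to `url`."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def wait_until_healthy(url, attempts=15, delay=1.0, fetch=fetch_status) -> bool:
    """Poll `url` until it answers 200 or the attempts run out."""
    for _ in range(attempts):
        try:
            if fetch(url) == 200:
                return True
        except (urllib.error.URLError, OSError):
            # Server not accepting connections yet, or it returned an error status.
            pass
        time.sleep(delay)
    return False


# Example call, assuming the default host and port from this README:
# wait_until_healthy("http://localhost:5000/healthz")
```

The `fetch` parameter exists so the polling loop can be exercised without a running server.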
#### systemd Configuration
After building the virtual environment, you can add something like the following to `/lib/systemd/system/whoogle.service` to set up a Whoogle Search systemd service:
```ini
[Unit]
Description=Whoogle
[Service]
# Basic auth configuration, uncomment to enable
#Environment=WHOOGLE_USER=<username>
#Environment=WHOOGLE_PASS=<password>
# Proxy configuration, uncomment to enable
#Environment=WHOOGLE_PROXY_USER=<proxy username>
#Environment=WHOOGLE_PROXY_PASS=<proxy password>
#Environment=WHOOGLE_PROXY_TYPE=<proxy type (http|socks4|socks5)>
#Environment=WHOOGLE_PROXY_LOC=<proxy host/ip>
# Site alternative configurations, uncomment to enable
# Note: If not set, the feature will still be available
# with default values.
#Environment=WHOOGLE_ALT_TW=farside.link/nitter
#Environment=WHOOGLE_ALT_YT=farside.link/invidious
#Environment=WHOOGLE_ALT_RD=farside.link/libreddit
#Environment=WHOOGLE_ALT_MD=farside.link/scribe
#Environment=WHOOGLE_ALT_TL=farside.link/lingva
#Environment=WHOOGLE_ALT_IMG=farside.link/rimgo
#Environment=WHOOGLE_ALT_WIKI=farside.link/wikiless
#Environment=WHOOGLE_ALT_IMDB=farside.link/libremdb
#Environment=WHOOGLE_ALT_QUORA=farside.link/quetre
#Environment=WHOOGLE_ALT_SO=farside.link/anonymousoverflow
# Load values from dotenv only
#Environment=WHOOGLE_DOTENV=1
# specify dotenv location if not in default location
#Environment=WHOOGLE_DOTENV_PATH=<path/to>/whoogle.env
Type=simple
User=<username>
# Keep only one of the two ExecStart lines below.
# If installed as a package, add:
ExecStart=<python_install_dir>/python3 <whoogle_install_dir>/whoogle-search --host 127.0.0.1 --port 5000
# For example:
# ExecStart=/usr/bin/python3 /home/my_username/.local/bin/whoogle-search --host 127.0.0.1 --port 5000
# Otherwise if running the app from source, add:
ExecStart=<whoogle_repo_dir>/run
# For example:
# ExecStart=/var/www/whoogle-search/run
WorkingDirectory=<whoogle_repo_dir>
ExecReload=/bin/kill -HUP $MAINPID
Restart=always
RestartSec=3
SyslogIdentifier=whoogle
[Install]
WantedBy=multi-user.target
```
Then reload systemd, enable the service, and start it:
```bash
sudo systemctl daemon-reload
sudo systemctl enable whoogle
sudo systemctl start whoogle
```
#### Tor Configuration *optional*
If you route your requests through Tor, you will need to make the following adjustments.
Whoogle needs to send signals to Tor while interacting with Google through it, so it must be able to authenticate with Tor's control port.
There are two authentication methods: password and cookie. Both require changes to your torrc:
* Cookie
1. Uncomment or add the following lines in your torrc:
- `ControlPort 9051`
- `CookieAuthentication 1`
- `DataDirectoryGroupReadable 1`
- `CookieAuthFileGroupReadable 1`
2. Make the tor auth cookie readable:
- This is assuming that you are using a dedicated user to run whoogle. If you are using a different user replace `whoogle` with that user.
1. `chown tor:whoogle /var/lib/tor`
2. `chown tor:whoogle /var/lib/tor/control_auth_cookie`
3. Restart the tor service:
- `systemctl restart tor`
4. Set the `WHOOGLE_CONFIG_TOR` environment variable to 1. Refer to the [Environment Variables](#environment-variables) section for more details.
- This may be added to the systemd unit file or env file as `WHOOGLE_CONFIG_TOR=1`
* Password
1. Run this command:
- `tor --hash-password {Your Password Here}`; put your password in place of `{Your Password Here}`.
- Keep the output of this command; you will place it in your torrc.
- Keep the password you entered as well; you will use it in a later step.
2. Uncomment or add the following lines in your torrc:
- `ControlPort 9051`
- `HashedControlPassword {Place output here}`; put the output of the previous command in place of `{Place output here}`.
3. Now take the password from the first step and place it in the control.conf file within the whoogle working directory, i.e. [misc/tor/control.conf](misc/tor/control.conf)
- If you want to place your password file in a different location set this location with the `WHOOGLE_TOR_CONF` environment variable. Refer to the [Environment Variables](#environment-variables) section for more details.
4. Heavily restrict access to control.conf to only be readable by the user running whoogle:
- `chmod 400 control.conf`
5. Finally, set both the `WHOOGLE_CONFIG_TOR` and `WHOOGLE_TOR_USE_PASS` environment variables to 1. Refer to the [Environment Variables](#environment-variables) section for more details.
- These may be added to the systemd unit file or env file:
- `WHOOGLE_CONFIG_TOR=1`
- `WHOOGLE_TOR_USE_PASS=1`
___
### Manual (Docker)
1. Ensure the Docker daemon is running, and is accessible by your user account
- To add user permissions, you can execute `sudo usermod -aG docker yourusername`
- Running `docker ps` should return something besides an error. If you encounter an error saying the daemon isn't running, try `sudo systemctl start docker` (Linux) or ensure the docker tool is running (Windows/macOS).
2. Clone and deploy the docker app using a method below:
#### Docker CLI
Through Docker Hub:
```bash
docker pull benbusby/whoogle-search
docker run --publish 5000:5000 --detach --name whoogle-search benbusby/whoogle-search:latest
```
or with docker-compose:
```bash
git clone https://github.com/benbusby/whoogle-search.git
cd whoogle-search
docker-compose up
```
or by building yourself:
```bash
git clone https://github.com/benbusby/whoogle-search.git
cd whoogle-search
docker build --tag whoogle-search:1.0 .
docker run --publish 5000:5000 --detach --name whoogle-search whoogle-search:1.0
```
Optionally, you can also enable some of the following environment variables to further customize your instance:
```bash
docker run --publish 5000:5000 --detach --name whoogle-search \
-e WHOOGLE_USER=username \
-e WHOOGLE_PASS=password \
-e WHOOGLE_PROXY_USER=username \
-e WHOOGLE_PROXY_PASS=password \
-e WHOOGLE_PROXY_TYPE=socks5 \
-e WHOOGLE_PROXY_LOC=ip \
whoogle-search:1.0
```
And kill with: `docker rm --force whoogle-search`
#### Using [Heroku CLI](https://devcenter.heroku.com/articles/heroku-cli)
```bash
heroku login
heroku container:login
git clone https://github.com/benbusby/whoogle-search.git
cd whoogle-search
heroku create
heroku container:push web
heroku container:release web
heroku open
```
This series of commands can take a while, but you should only need to run it once. The final command, `heroku open`, will launch a tab in your web browser, where you can test out Whoogle and even [set it as your primary search engine](https://github.com/benbusby/whoogle-search#set-whoogle-as-your-primary-search-engine).
You may also edit environment variables from your app’s Settings tab in the Heroku Dashboard.
___
### Arch Linux & Arch-based Distributions
There is an [AUR package available](https://aur.archlinux.org/packages/whoogle-git/), as well as a pre-built and daily updated package available at [Chaotic-AUR](https://chaotic.cx).
___
### Helm chart for Kubernetes
To use the Kubernetes Helm Chart:
1. Ensure you have [Helm](https://helm.sh/docs/intro/install/) `>=3.0.0` installed
2. Clone this repository
3. Update [charts/whoogle/values.yaml](./charts/whoogle/values.yaml) as desired
4. Run `helm upgrade --install whoogle ./charts/whoogle`
___
#### Using your own server, or alternative container deployment
There are other methods for deploying docker containers that are well outlined in [this article](https://rollout.io/blog/the-shortlist-of-docker-hosting/), but there are too many to describe the setup for each here. Generally it should be about the same amount of effort as the Heroku deployment.
Depending on your preferences, you can also deploy the app yourself on your own infrastructure. This route would require a few extra steps:
- A server (I personally recommend [Digital Ocean](https://www.digitalocean.com/pricing/) or [Linode](https://www.linode.com/pricing/), their cheapest tiers will work fine)
- Your own URL (I suppose this is optional, but recommended)
- SSL certificates (free through [Let's Encrypt](https://letsencrypt.org/getting-started/))
- A bit more experience or willingness to work through issues
## Environment Variables
There are a few optional environment variables available for customizing a Whoogle instance. These can be set manually, or copied into `whoogle.env` and enabled for your preferred deployment method:
- Local runs: Set `WHOOGLE_DOTENV=1` before running
- With `docker-compose`: Uncomment the `env_file` option
- With `docker build/run`: Add `--env-file ./whoogle.env` to your command
| Variable | Description |
| -------------------- | ----------------------------------------------------------------------------------------- |
| WHOOGLE_URL_PREFIX | The URL prefix to use for the whoogle instance (e.g. "/whoogle") |
| WHOOGLE_DOTENV | Load environment variables in `whoogle.env` |
| WHOOGLE_DOTENV_PATH | The path to `whoogle.env` if not in default location |
| WHOOGLE_USER | The username for basic auth. WHOOGLE_PASS must also be set if used. |
| WHOOGLE_PASS | The password for basic auth. WHOOGLE_USER must also be set if used. |
| WHOOGLE_PROXY_USER | The username of the proxy server. |
| WHOOGLE_PROXY_PASS | The password of the proxy server. |
| WHOOGLE_PROXY_TYPE | The type of the proxy server. Can be "socks5", "socks4", or "http". |
| WHOOGLE_PROXY_LOC | The location of the proxy server (host or ip). |
| WHOOGLE_USER_AGENT | The desktop user agent to use when using 'env_conf' option. Leave empty to use auto-generated Opera UAs. |
| WHOOGLE_USER_AGENT_MOBILE | The mobile user agent to use when using 'env_conf' option. Leave empty to use auto-generated Opera UAs. |
| WHOOGLE_USE_CLIENT_USER_AGENT | Enable to use your own user agent for all requests. Defaults to false. |
| WHOOGLE_UA_CACHE_PERSISTENT | Whether to persist auto-generated UAs across restarts. Set to '0' to regenerate on each startup. Default '1'. |
| WHOOGLE_UA_CACHE_REFRESH_DAYS | Auto-refresh UA cache after N days. Set to '0' to never refresh (cache persists indefinitely). Default '0'. |
| WHOOGLE_UA_LIST_FILE | Path to text file containing custom UA strings (one per line). When set, uses these instead of auto-generated UAs. |
| WHOOGLE_REDIRECTS | Specify sites that should be redirected elsewhere. See [custom redirecting](#custom-redirecting). |
| EXPOSE_PORT | The port where Whoogle will be exposed. |
| HTTPS_ONLY | Enforce HTTPS. (See [here](https://github.com/benbusby/whoogle-search#https-enforcement)) |
| WHOOGLE_ALT_TW | The twitter.com alternative to use when site alternatives are enabled in the config. Set to "" to disable. |
| WHOOGLE_ALT_YT | The youtube.com alternative to use when site alternatives are enabled in the config. Set to "" to disable. |
| WHOOGLE_ALT_RD | The reddit.com alternative to use when site alternatives are enabled in the config. Set to "" to disable. |
| WHOOGLE_ALT_TL | The Google Translate alternative to use. This is used for all "translate ____" searches. Set to "" to disable. |
| WHOOGLE_ALT_MD | The medium.com alternative to use when site alternatives are enabled in the config. Set to "" to disable. |
| WHOOGLE_ALT_IMG | The imgur.com alternative to use when site alternatives are enabled in the config. Set to "" to disable. |
| WHOOGLE_ALT_WIKI | The wikipedia.org alternative to use when site alternatives are enabled in the config. Set to "" to disable. |
| WHOOGLE_ALT_IMDB | The imdb.com alternative to use when site alternatives are enabled in the config. Set to "" to disable. |
| WHOOGLE_ALT_QUORA | The quora.com alternative to use when site alternatives are enabled in the config. Set to "" to disable. |
| WHOOGLE_ALT_SO | The stackoverflow.com alternative to use when site alternatives are enabled in the config. Set to "" to disable. |
| WHOOGLE_AUTOCOMPLETE | Controls visibility of autocomplete/search suggestions. Default on -- use '0' to disable. |
| WHOOGLE_MINIMAL | Remove everything except basic result cards from all search queries. |
| WHOOGLE_CSP | Sets a default set of 'Content-Security-Policy' headers |
| WHOOGLE_TOR_SERVICE | Enable/disable the Tor service on startup. Default on -- use '0' to disable. |
| WHOOGLE_TOR_USE_PASS | Use password authentication for tor control port. |
| WHOOGLE_TOR_CONF | The absolute path to the config file containing the password for the tor control port. Default: ./misc/tor/control.conf. WHOOGLE_TOR_USE_PASS must be 1 for this to work. |
| WHOOGLE_SHOW_FAVICONS | Show/hide favicons next to search result URLs. Default on. |
| WHOOGLE_UPDATE_CHECK | Enable/disable the automatic daily check for new versions of Whoogle. Default on. |
| WHOOGLE_FALLBACK_ENGINE_URL | Set a fallback search engine URL to use when there is an internal server error or the instance is rate-limited. The search query is appended to the end of the URL (e.g. https://duckduckgo.com/?k1=-1&q=). |
| WHOOGLE_BUNDLE_STATIC | When set to 1, serve a single bundled CSS and JS file generated at startup to reduce requests. Default off. |
| WHOOGLE_HTTP2 | Enable HTTP/2 for upstream requests (via httpx). Default on — set to 0 to force HTTP/1.1. |
### Config Environment Variables
These environment variables allow setting default config values, but can be overwritten manually by using the home page config menu. These allow a shortcut for destroying/rebuilding an instance to the same config state every time.
| Variable | Description |
| ------------------------------------ | --------------------------------------------------------------- |
| WHOOGLE_CONFIG_DISABLE | Hide config from UI and disallow changes to config by client |
| WHOOGLE_CONFIG_COUNTRY | Filter results by hosting country |
| WHOOGLE_CONFIG_LANGUAGE | Set interface language |
| WHOOGLE_CONFIG_SEARCH_LANGUAGE | Set search result language |
| WHOOGLE_CONFIG_BLOCK | Block websites from search results (use comma-separated list) |
| WHOOGLE_CONFIG_BLOCK_TITLE | Block search result with a REGEX filter on title |
| WHOOGLE_CONFIG_BLOCK_URL | Block search result with a REGEX filter on URL |
| WHOOGLE_CONFIG_THEME | Set theme mode (light, dark, or system) |
| WHOOGLE_CONFIG_SAFE | Enable safe searches |
| WHOOGLE_CONFIG_ALTS | Use social media site alternatives (nitter, invidious, etc) |
| WHOOGLE_CONFIG_NEAR | Restrict results to only those near a particular city |
| WHOOGLE_CONFIG_TOR | Use Tor routing (if available) |
| WHOOGLE_CONFIG_NEW_TAB | Always open results in new tab |
| WHOOGLE_CONFIG_VIEW_IMAGE | Enable View Image option |
| WHOOGLE_CONFIG_GET_ONLY | Search using GET requests only |
| WHOOGLE_CONFIG_URL | The root url of the instance (`https://<your url>/`) |
| WHOOGLE_CONFIG_STYLE | The custom CSS to use for styling (should be single line) |
| WHOOGLE_CONFIG_PREFERENCES_ENCRYPTED | Encrypt preferences token, requires preferences key |
| WHOOGLE_CONFIG_PREFERENCES_KEY | Key to encrypt preferences in URL (REQUIRED to show url) |
| WHOOGLE_CONFIG_ANON_VIEW | Include the "anonymous view" option for each search result |
| WHOOGLE_CONFIG_SHOW_USER_AGENT | Display the User Agent string used for search in results footer |
### Google Custom Search (BYOK) Environment Variables
These environment variables configure the "Bring Your Own Key" feature for Google Custom Search API:
| Variable | Description |
| -------------------- | ----------------------------------------------------------------------------------------- |
| WHOOGLE_CSE_API_KEY | Your Google API key with Custom Search API enabled |
| WHOOGLE_CSE_ID | Your Custom Search Engine ID (cx parameter) |
| WHOOGLE_USE_CSE | Enable Custom Search API by default (set to '1' to enable) |
## Google Custom Search (BYOK)
If Google blocks traditional search scraping (captchas, IP bans), you can use your own Google Custom Search Engine credentials as a fallback. This uses Google's official API with your own quota.
### Why Use This?
- **Reliability**: Official API never gets blocked or rate-limited (within quota)
- **Speed**: Direct JSON responses are faster than HTML scraping
- **Fallback**: Works when all scraping workarounds fail
- **Privacy**: Your searches still don't go through third parties—they go directly to Google with your own API key
### Limitations vs Standard Whoogle
| Feature | Standard Scraping | CSE API |
|------------------|--------------------------|---------------------|
| Daily limit | None (until blocked) | 100 free, then paid |
| Image search | ✅ Full support | ✅ Supported |
| News/Videos tabs | ✅ | ❌ Web results only |
| Speed | Slower (HTML parsing) | Faster (JSON) |
| Reliability | Can be blocked | Always works |
### Setup Steps
#### 1. Create a Custom Search Engine
1. Go to [Programmable Search Engine](https://programmablesearchengine.google.com/controlpanel/all)
2. Click **"Add"** to create a new search engine
3. Under "What to search?", select **"Search the entire web"**
4. Give it a name (e.g., "My Whoogle CSE")
5. Click **"Create"**
6. Copy your **Search Engine ID**
#### 2. Get an API Key
1. Go to [Google Cloud Console](https://console.cloud.google.com/)
2. Create a new project or select an existing one
3. Go to **APIs & Services** → **Library**
4. Search for **"Custom Search API"** and click **Enable**
5. Go to **APIs & Services** → **Credentials**
6. Click **"Create Credentials"** → **"API Key"**
7. Copy your API key (looks like `AIza...`)
#### 3. (Recommended) Restrict Your API Key
To prevent misuse if your key is exposed:
1. Click on your API key in Credentials
2. Under **"API restrictions"**, select **"Restrict key"**
3. Choose only **"Custom Search API"**
4. Under **"Application restrictions"**, consider adding IP restrictions if using on a server
5. Click **Save**
#### 4. Configure Whoogle
**Option A: Via Settings UI**
1. Open your Whoogle instance
2. Click the **Config** button
3. Scroll to "Google Custom Search (BYOK)" section
4. Enter your API Key and CSE ID
5. Check "Use Custom Search API"
6. Click **Apply**
**Option B: Via Environment Variables**
```bash
WHOOGLE_CSE_API_KEY=AIza...
WHOOGLE_CSE_ID=23f...
WHOOGLE_USE_CSE=1
```
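To check the credentials independently of Whoogle, the request they map to can be sketched as below. This is a minimal illustration and not Whoogle's own client code: `build_cse_url` is a hypothetical helper, the key and ID values are placeholders, and the only external detail assumed is the public Custom Search JSON API endpoint with its `key`, `cx`, `q`, and `start` parameters.

```python
from urllib.parse import urlencode

# Public endpoint of Google's Custom Search JSON API.
CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_cse_url(api_key: str, cse_id: str, query: str, start: int = 1) -> str:
    """Return the request URL for one page of Custom Search results.

    Hypothetical helper for illustration only.
    """
    params = {
        "key": api_key,  # the value you would put in WHOOGLE_CSE_API_KEY
        "cx": cse_id,    # the value you would put in WHOOGLE_CSE_ID
        "q": query,
        "start": start,  # 1-based index of the first result
    }
    return f"{CSE_ENDPOINT}?{urlencode(params)}"


# Placeholder credentials; substitute your own before requesting the URL.
url = build_cse_url("AIza-example", "abc123", "whoogle search")
print(url)
```

Opening the printed URL with real credentials should return JSON; an error body at this stage usually points at one of the causes listed under Troubleshooting.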
### Pricing & Avoiding Charges
| Tier | Queries | Cost |
|------|------------------|-----------------------|
| Free | 100/day | $0 |
| Paid | Up to 10,000/day | $5 per 1,000 queries |
**⚠️ To avoid unexpected charges:**
1. **Don't add a payment method** to Google Cloud (safest option—API stops at 100/day)
2. **Set a billing budget alert**: [Billing → Budgets & Alerts](https://console.cloud.google.com/billing/budgets)
3. **Cap API usage**: APIs & Services → Custom Search API → Quotas → Set "Queries per day" to 100
4. **Monitor usage**: APIs & Services → Custom Search API → Metrics
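As a rough way to reason about the pricing table, the sketch below estimates a day's cost from a query count. It assumes the first 100 queries each day are free, that queries beyond the 10,000/day cap are simply not served, and that billing is prorated per query; the last point is an assumption, so treat the result as an estimate. `estimate_daily_cost` is a hypothetical name.

```python
FREE_QUERIES_PER_DAY = 100   # free tier from the table above
PAID_CAP_PER_DAY = 10_000    # maximum queries per day on the paid tier
COST_PER_1000_USD = 5.0      # price per 1,000 paid queries


def estimate_daily_cost(queries: int) -> float:
    """Estimate the USD cost of `queries` searches in a single day."""
    if queries < 0:
        raise ValueError("queries must be non-negative")
    served = min(queries, PAID_CAP_PER_DAY)          # requests past the cap are not served
    billable = max(0, served - FREE_QUERIES_PER_DAY)  # the free allowance is not billed
    return billable * COST_PER_1000_USD / 1000


for count in (100, 1_100, 10_000):
    print(count, estimate_daily_cost(count))
```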
### Troubleshooting
| Error | Cause | Solution |
|---------------------|---------------------------|-----------------------------------------------------------------|
| "API key not valid" | Invalid or restricted key | Check key in Cloud Console, ensure Custom Search API is enabled |
| "Quota exceeded" | Hit 100/day limit | Wait until midnight PT, or enable billing |
| "Invalid CSE ID" | Wrong cx parameter | Copy ID from Programmable Search Engine control panel |
## Usage
Same as most search engines, with the exception of filtering by time range.
To filter by a range of time, append `:past <time>` to the end of your search, where `<time>` can be `hour`, `day`, `month`, or `year`. Example: `coronavirus updates :past hour`
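The suffix syntax can be pictured with a small parser. This is only a sketch of the format described above, not the code Whoogle uses internally; `split_time_filter` is a hypothetical helper, and treating an unrecognized period as plain query text is an assumption.

```python
from typing import Optional, Tuple

# Periods listed in the usage text above.
VALID_PERIODS = ("hour", "day", "month", "year")


def split_time_filter(query: str) -> Tuple[str, Optional[str]]:
    """Split 'terms :past <time>' into (terms, time); time is None if absent."""
    marker = ":past"
    if marker not in query:
        return query.strip(), None
    terms, _, period = query.rpartition(marker)
    period = period.strip().lower()
    if period not in VALID_PERIODS:
        # Assumption: an unknown period leaves the query untouched.
        return query.strip(), None
    return terms.strip(), period


print(split_time_filter("coronavirus updates :past hour"))
```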
### JSON results (API)
Whoogle can return filtered results as JSON using the same sanitization rules as the HTML view.
- Send `Accept: application/json` or append `format=json` to the search URL.
- Example: `/search?q=whoogle` with `Accept: application/json`, or `/search?q=whoogle&format=json`.
- Response shape:
```
{
"query": "whoogle",
"search_type": "",
"results": [
{"href": "https://example.com/page", "text": "Example Page"},
...
]
}
```
Special cases:
- Feeling Lucky returns HTTP 303 with body `{ "redirect": "<url>" }`.
- Temporary blocks (captcha) return HTTP 503 with `{ "blocked": true, "error_message": "...", "query": "..." }`.
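A client can branch on the three response shapes described above. The following sketch assumes only what this section documents (the `format=json` parameter, the `results` list, and the 303 and 503 bodies); the function names and the `http://localhost:5000` base URL are illustrative, and the HTTP call itself is left out so the example stays offline.

```python
import json
from urllib.parse import urlencode


def build_search_url(base_url: str, query: str) -> str:
    """Build a JSON search URL for a Whoogle instance at `base_url`."""
    params = urlencode({"q": query, "format": "json"})
    return f"{base_url.rstrip('/')}/search?{params}"


def summarize_response(status: int, body: str) -> str:
    """Turn a JSON response (status code plus raw body) into readable text."""
    data = json.loads(body)
    if status == 303 and "redirect" in data:
        # Feeling Lucky: the body carries the redirect target.
        return f"redirect -> {data['redirect']}"
    if status == 503 and data.get("blocked"):
        # Temporary block (captcha).
        return f"blocked: {data.get('error_message', '')}"
    return "\n".join(f"{r['text']} <{r['href']}>" for r in data.get("results", []))


print(build_search_url("http://localhost:5000", "whoogle"))
sample = '{"query": "whoogle", "search_type": "", "results": [{"href": "https://example.com/page", "text": "Example Page"}]}'
print(summarize_response(200, sample))
```

Fetching the URL with any HTTP client and passing the status code and body to `summarize_response` covers all three documented cases.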
## Extra Steps
### Set Whoogle as your primary search engine
*Note: If you're using a reverse proxy to run Whoogle Search, make sure the "Root URL" config option on the home page is set to your URL before going through these steps.*
Browser settings:
- Firefox (Desktop)
- Version 89+
- Navigate to your app's url, right click the address bar, and select "Add Search Engine".
- Previous versions
- Navigate to your app's url, and click the 3 dot menu in the address bar. At the bottom, there should be an option to "Add Search Engine".
- Once you've added the new search engine, open your Firefox Preferences menu, click "Search" in the left menu, and use the available dropdown to select "Whoogle" from the list.
- **Note**: If your Whoogle instance uses Firefox Containers, you'll need to [go through the steps here](#using-with-firefox-containers) to get it working properly.
- Firefox (iOS)
- In the mobile app Settings page, tap "Search" within the "General" section. There should be an option titled "Add Search Engine" to select. It should prompt you to enter a title and search query url - use the following elements to fill out the form:
- Title: "Whoogle"
- URL: `http[s]://<your whoogle url>/search?q=%s`
- Firefox (Android)
- Version <79.0.0
- Navigate to your app's url
- Long-press on the search text field
- Click the "Add Search Engine" menu item
- Select a name and click ok
- Click the 3 dot menu in the top right
- Navigate to the settings menu and select the "Search" sub-menu
- Select Whoogle and press "Set as default"
- Version >=79.0.0
- Click the 3 dot menu in the top right
- Navigate to the settings menu and select the "Search" sub-menu
- Click "Add search engine"
- Select the 'Other' radio button
- Name: "Whoogle"
- Search string to use: `https://<your whoogle url>/search?q=%s`
- [Alfred](https://www.alfredapp.com/) (Mac OS X)
1. Go to `Alfred Preferences` > `Features` > `Web Search` and click `Add Custom Search`. Then configure these settings
- Search URL: `https://<your whoogle url>/search?q={query}`
- Title: `Whoogle for '{query}'` (or whatever you want)
- Keyword: `whoogle`
2. Go to `Default Results` and click the `Setup fallback results` button. Click `+` and add Whoogle, then drag it to the top.
- Chrome/Chromium-based Browsers
- Automatic
- Visit the home page of your Whoogle Search instance -- this will automatically add the search engine if the [requirements](https://www.chromium.org/tab-to-search/) are met (GET request, no OnSubmit script, no path). If not, you can add it manually.
- Manual
- Under search engines > manage search engines > add, manually enter your Whoogle instance details with a `<whoogle url>/search?q=%s` formatted search URL.
### Custom Redirecting
You can set custom site redirects using the `WHOOGLE_REDIRECTS` environment
variable. A lot of sites, such as Twitter, Reddit, etc, have built-in redirects
to [Farside links](https://sr.ht/~benbusby/farside), but you may want to define
your own.
To do this, you can use the following syntax:
```
WHOOGLE_REDIRECTS="<parent_domain>:<new_domain>"
```
For example, if you want to redirect from "badsite.com" to "goodsite.com":
```
WHOOGLE_REDIRECTS="badsite.com:goodsite.com"
```
This can be used for multiple sites as well, with comma separation:
```
WHOOGLE_REDIRECTS="badA.com:goodA.com,badB.com:goodB.com"
```
NOTE: Do not include "http(s)://" when defining your redirect.
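To sanity-check a value before setting it, the syntax above can be parsed with a few lines of Python. This is a hypothetical validator, not Whoogle's own parsing code; rejecting values that contain `//` simply mirrors the note about leaving out "http(s)://".

```python
def parse_redirects(value: str) -> dict:
    """Parse '<parent_domain>:<new_domain>' pairs separated by commas."""
    redirects = {}
    for pair in value.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty entries such as a trailing comma
        if "//" in pair:
            raise ValueError(f"do not include a URL scheme: {pair!r}")
        old, sep, new = pair.partition(":")
        if not sep or not old.strip() or not new.strip():
            raise ValueError(f"expected '<parent_domain>:<new_domain>', got {pair!r}")
        redirects[old.strip()] = new.strip()
    return redirects


print(parse_redirects("badA.com:goodA.com,badB.com:goodB.com"))
```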
### Custom Bangs
You can create your own custom bangs. By default, bangs are stored in
`app/static/bangs`. See [`00-whoogle.json`](https://github.com/benbusby/whoogle-search/blob/main/app/static/bangs/00-whoogle.json)
for an example. These are parsed in alphabetical order with later files
overriding bangs set in earlier files, with the exception that DDG bangs
(downloaded to `app/static/bangs/bangs.json`) are always parsed first. Thus,
any custom bangs will always override the DDG ones.
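The override order can be illustrated with plain dictionaries. This sketch models only the ordering rule described above; `merge_bangs` is a hypothetical function, `10-custom.json` is a made-up filename, and each bang is reduced to an opaque string rather than the real entry format used in `00-whoogle.json`.

```python
def merge_bangs(ddg_bangs: dict, custom_files: dict) -> dict:
    """Merge bangs: DDG first, then custom files in alphabetical order."""
    merged = dict(ddg_bangs)              # DDG bangs are always applied first
    for filename in sorted(custom_files):  # alphabetical order by filename
        merged.update(custom_files[filename])  # later files override earlier ones
    return merged


ddg = {"!w": "ddg wikipedia", "!yt": "ddg youtube"}
custom = {
    "10-custom.json": {"!w": "my wikipedia mirror"},
    "00-whoogle.json": {"!w": "whoogle wikipedia", "!i": "whoogle images"},
}
print(merge_bangs(ddg, custom))
```

Here `!w` resolves to the entry from `10-custom.json`, since that file sorts after `00-whoogle.json` and both are applied after the DDG set.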
### Prevent Downtime (Heroku only)
Part of the deal with Heroku's free tier is that you're allocated 550 hours/month (meaning it can't stay active 24/7), and the app is temporarily shut down after 30 minutes of inactivity. Once it becomes inactive, any Whoogle searches will still work, but it'll take an extra 10-15 seconds for the app to come back online before displaying the result, which can be frustrating if you're in a hurry.
A good solution for this is to set up a simple cronjob on any device at your home that is consistently powered on and connected to the internet (in my case, a PiHole worked perfectly). All the device needs to do is fetch app content on a consistent basis to keep the app alive in whatever ~17 hour window you want it on (17 hrs * 31 days = 527, meaning you'd still have 23 leftover hours each month if you searched outside of your target window).
For instance, adding `*/20 7-23 * * * curl https://<your heroku app name>.herokuapp.com > /home/<username>/whoogle-refresh` will fetch the home page of the app every 20 minutes between 7am and midnight, allowing for downtime from midnight to 7am. And again, this wouldn't be a hard limit - you'd still have plenty of remaining hours of uptime each month in case you were searching after this window has closed.
Since the instance is destroyed and rebuilt after inactivity, config settings will be reset once the app enters downtime. If you have configuration settings active that you'd like to keep between periods of downtime (like dark mode for example), you could instead add `*/20 7-23 * * * curl -d "dark=1" -X POST https://<your heroku app name>.herokuapp.com/config > /home/<username>/whoogle-refresh` to keep these settings more or less permanent, and still keep the app from entering downtime when you're using it.
### HTTPS Enforcement
Only needed if your setup requires Flask to redirect to HTTPS on its own -- generally this is something that doesn't need to be handled by Whoogle Search.
Note: You should have your own domain name and [an https certificate](https://letsencrypt.org/getting-started/) in order for this to work properly.
- Heroku: Ensure that the `Root URL` configuration on the home page begins with `https://` and not `http://`
- Docker build: Add `--build-arg use_https=1` to your `docker build` command
- Docker image: Set the environment variable HTTPS_ONLY=1
- Pip/Pipx: Add the `--https-only` flag to the end of the `whoogle-search` command
- Default `run` script: Modify the script locally to include the `--https-only` flag at the end of the python run command
### Using with Firefox Containers
Unfortunately, Firefox Containers do not currently pass through `POST` requests (the default) to the engine, and Firefox caches the opensearch template on initial page load. To get around this, you can take the following steps to get it working as expected:
1. Remove any existing Whoogle search engines from Firefox settings
2. Enable `GET Requests Only` in Whoogle config
3. Clear Firefox cache
4. Restart Firefox
5. Navigate to Whoogle instance and [re-add the engine](#set-whoogle-as-your-primary-search-engine)
### Reverse Proxying
#### Nginx
Here is a sample Nginx config for Whoogle:
```
server {
server_name your_domain_name.com;
access_log /dev/null;
error_log /dev/null;
location / {
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_set_header Host $host;
proxy_set_header X-NginX-Proxy true;
proxy_set_header X-Forwarded-Host $http_host;
proxy_pass http://localhost:5000;
}
}
```
You can then add SSL support using LetsEncrypt by following a guide such as [this one](https://www.nginx.com/blog/using-free-ssltls-certificates-from-lets-encrypt-with-nginx/).
### Static asset bundling (optional)
Whoogle can optionally serve a single bundled CSS and JS to reduce the number of HTTP requests.
- Enable by setting `WHOOGLE_BUNDLE_STATIC=1` and restarting the app.
- On startup, Whoogle concatenates local CSS/JS into hashed files under `app/static/build/` and templates will prefer those bundles.
- When disabled (default), templates load individual CSS/JS files for easier development.
- Note: Theme CSS files (`*-theme.css`) are still loaded separately to honor user theme selection.
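The idea of concatenating assets into a hashed file can be sketched as follows. This is an illustration of the general technique under stated assumptions, not Whoogle's bundler: the `bundle.<hash>.<ext>` naming, the use of SHA-256, and the `bundle_files` helper are all hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def bundle_files(sources, build_dir, extension):
    """Concatenate `sources` into one content-hashed file under `build_dir`."""
    combined = b"\n".join(Path(src).read_bytes() for src in sources)
    # The hash changes whenever any source changes, so stale caches are bypassed.
    digest = hashlib.sha256(combined).hexdigest()[:8]
    out_dir = Path(build_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    bundle = out_dir / f"bundle.{digest}.{extension}"
    bundle.write_bytes(combined)
    return bundle


# Demonstration with throwaway files instead of the real app/static tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "main.css").write_text("body { margin: 0; }")
    (root / "header.css").write_text("header { padding: 4px; }")
    bundle = bundle_files([root / "main.css", root / "header.css"], root / "build", "css")
    bundle_name = bundle.name
    bundle_text = bundle.read_text()

print(bundle_name)
```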
## User Agent Generator Tool
A standalone command-line tool is available for generating Opera User Agent strings on demand:
```bash
# Generate 10 User Agent strings (default)
python misc/generate_uas.py
# Generate custom number of UAs
python misc/generate_uas.py 20
```
This tool is useful for:
- Testing different UA strings
- Generating UAs for other projects
- Verifying UA generation patterns
- Debugging UA-related issues
## Using Custom User Agent Lists
Instead of using auto-generated Opera UA strings, you can provide your own list of User Agent strings for Whoogle to use.
### Setup
1. Create a text file with your preferred UA strings (one per line):
```
Opera/9.80 (J2ME/MIDP; Opera Mini/4.2.13337/22.478; U; en) Presto/2.4.15 Version/10.00
Opera/9.80 (Android; Linux; Opera Mobi/498; U; en) Presto/2.12.423 Version/10.1
```
2. Set the `WHOOGLE_UA_LIST_FILE` environment variable to point to your file:
```bash
# Docker
docker run -e WHOOGLE_UA_LIST_FILE=/config/my_user_agents.txt ...
# Docker Compose
environment:
- WHOOGLE_UA_LIST_FILE=/config/my_user_agents.txt
# Manual/systemd
export WHOOGLE_UA_LIST_FILE=/path/to/my_user_agents.txt
```
### Priority Order
Whoogle uses the following priority when loading User Agent strings:
1. **Custom UA list file** (if `WHOOGLE_UA_LIST_FILE` is set and valid)
2. **Cached auto-generated UAs** (if cache exists and is valid)
3. **Newly generated UAs** (if no cache or cache expired)
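The priority order above amounts to a simple fallback chain, sketched below. `select_user_agents` and its arguments are hypothetical; in particular, treating "valid" as "non-empty" is an assumption made for the example.

```python
from typing import Callable, List, Optional


def select_user_agents(
    custom: Optional[List[str]],
    cached: Optional[List[str]],
    generate: Callable[[], List[str]],
) -> List[str]:
    """Pick the UA pool: custom list, then cache, then freshly generated."""
    if custom:   # 1. custom UA list file, if set and valid
        return custom
    if cached:   # 2. cached auto-generated UAs, if present and valid
        return cached
    return generate()  # 3. otherwise generate a new set


pool = select_user_agents(None, ["cached UA"], lambda: ["new UA"])
print(pool)
```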
### Tips
- You can use the output from `misc/check_google_user_agents.py` as your custom UA list
- Generate a list with `python misc/generate_uas.py 50 2>/dev/null > my_uas.txt`
- Mix different UA types (Opera, Firefox, Chrome) for more variety
- Keep the file readable by Whoogle (proper permissions)
- One UA string per line, blank lines are ignored
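The file format (one UA per line, blank lines ignored) is simple enough to check with a short loader. This is a sketch rather than Whoogle's loader: `load_ua_list` is a hypothetical name, and returning an empty list for an unreadable file is an assumption. `ExampleAgent/1.0` is a made-up UA string used only for the demonstration.

```python
import tempfile
from pathlib import Path


def load_ua_list(path) -> list:
    """Read UA strings from `path`, one per line, skipping blank lines."""
    try:
        text = Path(path).read_text(encoding="utf-8")
    except OSError:
        return []  # assumption: a missing or unreadable file yields no UAs
    return [line.strip() for line in text.splitlines() if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    ua_file = Path(tmp) / "my_user_agents.txt"
    ua_file.write_text(
        "Opera/9.80 (Android; Linux; Opera Mobi/498; U; en) Presto/2.12.423 Version/10.1\n"
        "\n"
        "   \n"
        "ExampleAgent/1.0\n",
        encoding="utf-8",
    )
    user_agents = load_ua_list(ua_file)
    missing = load_ua_list(Path(tmp) / "does_not_exist.txt")

print(len(user_agents))
```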
### Example Workflow
```bash
# Generate and test UAs, save working ones
python misc/generate_uas.py 100 2>/dev/null > candidate_uas.txt
python misc/check_google_user_agents.py candidate_uas.txt --output working_uas.txt
# Use the working UAs with Whoogle
export WHOOGLE_UA_LIST_FILE=./working_uas.txt
./run
```
## User Agent Testing Tool
Whoogle now includes a comprehensive testing tool (`misc/check_google_user_agents.py`) to verify which User Agent strings successfully return Google search results without triggering blocks, JavaScript-only pages, or browser upgrade prompts.
### Usage
```bash
# Test all UAs from a file
python misc/check_google_user_agents.py UAs.txt
# Save working UAs to a file (appends incrementally)
python misc/check_google_user_agents.py UAs.txt --output working_uas.txt
# Use a specific search query
python misc/check_google_user_agents.py UAs.txt --query "python programming"
# Verbose mode to see detailed results
python misc/check_google_user_agents.py UAs.txt --output working.txt --verbose
# Adjust delay between requests (default: 0.5 seconds)
python misc/check_google_user_agents.py UAs.txt --delay 1.0
# Set request timeout (default: 10 seconds)
python misc/check_google_user_agents.py UAs.txt --timeout 15.0
```
### Features
- **Incremental Results**: Working UAs are saved immediately to the output file (append mode), so progress is preserved even if interrupted
- **Duplicate Detection**: Automatically skips UAs already in the output file when resuming
- **Random Query Cycling**: By default, cycles through diverse search queries to simulate realistic usage patterns
- **Rate Limit Detection**: Detects and reports Google rate limiting with recovery instructions
- **Comprehensive Validation**: Checks for:
- HTTP status codes (blocks, server errors, rate limits)
- Block markers (unusual traffic, upgrade browser messages)
- Success markers (actual search result HTML elements)
- JavaScript-only pages and redirects
- Response size validation
### Testing Methodology
The tool evaluates UAs against multiple criteria:
1. **HTTP Status**: Rejects 4xx/5xx errors, detects 429 rate limits
2. **Block Detection**: Searches for Google's block messages (CAPTCHA, unusual traffic, etc.)
3. **JavaScript Detection**: Identifies JS-only pages and noscript redirects
4. **Result Validation**: Confirms presence of actual search result HTML elements
5. **Content Analysis**: Validates response size and structure
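The five criteria can be read as an ordered series of checks. The sketch below illustrates that flow only; it is not the logic in `misc/check_google_user_agents.py`. The marker strings, the size threshold, and the `classify_response` function are placeholders chosen for the example.

```python
# Placeholder markers and threshold; the real tool's values may differ.
BLOCK_MARKERS = ("unusual traffic", "captcha")
JS_ONLY_MARKERS = ("enable javascript",)
RESULT_MARKERS = ('<a href="/url?q=',)
MIN_RESPONSE_CHARS = 1000


def classify_response(status: int, html: str) -> str:
    """Classify one response for a given User Agent string."""
    if status == 429:                  # 1. HTTP status: rate limit
        return "rate_limited"
    if status >= 400:                  # 1. HTTP status: other 4xx/5xx errors
        return "http_error"
    lowered = html.lower()
    if any(marker in lowered for marker in BLOCK_MARKERS):    # 2. block detection
        return "blocked"
    if any(marker in lowered for marker in JS_ONLY_MARKERS):  # 3. JavaScript detection
        return "javascript_only"
    if not any(marker in lowered for marker in RESULT_MARKERS):  # 4. result validation
        return "no_results"
    if len(html) < MIN_RESPONSE_CHARS:  # 5. content analysis (size)
        return "too_small"
    return "working"


page = '<a href="/url?q=https://example.com">Example</a>' + " " * 2000
print(classify_response(200, page))
```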
This tool was used to discover and validate the working Opera UA patterns that power Whoogle's auto-generation feature.
## Known Issues
### User Agent Strings and Image Search
**Issue**: Most, if not all, of the auto-generated Opera User Agent strings may fail when performing **image searches** on Google. This appears to be a limitation with how Google's image search validates User Agent strings.
**Impact**:
- Regular web searches work correctly with generated UAs
- Image search may return errors or no results
## Contributing
Under the hood, Whoogle is a basic Flask app with the following structure:
- `app/`
  - `routes.py`: Primary app entrypoint, contains all API routes
  - `request.py`: Handles all outbound requests, including proxied/Tor connectivity
  - `filter.py`: Functions and utilities used for filtering out content from upstream Google search results
  - `utils/`
    - `bangs.py`: All logic related to handling DDG-style "bang" queries
    - `results.py`: Utility functions for interpreting/modifying individual search results
    - `search.py`: Creates and handles new search queries
    - `session.py`: Miscellaneous methods related to user sessions
    - `ua_generator.py`: Auto-generates Opera User Agent strings with pattern-based randomization
  - `templates/`
    - `index.html`: The home page template
    - `display.html`: The search results template
    - `header.html`: A general "top of the page" query header for desktop and mobile
    - `search.html`: An iframe-able search page
    - `logo.html`: A template consisting mostly of the Whoogle logo as an SVG (separated to help keep `index.html` a bit cleaner)
    - `opensearch.xml`: A template used for supporting [OpenSearch](https://developer.mozilla.org/en-US/docs/Web/OpenSearch).
    - `imageresults.html`: An "experimental" template used for supporting the "Full Size" image feature on desktop.
  - `static/<css|js>`
    - CSS/JavaScript files, should be self-explanatory
  - `static/settings`
    - Key-value JSON files for establishing valid configuration values
If you're new to the project, the easiest way to get started would be to try fixing [an open bug report](https://github.com/benbusby/whoogle-search/issues?q=is%3Aissue+is%3Aopen+label%3Abug). If there aren't any open, or if the open ones are too stale, try taking on a [feature request](https://github.com/benbusby/whoogle-search/issues?q=is%3Aissue+is%3Aopen+label%3Aenhancement). Generally speaking, if you can write something that has any potential of breaking down in the future, you should write a test for it.
The project follows the [PEP 8 Style Guide](https://www.python.org/dev/peps/pep-0008/), though this is liable to change. Type hints should be used whenever possible. Function documentation is greatly appreciated, and typically follows the format below:
```python
def contains(x: list, y: int) -> bool:
    """Check a list (x) for the presence of an element (y)

    Args:
        x: The list to inspect
        y: The int to look for

    Returns:
        bool: True if the list contains the item, otherwise False
    """

    return y in x
```
#### Translating
Whoogle currently supports translations using [`translations.json`](https://github.com/benbusby/whoogle-search/blob/main/app/static/settings/translations.json). Language values in this file need to match the "value" of the corresponding language in [`languages.json`](https://github.com/benbusby/whoogle-search/blob/main/app/static/settings/languages.json) (e.g. "lang_en" for English, "lang_es" for Spanish). After you add a new set of translations to `translations.json`, open a PR with your changes and they will be merged as soon as possible.
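
Before opening a PR, it can help to confirm that every language value in `translations.json` has a counterpart in `languages.json`. The sketch below is illustrative only. It assumes `languages.json` is a list of objects with a `"value"` field and that `translations.json` is an object keyed by those values; the helper name `unknown_translation_keys` and the sample data are hypothetical.

```python
from typing import Dict, List


def unknown_translation_keys(languages: List[dict],
                             translations: Dict[str, dict]) -> List[str]:
    """Return translation keys with no matching language "value"."""
    valid_values = {lang.get('value') for lang in languages}
    return sorted(key for key in translations if key not in valid_values)


# Inline sample data standing in for the two JSON files (assumed shapes)
sample_languages = [
    {'name': 'English', 'value': 'lang_en'},
    {'name': 'Spanish', 'value': 'lang_es'},
]
sample_translations = {
    'lang_en': {'search': 'Search'},
    'lang_sp': {'search': 'Buscar'},  # deliberate typo: should be lang_es
}

print(unknown_translation_keys(sample_languages, sample_translations))
```

With the sample data, the mistyped `lang_sp` key is the only one reported. To check the real files, load each with `json.load` and pass the results to the same function.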
## FAQ
**What's the difference between this and [Searx](https://github.com/asciimoo/searx)?**
Whoogle is intended to only ever be deployed to private instances by individuals of any background, with as little effort as possible. Prior knowledge of/experience with the command line or deploying applications is not necessary to deploy Whoogle, which isn't the case with Searx. As a result, Whoogle is missing some features of Searx in order to be as easy to deploy as possible.
Whoogle also only uses Google search results, not Bing/Qwant/etc., and uses the existing Google search UI to make the transition away from Google search as unnoticeable as possible.
I'm a huge fan of Searx though and encourage anyone to use that instead if they want access to other search engines/a different UI/more configuration.
**Why does the image results page look different?**
A lot of the app currently piggybacks on Google's existing support for fetching results pages with JavaScript disabled. To their credit, they've done an excellent job with styling pages, but it seems that the image results page - particularly on mobile - is a little rough. Moving forward, with enough interest, I'd like to transition to fetching the results and parsing them into a unique Whoogle-fied interface that I can style myself.
## Public Instances
*Note: Use public instances at your own discretion. The maintainers of Whoogle do not personally validate the integrity of any other instances. Popular public instances are more likely to be rate-limited or blocked.*
| Website | Country | Language | Cloudflare |
|-|-|-|-|
| [https://search.garudalinux.org](https://search.garudalinux.org) | 🇫🇮 FI | Multi-choice | ✅ |
| [https://whoogle.privacydev.net](https://whoogle.privacydev.net) | 🇫🇷 FR | English | |
| [https://whoogle.lunar.icu](https://whoogle.lunar.icu) | 🇩🇪 DE | Multi-choice | ✅ |
* A checkmark in the "Cloudflare" column indicates that the instance is served through the [Cloudflare](https://cloudflare.com) reverse proxy, which gives Cloudflare the ability to monitor traffic to the website. Instances that only use Cloudflare DNS are not marked.
#### Onion Instances
None of the existing Onion-accessible sites appear to be live anymore.
## Screenshots
#### Desktop

#### Mobile

================================================
FILE: app/__init__.py
================================================
from app.filter import clean_query
from app.request import send_tor_signal
from app.utils.session import generate_key
from app.utils.bangs import gen_bangs_json, load_all_bangs
from app.utils.misc import gen_file_hash, read_config_bool
from app.utils.ua_generator import load_ua_pool
from base64 import b64encode
from bs4 import MarkupResemblesLocatorWarning
from datetime import datetime, timedelta
from dotenv import load_dotenv
from flask import Flask
import json
import logging.config
import os
import sys
from stem import Signal
import threading
import warnings
from werkzeug.middleware.proxy_fix import ProxyFix
from app.services.http_client import HttpxClient
from app.services.provider import close_all_clients
from app.version import __version__
app = Flask(__name__, static_folder=os.path.join(
os.path.dirname(os.path.abspath(__file__)), 'static'))
app.wsgi_app = ProxyFix(app.wsgi_app)
# Look for WHOOGLE_DOTENV_PATH, else fall back to whoogle.env in the parent directory
dot_env_path = os.getenv(
"WHOOGLE_DOTENV_PATH",
os.path.join(os.path.dirname(os.path.abspath(__file__)), "../whoogle.env"))
# Load the .env file if it exists
if os.path.exists(dot_env_path):
load_dotenv(dot_env_path)
app.enc_key = generate_key()
if read_config_bool('HTTPS_ONLY'):
app.config['SESSION_COOKIE_NAME'] = '__Secure-session'
app.config['SESSION_COOKIE_SECURE'] = True
app.config['VERSION_NUMBER'] = __version__
app.config['APP_ROOT'] = os.getenv(
'APP_ROOT',
os.path.dirname(os.path.abspath(__file__)))
app.config['STATIC_FOLDER'] = os.getenv(
'STATIC_FOLDER',
os.path.join(app.config['APP_ROOT'], 'static'))
app.config['BUILD_FOLDER'] = os.path.join(
app.config['STATIC_FOLDER'], 'build')
app.config['CACHE_BUSTING_MAP'] = {}
app.config['BUNDLE_STATIC'] = read_config_bool('WHOOGLE_BUNDLE_STATIC')
with open(os.path.join(app.config['STATIC_FOLDER'], 'settings/languages.json'), 'r', encoding='utf-8') as f:
app.config['LANGUAGES'] = json.load(f)
with open(os.path.join(app.config['STATIC_FOLDER'], 'settings/countries.json'), 'r', encoding='utf-8') as f:
app.config['COUNTRIES'] = json.load(f)
with open(os.path.join(app.config['STATIC_FOLDER'], 'settings/time_periods.json'), 'r', encoding='utf-8') as f:
app.config['TIME_PERIODS'] = json.load(f)
with open(os.path.join(app.config['STATIC_FOLDER'], 'settings/translations.json'), 'r', encoding='utf-8') as f:
app.config['TRANSLATIONS'] = json.load(f)
with open(os.path.join(app.config['STATIC_FOLDER'], 'settings/themes.json'), 'r', encoding='utf-8') as f:
app.config['THEMES'] = json.load(f)
with open(os.path.join(app.config['STATIC_FOLDER'], 'settings/header_tabs.json'), 'r', encoding='utf-8') as f:
app.config['HEADER_TABS'] = json.load(f)
app.config['CONFIG_PATH'] = os.getenv(
'CONFIG_VOLUME',
os.path.join(app.config['STATIC_FOLDER'], 'config'))
app.config['DEFAULT_CONFIG'] = os.path.join(
app.config['CONFIG_PATH'],
'config.json')
app.config['CONFIG_DISABLE'] = read_config_bool('WHOOGLE_CONFIG_DISABLE')
app.config['SESSION_FILE_DIR'] = os.path.join(
app.config['CONFIG_PATH'],
'session')
# Maximum session file size in bytes (4KB limit to prevent abuse and disk exhaustion)
# Session files larger than this are ignored during cleanup to avoid processing
# potentially malicious or corrupted files
app.config['MAX_SESSION_SIZE'] = 4000
app.config['BANG_PATH'] = os.getenv(
'CONFIG_VOLUME',
os.path.join(app.config['STATIC_FOLDER'], 'bangs'))
app.config['BANG_FILE'] = os.path.join(
app.config['BANG_PATH'],
'bangs.json')
# Global services registry (simple DI)
app.services = {}
@app.teardown_appcontext
def _teardown_clients(exception):
try:
close_all_clients()
except Exception:
pass
# Ensure all necessary directories exist
if not os.path.exists(app.config['CONFIG_PATH']):
os.makedirs(app.config['CONFIG_PATH'])
if not os.path.exists(app.config['SESSION_FILE_DIR']):
os.makedirs(app.config['SESSION_FILE_DIR'])
if not os.path.exists(app.config['BANG_PATH']):
os.makedirs(app.config['BANG_PATH'])
if not os.path.exists(app.config['BUILD_FOLDER']):
os.makedirs(app.config['BUILD_FOLDER'])
# Initialize User Agent pool
app.config['UA_CACHE_PATH'] = os.path.join(app.config['CONFIG_PATH'], 'ua_cache.json')
try:
app.config['UA_POOL'] = load_ua_pool(app.config['UA_CACHE_PATH'], count=10)
except Exception as e:
# If UA pool loading fails, log warning and set empty pool
# The gen_user_agent function will handle the fallback
print(f"Warning: Could not initialize UA pool: {e}")
app.config['UA_POOL'] = []
# Session values - Secret key management
# Priority: environment variable → file → generate new
def get_secret_key():
"""Load or generate secret key with validation.
Priority order:
1. WHOOGLE_SECRET_KEY environment variable
2. Existing key file
3. Generate new key and save to file
Returns:
str: Valid secret key for Flask sessions
"""
# Check environment variable first
env_key = os.getenv('WHOOGLE_SECRET_KEY', '').strip()
if env_key:
# Validate env key has minimum length
if len(env_key) >= 32:
return env_key
else:
print(f"Warning: WHOOGLE_SECRET_KEY too short ({len(env_key)} chars, need 32+). Using file/generated key instead.", file=sys.stderr)
# Check file-based key
app_key_path = os.path.join(app.config['CONFIG_PATH'], 'whoogle.key')
if os.path.exists(app_key_path):
try:
with open(app_key_path, 'r', encoding='utf-8') as f:
key = f.read().strip()
# Validate file key
if len(key) >= 32:
return key
else:
print(f"Warning: Key file too short, regenerating", file=sys.stderr)
except (PermissionError, IOError) as e:
print(f"Warning: Could not read key file: {e}", file=sys.stderr)
# Generate new key
    new_key = b64encode(os.urandom(32)).decode('utf-8')
try:
with open(app_key_path, 'w', encoding='utf-8') as key_file:
key_file.write(new_key)
except (PermissionError, IOError) as e:
print(f"Warning: Could not save key file: {e}. Key will not persist across restarts.", file=sys.stderr)
return new_key
app.config['SECRET_KEY'] = get_secret_key()
app.config['PERMANENT_SESSION_LIFETIME'] = timedelta(days=365)
# NOTE: SESSION_COOKIE_SAMESITE must be set to 'lax' to allow the user's
# previous session to persist when accessing the instance from an external
# link. Setting this value to 'strict' causes Whoogle to revalidate a new
# session, and fail, resulting in cookies being disabled.
app.config['SESSION_COOKIE_SAMESITE'] = 'Lax'
# Config fields that are used to check for updates
app.config['RELEASES_URL'] = 'https://github.com/' \
'benbusby/whoogle-search/releases'
app.config['LAST_UPDATE_CHECK'] = datetime.now() - timedelta(hours=24)
app.config['HAS_UPDATE'] = ''
# The alternative to Google Translate is treated a bit differently than other
# social media site alternatives, in that it is used for any translation
# related searches.
translate_url = os.getenv('WHOOGLE_ALT_TL', 'https://farside.link/lingva')
if not translate_url.startswith('http'):
translate_url = 'https://' + translate_url
app.config['TRANSLATE_URL'] = translate_url
app.config['CSP'] = 'default-src \'none\';' \
'frame-src ' + translate_url + ';' \
'manifest-src \'self\';' \
'img-src \'self\' data:;' \
'style-src \'self\' \'unsafe-inline\';' \
'script-src \'self\';' \
'media-src \'self\';' \
'connect-src \'self\';'
# Generate DDG bang filter
generating_bangs = False
if not os.path.exists(app.config['BANG_FILE']):
generating_bangs = True
with open(app.config['BANG_FILE'], 'w', encoding='utf-8') as f:
json.dump({}, f)
bangs_thread = threading.Thread(
target=gen_bangs_json,
args=(app.config['BANG_FILE'],))
bangs_thread.start()
# Build new mapping of static files for cache busting
cache_busting_dirs = ['css', 'js']
for cb_dir in cache_busting_dirs:
full_cb_dir = os.path.join(app.config['STATIC_FOLDER'], cb_dir)
for cb_file in os.listdir(full_cb_dir):
# Create hash from current file state
full_cb_path = os.path.join(full_cb_dir, cb_file)
cb_file_link = gen_file_hash(full_cb_dir, cb_file)
build_path = os.path.join(app.config['BUILD_FOLDER'], cb_file_link)
try:
os.symlink(full_cb_path, build_path)
except FileExistsError:
# Symlink hasn't changed, ignore
pass
# Create mapping for relative path urls
map_path = build_path.replace(app.config['APP_ROOT'], '')
if map_path.startswith('/'):
map_path = map_path[1:]
app.config['CACHE_BUSTING_MAP'][cb_file] = map_path
# Optionally create simple bundled assets (opt-in via WHOOGLE_BUNDLE_STATIC=1)
if app.config['BUNDLE_STATIC']:
# CSS bundle: include all css except theme files (end with -theme.css)
css_dir = os.path.join(app.config['STATIC_FOLDER'], 'css')
css_parts = []
for name in sorted(os.listdir(css_dir)):
if not name.endswith('.css'):
continue
if name.endswith('-theme.css'):
continue
try:
with open(os.path.join(css_dir, name), 'r', encoding='utf-8') as f:
css_parts.append(f.read())
except Exception:
pass
css_bundle = '\n'.join(css_parts)
if css_bundle:
css_tmp = os.path.join(app.config['BUILD_FOLDER'], 'app.css')
with open(css_tmp, 'w', encoding='utf-8') as f:
f.write(css_bundle)
css_hashed = gen_file_hash(app.config['BUILD_FOLDER'], 'app.css')
os.replace(css_tmp, os.path.join(app.config['BUILD_FOLDER'], css_hashed))
map_path = os.path.join('app/static/build', css_hashed)
app.config['CACHE_BUSTING_MAP']['bundle.css'] = map_path
# JS bundle: include all js files
js_dir = os.path.join(app.config['STATIC_FOLDER'], 'js')
js_parts = []
for name in sorted(os.listdir(js_dir)):
if not name.endswith('.js'):
continue
try:
with open(os.path.join(js_dir, name), 'r', encoding='utf-8') as f:
js_parts.append(f.read())
except Exception:
pass
js_bundle = '\n;'.join(js_parts)
if js_bundle:
js_tmp = os.path.join(app.config['BUILD_FOLDER'], 'app.js')
with open(js_tmp, 'w', encoding='utf-8') as f:
f.write(js_bundle)
js_hashed = gen_file_hash(app.config['BUILD_FOLDER'], 'app.js')
os.replace(js_tmp, os.path.join(app.config['BUILD_FOLDER'], js_hashed))
map_path = os.path.join('app/static/build', js_hashed)
app.config['CACHE_BUSTING_MAP']['bundle.js'] = map_path
# Templating functions
app.jinja_env.globals.update(clean_query=clean_query)
app.jinja_env.globals.update(
cb_url=lambda f: app.config['CACHE_BUSTING_MAP'][f.lower()])
app.jinja_env.globals.update(
bundle_static=lambda: app.config.get('BUNDLE_STATIC', False))
# Attempt to acquire tor identity, to determine if Tor config is available
send_tor_signal(Signal.HEARTBEAT)
# Suppress spurious warnings from BeautifulSoup
warnings.simplefilter('ignore', MarkupResemblesLocatorWarning)
from app import routes # noqa
# The gen_bangs_json function takes care of loading bangs, so skip it here if
# it's already being loaded
if not generating_bangs:
load_all_bangs(app.config['BANG_FILE'])
# Disable logging from imported modules
logging.config.dictConfig({
'version': 1,
'disable_existing_loggers': True,
})
================================================
FILE: app/__main__.py
================================================
from .routes import run_app
run_app()
================================================
FILE: app/filter.py
================================================
import cssutils
from bs4 import BeautifulSoup
from bs4.element import ResultSet, Tag
from cryptography.fernet import Fernet
from flask import render_template
import html
import urllib.parse as urlparse
import os
from urllib.parse import parse_qs, urlencode, urlunparse
import re
from app.models.g_classes import GClasses
from app.request import VALID_PARAMS, MAPS_URL
from app.utils.misc import get_abs_url, read_config_bool
from app.utils.results import (
BLANK_B64, GOOG_IMG, GOOG_STATIC, G_M_LOGO_URL, LOGO_URL, SITE_ALTS,
has_ad_content, filter_link_args, append_anon_view, get_site_alt,
)
from app.models.endpoint import Endpoint
from app.models.config import Config
MAPS_ARGS = ['q', 'daddr']
minimal_mode_sections = ['Top stories', 'Images', 'People also ask']
unsupported_g_pages = [
'support.google.com',
'accounts.google.com',
'policies.google.com',
'google.com/preferences',
'google.com/intl',
'advanced_search',
'tbm=shop',
'ageverification.google.co.kr'
]
unsupported_g_divs = ['google.com/preferences?hl=', 'ageverification.google.co.kr']
def extract_q(q_str: str, href: str) -> str:
"""Extracts the 'q' element from a result link. This is typically
either the link to a result's website, or a string.
Args:
q_str: The result link to parse
href: The full url to check for standalone 'q' elements first,
rather than parsing the whole query string and then checking.
Returns:
str: The 'q' element of the link, or an empty string
"""
return parse_qs(q_str, keep_blank_values=True)['q'][0] if ('&q=' in href or '?q=' in href) else ''
def build_map_url(href: str) -> str:
"""Tries to extract known args that explain the location in the url. If a
location is found, returns the default url with it. Otherwise, returns the
url unchanged.
Args:
href: The full url to check.
Returns:
str: The parsed url, or the url unchanged.
"""
# parse the url
parsed_url = parse_qs(href)
    # iterate through the known parameters and try to build the url
for param in MAPS_ARGS:
if param in parsed_url:
return MAPS_URL + "?q=" + parsed_url[param][0]
# query could not be extracted returning unchanged url
return href
def clean_query(query: str) -> str:
"""Strips the blocked site list from the query, if one is being
used.
Args:
query: The query string
Returns:
str: The query string without any "-site:..." filters
"""
return query[:query.find('-site:')] if '-site:' in query else query
def clean_css(css: str, page_url: str) -> str:
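    """Rewrites all non-data URLs in a CSS string so that they are proxied
    through Whoogle's element endpoint.
    Args:
        css: The CSS string
        page_url: The URL of the page the CSS came from, passed to
            get_abs_url when resolving each CSS URL
    Returns:
        str: The filtered CSS, with URLs proxied through Whoogle
    """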
sheet = cssutils.parseString(css)
urls = cssutils.getUrls(sheet)
for url in urls:
abs_url = get_abs_url(url, page_url)
if abs_url.startswith('data:'):
continue
css = css.replace(
url,
f'{Endpoint.element}?type=image/png&url={abs_url}'
)
return css
class Filter:
# Minimum number of child div elements that indicates a collapsible section
# Regular search results typically have fewer child divs (< 7)
# Special sections like "People also ask", "Related searches" have more (>= 7)
# This threshold helps identify and collapse these extended result sections
RESULT_CHILD_LIMIT = 7
def __init__(
self,
user_key: str,
config: Config,
root_url='',
page_url='',
query='',
mobile=False) -> None:
self.soup = None
self.config = config
self.mobile = mobile
self.user_key = user_key
self.page_url = page_url
self.query = query
self.main_divs = ResultSet('')
self._elements = 0
self._av = set()
self.root_url = root_url[:-1] if root_url.endswith('/') else root_url
def __getitem__(self, name):
return getattr(self, name)
@property
def elements(self):
return self._elements
def encrypt_path(self, path, is_element=False) -> str:
# Encrypts path to avoid plaintext results in logs
if is_element:
# Element paths are encrypted separately from text, to allow key
# regeneration once all items have been served to the user
enc_path = Fernet(self.user_key).encrypt(path.encode()).decode()
self._elements += 1
return enc_path
return Fernet(self.user_key).encrypt(path.encode()).decode()
def clean(self, soup) -> BeautifulSoup:
self.soup = soup
self.main_divs = self.soup.find('div', {'id': 'main'})
self.remove_ads()
self.remove_ai_overview()
self.remove_block_titles()
self.remove_block_url()
self.collapse_sections()
self.update_css()
self.update_styling()
self.remove_block_tabs()
# self.main_divs is only populated for the main page of search results
# (i.e. not images/news/etc).
if self.main_divs:
for div in self.main_divs:
self.sanitize_div(div)
for img in [_ for _ in self.soup.find_all('img') if 'src' in _.attrs]:
self.update_element_src(img, 'image/png')
for audio in [_ for _ in self.soup.find_all('audio') if 'src' in _.attrs]:
self.update_element_src(audio, 'audio/mpeg')
audio['controls'] = ''
for link in self.soup.find_all('a', href=True):
self.update_link(link)
self.add_favicon(link)
if self.config.alts:
self.site_alt_swap()
input_form = self.soup.find('form')
if input_form is not None:
input_form['method'] = 'GET' if self.config.get_only else 'POST'
# Use a relative URI for submissions
input_form['action'] = 'search'
# Ensure no extra scripts passed through
for script in self.soup('script'):
script.decompose()
# Update default footer and header
footer = self.soup.find('footer')
if footer:
# Remove divs that have multiple links beyond just page navigation
[_.decompose() for _ in footer.find_all('div', recursive=False)
if len(_.find_all('a', href=True)) > 3]
for link in footer.find_all('a', href=True):
link['href'] = f'{link["href"]}&preferences={self.config.preferences}'
header = self.soup.find('header')
if header:
header.decompose()
# Remove broken "Dark theme" toggle snippets that occasionally slip
# into the footer.
self.remove_dark_theme_toggle(self.soup)
self.remove_site_blocks(self.soup)
return self.soup
def sanitize_div(self, div) -> None:
"""Removes escaped script and iframe tags from results
Returns:
None (The soup object is modified directly)
"""
if not div or not isinstance(div, Tag):
return
for d in div.find_all('div', recursive=True):
d_text = d.find(string=True, recursive=False)
# Ensure we're working with tags that contain text content
if not d_text or not d.string:
continue
d.string = html.unescape(d_text)
div_soup = BeautifulSoup(d.string, 'html.parser')
# Remove all valid script or iframe tags in the div
for script in div_soup.find_all('script'):
script.decompose()
for iframe in div_soup.find_all('iframe'):
iframe.decompose()
d.string = str(div_soup)
def add_favicon(self, link) -> None:
"""Adds icons for each returned result, using the result site's favicon
Returns:
None (The soup object is modified directly)
"""
# Skip empty, parentless, or internal links
show_favicons = read_config_bool('WHOOGLE_SHOW_FAVICONS', True)
is_valid_link = link and link.parent and link['href'].startswith('http')
if not show_favicons or not is_valid_link:
return
parent = link.parent
is_result_div = False
# Check each parent to make sure that the div doesn't already have a
# favicon attached, and that the div is a result div
while parent:
p_cls = parent.attrs.get('class') or []
if 'has-favicon' in p_cls or GClasses.scroller_class in p_cls:
return
elif GClasses.result_class_a not in p_cls:
parent = parent.parent
else:
is_result_div = True
break
if not is_result_div:
return
# Construct the html for inserting the icon into the parent div
parsed = urlparse.urlparse(link['href'])
favicon = self.encrypt_path(
f'{parsed.scheme}://{parsed.netloc}/favicon.ico',
is_element=True)
src = f'{self.root_url}/{Endpoint.element}?url={favicon}' + \
'&type=image/x-icon'
        # Named favicon_html to avoid shadowing the imported html module
        favicon_html = f'<img class="site-favicon" src="{src}">'
        favicon = BeautifulSoup(favicon_html, 'html.parser')
link.parent.insert(0, favicon)
# Update all parents to indicate that a favicon has been attached
parent = link.parent
while parent:
p_cls = parent.get('class') or []
p_cls.append('has-favicon')
parent['class'] = p_cls
parent = parent.parent
if GClasses.result_class_a in p_cls:
break
def remove_dark_theme_toggle(self, soup: BeautifulSoup) -> None:
"""Removes stray Dark theme toggle/link fragments that can appear
in the footer."""
for node in soup.find_all(string=re.compile(r'Dark theme', re.I)):
try:
parent = node.find_parent(
lambda tag: tag.name in ['div', 'span', 'p', 'a', 'li',
'section'])
target = parent or node.parent
if target:
target.decompose()
else:
node.extract()
except Exception:
continue
def remove_site_blocks(self, soup) -> None:
if not self.config.block or not soup.body:
return
search_string = ' '.join(['-site:' +
_ for _ in self.config.block.split(',')])
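        # Escape the filter string so '.' and other regex metacharacters in
        # blocked domains are matched literally
        selected = soup.body.find_all(
            string=re.compile(re.escape(search_string)))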
for result in selected:
result.string.replace_with(result.string.replace(
search_string, ''))
def remove_ai_overview(self) -> None:
"""Removes Google's AI Overview/SGE results from search results
Returns:
None (The soup object is modified directly)
"""
if not self.main_divs:
return
# Patterns that identify AI Overview sections
ai_patterns = [
'AI Overview',
'AI responses may include mistakes',
]
# Result div classes - check both original Google classes and mapped ones
# since this runs before CSS class replacement
result_classes = [GClasses.result_class_a] # 'ZINbbc'
result_classes.extend(GClasses.result_classes.get(
GClasses.result_class_a, [])) # ['Gx5Zad']
# Collect divs to remove first to avoid modifying while iterating
divs_to_remove = []
for div in self.main_divs.find_all('div', recursive=True):
# Check if this div or its children contain AI Overview markers
div_text = div.get_text()
if any(pattern in div_text for pattern in ai_patterns):
# Walk up to find the top-level result div
parent = div
while parent:
p_cls = parent.attrs.get('class') or []
if any(rc in p_cls for rc in result_classes):
if parent not in divs_to_remove:
divs_to_remove.append(parent)
break
parent = parent.parent
# Remove collected divs
for div in divs_to_remove:
div.decompose()
def remove_ads(self) -> None:
"""Removes ads found in the list of search result divs
Returns:
None (The soup object is modified directly)
"""
if not self.main_divs:
return
for div in [_ for _ in self.main_divs.find_all('div', recursive=True)]:
div_ads = [_ for _ in div.find_all('span', recursive=True)
if has_ad_content(_.text)]
_ = div.decompose() if len(div_ads) else None
def remove_block_titles(self) -> None:
if not self.main_divs or not self.config.block_title:
return
block_title = re.compile(self.config.block_title)
for div in [_ for _ in self.main_divs.find_all('div', recursive=True)]:
block_divs = [_ for _ in div.find_all('h3', recursive=True)
if block_title.search(_.text) is not None]
_ = div.decompose() if len(block_divs) else None
def remove_block_url(self) -> None:
if not self.main_divs or not self.config.block_url:
return
block_url = re.compile(self.config.block_url)
for div in [_ for _ in self.main_divs.find_all('div', recursive=True)]:
            # Only consider anchors that have an href, to avoid a KeyError
            block_divs = [_ for _ in div.find_all('a', href=True)
                          if block_url.search(_['href']) is not None]
_ = div.decompose() if len(block_divs) else None
def remove_block_tabs(self) -> None:
if self.main_divs:
for div in self.main_divs.find_all(
'div',
attrs={'class': f'{GClasses.main_tbm_tab}'}
):
_ = div.decompose()
else:
# when in images tab
for div in self.soup.find_all(
'div',
attrs={'class': f'{GClasses.images_tbm_tab}'}
):
_ = div.decompose()
def collapse_sections(self) -> None:
"""Collapses long result sections ("people also asked", "related
searches", etc) into "details" elements
        These sections are typically the only sections in the results page that
        have RESULT_CHILD_LIMIT or more child divs within a primary result div.
Returns:
None (The soup object is modified directly)
"""
minimal_mode = read_config_bool('WHOOGLE_MINIMAL')
def pull_child_divs(result_div: BeautifulSoup):
try:
top_level_divs = result_div.find_all('div', recursive=False)
if not top_level_divs:
return []
return top_level_divs[0].find_all('div', recursive=False)
except Exception:
return []
if not self.main_divs:
return
# Skip collapsing for CSE (Custom Search Engine) results
# CSE results have a data-cse attribute on the main container
if self.soup.find(attrs={'data-cse': 'true'}):
return
# Loop through results and check for the number of child divs in each
for result in self.main_divs.find_all():
result_children = pull_child_divs(result)
if minimal_mode:
if any(f">{x}</span" in str(s) for s in result_children
for x in minimal_mode_sections):
result.decompose()
continue
for s in result_children:
if ('Twitter ›' in str(s)):
result.decompose()
continue
if len(result_children) < self.RESULT_CHILD_LIMIT:
continue
else:
if len(result_children) < self.RESULT_CHILD_LIMIT:
continue
# Find and decompose the first element with an inner HTML text val.
# This typically extracts the title of the section (i.e. "Related
# Searches", "People also ask", etc)
            # If more than one child string has text, the strings after the
            # first are joined and parenthesized as a subtitle
label = 'Collapsed Results'
subtitle = None
for elem in result_children:
if elem.text:
content = list(elem.strings)
label = content[0]
if len(content) > 1:
subtitle = '<span> (' + \
''.join(content[1:]) + ')</span>'
elem.decompose()
break
# Create the new details element to wrap around the result's
# first parent
parent = None
idx = 0
while not parent and idx < len(result_children):
parent = result_children[idx].parent
idx += 1
details = BeautifulSoup(features='html.parser').new_tag('details')
summary = BeautifulSoup(features='html.parser').new_tag('summary')
summary.string = label
if subtitle:
soup = BeautifulSoup(subtitle, 'html.parser')
summary.append(soup)
details.append(summary)
if parent and not minimal_mode:
parent.wrap(details)
elif parent and minimal_mode:
# Remove parent element from document if "minimal mode" is
# enabled
parent.decompose()
def update_element_src(self, element: Tag, mime: str, attr='src') -> None:
        """Encrypts the original src of an element and rewrites the element src
        to use the "/element?url=" pass-through.
Returns:
None (The soup element is modified directly)
"""
src = element[attr].split(' ')[0]
if src.startswith('//'):
src = 'https:' + src
elif src.startswith('data:'):
return
if src.startswith(LOGO_URL):
# Re-brand with Whoogle logo
element.replace_with(BeautifulSoup(
render_template('logo.html'),
features='html.parser'))
return
elif src.startswith(G_M_LOGO_URL):
# Re-brand with single-letter Whoogle logo
element['src'] = 'static/img/favicon/apple-icon.png'
element.parent['href'] = 'home'
return
elif src.startswith(GOOG_IMG) or GOOG_STATIC in src:
element['src'] = BLANK_B64
return
element[attr] = f'{self.root_url}/{Endpoint.element}?url=' + (
self.encrypt_path(
src,
is_element=True
) + '&type=' + urlparse.quote(mime)
)
def update_css(self) -> None:
"""Updates URLs used in inline styles to be proxied by Whoogle
using the /element endpoint.
Returns:
None (The soup element is modified directly)
"""
# Filter all <style> tags
for style in self.soup.find_all('style'):
style.string = clean_css(style.string, self.page_url)
# TODO: Convert remote stylesheets to style tags and proxy all
# remote requests
# for link in soup.find_all('link', attrs={'rel': 'stylesheet'}):
# print(link)
def update_styling(self) -> None:
# Update CSS classes for result divs
soup = GClasses.replace_css_classes(self.soup)
# Remove unnecessary button(s)
for button in self.soup.find_all('button'):
button.decompose()
# Remove svg logos
for svg in self.soup.find_all('svg'):
svg.decompose()
# Update logo
logo = self.soup.find('a', {'class': 'l'})
if logo and self.mobile:
logo['style'] = ('display:flex; justify-content:center; '
'align-items:center; color:#685e79; '
'font-size:18px; ')
# Fix search bar length on mobile
try:
search_bar = self.soup.find('header').find('form').find('div')
search_bar['style'] = 'width: 100%;'
except AttributeError:
pass
# Fix body max width on images tab
style = self.soup.find('style')
div = self.soup.find('div', attrs={
'class': f'{GClasses.images_tbm_tab}'})
if style and div and not self.mobile:
css = style.string
css_html_tag = (
'html{'
'font-family: Roboto, Helvetica Neue, Arial, sans-serif;'
'font-size: 14px;'
'line-height: 20px;'
'text-size-adjust: 100%;'
'word-wrap: break-word;'
'}'
)
css = f"{css_html_tag}{css}"
css = re.sub('body{(.*?)}',
'body{padding:0 12px;margin:0 auto;max-width:1200px;}',
css)
style.string = css
# Normalize the max width between result types so the page doesn't
# jump in size when switching tabs.
if not self.mobile:
max_width_css = (
'body, #cnt, #center_col, .main, .e9EfHf, #searchform, '
'.GyAeWb, .s6JM6d {'
'max-width:1200px;'
'margin:0 auto;'
'padding-left:12px;'
'padding-right:12px;'
'}'
)
# Build the style tag using a fresh soup to avoid cases where the
# current soup lacks the helper methods (e.g., non-root elements).
factory_soup = BeautifulSoup('', 'html.parser')
extra_style = factory_soup.new_tag('style')
extra_style.string = max_width_css
if self.soup.head:
self.soup.head.append(extra_style)
else:
self.soup.insert(0, extra_style)
def update_link(self, link: Tag) -> None:
"""Update internal link paths with encrypted path, otherwise remove
unnecessary redirects and/or marketing params from the url
Args:
link: A bs4 Tag element to inspect and update
Returns:
None (the tag is updated directly)
"""
parsed_link = urlparse.urlparse(link['href'])
if '/url?q=' in link['href']:
link_netloc = extract_q(parsed_link.query, link['href'])
else:
link_netloc = parsed_link.netloc
# Remove any elements that direct to unsupported Google pages
if any(url in link_netloc for url in unsupported_g_pages):
# Replace the unsupported Google /url link with the direct URL
link['href'] = link_netloc
parent = link.parent
if any(divlink in link_netloc for divlink in unsupported_g_divs):
# Handle case where a search is performed in a different
# language than what is configured. This usually returns a
# div with the same classes as normal search results, but with
# a link to configure language preferences through Google.
# Since we want all language config done through Whoogle, we
# can safely decompose this element.
while parent:
p_cls = parent.attrs.get('class') or []
if f'{GClasses.result_class_a}' in p_cls:
parent.decompose()
break
parent = parent.parent
else:
# Remove cases where google links appear in the footer
while parent:
p_cls = parent.attrs.get('class') or []
if parent.name == 'footer' or f'{GClasses.footer}' in p_cls:
link.decompose()
parent = parent.parent
if link.decomposed:
return
# Replace href with only the intended destination (no "utm" type tags)
href = link['href'].replace('https://www.google.com', '')
result_link = urlparse.urlparse(href)
q = extract_q(result_link.query, href)
if q.startswith('/') and q not in self.query and 'spell=1' not in href:
# Internal google links (i.e. mail, maps, etc) should still
# be forwarded to Google
link['href'] = 'https://google.com' + q
elif q.startswith('https://accounts.google.com'):
# Remove Sign-in link
link.decompose()
return
elif '/search?q=' in href:
# "li:1" implies the query should be interpreted verbatim,
# which is accomplished by wrapping the query in double quotes
if 'li:1' in href:
q = '"' + q + '"'
new_search = 'search?q=' + self.encrypt_path(q)
query_params = parse_qs(urlparse.urlparse(href).query)
for param in VALID_PARAMS:
if param not in query_params:
continue
param_val = query_params[param][0]
new_search += '&' + param + '=' + param_val
link['href'] = new_search
elif 'url?q=' in href:
# Strip unneeded arguments
link['href'] = filter_link_args(q)
# Add alternate viewing options for results,
# if the result doesn't already have an AV link
netloc = urlparse.urlparse(link['href']).netloc
if self.config.anon_view and netloc not in self._av:
self._av.add(netloc)
append_anon_view(link, self.config)
else:
if href.startswith(MAPS_URL):
# Maps links don't work if a site filter is applied
link['href'] = build_map_url(link['href'])
elif (href.startswith('/?') or href.startswith('/search?') or
href.startswith('/imgres?')):
# make sure that tags can be clicked as relative URLs
link['href'] = href[1:]
elif href.startswith('/intl/'):
# do nothing, keep original URL for ToS
pass
elif href.startswith('/preferences'):
# there is no config specific URL, remove this
link.decompose()
return
else:
link['href'] = href
if self.config.new_tab and (
link["href"].startswith("http")
or link["href"].startswith("imgres?")
):
link["target"] = "_blank"
def site_alt_swap(self) -> None:
"""Replaces link locations and page elements if "alts" config
is enabled
"""
# Precompute regex for sites (escape dots) and common prefixes
site_keys = list(SITE_ALTS.keys())
if not site_keys:
return
sites_pattern = re.compile('|'.join([re.escape(k) for k in site_keys]))
prefix_pattern = re.compile(r'^(?:https?:\/\/)?(?:(?:www|mobile|m)\.)?')
# 1) Replace bare domain divs (single token) once, avoiding duplicates
for div in self.soup.find_all('div', string=sites_pattern):
if not div or not div.string:
continue
if len(div.string.split(' ')) != 1:
continue
match = sites_pattern.search(div.string)
if not match:
continue
site = match.group(0)
alt = SITE_ALTS.get(site, '')
if not alt:
continue
# Skip if already contains the alt to avoid old.old.* repetition
if alt in div.string:
continue
div.string = div.string.replace(site, alt)
# 2) Update link hrefs and descriptions in a single pass
for link in self.soup.find_all('a', href=True):
link['href'] = get_site_alt(link['href'])
# Find a description text node matching a known site
desc_nodes = link.find_all(string=sites_pattern)
if not desc_nodes:
continue
desc_node = desc_nodes[0]
link_str = str(desc_node)
# Determine which site key is present in the description
site_match = sites_pattern.search(link_str)
if not site_match:
continue
site = site_match.group(0)
alt = SITE_ALTS.get(site, '')
if not alt:
continue
# Avoid duplication if alt already present
if alt in link_str:
continue
# Medium-specific handling remains to avoid matching substrings
if 'medium.com' in link_str:
if link_str.startswith('medium.com') or '.medium.com' in link_str:
replaced = SITE_ALTS['medium.com'] + link_str[
link_str.find('medium.com') + len('medium.com'):
]
else:
replaced = link_str
else:
# If the description looks like a URL with scheme, replace only the host
if '://' in link_str:
scheme, rest = link_str.split('://', 1)
host, sep, path = rest.partition('/')
# Drop common prefixes from host when swapping to a fully-qualified alt
alt_parsed = urlparse.urlparse(alt)
alt_host = alt_parsed.netloc if alt_parsed.netloc else alt.replace('https://', '').replace('http://', '')
# If alt includes a scheme, prefer its host; otherwise use alt as host
if alt_parsed.scheme:
new_host = alt_host
else:
# When alt has no scheme, still replace entire host
new_host = alt
# Prevent replacing if host already equals target
if host == new_host:
replaced = link_str
else:
replaced = f"{scheme}://{new_host}{sep}{path}"
else:
# No scheme in the text; include optional prefixes in replacement
# Replace any leading www./m./mobile. + site with alt host (no scheme)
alt_parsed = urlparse.urlparse(alt)
alt_host = alt_parsed.netloc if alt_parsed.netloc else alt.replace('https://', '').replace('http://', '')
# Build a pattern that includes optional prefixes for the specific site
site_with_prefix = re.compile(rf'(?:(?:www|mobile|m)\.)?{re.escape(site)}')
replaced = site_with_prefix.sub(alt_host, link_str, count=1)
new_desc = BeautifulSoup(features='html.parser').new_tag('div')
new_desc.string = replaced
desc_node.replace_with(new_desc)
def view_image(self, soup) -> BeautifulSoup:
"""Parses image results from Google Images and rewrites them into the
lightweight Whoogle image results template.
Google now serves image results via the modern udm=2 endpoint, where
the raw HTML contains only placeholder thumbnails. The actual image
URLs live inside serialized data blobs in script tags. We extract that
data and pair it with the visible result cards.
"""
def _decode_url(url: str) -> str:
if not url:
return ''
# Decode common escaped characters found in the script blobs
return html.unescape(
url.replace('\\u003d', '=').replace('\\u0026', '&')
)
def _extract_image_data(modern_soup: BeautifulSoup) -> dict:
"""Extracts docid -> {img_url, img_tbn} from serialized scripts."""
scripts_text = ' '.join(
script.string for script in modern_soup.find_all('script')
if script.string
)
pattern = re.compile(
r'\[0,"(?P<docid>[^"]+)",\["(?P<thumb>https://encrypted-tbn[^"]+)"'
r'(?:,\d+,\d+)?\],\["(?P<full>https?://[^"]+?)"'
r'(?:,\d+,\d+)?\]',
re.DOTALL
)
results_map = {}
for match in pattern.finditer(scripts_text):
docid = match.group('docid')
thumb = _decode_url(match.group('thumb'))
full = _decode_url(match.group('full'))
results_map[docid] = {
'img_tbn': thumb,
'img_url': full
}
return results_map
def _parse_modern_results(modern_soup: BeautifulSoup) -> list:
cards = modern_soup.find_all(
'div',
attrs={
'data-attrid': 'images universal',
'data-docid': True
}
)
if not cards:
return []
meta_map = _extract_image_data(modern_soup)
parsed = []
seen = set()
for card in cards:
docid = card.get('data-docid')
meta = meta_map.get(docid, {})
img_url = meta.get('img_url')
img_tbn = meta.get('img_tbn')
# Fall back to the inline src if we failed to map the docid
if not img_tbn:
img_tag = card.find('img')
if img_tag:
candidate_src = img_tag.get('src')
if candidate_src and candidate_src.startswith('http'):
img_tbn = candidate_src
web_page = card.get('data-lpage') or ''
if not web_page:
link = card.find('a', href=True)
if link:
web_page = link['href']
key = (img_url, img_tbn, web_page)
if not any(key) or key in seen:
continue
seen.add(key)
parsed.append({
'domain': urlparse.urlparse(web_page).netloc
if web_page else '',
'img_url': img_url or img_tbn or '',
'web_page': web_page,
'img_tbn': img_tbn or img_url or ''
})
return parsed
# Try parsing the modern (udm=2) layout first
modern_results = _parse_modern_results(soup)
if modern_results:
# TODO: Implement proper image pagination. Google images uses
# infinite scroll with `ijn` offsets; we need a clean,
# de-duplicated pagination strategy before exposing a Next link.
next_link = None
return BeautifulSoup(
render_template(
'imageresults.html',
length=len(modern_results),
results=modern_results,
view_label="View Image",
next_link=next_link
),
features='html.parser'
)
# get some tags that are unchanged between mobile and pc versions
cor_suggested = soup.find_all('table', attrs={'class': "By0U9"})
next_pages = soup.find('table', attrs={'class': "uZgmoc"})
results = []
# find results div
results_div = soup.find('div', attrs={'class': "nQvrDb"})
# find all the results (if any)
results_all = []
if results_div:
results_all = results_div.find_all('div', attrs={'class': "lIMUZd"})
for item in results_all:
link = item.find('a', href=True)
if not link:
continue
urls = link['href'].split('&imgrefurl=')
# Skip urls that are not two-element lists
if len(urls) != 2:
continue
img_url = urlparse.unquote(urls[0].replace(
f'/{Endpoint.imgres}?imgurl=', ''))
try:
# Try to strip out only the necessary part of the web page link
web_page = urlparse.unquote(urls[1].split('&')[0])
except IndexError:
web_page = urlparse.unquote(urls[1])
img_tag = link.find('img')
if not img_tag:
continue
img_tbn = urlparse.unquote(
img_tag.get('src') or img_tag.get('data-src', '')
)
if not img_tbn:
continue
results.append({
'domain': urlparse.urlparse(web_page).netloc,
'img_url': img_url,
'web_page': web_page,
'img_tbn': img_tbn
})
soup = BeautifulSoup(render_template('imageresults.html',
length=len(results),
results=results,
view_label="View Image"),
features='html.parser')
# Replace Google's suggested-correction table, if one exists
if len(cor_suggested):
suggested_tables = soup.find_all(
'table',
attrs={'class': "By0U9"}
)
if suggested_tables:
suggested_tables[0].replace_with(cor_suggested[0])
# replace next page object at the bottom of the page, when present
next_page_tables = soup.find_all('table', attrs={'class': "uZgmoc"})
if next_pages and next_page_tables:
next_page_tables[0].replace_with(next_pages)
# TODO: Reintroduce pagination for legacy image layout if needed.
return soup
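The script-blob extraction used by `view_image` can be tried in isolation. The sketch below reuses the regular expression and unescaping logic from `_extract_image_data` and `_decode_url` above and runs them against a hand-written sample blob. The docid, both URLs, and the names `extract_image_data` and `SAMPLE_BLOB` are invented for illustration and are not real Google output.

```python
import html
import re

# Same pattern as _extract_image_data above: docid, thumbnail, full-size URL.
IMAGE_PATTERN = re.compile(
    r'\[0,"(?P<docid>[^"]+)",\["(?P<thumb>https://encrypted-tbn[^"]+)"'
    r'(?:,\d+,\d+)?\],\["(?P<full>https?://[^"]+?)"'
    r'(?:,\d+,\d+)?\]',
    re.DOTALL
)


def decode_url(url: str) -> str:
    """Undo the escaped '=' and '&' sequences found in the script blobs."""
    if not url:
        return ''
    return html.unescape(
        url.replace('\\u003d', '=').replace('\\u0026', '&')
    )


def extract_image_data(scripts_text: str) -> dict:
    """Map docid -> {img_tbn, img_url} for every match in the text."""
    results_map = {}
    for match in IMAGE_PATTERN.finditer(scripts_text):
        results_map[match.group('docid')] = {
            'img_tbn': decode_url(match.group('thumb')),
            'img_url': decode_url(match.group('full')),
        }
    return results_map


# Hypothetical blob fragment (raw strings keep the \u003d escapes literal).
SAMPLE_BLOB = (
    r'[0,"abc123",'
    r'["https://encrypted-tbn0.gstatic.com/images?q\u003dtbn:XYZ",180,240],'
    r'["https://example.com/cat.jpg?a\u003d1\u0026b\u003d2",800,600]]'
)

result = extract_image_data(SAMPLE_BLOB)
print(result)
```

Because the result is keyed by docid, a card whose `data-docid` has no match in the map simply falls back to its inline `src`, as `_parse_modern_results` does above.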
================================================
FILE: app/models/__init__.py
================================================
================================================
FILE: app/models/config.py
================================================
from inspect import Attribute
from typing import Optional
from app.utils.misc import read_config_bool
from flask import current_app
import os
from base64 import urlsafe_b64encode, urlsafe_b64decode
from cryptography.fernet import Fernet
import hashlib
import brotli
import logging
import json
import cssutils
from cssutils.css.cssstylesheet import CSSStyleSheet
from cssutils.css.cssstylerule import CSSStyleRule
# removes warnings from cssutils
cssutils.log.setLevel(logging.CRITICAL)
def get_rule_for_selector(stylesheet: CSSStyleSheet,
selector: str) -> Optional[CSSStyleRule]:
"""Search for a rule that matches a given selector in a stylesheet.
Args:
stylesheet (CSSStyleSheet) -- the stylesheet to search
selector (str) -- the selector to search for
Returns:
Optional[CSSStyleRule] -- the rule that matches the selector or None
"""
for rule in stylesheet.cssRules:
if hasattr(rule, "selectorText") and selector == rule.selectorText:
return rule
return None
class Config:
def __init__(self, **kwargs):
# User agent configuration - default to env_conf if environment variables exist, otherwise default
env_user_agent = os.getenv('WHOOGLE_USER_AGENT', '')
env_mobile_agent = os.getenv('WHOOGLE_USER_AGENT_MOBILE', '')
default_ua_option = 'env_conf' if (env_user_agent or env_mobile_agent) else 'default'
self.user_agent = kwargs.get('user_agent', default_ua_option)
self.custom_user_agent = kwargs.get('custom_user_agent', '')
self.use_custom_user_agent = kwargs.get('use_custom_user_agent', False)
self.show_user_agent = read_config_bool('WHOOGLE_CONFIG_SHOW_USER_AGENT')
# Add user agent related keys to safe_keys
# Note: CSE credentials (cse_api_key, cse_id) are intentionally NOT included
# in safe_keys for security - they should not be shareable via URL
self.safe_keys = [
'lang_search',
'lang_interface',
'country',
'theme',
'alts',
'new_tab',
'view_image',
'block',
'safe',
'nojs',
'anon_view',
'preferences_encrypted',
'tbs',
'user_agent',
'custom_user_agent',
'use_custom_user_agent',
'show_user_agent'
]
app_config = current_app.config
self.url = os.getenv('WHOOGLE_CONFIG_URL', '')
self.lang_search = os.getenv('WHOOGLE_CONFIG_SEARCH_LANGUAGE', '')
self.lang_interface = os.getenv('WHOOGLE_CONFIG_LANGUAGE', '')
self.style_modified = os.getenv(
'WHOOGLE_CONFIG_STYLE', '')
self.block = os.getenv('WHOOGLE_CONFIG_BLOCK', '')
self.block_title = os.getenv('WHOOGLE_CONFIG_BLOCK_TITLE', '')
self.block_url = os.getenv('WHOOGLE_CONFIG_BLOCK_URL', '')
self.country = os.getenv('WHOOGLE_CONFIG_COUNTRY', '')
self.tbs = os.getenv('WHOOGLE_CONFIG_TIME_PERIOD', '')
self.theme = os.getenv('WHOOGLE_CONFIG_THEME', 'system')
self.safe = read_config_bool('WHOOGLE_CONFIG_SAFE')
self.alts = read_config_bool('WHOOGLE_CONFIG_ALTS')
self.nojs = read_config_bool('WHOOGLE_CONFIG_NOJS')
self.tor = read_config_bool('WHOOGLE_CONFIG_TOR')
self.near = os.getenv('WHOOGLE_CONFIG_NEAR', '')
self.new_tab = read_config_bool('WHOOGLE_CONFIG_NEW_TAB')
self.view_image = read_config_bool('WHOOGLE_CONFIG_VIEW_IMAGE')
self.get_only = read_config_bool('WHOOGLE_CONFIG_GET_ONLY')
self.anon_view = read_config_bool('WHOOGLE_CONFIG_ANON_VIEW')
self.preferences_encrypted = read_config_bool('WHOOGLE_CONFIG_PREFERENCES_ENCRYPTED')
self.preferences_key = os.getenv('WHOOGLE_CONFIG_PREFERENCES_KEY', '')
# Google Custom Search Engine (CSE) BYOK settings
self.cse_api_key = os.getenv('WHOOGLE_CSE_API_KEY', '')
self.cse_id = os.getenv('WHOOGLE_CSE_ID', '')
self.use_cse = read_config_bool('WHOOGLE_USE_CSE')
self.accept_language = False
# Skip setting custom config if there isn't one
if kwargs:
mutable_attrs = self.get_mutable_attrs()
for attr in mutable_attrs:
if attr == 'show_user_agent':
# Handle show_user_agent as boolean
self.show_user_agent = bool(kwargs.get(attr))
elif attr in kwargs.keys():
setattr(self, attr, kwargs[attr])
elif attr not in kwargs.keys() and mutable_attrs[attr] == bool:
setattr(self, attr, False)
def __getitem__(self, name):
return getattr(self, name)
def __setitem__(self, name, value):
return setattr(self, name, value)
def __delitem__(self, name):
return delattr(self, name)
def __contains__(self, name):
return hasattr(self, name)
def get_mutable_attrs(self):
return {name: type(attr) for name, attr in self.__dict__.items()
if not name.startswith("__")
and (type(attr) is bool or type(attr) is str)}
def get_attrs(self):
return {name: attr for name, attr in self.__dict__.items()
if not name.startswith("__")
and (type(attr) is bool or type(attr) is str)}
@property
def style(self) -> str:
"""Returns the default style updated with specified modifications.
Returns:
str -- the new style
"""
vars_path = os.path.join(current_app.config['STATIC_FOLDER'], 'css/variables.css')
with open(vars_path, 'r', encoding='utf-8') as f:
style_sheet = cssutils.parseString(f.read())
modified_sheet = cssutils.parseString(self.style_modified)
for rule in modified_sheet:
rule_default = get_rule_for_selector(style_sheet,
rule.selectorText)
# if modified rule is in default stylesheet, update it
if rule_default is not None:
# TODO: update this in a smarter way to handle :root better
# for now, if we change a variable in :root, all other default
# variables also need to be present
rule_default.style = rule.style
# else add the new rule to the default stylesheet
else:
style_sheet.add(rule)
return str(style_sheet.cssText, 'utf-8')
@property
def preferences(self) -> str:
# If no encryption key is set, disable preferences encryption
if self.preferences_encrypted:
self.preferences_encrypted = bool(self.preferences_key)
# Prefix the token with a flag for visibility: 'e' means the token is
# encrypted, 'u' means it is unencrypted and can be used by other
# Whoogle instances
encrypted_flag = "e" if self.preferences_encrypted else 'u'
preferences_digest = self._encode_preferences()
return f"{encrypted_flag}{preferences_digest}"
def is_safe_key(self, key) -> bool:
"""Establishes a group of config options that are safe to set
in the url.
Args:
key (str) -- the key to check against
Returns:
bool -- True/False depending on if the key is in the "safe"
array
"""
return key in self.safe_keys
def get_localization_lang(self):
"""Returns the correct language to use for localization, but falls
back to english if not set.
Returns:
str -- the localization language string
"""
if (self.lang_interface and
self.lang_interface in current_app.config['TRANSLATIONS']):
return self.lang_interface
return 'lang_en'
def from_params(self, params) -> 'Config':
"""Modify user config with search parameters. This is primarily
used for specifying configuration on a search-by-search basis on
public instances.
Args:
params -- the url arguments (can be any deemed safe by is_safe_key())
Returns:
Config -- a modified config object
"""
if 'preferences' in params:
params_new = self._decode_preferences(params['preferences'])
# An empty dictionary means the preferences parameter could not be
# decoded or decrypted successfully
if len(params_new):
params = params_new
for param_key in params.keys():
if not self.is_safe_key(param_key):
continue
param_val = params.get(param_key)
if param_val == 'off':
param_val = False
elif isinstance(param_val, str):
if param_val.isdigit():
param_val = int(param_val)
self[param_key] = param_val
return self
def to_params(self, keys: list = []) -> str:
"""Generates a set of safe params for using in Whoogle URLs
Args:
keys (list) -- optional list of keys of URL parameters
Returns:
str -- a set of URL parameters
"""
if not len(keys):
keys = self.safe_keys
param_str = ''
for safe_key in keys:
if not self[safe_key]:
continue
param_str = param_str + f'&{safe_key}={self[safe_key]}'
return param_str
def _get_fernet_key(self, password: str) -> bytes:
"""Derive a Fernet-compatible key from a password using PBKDF2.
Note: This uses a static salt for simplicity. This is a breaking change
from the previous MD5-based implementation. Existing encrypted preferences
will need to be re-encrypted.
Args:
password: The password to derive the key from
Returns:
bytes: A URL-safe base64 encoded 32-byte key suitable for Fernet
"""
# Use a static salt derived from app context
# In a production system, you'd want to store per-user salts
salt = b'whoogle-preferences-salt-v2'
# Derive a 32-byte key using PBKDF2 with SHA256
# 100,000 iterations is a reasonable balance of security and performance
kdf_key = hashlib.pbkdf2_hmac(
'sha256',
password.encode('utf-8'),
salt,
100000,
dklen=32
)
# Fernet requires a URL-safe base64 encoded key
return urlsafe_b64encode(kdf_key)
def _encode_preferences(self) -> str:
preferences_json = json.dumps(self.get_attrs()).encode()
compressed_preferences = brotli.compress(preferences_json)
if self.preferences_encrypted and self.preferences_key:
key = self._get_fernet_key(self.preferences_key)
encrypted_preferences = Fernet(key).encrypt(compressed_preferences)
compressed_preferences = brotli.compress(encrypted_preferences)
return urlsafe_b64encode(compressed_preferences).decode()
def _decode_preferences(self, preferences: str) -> dict:
mode = preferences[0]
preferences = preferences[1:]
try:
decoded_data = brotli.decompress(urlsafe_b64decode(preferences.encode() + b'=='))
if mode == 'e' and self.preferences_key:
# preferences are encrypted
key = self._get_fernet_key(self.preferences_key)
decrypted_data = Fernet(key).decrypt(decoded_data)
decoded_data = brotli.decompress(decrypted_data)
config = json.loads(decoded_data)
except Exception:
config = {}
return config
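The key derivation in `_get_fernet_key` above needs only the standard library, so it can be checked without `cryptography` or `brotli` installed. The sketch below is a minimal standalone version under that assumption: it copies the static salt and iteration count from the method, while the function name `derive_fernet_key` and the sample password are hypothetical.

```python
import hashlib
from base64 import urlsafe_b64decode, urlsafe_b64encode

# Static salt and iteration count copied from _get_fernet_key above.
SALT = b'whoogle-preferences-salt-v2'
ITERATIONS = 100000


def derive_fernet_key(password: str) -> bytes:
    """Derive a 32-byte key with PBKDF2-HMAC-SHA256 and base64-encode it."""
    raw_key = hashlib.pbkdf2_hmac(
        'sha256',
        password.encode('utf-8'),
        SALT,
        ITERATIONS,
        dklen=32
    )
    # Fernet expects the 32 raw bytes as URL-safe base64 (44 characters)
    return urlsafe_b64encode(raw_key)


key = derive_fernet_key('example-password')  # hypothetical password
print(len(key), len(urlsafe_b64decode(key)))
```

Since the salt is static, the same password always yields the same key. That is what lets a token produced by `_encode_preferences` be decrypted later by `_decode_preferences`.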
================================================
FILE: app/models/endpoint.py
================================================
from enum import Enum
class Endpoint(Enum):
autocomplete = 'autocomplete'
home = 'home'
healthz = 'healthz'
config = 'config'
opensearch = 'opensearch.xml'
search = 'search'
search_html = 'search.html'
url = 'url'
imgres = 'imgres'
element = 'element'
window = 'window'
def __str__(self):
return self.value
def in_path(self, path: str) -> bool:
return path.startswith(self.value) or \
path.startswith(f'/{self.value}')
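A short usage sketch for the `Endpoint` enum above. The enum here is a trimmed copy with three members so the example runs on its own, and the `example.com` URL is a placeholder.

```python
from enum import Enum


class Endpoint(Enum):
    # Trimmed copy of the enum above, kept to three members for brevity
    search = 'search'
    search_html = 'search.html'
    element = 'element'

    def __str__(self):
        return self.value

    def in_path(self, path: str) -> bool:
        return path.startswith(self.value) or \
            path.startswith(f'/{self.value}')


# __str__ returns the raw value, so members can be dropped into f-strings
element_url = f'https://example.com/{Endpoint.element}?url=abc'
print(element_url)

# in_path accepts paths with or without the leading slash
print(Endpoint.search.in_path('/search?q=whoogle'))
print(Endpoint.search.in_path('search?q=whoogle'))
```

Note that `in_path` is a prefix check, so `Endpoint.search.in_path('search.html')` is also true. Callers that need to tell the two apart would have to test `search_html` first.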
================================================
FILE: app/models/g_classes.py
================================================
from bs4 import BeautifulSoup
class GClasses:
"""A class for tracking obfuscated class names used in Google results that
are directly referenced in Whoogle's filtering code.
Note: Using these should be a last resort. It is always preferred to filter
results using structural cues instead of referencing class names, as these
are liable to change at any moment.
"""
main_tbm_tab = 'KP7LCb'
images_tbm_tab = 'n692Zd'
footer = 'TuS8Ad'
result_class_a = 'ZINbbc'
result_class_b = 'luh4td'
scroller_class = 'idg8be'
line_tag = 'BsXmcf'
result_classes = {
result_class_a: ['Gx5Zad'],
result_class_b: ['fP1Qef']
}
@classmethod
def replace_css_classes(cls, soup: BeautifulSoup) -> BeautifulSoup:
"""Replace updated Google classes with the original class names that
Whoogle relies on for styling.
Args:
soup: The result page as a BeautifulSoup object
Returns:
BeautifulSoup: The new BeautifulSoup
"""
result_divs = soup.find_all('div', {
'class': [_ for c in cls.result_classes.values() for _ in c]
})
for div in result_divs:
new_class = ' '.join(div['class'])
for key, val in cls.result_classes.items():
new_class = ' '.join(new_class.replace(_, key) for _ in val)
div['class'] = new_class.split(' ')
return soup
def __str__(self):
return self.value
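The remapping done by `replace_css_classes` above can be illustrated without BeautifulSoup. The sketch below applies the same `result_classes` mapping to a plain list of class names. Unlike the method above, which uses string replacement on the joined class attribute, this version matches whole class names. The function name `remap_classes` and the extra `xpd` class are made up for the example.

```python
# Original class name -> list of updated names, as in GClasses.result_classes
RESULT_CLASSES = {
    'ZINbbc': ['Gx5Zad'],
    'luh4td': ['fP1Qef'],
}


def remap_classes(classes: list) -> list:
    """Swap any updated class name for the original name Whoogle styles."""
    # Invert the mapping so each updated name points at its original name
    updated_to_original = {
        updated: original
        for original, updated_names in RESULT_CLASSES.items()
        for updated in updated_names
    }
    return [updated_to_original.get(name, name) for name in classes]


print(remap_classes(['Gx5Zad', 'xpd']))
print(remap_classes(['fP1Qef']))
```

Unknown class names pass through unchanged, so the remap only touches the obfuscated names that Whoogle tracks.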
================================================
FILE: app/request.py
================================================
from app.models.config import Config
from app.utils.misc import read_config_bool
from app.services.provider import get_http_client
from app.utils.ua_generator import load_ua_pool, get_random_ua, DEFAULT_FALLBACK_UA
from defusedxml import ElementTree as ET
import httpx
import urllib.parse as urlparse
import os
from stem import Signal, SocketError
from stem.connection import AuthenticationFailure
from stem.control import Controller
from stem.connection import authenticate_cookie, authenticate_password
MAPS_URL = 'https://maps.google.com/maps'
AUTOCOMPLETE_URL = ('https://suggestqueries.google.com/'
'complete/search?client=toolbar&')
# Valid query params
VALID_PARAMS = ['tbs', 'tbm', 'start', 'near', 'source', 'nfpr']
class TorError(Exception):
"""Exception raised for errors in Tor requests.
Attributes:
message: a message describing the error that occurred
disable: optionally disables Tor in the user config (note:
this should only happen if the connection has been dropped
altogether).
"""
def __init__(self, message, disable=False) -> None:
self.message = message
self.disable = disable
super().__init__(message)
def send_tor_signal(signal: Signal) -> bool:
use_pass = read_config_bool('WHOOGLE_TOR_USE_PASS')
confloc = './misc/tor/control.conf'
# Check that the custom location of conf is real.
temp = os.getenv('WHOOGLE_TOR_CONF', '')
if os.path.isfile(temp):
confloc = temp
# Attempt to authenticate and send signal.
try:
with Controller.from_port(port=9051) as c:
if use_pass:
with open(confloc, "r") as conf:
# Scan for the last line of the file.
for line in conf:
pass
secret = line.strip('\n')
authenticate_password(c, password=secret)
else:
cookie_path = '/var/lib/tor/control_auth_cookie'
authenticate_cookie(c, cookie_path=cookie_path)
c.signal(signal)
os.environ['TOR_AVAILABLE'] = '1'
return True
except (SocketError, AuthenticationFailure,
ConnectionRefusedError, ConnectionError):
# TODO: Handle Tor authentication (password and cookie)
os.environ['TOR_AVAILABLE'] = '0'
return False
def gen_user_agent(config, is_mobile) -> str:
# If using custom user agent, return the custom string
if config.user_agent == 'custom' and config.custom_user_agent:
return config.custom_user_agent
# If using environment configuration
if config.user_agent == 'env_conf':
if is_mobile:
env_ua = os.getenv('WHOOGLE_USER_AGENT_MOBILE', '')
if env_ua:
return env_ua
else:
env_ua = os.getenv('WHOOGLE_USER_AGENT', '')
if env_ua:
return env_ua
# If env vars are not set, fall back to Opera UA
return DEFAULT_FALLBACK_UA
# If using default user agent - use auto-generated Opera UA pool
if config.user_agent == 'default':
try:
# Try to load UA pool from cache (lazy loading if not in app.config)
# First check if we have access to Flask app context
try:
from flask import current_app
if hasattr(current_app, 'config') and 'UA_POOL' in current_app.config:
ua_pool = current_app.config['UA_POOL']
else:
# Fall back to loading from disk
raise ImportError("UA_POOL not in app config")
except (ImportError, RuntimeError):
# No Flask context available or UA_POOL not in config, load from disk
config_path = os.environ.get('CONFIG_VOLUME',
os.path.join(os.path.dirname(os.path.abspath(__file__)),
'static', 'config'))
cache_path = os.path.join(config_path, 'ua_cache.json')
ua_pool = load_ua_pool(cache_path, count=10)
return get_random_ua(ua_pool)
except Exception as e:
# If anything goes wrong, fall back to default Opera UA
print(f"Warning: Could not load UA pool, using fallback Opera UA: {e}")
return DEFAULT_FALLBACK_UA
# Fallback for backwards compatibility (old configs or invalid user_agent values)
return DEFAULT_FALLBACK_UA
def gen_query(query, args, config) -> str:
param_dict = {key: '' for key in VALID_PARAMS}
# Use :past(hour/day/week/month/year) if available
# example search "new restaurants :past month"
lang = ''
if ':past' in query and 'tbs' not in args:
time_range = str.strip(query.split(':past', 1)[-1])
param_dict['tbs'] = '&tbs=' + ('qdr:' + str.lower(time_range[0]))
elif 'tbs' in args or 'tbs' in config:
result_tbs = args.get('tbs') if 'tbs' in args else config['tbs']
param_dict['tbs'] = '&tbs=' + result_tbs
# Occasionally the 'tbs' param provided by google also contains a
# field for 'lr', but formatted strangely. This is a rough solution
# for this.
#
# Example:
# &tbs=qdr:h,lr:lang_1pl
# -- the lr param needs to be extracted and its leading '1' removed
result_params = [_ for _ in result_tbs.split(',') if 'lr:' in _]
if len(result_params) > 0:
result_param = result_params[0]
lang = result_param[result_param.find('lr:') + 3:len(result_param)]
# Ensure search query is parsable
query = urlparse.quote(query)
# Pass along type of results (news, images, books, etc)
if 'tbm' in args:
param_dict['tbm'] = '&tbm=' + args.get('tbm')
# Google Images now expects the modern udm=2 layout; force it when
# requesting images to avoid redirects to the new AI/text layout.
if args.get('tbm') == 'isch' and 'udm' not in args:
param_dict['udm'] = '&udm=2'
# Get results page start value (10 per page, i.e. page 2 start val = 10)
if 'start' in args:
param_dict['start'] = '&start=' + args.get('start')
# Search for results near a particular city, if available
if config.near:
param_dict['near'] = '&near=' + urlparse.quote(config.near)
# Set language for results (lr) if source isn't set, otherwise use the
# result language param provided in the results
if 'source' in args:
param_dict['source'] = '&source=' + args.get('source')
param_dict['lr'] = ('&lr=' + ''.join(
[_ for _ in lang if not _.isdigit()]
)) if lang else ''
else:
param_dict['lr'] = (
'&lr=' + config.lang_search
) if config.lang_search else ''
# 'nfpr' defines the exclusion of results from an auto-corrected query
if 'nfpr' in args:
param_dict['nfpr'] = '&nfpr=' + args.get('nfpr')
# 'chips' is used in image tabs to pass the optional 'filter' to add to the
# given search term
if 'chips' in args:
param_dict['chips'] = '&chips=' + args.get('chips')
param_dict['gl'] = (
'&gl=' + config.country
) if config.country else ''
param_dict['hl'] = (
'&hl=' + config.lang_interface.replace('lang_', '')
) if config.lang_interface else ''
param_dict['safe'] = '&safe=' + ('active' if config.safe else 'off')
# Block all sites specified in the user config
unquoted_query = urlparse.unquote(query)
for blocked_site in config.block.replace(' ', '').split(','):
if not blocked_site:
continue
block = (' -site:' + blocked_site)
query += block if block not in unquoted_query else ''
for val in param_dict.values():
if not val:
continue
query += val
return query
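The `:past` shortcut handled at the top of `gen_query` above can be sketched on its own. The helper below mirrors how the time range is turned into a `tbs` parameter, using the first letter of the word that follows `:past`. The name `past_to_tbs` is hypothetical, and the empty-range guard is an addition for this sketch that `gen_query` itself does not have.

```python
def past_to_tbs(query: str) -> str:
    """Return the '&tbs=qdr:<letter>' suffix for a ':past <range>' query."""
    if ':past' not in query:
        return ''
    # Everything after the first ':past' is treated as the time range
    time_range = query.split(':past', 1)[-1].strip()
    if not time_range:
        # Guard added for this sketch; gen_query indexes time_range[0] directly
        return ''
    # Only the first letter matters: h(our), d(ay), w(eek), m(onth), y(ear)
    return '&tbs=qdr:' + time_range[0].lower()


print(past_to_tbs('new restaurants :past month'))
print(past_to_tbs('python release notes :past Week'))
```

In `gen_query` this branch only runs when `tbs` is absent from the request arguments, so an explicit `tbs` value takes precedence over the shortcut.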
class Request:
"""Class used for handling all outbound requests, including search queries,
search suggestions, and loading of external content (images, audio, etc).
Attributes:
normal_ua: the user's current user agent
root_path: the root path of the whoogle instance
config: the user's current whoogle configuration
"""
def __init__(self, normal_ua, root_path, config: Config, http_client=None):
self.search_url = 'https://www.google.com/search?gbv=1&q='
# Google Images rejects the lightweight gbv=1 interface. Use the
# modern udm=2 entrypoint specifically for image searches to avoid the
# "update your browser" interstitial.
self.image_search_url = 'https://www.google.com/search?udm=2&q='
# Optionally send heartbeat to Tor to determine availability
# Only when Tor is enabled in config to avoid unnecessary socket usage
if config.tor:
send_tor_signal(Signal.HEARTBEAT)
self.language = config.lang_search if config.lang_search else ''
self.country = config.country if config.country else ''
# For setting Accept-language Header
self.lang_interface = ''
if config.accept_language:
self.lang_interface = config.lang_interface
self.mobile = bool(normal_ua) and ('Android' in normal_ua
or 'iPhone' in normal_ua)
# Generate user agent based on config
self.modified_user_agent = gen_user_agent(config, self.mobile)
if not self.mobile:
self.modified_user_agent_mobile = gen_user_agent(config, True)
# Dedicated modern UA to use when Google rejects legacy ones (e.g. Images)
self.image_user_agent = (
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
'AppleWebKit/537.36 (KHTML, like Gecko) '
'Chrome/127.0.0.0 Safari/537.36'
)
# Set up proxy configuration
proxy_path = os.environ.get('WHOOGLE_PROXY_LOC', '')
if proxy_path:
proxy_type = os.environ.get('WHOOGLE_PROXY_TYPE', '')
proxy_user = os.environ.get('WHOOGLE_PROXY_USER', '')
proxy_pass = os.environ.get('WHOOGLE_PROXY_PASS', '')
auth_str = ''
if proxy_user:
auth_str = f'{proxy_user}:{proxy_pass}@'
proxy_str = f'{proxy_type}://{auth_str}{proxy_path}'
self.proxies = {
'https': proxy_str,
'http': proxy_str
}
else:
self.proxies = {
'http': 'socks5://127.0.0.1:9050',
'https': 'socks5://127.0.0.1:9050'
} if config.tor else {}
self.tor = config.tor
self.tor_valid = False
self.root_path = root_path
# Initialize HTTP client (shared per proxies)
self.http_client = http_client or get_http_client(self.proxies)
def __getitem__(self, name):
return getattr(self, name)
def autocomplete(self, query) -> list:
"""Sends a query to Google's search suggestion service
Args:
query: The in-progress query to send
Returns:
list: The list of matches for possible search suggestions
"""
# Check if autocomplete is disabled via environment variable
if os.environ.get('WHOOGLE_AUTOCOMPLETE', '1') == '0':
return []
try:
ac_query = dict(q=query)
if self.language:
ac_query['lr'] = self.language
if self.country:
ac_query['gl'] = self.country
if self.lang_interface:
ac_query['hl'] = self.lang_interface
response = self.send(base_url=AUTOCOMPLETE_URL,
query=urlparse.urlencode(ac_query)).text
if not response:
return []
try:
root = ET.fromstring(response)
return [_.attrib['data'] for _ in
root.findall('.//suggestion/[@data]')]
except ET.ParseError:
# Malformed XML response
return []
except Exception as e:
# Log the error but don't crash - autocomplete is non-essential
print(f"Autocomplete error: {str(e)}")
return []
def send(self, base_url='', query='', attempt=0,
force_mobile=False, user_agent=''):
"""Sends an outbound request to a URL. Optionally sends the request
using Tor, if enabled by the user.
Args:
base_url: The URL to use in the request
query: The optional query string for the request
attempt: The number of attempts made for the request
(used for cycling through Tor identities, if enabled)
force_mobile: Optional flag to enable a mobile user agent
(used for fetching full size images in search results)
user_agent: Optional client user agent, used only when
WHOOGLE_USE_CLIENT_USER_AGENT is set to 1
Returns:
Response: The Response object returned by the HTTP client
"""
use_client_user_agent = int(os.environ.get('WHOOGLE_USE_CLIENT_USER_AGENT', '0'))
if user_agent and use_client_user_agent == 1:
modified_user_agent = user_agent
else:
if force_mobile and not self.mobile:
modified_user_agent = self.modified_user_agent_mobile
else:
modified_user_agent = self.modified_user_agent
# Some Google endpoints (notably Images) now refuse legacy user agents.
# If an image search is detected and the generated UA isn't Chromium-
# like, retry with a modern Chrome string to avoid the "update your
# browser" interstitial.
if (('tbm=isch' in query) or ('udm=2' in query)) and 'Chrome' not in modified_user_agent:
modified_user_agent = self.image_user_agent
headers = {
'User-Agent': modified_user_agent,
'Accept': ('text/html,application/xhtml+xml,application/xml;'
'q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8'),
'Accept-Language': 'en-US,en;q=0.9',
'Accept-Encoding': 'gzip, deflate, br',
'Connection': 'keep-alive',
'Cache-Control': 'max-age=0',
'Pragma': 'no-cache',
'Upgrade-Insecure-Requests': '1',
'Sec-Fetch-Site': 'none',
'Sec-Fetch-Mode': 'navigate',
'Sec-Fetch-User': '?1',
'Sec-Fetch-Dest': 'document'
}
# Only attach client hints when using a Chromium-like user agent to
# avoid sending conflicting information that can trigger unsupported
# browser pages.
if 'Chrome' in headers['User-Agent']:
headers.update({
'Sec-CH-UA': (
'"Not/A)Brand";v="8", '
'"Chromium";v="127", '
'"Google Chrome";v="127"'
),
'Sec-CH-UA-Mobile': '?0',
'Sec-CH-UA-Platform': '"Windows"'
})
# Add Accept-Language header tied to the current config if requested
if self.lang_interface:
headers['Accept-Language'] = (
self.lang_interface.replace('lang_', '') + ';q=1.0'
)
# Consent cookies keep Google from showing the interstitial consent wall
consent_cookies = {
'CONSENT': 'PENDING+987',
'SOCS': 'CAESHAgBEhIaAB'
}
# Validate Tor conn and request new identity if the last one failed
if self.tor and not send_tor_signal(
Signal.NEWNYM if attempt > 0 else Signal.HEARTBEAT):
raise TorError(
"Tor was previously enabled, but the connection has been "
"dropped. Please check your Tor configuration and try again.",
disable=True)
# Make sure that the tor connection is valid, if enabled
if self.tor:
try:
tor_check = self.http_client.get('https://check.torproject.org/',
headers=headers,
retries=1)
self.tor_valid = 'Congratulations' in tor_check.text
if not self.tor_valid:
raise TorError(
"Tor connection succeeded, but the connection could "
"not be validated by torproject.org",
disable=True)
except httpx.RequestError:
raise TorError(
"Error raised during Tor connection validation",
disable=True)
search_base = base_url or self.search_url
if not base_url and ('tbm=isch' in query or 'udm=2' in query):
search_base = self.image_search_url
try:
response = self.http_client.get(
search_base + query,
headers=headers,
cookies=consent_cookies)
except httpx.HTTPError:
# Propagate HTTP errors to the caller unchanged
raise
# Retry query with new identity if using Tor (max 10 attempts)
if 'form id="captcha-form"' in response.text and self.tor:
attempt += 1
if attempt > 10:
raise TorError("Tor query failed -- max attempts exceeded 10")
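# Preserve the user agent selection across retries
return self.send(search_base, query, attempt,
force_mobile=force_mobile, user_agent=user_agent)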
return response
================================================
FILE: app/routes.py
================================================
import argparse
import base64
import io
import json
import os
import re
import urllib.parse as urlparse
import uuid
import validators
import sys
import traceback
from datetime import datetime, timedelta
from functools import wraps
import waitress
from app import app
from app.models.config import Config
from app.models.endpoint import Endpoint
from app.request import Request, TorError
from app.services.cse_client import CSEException
from app.utils.bangs import suggest_bang, resolve_bang
from app.utils.misc import empty_gif, placeholder_img, get_proxy_host_url, \
fetch_favicon
from app.filter import Filter
from app.utils.misc import read_config_bool, get_client_ip, get_request_url, \
check_for_update, encrypt_string
from app.utils.widgets import *
from app.utils.results import bold_search_terms,\
add_currency_card, check_currency, get_tabs_content
from app.utils.search import Search, needs_https, has_captcha
from app.utils.session import valid_user_session
from bs4 import BeautifulSoup as bsoup
from flask import jsonify, make_response, request, redirect, render_template, \
send_file, session, url_for, g
import httpx
from cryptography.fernet import Fernet, InvalidToken
from cryptography.exceptions import InvalidSignature
from werkzeug.datastructures import MultiDict
ac_var = 'WHOOGLE_AUTOCOMPLETE'
autocomplete_enabled = os.getenv(ac_var, '1')
def get_search_name(tbm):
for tab in app.config['HEADER_TABS'].values():
if tab['tbm'] == tbm:
return tab['name']
def auth_required(f):
@wraps(f)
def decorated(*args, **kwargs):
# do not ask password if cookies already present
if (
valid_user_session(session)
and 'cookies_disabled' not in request.args
and session['auth']
):
return f(*args, **kwargs)
auth = request.authorization
# Skip if username/password not set
whoogle_user = os.getenv('WHOOGLE_USER', '')
whoogle_pass = os.getenv('WHOOGLE_PASS', '')
if (not whoogle_user or not whoogle_pass) or (
auth
and whoogle_user == auth.username
and whoogle_pass == auth.password):
session['auth'] = True
return f(*args, **kwargs)
else:
return make_response('Not logged in', 401, {
'WWW-Authenticate': 'Basic realm="Login Required"'})
return decorated
def session_required(f):
@wraps(f)
def decorated(*args, **kwargs):
if not valid_user_session(session):
session.pop('_permanent', None)
# Note: This sets all requests to use the encryption key determined per
# instance on app init. This can be updated in the future to use a key
# that is unique to each user's session (session['key']), but that should
# be gated behind a config setting. Otherwise, searches performed by users
# with cookies blocked can fail if a session-based key is always used.
g.session_key = app.enc_key
# Clear out old sessions
invalid_sessions = []
for user_session in os.listdir(app.config['SESSION_FILE_DIR']):
file_path = os.path.join(
app.config['SESSION_FILE_DIR'],
user_session)
try:
# Ignore files that are larger than the max session file size
if os.path.getsize(file_path) > app.config['MAX_SESSION_SIZE']:
continue
with open(file_path, 'r', encoding='utf-8') as session_file:
data = json.load(session_file)
if isinstance(data, dict) and 'valid' in data:
continue
invalid_sessions.append(file_path)
except Exception:
# Broad exception handling here due to how instances installed
# with pip seem to have issues storing unrelated files in the
# same directory as sessions
pass
for invalid_session in invalid_sessions:
try:
os.remove(invalid_session)
except FileNotFoundError:
# Don't throw error if the invalid session has been removed
pass
return f(*args, **kwargs)
return decorated
@app.before_request
def before_request_func():
session.permanent = True
# Check for latest version if needed
now = datetime.now()
needs_update_check = now - timedelta(hours=24) > app.config['LAST_UPDATE_CHECK']
if read_config_bool('WHOOGLE_UPDATE_CHECK', True) and needs_update_check:
app.config['LAST_UPDATE_CHECK'] = now
app.config['HAS_UPDATE'] = check_for_update(
app.config['RELEASES_URL'],
app.config['VERSION_NUMBER'])
g.request_params = (
request.args if request.method == 'GET' else request.form
)
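# Load the default config, closing the file handle once it has been read
default_config = {}
if os.path.exists(app.config['DEFAULT_CONFIG']):
with open(app.config['DEFAULT_CONFIG'], 'r', encoding='utf-8') as f:
default_config = json.load(f)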
# Generate session values for user if unavailable
if not valid_user_session(session):
session['config'] = default_config
session['uuid'] = str(uuid.uuid4())
session['key'] = app.enc_key
session['auth'] = False
# Establish config values per user session
g.user_config = Config(**session['config'])
# Update user config if specified in search args
g.user_config = g.user_config.from_params(g.request_params)
if not g.user_config.url:
g.user_config.url = get_request_url(request.url_root)
g.user_request = Request(
request.headers.get('User-Agent'),
get_request_url(request.url_root),
config=g.user_config
)
g.app_location = g.user_config.url
@app.after_request
def after_request_func(resp):
resp.headers['X-Content-Type-Options'] = 'nosniff'
resp.headers['X-Frame-Options'] = 'DENY'
resp.headers['Cache-Control'] = 'max-age=86400'
# Security headers
resp.headers['Referrer-Policy'] = 'no-referrer'
resp.headers['Permissions-Policy'] = 'geolocation=(), microphone=(), camera=()'
# Add HSTS header if HTTPS is enabled
if os.environ.get('HTTPS_ONLY', False):
resp.headers['Strict-Transport-Security'] = 'max-age=31536000; includeSubDomains'
# Enable CSP by default (can be disabled via env var)
if os.getenv('WHOOGLE_CSP', '1') != '0':
resp.headers['Content-Security-Policy'] = app.config['CSP']
if os.environ.get('HTTPS_ONLY', False):
resp.headers['Content-Security-Policy'] += \
' upgrade-insecure-requests'
return resp
@app.errorhandler(404)
def unknown_page(e):
app.logger.warning(e)
return redirect(g.app_location)
@app.route(f'/{Endpoint.healthz}', methods=['GET'])
def healthz():
return ''
@app.route('/', methods=['GET'])
@app.route(f'/{Endpoint.home}', methods=['GET'])
@auth_required
def index():
# Redirect if an error was raised
if 'error_message' in session and session['error_message']:
error_message = session['error_message']
session['error_message'] = ''
return render_template('error.html', error_message=error_message)
return render_template('index.html',
has_update=app.config['HAS_UPDATE'],
languages=app.config['LANGUAGES'],
countries=app.config['COUNTRIES'],
time_periods=app.config['TIME_PERIODS'],
themes=app.config['THEMES'],
autocomplete_enabled=autocomplete_enabled,
translation=app.config['TRANSLATIONS'][
g.user_config.get_localization_lang()
],
logo=render_template('logo.html'),
config_disabled=(
app.config['CONFIG_DISABLE'] or
not valid_user_session(session)),
config=g.user_config,
tor_available=int(os.environ.get('TOR_AVAILABLE', '0')),
version_number=app.config['VERSION_NUMBER'])
@app.route(f'/{Endpoint.opensearch}', methods=['GET'])
def opensearch():
opensearch_url = g.app_location
if opensearch_url.endswith('/'):
opensearch_url = opensearch_url[:-1]
# Enforce https for opensearch template
if needs_https(opensearch_url):
opensearch_url = opensearch_url.replace('http://', 'https://', 1)
get_only = g.user_config.get_only or 'Chrome' in request.headers.get(
'User-Agent', '')
return render_template(
'opensearch.xml',
main_url=opensearch_url,
request_type='' if get_only else 'method="post"',
search_type=request.args.get('tbm'),
search_name=get_search_name(request.args.get('tbm'))
), 200, {'Content-Type': 'application/xml'}
@app.route(f'/{Endpoint.search_html}', methods=['GET'])
def search_html():
search_url = g.app_location
if search_url.endswith('/'):
search_url = search_url[:-1]
return render_template('search.html', url=search_url)
@app.route(f'/{Endpoint.autocomplete}', methods=['GET', 'POST'])
def autocomplete():
if os.getenv(ac_var) and not read_config_bool(ac_var):
return jsonify({})
q = g.request_params.get('q')
if not q:
# Firefox will occasionally (incorrectly) send the q field without a
# mimetype in the format "b'q=<query>'" through the request.data field
q = str(request.data).replace('q=', '')
# Search bangs if the query begins with "!", but not "! " (feeling lucky)
if q.startswith('!') and len(q) > 1 and not q.startswith('! '):
return jsonify([q, suggest_bang(q)])
if not q and not request.data:
return jsonify({'?': []})
elif request.data:
q = urlparse.unquote_plus(
request.data.decode('utf-8').replace('q=', ''))
# Return a list of suggestions for the query
#
# Note: If Tor is enabled, this returns nothing, as the request is
# almost always rejected
# Also check if autocomplete is disabled globally
autocomplete_enabled = os.environ.get('WHOOGLE_AUTOCOMPLETE', '1') != '0'
return jsonify([
q,
g.user_request.autocomplete(q) if (not g.user_config.tor and autocomplete_enabled) else []
])
def clean_text_spacing(text: str) -> str:
"""Clean up text spacing issues from HTML extraction.
Args:
text: Text extracted from HTML that may have spacing issues
Returns:
Cleaned text with proper spacing
"""
if not text:
return text
# Normalize multiple spaces to single space
text = re.sub(r'\s+', ' ', text)
# Fix domain names: remove space before period followed by domain extension
# Examples: "weather .com" -> "weather.com", "example .org" -> "example.org"
text = re.sub(r'\s+\.([a-zA-Z]{2,})\b', r'.\1', text)
# Fix www/http/https patterns
# Examples: "www .example" -> "www.example"
text = re.sub(r'\b(www|http|https)\s+\.', r'\1.', text)
# Fix spaces before common punctuation
text = re.sub(r'\s+([,;:])', r'\1', text)
# Strip leading/trailing whitespace
return text.strip()
@app.route(f'/{Endpoint.search}', methods=['GET', 'POST'])
@session_required
@auth_required
def search():
if request.method == 'POST':
# Redirect as a GET request with an encrypted query
post_data = MultiDict(request.form)
post_data['q'] = encrypt_string(g.session_key, post_data['q'])
get_req_str = urlparse.urlencode(post_data)
return redirect(url_for('.search') + '?' + get_req_str)
search_util = Search(request, g.user_config, g.session_key, user_request=g.user_request)
query = search_util.new_search_query()
bang = resolve_bang(query)
if bang:
return redirect(bang)
# Redirect to home if invalid/blank search
if not query:
return redirect(url_for('.index'))
# Generate response and number of external elements from the page
try:
response = search_util.generate_response()
except TorError as e:
session['error_message'] = e.message + (
"\\n\\nTor config is now disabled!" if e.disable else "")
session['config']['tor'] = False if e.disable else session['config'][
'tor']
return redirect(url_for('.index'))
except CSEException as e:
localization_lang = g.user_config.get_localization_lang()
translation = app.config['TRANSLATIONS'][localization_lang]
wants_json = (
request.args.get('format') == 'json' or
'application/json' in request.headers.get('Accept', '') or
'application/*+json' in request.headers.get('Accept', '')
)
error_msg = f"Custom Search API Error: {e.message}"
if e.is_quota_error:
error_msg = ("Google Custom Search API quota exceeded. "
"Free tier allows 100 queries/day. "
"Wait until midnight PT or disable CSE in settings.")
if wants_json:
return jsonify({
'error': True,
'error_message': error_msg,
'query': urlparse.unquote(query)
}), e.code
return render_template(
'error.html',
error_message=error_msg,
translation=translation,
config=g.user_config), e.code
wants_json = (
request.args.get('format') == 'json' or
'application/json' in request.headers.get('Accept', '') or
'application/*+json' in request.headers.get('Accept', '')
)
if search_util.feeling_lucky:
if wants_json:
return jsonify({'redirect': response}), 303
return redirect(response, code=303)
# If the user is attempting to translate a string, determine the correct
# string for formatting the lingva.ml url
localization_lang = g.user_config.get_localization_lang()
translation = app.config['TRANSLATIONS'][localization_lang]
translate_to = localization_lang.replace('lang_', '')
# removing st-card to only use whoogle time selector
soup = bsoup(response, "html.parser")
for x in soup.find_all(attrs={"id": "st-card"}):
x.replace_with("")
response = str(soup)
# Return 503 if temporarily blocked by captcha
if has_captcha(str(response)):
app.logger.error('503 (CAPTCHA)')
fallback_engine = os.environ.get('WHOOGLE_FALLBACK_ENGINE_URL', '')
if (fallback_engine):
if wants_json:
return jsonify({'redirect': fallback_engine + query}), 302
return redirect(fallback_engine + query)
if wants_json:
return jsonify({
'blocked': True,
'error_message': translation['ratelimit'],
'query': urlparse.unquote(query)
}), 503
else:
return render_template(
'error.html',
blocked=True,
error_message=translation['ratelimit'],
translation=translation,
farside='https://farside.link',
config=g.user_config,
query=urlparse.unquote(query),
params=g.user_config.to_params(keys=['preferences'])), 503
response = bold_search_terms(response, query)
# check for widgets and add if requested
if search_util.widget != '':
html_soup = bsoup(str(response), 'html.parser')
if search_util.widget == 'ip':
response = add_ip_card(html_soup, get_client_ip(request))
elif search_util.widget == 'calculator' and 'nojs' not in request.args:
response = add_calculator_card(html_soup)
# Update tabs content (fallback to the raw query if full_query isn't set)
full_query_val = getattr(search_util, 'full_query', query)
tabs = get_tabs_content(app.config['HEADER_TABS'],
full_query_val,
search_util.search_type,
g.user_config.preferences,
translation)
# Filter out unsupported tabs when CSE is enabled
# CSE only supports web (all) and image search, not videos/news;
# the maps tab is kept as well
use_cse = (
g.user_config.use_cse and
g.user_config.cse_api_key and
g.user_config.cse_id
)
if use_cse:
tabs = {k: v for k, v in tabs.items() if k in ['all', 'images', 'maps']}
# Feature to display currency_card
# Since this is determined by more than just the
# query, it is not defined as a standard widget
conversion = check_currency(str(response))
if conversion:
html_soup = bsoup(str(response), 'html.parser')
response = add_currency_card(html_soup, conversion)
preferences = g.user_config.preferences
home_url = f"home?preferences={preferences}" if preferences else "home"
cleanresponse = str(response).replace("andlt;", "&lt;").replace("andgt;", "&gt;")
if wants_json:
# Build a parsable JSON from the filtered soup
json_soup = bsoup(str(response), 'html.parser')
results = []
seen = set()
# Find all result containers (using known result classes)
result_divs = json_soup.find_all('div', class_=['ZINbbc', 'ezO2md'])
if result_divs:
# Process structured Google results with container divs
for div in result_divs:
# Find the first valid link in this result container
link = None
for a in div.find_all('a', href=True):
if a['href'].startswith('http'):
link = a
break
if not link:
continue
href = link['href']
if href in seen:
continue
# Get all text from the result container, not just the link
text = clean_text_spacing(div.get_text(separator=' ', strip=True))
if not text:
continue
# Extract title and content separately
# Title is typically in an h3 tag, CVA68e span, or the main link text
title = ''
# First try h3 tag
h3_tag = div.find('h3')
if h3_tag:
title = clean_text_spacing(h3_tag.get_text(separator=' ', strip=True))
else:
# Try CVA68e class (common title class in Google results)
title_span = div.find('span', class_='CVA68e')
if title_span:
title = clean_text_spacing(title_span.get_text(separator=' ', strip=True))
elif link:
# Fallback to link text, but exclude URL breadcrumb
title = clean_text_spacing(link.get_text(separator=' ', strip=True))
# Content is the description/snippet text
# Look for description/snippet elements
content = ''
# Common classes for snippets/descriptions in Google results
snippet_selectors = [
{'class_': 'VwiC3b'}, # Standard snippet
{'class_': 'FrIlee'}, # Alternative snippet class (common in current Google)
{'class_': 's'}, # Another snippet class
{'class_': 'st'}, # Legacy snippet class
]
for selector in snippet_selectors:
snippet_elem = div.find('span', selector) or div.find('div', selector)
if snippet_elem:
# Get text but exclude any nested links (like "Related searches")
content = clean_text_spacing(snippet_elem.get_text(separator=' ', strip=True))
# Only use if it's substantial content (not just the URL breadcrumb)
if content and not content.startswith('www.') and '›' not in content:
break
else:
content = ''
# If no specific content found, use text minus title as fallback
if not content and title:
# Try to extract content by removing title from full text
if text.startswith(title):
content = text[len(title):].strip()
else:
content = text
elif not content:
content = text
seen.add(href)
results.append({
'href': href,
'text': text,
'title': title,
'content': content
})
else:
# Fallback: extract links directly if no result containers found
for a in json_soup.find_all('a', href=True):
href = a['href']
if not href.startswith('http'):
continue
if href in seen:
continue
text = clean_text_spacing(a.get_text(separator=' ', strip=True))
if not text:
continue
seen.add(href)
# In fallback mode, the link text serves as both title and text
results.append({
'href': href,
'text': text,
'title': text,
'content': ''
})
return jsonify({
'query': urlparse.unquote(query),
'search_type': search_util.search_type,
'results': results
})
# Get the user agent that was used for the search
used_user_agent = ''
if search_util.user_request:
used_user_agent = search_util.user_request.modified_user_agent
elif hasattr(g, 'user_request') and g.user_request:
used_user_agent = g.user_request.modified_user_agent
return render_template(
'display.html',
has_update=app.config['HAS_UPDATE'],
query=urlparse.unquote(query),
search_type=search_util.search_type,
search_name=get_search_name(search_util.search_type),
config=g.user_config,
autocomplete_enabled=autocomplete_enabled,
lingva_url=app.config['TRANSLATE_URL'],
translation=translation,
translate_to=translate_to,
translate_str=query.replace(
'translate', ''
).replace(
translation['translate'], ''
),
is_translation=any(
_ in query.lower() for _ in [translation['translate'], 'translate']
) and not search_util.search_type, # Standard search queries only
response=cleanresponse,
version_number=app.config['VERSION_NUMBER'],
used_user_agent=used_user_agent,
search_header=render_template(
'header.html',
home_url=home_url,
config=g.user_config,
translation=translation,
languages=app.config['LANGUAGES'],
countries=app.config['COUNTRIES'],
time_periods=app.config['TIME_PERIODS'],
logo=render_template('logo.html'),
query=urlparse.unquote(query),
search_type=search_util.search_type,
mobile=g.user_request.mobile,
tabs=tabs)).replace(" ", "")
@app.route(f'/{Endpoint.config}', methods=['GET', 'POST', 'PUT'])
@session_required
@auth_required
def config():
config_disabled = (
app.config['CONFIG_DISABLE'] or
not valid_user_session(session))
name = ''
if 'name' in request.args:
name = os.path.normpath(request.args.get('name'))
if not re.match(r'^[A-Za-z0-9_.+-]+$', name):
return make_response('Invalid config name', 400)
if request.method == 'GET':
return json.dumps(g.user_config.__dict__)
elif request.method == 'PUT' and not config_disabled:
if name:
config_file = os.path.join(app.config['CONFIG_PATH'], name)
if os.path.exists(config_file):
with open(config_file, 'r', encoding='utf-8') as f:
session['config'] = json.load(f)
# else keep existing session['config']
return json.dumps(session['config'])
else:
return json.dumps({})
elif not config_disabled:
config_data = request.form.to_dict()
if 'url' not in config_data or not config_data['url']:
config_data['url'] = g.user_config.url
# Handle user agent configuration
if 'user_agent' in config_data:
if config_data['user_agent'] == 'custom':
config_data['use_custom_user_agent'] = True
# Keep both the selection and the custom string
if 'custom_user_agent' in config_data:
app.logger.debug(f"Setting custom user agent to: {config_data['custom_user_agent']}")
else:
config_data['use_custom_user_agent'] = False
# Only clear custom_user_agent if not using custom option
if config_data['user_agent'] != 'custom':
config_data['custom_user_agent'] = ''
# Save config by name to allow a user to easily load later
if name:
config_file = os.path.join(app.config['CONFIG_PATH'], name)
with open(config_file, 'w', encoding='utf-8') as f:
json.dump(config_data, f, indent=2)
session['config'] = config_data
return redirect(config_data['url'])
else:
return redirect(url_for('.index'), code=403)
@app.route(f'/{Endpoint.imgres}')
@session_required
@auth_required
def imgres():
return redirect(request.args.get('imgurl'))
@app.route(f'/{Endpoint.element}')
@session_required
@auth_required
def element():
element_url = src_url = request.args.get('url')
if element_url.startswith('gAAAAA'):
try:
cipher_suite = Fernet(g.session_key)
src_url = cipher_suite.decrypt(element_url.encode()).decode()
except (InvalidSignature, InvalidToken) as e:
return render_template(
'error.html',
error_message=str(e)), 401
src_type = request.args.get('type')
# Ensure requested element is from a valid domain
domain = urlparse.urlparse(src_url).netloc
if not validators.domain(domain):
return send_file(io.BytesIO(empty_gif), mimetype='image/gif')
try:
response = g.user_request.send(base_url=src_url)
# Display an empty gif if the requested element couldn't be retrieved
if response.status_code != 200 or len(response.content) == 0:
if 'favicon' in src_url:
favicon = fetch_favicon(src_url)
return send_file(io.BytesIO(favicon), mimetype='image/png')
else:
return send_file(io.BytesIO(empty_gif), mimetype='image/gif')
file_data = response.content
tmp_mem = io.BytesIO()
tmp_mem.write(file_data)
tmp_mem.seek(0)
return send_file(tmp_mem, mimetype=src_type)
except httpx.HTTPError:
pass
return send_file(io.BytesIO(empty_gif), mimetype='image/gif')
@app.route(f'/{Endpoint.window}')
@session_required
@auth_required
def window():
target_url = request.args.get('location')
if target_url.startswith('gAAAAA'):
cipher_suite = Fernet(g.session_key)
target_url = cipher_suite.decrypt(target_url.encode()).decode()
content_filter = Filter(
g.session_key,
root_url=request.url_root,
config=g.user_config)
target = urlparse.urlparse(target_url)
# Ensure requested URL has a valid domain
if not validators.domain(target.netloc):
return render_template(
'error.html',
error_message='Invalid location'), 400
host_url = f'{target.scheme}://{target.netloc}'
get_body = g.user_request.send(base_url=target_url).text
results = bsoup(get_body, 'html.parser')
src_attrs = ['src', 'href', 'srcset', 'data-srcset', 'data-src']
# Parse HTML response and replace relative links w/ absolute
for element in results.find_all():
for attr in src_attrs:
if not element.has_attr(attr) or not element[attr].startswith('/'):
continue
element[attr] = host_url + element[attr]
# Replace or remove javascript sources
for script in results.find_all('script', {'src': True}):
if 'nojs' in request.args:
script.decompose()
else:
content_filter.update_element_src(script, 'application/javascript')
# Replace all possible image attributes
img_sources = ['src', 'data-src', 'data-srcset', 'srcset']
for img in results.find_all('img'):
_ = [
content_filter.update_element_src(img, 'image/png', attr=_)
for _ in img_sources if img.has_attr(_)
]
# Replace all stylesheet sources
for link in results.find_all('link', {'href': True}):
content_filter.update_element_src(link, 'text/css', attr='href')
# Use anonymous view for all links on page
for a in results.find_all('a', {'href': True}):
a['href'] = f'{Endpoint.window}?location=' + a['href'] + (
'&nojs=1' if 'nojs' in request.args else '')
# Remove all iframes -- these are commonly used inside of <noscript> tags
# to enforce loading Google Analytics
for iframe in results.find_all('iframe'):
iframe.decompose()
return render_template(
'display.html',
response=results,
translation=app.config['TRANSLATIONS'][
g.user_config.get_localization_lang()
]
)
@app.route('/robots.txt')
def robots():
response = make_response(
'''User-Agent: *
Disallow: /''', 200)
response.mimetype = 'text/plain'
return response
@app.route('/favicon.ico')
def favicon():
return app.send_static_file('img/favicon.ico')
@app.errorhandler(404)
def page_not_found(e):
return render_template('error.html', error_message=str(e)), 404
@app.errorhandler(Exception)
def internal_error(e):
query = ''
if request.method == 'POST':
query = request.form.get('q')
else:
query = request.args.get('q')
# Attempt to parse the query
try:
if hasattr(g, 'user_config') and hasattr(g, 'session_key'):
search_util = Search(request, g.user_config, g.session_key)
query = search_util.new_search_query()
except Exception:
pass
print(traceback.format_exc(), file=sys.stderr)
fallback_engine = os.environ.get('WHOOGLE_FALLBACK_ENGINE_URL', '')
if (fallback_engine):
return redirect(fallback_engine + (query or ''))
# Safely get localization language with fallback
if hasattr(g, 'user_config'):
localization_lang = g.user_config.get_localization_lang()
else:
localization_lang = 'lang_en'
translation = app.config['TRANSLATIONS'][localization_lang]
# Build template context with safe defaults
template_context = {
'error_message': 'Internal server error (500)',
'translation': translation,
'farside': 'https://farside.link',
'query': urlparse.unquote(query or '')
}
# Add user config if available
if hasattr(g, 'user_config'):
template_context['config'] = g.user_config
template_context['params'] = g.user_config.to_params(keys=['preferences'])
return render_template('error.html', **template_context), 500


def run_app() -> None:
    parser = argparse.ArgumentParser(
        description='Whoogle Search console runner')
    parser.add_argument(
        '--port',
        default=5000,
        metavar='<port number>',
        help='Specifies a port to run on (default 5000)')
    parser.add_argument(
        '--host',
        default='127.0.0.1',
        metavar='<ip address>',
        help='Specifies the host address to use (default 127.0.0.1)')
    parser.add_argument(
        '--unix-socket',
        default='',
        metavar='</path/to/unix.sock>',
        help='Listen for app on unix socket instead of host:port')
    parser.add_argument(
        '--unix-socket-perms',
        default='600',
        metavar='<octal permissions>',
        help='Octal permissions to use for the Unix domain socket '
             '(default 600)')
    parser.add_argument(
        '--debug',
        default=False,
        action='store_true',
        help='Activates debug mode for the server (default False)')
    parser.add_argument(
        '--https-only',
        default=False,
        action='store_true',
        help='Enforces HTTPS redirects for all requests')
    parser.add_argument(
        '--userpass',
        default='',
        metavar='<username:password>',
        help='Sets a username/password basic auth combo (default None)')
    parser.add_argument(
        '--proxyauth',
        default='',
        metavar='<username:password>',
        help='Sets a username/password for a HTTP/SOCKS proxy (default None)')
    parser.add_argument(
        '--proxytype',
        default='',
        metavar='<socks4|socks5|http>',
        help='Sets a proxy type for all connections (default None)')
    parser.add_argument(
        '--proxyloc',
        default='',
        metavar='<location:port>',
        help='Sets a proxy location for all connections (default None)')
    args = parser.parse_args()

    if args.userpass:
        # Split on the first ':' only, so passwords may contain colons and
        # a value without ':' yields an empty password instead of an
        # IndexError
        user, _, password = args.userpass.partition(':')
        os.environ['WHOOGLE_USER'] = user
        os.environ['WHOOGLE_PASS'] = password

    if args.proxytype and args.proxyloc:
        if args.proxyauth:
            # Same first-colon split as for --userpass
            proxy_user, _, proxy_pass = args.proxyauth.partition(':')
            os.environ['WHOOGLE_PROXY_USER'] = proxy_user
            os.environ['WHOOGLE_PROXY_PASS'] = proxy_pass
        os.environ['WHOOGLE_PROXY_TYPE'] = args.proxytype
        os.environ['WHOOGLE_PROXY_LOC'] = args.proxyloc

    if args.https_only:
        os.environ['HTTPS_ONLY'] = '1'

    if args.debug:
        app.run(host=args.host, port=args.port, debug=args.debug)
    elif args.unix_socket:
        waitress.serve(
            app,
            unix_socket=args.unix_socket,
            unix_socket_perms=args.unix_socket_perms)
    else:
        waitress.serve(
            app,
            listen="{}:{}".format(args.host, args.port),
            url_prefix=os.environ.get('WHOOGLE_URL_PREFIX', ''))
================================================
FILE: app/services/__init__.py
================================================
================================================
FILE: app/services/cse_client.py
================================================
"""Google Custom Search Engine (CSE) API Client

This module provides a client for Google's Custom Search JSON API,
allowing users to bring their own API key (BYOK) for search functionality.
"""

import httpx
from typing import Optional
from dataclasses import dataclass
from urllib.parse import urlparse
from flask import render_template


# Google Custom Search API endpoint
CSE_API_URL = 'https://www.googleapis.com/customsearch/v1'


class CSEException(Exception):
    """Exception raised for CSE API errors"""

    def __init__(self, message: str, code: int = 500,
                 is_quota_error: bool = False):
        self.message = message
        self.code = code
        self.is_quota_error = is_quota_error
        super().__init__(self.message)


@dataclass
class CSEError:
    """Represents an error from the CSE API"""
    code: int
    message: str

    @property
    def is_quota_exceeded(self) -> bool:
        return self.code == 429 or 'quota' in self.message.lower()

    @property
    def is_invalid_key(self) -> bool:
        return self.code == 400 or 'invalid' in self.message.lower()


@dataclass
class CSEResult:
    """Represents a single search result from CSE API"""
    title: str
    link: str
    snippet: str
    display_link: str
    html_title: Optional[str] = None
    html_snippet: Optional[str] = None
    # Image-specific fields (populated for image search)
    image_url: Optional[str] = None
    thumbnail_url: Optional[str] = None
    image_width: Optional[int] = None
    image_height: Optional[int] = None
    context_link: Optional[str] = None  # Page where image was found


@dataclass
class CSEResponse:
    """Represents a complete CSE API response"""
    results: list[CSEResult]
    total_results: str
    search_time: float
    query: str
    start_index: int
    is_image_search: bool = False
    error: Optional[CSEError] = None

    @property
    def has_error(self) -> bool:
        return self.error is not None

    @property
    def has_results(self) -> bool:
        return len(self.results) > 0


class CSEClient:
    """Client for Google Custom Search Engine API

    Usage:
        client = CSEClient(api_key='your-key', cse_id='your-cse-id')
        response = client.search('python programming')

        if response.has_error:
            print(f"Error: {response.error.message}")
        else:
            for result in response.results:
                print(f"{result.title}: {result.link}")
    """

    def __init__(self, api_key: str, cse_id: str, timeout: float = 10.0):
        """Initialize CSE client

        Args:
            api_key: Google API key with Custom Search API enabled
            cse_id: Custom Search Engine ID (cx parameter)
            timeout: Request timeout in seconds
        """
        self.api_key = api_key
        self.cse_id = cse_id
        self.timeout = timeout
        self._client = httpx.Client(timeout=timeout)

    def search(
        self,
        query: str,
        start: int = 1,
        num: int = 10,
        safe: str = 'off',
        language: str = '',
        country: str = '',
        search_type: str = ''
    ) -> CSEResponse:
        """Execute a search query against the CSE API

        Args:
            query: Search query string
            start: Starting result index (1-based, for pagination)
            num: Number of results to return (max 10)
            safe: Safe search setting ('off', 'medium', 'high')
            language: Language restriction (e.g., 'lang_en')
            country: Country restriction (e.g., 'countryUS')
            search_type: Type of search ('image' for image search, '' for web)

        Returns:
            CSEResponse with results or error information
        """
        params = {
            'key': self.api_key,
            'cx': self.cse_id,
            'q': query,
            'start': start,
            'num': min(num, 10),  # API max is 10
            'safe': safe,
        }

        # Add search type for image search
        if search_type == 'image':
            params['searchType'] = 'image'

        # Add optional parameters
        if language:
            # CSE uses 'lr' for language restrict
            params['lr'] = language
        if country:
            # CSE uses 'cr' for country restrict
            params['cr'] = country

        try:
            response = self._client.get(CSE_API_URL, params=params)
            data = response.json()

            # Check for API errors
            if 'error' in data:
                error_info = data['error']
                return CSEResponse(
                    results=[],
                    total_results='0',
                    search_time=0.0,
                    query=query,
                    start_index=start,
                    error=CSEError(
                        code=error_info.get('code', 500),
                        message=error_info.get('message', 'Unknown error')
                    )
                )

            # Parse successful response
            search_info = data.get('searchInformation', {})
            items = data.get('items', [])
            is_image = search_type == 'image'

            results = []
            for item in items:
                # Extract image-specific data if present
                image_data = item.get('image', {})

                results.append(CSEResult(
                    title=item.get('title', ''),
                    link=item.get('link', ''),
                    snippet=item.get('snippet', ''),
                    display_link=item.get('displayLink', ''),
                    html_title=item.get('htmlTitle'),
                    html_snippet=item.get('htmlSnippet'),
                    # Image fields
                    image_url=item.get('link') if is_image else None,
                    thumbnail_url=image_data.get('thumbnailLink'),
                    image_width=image_data.get('width'),
                    image_height=image_data.get('height'),
                    context_link=image_data.get('contextLink')
                ))

            return CSEResponse(
                results=results,
                total_results=search_info.get('totalResults', '0'),
                search_time=float(search_info.get('searchTime', 0)),
                query=query,
                start_index=start,
                is_image_search=is_image
            )

        except httpx.TimeoutException:
            return CSEResponse(
                results=[],
                total_results='0',
                search_time=0.0,
                query=query,
                start_index=start,
                error=CSEError(code=408, message='Request timed out')
            )
        except httpx.RequestError as e:
            return CSEResponse(
                results=[],
                total_results='0',
                search_time=0.0,
                query=query,
                start_index=start,
                error=CSEError(code=500, message=f'Request failed: {str(e)}')
            )
        except Exception as e:
            return CSEResponse(
                results=[],
                total_results='0',
                search_time=0.0,
                query=query,
                start_index=start,
                error=CSEError(
                    code=500, message=f'Unexpected error: {str(e)}')
            )

    def close(self):
        """Close the HTTP client"""
        self._client.close()

    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.close()
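

# Illustrative sketch, not part of the original module: CSEClient.search()
# documents `start` as a 1-based result index and caps `num` at 10, so a
# caller that thinks in page numbers could convert them roughly as below.
# The helper name and its `per_page` parameter are assumptions made for
# this example only.
def example_start_for_page(page: int, per_page: int = 10) -> int:
    """Return the 1-based `start` index for a 1-based page number."""
    if page < 1:
        raise ValueError('page must be >= 1')
    # Mirror the min(num, 10) cap applied in CSEClient.search()
    per_page = max(1, min(per_page, 10))
    return (page - 1) * per_page + 1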


def cse_results_to_html(response: CSEResponse, query: str) -> str:
    """Convert CSE API response to HTML matching Whoogle's result format

    This generates HTML that mimics the structure expected by Whoogle's
    existing filter and result processing pipeline.

    Args:
        response: CSEResponse from the API
        query: Original search query

    Returns:
        HTML string formatted like Google search results
    """
    if response.has_error:
        error = response.error
        if error.is_quota_exceeded:
            return _error_html(
                'API Quota Exceeded',
                'Your Google Custom Search API quota has been exceeded. '
                'Free tier allows 100 queries/day. Wait until midnight PT '
                'or enable billing in Google Cloud Console.'
            )
        elif error.is_invalid_key:
            return _error_html(
                'Invalid API Key',
                'Your Google Custom Search API key is invalid. '
                'Please check your API key and CSE ID in settings.'
            )
        else:
            return _error_html('Search Error', error.message)

    if not response.has_results:
        return _no_results_html(query)
SYMBOL INDEX (230 symbols across 34 files)
FILE: app/__init__.py
function _teardown_clients (line 95) | def _teardown_clients(exception):
function get_secret_key (line 126) | def get_secret_key():
FILE: app/filter.py
function extract_q (line 40) | def extract_q(q_str: str, href: str) -> str:
function build_map_url (line 55) | def build_map_url(href: str) -> str:
function clean_query (line 77) | def clean_query(query: str) -> str:
function clean_css (line 90) | def clean_css(css: str, page_url: str) -> str:
class Filter (line 114) | class Filter:
method __init__ (line 121) | def __init__(
method __getitem__ (line 141) | def __getitem__(self, name):
method elements (line 145) | def elements(self):
method encrypt_path (line 148) | def encrypt_path(self, path, is_element=False) -> str:
method clean (line 159) | def clean(self, soup) -> BeautifulSoup:
method sanitize_div (line 219) | def sanitize_div(self, div) -> None:
method add_favicon (line 247) | def add_favicon(self, link) -> None:
method remove_dark_theme_toggle (line 300) | def remove_dark_theme_toggle(self, soup: BeautifulSoup) -> None:
method remove_site_blocks (line 316) | def remove_site_blocks(self, soup) -> None:
method remove_ai_overview (line 327) | def remove_ai_overview(self) -> None:
method remove_ads (line 369) | def remove_ads(self) -> None:
method remove_block_titles (line 383) | def remove_block_titles(self) -> None:
method remove_block_url (line 392) | def remove_block_url(self) -> None:
method remove_block_tabs (line 401) | def remove_block_tabs(self) -> None:
method collapse_sections (line 416) | def collapse_sections(self) -> None:
method update_element_src (line 505) | def update_element_src(self, element: Tag, mime: str, attr='src') -> N...
method update_css (line 542) | def update_css(self) -> None:
method update_styling (line 559) | def update_styling(self) -> None:
method update_link (line 628) | def update_link(self, link: Tag) -> None:
method site_alt_swap (line 737) | def site_alt_swap(self) -> None:
method view_image (line 830) | def view_image(self, soup) -> BeautifulSoup:
FILE: app/models/config.py
function get_rule_for_selector (line 21) | def get_rule_for_selector(stylesheet: CSSStyleSheet,
class Config (line 38) | class Config:
method __init__ (line 39) | def __init__(self, **kwargs):
method __getitem__ (line 116) | def __getitem__(self, name):
method __setitem__ (line 119) | def __setitem__(self, name, value):
method __delitem__ (line 122) | def __delitem__(self, name):
method __contains__ (line 125) | def __contains__(self, name):
method get_mutable_attrs (line 128) | def get_mutable_attrs(self):
method get_attrs (line 133) | def get_attrs(self):
method style (line 139) | def style(self) -> str:
method preferences (line 165) | def preferences(self) -> str:
method is_safe_key (line 177) | def is_safe_key(self, key) -> bool:
method get_localization_lang (line 191) | def get_localization_lang(self):
method from_params (line 204) | def from_params(self, params) -> 'Config':
method to_params (line 236) | def to_params(self, keys: list = []) -> str:
method _get_fernet_key (line 256) | def _get_fernet_key(self, password: str) -> bytes:
method _encode_preferences (line 286) | def _encode_preferences(self) -> str:
method _decode_preferences (line 297) | def _decode_preferences(self, preferences: str) -> dict:
FILE: app/models/endpoint.py
class Endpoint (line 4) | class Endpoint(Enum):
method __str__ (line 17) | def __str__(self):
method in_path (line 20) | def in_path(self, path: str) -> bool:
FILE: app/models/g_classes.py
class GClasses (line 4) | class GClasses:
method replace_css_classes (line 26) | def replace_css_classes(cls, soup: BeautifulSoup) -> BeautifulSoup:
method __str__ (line 47) | def __str__(self):
FILE: app/request.py
class TorError (line 22) | class TorError(Exception):
method __init__ (line 32) | def __init__(self, message, disable=False) -> None:
function send_tor_signal (line 38) | def send_tor_signal(signal: Signal) -> bool:
function gen_user_agent (line 71) | def gen_user_agent(config, is_mobile) -> str:
function gen_query (line 119) | def gen_query(query, args, config) -> str:
class Request (line 208) | class Request:
method __init__ (line 218) | def __init__(self, normal_ua, root_path, config: Config, http_client=N...
method __getitem__ (line 279) | def __getitem__(self, name):
method autocomplete (line 282) | def autocomplete(self, query) -> list:
method send (line 323) | def send(self, base_url='', query='', attempt=0,
FILE: app/routes.py
function get_search_name (line 44) | def get_search_name(tbm):
function auth_required (line 50) | def auth_required(f):
function session_required (line 79) | def session_required(f):
function before_request_func (line 129) | def before_request_func():
function after_request_func (line 174) | def after_request_func(resp):
function unknown_page (line 198) | def unknown_page(e):
function healthz (line 204) | def healthz():
function index (line 211) | def index():
function opensearch (line 238) | def opensearch():
function search_html (line 260) | def search_html():
function autocomplete (line 268) | def autocomplete():
function clean_text_spacing (line 299) | def clean_text_spacing(text: str) -> str:
function search (line 332) | def search():
function config (line 634) | def config():
function imgres (line 691) | def imgres():
function element (line 698) | def element():
function window (line 742) | def window():
function robots (line 814) | def robots():
function favicon (line 823) | def favicon():
function page_not_found (line 828) | def page_not_found(e):
function internal_error (line 833) | def internal_error(e):
function run_app (line 876) | def run_app() -> None:
FILE: app/services/cse_client.py
class CSEException (line 19) | class CSEException(Exception):
method __init__ (line 21) | def __init__(self, message: str, code: int = 500, is_quota_error: bool...
class CSEError (line 29) | class CSEError:
method is_quota_exceeded (line 35) | def is_quota_exceeded(self) -> bool:
method is_invalid_key (line 39) | def is_invalid_key(self) -> bool:
class CSEResult (line 44) | class CSEResult:
class CSEResponse (line 61) | class CSEResponse:
method has_error (line 72) | def has_error(self) -> bool:
method has_results (line 76) | def has_results(self) -> bool:
class CSEClient (line 80) | class CSEClient:
method __init__ (line 94) | def __init__(self, api_key: str, cse_id: str, timeout: float = 10.0):
method search (line 107) | def search(
method close (line 233) | def close(self):
method __enter__ (line 237) | def __enter__(self):
method __exit__ (line 240) | def __exit__(self, *args):
function cse_results_to_html (line 244) | def cse_results_to_html(response: CSEResponse, query: str) -> str:
function _escape_html (line 347) | def _escape_html(text: str) -> str:
function _error_html (line 359) | def _error_html(title: str, message: str) -> str:
function _no_results_html (line 375) | def _no_results_html(query: str) -> str:
function _image_results_html (line 390) | def _image_results_html(response: CSEResponse, query: str) -> str:
function _pagination_html (line 431) | def _pagination_html(current_start: int, query: str) -> str:
FILE: app/services/http_client.py
class HttpxClient (line 17) | class HttpxClient:
method __init__ (line 24) | def __init__(
method _determine_verify_setting (line 64) | def _determine_verify_setting(self):
method _build_client (line 86) | def _build_client(self, client_kwargs: Dict[str, Any], verify: Any) ->...
method proxies (line 118) | def proxies(self) -> Dict[str, str]:
method _cache_key (line 121) | def _cache_key(self, method: str, url: str, headers: Optional[Dict[str...
method get (line 125) | def get(self,
method _recreate_client (line 189) | def _recreate_client(self) -> None:
method close (line 216) | def close(self) -> None:
FILE: app/services/provider.py
function _proxies_key (line 10) | def _proxies_key(proxies: Dict[str, str]) -> Tuple[Tuple[str, str], Tupl...
function get_http_client (line 18) | def get_http_client(proxies: Dict[str, str]) -> HttpxClient:
function close_all_clients (line 32) | def close_all_clients() -> None:
FILE: app/static/js/header.js
function tackOnParams (line 30) | function tackOnParams(str) {
FILE: app/static/js/keyboard.js
function goUp (line 39) | function goUp () {
function goDown (line 44) | function goDown () {
function focusResult (line 48) | function focusResult (idx) {
function focusSearch (line 54) | function focusSearch () {
FILE: app/utils/bangs.py
function load_all_bangs (line 11) | def load_all_bangs(ddg_bangs_file: str, ddg_bangs: dict = {}):
function gen_bangs_json (line 57) | def gen_bangs_json(bangs_file: str) -> None:
function suggest_bang (line 90) | def suggest_bang(query: str) -> list[str]:
function resolve_bang (line 104) | def resolve_bang(query: str) -> str:
FILE: app/utils/misc.py
function fetch_favicon (line 30) | def fetch_favicon(url: str) -> bytes:
function gen_file_hash (line 50) | def gen_file_hash(path: str, static_file: str) -> str:
function read_config_bool (line 59) | def read_config_bool(var: str, default: bool=False) -> bool:
function get_client_ip (line 67) | def get_client_ip(r: Request) -> str:
function get_request_url (line 74) | def get_request_url(url: str) -> str:
function get_proxy_host_url (line 81) | def get_proxy_host_url(r: Request, default: str, root=False) -> str:
function check_for_update (line 98) | def check_for_update(version_url: str, current: str) -> int:
function get_abs_url (line 111) | def get_abs_url(url, page_url):
function list_to_dict (line 125) | def list_to_dict(lst: list) -> dict:
function encrypt_string (line 132) | def encrypt_string(key: bytes, string: str) -> str:
function decrypt_string (line 137) | def decrypt_string(key: bytes, string: str) -> str:
FILE: app/utils/results.py
function contains_cjko (line 54) | def contains_cjko(s: str) -> bool:
function bold_search_terms (line 75) | def bold_search_terms(response: str, query: str) -> BeautifulSoup:
function has_ad_content (line 125) | def has_ad_content(element: str) -> bool:
function get_first_link (line 140) | def get_first_link(soup) -> str:
function get_site_alt (line 166) | def get_site_alt(link: str, site_alts: dict = SITE_ALTS) -> str:
function filter_link_args (line 230) | def filter_link_args(link: str) -> str:
function append_nojs (line 263) | def append_nojs(result: BeautifulSoup) -> None:
function append_anon_view (line 279) | def append_anon_view(result: BeautifulSoup, config: Config) -> None:
function check_currency (line 302) | def check_currency(response: str) -> dict:
function add_currency_card (line 352) | def add_currency_card(soup: BeautifulSoup,
function get_tabs_content (line 419) | def get_tabs_content(tabs: dict,
FILE: app/utils/search.py
function needs_https (line 17) | def needs_https(url: str) -> bool:
function has_captcha (line 37) | def has_captcha(results: str) -> bool:
class Search (line 50) | class Search:
method __init__ (line 59) | def __init__(self, request, config, session_key, cookies_disabled=Fals...
method __getitem__ (line 74) | def __getitem__(self, name) -> Any:
method __setitem__ (line 77) | def __setitem__(self, name, value) -> None:
method __delitem__ (line 80) | def __delitem__(self, name) -> None:
method __contains__ (line 83) | def __contains__(self, name) -> bool:
method new_search_query (line 86) | def new_search_query(self) -> str:
method generate_response (line 125) | def generate_response(self) -> str:
method _generate_cse_response (line 161) | def _generate_cse_response(self, content_filter: Filter, root_url: str...
method _generate_scrape_response (line 218) | def _generate_scrape_response(self, content_filter: Filter, root_url: ...
FILE: app/utils/session.py
function generate_key (line 7) | def generate_key() -> bytes:
function valid_user_session (line 23) | def valid_user_session(session: dict) -> bool:
FILE: app/utils/ua_generator.py
function generate_opera_ua (line 138) | def generate_opera_ua() -> str:
function generate_ua_pool (line 171) | def generate_ua_pool(count: int = 10) -> List[str]:
function save_ua_pool (line 206) | def save_ua_pool(uas: List[str], cache_path: str) -> None:
function load_custom_ua_list (line 228) | def load_custom_ua_list(file_path: str) -> List[str]:
function load_ua_pool (line 251) | def load_ua_pool(cache_path: str, count: int = 10) -> List[str]:
function get_random_ua (line 317) | def get_random_ua(ua_pool: List[str]) -> str:
FILE: app/utils/widgets.py
function add_ip_card (line 8) | def add_ip_card(html_soup: BeautifulSoup, ip: str) -> BeautifulSoup:
function add_calculator_card (line 44) | def add_calculator_card(html_soup: BeautifulSoup) -> BeautifulSoup:
FILE: misc/check_google_user_agents.py
function read_user_agents (line 74) | def read_user_agents(file_path: str) -> List[str]:
function test_user_agent (line 88) | def test_user_agent(user_agent: str, query: str = "test", timeout: float...
function main (line 165) | def main():
FILE: misc/generate_uas.py
function generate_opera_ua (line 120) | def generate_opera_ua():
function generate_ua_pool (line 139) | def generate_ua_pool(count=10):
function main (line 163) | def main():
FILE: misc/update-translations.py
function format_lang (line 8) | def format_lang(lang: str) -> str:
function translate (line 21) | def translate(v: str, lang: str) -> str:
FILE: test/conftest.py
function mock_google (line 19) | def mock_google(monkeypatch):
function client (line 51) | def client():
FILE: test/mock_google.py
function _result_block (line 12) | def _result_block(title, href, snippet):
function _main_results (line 27) | def _main_results(query, params, language='', country=''):
function build_mock_response (line 109) | def build_mock_response(raw_query, language='', country=''):
FILE: test/test_alts.py
function build_soup (line 13) | def build_soup(html: str):
function make_filter (line 17) | def make_filter(soup: BeautifulSoup):
function test_no_duplicate_alt_prefix_reddit (line 26) | def test_no_duplicate_alt_prefix_reddit(monkeypatch):
function test_wikipedia_simple_no_lang_param (line 56) | def test_wikipedia_simple_no_lang_param(monkeypatch):
function test_single_pass_description_replacement (line 85) | def test_single_pass_description_replacement(monkeypatch):
FILE: test/test_autocomplete.py
function test_autocomplete_get (line 4) | def test_autocomplete_get(client):
function test_autocomplete_post (line 11) | def test_autocomplete_post(client):
FILE: test/test_autocomplete_xml.py
class FakeHttpClient (line 6) | class FakeHttpClient:
method get (line 7) | def get(self, url, headers=None, cookies=None, retries=0, backoff_seco...
method close (line 20) | def close(self):
function test_autocomplete_parsing (line 24) | def test_autocomplete_parsing():
FILE: test/test_http_client.py
function test_httpxclient_follow_redirects_and_proxy (line 9) | def test_httpxclient_follow_redirects_and_proxy(monkeypatch):
FILE: test/test_json.py
function stubbed_search_response (line 11) | def stubbed_search_response(monkeypatch):
function test_search_json_accept (line 33) | def test_search_json_accept(client, stubbed_search_response):
function test_search_json_format_param (line 52) | def test_search_json_format_param(client, stubbed_search_response):
function test_search_json_feeling_lucky (line 60) | def test_search_json_feeling_lucky(client, monkeypatch):
FILE: test/test_misc.py
function test_generate_user_keys (line 14) | def test_generate_user_keys():
function test_valid_session (line 20) | def test_valid_session(client):
function test_valid_translation_keys (line 26) | def test_valid_translation_keys(client):
function test_query_decryption (line 38) | def test_query_decryption(client):
function test_prefs_url (line 60) | def test_prefs_url(client):
FILE: test/test_results.py
function get_search_results (line 15) | def get_search_results(data):
function test_get_results (line 36) | def test_get_results(client, monkeypatch):
function test_post_results (line 56) | def test_post_results(client):
function test_translate_search (line 61) | def test_translate_search(client):
function test_block_results (line 71) | def test_block_results(client):
function test_view_my_ip (line 97) | def test_view_my_ip(client, monkeypatch):
function test_recent_results (line 113) | def test_recent_results(client, monkeypatch):
function test_leading_slash_search (line 148) | def test_leading_slash_search(client):
function test_site_alt_prefix_skip (line 168) | def test_site_alt_prefix_skip():
FILE: test/test_routes.py
function test_main (line 9) | def test_main(client):
function test_search (line 14) | def test_search(client):
function test_feeling_lucky (line 19) | def test_feeling_lucky(client):
function test_ddg_bang (line 31) | def test_ddg_bang(client):
function test_custom_bang (line 58) | def test_custom_bang(client):
function test_config (line 65) | def test_config(client):
function test_opensearch (line 88) | def test_opensearch(client):
FILE: test/test_routes_json.py
function test_captcha_json_block (line 9) | def test_captcha_json_block(client, monkeypatch):
FILE: test/test_tor.py
class FakeResponse (line 8) | class FakeResponse:
method __init__ (line 9) | def __init__(self, text: str = '', status_code: int = 200, content: by...
class FakeHttpClient (line 15) | class FakeHttpClient:
method __init__ (line 16) | def __init__(self, tor_ok: bool):
method get (line 19) | def get(self, url, headers=None, cookies=None, retries=0, backoff_seco...
method close (line 24) | def close(self):
function build_config (line 28) | def build_config(tor: bool) -> Config:
function test_tor_validation_success (line 34) | def test_tor_validation_success(monkeypatch):
function test_tor_validation_failure (line 45) | def test_tor_validation_failure(monkeypatch):
Condensed preview: 119 files, each showing path, character count, and a content snippet.
[
{
"path": ".dockerignore",
"chars": 18,
"preview": ".git/\nvenv/\ntest/\n"
},
{
"path": ".github/FUNDING.yml",
"chars": 495,
"preview": "# These are supported funding model platforms\ngithub: benbusby\nko_fi: benbusby\ntidelift: # Replace with a single Tidelif"
},
{
"path": ".github/ISSUE_TEMPLATE/bug_report.md",
"chars": 1001,
"preview": "---\nname: Bug report\nabout: Create a bug report to help fix an issue with Whoogle\ntitle: \"[BUG] <brief bug description>\""
},
{
"path": ".github/ISSUE_TEMPLATE/feature_request.md",
"chars": 539,
"preview": "---\nname: Feature request\nabout: Suggest a feature that would improve Whoogle\ntitle: \"[FEATURE] <description of feature>"
},
{
"path": ".github/ISSUE_TEMPLATE/new-theme.md",
"chars": 1065,
"preview": "---\nname: New theme\nabout: Create a new theme for Whoogle\ntitle: \"[THEME] <your theme name>\"\nlabels: theme\nassignees: be"
},
{
"path": ".github/ISSUE_TEMPLATE/question.md",
"chars": 246,
"preview": "---\nname: Question\nabout: Ask a (simple) question about Whoogle\ntitle: \"[QUESTION] <question here>\"\nlabels: question\nass"
},
{
"path": ".github/workflows/buildx.yml",
"chars": 4017,
"preview": "name: buildx\n\non:\n workflow_run:\n workflows: [\"docker_main\"]\n branches: [main, updates]\n types:\n - comple"
},
{
"path": ".github/workflows/docker_main.yml",
"chars": 965,
"preview": "name: docker_main\n\non:\n workflow_run:\n workflows: [\"tests\"]\n branches: [main, updates]\n types:\n - complet"
},
{
"path": ".github/workflows/docker_tests.yml",
"chars": 781,
"preview": "name: docker_tests\n\non:\n push:\n branches: main\n pull_request:\n branches: main\n\njobs:\n docker:\n runs-on: ubun"
},
{
"path": ".github/workflows/pypi.yml",
"chars": 2453,
"preview": "name: pypi\n\non:\n push:\n branches: main\n tags: v*\n\njobs:\n publish-test:\n name: Build and publish to TestPyPI\n "
},
{
"path": ".github/workflows/scan.yml",
"chars": 455,
"preview": "name: scan\n\non:\n schedule:\n - cron: '0 0 * * *'\n\njobs:\n scan:\n runs-on: ubuntu-latest\n steps:\n - uses: act"
},
{
"path": ".github/workflows/stale.yml",
"chars": 1238,
"preview": "# This workflow warns and then closes issues and PRs that have had no activity for a specified amount of time.\n#\n# You c"
},
{
"path": ".github/workflows/tests.yml",
"chars": 394,
"preview": "name: tests\n\non: [push, pull_request]\n\njobs:\n test:\n runs-on: ubuntu-latest\n steps:\n - uses: actions/checkou"
},
{
"path": ".gitignore",
"chars": 267,
"preview": "venv/\n.venv/\n.idea/\n__pycache__/\n*.pyc\n*.pem\n*.conf\n*.key\nconfig.json\ntest/static\nflask_session/\napp/static/config\napp/s"
},
{
"path": ".pre-commit-config.yaml",
"chars": 261,
"preview": "repos:\n - repo: https://github.com/astral-sh/ruff-pre-commit\n rev: v0.6.9\n hooks:\n - id: ruff\n args: "
},
{
"path": ".replit",
"chars": 30,
"preview": "entrypoint = \"misc/replit.py\"\n"
},
{
"path": "Dockerfile",
"chars": 3502,
"preview": "# NOTE: ARMv7 support has been dropped due to lack of pre-built cryptography wheels for Alpine/musl.\n# To restore ARMv7 "
},
{
"path": "LICENSE",
"chars": 1066,
"preview": "MIT License\n\nCopyright (c) 2020 Ben Busby\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\n"
},
{
"path": "MANIFEST.in",
"chars": 121,
"preview": "graft app/static\ngraft app/templates\ngraft app/misc\ninclude requirements.txt\nrecursive-include test\nglobal-exclude *.pyc"
},
{
"path": "README.md",
"chars": 51259,
"preview": ">[!WARNING]\n>\n>Since 16 January, 2025, Google has been attacking the ability to perform search queries without JavaScrip"
},
{
"path": "app/__init__.py",
"chars": 11873,
"preview": "from app.filter import clean_query\nfrom app.request import send_tor_signal\nfrom app.utils.session import generate_key\nfr"
},
{
"path": "app/__main__.py",
"chars": 39,
"preview": "from .routes import run_app\n\nrun_app()\n"
},
{
"path": "app/filter.py",
"chars": 38304,
"preview": "import cssutils\nfrom bs4 import BeautifulSoup\nfrom bs4.element import ResultSet, Tag\nfrom cryptography.fernet import Fer"
},
{
"path": "app/models/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "app/models/config.py",
"chars": 11931,
"preview": "from inspect import Attribute\nfrom typing import Optional\nfrom app.utils.misc import read_config_bool\nfrom flask import "
},
{
"path": "app/models/endpoint.py",
"chars": 506,
"preview": "from enum import Enum\n\n\nclass Endpoint(Enum):\n autocomplete = 'autocomplete'\n home = 'home'\n healthz = 'healthz"
},
{
"path": "app/models/g_classes.py",
"chars": 1507,
"preview": "from bs4 import BeautifulSoup\n\n\nclass GClasses:\n \"\"\"A class for tracking obfuscated class names used in Google result"
},
{
"path": "app/request.py",
"chars": 17531,
"preview": "from app.models.config import Config\nfrom app.utils.misc import read_config_bool\nfrom app.services.provider import get_h"
},
{
"path": "app/routes.py",
"chars": 35215,
"preview": "import argparse\nimport base64\nimport io\nimport json\nimport os\nimport re\nimport urllib.parse as urlparse\nimport uuid\nimpo"
},
{
"path": "app/services/__init__.py",
"chars": 2,
"preview": "\n\n"
},
{
"path": "app/services/cse_client.py",
"chars": 14427,
"preview": "\"\"\"Google Custom Search Engine (CSE) API Client\n\nThis module provides a client for Google's Custom Search JSON API,\nallo"
},
{
"path": "app/services/http_client.py",
"chars": 8834,
"preview": "import threading\nimport time\nfrom typing import Any, Dict, Optional, Tuple\n\nimport httpx\nfrom cachetools import TTLCache"
},
{
"path": "app/services/provider.py",
"chars": 1076,
"preview": "import os\nfrom typing import Dict, Tuple\n\nfrom app.services.http_client import HttpxClient\n\n\n_clients: Dict[tuple, Httpx"
},
{
"path": "app/static/bangs/00-whoogle.json",
"chars": 269,
"preview": "{\n \"!i\": {\n \"url\": \"search?q={}&tbm=isch\",\n \"suggestion\": \"!i (Whoogle Images)\"\n },\n \"!v\": {\n \"url\": \"search"
},
{
"path": "app/static/build/.gitignore",
"chars": 14,
"preview": "*\n!.gitignore\n"
},
{
"path": "app/static/css/dark-theme.css",
"chars": 4770,
"preview": "html {\n background: var(--whoogle-dark-page-bg) !important;\n}\n\nbody {\n background: var(--whoogle-dark-page-bg) !im"
},
{
"path": "app/static/css/error.css",
"chars": 106,
"preview": "html {\n font-size: 1.3rem;\n}\n\n@media (max-width: 1000px) {\n html {\n font-size: 3rem;\n }\n}\n"
},
{
"path": "app/static/css/header.css",
"chars": 4256,
"preview": "header {\n font-family: Roboto,HelveticaNeue,Arial,sans-serif;\n font-size: 14px;\n line-height: 20px;\n color: "
},
{
"path": "app/static/css/input.css",
"chars": 793,
"preview": "#search-bar {\n background: transparent !important;\n padding-right: 50px;\n}\n\n#search-reset {\n all: unset;\n ma"
},
{
"path": "app/static/css/light-theme.css",
"chars": 4139,
"preview": "html {\n background: var(--whoogle-page-bg) !important;\n}\n\nbody {\n background: var(--whoogle-page-bg) !important;\n}"
},
{
"path": "app/static/css/logo.css",
"chars": 187,
"preview": ".cls-1 {\n fill: transparent;\n}\n\nsvg {\n height: inherit;\n}\n\na {\n height: inherit;\n}\n\n@media (max-width: 1000px) "
},
{
"path": "app/static/css/main.css",
"chars": 2910,
"preview": "body {\n font-family: Avenir, Helvetica, Arial, sans-serif;\n}\n\n.logo {\n width: 80%;\n display: block;\n margin:"
},
{
"path": "app/static/css/search.css",
"chars": 1386,
"preview": "body {\n display: block !important;\n margin: auto !important;\n}\n\n.vvjwJb {\n font-size: 16px !important;\n}\n\n.ezO2"
},
{
"path": "app/static/css/variables.css",
"chars": 1074,
"preview": "/* Colors */\n:root {\n /* LIGHT THEME COLORS */\n --whoogle-logo: #685e79;\n --whoogle-page-bg: #ffffff;\n --who"
},
{
"path": "app/static/img/favicon/browserconfig.xml",
"chars": 281,
"preview": "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<browserconfig><msapplication><tile><square70x70logo src=\"/ms-icon-70x70.png\"/><s"
},
{
"path": "app/static/img/favicon/manifest.json",
"chars": 787,
"preview": "{\n \"name\": \"Whoogle Search\",\n \"short_name\": \"Whoogle\",\n \"display\": \"fullscreen\",\n \"scope\": \"/\",\n \"icons\": [\n {\n \"src\""
},
{
"path": "app/static/js/autocomplete.js",
"chars": 4530,
"preview": "let searchInput;\nlet currentFocus;\nlet originalSearch;\nlet autocompleteResults;\n\nconst handleUserInput = () => {\n let"
},
{
"path": "app/static/js/controller.js",
"chars": 3236,
"preview": "const setupSearchLayout = () => {\n // Setup search field\n const searchBar = document.getElementById(\"search-bar\");"
},
{
"path": "app/static/js/currency.js",
"chars": 394,
"preview": "const convert = (n1, n2, conversionFactor) => {\n // id's for currency input boxes\n let id1 = \"cb\" + n1; \n let i"
},
{
"path": "app/static/js/header.js",
"chars": 2238,
"preview": "document.addEventListener(\"DOMContentLoaded\", () => {\n const advSearchToggle = document.getElementById(\"adv-search-to"
},
{
"path": "app/static/js/keyboard.js",
"chars": 1645,
"preview": "(function () {\n let searchBar, results;\n let shift = false;\n const keymap = {\n ArrowUp: goUp,\n Ar"
},
{
"path": "app/static/js/utils.js",
"chars": 2625,
"preview": "const checkForTracking = () => {\n const mainDiv = document.getElementById(\"main\");\n const searchBar = document.get"
},
{
"path": "app/static/settings/countries.json",
"chars": 10181,
"preview": "[\n {\"name\": \"-------\", \"value\": \"\"},\n {\"name\": \"Afghanistan\", \"value\": \"AF\"},\n {\"name\": \"Albania\", \"value\": \"AL\"},\n "
},
{
"path": "app/static/settings/header_tabs.json",
"chars": 618,
"preview": "{\n \"all\": {\n \"tbm\": null,\n \"href\": \"search?q={query}\",\n \"name\": \"All\",\n \"selected\": true\n },\n \"im"
},
{
"path": "app/static/settings/languages.json",
"chars": 2845,
"preview": "[\n {\"name\": \"-------\", \"value\": \"\"},\n {\"name\": \"English\", \"value\": \"lang_en\"},\n {\"name\": \"Afrikaans (Afrikaans)\", \"va"
},
{
"path": "app/static/settings/themes.json",
"chars": 42,
"preview": "[\n \"light\",\n \"dark\",\n \"system\"\n]\n"
},
{
"path": "app/static/settings/time_periods.json",
"chars": 260,
"preview": "[\n {\"name\": \"Any time\", \"value\": \"\"},\n {\"name\": \"Past hour\", \"value\": \"qdr:h\"},\n {\"name\": \"Past 24 hours\", \"value\": \""
},
{
"path": "app/static/settings/translations.json",
"chars": 57409,
"preview": "{\n \"lang_en\": {\n \"\": \"--\",\n \"search\": \"Search\",\n \"config\": \"Configuration\",\n \"config-coun"
},
{
"path": "app/static/widgets/calculator.html",
"chars": 8799,
"preview": "<!--\n Calculator widget.\n This file should contain all required \n CSS, HTML, and JS for it.\n-->\n\n<style>\n #c"
},
{
"path": "app/templates/display.html",
"chars": 2270,
"preview": "<html>\n<head>\n <link rel=\"shortcut icon\" href=\"static/img/favicon.ico\" type=\"image/x-icon\">\n <link rel=\"icon\" href"
},
{
"path": "app/templates/error.html",
"chars": 4886,
"preview": "{% if config.theme %}\n {% if config.theme == 'system' %}\n <style>\n @import \"{{ cb_url('light-theme."
},
{
"path": "app/templates/footer.html",
"chars": 541,
"preview": "<footer>\n <p class=\"footer\">\n Whoogle Search v{{ version_number }} ||\n <a class=\"link\" href=\"https://gi"
},
{
"path": "app/templates/header.html",
"chars": 7262,
"preview": "{% if mobile %}\n <header>\n <div class=\"header-div\">\n <form class=\"search-form header\"\n "
},
{
"path": "app/templates/imageresults.html",
"chars": 8369,
"preview": "<div>\n <style>\n html {\n font-family: Roboto, Helvetica Neue, Arial, sans-serif;\n font-size: 14px;\n li"
},
{
"path": "app/templates/index.html",
"chars": 20391,
"preview": "<html style=\"background: #000;\">\n<head>\n <link rel=\"apple-touch-icon\" sizes=\"57x57\" href=\"static/img/favicon/apple-ic"
},
{
"path": "app/templates/logo.html",
"chars": 7543,
"preview": " <svg id=\"Layer_1\" class=\"whoogle-svg\" data-name=\"Layer 1\" xmlns=\"http://www.w3.org/2000/svg\" viewBox=\"0 0 1028 254\">"
},
{
"path": "app/templates/opensearch.xml",
"chars": 6458,
"preview": "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<OpenSearchDescription xmlns=\"http://a9.com/-/spec/opensearch/1.1/\"\n "
},
{
"path": "app/templates/search.html",
"chars": 453,
"preview": "<form id=\"search-form\" action=\"search\" method=\"post\">\n <input\n type=\"text\"\n name=\"q\"\n "
},
{
"path": "app/utils/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "app/utils/bangs.py",
"chars": 4097,
"preview": "import json\nimport httpx\nimport urllib.parse as urlparse\nimport os\nimport glob\n\nbangs_dict = {}\nDDG_BANGS = 'https://duc"
},
{
"path": "app/utils/misc.py",
"chars": 4379,
"preview": "import base64\nimport hashlib\nimport contextlib\nimport io\nimport os\nimport re\n\nimport httpx\nfrom urllib.parse import urlp"
},
{
"path": "app/utils/results.py",
"chars": 16230,
"preview": "from app.models.config import Config\nfrom app.models.endpoint import Endpoint\nfrom app.utils.misc import list_to_dict\nfr"
},
{
"path": "app/utils/search.py",
"chars": 10279,
"preview": "import os\nimport re\nfrom typing import Any\nfrom app.filter import Filter\nfrom app.request import gen_query\nfrom app.util"
},
{
"path": "app/utils/session.py",
"chars": 1031,
"preview": "from cryptography.fernet import Fernet\nfrom flask import current_app as app\n\nREQUIRED_SESSION_VALUES = ['uuid', 'config'"
},
{
"path": "app/utils/ua_generator.py",
"chars": 10605,
"preview": "\"\"\"\nUser Agent Generator for Opera-based UA strings.\n\nThis module generates realistic Opera User Agent strings based on "
},
{
"path": "app/utils/widgets.py",
"chars": 2337,
"preview": "from pathlib import Path\nfrom bs4 import BeautifulSoup\n\n\n# root\nBASE_DIR = Path(__file__).parent.parent.parent\n\ndef add_"
},
{
"path": "app/version.py",
"chars": 158,
"preview": "import os\n\noptional_dev_tag = ''\nif os.getenv('DEV_BUILD'):\n optional_dev_tag = '.dev' + os.getenv('DEV_BUILD')\n\n__ve"
},
{
"path": "app.json",
"chars": 8152,
"preview": "{\n \"name\": \"Whoogle Search\",\n \"description\": \"A lightweight, privacy-oriented, containerized Google search proxy for d"
},
{
"path": "charts/whoogle/.helmignore",
"chars": 349,
"preview": "# Patterns to ignore when building packages.\n# This supports shell glob matching, relative path matching, and\n# negation"
},
{
"path": "charts/whoogle/Chart.yaml",
"chars": 500,
"preview": "apiVersion: v2\nname: whoogle\ndescription: A self hosted search engine on Kubernetes\ntype: application\nversion: 0.1.0\napp"
},
{
"path": "charts/whoogle/templates/NOTES.txt",
"chars": 1747,
"preview": "1. Get the application URL by running these commands:\n{{- if .Values.ingress.enabled }}\n{{- range $host := .Values.ingre"
},
{
"path": "charts/whoogle/templates/_helpers.tpl",
"chars": 1782,
"preview": "{{/*\nExpand the name of the chart.\n*/}}\n{{- define \"whoogle.name\" -}}\n{{- default .Chart.Name .Values.nameOverride | tru"
},
{
"path": "charts/whoogle/templates/deployment.yaml",
"chars": 2725,
"preview": "apiVersion: apps/v1\nkind: Deployment\nmetadata:\n name: {{ include \"whoogle.fullname\" . }}\n labels:\n {{- include \"who"
},
{
"path": "charts/whoogle/templates/hpa.yaml",
"chars": 1545,
"preview": "{{- if .Values.autoscaling.enabled }}\n{{- if semverCompare \">=1.23-0\" .Capabilities.KubeVersion.GitVersion -}}\napiVersio"
},
{
"path": "charts/whoogle/templates/ingress.yaml",
"chars": 2079,
"preview": "{{- if .Values.ingress.enabled -}}\n{{- $fullName := include \"whoogle.fullname\" . -}}\n{{- $svcPort := .Values.service.por"
},
{
"path": "charts/whoogle/templates/service.yaml",
"chars": 361,
"preview": "apiVersion: v1\nkind: Service\nmetadata:\n name: {{ include \"whoogle.fullname\" . }}\n labels:\n {{- include \"whoogle.lab"
},
{
"path": "charts/whoogle/templates/serviceaccount.yaml",
"chars": 320,
"preview": "{{- if .Values.serviceAccount.create -}}\napiVersion: v1\nkind: ServiceAccount\nmetadata:\n name: {{ include \"whoogle.servi"
},
{
"path": "charts/whoogle/templates/tests/test-connection.yaml",
"chars": 379,
"preview": "apiVersion: v1\nkind: Pod\nmetadata:\n name: \"{{ include \"whoogle.fullname\" . }}-test-connection\"\n labels:\n {{- includ"
},
{
"path": "charts/whoogle/values.yaml",
"chars": 5377,
"preview": "# Default values for whoogle.\n# This is a YAML-formatted file.\n# Declare variables to be passed into your templates.\n\nna"
},
{
"path": "docker-compose-traefik.yaml",
"chars": 3220,
"preview": "# can't use mem_limit in a 3.x docker-compose file in non swarm mode\n# see https://github.com/docker/compose/issues/4513"
},
{
"path": "docker-compose.yml",
"chars": 1880,
"preview": "# Modern docker-compose format (v2+) does not require version specification\n# Memory limits are supported in Compose v2+"
},
{
"path": "heroku.yml",
"chars": 38,
"preview": "build:\n docker:\n web: Dockerfile\n\n"
},
{
"path": "letsencrypt/acme.json",
"chars": 0,
"preview": ""
},
{
"path": "misc/check_google_user_agents.py",
"chars": 14168,
"preview": "#!/usr/bin/env python3\n\"\"\"\nTest User Agent strings against Google to find which ones return actual search results\ninstea"
},
{
"path": "misc/generate_uas.py",
"chars": 6389,
"preview": "#!/usr/bin/env python3\n\"\"\"\nStandalone Opera User Agent String Generator\n\nThis tool generates Opera-based User Agent stri"
},
{
"path": "misc/heroku-regen.sh",
"chars": 858,
"preview": "#!/bin/bash\n# Assumes this is being executed from a session that has already logged\n# into Heroku with \"heroku login -i\""
},
{
"path": "misc/instances.txt",
"chars": 169,
"preview": "https://search.garudalinux.org\nhttps://search.sethforprivacy.com\nhttps://whoogle.privacydev.net\nhttps://wg.vern.cc\nhttps"
},
{
"path": "misc/replit.py",
"chars": 200,
"preview": "import subprocess\n\n# A plague upon Replit and all who have built it\nreplit_cmd = \"killall -q python3 > /dev/null 2>&1; p"
},
{
"path": "misc/tor/start-tor.sh",
"chars": 748,
"preview": "#!/bin/sh\n\nFF_STRING=\"FascistFirewall 1\"\n\nif [ \"$WHOOGLE_TOR_SERVICE\" == \"0\" ]; then\n echo \"Skipping Tor startup...\"\n"
},
{
"path": "misc/tor/torrc",
"chars": 373,
"preview": "DataDirectory /var/lib/tor\nControlPort 9051\nCookieAuthentication 1\nDataDirectoryGroupReadable 1\nCookieAuthFileGroupReada"
},
{
"path": "misc/update-translations.py",
"chars": 2090,
"preview": "import json\nimport pathlib\nimport httpx\n\nlingva = 'https://lingva.ml/api/v1/en'\n\n\ndef format_lang(lang: str) -> str:\n "
},
{
"path": "pyproject.toml",
"chars": 309,
"preview": "[build-system]\nrequires = [\"setuptools\", \"wheel\"]\nbuild-backend = \"setuptools.build_meta\"\n\n[tool.ruff]\nline-length = 100"
},
{
"path": "requirements.txt",
"chars": 581,
"preview": "attrs==25.3.0\nbeautifulsoup4==4.13.5\nbrotli==1.2.0\ncertifi==2025.8.3\ncffi==2.0.0\nclick==8.3.0\ncryptography==46.0.1\ncssut"
},
{
"path": "run",
"chars": 982,
"preview": "#!/bin/sh\n# Usage:\n# ./run # Runs the full web app\n# ./run test # Runs the testing suite\n\nset -e\n\nSCRIPT_DIR=\"$(CDPATH= "
},
{
"path": "setup.cfg",
"chars": 992,
"preview": "[metadata]\nname = whoogle-search\nversion = attr: app.version.__version__\nurl = https://github.com/benbusby/whoogle-searc"
},
{
"path": "test/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "test/conftest.py",
"chars": 2076,
"preview": "from app import app\nfrom app.request import Request\nfrom app.utils.session import generate_key\nfrom test.mock_google imp"
},
{
"path": "test/mock_google.py",
"chars": 4433,
"preview": "from urllib.parse import parse_qs, unquote, quote\n\nfrom app.models.config import Config\n\nDEFAULT_RESULTS = [\n ('Examp"
},
{
"path": "test/test_alts.py",
"chars": 3738,
"preview": "import copy\nimport os\n\nfrom bs4 import BeautifulSoup\n\nfrom app import app\nfrom app.filter import Filter\nfrom app.models."
},
{
"path": "test/test_autocomplete.py",
"chars": 498,
"preview": "from app.models.endpoint import Endpoint\n\n\ndef test_autocomplete_get(client):\n rv = client.get(f'/{Endpoint.autocompl"
},
{
"path": "test/test_autocomplete_xml.py",
"chars": 983,
"preview": "from app import app\nfrom app.request import Request\nfrom app.models.config import Config\n\n\nclass FakeHttpClient:\n def"
},
{
"path": "test/test_http_client.py",
"chars": 909,
"preview": "import types\n\nimport httpx\nimport pytest\n\nfrom app.services.http_client import HttpxClient\n\n\ndef test_httpxclient_follow"
},
{
"path": "test/test_json.py",
"chars": 2758,
"preview": "import json\nimport types\n\nimport pytest\n\nfrom app.models.endpoint import Endpoint\nfrom app.utils import search as search"
},
{
"path": "test/test_misc.py",
"chars": 2537,
"preview": "from cryptography.fernet import Fernet\n\nfrom app import app\nfrom app.models.endpoint import Endpoint\nfrom app.utils.sess"
},
{
"path": "test/test_results.py",
"chars": 6307,
"preview": "from bs4 import BeautifulSoup\nfrom app.filter import Filter\nfrom app.models.config import Config\nfrom app.models.endpoin"
},
{
"path": "test/test_routes.py",
"chars": 2899,
"preview": "from app import app\nfrom app.models.endpoint import Endpoint\n\nimport json\n\nfrom test.conftest import demo_config\n\n\ndef t"
},
{
"path": "test/test_routes_json.py",
"chars": 758,
"preview": "import json\n\nimport pytest\n\nfrom app.models.endpoint import Endpoint\nfrom app.utils import search as search_mod\n\n\ndef te"
},
{
"path": "test/test_tor.py",
"chars": 1857,
"preview": "import pytest\n\nfrom app import app\nfrom app.request import Request, TorError\nfrom app.models.config import Config\n\n\nclas"
},
{
"path": "whoogle.template.env",
"chars": 3181,
"preview": "# ----------------------------------\n# Rename to \"whoogle.env\" before use\n# ----------------------------------\n# You can"
}
]
About this extraction
This page contains the full source code of the benbusby/whoogle-search GitHub repository, extracted and formatted as plain text. The extraction includes 119 files (513.4 KB), approximately 139.6k tokens, and a symbol index with 230 extracted functions, classes, methods, constants, and types.