Repository: MarioVilas/google
Branch: master
Commit: 8d11e1735e6d
Files: 16
Total size: 31.4 KB

Directory structure:
gitextract_96tgavo2/

├── .gitignore
├── .readthedocs.yaml
├── .travis.yml
├── LICENSE
├── MANIFEST.in
├── README.md
├── docs/
│   ├── .gitignore
│   ├── Makefile
│   ├── conf.py
│   ├── index.rst
│   └── make.bat
├── googlesearch/
│   └── __init__.py
├── requirements.txt
├── scripts/
│   └── google
├── setup.cfg
└── setup.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
*.py[co]
MANIFEST

# IDE
.idea/

# Packages
*.egg
*.egg-info
dist
build
eggs
parts
bin
var
sdist
develop-eggs
.installed.cfg

# Installer logs
pip-log.txt

# Unit test / coverage reports
.coverage
.tox

#Translations
*.mo

#Mr Developer
.mr.developer.cfg


================================================
FILE: .readthedocs.yaml
================================================
# .readthedocs.yaml
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Required
version: 2

# Set the version of Python and other tools you might need
build:
  os: ubuntu-22.04
  tools:
    python: "3.11"

# Build documentation in the docs/ directory with Sphinx
sphinx:
  configuration: docs/conf.py

# We recommend specifying your dependencies to enable reproducible builds:
# https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
python:
  install:
  - requirements: requirements.txt

================================================
FILE: .travis.yml
================================================
language: python

# Supported CPython versions:
# https://en.wikipedia.org/wiki/CPython#Version_history
python:
 - pypy3
 - pypy
 - 2.7
 - 3.6
 - 3.5
 - 3.4

# Use container-based infrastructure
sudo: false

install:
 - pip install pycodestyle pyflakes

script:
 # Static analysis
 - pyflakes .
 - pyflakes ./scripts/google
 - pycodestyle --statistics --count .
 - pycodestyle --statistics --count ./scripts/google

matrix:
  fast_finish: true


================================================
FILE: LICENSE
================================================
BSD 3-Clause License

Copyright (c) 2019,
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
  list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation
  and/or other materials provided with the distribution.

* Neither the name of the copyright holder nor the names of its
  contributors may be used to endorse or promote products derived from
  this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


================================================
FILE: MANIFEST.in
================================================
include README.md
include MANIFEST.in
include setup.py
include scripts/google
include requirements.txt
include googlesearch/user_agents.txt.gz


================================================
FILE: README.md
================================================
Google Search
============

Google search from Python.

https://python-googlesearch.readthedocs.io/en/latest/

**Note**: this project is not affiliated with Google in any way.

Usage example
-------------

    # Get the first 20 hits for: "Breaking Code" WordPress blog
    from googlesearch import search
    for url in search('"Breaking Code" WordPress blog', stop=20):
        print(url)
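
To restrict results to a date range, `search()` accepts a `tbs` value. The sketch below shows the format that value takes. `format_tbs` is a hypothetical local helper that mirrors what `googlesearch.get_tbs` does, so that the snippet runs without the package installed and without network access:

```python
from datetime import date


def format_tbs(from_date, to_date):
    # Hypothetical local helper mirroring googlesearch.get_tbs:
    # encode a custom date range as a Google "tbs" value.
    return 'cdr:1,cd_min:%s,cd_max:%s' % (
        from_date.strftime('%m/%d/%Y'),
        to_date.strftime('%m/%d/%Y'),
    )


tbs = format_tbs(date(2020, 1, 1), date(2020, 1, 31))
print(tbs)  # cdr:1,cd_min:01/01/2020,cd_max:01/31/2020

# With the package installed, the value can be passed straight to search():
#     from googlesearch import search
#     for url in search('"Breaking Code" WordPress blog', tbs=tbs, stop=20):
#         print(url)
```

In real code, prefer `from googlesearch import get_tbs` over a local copy.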

Installing
----------

    pip install google


================================================
FILE: docs/.gitignore
================================================
_build/


================================================
FILE: docs/Makefile
================================================
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line.
SPHINXOPTS    =
SPHINXBUILD   = sphinx-build
SPHINXPROJ    = googlesearch
SOURCEDIR     = .
BUILDDIR      = _build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

================================================
FILE: docs/conf.py
================================================
# -*- coding: utf-8 -*-
#
# googlesearch documentation build configuration file, created by
# sphinx-quickstart on Tue Nov  6 12:25:12 2018.
#
# This file is execfile()d with the current directory set to its
# containing dir.
#
# Note that not all possible configuration values are present in this
# autogenerated file.
#
# All configuration values have a default; values that are commented out
# serve to show the default.

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
import os
import sys
sys.path.insert(0, os.path.abspath('..'))


# -- General configuration ------------------------------------------------

# If your documentation needs a minimal Sphinx version, state it here.
#
# needs_sphinx = '1.0'

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = ['sphinx.ext.autodoc',
              'sphinx.ext.viewcode',
              'sphinx.ext.githubpages']

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

# The suffix(es) of source filenames.
# You can specify multiple suffix as a list of string:
#
# source_suffix = ['.rst', '.md']
source_suffix = '.rst'

# The master toctree document.
master_doc = 'index'

# General information about the project.
project = u'googlesearch'
copyright = u'2018, Mario Vilas'
author = u'Mario Vilas'

# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#
# The short X.Y version.
version = u''
# The full version, including alpha/beta/rc tags.
release = u''

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = None

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# These patterns also affect html_static_path and html_extra_path.
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']

# The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'sphinx'

# If true, `todo` and `todoList` produce output, else they produce nothing.
todo_include_todos = False


# -- Options for HTML output ----------------------------------------------

# The theme to use for HTML and HTML Help pages.  See the documentation for
# a list of builtin themes.
#
html_theme = 'alabaster'

# Theme options are theme-specific and customize the look and feel of a theme
# further.  For a list of options available for each theme, see the
# documentation.
#
# html_theme_options = {}

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']

# Custom sidebar templates, must be a dictionary that maps document names
# to template names.
#
# This is required for the alabaster theme
# refs: http://alabaster.readthedocs.io/en/latest/installation.html#sidebars
html_sidebars = {
    '**': [
        'relations.html',  # needs 'show_related': True theme option to display
        'searchbox.html',
    ]
}


# -- Options for HTMLHelp output ------------------------------------------

# Output file base name for HTML help builder.
htmlhelp_basename = 'googlesearchdoc'


# -- Options for LaTeX output ---------------------------------------------

latex_elements = {
    # The paper size ('letterpaper' or 'a4paper').
    #
    # 'papersize': 'letterpaper',

    # The font size ('10pt', '11pt' or '12pt').
    #
    # 'pointsize': '10pt',

    # Additional stuff for the LaTeX preamble.
    #
    # 'preamble': '',

    # Latex figure (float) alignment
    #
    # 'figure_align': 'htbp',
}

# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title,
#  author, documentclass [howto, manual, or own class]).
latex_documents = [
    (master_doc, 'googlesearch.tex', u'googlesearch Documentation',
     u'Mario Vilas', 'manual'),
]


# -- Options for manual page output ---------------------------------------

# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [
    (master_doc, 'googlesearch', u'googlesearch Documentation',
     [author], 1)
]


# -- Options for Texinfo output -------------------------------------------

# Grouping the document tree into Texinfo files. List of tuples
# (source start file, target name, title, author,
#  dir menu entry, description, category)
texinfo_documents = [
    (master_doc, 'googlesearch', u'googlesearch Documentation',
     author, 'googlesearch', 'Python bindings to the Google search engine.',
     'Miscellaneous'),
]


# -- Options for Epub output ----------------------------------------------

# Bibliographic Dublin Core info.
epub_title = project
epub_author = author
epub_publisher = author
epub_copyright = copyright

# The unique identifier of the text. This can be a ISBN number
# or the project homepage.
#
# epub_identifier = ''

# A unique identification for the text.
#
# epub_uid = ''

# A list of files that should not be packed into the epub file.
epub_exclude_files = ['search.html']


================================================
FILE: docs/index.rst
================================================
.. googlesearch documentation master file, created by
   sphinx-quickstart on Tue Nov  6 12:25:12 2018.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Welcome to googlesearch's documentation!
========================================

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`

Reference
=========

.. automodule:: googlesearch
   :members:



================================================
FILE: docs/make.bat
================================================
@ECHO OFF

pushd %~dp0

REM Command file for Sphinx documentation

if "%SPHINXBUILD%" == "" (
	set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=.
set BUILDDIR=_build
set SPHINXPROJ=googlesearch

if "%1" == "" goto help

%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
	echo.
	echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
	echo.installed, then set the SPHINXBUILD environment variable to point
	echo.to the full path of the 'sphinx-build' executable. Alternatively you
	echo.may add the Sphinx directory to PATH.
	echo.
	echo.If you don't have Sphinx installed, grab it from
	echo.http://sphinx-doc.org/
	exit /b 1
)

%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS%
goto end

:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS%

:end
popd


================================================
FILE: googlesearch/__init__.py
================================================
#!/usr/bin/env python

# Copyright (c) 2009-2020, Mario Vilas
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
#     * Redistributions of source code must retain the above copyright notice,
#       this list of conditions and the following disclaimer.
#     * Redistributions in binary form must reproduce the above copyright
#       notice, this list of conditions and the following disclaimer in the
#       documentation and/or other materials provided with the distribution.
#     * Neither the name of the copyright holder nor the names of its
#       contributors may be used to endorse or promote products derived from
#       this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.

import os
import random
import sys
import time
import ssl

if sys.version_info[0] > 2:
    from http.cookiejar import LWPCookieJar
    from urllib.request import Request, urlopen
    from urllib.parse import quote_plus, urlparse, parse_qs
else:
    from cookielib import LWPCookieJar
    from urllib import quote_plus
    from urllib2 import Request, urlopen
    from urlparse import urlparse, parse_qs

try:
    from bs4 import BeautifulSoup
    is_bs4 = True
except ImportError:
    from BeautifulSoup import BeautifulSoup
    is_bs4 = False

__all__ = [

    # Main search function.
    'search',

    # Shortcut for "get lucky" search.
    'lucky',

    # Miscellaneous utility functions.
    'get_random_user_agent', 'get_tbs',
]

# Debug flag.
DEBUG = False

# URL templates to make Google searches.
url_home = "https://www.google.%(tld)s/"
url_search = "https://www.google.%(tld)s/search?lr=lang_%(lang)s&" \
             "q=%(query)s&btnG=Google+Search&tbs=%(tbs)s&safe=%(safe)s&" \
             "cr=%(country)s&filter=0"
url_next_page = "https://www.google.%(tld)s/search?lr=lang_%(lang)s&" \
                "q=%(query)s&start=%(start)d&tbs=%(tbs)s&safe=%(safe)s&" \
                "cr=%(country)s&filter=0"
url_search_num = "https://www.google.%(tld)s/search?lr=lang_%(lang)s&" \
                 "q=%(query)s&num=%(num)d&btnG=Google+Search&tbs=%(tbs)s&" \
                 "safe=%(safe)s&cr=%(country)s&filter=0"
url_next_page_num = "https://www.google.%(tld)s/search?lr=lang_%(lang)s&" \
                    "q=%(query)s&num=%(num)d&start=%(start)d&tbs=%(tbs)s&" \
                    "safe=%(safe)s&cr=%(country)s&filter=0"
url_parameters = (
    'hl', 'lr', 'q', 'num', 'btnG', 'start', 'tbs', 'safe', 'cr', 'filter')

# Cookie jar. Stored at the user's home folder.
# If the cookie jar is inaccessible, the errors are ignored.
home_folder = os.getenv('HOME')
if not home_folder:
    home_folder = os.getenv('USERPROFILE')
    if not home_folder:
        home_folder = '.'   # Use the current folder on error.
cookie_jar = LWPCookieJar(os.path.join(home_folder, '.google-cookie'))
try:
    cookie_jar.load()
except Exception:
    pass

# Default user agent, unless instructed by the user to change it.
USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)'

# Load the list of valid user agents from the install folder.
# The search order is:
#   * user_agents.txt.gz
#   * user_agents.txt
#   * default user agent
try:
    install_folder = os.path.abspath(os.path.split(__file__)[0])
    try:
        user_agents_file = os.path.join(install_folder, 'user_agents.txt.gz')
        import gzip
        fp = gzip.open(user_agents_file, 'rb')
        try:
            user_agents_list = [_.strip() for _ in fp.readlines()]
        finally:
            fp.close()
            del fp
    except Exception:
        user_agents_file = os.path.join(install_folder, 'user_agents.txt')
        with open(user_agents_file) as fp:
            user_agents_list = [_.strip() for _ in fp.readlines()]
except Exception:
    user_agents_list = [USER_AGENT]


# Get a random user agent.
def get_random_user_agent():
    """
    Get a random user agent string.

    :rtype: str
    :return: Random user agent string.
    """
    return random.choice(user_agents_list)


# Helper function to format the tbs parameter.
def get_tbs(from_date, to_date):
    """
    Helper function to format the tbs parameter.

    :param datetime.date from_date: Python date object.
    :param datetime.date to_date: Python date object.

    :rtype: str
    :return: Dates encoded in tbs format.
    """
    from_date = from_date.strftime('%m/%d/%Y')
    to_date = to_date.strftime('%m/%d/%Y')
    return 'cdr:1,cd_min:%(from_date)s,cd_max:%(to_date)s' % vars()


# Request the given URL and return the response page, using the cookie jar.
# If the cookie jar is inaccessible, the errors are ignored.
def get_page(url, user_agent=None, verify_ssl=True):
    """
    Request the given URL and return the response page, using the cookie jar.

    :param str url: URL to retrieve.
    :param str user_agent: User agent for the HTTP requests.
        Use None for the default.
    :param bool verify_ssl: Verify the SSL certificate to prevent
        traffic interception attacks. Defaults to True.

    :rtype: str
    :return: Web page retrieved for the given URL.

    :raises IOError: An exception is raised on error.
    :raises urllib2.URLError: An exception is raised on error.
    :raises urllib2.HTTPError: An exception is raised on error.
    """
    if user_agent is None:
        user_agent = USER_AGENT
    request = Request(url)
    request.add_header('User-Agent', user_agent)
    cookie_jar.add_cookie_header(request)
    if verify_ssl:
        response = urlopen(request)
    else:
        context = ssl._create_unverified_context()
        response = urlopen(request, context=context)
    cookie_jar.extract_cookies(response, request)
    html = response.read()
    response.close()
    try:
        cookie_jar.save()
    except Exception:
        pass
    if DEBUG:
        print('-' * 79)
        print(html)
        print('-' * 79)
    return html


# Filter links found in the Google result pages HTML code.
# Returns None if the link doesn't yield a valid result.
def filter_result(link, include_google_links=False):
    try:
        # Decode hidden URLs.
        if link.startswith('/url?'):
            o = urlparse(link, 'http')
            link = parse_qs(o.query).get('q')[0]

        o = urlparse(link, 'http')

        # Check if the link is an absolute URL.
        if not o.netloc:
            return None

        # If excluding Google links, return None if 'google' is in the domain.
        if not include_google_links and 'google' in o.netloc:
            return None

        return link

    except Exception:
        pass


# Returns a generator that yields URLs.
def search(query, tld='com', lang='en', tbs='0', safe='off', num=10, start=0,
           stop=None, pause=2.0, country='', extra_params=None,
           user_agent=None, verify_ssl=True, include_google_links=False):
    """
    Search the given query string using Google.

    :param str query: Query string. Must NOT be url-encoded.
    :param str tld: Top level domain.
    :param str lang: Language.
    :param str tbs: Time limits (e.g. "qdr:h" => last hour,
        "qdr:d" => last 24 hours, "qdr:m" => last month).
    :param str safe: Safe search.
    :param int num: Number of results per page.
    :param int start: First result to retrieve.
    :param int stop: Last result to retrieve.
        Use None to keep searching until no more results are found.
    :param float pause: Lapse to wait between HTTP requests, in seconds.
        A lapse too long will make the search slow, but a lapse too short may
        cause Google to block your IP. Your mileage may vary!
    :param str country: Country or region to focus the search on. Similar to
        changing the TLD, but does not yield exactly the same results.
        Only Google knows why...
    :param dict extra_params: A dictionary of extra HTTP GET
        parameters. Keys and values must NOT be URL encoded, since they
        are encoded before being appended to every query. Keys that overlap
        with the built-in GET parameters (for example 'filter', which is
        already set to '0') raise a ValueError.
    :param str user_agent: User agent for the HTTP requests.
        Use None for the default.
    :param bool verify_ssl: Verify the SSL certificate to prevent
        traffic interception attacks. Defaults to True.
    :param bool include_google_links: Include links pointing to a Google
        domain. Defaults to False.

    :rtype: generator of str
    :return: Generator (iterator) that yields found URLs.
        If the stop parameter is None the iterator keeps going until a
        results page yields no new URLs.
    """
    # Set of hashes for the results found.
    # This is used to avoid repeated results.
    hashes = set()

    # Count the number of links yielded.
    count = 0

    # Prepare the search string.
    query = quote_plus(query)

    # If no extra_params is given, create an empty dictionary.
    # We should avoid using an empty dictionary as a default value
    # in a function parameter in Python.
    if not extra_params:
        extra_params = {}

    # Check extra_params for overlapping.
    for builtin_param in url_parameters:
        if builtin_param in extra_params.keys():
            raise ValueError(
                'GET parameter "%s" is overlapping with '
                'the built-in GET parameter' % builtin_param
            )

    # Grab the cookie from the home page.
    get_page(url_home % vars(), user_agent, verify_ssl)

    # Prepare the URL of the first request.
    if start:
        if num == 10:
            url = url_next_page % vars()
        else:
            url = url_next_page_num % vars()
    else:
        if num == 10:
            url = url_search % vars()
        else:
            url = url_search_num % vars()

    # Loop until we reach the maximum result, if any (otherwise, loop forever).
    while not stop or count < stop:

        # Remember last count to detect the end of results.
        last_count = count

        # Append extra GET parameters to the URL.
        # This is done on every iteration because we're
        # rebuilding the entire URL at the end of this loop.
        for k, v in extra_params.items():
            k = quote_plus(k)
            v = quote_plus(v)
            url = url + ('&%s=%s' % (k, v))

        # Sleep between requests.
        # Keeps Google from banning you for making too many requests.
        time.sleep(pause)

        # Request the Google Search results page.
        html = get_page(url, user_agent, verify_ssl)

        # Parse the response and get every anchored URL.
        if is_bs4:
            soup = BeautifulSoup(html, 'html.parser')
        else:
            soup = BeautifulSoup(html)
        try:
            anchors = soup.find(id='search').findAll('a')
            # Sometimes (depending on the User-agent) there is
            # no id "search" in html response...
        except AttributeError:
            # Remove links of the top bar.
            gbar = soup.find(id='gbar')
            if gbar:
                gbar.clear()
            anchors = soup.findAll('a')

        # Process every anchored URL.
        for a in anchors:

            # Get the URL from the anchor tag.
            try:
                link = a['href']
            except KeyError:
                continue

            # Filter invalid links and links pointing to Google itself.
            link = filter_result(link, include_google_links)
            if not link:
                continue

            # Discard repeated results.
            h = hash(link)
            if h in hashes:
                continue
            hashes.add(h)

            # Yield the result.
            yield link

            # Increase the results counter.
            # If we reached the limit, stop.
            count += 1
            if stop and count >= stop:
                return

        # End if there are no more results.
        # XXX TODO review this logic, not sure if this is still true!
        if last_count == count:
            break

        # Prepare the URL for the next request.
        start += num
        if num == 10:
            url = url_next_page % vars()
        else:
            url = url_next_page_num % vars()


# Shortcut to single-item search.
# Evaluates the iterator to return the single URL as a string.
def lucky(*args, **kwargs):
    """
    Shortcut to single-item search.

    Same arguments as the main search function, but the return value changes.

    :rtype: str
    :return: URL found by Google.
    """
    return next(search(*args, **kwargs))


================================================
FILE: requirements.txt
================================================
beautifulsoup4>=4.0


================================================
FILE: scripts/google
================================================
#!/usr/bin/env python

# Copyright (c) 2009-2020, Mario Vilas
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
#     * Redistributions of source code must retain the above copyright notice,
#       this list of conditions and the following disclaimer.
#     * Redistributions in binary form must reproduce the above copyright
#       notice, this list of conditions and the following disclaimer in the
#       documentation and/or other materials provided with the distribution.
#     * Neither the name of the copyright holder nor the names of its
#       contributors may be used to endorse or promote products derived from
#       this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.

import sys

from googlesearch import search, get_random_user_agent

# TODO port to argparse
from optparse import OptionParser, IndentedHelpFormatter


class BannerHelpFormatter(IndentedHelpFormatter):

    "Just a small tweak to optparse to be able to print a banner."

    def __init__(self, banner, *argv, **argd):
        self.banner = banner
        IndentedHelpFormatter.__init__(self, *argv, **argd)

    def format_usage(self, usage):
        msg = IndentedHelpFormatter.format_usage(self, usage)
        return '%s\n%s' % (self.banner, msg)


def main():

    # Parse the command line arguments.
    formatter = BannerHelpFormatter(
        "Python script to use the Google search engine\n"
        "By Mario Vilas (mvilas at gmail dot com)\n"
        "https://github.com/MarioVilas/googlesearch\n"
    )
    parser = OptionParser(formatter=formatter)
    parser.set_usage("%prog [options] query")
    parser.add_option(
        '--tld', metavar='TLD', type='string', default='com',
        help="top level domain to use [default: com]")
    parser.add_option(
        '--lang', metavar='LANGUAGE', type='string', default='en',
        help="produce results in the given language [default: en]")
    parser.add_option(
        '--tbs', metavar='TBS', type='string', default='0',
        help="produce results from period [default: 0]")
    parser.add_option(
        '--safe', metavar='SAFE', type='string', default='off',
        help="kids safe search [default: off]")
    parser.add_option(
        '--country', metavar='COUNTRY', type='string', default='',
        help="region to restrict search on [default: not restricted]")
    parser.add_option(
        '--num', metavar='NUMBER', type='int', default=10,
        help="number of results per page [default: 10]")
    parser.add_option(
        '--start', metavar='NUMBER', type='int', default=0,
        help="first result to retrieve [default: 0]")
    parser.add_option(
        '--stop', metavar='NUMBER', type='int', default=0,
        help="last result to retrieve [default: unlimited]")
    parser.add_option(
        '--pause', metavar='SECONDS', type='float', default=2.0,
        help="pause between HTTP requests [default: 2.0]")
    parser.add_option(
        '--rua', action='store_true', default=False,
        help="Randomize the User-Agent [default: no]")
    parser.add_option(
        '--insecure', dest="verify_ssl", action='store_false', default=True,
        help="Disable SSL certificate verification [default: no]")
    parser.add_option(
        '--include', dest="include_google_links", action='store_true',
        default=False,
        help="Include links pointing to Google [default: no]")
    (options, args) = parser.parse_args()
    query = ' '.join(args)
    if not query:
        parser.print_help()
        sys.exit(2)
    params = [
        (k, v) for (k, v) in options.__dict__.items()
        if not k.startswith('_')]
    params = dict(params)

    # Randomize the user agent if requested.
    if 'rua' in params and params.pop('rua'):
        params['user_agent'] = get_random_user_agent()

    # Run the query.
    for url in search(query, **params):
        print(url)
        try:
            sys.stdout.flush()
        except Exception:
            pass


if __name__ == '__main__':
    main()


================================================
FILE: setup.cfg
================================================
[bdist_wheel]
universal = 1


================================================
FILE: setup.py
================================================
#!/usr/bin/env python

# Copyright (c) 2009-2024, Mario Vilas
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
#     * Redistributions of source code must retain the above copyright notice,
#       this list of conditions and the following disclaimer.
#     * Redistributions in binary form must reproduce the above copyright
#       notice, this list of conditions and the following disclaimer in the
#       documentation and/or other materials provided with the distribution.
#     * Neither the name of the copyright holder nor the names of its
#       contributors may be used to endorse or promote products derived from
#       this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.

from os import chdir
from os.path import abspath, join, split

# Make sure we are standing in the correct directory.
# Old versions of distutils didn't take care of this.
here = split(abspath(__file__))[0]
chdir(here)

# Package metadata.
metadata = dict(
    name='google',
    provides=['googlesearch'],
    requires=['beautifulsoup4'],
    packages=['googlesearch'],
    scripts=[join('scripts', 'google')],
    package_data={'googlesearch': ['user_agents.txt.gz']},
    include_package_data=True,
    version="3.0.0",
    description="Unofficial Python bindings to the Google search engine. Not affiliated with Google.",
    author="Mario Vilas",
    author_email="mvilas@gmail.com",
    url="https://github.com/MarioVilas/googlesearch",
    classifiers=[
        "Development Status :: 5 - Production/Stable",
        "Intended Audience :: Developers",
        "License :: OSI Approved :: BSD License",
        "Environment :: Console",
        "Programming Language :: Python",
        "Topic :: Software Development :: Libraries :: Python Modules",
     ],
)

# Prefer setuptools over the old distutils.
# If setuptools is available, use install_requires.
try:
    from setuptools import setup
    metadata['install_requires'] = metadata['requires']
except ImportError:
    from distutils.core import setup

# Run the setup script.
setup(**metadata)

SYMBOL INDEX (6 symbols across 1 files)

FILE: googlesearch/__init__.py
  function get_random_user_agent (line 126) | def get_random_user_agent():
  function get_tbs (line 137) | def get_tbs(from_date, to_date):
  function get_page (line 154) | def get_page(url, user_agent=None, verify_ssl=True):
  function filter_result (line 196) | def filter_result(link, include_google_links=False):
  function search (line 220) | def search(query, tld='com', lang='en', tbs='0', safe='off', num=10, sta...
  function lucky (line 378) | def lucky(*args, **kwargs):
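
The `main()` function at the end of `scripts/google` turns parsed `optparse` options into keyword arguments for `search`. A minimal sketch of that pattern follows, assuming a reduced option set and no network access. `fake_search`, `build_params`, and the `ExampleAgent/1.0` string are hypothetical stand-ins for illustration, not part of the package.

```python
from optparse import OptionParser


def fake_search(query, **params):
    # Hypothetical stand-in for googlesearch.search: no network access,
    # it only echoes back the arguments it was called with.
    yield "query=%s" % query
    for key in sorted(params):
        yield "%s=%r" % (key, params[key])


def build_params(argv):
    # Reduced option set, mirroring three of the script's options.
    parser = OptionParser(usage="%prog [options] query")
    parser.add_option('--num', metavar='NUMBER', type='int', default=10)
    parser.add_option('--pause', metavar='SECONDS', type='float', default=2.0)
    parser.add_option('--rua', action='store_true', default=False)
    options, args = parser.parse_args(argv)

    # Positional arguments are joined into a single query string.
    query = ' '.join(args)

    # Every public attribute of the options object becomes a keyword argument.
    params = {k: v for k, v in vars(options).items() if not k.startswith('_')}

    # 'rua' is a script-only switch: drop it and, if it was set, pass a
    # user agent instead (a placeholder here, not a randomly chosen one).
    if params.pop('rua', False):
        params['user_agent'] = 'ExampleAgent/1.0'
    return query, params


if __name__ == '__main__':
    query, params = build_params(['--num', '5', '--rua', 'hello', 'world'])
    for line in fake_search(query, **params):
        print(line)
```

Because the option names double as keyword names, any option added to the parser must match a parameter accepted by the function being called, or be removed from the dict first, as `rua` is here.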

