Repository: MarioVilas/google
Branch: master
Commit: 8d11e1735e6d
Files: 16
Total size: 31.4 KB

Directory structure:
gitextract_96tgavo2/
├── .gitignore
├── .readthedocs.yaml
├── .travis.yml
├── LICENSE
├── MANIFEST.in
├── README.md
├── docs/
│   ├── .gitignore
│   ├── Makefile
│   ├── conf.py
│   ├── index.rst
│   └── make.bat
├── googlesearch/
│   └── __init__.py
├── requirements.txt
├── scripts/
│   └── google
├── setup.cfg
└── setup.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
*.py[co]
MANIFEST

# IDE
.idea/

# Packages
*.egg
*.egg-info
dist
build
eggs
parts
bin
var
sdist
develop-eggs
.installed.cfg

# Installer logs
pip-log.txt

# Unit test / coverage reports
.coverage
.tox

#Translations
*.mo

#Mr Developer
.mr.developer.cfg

================================================
FILE: .readthedocs.yaml
================================================
# .readthedocs.yaml
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Required
version: 2

# Set the version of Python and other tools you might need
build:
  os: ubuntu-22.04
  tools:
    python: "3.11"

# Build documentation in the docs/ directory with Sphinx
sphinx:
  configuration: docs/conf.py

# We recommend specifying your dependencies to enable reproducible builds:
# https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
python:
  install:
    - requirements: requirements.txt

================================================
FILE: .travis.yml
================================================
language: python

# Supported CPython versions:
# https://en.wikipedia.org/wiki/CPython#Version_history
python:
  - pypy3
  - pypy
  - 2.7
  - 3.6
  - 3.5
  - 3.4

# Use container-based infrastructure
sudo: false

install:
  - pip install pycodestyle pyflakes

script:
  # Static analysis
  - pyflakes .
  - pyflakes ./scripts/google
  - pycodestyle --statistics --count .
  - pycodestyle --statistics --count ./scripts/google

matrix:
  fast_finish: true

================================================
FILE: LICENSE
================================================
BSD 3-Clause License

Copyright (c) 2019, All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
  list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation
  and/or other materials provided with the distribution.

* Neither the name of the copyright holder nor the names of its
  contributors may be used to endorse or promote products derived from
  this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

================================================
FILE: MANIFEST.in
================================================
include README.md
include MANIFEST.in
include setup.py
include scripts/google
include requirements.txt
include googlesearch/user_agents.txt.gz

================================================
FILE: README.md
================================================
Google Search
=============

Google search from Python.

https://python-googlesearch.readthedocs.io/en/latest/

**Note**: this project is not affiliated with Google in any way.

Usage example
-------------

    # Get the first 20 hits for: "Breaking Code" WordPress blog
    from googlesearch import search
    for url in search('"Breaking Code" WordPress blog', stop=20):
        print(url)
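You can also restrict results to a date range with the `get_tbs` helper
(a minimal sketch using only functions exported by `googlesearch`):

    # Get hits published between two dates
    from datetime import date
    from googlesearch import search, get_tbs
    tbs = get_tbs(date(2020, 1, 1), date(2020, 6, 30))
    for url in search('"Breaking Code" WordPress blog', tbs=tbs, stop=10):
        print(url)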
Installing
----------

    pip install google

================================================
FILE: docs/.gitignore
================================================
_build/

================================================
FILE: docs/Makefile
================================================
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line.
SPHINXOPTS    =
SPHINXBUILD   = sphinx-build
SPHINXPROJ    = googlesearch
SOURCEDIR     = .
BUILDDIR      = _build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

================================================
FILE: docs/conf.py
================================================
# -*- coding: utf-8 -*-
#
# googlesearch documentation build configuration file, created by
# sphinx-quickstart on Tue Nov 6 12:25:12 2018.
#
# This file is execfile()d with the current directory set to its
# containing dir.
#
# Note that not all possible configuration values are present in this
# autogenerated file.
#
# All configuration values have a default; values that are commented out
# serve to show the default.

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
import os
import sys
sys.path.insert(0, os.path.abspath('..'))


# -- General configuration ------------------------------------------------

# If your documentation needs a minimal Sphinx version, state it here.
#
# needs_sphinx = '1.0'

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = ['sphinx.ext.autodoc',
              'sphinx.ext.viewcode',
              'sphinx.ext.githubpages']

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

# The suffix(es) of source filenames.
# You can specify multiple suffixes as a list of strings:
#
# source_suffix = ['.rst', '.md']
source_suffix = '.rst'

# The master toctree document.
master_doc = 'index'

# General information about the project.
project = u'googlesearch'
copyright = u'2018, Mario Vilas'
author = u'Mario Vilas'

# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#
# The short X.Y version.
version = u''
# The full version, including alpha/beta/rc tags.
release = u''

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = None

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# These patterns also affect html_static_path and html_extra_path.
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']

# The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'sphinx'

# If true, `todo` and `todoList` produce output, else they produce nothing.
todo_include_todos = False


# -- Options for HTML output ----------------------------------------------

# The theme to use for HTML and HTML Help pages.  See the documentation for
# a list of builtin themes.
#
html_theme = 'alabaster'

# Theme options are theme-specific and customize the look and feel of a theme
# further.  For a list of options available for each theme, see the
# documentation.
#
# html_theme_options = {}

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']

# Custom sidebar templates, must be a dictionary that maps document names
# to template names.
#
# This is required for the alabaster theme
# refs: http://alabaster.readthedocs.io/en/latest/installation.html#sidebars
html_sidebars = {
    '**': [
        'relations.html',  # needs 'show_related': True theme option to display
        'searchbox.html',
    ]
}


# -- Options for HTMLHelp output ------------------------------------------

# Output file base name for HTML help builder.
htmlhelp_basename = 'googlesearchdoc'


# -- Options for LaTeX output ---------------------------------------------

latex_elements = {
    # The paper size ('letterpaper' or 'a4paper').
    #
    # 'papersize': 'letterpaper',

    # The font size ('10pt', '11pt' or '12pt').
    #
    # 'pointsize': '10pt',

    # Additional stuff for the LaTeX preamble.
    #
    # 'preamble': '',

    # Latex figure (float) alignment
    #
    # 'figure_align': 'htbp',
}

# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title,
#  author, documentclass [howto, manual, or own class]).
latex_documents = [
    (master_doc, 'googlesearch.tex', u'googlesearch Documentation',
     u'Mario Vilas', 'manual'),
]


# -- Options for manual page output ---------------------------------------

# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [
    (master_doc, 'googlesearch', u'googlesearch Documentation',
     [author], 1)
]


# -- Options for Texinfo output -------------------------------------------

# Grouping the document tree into Texinfo files.
# List of tuples
# (source start file, target name, title, author,
#  dir menu entry, description, category)
texinfo_documents = [
    (master_doc, 'googlesearch', u'googlesearch Documentation',
     author, 'googlesearch', 'Python bindings to the Google search engine.',
     'Miscellaneous'),
]


# -- Options for Epub output ----------------------------------------------

# Bibliographic Dublin Core info.
epub_title = project
epub_author = author
epub_publisher = author
epub_copyright = copyright

# The unique identifier of the text. This can be a ISBN number
# or the project homepage.
#
# epub_identifier = ''

# A unique identification for the text.
#
# epub_uid = ''

# A list of files that should not be packed into the epub file.
epub_exclude_files = ['search.html']

================================================
FILE: docs/index.rst
================================================
.. googlesearch documentation master file, created by
   sphinx-quickstart on Tue Nov 6 12:25:12 2018.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Welcome to googlesearch's documentation!
========================================

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`

Reference
=========

.. automodule:: googlesearch
   :members:
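Usage example
=============

A minimal usage sketch, mirroring the README example (``search`` performs
live HTTP requests against Google, so results will vary)::

    from googlesearch import search

    for url in search('"Breaking Code" WordPress blog', stop=20):
        print(url)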
================================================
FILE: docs/make.bat
================================================
@ECHO OFF

pushd %~dp0

REM Command file for Sphinx documentation

if "%SPHINXBUILD%" == "" (
	set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=.
set BUILDDIR=_build
set SPHINXPROJ=googlesearch

if "%1" == "" goto help

%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
	echo.
	echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
	echo.installed, then set the SPHINXBUILD environment variable to point
	echo.to the full path of the 'sphinx-build' executable. Alternatively you
	echo.may add the Sphinx directory to PATH.
	echo.
	echo.If you don't have Sphinx installed, grab it from
	echo.http://sphinx-doc.org/
	exit /b 1
)

%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS%
goto end

:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS%

:end
popd

================================================
FILE: googlesearch/__init__.py
================================================
#!/usr/bin/env python

# Copyright (c) 2009-2020, Mario Vilas
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
#     * Redistributions of source code must retain the above copyright notice,
#       this list of conditions and the following disclaimer.
#     * Redistributions in binary form must reproduce the above copyright
#       notice, this list of conditions and the following disclaimer in the
#       documentation and/or other materials provided with the distribution.
#     * Neither the name of the copyright holder nor the names of its
#       contributors may be used to endorse or promote products derived from
#       this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.

import os
import random
import sys
import time
import ssl

if sys.version_info[0] > 2:
    from http.cookiejar import LWPCookieJar
    from urllib.request import Request, urlopen
    from urllib.parse import quote_plus, urlparse, parse_qs
else:
    from cookielib import LWPCookieJar
    from urllib import quote_plus
    from urllib2 import Request, urlopen
    from urlparse import urlparse, parse_qs

try:
    from bs4 import BeautifulSoup
    is_bs4 = True
except ImportError:
    from BeautifulSoup import BeautifulSoup
    is_bs4 = False

__all__ = [
    # Main search function.
    'search',

    # Shortcut for "get lucky" search.
    'lucky',

    # Miscellaneous utility functions.
    'get_random_user_agent', 'get_tbs',
]

# Debug flag.
DEBUG = False

# URL templates to make Google searches.
url_home = "https://www.google.%(tld)s/"
url_search = "https://www.google.%(tld)s/search?lr=lang_%(lang)s&" \
             "q=%(query)s&btnG=Google+Search&tbs=%(tbs)s&safe=%(safe)s&" \
             "cr=%(country)s&filter=0"
url_next_page = "https://www.google.%(tld)s/search?lr=lang_%(lang)s&" \
                "q=%(query)s&start=%(start)d&tbs=%(tbs)s&safe=%(safe)s&" \
                "cr=%(country)s&filter=0"
url_search_num = "https://www.google.%(tld)s/search?lr=lang_%(lang)s&" \
                 "q=%(query)s&num=%(num)d&btnG=Google+Search&tbs=%(tbs)s&" \
                 "safe=%(safe)s&cr=%(country)s&filter=0"
url_next_page_num = "https://www.google.%(tld)s/search?lr=lang_%(lang)s&" \
                    "q=%(query)s&num=%(num)d&start=%(start)d&tbs=%(tbs)s&" \
                    "safe=%(safe)s&cr=%(country)s&filter=0"
url_parameters = (
    'hl', 'q', 'num', 'btnG', 'start', 'tbs', 'safe', 'cr', 'filter')

# Cookie jar. Stored at the user's home folder.
# If the cookie jar is inaccessible, the errors are ignored.
home_folder = os.getenv('HOME')
if not home_folder:
    home_folder = os.getenv('USERHOME')
    if not home_folder:
        home_folder = '.'  # Use the current folder on error.
cookie_jar = LWPCookieJar(os.path.join(home_folder, '.google-cookie'))
try:
    cookie_jar.load()
except Exception:
    pass

# Default user agent, unless instructed by the user to change it.
USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)'

# Load the list of valid user agents from the install folder.
# The search order is:
#   * user_agents.txt.gz
#   * user_agents.txt
#   * default user agent
try:
    install_folder = os.path.abspath(os.path.split(__file__)[0])
    try:
        user_agents_file = os.path.join(install_folder, 'user_agents.txt.gz')
        import gzip
        fp = gzip.open(user_agents_file, 'rb')
        try:
            user_agents_list = [_.strip() for _ in fp.readlines()]
        finally:
            fp.close()
            del fp
    except Exception:
        user_agents_file = os.path.join(install_folder, 'user_agents.txt')
        with open(user_agents_file) as fp:
            user_agents_list = [_.strip() for _ in fp.readlines()]
except Exception:
    user_agents_list = [USER_AGENT]


# Get a random user agent.
def get_random_user_agent():
    """
    Get a random user agent string.

    :rtype: str
    :return: Random user agent string.
    """
    return random.choice(user_agents_list)
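
# Example (illustrative doctest-style sketch; the exact string returned
# depends on which user agents file was found at import time):
#
#     >>> ua = get_random_user_agent()
#     >>> ua in user_agents_list
#     True
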
# Helper function to format the tbs parameter.
def get_tbs(from_date, to_date):
    """
    Helper function to format the tbs parameter.

    :param datetime.date from_date: Python date object.
    :param datetime.date to_date: Python date object.

    :rtype: str
    :return: Dates encoded in tbs format.
    """
    from_date = from_date.strftime('%m/%d/%Y')
    to_date = to_date.strftime('%m/%d/%Y')
    return 'cdr:1,cd_min:%(from_date)s,cd_max:%(to_date)s' % vars()
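
# Example (doctest-style sketch of the returned format):
#
#     >>> from datetime import date
#     >>> get_tbs(date(2020, 1, 1), date(2020, 6, 30))
#     'cdr:1,cd_min:01/01/2020,cd_max:06/30/2020'
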
# Request the given URL and return the response page, using the cookie jar.
# If the cookie jar is inaccessible, the errors are ignored.
def get_page(url, user_agent=None, verify_ssl=True):
    """
    Request the given URL and return the response page, using the cookie jar.

    :param str url: URL to retrieve.
    :param str user_agent: User agent for the HTTP requests.
        Use None for the default.
    :param bool verify_ssl: Verify the SSL certificate to prevent
        traffic interception attacks. Defaults to True.

    :rtype: str
    :return: Web page retrieved for the given URL.

    :raises IOError: An exception is raised on error.
    :raises urllib2.URLError: An exception is raised on error.
    :raises urllib2.HTTPError: An exception is raised on error.
    """
    if user_agent is None:
        user_agent = USER_AGENT
    request = Request(url)
    request.add_header('User-Agent', user_agent)
    cookie_jar.add_cookie_header(request)
    if verify_ssl:
        response = urlopen(request)
    else:
        context = ssl._create_unverified_context()
        response = urlopen(request, context=context)
    cookie_jar.extract_cookies(response, request)
    html = response.read()
    response.close()
    try:
        cookie_jar.save()
    except Exception:
        pass
    if DEBUG:
        print('-' * 79)
        print(html)
        print('-' * 79)
    return html


# Filter links found in the Google result pages HTML code.
# Returns None if the link doesn't yield a valid result.
def filter_result(link, include_google_links=False):
    try:
        # Decode hidden URLs.
        if link.startswith('/url?'):
            o = urlparse(link, 'http')
            link = parse_qs(o.query).get('q')[0]

        o = urlparse(link, 'http')

        # Check if the link is an absolute URL.
        if not o.netloc:
            return None

        # If excluding Google links, return None if 'google' is in the domain.
        if not include_google_links and 'google' in o.netloc:
            return None

        return link
    except Exception:
        pass


# Returns a generator that yields URLs.
def search(query, tld='com', lang='en', tbs='0', safe='off', num=10, start=0,
           stop=None, pause=2.0, country='', extra_params=None,
           user_agent=None, verify_ssl=True, include_google_links=False):
    """
    Search the given query string using Google.

    :param str query: Query string. Must NOT be url-encoded.
    :param str tld: Top level domain.
    :param str lang: Language.
    :param str tbs: Time limits (e.g. "qdr:h" => last hour,
        "qdr:d" => last 24 hours, "qdr:m" => last month).
    :param str safe: Safe search.
    :param int num: Number of results per page.
    :param int start: First result to retrieve.
    :param int stop: Last result to retrieve.
        Use None to keep searching forever.
    :param float pause: Lapse to wait between HTTP requests, measured in
        seconds. A lapse too long will make the search slow, but a lapse too
        short may cause Google to block your IP. Your mileage may vary!
    :param str country: Country or region to focus the search on. Similar to
        changing the TLD, but does not yield exactly the same results.
        Only Google knows why...
    :param dict extra_params: A dictionary of extra HTTP GET
        parameters, which must be URL encoded. For example if you don't want
        Google to filter similar results you can set the extra_params to
        {'filter': '0'} which will append '&filter=0' to every query.
    :param str user_agent: User agent for the HTTP requests.
        Use None for the default.
    :param bool verify_ssl: Verify the SSL certificate to prevent
        traffic interception attacks. Defaults to True.
    :param bool include_google_links: Includes links pointing to a Google
        domain. Defaults to False.

    :rtype: generator of str
    :return: Generator (iterator) that yields found URLs.
        If the stop parameter is None the iterator will loop forever.
    """
    # Set of hashes for the results found.
    # This is used to avoid repeated results.
    hashes = set()

    # Count the number of links yielded.
    count = 0

    # Prepare the search string.
    query = quote_plus(query)

    # If no extra_params is given, create an empty dictionary.
    # We should avoid using an empty dictionary as a default value
    # in a function parameter in Python.
    if not extra_params:
        extra_params = {}

    # Check extra_params for overlapping.
    for builtin_param in url_parameters:
        if builtin_param in extra_params.keys():
            raise ValueError(
                'GET parameter "%s" is overlapping with '
                'the built-in GET parameter' % builtin_param
            )

    # Grab the cookie from the home page.
    get_page(url_home % vars(), user_agent, verify_ssl)

    # Prepare the URL of the first request.
    if start:
        if num == 10:
            url = url_next_page % vars()
        else:
            url = url_next_page_num % vars()
    else:
        if num == 10:
            url = url_search % vars()
        else:
            url = url_search_num % vars()

    # Loop until we reach the maximum result, if any (otherwise, loop forever).
    while not stop or count < stop:

        # Remember last count to detect the end of results.
        last_count = count

        # Append extra GET parameters to the URL.
        # This is done on every iteration because we're
        # rebuilding the entire URL at the end of this loop.
        for k, v in extra_params.items():
            k = quote_plus(k)
            v = quote_plus(v)
            url = url + ('&%s=%s' % (k, v))

        # Sleep between requests.
        # Keeps Google from banning you for making too many requests.
        time.sleep(pause)

        # Request the Google Search results page.
        html = get_page(url, user_agent, verify_ssl)

        # Parse the response and get every anchored URL.
        if is_bs4:
            soup = BeautifulSoup(html, 'html.parser')
        else:
            soup = BeautifulSoup(html)
        try:
            anchors = soup.find(id='search').findAll('a')
            # Sometimes (depending on the User-agent) there is
            # no id "search" in html response...
        except AttributeError:
            # Remove links of the top bar.
            gbar = soup.find(id='gbar')
            if gbar:
                gbar.clear()
            anchors = soup.findAll('a')

        # Process every anchored URL.
        for a in anchors:

            # Get the URL from the anchor tag.
            try:
                link = a['href']
            except KeyError:
                continue

            # Filter invalid links and links pointing to Google itself.
            link = filter_result(link, include_google_links)
            if not link:
                continue

            # Discard repeated results.
            h = hash(link)
            if h in hashes:
                continue
            hashes.add(h)

            # Yield the result.
            yield link

            # Increase the results counter.
            # If we reached the limit, stop.
            count += 1
            if stop and count >= stop:
                return

        # End if there are no more results.
        # XXX TODO review this logic, not sure if this is still true!
        if last_count == count:
            break

        # Prepare the URL for the next request.
        start += num
        if num == 10:
            url = url_next_page % vars()
        else:
            url = url_next_page_num % vars()


# Shortcut to single-item search.
# Evaluates the iterator to return the single URL as a string.
def lucky(*args, **kwargs):
    """
    Shortcut to single-item search.

    Same arguments as the main search function, but the return value changes.

    :rtype: str
    :return: URL found by Google.
    """
    return next(search(*args, **kwargs))

================================================
FILE: requirements.txt
================================================
beautifulsoup4>=4.0

================================================
FILE: scripts/google
================================================
#!/usr/bin/env python

# Copyright (c) 2009-2020, Mario Vilas
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
#     * Redistributions of source code must retain the above copyright notice,
#       this list of conditions and the following disclaimer.
#     * Redistributions in binary form must reproduce the above copyright
#       notice, this list of conditions and the following disclaimer in the
#       documentation and/or other materials provided with the distribution.
#     * Neither the name of the copyright holder nor the names of its
#       contributors may be used to endorse or promote products derived from
#       this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.

import sys

from googlesearch import search, get_random_user_agent

# TODO port to argparse
from optparse import OptionParser, IndentedHelpFormatter


class BannerHelpFormatter(IndentedHelpFormatter):

    "Just a small tweak to optparse to be able to print a banner."

    def __init__(self, banner, *argv, **argd):
        self.banner = banner
        IndentedHelpFormatter.__init__(self, *argv, **argd)

    def format_usage(self, usage):
        msg = IndentedHelpFormatter.format_usage(self, usage)
        return '%s\n%s' % (self.banner, msg)


def main():

    # Parse the command line arguments.
    formatter = BannerHelpFormatter(
        "Python script to use the Google search engine\n"
        "By Mario Vilas (mvilas at gmail dot com)\n"
        "https://github.com/MarioVilas/googlesearch\n"
    )
    parser = OptionParser(formatter=formatter)
    parser.set_usage("%prog [options] query")
    parser.add_option(
        '--tld', metavar='TLD', type='string', default='com',
        help="top level domain to use [default: com]")
    parser.add_option(
        '--lang', metavar='LANGUAGE', type='string', default='en',
        help="produce results in the given language [default: en]")
    parser.add_option(
        '--tbs', metavar='TBS', type='string', default='0',
        help="produce results from period [default: 0]")
    parser.add_option(
        '--safe', metavar='SAFE', type='string', default='off',
        help="kids safe search [default: off]")
    parser.add_option(
        '--country', metavar='COUNTRY', type='string', default='',
        help="region to restrict search on [default: not restricted]")
    parser.add_option(
        '--num', metavar='NUMBER', type='int', default=10,
        help="number of results per page [default: 10]")
    parser.add_option(
        '--start', metavar='NUMBER', type='int', default=0,
        help="first result to retrieve [default: 0]")
    parser.add_option(
        '--stop', metavar='NUMBER', type='int', default=0,
        help="last result to retrieve [default: unlimited]")
    parser.add_option(
        '--pause', metavar='SECONDS', type='float', default=2.0,
        help="pause between HTTP requests [default: 2.0]")
    parser.add_option(
        '--rua', action='store_true', default=False,
        help="Randomize the User-Agent [default: no]")
    parser.add_option(
        '--insecure', dest="verify_ssl", action='store_false', default=True,
        help="Skip SSL certificate verification [default: verify]")
    parser.add_option(
        '--include', dest="include_google_links", action='store_true',
        default=False, help="Include links pointing to Google [default: no]")
    (options, args) = parser.parse_args()
    query = ' '.join(args)
    if not query:
        parser.print_help()
        sys.exit(2)

    params = [
        (k, v) for (k, v) in options.__dict__.items()
        if not k.startswith('_')]
    params = dict(params)

    # Randomize the user agent if requested.
    if 'rua' in params and params.pop('rua'):
        params['user_agent'] = get_random_user_agent()

    # Run the query.
    for url in search(query, **params):
        print(url)
        try:
            sys.stdout.flush()
        except Exception:
            pass


if __name__ == '__main__':
    main()
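
# Example invocation (illustrative; assumes the script is installed on PATH):
#
#     $ google --stop 20 --pause 2.0 '"Breaking Code" WordPress blog'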
================================================
FILE: setup.cfg
================================================
[bdist_wheel]
universal = 1

================================================
FILE: setup.py
================================================
#!/usr/bin/env python

# Copyright (c) 2009-2024, Mario Vilas
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
#     * Redistributions of source code must retain the above copyright notice,
#       this list of conditions and the following disclaimer.
#     * Redistributions in binary form must reproduce the above copyright
#       notice, this list of conditions and the following disclaimer in the
#       documentation and/or other materials provided with the distribution.
#     * Neither the name of the copyright holder nor the names of its
#       contributors may be used to endorse or promote products derived from
#       this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.

from os import chdir
from os.path import abspath, join, split

# Make sure we are standing in the correct directory.
# Old versions of distutils didn't take care of this.
here = split(abspath(__file__))[0]
chdir(here)

# Package metadata.
metadata = dict(
    name='google',
    provides=['googlesearch'],
    requires=['beautifulsoup4'],
    packages=['googlesearch'],
    scripts=[join('scripts', 'google')],
    package_data={'googlesearch': ['user_agents.txt.gz']},
    include_package_data=True,
    version="3.0.0",
    description="Unofficial Python bindings to the Google search engine. "
                "Not affiliated with Google.",
    author="Mario Vilas",
    author_email="mvilas@gmail.com",
    url="https://github.com/MarioVilas/googlesearch",
    classifiers=[
        "Development Status :: 5 - Production/Stable",
        "Intended Audience :: Developers",
        "License :: OSI Approved :: BSD License",
        "Environment :: Console",
        "Programming Language :: Python",
        "Topic :: Software Development :: Libraries :: Python Modules",
    ],
)

# Prefer setuptools over the old distutils.
# If setuptools is available, use install_requires.
try:
    from setuptools import setup
    metadata['install_requires'] = metadata['requires']
except ImportError:
    from distutils.core import setup

# Run the setup script.
setup(**metadata)
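
# Example (illustrative commands for a local source checkout):
#
#     $ pip install .          # install the package and its dependencies
#     $ python setup.py sdist  # build a source distribution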