[
  {
    "path": ".dockerignore",
    "content": ".git\n.gitignore\nLICENSE\nDockerfile\n"
  },
  {
    "path": ".gitignore",
    "content": ".DS_Store\narchivenow.egg-info/\nbuild/\ndist/\n__pycache__\n"
  },
  {
    "path": "Dockerfile",
    "content": "ARG PYTAG=latest\nFROM python:${PYTAG}\nLABEL maintainer \"Mohamed Aturban <mohsci1@yahoo.com>\"\n\nWORKDIR /app\nCOPY requirements.txt ./\nRUN pip install --no-cache-dir -r requirements.txt\nCOPY . ./\nRUN chmod a+x ./archivenow/archivenow.py\n\nENTRYPOINT [\"./archivenow/archivenow.py\"]\n"
  },
  {
    "path": "LICENSE",
    "content": "MIT License\n\nCopyright (c) 2017 ODU Web Science / Digital Libraries Research Group\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n"
  },
  {
    "path": "README.rst",
    "content": "Archive Now (archivenow)\n=============================\nA Tool To Push Web Resources Into Web Archives\n----------------------------------------------\n\nArchive Now (**archivenow**) is currently configured to push resources into four public web archives. You can easily add more archives by writing a new archive handler (e.g., myarchive_handler.py) and placing it inside the folder \"handlers\". \n\nUpdate January 2021\n~~~~~~~~~~~~~~~~~~~\nOriginally, **archivenow** was configured to push to 6 different public web archives. The two removed web archives are `WebCite <https://www.webcitation.org/>`_ and `archive.st <http://archive.st/>`_. WebCite was removed from **archivenow** because it no longer accepts archiving requests. Archive.st was removed from **archivenow** because pushing to it triggers a CAPTCHA. In addition to removing those 2 archives, the method for pushing to `archive.today <https://archive.vn/>`_ and `megalodon.jp <https://megalodon.jp/>`_ from **archivenow** has been updated: both archives are now reached through `Selenium <https://selenium-python.readthedocs.io/>`_.\n\nAs explained below, this library can be used through:\n\n- Command Line Interface (CLI)\n\n- A Web Service\n\n- A Docker Container\n\n- Python\n\n\nInstalling\n----------\nThe latest release of **archivenow** can be installed using pip:\n\n.. code-block:: bash\n\n      $ pip install archivenow\n\nThe latest development version containing changes not yet released can be installed from source:\n\n.. 
code-block:: bash\n      \n      $ git clone git@github.com:oduwsdl/archivenow.git\n      $ cd archivenow\n      $ pip install -r requirements.txt\n      $ pip install ./\n      \nIn order to push to `archive.today <https://archive.vn/>`_ and `megalodon.jp <https://megalodon.jp/>`_, **archivenow** must use `Selenium <https://selenium-python.readthedocs.io/>`_, which is already listed in requirements.txt. However, Selenium additionally needs a driver to interface with the chosen browser. The bundled handlers launch Firefox in headless mode, so use Selenium and **archivenow** with `Firefox <https://www.mozilla.org/en-US/firefox/releases/>`_ and Firefox's corresponding `GeckoDriver <https://github.com/mozilla/geckodriver/releases>`_.\n\nYou can download the latest versions of `Firefox <https://www.mozilla.org/en-US/firefox/releases/>`_ and the `GeckoDriver <https://github.com/mozilla/geckodriver/releases>`_ to use with **archivenow**.\n\nAfter installing the driver, you can push to `archive.today <https://archive.vn/>`_ and `megalodon.jp <https://megalodon.jp/>`_ from **archivenow**.\n\nCLI Usage\n---------\nThe available options of **archivenow** are listed when the ``-h`` or ``--help`` flag is provided, as shown below.\n\n.. 
code-block:: bash\n\n      $ archivenow -h\n      usage: archivenow.py [-h] [--mg] [--cc] [--cc_api_key [CC_API_KEY]]\n                           [--is] [--ia] [--warc [WARC]] [-v] [--all]\n                           [--server] [--host [HOST]] [--agent [AGENT]]\n                           [--port [PORT]]\n                           [URI]\n\n      positional arguments:\n        URI                   URI of a web resource\n\n      optional arguments:\n        -h, --help            show this help message and exit\n        --mg                  Use Megalodon.jp\n        --cc                  Use The Perma.cc Archive\n        --cc_api_key [CC_API_KEY]\n                              An API KEY is required by The Perma.cc Archive\n        --is                  Use The Archive.is\n        --ia                  Use The Internet Archive\n        --warc [WARC]         Generate WARC file\n        -v, --version         Report the version of archivenow\n        --all                 Use all possible archives\n        --server              Run archiveNow as a Web Service\n        --host [HOST]         A server address\n        --agent [AGENT]       Use \"wget\" or \"squidwarc\" for WARC generation\n        --port [PORT]         A port number to run a Web Service\n\nExamples\n--------\n\n\nExample 1\n~~~~~~~~~\n\nTo save the web page (www.foxnews.com) in the Internet Archive:\n\n.. code-block:: bash\n\n      $ archivenow --ia www.foxnews.com\n      https://web.archive.org/web/20170209135625/http://www.foxnews.com\n\nExample 2\n~~~~~~~~~\n\nBy default, the web page (e.g., www.foxnews.com) will be saved in the Internet Archive if no optional arguments are provided:\n\n.. code-block:: bash\n\n      $ archivenow www.foxnews.com\n      https://web.archive.org/web/20170215164835/http://www.foxnews.com\n\nExample 3\n~~~~~~~~~\n\nTo save the web page (www.foxnews.com) in the Internet Archive (archive.org) and Archive.is:\n\n.. 
code-block:: bash\n      \n      $ archivenow --ia --is www.foxnews.com\n      https://web.archive.org/web/20170209140345/http://www.foxnews.com\n      http://archive.is/fPVyc\n\n\nExample 4\n~~~~~~~~~\n\nTo save the web page (https://nypost.com/) in all configured web archives and also create a WARC file locally:\n\n.. code-block:: bash\n      \n      $ archivenow --all https://nypost.com/ --cc_api_key $Your-Perma-CC-API-Key\n      http://archive.is/dcnan\n      https://perma.cc/53CC-5ST8\n      https://web.archive.org/web/20181002081445/https://nypost.com/\n      https://megalodon.jp/2018-1002-1714-24/https://nypost.com:443/\n      https_nypost.com__96ec2300.warc\n\nExample 5\n~~~~~~~~~\n\nTo download the web page (https://nypost.com/) and create a WARC file:\n\n.. code-block:: bash\n      \n      $ archivenow --warc=mypage --agent=wget https://nypost.com/\n      mypage.warc\n      \nServer\n------\n\nYou can run **archivenow** as a web service. You can specify the server address and/or the port number (e.g., --host localhost --port 12345).\n\n.. code-block:: bash\n      \n      $ archivenow --server\n      \n      Running on http://0.0.0.0:12345/ (Press CTRL+C to quit)\n\n\nExample 6\n~~~~~~~~~\n\nTo save the web page (www.foxnews.com) in The Internet Archive through the web service:\n\n.. code-block:: bash\n\n      $ curl -i http://0.0.0.0:12345/ia/www.foxnews.com\n      \n          HTTP/1.0 200 OK\n          Content-Type: application/json\n          Content-Length: 95\n          Server: Werkzeug/0.11.15 Python/2.7.10\n          Date: Tue, 02 Oct 2018 08:20:18 GMT\n\n          {\n            \"results\": [\n              \"https://web.archive.org/web/20181002082007/http://www.foxnews.com\"\n            ]\n          }\n      \nExample 7\n~~~~~~~~~\n\nTo save the web page (www.foxnews.com) in all configured archives through the web service:\n\n.. 
code-block:: bash\n      \n      $ curl -i http://0.0.0.0:12345/all/www.foxnews.com\n\n          HTTP/1.0 200 OK\n          Content-Type: application/json\n          Content-Length: 385\n          Server: Werkzeug/0.11.15 Python/2.7.10\n          Date: Tue, 02 Oct 2018 08:23:53 GMT\n\n          {\n            \"results\": [\n              \"Error (The Perma.cc Archive): An API Key is required \", \n              \"http://archive.is/ukads\", \n              \"https://web.archive.org/web/20181002082007/http://www.foxnews.com\", \n              \"Error (Megalodon.jp): We can not obtain this page because the time limit has been reached or for technical ... \", \n              \"http://www.webcitation.org/72rbKsX8B\"\n            ]\n          }\n\nExample 8\n~~~~~~~~~\n\nBecause an API Key is required by Perma.cc, the HTTP request should be as follows:\n        \n.. code-block:: bash\n      \n      $ curl -i http://127.0.0.1:12345/all/https://nypost.com/?cc_api_key=$Your-Perma-CC-API-Key\n\nOr use only Perma.cc:\n\n.. code-block:: bash\n\n      $ curl -i http://127.0.0.1:12345/cc/https://nypost.com/?cc_api_key=$Your-Perma-CC-API-Key\n\nRunning as a Docker Container\n-----------------------------\n\n.. code-block:: bash\n\n    $ docker image pull oduwsdl/archivenow\n\nDifferent ways to run archivenow    \n\n.. code-block:: bash\n\n    $ docker container run -it --rm oduwsdl/archivenow -h\n\nAccessible at 127.0.0.1:12345:\n\n.. code-block:: bash\n\n    $ docker container run -p 12345:12345 -it --rm oduwsdl/archivenow --server --host 0.0.0.0\n\nAccessible at 127.0.0.1:22222:\n\n.. code-block:: bash\n\n    $ docker container run -p 22222:11111 -it --rm oduwsdl/archivenow --server --port 11111 --host 0.0.0.0\n\n.. image:: http://www.cs.odu.edu/~maturban/archivenow-6-archives.gif\n   :width: 10pt\n\n\nTo save the web page (http://www.cnn.com) in The Internet Archive\n\n.. 
code-block:: bash\n\n    $ docker container run -it --rm oduwsdl/archivenow --ia http://www.cnn.com\n    \n\nPython Usage\n------------\n\n.. code-block:: bash\n   \n    >>> from archivenow import archivenow\n\nExample 9\n~~~~~~~~~~\n\nTo save the web page (www.foxnews.com) in all configured archives:\n\n.. code-block:: bash\n\n      >>> archivenow.push(\"www.foxnews.com\",\"all\")\n      ['https://web.archive.org/web/20170209145930/http://www.foxnews.com','http://archive.is/oAjuM','http://www.webcitation.org/6o9LcQoVV','Error (The Perma.cc Archive): An API KEY is required']\n\nExample 10\n~~~~~~~~~~\n\nTo save the web page (www.foxnews.com) in Perma.cc:\n\n.. code-block:: bash\n\n      >>> archivenow.push(\"www.foxnews.com\",\"cc\",{\"cc_api_key\":\"$YOUR-Perma-cc-API-KEY\"})\n      ['https://perma.cc/8YYC-C7RM']\n      \nExample 11\n~~~~~~~~~~\n\nTo start the server from Python, do the following. The host and port number can be passed (e.g., start(port=1111, host='localhost')):\n\n.. code-block:: bash\n\n      >>> archivenow.start()\n      \n          2017-02-09 15:02:37\n          Running on http://127.0.0.1:12345\n          (Press CTRL+C to quit)\n\n\nConfiguring a new archive or removing existing one\n--------------------------------------------------\nAdditional archives may be added by creating a handler file in the \"handlers\" directory.\n\nFor example, if I want to add a new archive named \"My Archive\", I would create a file \"ma_handler.py\" and store it in the folder \"handlers\". The \"ma\" will be the archive identifier, so to push a web page (e.g., www.cnn.com) to this archive through the Python code, I should write:\n\n\n.. code-block:: python\n\n      archivenow.push(\"www.cnn.com\",\"ma\")\n      \n\nIn the file \"ma_handler.py\", the name of the class must be \"MA_handler\". This class must have at least one method called \"push\" that accepts the URI to archive, an optional dictionary of arguments (such as an API key), and a \"session\" keyword argument. 
See the existing `handler files`_ for examples of how to organize a newly configured archive handler.\n\nRemoving an archive can be done by one of the following options:\n\n- Removing the archive handler file from the folder \"handlers\"\n\n- Renaming the archive handler file to another name that does not end with \"_handler.py\"\n\n- Setting the variable \"enabled\" to \"False\" inside the handler file\n\n\nNotes\n-----\nThe Internet Archive (IA) sets a time gap of at least two minutes between creating different copies of the \"same\" resource. \n\nFor example, if you send a request to IA to capture (www.cnn.com) at 10:00pm, IA will create a new copy (*C*) of this URI. IA will then return *C* for all requests to the archive for this URI received until 10:02pm. Archive.is applies the same policy with a time gap of five minutes.\n\n.. _handler files: https://github.com/oduwsdl/archivenow/tree/master/archivenow/handlers\n\n\nCiting Project\n--------------\n\n.. code-block:: latex\n\n      @INPROCEEDINGS{archivenow-jcdl2018,\n        AUTHOR    = {Mohamed Aturban and\n                     Mat Kelly and\n                     Sawood Alam and\n                     John A. Berlin and\n                     Michael L. Nelson and\n                     Michele C. Weigle},\n        TITLE     = {{ArchiveNow}: Simplified, Extensible, Multi-Archive Preservation},\n        BOOKTITLE = {Proceedings of the 18th {ACM/IEEE-CS} Joint Conference on Digital Libraries},\n        SERIES    = {{JCDL} '18},\n        PAGES     = {321--322},\n        MONTH     = {June},\n        YEAR      = {2018},\n        ADDRESS   = {Fort Worth, Texas, USA},\n        URL       = {https://doi.org/10.1145/3197026.3203880},\n        DOI       = {10.1145/3197026.3203880}\n      }\n"
  },
  {
    "path": "archivenow/__init__.py",
    "content": "__version__ = '2020.7.18.12.19.44'"
  },
  {
    "path": "archivenow/archivenow.py",
    "content": "#!/usr/bin/env python\nimport os\nimport re\nimport sys\nimport uuid\nimport glob\nimport json\nimport importlib\nimport argparse\nimport string\nimport requests\nfrom threading import Thread\nfrom flask import request, Flask, jsonify, render_template\nfrom pathlib import Path\n\n#from __init__ import __version__ as archiveNowVersion\n\narchiveNowVersion = '2020.7.18.12.19.44'\n\n# archive handlers path\nPATH = Path(os.path.dirname(os.path.abspath(__file__)))\nPATH_HANDLER = PATH / 'handlers'\n\n# for the web app\napp = Flask(__name__)\n\n# create handlers for enabled archives\nglobal handlers\nhandlers = {}\n\n# defult value for server/port\nSERVER_IP = '0.0.0.0'\nSERVER_PORT = 12345\n\n\ndef bad_request(error=None):\n    message = {\n        'status': 400,\n        'message': 'Error in processing the request',\n    }\n    resp = jsonify(message)\n    resp.status_code = 400\n    return resp\n\n\n# def getServer_IP_PORT():\n#     u = str(SERVER_IP)\n#     if str(SERVER_PORT) != '80':\n#         u = u + \":\" + str(SERVER_PORT)\n#     if 'http' != u[0:4]:\n#         u = 'http://' + u\n#     return u\n\n\ndef listArchives_server(handlers):\n    uri_args = ''\n    if 'cc' in handlers:\n        if handlers['cc'].enabled and handlers['cc'].api_required:\n            uri_args = '?cc_api_key={Your-Perma.cc-API-Key}'\n    li = {\"archives\": [{  # getServer_IP_PORT() + \n        \"id\": \"all\", \"GET\":'/all/' + '{URI}'+uri_args,\n        \"archive-name\": \"All enabled archives\"}]}\n    for handler in handlers:\n        if handlers[handler].enabled:\n            uri_args2 = ''\n            if handler == 'cc':\n                uri_args2 = uri_args\n            li[\"archives\"].append({ #getServer_IP_PORT() +\n                \"id\": handler, \"archive-name\": handlers[handler].name,\n                \"GET\":  '/' + handler + '/' + '{URI}'+uri_args2})\n    return li\n\n\n@app.route('/', defaults={'path': ''}, methods=['GET'])\n@app.route('/<path:path>', 
methods=['GET'])\ndef pushit(path):\n    # no path; return a list of avaliable archives\n    if path == '':\n        #resp = jsonify(listArchives_server(handlers))\n        #resp.status_code = 200\n        return render_template('index.html')\n        #return resp\n    # get request with path\n    elif (path == 'api'):\n        resp = jsonify(listArchives_server(handlers))\n        resp.status_code = 200\n        return resp\n    elif (path == \"ajax-loader.gif\"):\n        return render_template('ajax-loader.gif')\n    else:\n        try:\n            # get the args passed to push function like API KEY if provided\n            PUSH_ARGS = {}\n            for k in request.args.keys():\n                PUSH_ARGS[k] = request.args[k]\n\n            s = str(path).split('/', 1)\n            arc_id = s[0]\n            URI = request.url.split('/', 4)[4] # include query params, too\n\n            if 'herokuapp.com' in request.host:\n                PUSH_ARGS['from_heroku'] = True\n\n            # To push into archives\n            resp = {\"results\": push(URI, arc_id, PUSH_ARGS)}\n            if len(resp[\"results\"]) == 0:\n                return bad_request()\n            else:\n                # what to return\n                resp = jsonify(resp)\n                resp.status_code = 200\n\n                return resp\n        except Exception as e:\n            pass\n        return bad_request()\n\nres_uris = {}\n\n\ndef push_proxy(hdlr, URIproxy, p_args_proxy, res_uris_idx, session=requests.Session()):\n    global res_uris\n    try:\n        res = hdlr.push( URIproxy , p_args_proxy, session=session)\n        print ( res )\n        res_uris[res_uris_idx].append(res)\n    except:\n        pass\n\ndef push(URI, arc_id, p_args={}, session=requests.Session()):\n    global handlers\n    global res_uris\n    try:\n        # push to all possible archives\n        res_uris_idx = str(uuid.uuid4())\n        res_uris[res_uris_idx] = []\n        ### if arc_id == 'all':\n          
  ### for handler in handlers:\n                ### if (handlers[handler].api_required):\n                    # pass args like key API\n                    ### res.append(handlers[handler].push(str(URI), p_args))\n                ### else:\n                    ### res.append(handlers[handler].push(str(URI)))\n        ### else:\n            # push to the chosen archives\n\n        threads = []\n\n        for handler in handlers:\n            if (arc_id == handler) or (arc_id == 'all'):\n            ### if (arc_id == handler): ### and (handlers[handler].api_required):\n                #res.append(handlers[handler].push(str(URI), p_args))\n                #push_proxy( handlers[handler], str(URI), p_args, res_uris_idx)\n                threads.append(\n                    Thread(\n                        target=push_proxy, \n                        args=(handlers[handler], str(URI), p_args, res_uris_idx, ), \n                        kwargs={'session': session}))\n                ### elif (arc_id == handler):\n                    ### res.append(handlers[handler].push(str(URI)))\n\n        for th in threads:\n            th.start()\n        for th in threads:\n            th.join()\n\n        res = res_uris[res_uris_idx]\n        del res_uris[res_uris_idx]\n        return res\n    except:\n        del res_uris[res_uris_idx]\n        pass\n    return [\"bad request\"]\n\n\ndef start(port=SERVER_PORT, host=SERVER_IP):\n    global SERVER_PORT\n    global SERVER_IP\n    SERVER_PORT = port\n    SERVER_IP = host\n    app.run(\n        host=host,\n        port=port,\n        threaded=True,\n        debug=True,\n        use_reloader=False)\n\n\ndef load_handlers():\n    global handlers\n    handlers = {}\n    # add the path of the handlers to the system so they can be imported\n    sys.path.append(str(PATH_HANDLER))\n\n    # create a list of handlers.\n    for file in PATH_HANDLER.glob('*_handler.py'):\n        name = file.stem\n        prefix = name.replace('_handler', '')\n    
    mod = importlib.import_module(name)\n        mod_class = getattr(mod, prefix.upper() + '_handler')\n        # finally an object is created\n        handlers[prefix] = mod_class()\n    # exclude all disabled archives\n\n    for handler in list(handlers): # handlers.keys():\n        if not handlers[handler].enabled:\n            del handlers[handler]\n\n\ndef args_parser():\n    global SERVER_PORT\n    global SERVER_IP\n    # parsing arguments\n\n    class MyParser(argparse.ArgumentParser):\n\n        def error(self, message):\n            sys.stderr.write('error: %s\\n' % message)\n            self.print_help()\n            sys.exit(2)\n\n        def printm(self):\n            sys.stderr.write('')\n            self.print_help()\n            sys.exit(2)\n\n    parser = MyParser()\n\n    # arc_handler = 0\n    for handler in handlers:\n        # add archives identifiers to the list of options\n        # arc_handler += 1\n        if handler == 'warc':\n            parser.add_argument('--' + handler, nargs='?', \n                            help=handlers[handler].name)\n        else:\n            parser.add_argument('--' + handler, action='store_true', default=False,\n                            help='Use ' + handlers[handler].name)\n        if (handlers[handler].api_required):\n            parser.add_argument(\n                '--' +\n                handler +\n                '_api_key',\n                nargs='?',\n                help='An API KEY is required by ' +\n                handlers[handler].name)\n\n    parser.add_argument(\n        '-v',\n        '--version',\n        help='Report the version of archivenow',\n        action='version',\n        version='ArchiveNow ' +\n        archiveNowVersion)\n\n    if len(handlers) > 0:\n        parser.add_argument('--all', action='store_true', default=False,\n                            help='Use all possible archives ')\n\n        parser.add_argument('--server', action='store_true', default=False,\n                
            help='Run archiveNow as a Web Service ')\n\n        parser.add_argument('URI', nargs='?', help='URI of a web resource')\n\n        parser.add_argument('--host', nargs='?', help='A server address')\n\n        if 'warc' in handlers.keys():\n            parser.add_argument('--agent', nargs='?', help='Use \"wget\" or \"squidwarc\" for WARC generation')\n\n        parser.add_argument(\n            '--port',\n            nargs='?',\n            help='A port number to run a Web Service')\n\n        args = parser.parse_args()\n    else:\n        print ('\\n Error: No enabled archive handler found\\n')\n        sys.exit(0)\n\n    arc_opt = 0\n    # start the server\n    if getattr(args, 'server'):\n        if getattr(args, 'port'):\n            SERVER_PORT = int(args.port)\n        if getattr(args, 'host'):\n            SERVER_IP = str(args.host)\n\n        start(port=SERVER_PORT, host=SERVER_IP)\n\n    else:\n        if not getattr(args, 'URI'):\n            print (parser.error('too few arguments'))\n        res = []\n\n        # get the args passed to push function like API KEY if provided\n        PUSH_ARGS = {}\n        for handler in handlers:\n            if (handlers[handler].api_required):\n                if getattr(args, handler + '_api_key'):\n                    PUSH_ARGS[\n                        handler +\n                        '_api_key'] = getattr(\n                        args,\n                        handler +\n                        '_api_key')\n                else:\n                    if getattr(args, handler):\n                        print (\n                            parser.error(\n                                'An API Key is required by ' +\n                                handlers[handler].name))\n            # args.warc only exists when the warc handler is enabled,\n            # so it is read only inside the branch below\n            if handler == 'warc':\n                PUSH_ARGS['warc'] = getattr(args, 'warc')\n                if PUSH_ARGS['warc'] is None:\n                    valid_chars = 
\"-_.()/ %s%s\" % (string.ascii_letters, string.digits)\n                    PUSH_ARGS['warc'] = ''.join(c for c in str(args.URI).strip() if c in valid_chars)\n                    PUSH_ARGS['warc'] = PUSH_ARGS['warc'].replace(' ','_').replace('/','_').replace('__','_') # I don't like spaces in filenames.\n                    PUSH_ARGS['warc'] = PUSH_ARGS['warc']+'_'+str(uuid.uuid4())[:8]\n                if PUSH_ARGS['warc'][-1] == '_':\n                    PUSH_ARGS['warc'] = PUSH_ARGS['warc'][:-1]\n                agent = 'wget'\n                tmp_agent = getattr(args, 'agent')\n                if tmp_agent == 'squidwarc':\n                    agent = tmp_agent\n                PUSH_ARGS['agent'] = agent\n\n        # sys.exit(0)\n\n        # push to all possible archives\n        if getattr(args, 'all'):\n            arc_opt = 1\n            res = push(str(args.URI).strip(), 'all', PUSH_ARGS)\n        else:\n            # push to the chosen archives\n            for handler in handlers:\n                if getattr(args, handler):\n                    arc_opt += 1\n                    for i in push(str(args.URI).strip(), handler, PUSH_ARGS):\n                        res.append(i)\n            # push to the default archive\n            if (len(handlers) > 0) and (arc_opt == 0):\n                # set the default: 'ia' if enabled, otherwise the first\n                # archive in the list\n                if 'ia' in handlers:\n                    res = push(str(args.URI).strip(), 'ia', PUSH_ARGS)\n                else:\n                    # dict views are not subscriptable in Python 3\n                    res = push(str(args.URI).strip(),\n                               list(handlers)[0], PUSH_ARGS)\n                # print (parser.printm())\n            # else:\n        # for rs in res:\n        #     print (rs)\n\nload_handlers()\n\nif __name__ == '__main__':\n    args_parser()\n"
  },
  {
    "path": "archivenow/handlers/cc_handler.py",
    "content": "import requests\nimport json\n\nclass CC_handler(object):\n\n    def __init__(self):\n        self.enabled = True\n        self.name = 'The Perma.cc Archive'\n        self.api_required = True\n\n    def push(self, uri_org, p_args={}, session=requests.Session()):\n        msg = ''\n        try:\n            # a missing key raises KeyError, which is reported below as\n            # 'An API Key is required'\n            APIKEY = p_args['cc_api_key']\n\n            r = session.post('https://api.perma.cc/v1/archives/?api_key='+APIKEY, timeout=120,\n                                                           data=json.dumps({\"url\":uri_org}),\n                                                           headers={'Content-type': 'application/json'},\n                                                           allow_redirects=True)\n            r.raise_for_status()\n\n            if 'Location' in r.headers:\n                return 'https://perma.cc/'+r.headers['Location'].rsplit('/',1)[1]\n            else:\n                for r2 in r.history:\n                    if 'Location' in r2.headers:\n                        return 'https://perma.cc/'+r2.headers['Location'].rsplit('/',1)[1]\n            entity_json = r.json()\n            if 'guid' in entity_json:\n                return str('https://perma.cc/'+entity_json['guid'])\n            msg = \"Error (\"+self.name+ \"): No HTTP Location header is returned in the response\"\n        except Exception as e:\n            if (msg == '') and ('_api_key' in str(e)):\n                msg = \"Error (\" + self.name+ \"): \" + 'An API Key is required '\n            elif (msg == ''):\n                msg = \"Error (\" + self.name+ \"): \" + str(e)\n        return msg\n"
  },
  {
    "path": "archivenow/handlers/ia_handler.py",
    "content": "import requests\n\nclass IA_handler(object):\n\n    def __init__(self):\n        self.enabled = True\n        self.name = 'The Internet Archive'\n        self.api_required = False\n\n    def push(self, uri_org, p_args=[], session=requests.Session()):\n        msg = ''\n        try:\n            uri = 'https://web.archive.org/save/' + uri_org\n            archiveTodayUserAgent = {\n                \"User-Agent\": \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36\"\n            }\n            # push into the archive\n            # r = session.get(uri, timeout=120, allow_redirects=True, headers=archiveTodayUserAgent)\n\n            if ('user-agent' in session.headers) and (not session.headers['User-Agent'].lower().startswith('python-requests/')):\n                r = session.get(uri, timeout=120, allow_redirects=True)\n            else:\n                r = session.get(uri, timeout=120, allow_redirects=True, headers=archiveTodayUserAgent)\n\n            r.raise_for_status()\n            # extract the link to the archived copy \n            if (r != None):\n                if \"Location\" in r.headers:\n                    return r.headers[\"Location\"]\n                elif \"Content-Location\" in r.headers:\n                    if (r.headers[\"Content-Location\"]).startswith(\"/web/\"):\n                        return \"https://web.archive.org\"+r.headers[\"Content-Location\"]\n                    else:\n                        try:\n                            uri_from_content = \"https://web.archive.org\" + r.text.split('var redirUrl = \"',1)[1].split('\"',1)[0]\n                        except:\n                            uri_from_content = r.headers[\"Content-Location\"]\n                            #pass;\n                        return uri_from_content\n                else:\n                    for r2 in r.history:\n                        if 'Location' in r2.headers:\n   
                         return r.url\n                            #return r2.headers['Location']\n                        if 'Content-Location' in r2.headers:\n                            return r.url\n                            #return r2.headers['Content-Location']\n            msg = \"(\"+self.name+ \"): No HTTP Location/Content-Location header is returned in the response\"               \n        except Exception as e:\n            if msg == '':\n                msg = \"Error (\" + self.name+ \"): \" + str(e)\n            pass\n        return msg\n"
  },
  {
    "path": "archivenow/handlers/is_handler.py",
    "content": "import os\nimport requests\nimport sys\nfrom selenium.webdriver.firefox.options import Options\nfrom selenium import webdriver\nfrom selenium.webdriver.common.keys import Keys\nfrom selenium.webdriver.support.ui import WebDriverWait\nfrom selenium.webdriver.support import expected_conditions as EC\nfrom selenium.webdriver.common.by import By\nfrom selenium.common.exceptions import TimeoutException\n\n\nclass IS_handler(object):\n\n    def __init__(self):\n        self.enabled = True\n        self.name = 'The Archive.is'\n        self.api_required = False \n\n    def push(self, uri_org, p_args=[], session=requests.Session()):\n\n        msg = \"\"\n\n        try:\n\n            options = Options()\n            options.headless = True # Run in background\n            driver = webdriver.Firefox(options = options)\n            driver.get(\"https://archive.is\")\n\n            elem = driver.find_element_by_id(\"url\") # Find the form to place a URL to be archived\n\n            elem.send_keys(uri_org) # Place the URL in the input box\n\n            saveButton = driver.find_element_by_xpath(\"/html/body/center/div/form[1]/div[3]/input\") # Find the submit button\n\n            saveButton.click() # Click the submit button\n\n            # After clicking submit, there may be an additional page that pops up and asks if you are sure you want\n            # to archive that page since it was archived X amount of time ago. 
We need to wait for that page to \n            # load and click submit again.\n            delay = 30 # seconds\n            try:\n                nextSaveButton = WebDriverWait(driver, delay).until(EC.presence_of_element_located((By.XPATH, \"/html/body/center/div[4]/center/div/div[2]/div/form/div/input\")))\n                nextSaveButton.click()\n\n            except TimeoutException:\n                pass\n\n            # The page takes a while to archive, so keep checking if the loading page is still displayed.\n            loading = True\n            while loading:\n                \n                if not 'wip' in driver.current_url and not 'submit' in driver.current_url:\n                    loading = False\n\n            # After the loading screen is gone and the page is archived, the current URL\n            # will be the URL to the archived page.\n            msg = driver.current_url;\n\n            driver.quit()\n\n        except:\n\n            '''\n            exc_type, exc_obj, exc_tb = sys.exc_info()\n            fname = os.path.split(exc_tb.tb_frame.f_code.co_filename)[1]\n            print((fname, exc_tb.tb_lineno, sys.exc_info() ))\n            '''\n\n            msg = \"Unable to complete request.\"\n\n        return msg\n"
  },
  {
    "path": "archivenow/handlers/mg_handler.py",
    "content": "# encoding: utf-8\nimport os\nimport requests\nimport sys\nfrom selenium.webdriver.firefox.options import Options\nfrom selenium import webdriver\nfrom selenium.webdriver.common.keys import Keys\nfrom selenium.webdriver.support.ui import WebDriverWait\nfrom selenium.webdriver.support import expected_conditions as EC\nfrom selenium.webdriver.common.by import By\nfrom selenium.common.exceptions import TimeoutException\n\nclass MG_handler(object):\n\n    def __init__(self):\n        self.enabled = True\n        self.name = 'Megalodon.jp'\n        self.api_required = False\n\n    def push(self, uri_org, p_args=[], session=requests.Session()):\n\n        msg = \"\"\n\n        options = Options()\n        options.headless = True # Run in background\n        driver = webdriver.Firefox(options = options)\n        driver.get(\"https://megalodon.jp/?url=\" + uri_org)\n\n        try:\n            addButton = driver.find_element_by_xpath(\"/html/body/div[2]/div[2]/div[8]/form/div[1]/input[2]\")\n\n            addButton.click() # Click the add button\n        except :\n            print(\"Unable to archive this page at this time.\")\n            raise\n\n\n        stillOnPage = True\n        while stillOnPage:\n            try:\n                button = driver.find_element_by_xpath(\"/html/body/div[2]/div[2]/div[1]/div/h3\")\n\n            except:\n                stillOnPage = False\n\n            try:\n                error = driver.find_element_by_xpath(\"/html/body/div[2]/div[2]/div[3]/div/a/h3\")\n                msg = \"We apologize for the inconvenience. 
Currently, acquisitions that are considered \\\"robots\\\" in the acquisition of certain conditions are prohibited.\"\n                raise\n                sys.exit()\n\n            except:\n                pass\n\n        # The page takes a while to archive, so keep checking if the loading page is still displayed.\n        loading = True\n        while loading:\n            try:\n                loadingPage = driver.find_element_by_xpath(\"/html/body/div[2]/div/div[1]/a/img\")\n                loading = False\n\n            except:\n                loading = True\n\n        # After the loading screen is gone and the page is archived, the current URL\n        # will be the URL to the archived page.\n        if msg == \"\":\n            print(driver.current_url)\n\n        return msg\n        "
  },
  {
    "path": "archivenow/handlers/warc_handler.py",
    "content": "import requests\nimport os.path\nimport distutils.spawn\n\nclass WARC_handler(object):\n\n    def __init__(self):\n        self.enabled = True\n        self.name = 'Generate WARC file'\n        self.api_required = False\n\n    def push(self, uri_org, p_args=[], session=requests.Session()):\n        msg = ''\n        if p_args['agent'] == 'squidwarc':\n            # squidwarc\n            #if not distutils.spawn.find_executable(\"squidwarc\"):\n            #    return 'wget is not installed!'\n            os.system('python ~/squidwarc_one_page/generte_warcs.py 9222 \"'+uri_org+'\" '+p_args['warc']+'.warc  &> /dev/null')\n            if os.path.exists(p_args['warc']):\n                return p_args['warc']\n            elif os.path.exists(p_args['warc']+'.warc'):\n                return p_args['warc']+'.warc'\n            else:\n                return 'squidwarc failed to generate the WARC file'\n\n        else:\n            if not distutils.spawn.find_executable(\"wget\"):\n                return 'wget is not installed!'\n            # wget \n            os.system('wget -E -H -k -p -q --delete-after --no-warc-compression --warc-file=\"'+p_args['warc']+'\" \"'+uri_org+'\"')\n            if os.path.exists(p_args['warc']):\n                return p_args['warc']\n            elif os.path.exists(p_args['warc']+'.warc'):\n                return p_args['warc']+'.warc'\n            else:\n                return 'wget failed to generate the WARC file'\n"
  },
  {
    "path": "archivenow/templates/api.txt",
    "content": "<!--   <h4 id=\"archivenow_api\">Archive Now API</h4>\n  <h5 id=\"archivenow_api1\">To push a web page into particular web archive, use the following URL:</h5>\n  <pre>\n    http://{server}:{port}/{archive-id}/{URI}\n  </pre>\n  <h5 id=\"archivenow_api2\">Archive identifier (use \"all\" for all archives):</h5>\n<table style=\"width:30%\">\n  <tr>\n    <th>Archive</th>\n    <th>Identifier</th>\n  </tr>\n  <tr>\n    <td>Internet Archive</td>\n    <td>ia</td>\n  </tr>\n  <tr>\n    <td>Archive.is</td>\n    <td>is</td>\n  </tr>\n  <tr>\n    <td>Perma.cc</td>\n    <td>cc</td>\n  </tr>\n</table> -->\n<!--   <h5 id=\"archivenow_api3\">Example, capture http://www.example.com by Internet Archive: </h5>\n<pre>\ncurl  -i http://127.0.0.1:12345/ia/http://www.example.com\n\nHTTP/1.0 200 OK\nContent-Type: application/json\nContent-Length: 95\nServer: Werkzeug/0.11.15 Python/2.7.10\nDate: Fri, 10 Nov 2017 22:36:26 GMT\n\n{\n  \"results\": [\n    \"https://web.archive.org/web/20171110223626/http://www.example.com\"\n  ]\n}\n</pre>\n  <h5 id=\"archivenow_api4\">Example, capture http://www.example.com by all four archive (An API KEY is required by Perma.cc): </h5>\n<pre>\ncurl -i 127.0.0.1:12345/all/http://www.example.com?cc_api_key=8r820...\n\nHTTP/1.0 200 OK\nContent-Type: application/json\nContent-Length: 207\nServer: Werkzeug/0.11.15 Python/2.7.10\nDate: Fri, 10 Nov 2017 22:42:08 GMT\n\n{\n  \"results\": [\n    \"https://perma.cc/QX65-CFDD\", \n    \"https://web.archive.org/web/20171110223626/http://www.example.com\", \n    \"http://archive.is/ff17A\", \n    \"http://www.webcitation.org/6uschXwlI\"\n  ]\n}\n</pre> -->"
  },
  {
    "path": "archivenow/templates/index.html",
    "content": "<html>\n<head>\n<style>\n\n.reveal-if-active {\n  opacity: 0;\n  max-height: 0;\n  overflow: hidden;\n  font-size: 14px;\n  -webkit-transform: scale(0.8);\n          transform: scale(0.8);\n  -webkit-transition: 0.5s;\n  transition: 0.5s;\n}\n.reveal-if-active label {\n  margin: 0 0 3px 22px;\n  display: block;\n  font-size: smaller;\n}\n.reveal-if-active input[type=text] {\n  width: 300px;\n}\ninput[id=\"choice-archive4\"]:checked ~ .reveal-if-active {\n  opacity: 1;\n  max-height: 120px;\n  padding: 0px 0px;\n  -webkit-transform: scale(1);\n          transform: scale(1);\n  overflow: visible;\n}\n\ntable {\n    margin: 14px auto;\n    opacity: 0;\n}\n\ntable, th, td {\n    border-collapse: collapse;\n}\nth, td {\n    padding: 1px;\n    text-align: left;\n    font-family: \"My Custom Font\", Verdana, Tahoma;\n    font-size: 12px;\n}\n\ntr{\n    border-bottom: 1px solid #ccc;\n    border-top: 1px solid #ccc;\n}\n#title {\n  display: block;\n  text-align: center;\n  padding: 22px 0 0 0\n}\n\n.url{\n  display: block;\n  text-align: center;\n}\n\n#text_url{\n  width:333px;\n  font-size: 12.5px;\n}\n\n#select_label{\n    padding: 0px 270px 0 0;\n    text-align: center;\n    margin-bottom: 8px;\n}\n\n#choices{\n      text-align: center;\n      padding: 0px 20px 0px 0px;\n      margin-left: 133px;\n}\n\n#choices2{\n      text-align: left;\n      display: inline-block;\n\n}\n\n#perma_cc_api{\n    margin: -2px 93px 0px 21px;\n}\n\n#submitdiv{\n    text-align: center;\n    padding: 20px 242px 0 0;\n    margin: 0 0 0 38px;\n}\n\ninput[type=submit] {\n    width: 5em;\n    height: 2em;\n    font-size: 12px;\n    background-color: gainsboro;\n    margin: 0px 0px 0px 13px;\n}\n\n#errors{\n    font-size: smaller;\n    color: brown;\n    padding: 6px 0px 3px 104px;\n}\n\n.img1{\n\n  width: 13px;\n  opacity: 0;\n}\n\n.img2{\n\n  width: 13px;\n  opacity: 0;\n}\n\n.img3{\n\n  width: 13px;\n  opacity: 0;\n}\n\n.img5{\n\n  width: 13px;\n  opacity: 0;\n}\n\n.img6{\n\n  
width: 13px;\n  opacity: 0;\n}\n\n.img4{\n\n  width: 13px;\n  opacity: 0;\n}\n\n#apilink{\n  font-size: smaller;\n  padding-top: 39px;\n}\n</style>\n</head>\n<body>\n      <script src=\"https://ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js\" type=\"text/javascript\"></script>\n      <h3 id=\"title\"> Preserve a web page in web archives </h3>\n      <div class=\"url\">\n        <label for=\"text_url\" id=\"label_url\">URL</label>\n        <input type=\"text\" id=\"text_url\" required>\n      </div>\n  \n  <div>\n    <p id=\"select_label\">Select archives:</p>\n    <div id=\"choices\">\n    <div id=\"choices2\">\n    <input type=\"checkbox\" id=\"choice-archive1\" checked > Internet Archive <img src={{ url_for('static', filename = \"ajax-loader.gif\") }} class=\"img1\" id=\"img1\"> <br>\n    <input type=\"checkbox\" id=\"choice-archive2\" checked > Archive.is       <img src={{ url_for('static', filename = \"ajax-loader.gif\") }} class=\"img2\" id=\"img2\"> <br>\n    <input type=\"checkbox\" id=\"choice-archive6\" checked > Megalodon.jp <img src={{ url_for('static', filename = \"ajax-loader.gif\") }} class=\"img6\" id=\"img6\"> <br>\n    <input type=\"checkbox\" id=\"choice-archive4\" > Perma.cc                 <img src={{ url_for('static', filename = \"ajax-loader.gif\") }} class=\"img4\" id=\"img4\">\n    <div class=\"reveal-if-active\">\n      <label for=\"perma_cc_api\">Perma.cc requires <a href=\"https://perma.cc/settings/tools\" target=\"_blank\"> an API Key </a></label>\n      <input type=\"text\" id=\"perma_cc_api\">\n    </div>\n    </div>\n    </div>\n  </div>\n  <div id=\"submitdiv\">\n    <input type=\"submit\" value=\"Submit\" onClick=\"push_archive();\">\n    <input type=\"submit\" value=\"Reset\" onClick=\"reset();\">\n    <div id =\"errors\"></div>\n  </div>\n    <table id=\"results\" width=\"600\">\n        <thead>\n        <tr>\n            <th scope=\"col\" width=\"130\">Archive</th>\n            <th scope=\"col\" width=\"450\">Link to 
the archived page</th>\n        </tr>\n        </thead>\n    </table>\n<div id=\"apilink\"><a href=\"/api\" target=\"_blank\">Archive Now API</a></div>\n\n<script type=\"text/javascript\">\n\n  document.getElementById('perma_cc_api').value = localStorage.getItem(\"permaccapikey\");\n\n  if (localStorage.getItem(\"check_archive_1\") !== null){\n      if (localStorage.getItem(\"check_archive_1\") == 'true'){\n        document.getElementById('choice-archive1').checked = true\n      }else{\n        document.getElementById('choice-archive1').checked = false\n      }\n  }\n  if (localStorage.getItem(\"check_archive_2\") !== null){\n      if (localStorage.getItem(\"check_archive_2\") == 'true'){\n        document.getElementById('choice-archive2').checked = true\n      }else{\n        document.getElementById('choice-archive2').checked = false\n      }\n  }\n  if (localStorage.getItem(\"check_archive_6\") !== null){\n      if (localStorage.getItem(\"check_archive_6\") == 'true'){\n        document.getElementById('choice-archive6').checked = true\n      }else{\n        document.getElementById('choice-archive6').checked = false\n     }\n  }\n  if (localStorage.getItem(\"check_archive_4\") !== null){\n      if (localStorage.getItem(\"check_archive_4\") == 'true'){\n        document.getElementById('choice-archive4').checked = true\n      }else{\n       document.getElementById('choice-archive4').checked = false\n    
  }\n  }\n\n  function reset() {\n\n      window.location.reload();\n  \n  }\n\n  function push_archive() {\n\n            document.getElementById('errors').innerHTML=\"\";\n            localStorage.setItem(\"check_archive_1\", false);\n            localStorage.setItem(\"check_archive_2\", false);\n            localStorage.setItem(\"check_archive_3\", false);\n            localStorage.setItem(\"check_archive_5\", false);\n            localStorage.setItem(\"check_archive_6\", false);\n            localStorage.setItem(\"check_archive_4\", false);\n\n\n            var arr = []\n\n            var table = document.getElementById('results');\n            for (var r = 1, n = table.rows.length; r < n; r++) {\n                    if(table.rows[r].cells[0].innerHTML.indexOf(\"https://archive.org\") !== -1){\n                      arr.push(\"ia\");\n                    }\n                    if(table.rows[r].cells[0].innerHTML.indexOf(\"https://archive.is\") !== -1){\n                      arr.push(\"is\");\n                    }\n                    if(table.rows[r].cells[0].innerHTML.indexOf(\"https://megalodon.jp\") !== -1){\n                      arr.push(\"mg\");\n                    }\n                    if(table.rows[r].cells[0].innerHTML.indexOf(\"https://www.webcitation.org\") !== -1){\n                      arr.push(\"wc\");\n                    }\n                    if(table.rows[r].cells[0].innerHTML.indexOf(\"https://perma.cc\") !== -1){\n                      arr.push(\"cc\");\n                    }\n            }\n\n            function validateURL(textval) {\n                var urlregex = 
/^(https?|ftp):\\/\\/(((([a-z]|\\d|-|\\.|_|~|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])|(%[\\da-f]{2})|[!\\$&'\\(\\)\\*\\+,;=]|:)*@)?(((\\d|[1-9]\\d|1\\d\\d|2[0-4]\\d|25[0-5])\\.(\\d|[1-9]\\d|1\\d\\d|2[0-4]\\d|25[0-5])\\.(\\d|[1-9]\\d|1\\d\\d|2[0-4]\\d|25[0-5])\\.(\\d|[1-9]\\d|1\\d\\d|2[0-4]\\d|25[0-5]))|((([a-z]|\\d|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])|(([a-z]|\\d|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])([a-z]|\\d|-|\\.|_|~|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])*([a-z]|\\d|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])))\\.)+(([a-z]|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])|(([a-z]|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])([a-z]|\\d|-|\\.|_|~|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])*([a-z]|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])))\\.?)(:\\d*)?)(\\/((([a-z]|\\d|-|\\.|_|~|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])|(%[\\da-f]{2})|[!\\$&'\\(\\)\\*\\+,;=]|:|@)+(\\/(([a-z]|\\d|-|\\.|_|~|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])|(%[\\da-f]{2})|[!\\$&'\\(\\)\\*\\+,;=]|:|@)*)*)?)?(\\?((([a-z]|\\d|-|\\.|_|~|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])|(%[\\da-f]{2})|[!\\$&'\\(\\)\\*\\+,;=]|:|@)|[\\uE000-\\uF8FF]|\\/|\\?)*)?(\\#((([a-z]|\\d|-|\\.|_|~|[\\u00A0-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFEF])|(%[\\da-f]{2})|[!\\$&'\\(\\)\\*\\+,;=]|:|@)|\\/|\\?)*)?$/i;\n                return urlregex.test(textval);\n            }\n            if (validateURL(document.getElementById('text_url').value) == false){\n              document.getElementById('text_url').focus();\n              document.getElementById('errors').innerHTML=\"*Enter a correct URL*\";\n              return;\n            }\n\n            if (document.getElementById('choice-archive4').checked == true){ // perma.cc\n                if(document.getElementById('perma_cc_api').value.trim() == \"\"){\n                  document.getElementById('perma_cc_api').focus();\n                  document.getElementById('errors').innerHTML=\"*Enter 
your Perma.cc API Key*\";\n                  return;\n                }\n            }\n\n            var selected_archives = 0;\n\n            if (document.getElementById('choice-archive1').checked == true){\n              \n                  selected_archives = selected_archives + 1;\n\n                  if(arr.indexOf(\"ia\") == -1){\n                      document.getElementById('img1').style.opacity = 1\n                      $.ajax({\n                          type: \"GET\",\n                          url: \"ia/\"+document.getElementById('text_url').value,\n                          success: function(json) {\n                              if (validateURL(json['results'][0]) == true){\n                                  var table=document.getElementById(\"results\");\n                                  var row=table.insertRow(-1);\n                                  var cell1=row.insertCell(0);\n                                  var cell2=row.insertCell(1);\n                                  cell1.innerHTML='<a href=\"https://archive.org\" target=\"_blank\"> Internet Archive </a>'\n                                  cell2.innerHTML='<a href=\"'+json['results'][0]+'\" target=\"_blank\"> '+json['results'][0]+' </a>'\n                                  document.getElementById('results').style.opacity = 1\n                                  document.getElementById('img1').style.opacity = 0\n                              }\n                          },\n                          complete: function(){\n                            document.getElementById('img1').style.opacity = 0\n                          }\n                      });\n                 }\n                 localStorage.setItem(\"check_archive_1\", true);\n            }\n            if (document.getElementById('choice-archive2').checked == true){\n              \n                  selected_archives = selected_archives + 1;\n\n                  if(arr.indexOf(\"is\") == -1){\n                      
document.getElementById('img2').style.opacity = 1\n                      $.ajax({\n                          type: \"GET\",\n                          url: \"is/\"+document.getElementById('text_url').value,\n                          success: function(json) {\n                              if (validateURL(json['results'][0]) == true){\n                                  var table=document.getElementById(\"results\");\n                                  var row=table.insertRow(-1);\n                                  var cell1=row.insertCell(0);\n                                  var cell2=row.insertCell(1);\n                                  cell1.innerHTML='<a href=\"https://archive.is\" target=\"_blank\"> Archive.is </a>'\n                                  cell2.innerHTML='<a href=\"'+json['results'][0]+'\" target=\"_blank\"> '+json['results'][0]+' </a>'\n                                  document.getElementById('results').style.opacity = 1\n                                  document.getElementById('img2').style.opacity = 0\n                              }\n                          },\n                          complete: function(){\n                            document.getElementById('img2').style.opacity = 0\n                          }\n                      });\n                  }\n              localStorage.setItem(\"check_archive_2\", true);\n            }\n            if (document.getElementById('choice-archive6').checked == true){\n                  \n                  selected_archives = selected_archives + 1;\n                  \n                  if(arr.indexOf(\"mg\") == -1){\n                      document.getElementById('img6').style.opacity = 1\n                      $.ajax({\n                          type: \"GET\",\n                          url: \"mg/\"+document.getElementById('text_url').value,\n                          success: function(json) {\n                              if (validateURL(json['results'][0]) == true){\n                       
           var table=document.getElementById(\"results\");\n                                  var row=table.insertRow(-1);\n                                  var cell1=row.insertCell(0);\n                                  var cell2=row.insertCell(1);\n                                  cell1.innerHTML='<a href=\"https://megalodon.jp\" target=\"_blank\"> Megalodon.jp </a>'\n                                  cell2.innerHTML='<a href=\"'+json['results'][0]+'\" target=\"_blank\"> '+json['results'][0]+' </a>'\n                                  document.getElementById('results').style.opacity = 1\n                                  document.getElementById('img6').style.opacity = 0\n                              }\n                          },\n                          complete: function(){\n                            document.getElementById('img6').style.opacity = 0\n                          }\n                      });\n                  }\n                  localStorage.setItem(\"check_archive_6\", true);\n            }\n            if (document.getElementById('choice-archive4').checked == true){\n\n                  selected_archives = selected_archives + 1;\n\n                  if(arr.indexOf(\"cc\") == -1){\n                      document.getElementById('img4').style.opacity = 1\n                      $.ajax({\n                          type: \"GET\",\n                          url: \"cc/\"+document.getElementById('text_url').value+'?cc_api_key='+document.getElementById('perma_cc_api').value,\n                          success: function(json) {\n                              if (validateURL(json['results'][0]) == true){\n                                  var table=document.getElementById(\"results\");\n                                  var row=table.insertRow(-1);\n                                  var cell1=row.insertCell(0);\n                                  var cell2=row.insertCell(1);\n                                  cell1.innerHTML='<a 
href=\"https://perma.cc\" target=\"_blank\"> Perma.cc </a>'\n                                  cell2.innerHTML='<a href=\"'+json['results'][0]+'\" target=\"_blank\"> '+json['results'][0]+' </a>'\n                                  document.getElementById('results').style.opacity = 1\n                                  document.getElementById('img4').style.opacity = 0\n                              }\n                          },\n                          complete: function(){\n                            document.getElementById('img4').style.opacity = 0\n                          }\n                      });\n                  }\n                localStorage.setItem(\"permaccapikey\", document.getElementById('perma_cc_api').value);\n                localStorage.setItem(\"check_archive_4\", true);\n            }\n\n            if (selected_archives == 0){\n                document.getElementById('errors').innerHTML=\"*Select at least one archive*\";\n                return;\n            }\n    }\n</script>\n</body>\n</html>\n"
  },
  {
    "path": "requirements.txt",
    "content": "flask\nrequests\npathlib\nselenium"
  },
  {
    "path": "setup.py",
    "content": "#!/usr/bin/env python\n\nfrom setuptools import setup, find_packages\nfrom archivenow import __version__\n\nlong_description = open('README.rst').read()\ndesc = \"\"\"A Python library to push web resources into public web archives\"\"\"\n\n\nsetup(\n    name='archivenow',\n    version=__version__,\n    description=desc,\n    long_description=long_description,\n    author='Mohamed Aturban',\n    author_email='maturban@cs.odu.edu',\n    url='https://github.com/maturban/archivenow',\n    packages=find_packages(),\n    license=\"MIT\",\n    classifiers=[\n        'Development Status :: 5 - Production/Stable',\n        'Programming Language :: Python',\n        'Programming Language :: Python :: 2.7',\n        'Programming Language :: Python :: 3',\n        'Programming Language :: Python :: 3.4',\n        'Programming Language :: Python :: 3.5',\n        'Programming Language :: Python :: 3.6',\n        'License :: OSI Approved :: MIT License'\n    ],\n    install_requires=[\n        'flask',\n        'requests'\n    ],\n    package_data={\n        'archivenow': [\n            'handlers/*.*',\n            'templates/*.*',\n            'static/*.*'\n          ]\n    },\n    entry_points='''\n        [console_scripts]\n        archivenow=archivenow.archivenow:args_parser\n    '''   \n)\n"
  }
]