Repository: oduwsdl/archivenow
Branch: master
Commit: dbc688f4f238
Files: 16
Total size: 52.6 KB
Directory structure:
gitextract_3tv8j6jl/
├── .dockerignore
├── .gitignore
├── Dockerfile
├── LICENSE
├── README.rst
├── archivenow/
│ ├── __init__.py
│ ├── archivenow.py
│ ├── handlers/
│ │ ├── cc_handler.py
│ │ ├── ia_handler.py
│ │ ├── is_handler.py
│ │ ├── mg_handler.py
│ │ └── warc_handler.py
│ └── templates/
│ ├── api.txt
│ └── index.html
├── requirements.txt
└── setup.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .dockerignore
================================================
.git
.gitignore
LICENSE
Dockerfile
================================================
FILE: .gitignore
================================================
.DS_Store
archivenow.egg-info/
build/
dist/
__pycache__
================================================
FILE: Dockerfile
================================================
ARG PYTAG=latest
FROM python:${PYTAG}
LABEL maintainer "Mohamed Aturban <mohsci1@yahoo.com>"
WORKDIR /app
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
COPY . ./
RUN chmod a+x ./archivenow/archivenow.py
ENTRYPOINT ["./archivenow/archivenow.py"]
================================================
FILE: LICENSE
================================================
MIT License
Copyright (c) 2017 ODU Web Science / Digital Libraries Research Group
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
================================================
FILE: README.rst
================================================
Archive Now (archivenow)
=============================
A Tool To Push Web Resources Into Web Archives
----------------------------------------------
Archive Now (**archivenow**) is currently configured to push resources into four public web archives. You can easily add more archives by writing a new archive handler (e.g., myarchive_handler.py) and placing it inside the "handlers" folder.
Update January 2021
~~~~~~~~~~~~~~~~~~~
Originally, **archivenow** was configured to push to six public web archives. Two of them have since been removed: `WebCite <https://www.webcitation.org/>`_, which no longer accepts archiving requests, and `archive.st <http://archive.st/>`_, which now presents a Captcha when a push is attempted. In addition, the method for pushing to `archive.today <https://archive.vn/>`_ and `megalodon.jp <https://megalodon.jp/>`_ has changed: **archivenow** now uses `Selenium <https://selenium-python.readthedocs.io/>`_ to push to these two archives.
As explained below, this library can be used through:
- Command Line Interface (CLI)
- A Web Service
- A Docker Container
- Python
Installing
----------
The latest release of **archivenow** can be installed using pip:
.. code-block:: bash
$ pip install archivenow
The latest development version containing changes not yet released can be installed from source:
.. code-block:: bash
$ git clone git@github.com:oduwsdl/archivenow.git
$ cd archivenow
$ pip install -r requirements.txt
$ pip install ./
Pushing to `archive.today <https://archive.vn/>`_ and `megalodon.jp <https://megalodon.jp/>`_ requires `Selenium <https://selenium-python.readthedocs.io/>`_, which is already listed in requirements.txt. Selenium additionally needs a driver to interface with the chosen browser; it is recommended to use **archivenow** with the latest versions of `Firefox <https://www.mozilla.org/en-US/firefox/releases/>`_ and its corresponding `GeckoDriver <https://github.com/mozilla/geckodriver/releases>`_. Once the driver is installed, **archivenow** can push to both archives.
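Before pushing to these two archives, you can check that Selenium will be able to locate the driver. The sketch below uses only the Python standard library; the helper name is ours, and Selenium itself performs an equivalent lookup on PATH:

```python
import shutil

def geckodriver_available():
    """Return True if the GeckoDriver binary is discoverable on PATH."""
    return shutil.which("geckodriver") is not None

if not geckodriver_available():
    print("GeckoDriver not found; pushes to archive.today and megalodon.jp will fail.")
```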
CLI Usage
---------
Usage of **archivenow** and its sub-commands can be viewed by providing the `-h` or `--help` flag, as shown below.
.. code-block:: bash
$ archivenow -h
usage: archivenow.py [-h] [--mg] [--cc] [--cc_api_key [CC_API_KEY]]
[--is] [--ia] [--warc [WARC]] [-v] [--all]
[--server] [--host [HOST]] [--agent [AGENT]]
[--port [PORT]]
[URI]
positional arguments:
URI URI of a web resource
optional arguments:
-h, --help show this help message and exit
--mg Use Megalodon.jp
--cc Use The Perma.cc Archive
--cc_api_key [CC_API_KEY]
An API KEY is required by The Perma.cc Archive
--is Use The Archive.is
--ia Use The Internet Archive
--warc [WARC] Generate WARC file
-v, --version Report the version of archivenow
--all Use all possible archives
--server Run archiveNow as a Web Service
--host [HOST] A server address
--agent [AGENT] Use "wget" or "squidwarc" for WARC generation
--port [PORT] A port number to run a Web Service
Examples
--------
Example 1
~~~~~~~~~
To save the web page (www.foxnews.com) in the Internet Archive:
.. code-block:: bash
$ archivenow --ia www.foxnews.com
https://web.archive.org/web/20170209135625/http://www.foxnews.com
Example 2
~~~~~~~~~
By default, the web page (e.g., www.foxnews.com) will be saved in the Internet Archive if no optional arguments are provided:
.. code-block:: bash
$ archivenow www.foxnews.com
https://web.archive.org/web/20170215164835/http://www.foxnews.com
Example 3
~~~~~~~~~
To save the web page (www.foxnews.com) in the Internet Archive (archive.org) and Archive.is:
.. code-block:: bash
$ archivenow --ia --is www.foxnews.com
https://web.archive.org/web/20170209140345/http://www.foxnews.com
http://archive.is/fPVyc
Example 4
~~~~~~~~~
To save the web page (https://nypost.com/) in all configured web archives and, in addition, create a WARC file locally:
.. code-block:: bash
$ archivenow --all https://nypost.com/ --cc_api_key $Your-Perma-CC-API-Key
http://archive.is/dcnan
https://perma.cc/53CC-5ST8
https://web.archive.org/web/20181002081445/https://nypost.com/
https://megalodon.jp/2018-1002-1714-24/https://nypost.com:443/
https_nypost.com__96ec2300.warc
Example 5
~~~~~~~~~
To download the web page (https://nypost.com/) and create a WARC file:
.. code-block:: bash
$ archivenow --warc=mypage --agent=wget https://nypost.com/
mypage.warc
Server
------
You can run **archivenow** as a web service and optionally specify the server address and/or the port number (e.g., --host localhost --port 12345):
.. code-block:: bash
$ archivenow --server
Running on http://0.0.0.0:12345/ (Press CTRL+C to quit)
Example 6
~~~~~~~~~
To save the web page (www.foxnews.com) in The Internet Archive through the web service:
.. code-block:: bash
$ curl -i http://0.0.0.0:12345/ia/www.foxnews.com
HTTP/1.0 200 OK
Content-Type: application/json
Content-Length: 95
Server: Werkzeug/0.11.15 Python/2.7.10
Date: Tue, 02 Oct 2018 08:20:18 GMT
{
"results": [
"https://web.archive.org/web/20181002082007/http://www.foxnews.com"
]
}
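A client in any language can consume this JSON response. A minimal Python sketch using only the standard library, with the example body shown above pasted in as a string:

```python
import json

# Response body from the web service (copied from the example above).
body = '{"results": ["https://web.archive.org/web/20181002082007/http://www.foxnews.com"]}'

data = json.loads(body)
# Entries that start with "Error" report failures for individual archives.
archived = [u for u in data["results"] if not u.startswith("Error")]
print(archived)
```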
Example 7
~~~~~~~~~
To save the web page (www.foxnews.com) in all configured archives through the web service:
.. code-block:: bash
$ curl -i http://0.0.0.0:12345/all/www.foxnews.com
HTTP/1.0 200 OK
Content-Type: application/json
Content-Length: 385
Server: Werkzeug/0.11.15 Python/2.7.10
Date: Tue, 02 Oct 2018 08:23:53 GMT
{
"results": [
"Error (The Perma.cc Archive): An API Key is required ",
"http://archive.is/ukads",
"https://web.archive.org/web/20181002082007/http://www.foxnews.com",
"Error (Megalodon.jp): We can not obtain this page because the time limit has been reached or for technical ... ",
"http://www.webcitation.org/72rbKsX8B"
]
}
Example 8
~~~~~~~~~
Because an API Key is required by Perma.cc, the HTTP request should be as follows:
.. code-block:: bash
$ curl -i http://127.0.0.1:12345/all/https://nypost.com/?cc_api_key=$Your-Perma-CC-API-Key
Or use only Perma.cc:
.. code-block:: bash
$ curl -i http://127.0.0.1:12345/cc/https://nypost.com/?cc_api_key=$Your-Perma-CC-API-Key
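Note that unquoted `?` and `$` characters may be interpreted by the shell before curl sees them, so quoting the URL is safer. The same request URL can also be built from Python with the standard library; the key value below is a placeholder, not a real API key:

```python
from urllib.parse import urlencode

api_key = "YOUR-PERMA-CC-API-KEY"  # placeholder, not a real key
target = "https://nypost.com/"

# The service expects /{archive-id}/{URI}, with extra arguments as a query string.
url = "http://127.0.0.1:12345/cc/" + target + "?" + urlencode({"cc_api_key": api_key})
print(url)
```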
Running as a Docker Container
-----------------------------
.. code-block:: bash
$ docker image pull oduwsdl/archivenow
There are different ways to run **archivenow**. To show the help message:
.. code-block:: bash
$ docker container run -it --rm oduwsdl/archivenow -h
Accessible at 127.0.0.1:12345:
.. code-block:: bash
$ docker container run -p 12345:12345 -it --rm oduwsdl/archivenow --server --host 0.0.0.0
Accessible at 127.0.0.1:22222:
.. code-block:: bash
$ docker container run -p 22222:11111 -it --rm oduwsdl/archivenow --server --port 11111 --host 0.0.0.0
.. image:: http://www.cs.odu.edu/~maturban/archivenow-6-archives.gif
:width: 10pt
To save the web page (http://www.cnn.com) in The Internet Archive:
.. code-block:: bash
$ docker container run -it --rm oduwsdl/archivenow --ia http://www.cnn.com
Python Usage
------------
.. code-block:: python
>>> from archivenow import archivenow
Example 9
~~~~~~~~~~
To save the web page (www.foxnews.com) in all configured archives:
.. code-block:: python
>>> archivenow.push("www.foxnews.com","all")
['https://web.archive.org/web/20170209145930/http://www.foxnews.com','http://archive.is/oAjuM','http://www.webcitation.org/6o9LcQoVV','Error (The Perma.cc Archive): An API KEY is required']
Example 10
~~~~~~~~~~
To save the web page (www.foxnews.com) in Perma.cc:
.. code-block:: python
>>> archivenow.push("www.foxnews.com","cc",{"cc_api_key":"$YOUR-Perma-cc-API-KEY"})
['https://perma.cc/8YYC-C7RM']
Example 11
~~~~~~~~~~
To start the server from Python, do the following. The host and/or port number can be passed as arguments (e.g., start(port=1111, host='localhost')):
.. code-block:: python
>>> archivenow.start()
2017-02-09 15:02:37
Running on http://127.0.0.1:12345
(Press CTRL+C to quit)
Configuring a new archive or removing an existing one
-----------------------------------------------------
Additional archives may be added by creating a handler file in the "handlers" directory.
For example, if I want to add a new archive named "My Archive", I would create a file "ma_handler.py" and store it in the folder "handlers". The "ma" will be the archive identifier, so to push a web page (e.g., www.cnn.com) to this archive through the Python code, I should write:
.. code-block:: python
archivenow.push("www.cnn.com","ma")
In the file "ma_handler.py", the name of the class must be "MA_handler". This class must have at least one function called "push", whose first argument is the URI to be pushed. See the existing `handler files`_ for examples of how to organize a newly configured archive handler.
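A minimal skeleton for the hypothetical "ma_handler.py" might look like the following. The attributes mirror the existing handlers, the push body is a placeholder, and the ``**kwargs`` absorbs the extra keyword arguments (such as a requests session) that **archivenow** passes to handlers:

```python
class MA_handler(object):
    def __init__(self):
        # Attributes archivenow expects, mirroring the existing handlers.
        self.enabled = True        # set to False to disable this archive
        self.name = 'My Archive'
        self.api_required = False  # True if push() needs an ma_api_key

    def push(self, uri_org, p_args={}, **kwargs):
        # Submit uri_org to the archive here. On success, return the URI of
        # the archived copy; on failure, return an "Error (...)" string.
        return 'Error (' + self.name + '): not implemented'
```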
Removing an archive can be done by one of the following options:
- Removing the archive handler file from the folder "handlers"
- Renaming the archive handler file to a name that does not end with "_handler.py"
- Setting the variable "enabled" to "False" inside the handler file
Notes
-----
The Internet Archive (IA) sets a time gap of at least two minutes between creating different copies of the "same" resource.
For example, if you send a request to IA to capture (www.cnn.com) at 10:00pm, IA will create a new copy (*C*) of this URI. IA will then return *C* for all requests to the archive for this URI received until 10:02pm. Using this same submission procedure for Archive.is requires a time gap of five minutes.
.. _handler files: https://github.com/oduwsdl/archivenow/tree/master/archivenow/handlers
Citing Project
--------------
.. code-block:: latex
@INPROCEEDINGS{archivenow-jcdl2018,
AUTHOR = {Mohamed Aturban and
Mat Kelly and
Sawood Alam and
John A. Berlin and
Michael L. Nelson and
Michele C. Weigle},
TITLE = {{ArchiveNow}: Simplified, Extensible, Multi-Archive Preservation},
BOOKTITLE = {Proceedings of the 18th {ACM/IEEE-CS} Joint Conference on Digital Libraries},
SERIES = {{JCDL} '18},
PAGES = {321--322},
MONTH = {June},
YEAR = {2018},
ADDRESS = {Fort Worth, Texas, USA},
URL = {https://doi.org/10.1145/3197026.3203880},
DOI = {10.1145/3197026.3203880}
}
================================================
FILE: archivenow/__init__.py
================================================
__version__ = '2020.7.18.12.19.44'
================================================
FILE: archivenow/archivenow.py
================================================
#!/usr/bin/env python
import os
import re
import sys
import uuid
import glob
import json
import importlib
import argparse
import string
import requests
from threading import Thread
from flask import request, Flask, jsonify, render_template
from pathlib import Path
#from __init__ import __version__ as archiveNowVersion
archiveNowVersion = '2020.7.18.12.19.44'
# archive handlers path
PATH = Path(os.path.dirname(os.path.abspath(__file__)))
PATH_HANDLER = PATH / 'handlers'
# for the web app
app = Flask(__name__)
# create handlers for enabled archives
global handlers
handlers = {}
# default values for server/port
SERVER_IP = '0.0.0.0'
SERVER_PORT = 12345
def bad_request(error=None):
message = {
'status': 400,
'message': 'Error in processing the request',
}
resp = jsonify(message)
resp.status_code = 400
return resp
# def getServer_IP_PORT():
# u = str(SERVER_IP)
# if str(SERVER_PORT) != '80':
# u = u + ":" + str(SERVER_PORT)
# if 'http' != u[0:4]:
# u = 'http://' + u
# return u
def listArchives_server(handlers):
uri_args = ''
if 'cc' in handlers:
if handlers['cc'].enabled and handlers['cc'].api_required:
uri_args = '?cc_api_key={Your-Perma.cc-API-Key}'
li = {"archives": [{ # getServer_IP_PORT() +
"id": "all", "GET":'/all/' + '{URI}'+uri_args,
"archive-name": "All enabled archives"}]}
for handler in handlers:
if handlers[handler].enabled:
uri_args2 = ''
if handler == 'cc':
uri_args2 = uri_args
li["archives"].append({ #getServer_IP_PORT() +
"id": handler, "archive-name": handlers[handler].name,
"GET": '/' + handler + '/' + '{URI}'+uri_args2})
return li
@app.route('/', defaults={'path': ''}, methods=['GET'])
@app.route('/<path:path>', methods=['GET'])
def pushit(path):
# no path; render the main page
if path == '':
#resp = jsonify(listArchives_server(handlers))
#resp.status_code = 200
return render_template('index.html')
#return resp
# get request with path
elif (path == 'api'):
resp = jsonify(listArchives_server(handlers))
resp.status_code = 200
return resp
elif (path == "ajax-loader.gif"):
return render_template('ajax-loader.gif')
else:
try:
# get the args passed to push function like API KEY if provided
PUSH_ARGS = {}
for k in request.args.keys():
PUSH_ARGS[k] = request.args[k]
s = str(path).split('/', 1)
arc_id = s[0]
URI = request.url.split('/', 4)[4] # include query params, too
if 'herokuapp.com' in request.host:
PUSH_ARGS['from_heroku'] = True
# To push into archives
resp = {"results": push(URI, arc_id, PUSH_ARGS)}
if len(resp["results"]) == 0:
return bad_request()
else:
# what to return
resp = jsonify(resp)
resp.status_code = 200
return resp
except Exception as e:
pass
return bad_request()
res_uris = {}
def push_proxy(hdlr, URIproxy, p_args_proxy, res_uris_idx, session=requests.Session()):
global res_uris
try:
res = hdlr.push( URIproxy , p_args_proxy, session=session)
print ( res )
res_uris[res_uris_idx].append(res)
except:
pass
def push(URI, arc_id, p_args={}, session=requests.Session()):
global handlers
global res_uris
try:
# push to all possible archives
res_uris_idx = str(uuid.uuid4())
res_uris[res_uris_idx] = []
### if arc_id == 'all':
### for handler in handlers:
### if (handlers[handler].api_required):
# pass args like key API
### res.append(handlers[handler].push(str(URI), p_args))
### else:
### res.append(handlers[handler].push(str(URI)))
### else:
# push to the chosen archives
threads = []
for handler in handlers:
if (arc_id == handler) or (arc_id == 'all'):
### if (arc_id == handler): ### and (handlers[handler].api_required):
#res.append(handlers[handler].push(str(URI), p_args))
#push_proxy( handlers[handler], str(URI), p_args, res_uris_idx)
threads.append(
Thread(
target=push_proxy,
args=(handlers[handler], str(URI), p_args, res_uris_idx, ),
kwargs={'session': session}))
### elif (arc_id == handler):
### res.append(handlers[handler].push(str(URI)))
for th in threads:
th.start()
for th in threads:
th.join()
res = res_uris[res_uris_idx]
del res_uris[res_uris_idx]
return res
except:
del res_uris[res_uris_idx]
pass
return ["bad request"]
def start(port=SERVER_PORT, host=SERVER_IP):
global SERVER_PORT
global SERVER_IP
SERVER_PORT = port
SERVER_IP = host
app.run(
host=host,
port=port,
threaded=True,
debug=True,
use_reloader=False)
def load_handlers():
global handlers
handlers = {}
# add the path of the handlers to the system so they can be imported
sys.path.append(str(PATH_HANDLER))
# create a list of handlers.
for file in PATH_HANDLER.glob('*_handler.py'):
name = file.stem
prefix = name.replace('_handler', '')
mod = importlib.import_module(name)
mod_class = getattr(mod, prefix.upper() + '_handler')
# finally an object is created
handlers[prefix] = mod_class()
# exclude all disabled archives
for handler in list(handlers): # handlers.keys():
if not handlers[handler].enabled:
del handlers[handler]
def args_parser():
global SERVER_PORT
global SERVER_IP
# parsing arguments
class MyParser(argparse.ArgumentParser):
def error(self, message):
sys.stderr.write('error: %s\n' % message)
self.print_help()
sys.exit(2)
def printm(self):
sys.stderr.write('')
self.print_help()
sys.exit(2)
parser = MyParser()
# arc_handler = 0
for handler in handlers:
# add archives identifiers to the list of options
# arc_handler += 1
if handler == 'warc':
parser.add_argument('--' + handler, nargs='?',
help=handlers[handler].name)
else:
parser.add_argument('--' + handler, action='store_true', default=False,
help='Use ' + handlers[handler].name)
if (handlers[handler].api_required):
parser.add_argument(
'--' +
handler +
'_api_key',
nargs='?',
help='An API KEY is required by ' +
handlers[handler].name)
parser.add_argument(
'-v',
'--version',
help='Report the version of archivenow',
action='version',
version='ArchiveNow ' +
archiveNowVersion)
if len(handlers) > 0:
parser.add_argument('--all', action='store_true', default=False,
help='Use all possible archives ')
parser.add_argument('--server', action='store_true', default=False,
help='Run archiveNow as a Web Service ')
parser.add_argument('URI', nargs='?', help='URI of a web resource')
parser.add_argument('--host', nargs='?', help='A server address')
if 'warc' in handlers.keys():
parser.add_argument('--agent', nargs='?', help='Use "wget" or "squidwarc" for WARC generation')
parser.add_argument(
'--port',
nargs='?',
help='A port number to run a Web Service')
args = parser.parse_args()
else:
print ('\n Error: No enabled archive handler found\n')
sys.exit(0)
arc_opt = 0
# start the server
if getattr(args, 'server'):
if getattr(args, 'port'):
SERVER_PORT = int(args.port)
if getattr(args, 'host'):
SERVER_IP = str(args.host)
start(port=SERVER_PORT, host=SERVER_IP)
else:
if not getattr(args, 'URI'):
print (parser.error('too few arguments'))
res = []
# get the args passed to push function like API KEY if provided
PUSH_ARGS = {}
for handler in handlers:
if (handlers[handler].api_required):
if getattr(args, handler + '_api_key'):
PUSH_ARGS[
handler +
'_api_key'] = getattr(
args,
handler +
'_api_key')
else:
if getattr(args, handler):
print (
parser.error(
'An API Key is required by ' +
handlers[handler].name))
orginal_warc_value = getattr(args, 'warc')
if handler == 'warc':
PUSH_ARGS['warc'] = getattr(args, 'warc')
if PUSH_ARGS['warc'] == None:
valid_chars = "-_.()/ %s%s" % (string.ascii_letters, string.digits)
PUSH_ARGS['warc'] = ''.join(c for c in str(args.URI).strip() if c in valid_chars)
PUSH_ARGS['warc'] = PUSH_ARGS['warc'].replace(' ','_').replace('/','_').replace('__','_') # I don't like spaces in filenames.
PUSH_ARGS['warc'] = PUSH_ARGS['warc']+'_'+str(uuid.uuid4())[:8]
if PUSH_ARGS['warc'][-1] == '_':
PUSH_ARGS['warc'] = PUSH_ARGS['warc'][:-1]
agent = 'wget'
tmp_agent = getattr(args, 'agent')
if tmp_agent == 'squidwarc':
agent = tmp_agent
PUSH_ARGS['agent'] = agent
# sys.exit(0)
# push to all possible archives
if getattr(args, 'all'):
arc_opt = 1
res = push(str(args.URI).strip(), 'all', PUSH_ARGS)
else:
# push to the chosen archives
for handler in handlers:
if getattr(args, handler):
arc_opt += 1
for i in push(str(args.URI).strip(), handler, PUSH_ARGS):
res.append(i)
# push to the default archive
if (len(handlers) > 0) and (arc_opt == 0):
# set the default: 'ia' if available, otherwise the first
# archive in the list
if 'ia' in handlers:
res = push(str(args.URI).strip(), 'ia', PUSH_ARGS)
else:
res = push(str(args.URI).strip(),
           list(handlers)[0], PUSH_ARGS)
# print (parser.printm())
# else:
# for rs in res:
# print (rs)
load_handlers()
if __name__ == '__main__':
args_parser()
================================================
FILE: archivenow/handlers/cc_handler.py
================================================
import requests
import json
class CC_handler(object):
def __init__(self):
self.enabled = True
self.name = 'The Perma.cc Archive'
self.api_required = True
def push(self, uri_org, p_args=[], session=requests.Session()):
msg = ''
try:
APIKEY = p_args['cc_api_key']
r = session.post('https://api.perma.cc/v1/archives/?api_key='+APIKEY, timeout=120,
data=json.dumps({"url":uri_org}),
headers={'Content-type': 'application/json'},
allow_redirects=True)
r.raise_for_status()
if 'Location' in r.headers:
return 'https://perma.cc/'+r.headers['Location'].rsplit('/',1)[1]
else:
for r2 in r.history:
if 'Location' in r2.headers:
return 'https://perma.cc/'+r2.headers['Location'].rsplit('/',1)[1]
entity_json = r.json()
if 'guid' in entity_json:
return str('https://perma.cc/'+entity_json['guid'])
msg = "Error ("+self.name+ "): No HTTP Location header is returned in the response"
except Exception as e:
if (msg == '') and ('_api_key' in str(e)):
msg = "Error (" + self.name+ "): " + 'An API Key is required '
elif (msg == ''):
msg = "Error (" + self.name+ "): " + str(e)
pass;
return msg
================================================
FILE: archivenow/handlers/ia_handler.py
================================================
import requests
class IA_handler(object):
def __init__(self):
self.enabled = True
self.name = 'The Internet Archive'
self.api_required = False
def push(self, uri_org, p_args=[], session=requests.Session()):
msg = ''
try:
uri = 'https://web.archive.org/save/' + uri_org
archiveTodayUserAgent = {
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36"
}
# push into the archive
# r = session.get(uri, timeout=120, allow_redirects=True, headers=archiveTodayUserAgent)
if ('user-agent' in session.headers) and (not session.headers['User-Agent'].lower().startswith('python-requests/')):
r = session.get(uri, timeout=120, allow_redirects=True)
else:
r = session.get(uri, timeout=120, allow_redirects=True, headers=archiveTodayUserAgent)
r.raise_for_status()
# extract the link to the archived copy
if (r != None):
if "Location" in r.headers:
return r.headers["Location"]
elif "Content-Location" in r.headers:
if (r.headers["Content-Location"]).startswith("/web/"):
return "https://web.archive.org"+r.headers["Content-Location"]
else:
try:
uri_from_content = "https://web.archive.org" + r.text.split('var redirUrl = "',1)[1].split('"',1)[0]
except:
uri_from_content = r.headers["Content-Location"]
#pass;
return uri_from_content
else:
for r2 in r.history:
if 'Location' in r2.headers:
return r.url
#return r2.headers['Location']
if 'Content-Location' in r2.headers:
return r.url
#return r2.headers['Content-Location']
msg = "Error ("+self.name+ "): No HTTP Location/Content-Location header is returned in the response"
except Exception as e:
if msg == '':
msg = "Error (" + self.name+ "): " + str(e)
pass
return msg
================================================
FILE: archivenow/handlers/is_handler.py
================================================
import os
import requests
import sys
from selenium.webdriver.firefox.options import Options
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
class IS_handler(object):
def __init__(self):
self.enabled = True
self.name = 'The Archive.is'
self.api_required = False
def push(self, uri_org, p_args=[], session=requests.Session()):
msg = ""
try:
options = Options()
options.headless = True # Run in background
driver = webdriver.Firefox(options = options)
driver.get("https://archive.is")
elem = driver.find_element_by_id("url") # Find the form to place a URL to be archived
elem.send_keys(uri_org) # Place the URL in the input box
saveButton = driver.find_element_by_xpath("/html/body/center/div/form[1]/div[3]/input") # Find the submit button
saveButton.click() # Click the submit button
# After clicking submit, there may be an additional page that pops up and asks if you are sure you want
# to archive that page since it was archived X amount of time ago. We need to wait for that page to
# load and click submit again.
delay = 30 # seconds
try:
nextSaveButton = WebDriverWait(driver, delay).until(EC.presence_of_element_located((By.XPATH, "/html/body/center/div[4]/center/div/div[2]/div/form/div/input")))
nextSaveButton.click()
except TimeoutException:
pass
# The page takes a while to archive, so keep checking if the loading page is still displayed.
loading = True
while loading:
if not 'wip' in driver.current_url and not 'submit' in driver.current_url:
loading = False
# After the loading screen is gone and the page is archived, the current URL
# will be the URL to the archived page.
msg = driver.current_url;
driver.quit()
except:
'''
exc_type, exc_obj, exc_tb = sys.exc_info()
fname = os.path.split(exc_tb.tb_frame.f_code.co_filename)[1]
print((fname, exc_tb.tb_lineno, sys.exc_info() ))
'''
msg = "Unable to complete request."
return msg
================================================
FILE: archivenow/handlers/mg_handler.py
================================================
# encoding: utf-8
import os
import requests
import sys
from selenium.webdriver.firefox.options import Options
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
class MG_handler(object):
def __init__(self):
self.enabled = True
self.name = 'Megalodon.jp'
self.api_required = False
def push(self, uri_org, p_args=[], session=requests.Session()):
msg = ""
options = Options()
options.headless = True # Run in background
driver = webdriver.Firefox(options = options)
driver.get("https://megalodon.jp/?url=" + uri_org)
try:
addButton = driver.find_element_by_xpath("/html/body/div[2]/div[2]/div[8]/form/div[1]/input[2]")
addButton.click() # Click the add button
except :
print("Unable to archive this page at this time.")
raise
stillOnPage = True
while stillOnPage:
try:
button = driver.find_element_by_xpath("/html/body/div[2]/div[2]/div[1]/div/h3")
except:
stillOnPage = False
try:
error = driver.find_element_by_xpath("/html/body/div[2]/div[2]/div[3]/div/a/h3")
msg = "We apologize for the inconvenience. Currently, acquisitions that are considered \"robots\" in the acquisition of certain conditions are prohibited."
raise
sys.exit()
except:
pass
# The page takes a while to archive, so keep checking if the loading page is still displayed.
loading = True
while loading:
try:
loadingPage = driver.find_element_by_xpath("/html/body/div[2]/div/div[1]/a/img")
loading = False
except:
loading = True
# After the loading screen is gone and the page is archived, the current URL
# will be the URL to the archived page.
if msg == "":
    msg = driver.current_url
driver.quit()
return msg
================================================
FILE: archivenow/handlers/warc_handler.py
================================================
import requests
import os.path
import distutils.spawn
class WARC_handler(object):
def __init__(self):
self.enabled = True
self.name = 'Generate WARC file'
self.api_required = False
def push(self, uri_org, p_args=[], session=requests.Session()):
msg = ''
if p_args['agent'] == 'squidwarc':
# squidwarc
#if not distutils.spawn.find_executable("squidwarc"):
# return 'wget is not installed!'
os.system('python ~/squidwarc_one_page/generte_warcs.py 9222 "'+uri_org+'" '+p_args['warc']+'.warc &> /dev/null')
if os.path.exists(p_args['warc']):
return p_args['warc']
elif os.path.exists(p_args['warc']+'.warc'):
return p_args['warc']+'.warc'
else:
return 'squidwarc failed to generate the WARC file'
else:
if not distutils.spawn.find_executable("wget"):
return 'wget is not installed!'
# wget
os.system('wget -E -H -k -p -q --delete-after --no-warc-compression --warc-file="'+p_args['warc']+'" "'+uri_org+'"')
if os.path.exists(p_args['warc']):
return p_args['warc']
elif os.path.exists(p_args['warc']+'.warc'):
return p_args['warc']+'.warc'
else:
return 'wget failed to generate the WARC file'
================================================
FILE: archivenow/templates/api.txt
================================================
<!-- <h4 id="archivenow_api">Archive Now API</h4>
<h5 id="archivenow_api1">To push a web page into particular web archive, use the following URL:</h5>
<pre>
http://{server}:{port}/{archive-id}/{URI}
</pre>
<h5 id="archivenow_api2">Archive identifier (use "all" for all archives):</h5>
<table style="width:30%">
<tr>
<th>Archive</th>
<th>Identifier</th>
</tr>
<tr>
<td>Internet Archive</td>
<td>ia</td>
</tr>
<tr>
<td>Archive.is</td>
<td>is</td>
</tr>
<tr>
<td>Perma.cc</td>
<td>cc</td>
</tr>
</table> -->
<!-- <h5 id="archivenow_api3">Example, capture http://www.example.com by Internet Archive: </h5>
<pre>
curl -i http://127.0.0.1:12345/ia/http://www.example.com
HTTP/1.0 200 OK
Content-Type: application/json
Content-Length: 95
Server: Werkzeug/0.11.15 Python/2.7.10
Date: Fri, 10 Nov 2017 22:36:26 GMT
{
"results": [
"https://web.archive.org/web/20171110223626/http://www.example.com"
]
}
</pre>
<h5 id="archivenow_api4">Example, capture http://www.example.com by all four archive (An API KEY is required by Perma.cc): </h5>
<pre>
curl -i 127.0.0.1:12345/all/http://www.example.com?cc_api_key=8r820...
HTTP/1.0 200 OK
Content-Type: application/json
Content-Length: 207
Server: Werkzeug/0.11.15 Python/2.7.10
Date: Fri, 10 Nov 2017 22:42:08 GMT
{
"results": [
"https://perma.cc/QX65-CFDD",
"https://web.archive.org/web/20171110223626/http://www.example.com",
"http://archive.is/ff17A",
"http://www.webcitation.org/6uschXwlI"
]
}
</pre> -->
================================================
FILE: archivenow/templates/index.html
================================================
<html>
<head>
<style>
.reveal-if-active {
opacity: 0;
max-height: 0;
overflow: hidden;
font-size: 14px;
-webkit-transform: scale(0.8);
transform: scale(0.8);
-webkit-transition: 0.5s;
transition: 0.5s;
}
.reveal-if-active label {
margin: 0 0 3px 22px;
display: block;
font-size: smaller;
}
.reveal-if-active input[type=text] {
width: 300px;
}
input[id="choice-archive4"]:checked ~ .reveal-if-active {
opacity: 1;
max-height: 120px;
padding: 0px 0px;
-webkit-transform: scale(1);
transform: scale(1);
overflow: visible;
}
table {
margin: 14px auto;
opacity: 0;
}
table, th, td {
border-collapse: collapse;
}
th, td {
padding: 1px;
text-align: left;
font-family: "My Custom Font", Verdana, Tahoma;
font-size: 12px;
}
tr{
border-bottom: 1px solid #ccc;
border-top: 1px solid #ccc;
}
#title {
display: block;
text-align: center;
padding: 22px 0 0 0
}
.url{
display: block;
text-align: center;
}
#text_url{
width:333px;
font-size: 12.5px;
}
#select_label{
padding: 0px 270px 0 0;
text-align: center;
margin-bottom: 8px;
}
#choices{
text-align: center;
padding: 0px 20px 0px 0px;
margin-left: 133px;
}
#choices2{
text-align: left;
display: inline-block;
}
#perma_cc_api{
margin: -2px 93px 0px 21px;
}
#submitdiv{
text-align: center;
padding: 20px 242px 0 0;
margin: 0 0 0 38px;
}
input[type=submit] {
width: 5em;
height: 2em;
font-size: 12px;
background-color: gainsboro;
margin: 0px 0px 0px 13px;
}
#errors{
font-size: smaller;
color: brown;
padding: 6px 0px 3px 104px;
}
.img1{
width: 13px;
opacity: 0;
}
.img2{
width: 13px;
opacity: 0;
}
.img3{
width: 13px;
opacity: 0;
}
.img5{
width: 13px;
opacity: 0;
}
.img6{
width: 13px;
opacity: 0;
}
.img4{
width: 13px;
opacity: 0;
}
#apilink{
font-size: smaller;
padding-top: 39px;
}
</style>
</head>
<body>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js" type="text/javascript"></script>
<h3 id="title"> Preserve a web page in web archives </h3>
<div class="url">
<label for="text_url" id="label_url">URL</label>
<input type="text" id="text_url" required>
</div>
<div>
<p id="select_label">Select archives:</p>
<div id="choices">
<div id="choices2">
<input type="checkbox" id="choice-archive1" checked > Internet Archive <img src={{ url_for('static', filename = "ajax-loader.gif") }} class="img1" id="img1"> <br>
<input type="checkbox" id="choice-archive2" checked > Archive.is <img src={{ url_for('static', filename = "ajax-loader.gif") }} class="img2" id="img2"> <br>
<input type="checkbox" id="choice-archive6" checked > Megalodon.jp <img src={{ url_for('static', filename = "ajax-loader.gif") }} class="img6" id="img6"> <br>
<input type="checkbox" id="choice-archive4" > Perma.cc <img src={{ url_for('static', filename = "ajax-loader.gif") }} class="img4" id="img4">
<div class="reveal-if-active">
<label for="perma_cc_api">Perma.cc requires <a href="https://perma.cc/settings/tools" target="_blank"> an API Key </a></label>
<input type="text" id="perma_cc_api">
</div>
</div>
</div>
</div>
<div id="submitdiv">
<input type="submit" value="Submit" onClick="push_archive();">
<input type="submit" value="Reset" onClick="reset();">
<div id ="errors"></div>
</div>
<table id="results" width="600">
<thead>
<tr>
<th scope="col" width="130">Archive</th>
<th scope="col" width="450">Link to the archived page</th>
</tr>
</thead>
</table>
<div id="apilink"><a href="/api" target="_blank">Archive Now API</a></div>
<script type="text/javascript">
document.getElementById('perma_cc_api').value = localStorage.getItem("permaccapikey") || "";
if (localStorage.getItem("check_archive_1") !== null){
if (localStorage.getItem("check_archive_1") == 'true'){
document.getElementById('choice-archive1').checked = true
}else{
document.getElementById('choice-archive1').checked = false
}
}
if (localStorage.getItem("check_archive_2") !== null){
if (localStorage.getItem("check_archive_2") == 'true'){
document.getElementById('choice-archive2').checked = true
}else{
document.getElementById('choice-archive2').checked = false
}
}
if (localStorage.getItem("check_archive_6") !== null){
if (localStorage.getItem("check_archive_6") == 'true'){
document.getElementById('choice-archive6').checked = true
}else{
document.getElementById('choice-archive6').checked = false
}
}
if (localStorage.getItem("check_archive_4") !== null){
if (localStorage.getItem("check_archive_4") == 'true'){
document.getElementById('choice-archive4').checked = true
}else{
document.getElementById('choice-archive4').checked = false
}
}
function reset() {
window.location.reload();
}
function push_archive() {
document.getElementById('errors').innerHTML="";
localStorage.setItem("check_archive_1", false);
localStorage.setItem("check_archive_2", false);
localStorage.setItem("check_archive_6", false);
localStorage.setItem("check_archive_4", false);
var arr = []
var table = document.getElementById('results');
for (var r = 1, n = table.rows.length; r < n; r++) {
if(table.rows[r].cells[0].innerHTML.indexOf("https://archive.org") !== -1){
arr.push("ia");
}
if(table.rows[r].cells[0].innerHTML.indexOf("https://archive.is") !== -1){
arr.push("is");
}
if(table.rows[r].cells[0].innerHTML.indexOf("https://megalodon.jp") !== -1){
arr.push("mg");
}
if(table.rows[r].cells[0].innerHTML.indexOf("https://www.webcitation.org") !== -1){
arr.push("wc");
}
if(table.rows[r].cells[0].innerHTML.indexOf("https://perma.cc") !== -1){
arr.push("cc");
}
}
function validateURL(textval) {
var urlregex = /^(https?|ftp):\/\/(((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:)*@)?(((\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]))|((([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?)(:\d*)?)(\/((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)+(\/(([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)*)*)?)?(\?((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)|[\uE000-\uF8FF]|\/|\?)*)?(\#((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)|\/|\?)*)?$/i;
return urlregex.test(textval);
}
if (validateURL(document.getElementById('text_url').value) == false){
document.getElementById('text_url').focus();
document.getElementById('errors').innerHTML="*Enter a correct URL*";
return;
}
if (document.getElementById('choice-archive4').checked == true){ // perma.cc
if(document.getElementById('perma_cc_api').value.trim() == ""){
document.getElementById('perma_cc_api').focus();
document.getElementById('errors').innerHTML="*Enter your Perma.cc API Key*";
return;
}
}
var selected_archives = 0;
if (document.getElementById('choice-archive1').checked == true){
selected_archives = selected_archives + 1;
if(arr.indexOf("ia") == -1){
document.getElementById('img1').style.opacity = 1
$.ajax({
type: "GET",
url: "ia/"+document.getElementById('text_url').value,
success: function(json) {
if (validateURL(json['results'][0]) == true){
var table=document.getElementById("results");
var row=table.insertRow(-1);
var cell1=row.insertCell(0);
var cell2=row.insertCell(1);
cell1.innerHTML='<a href="https://archive.org" target="_blank"> Internet Archive </a>'
cell2.innerHTML='<a href="'+json['results'][0]+'" target="_blank"> '+json['results'][0]+' </a>'
document.getElementById('results').style.opacity = 1
document.getElementById('img1').style.opacity = 0
}
},
complete: function(){
document.getElementById('img1').style.opacity = 0
}
});
}
localStorage.setItem("check_archive_1", true);
}
if (document.getElementById('choice-archive2').checked == true){
selected_archives = selected_archives + 1;
if(arr.indexOf("is") == -1){
document.getElementById('img2').style.opacity = 1
$.ajax({
type: "GET",
url: "is/"+document.getElementById('text_url').value,
success: function(json) {
if (validateURL(json['results'][0]) == true){
var table=document.getElementById("results");
var row=table.insertRow(-1);
var cell1=row.insertCell(0);
var cell2=row.insertCell(1);
cell1.innerHTML='<a href="https://archive.is" target="_blank"> Archive.is </a>'
cell2.innerHTML='<a href="'+json['results'][0]+'" target="_blank"> '+json['results'][0]+' </a>'
document.getElementById('results').style.opacity = 1
document.getElementById('img2').style.opacity = 0
}
},
complete: function(){
document.getElementById('img2').style.opacity = 0
}
});
}
localStorage.setItem("check_archive_2", true);
}
if (document.getElementById('choice-archive6').checked == true){
selected_archives = selected_archives + 1;
if(arr.indexOf("mg") == -1){
document.getElementById('img6').style.opacity = 1
$.ajax({
type: "GET",
url: "mg/"+document.getElementById('text_url').value,
success: function(json) {
if (validateURL(json['results'][0]) == true){
var table=document.getElementById("results");
var row=table.insertRow(-1);
var cell1=row.insertCell(0);
var cell2=row.insertCell(1);
cell1.innerHTML='<a href="https://megalodon.jp" target="_blank"> Megalodon.jp </a>'
cell2.innerHTML='<a href="'+json['results'][0]+'" target="_blank"> '+json['results'][0]+' </a>'
document.getElementById('results').style.opacity = 1
document.getElementById('img6').style.opacity = 0
}
},
complete: function(){
document.getElementById('img6').style.opacity = 0
}
});
}
localStorage.setItem("check_archive_6", true);
}
if (document.getElementById('choice-archive4').checked == true){
selected_archives = selected_archives + 1;
if(arr.indexOf("cc") == -1){
document.getElementById('img4').style.opacity = 1
$.ajax({
type: "GET",
url: "cc/"+document.getElementById('text_url').value+'?cc_api_key='+document.getElementById('perma_cc_api').value,
success: function(json) {
if (validateURL(json['results'][0]) == true){
var table=document.getElementById("results");
var row=table.insertRow(-1);
var cell1=row.insertCell(0);
var cell2=row.insertCell(1);
cell1.innerHTML='<a href="https://perma.cc" target="_blank"> Perma.cc </a>'
cell2.innerHTML='<a href="'+json['results'][0]+'" target="_blank"> '+json['results'][0]+' </a>'
document.getElementById('results').style.opacity = 1
document.getElementById('img4').style.opacity = 0
}
},
complete: function(){
document.getElementById('img4').style.opacity = 0
}
});
}
localStorage.setItem("permaccapikey", document.getElementById('perma_cc_api').value);
localStorage.setItem("check_archive_4", true);
}
if (selected_archives == 0){
document.getElementById('errors').innerHTML="*Select at least one archive*";
return;
}
}
</script>
</body>
</html>
================================================
FILE: requirements.txt
================================================
flask
requests
pathlib
selenium
================================================
FILE: setup.py
================================================
#!/usr/bin/env python
from setuptools import setup, find_packages
from archivenow import __version__
long_description = open('README.rst').read()
desc = """A Python library to push web resources into public web archives"""
setup(
name='archivenow',
version=__version__,
description=desc,
long_description=long_description,
author='Mohamed Aturban',
author_email='maturban@cs.odu.edu',
url='https://github.com/maturban/archivenow',
packages=find_packages(),
license="MIT",
classifiers=[
'Development Status :: 5 - Production/Stable',
'Programming Language :: Python',
'Programming Language :: Python :: 2.7',
'Programming Language :: Python :: 3',
'Programming Language :: Python :: 3.4',
'Programming Language :: Python :: 3.5',
'Programming Language :: Python :: 3.6',
'License :: OSI Approved :: MIT License'
],
install_requires=[
'flask',
'requests'
],
package_data={
'archivenow': [
'handlers/*.*',
'templates/*.*',
'static/*.*'
]
},
entry_points='''
[console_scripts]
archivenow=archivenow.archivenow:args_parser
'''
)