Repository: edsu/etudier
Branch: main
Commit: f059595c1747
Files: 6
Total size: 24.0 KB

Directory structure:
gitextract_u49eay6p/

├── .gitignore
├── MANIFEST.in
├── README.md
├── etudier/
│   ├── __init__.py
│   └── network.html
└── pyproject.toml

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
__pycache__
dist
etudier.egg-info/
Pipfile*
build
uv.lock


================================================
FILE: MANIFEST.in
================================================
include etudier/network.html



================================================
FILE: README.md
================================================
![Étudier in Action](figure.gif)

*étudier* is a small Python program that uses [Selenium], [requests-html] and
[networkx] to drive a *non-headless* browser to collect a citation graph around
a particular [Google Scholar] citation or set of search results. The resulting
network is written out as [GEXF] and [GraphML] files as well as an HTML file
that includes a [D3] network visualization (pictured above).

If you are wondering why it uses a non-headless browser, it's because Google is
[quite protective] of this data and will routinely ask you to solve a captcha
(identifying street signs, cars, etc. in photos) to prove you are not a bot.
*étudier* allows you to complete these captcha tasks when they occur and then it
continues on its way collecting data. You need to have a browser to interact
with in order to do your part.
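
The wait-for-a-human behaviour described above boils down to a polling loop. The following is only a minimal sketch of that idea, not the code étudier ships: `wait_for_captcha`, `captcha_present` and `notify` are hypothetical names, and the real check (which asks Selenium whether a captcha element is on the page) is replaced by a plain callable so the sketch runs without a browser.

```python
import time


def wait_for_captcha(captcha_present, interval=5.0, notify=print):
    """
    Poll until captcha_present() returns False.

    captcha_present is a hypothetical zero-argument callable standing in for
    a Selenium lookup of the captcha element. Returns the number of polls
    during which a captcha was still showing.
    """
    polls = 0
    while captcha_present():
        notify("... it's CAPTCHA time! ...")
        polls += 1
        # give the person at the keyboard time to solve the challenge
        time.sleep(interval)
    return polls


# simulate a captcha that is showing for two polls and then goes away
states = iter([True, True, False])
seen_polls = wait_for_captcha(lambda: next(states), interval=0, notify=lambda msg: None)
print(seen_polls)
```

Passing the check in as a callable keeps the loop independent of the browser, which is why it can be exercised here with a simple iterator.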

Install
-------

You'll need to install [ChromeDriver] before doing anything else. If you use
Homebrew on macOS this is as easy as:

    brew install --cask chromedriver

Then you'll want to install [Python 3] and:

    pip3 install etudier

Run
---

To use *étudier* you first need to navigate to a page on Google Scholar that you
are interested in. For example, here is the page of citations that reference
Sherry Ortner's [Theory in Anthropology since the Sixties]. Then start *étudier*
up, pointed at that page:

    % etudier 'https://scholar.google.com/scholar?start=0&hl=en&as_sdt=20000005&sciodt=0,21&cites=17950649785549691519&scipsc='

If you are interested in starting with keyword search results in Google Scholar
you can do that too. For example, here is the URL for a search for "cscw memory",
which finds papers that talk about the CSCW conference and memory:

    % etudier 'https://scholar.google.com/scholar?hl=en&as_sdt=0%2C21&q=cscw+memory&btnG='

Note: it's important to quote the URL so that the shell doesn't interpret the
ampersands as an attempt to background the process.
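
If you build the command line from a script rather than typing it, the quoting can be done for you. This is a small sketch using Python's standard `shlex.quote`; `etudier_command` is a hypothetical helper written for this example, and only the `--pages` and `--depth` options documented in this README are used.

```python
import shlex

url = "https://scholar.google.com/scholar?hl=en&as_sdt=0%2C21&q=cscw+memory&btnG="


def etudier_command(url, pages=1, depth=1):
    """Build a shell-safe etudier command line (hypothetical helper)."""
    args = ["etudier", "--pages", str(pages), "--depth", str(depth), url]
    # shlex.quote wraps any argument containing shell metacharacters
    # (such as & and ?) in single quotes
    return " ".join(shlex.quote(arg) for arg in args)


command = etudier_command(url)
print(command)
```

The printed command can be pasted into a shell as-is: the URL arrives as a single argument with its ampersands intact.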

### --pages

By default *étudier* will collect the 10 citations on that page and then look at
the top 10 citations that reference each one. So you will end up with no more
than 100 citations being collected (10 on each page * 10 citations).

If you would like to get more than one page of results use the `--pages` option.
For example, this would result in no more than 400 (20 * 20) results being collected:

    % etudier --pages 2 'https://scholar.google.com/scholar?start=0&hl=en&as_sdt=20000005&sciodt=0,21&cites=17950649785549691519&scipsc=' 

### --depth

And finally, if you would like to look at the citations of the citations, use the
`--depth` option:

    % etudier --depth 2 'https://scholar.google.com/scholar?start=0&hl=en&as_sdt=20000005&sciodt=0,21&cites=17950649785549691519&scipsc='

This will collect the initial set of 10 citations, the top 10 citations for
each, and then the top 10 citations of each of those, so no more than 1000
citations (10 * 10 * 10). This is an upper bound because there is certain to be
some cross-citation duplication.
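
The estimates quoted for `--pages` and `--depth` all follow one pattern, sketched below. `max_citations` is a hypothetical helper that is not part of étudier, and it assumes Google Scholar returns 10 results per page, as the examples above do.

```python
def max_citations(pages=1, depth=1, per_page=10):
    """
    Upper bound on the citations collected, following this README's estimates.

    Each level of the crawl looks at pages * per_page results, and there are
    depth + 1 levels: the starting page plus one level per unit of depth.
    """
    per_level = pages * per_page
    return per_level ** (depth + 1)


print(max_citations())           # the defaults
print(max_citations(pages=2))    # the --pages 2 example
print(max_citations(depth=2))    # the --depth 2 example
```

The three calls reproduce the 100, 400 and 1000 figures given above. Real crawls collect fewer because of duplicates and because many publications have fewer than a full page of citing works.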

### --output

By default `output.gexf`, `output.graphml` and `output.html` files will be
written to the current working directory, but you can use the `--output` option
to change the prefix that is used. The output files will contain rudimentary
metadata collected from Google Scholar, including:

- *id* - the cluster identifier assigned by Google
- *url* - the url for the publication
- *title* - the title of the publication
- *authors* - a comma separated list of the publication authors
- *year* - the year of publication
- *cited_by* - the number of other publications that cite the publication
- *cited_by_url* - a Google Scholar URL for the list of citing publications
- *modularity* - the modularity value obtained from community detection
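
To make a couple of these fields more concrete, here is a standalone sketch of how an *id* can be read out of a Scholar URL and how *authors* and *year* can be split out of a result's byline. It mirrors the approach of `get_cluster_id` and `get_metadata` in `etudier/__init__.py` but is not a drop-in copy: `cluster_id` and `split_byline` are names chosen for this example, and the byline string is an illustrative value rather than scraped data.

```python
import re
from urllib.parse import urlparse, parse_qs


def cluster_id(url):
    """Return the cluster identifier from a Scholar URL, or None."""
    query = parse_qs(urlparse(url).query)
    # the identifier shows up as ?cluster=... or ?cites=...
    for key in ("cluster", "cites"):
        vals = query.get(key, [])
        if len(vals) == 1:
            return vals[0]
    return None


def split_byline(byline):
    """Split an 'authors - source, year - website' byline into (authors, year)."""
    parts = [p.strip() for p in re.split(r"\W-\W", byline)]
    authors = source = None
    if len(parts) == 3:
        authors, source, _website = parts
    elif len(parts) == 2:
        authors, source = parts
    # the year is taken to be whatever follows the last comma in the source
    if source and "," in source:
        year = source.split(",")[-1].strip()
    else:
        year = source
    return authors, year


url = ("https://scholar.google.com/scholar?start=0&hl=en&as_sdt=20000005"
       "&sciodt=0,21&cites=17950649785549691519&scipsc=")
print(cluster_id(url))

# illustrative byline, written by hand for this example
byline = "SB Ortner - Comparative studies in society and history, 1984 - cambridge.org"
print(split_byline(byline))
```

Because the year is simply the text after the last comma, bylines that do not follow the usual pattern can yield a *year* value that is not a year at all, so treat that field with some care.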

Features of HTML/D3 output
--------------------------

- Node color shows its citation group
- Node size shows how many times it has been cited
- Click node to open its source website
- Draggable nodes
- Zoom and pan
- Double-click to center node
- Resizable window
- Text labels
- Hover to highlight 1st-order neighborhood
- Click and hold a node to fade its surroundings

[Theory in Anthropology since the Sixties]: https://scholar.google.com/scholar?hl=en&as_sdt=20000005&sciodt=0,21&cites=17950649785549691519&scipsc=
[Google Scholar]: https://scholar.google.com
[Selenium]: https://docs.seleniumhq.org/
[requests-html]: http://html.python-requests.org/
[quite protective]: https://www.quora.com/Are-there-technological-or-logistical-challenges-that-explain-why-Google-does-not-have-an-official-API-for-Google-Scholar
[GEXF]: https://gephi.org/
[GraphML]: https://networkx.org/documentation/stable/reference/readwrite/graphml.html
[networkx]: https://networkx.github.io/
[D3]: https://d3js.org/
[Python 3]: https://www.python.org/downloads/
[ChromeDriver]: https://sites.google.com/a/chromium.org/chromedriver/


================================================
FILE: etudier/__init__.py
================================================
#!/usr/bin/env python

import re
import sys
import json
import time
import random
import argparse
import networkx
import requests_html

from pathlib import Path
from string import Template
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
from urllib.parse import urlparse, parse_qs
from networkx.algorithms.community.modularity_max import greedy_modularity_communities


seen = set()
driver = None

def main():
    global driver

    parser = argparse.ArgumentParser()
    parser.add_argument('url', help="URL for a Google Scholar search to start collecting from")
    parser.add_argument('--depth', type=int, default=1, help="depth of the crawl in terms of levels of citation (defaults to 1)")
    parser.add_argument('--pages', type=int, default=1, help="breadth of the crawl in terms of number of pages of results (defaults to 1)")
    parser.add_argument('--output', type=str, default='output', help="file prefix to use for the output files (defaults to 'output')")
    parser.add_argument('--debug', action="store_true", default=False, help="display diagnostics during the crawl")
    args = parser.parse_args()

    # ready to start up the (non-headless) browser
    driver = webdriver.Chrome()

    # create our graph that will get populated
    g = networkx.DiGraph()

    # iterate through all the citation links
    for from_pub, to_pub in get_citations(args.url, depth=args.depth, pages=args.pages):
        if args.debug:
            print('from: %s' % json.dumps(from_pub))
        g.add_node(from_pub['id'], label=from_pub['title'], **remove_nones(from_pub))
        if to_pub:
            if args.debug:
                print('to: %s' % json.dumps(to_pub))
            print('%s -> %s' % (from_pub['id'], to_pub['id']))
            g.add_node(to_pub['id'], label=to_pub['title'], **remove_nones(to_pub))
            g.add_edge(from_pub['id'], to_pub['id'])

    # cluster the nodes using community detection and write the output files
    write_output(g, args)

    # quit the browser and end the webdriver session
    driver.quit()

def to_json(g):
    """
    Serialize the graph as a dict of nodes and links for D3. The source and
    target of each link are the indexes of the corresponding nodes.
    """
    j = {"nodes": [], "links": []}
    # remember the position of each node so links can refer to it by index
    index = {}
    for i, (node_id, node_attrs) in enumerate(g.nodes(data=True)):
        node_attrs['id'] = node_id
        j["nodes"].append(node_attrs)
        index[node_id] = i
    for source, target in g.edges():
        j["links"].append({
            "source": index[source],
            "target": index[target]
        })

    return j

def cluster_nodes(g):
    """
    Use Clauset-Newman-Moore greedy modularity maximization to cluster nodes.
    """
    undirected_g = networkx.Graph(g)
    for i, comm in enumerate(greedy_modularity_communities(undirected_g)):
        for node in comm:
            g.nodes[node]['modularity'] = i
    return g

def get_cluster_id(url):
    """
    Google assigns a cluster identifier to a group of web documents
    that appear to be the same publication in different places on the web.
    How they do this is a bit of a mystery, but this identifier is
    important since it uniquely identifies the publication.
    """
    vals = parse_qs(urlparse(url).query).get('cluster', [])
    if len(vals) == 1:
        return vals[0]
    else:
        vals = parse_qs(urlparse(url).query).get('cites', [])
        if len(vals) == 1:
            return vals[0]
    return None

def get_id(e):
    """
    Determining the publication id is tricky since it involves looking
    in the element for the various places a cluster id can show up.
    If it can't find one it will use the data-cid which should be
    usable since it will be a dead end anyway: Scholar doesn't know of
    anything that cites it.
    """
    for a in e.find('.gs_fl a'):
        if 'Cited by' in a.text:
            return get_cluster_id(a.attrs['href'])
        elif 'versions' in a.text:
            return get_cluster_id(a.attrs['href'])
    return e.attrs.get('data-cid')

def get_citations(url, depth=1, pages=1):
    """
    Given the URL for a page of citations, yield (from_pub, to_pub) pairs of
    bibliographic metadata for the citing and the cited publication.
    """
    if url in seen:
        return

    html = get_html(url)
    seen.add(url)

    # get the publication that these citations reference.
    # Note: this can be None when starting with generic search results

    a = html.find('#gs_res_ccl_top a', first=True)
    if a:
        to_pub = {
            'id': get_cluster_id(url),
            'title': a.text,
        }
        # try to get the total results for the item we are searching within
        results = html.find('#gs_ab_md .gs_ab_mdw', first=True)
        if results:
            m = re.search('([0-9,]+) results', results.text)
            if m:
                to_pub['cited_by'] = int(m.group(1).replace(',', ''))
    else:
        to_pub = None

    for e in html.find('#gs_res_ccl_mid .gs_r'):
        from_pub = get_metadata(e, to_pub)
        if from_pub:
            yield from_pub, to_pub
        else:
            continue

        # depth first search if we need to go deeper
        if depth > 0 and from_pub['cited_by_url']:
            yield from get_citations(
                from_pub['cited_by_url'],
                depth=depth-1,
                pages=pages
            )

    # get the next page if that's what they wanted
    if pages > 1:
        for link in html.find('#gs_n a'):
            if link.text == 'Next':
                yield from get_citations(
                    'https://scholar.google.com' + link.attrs['href'],
                    depth=depth,
                    pages=pages-1
                )

def get_metadata(e, to_pub):
    """
    Fetch the citation metadata from a citation element on the page.
    """
    article_id = get_id(e)
    if not article_id:
        return None

    a = e.find('.gs_rt a', first=True)
    if a:
        url = a.attrs['href']
        title = a.text
    else:
        url = None
        title = e.find('.gs_rt .gs_ctu', first=True).text

    authors = source = website = None
    meta = e.find('.gs_a', first=True).text
    meta_parts = [m.strip() for m in re.split(r'\W-\W', meta)]
    if len(meta_parts) == 3:
        authors, source, website = meta_parts
    elif len(meta_parts) == 2:
        authors, source = meta_parts

    if source and ',' in source:
        year = source.split(',')[-1].strip()
    else:
        year = source

    cited_by = cited_by_url = None
    for a in e.find('.gs_fl a'):
        if 'Cited by' in a.text:
            cited_by = a.search('Cited by {:d}')[0]
            cited_by_url = 'https://scholar.google.com' + a.attrs['href']

    return {
        'id': article_id,
        'url': url,
        'title': title,
        'authors': authors,
        'year': year,
        'cited_by': cited_by,
        'cited_by_url': cited_by_url,
    }

def get_html(url):
    """
    get_html uses selenium to drive a browser to fetch a URL, and return a
    requests_html.HTML object for it.
    
    If there is a captcha challenge it will alert the user and wait until 
    it has been completed.
    """
    global driver

    if driver is None:
        raise Exception("driver is not configured!")

    time.sleep(random.randint(1,5))
    driver.get(url)
    while True:
        try:
            driver.find_element(By.CSS_SELECTOR, '#gs_captcha_ccl,#recaptcha')
        except NoSuchElementException:

            try:
                html = driver.find_element(By.CSS_SELECTOR,'#gs_top').get_attribute('innerHTML')
                return requests_html.HTML(html=html)
            except NoSuchElementException:
                print("google has blocked this browser, reopening")
                driver.close()
                driver = webdriver.Chrome()
                return get_html(url)

        print("... it's CAPTCHA time!\a ...")
        time.sleep(5)

def remove_nones(d):
    new_d = {}
    for k, v in d.items():
        if v is not None:
            new_d[k] = v
    return new_d

def write_output(g, args):
    cluster_nodes(g)
    networkx.write_gexf(g, '%s.gexf' % args.output)
    networkx.write_graphml(g, '%s.graphml' % args.output)
    write_html(g, '%s.html' % args.output)

def write_html(g, output):
    graph_json = json.dumps(to_json(g), indent=2)
    html_file = Path(__file__).parent / "network.html"
    opts = ' '.join(sys.argv[1:])
    # read_text and write_text close the files once they are done
    tmpl = Template(html_file.read_text(encoding='utf-8'))
    html = tmpl.substitute({
        "__OPTIONS__": opts,
        "__GRAPH_JSON__": graph_json
    })
    Path(output).write_text(html, encoding='utf-8')

if __name__ == "__main__":
    main()


================================================
FILE: etudier/network.html
================================================
<!DOCTYPE html>

<!--

This Google Scholar network visualization was generated with
https://github.com/edsu/etudier using the following command:

% etudier $__OPTIONS__

--> 

<html>
  <head>
    <meta charset="utf-8" />
    <style>
      body {
        overflow: hidden;
        margin: 0;
      }

      text {
        font-family: sans-serif;
        pointer-events: none;
      }
    </style>
  </head>

  <body>
    <script src="https://d3js.org/d3.v3.min.js"></script>
    <script>
      var graph = $__GRAPH_JSON__;
      var w = window.innerWidth;
      var h = window.innerHeight;

      var focusNode = null;
      var highlightNode = null;

      var textCenter = false;
      var outline = false;

      var minScore = Math.min(...graph.nodes.map(n => n.modularity));
      var maxScore = Math.max(...graph.nodes.map(n => n.modularity));

      var color = d3.scale
        .linear()
        .domain([
          minScore,
          (minScore + maxScore) / 4,
          (minScore + maxScore) / 2,
          ((minScore + maxScore) * 3) / 4,
          maxScore,
        ])
        .range(["lime", "yellow", "red", "deepskyblue"]);

      var highlightColor = "blue";
      var highlightTrans = 0.1;

      const citedBy = graph.nodes
        .map(n => n.cited_by)
        .filter(n => n != null)

      const maxCitedBy = Math.max(...citedBy)
      const minCitedBy = Math.min(...citedBy)

      var size = d3.scale
        .pow()
        .exponent(1)
        .domain([minCitedBy, maxCitedBy])
        .range([8, 24]);

      var force = d3.layout
        .force()
        .linkDistance(h / (graph.nodes.length / 10))
        .charge(-300)
        .size([w, h]);

      var defaultNodeColor = "#ccc";
      var defaultLinkColor = "#888";
      var nominalBaseNodeSize = 8;
      var nominalTextSize = 10;
      var maxTextSize = 24;
      var nominalStroke = 1.5;
      var maxStroke = 4.5;
      var maxBaseNodeSize = 36;
      var minZoom = 0.1;
      var maxZoom = 7;
      var svg = d3.select("body").append("svg");
      var zoom = d3.behavior.zoom().scaleExtent([minZoom, maxZoom]);
      var g = svg.append("g");
      svg.style("cursor", "move");

      var linkedByIndex = {};
      graph.links.forEach(function (d) {
        linkedByIndex[d.source + "," + d.target] = true;
      });

      function isConnected(a, b) {
        return (
          linkedByIndex[a.index + "," + b.index] ||
          linkedByIndex[b.index + "," + a.index] ||
          a.index == b.index
        );
      }

      force.size([w, h]);

      force
        .nodes(graph.nodes)
        .links(graph.links)
        .start();

      function getLine(data) {
        // Draw the link from the center of the source node to the edge of
        // the target node so the arrowhead is not hidden under the circle.
        const x1 = data.source.x;
        const y1 = data.source.y;
        const x2 = data.target.x;
        const y2 = data.target.y;

        // fall back to the base node size when cited_by is missing
        const r = (size(data.target.cited_by) || nominalBaseNodeSize) + 1;

        // length of the link; avoid dividing by zero when the nodes overlap
        const c = Math.sqrt(Math.pow(y2 - y1, 2) + Math.pow(x2 - x1, 2)) || 1;
        const a = y2 - y1;
        const cos = a / c;

        // vertical and horizontal offsets from the target center to its edge
        const a2 = cos * r;
        const b2 = Math.sqrt(Math.pow(r, 2) - Math.pow(a2, 2));

        const x = x2 > x1 ? x2 - b2 : x2 + b2;
        const y = y2 - a2;

        return 'M ' + x1 + ',' + y1 + ' L ' + x + ',' + y;
      }

      var link = g
        .selectAll(".link")
        .data(graph.links)
        .enter()
        .append("svg:path")
        .attr("d", getLine) 
        .attr("stroke", defaultLinkColor)
        .attr("fill", "red")
        .style("stroke-width", nominalStroke)
        .style("marker-end", "url(#end)")

      var node = g
        .selectAll(".node")
        .data(graph.nodes)
        .enter()
        .append("g")
        .attr("class", "node")
        .call(force.drag);

      var timeout = null;

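      node.on("dblclick", function (d) {
        // d3.event is only valid while the handler runs, so stop
        // propagation here rather than inside the timeout callback
        d3.event.stopPropagation();
        clearTimeout(timeout);

        timeout = setTimeout(function () {
          window.open(d.url, "_blank");
        }, 300);
      });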

      var tocolor = "fill";
      var towhite = "stroke";
      if (outline) {
        tocolor = "stroke";
        towhite = "fill";
      }

      var circle = node
        .append("path")
        .attr(
          "d",
          d3.svg
            .symbol()
            .size(function (d) {
              return (
                Math.PI * Math.pow(size(d.cited_by) || nominalBaseNodeSize, 2)
              );
            })
            .type(function (d) {
              return d.type;
            })
        )
        .style(tocolor, function (d) {
          if (isNumber(d.modularity) && d.modularity >= 0) return color(d.modularity);
          else return defaultNodeColor;
        })
        .style("stroke-width", nominalStroke)
        .style(towhite, "white");

      svg
        .append("svg:defs")
        .selectAll("marker")
        .data(["end"])
        .enter()
        .append("svg:marker")
        .attr("id", String)
        .attr("viewBox", "0 -5 10 10")
        .attr("refX", 10)
        .attr("refY", 0)
        .attr("markerWidth", 6)
        .attr("markerHeight", 6)
        .attr("orient", "auto")
        .style("fill", defaultLinkColor)
        .append("svg:path")
        .attr("d", "M 0,-5 L 10,0 L 0,5")
        .style("stroke", defaultLinkColor);

      var text = g
        .selectAll(".text")
        .data(graph.nodes)
        .enter()
        .append("text")
        .attr("dy", ".35em")
        .style("font-size", nominalTextSize + "px");

      node
        .on("mouseover", function (d) {
          setHighlight(d);
        })
        .on("mousedown", function (d) {
          d3.event.stopPropagation();
          focusNode = d;
          setFocus(d);
          if (highlightNode === null) setHighlight(d);
        })
        .on("mouseout", function (d) {
          exitHighlight();
        });

      d3.select(window).on("mouseup", function () {
        if (focusNode !== null) {
          focusNode = null;
          if (highlightTrans < 1) {
            circle.style("opacity", 1);
            text.style("opacity", 1);
            link.style("opacity", 1);
          }
        }

        if (highlightNode === null) exitHighlight();
      });

      function exitHighlight() {
        highlightNode = null;
        if (focusNode === null) {
          svg.style("cursor", "move");
          if (highlightColor != "white") {
            circle.style(towhite, "white");
            text.text('')
            link.style("stroke", function (o) {
              return isNumber(o.score) && o.score >= 0
                ? color(o.score)
                : defaultLinkColor;
            });
          }
        }
      }

      function setFocus(d) {
        if (highlightTrans < 1) {
          circle.style("opacity", function (o) {
            return isConnected(d, o) ? 1 : highlightTrans;
          });

          text.style("opacity", function (o) {
            return isConnected(d, o) ? 1 : highlightTrans;
          });

          link.style("opacity", function (o) {
            return o.source.index == d.index || o.target.index == d.index
              ? 1
              : highlightTrans;
          });
        }
      }

      function setHighlight(d) {
        svg.style("cursor", "pointer");
        if (focusNode !== null) d = focusNode;
        highlightNode = d;

        if (highlightColor != "white") {

          circle.style(towhite, function (o) {
            return isConnected(d, o) ? highlightColor : "white";
          });
          
          text.attr("dx", function (d) {
            return size(d.cited_by)
          });

          text.text(function (o) {
            if (isConnected(d, o)) {
              let title = o.title;
              if (o.year) title = title + " (" + o.year + ")";
              if (o.authors) title = title + " - " + o.authors;
              return title
            } else {
              return ""
            }
          });

        }
      }

      zoom.on("zoom", function () {
        var stroke = nominalStroke;
        if (nominalStroke * zoom.scale() > maxStroke)
          stroke = maxStroke / zoom.scale();
        link.style("stroke-width", stroke);
        circle.style("stroke-width", stroke);

        var baseRadius = nominalBaseNodeSize;
        if (nominalBaseNodeSize * zoom.scale() > maxBaseNodeSize)
          baseRadius = maxBaseNodeSize / zoom.scale();
        circle.attr(
          "d",
          d3.svg
            .symbol()
            .size(function (d) {
              return (
                Math.PI *
                Math.pow(
                  (size(d.cited_by) * baseRadius) / nominalBaseNodeSize ||
                    baseRadius,
                  2
                )
              );
            })
        );

        if (!textCenter)
          text.attr("dx", function (d) {
            return (
              (size(d.cited_by) * baseRadius) / nominalBaseNodeSize ||
              baseRadius
            );
          });

        var textSize = nominalTextSize;
        if (nominalTextSize * zoom.scale() > maxTextSize)
          textSize = maxTextSize / zoom.scale();
        text.style("font-size", textSize + "px");

        g.attr(
          "transform",
          "translate(" + d3.event.translate + ")scale(" + d3.event.scale + ")"
        );
      });

      svg.call(zoom);

      resize();
      d3.select(window).on("resize", resize);

      force.on("tick", function () {
        node.attr("transform", function (d) {
          return "translate(" + d.x + "," + d.y + ")";
        });
        text.attr("transform", function (d) {
          return "translate(" + d.x + "," + d.y + ")";
        });

        link.attr("d", getLine)

        node
          .attr("cx", function (d) {
            return d.x;
          })
          .attr("cy", function (d) {
            return d.y;
          });
      });

      function resize() {
        var width = window.innerWidth,
          height = window.innerHeight;
        svg.attr("width", width).attr("height", height);

        force
          .size([
            force.size()[0] + (width - w) / zoom.scale(),
            force.size()[1] + (height - h) / zoom.scale(),
          ])
          .resume();
        w = width;
        h = height;
      }

      function isNumber(n) {
        return !isNaN(parseFloat(n)) && isFinite(n);
      }

    </script>
  </body>
</html>


================================================
FILE: pyproject.toml
================================================
[project]
name = "etudier"
version = "0.2.1"
description = "Collect a citation graph from Google Scholar"
authors = [{name = "Ed Summers", email = "ehs@pobox.com"}]
readme = "README.md"
requires-python = ">=3.8"
dependencies = [
    "selenium>=4.7",
    "requests>=2.28",
    "requests-html>=0.10",
    "networkx>=2.8",
    "lxml-html-clean>=0.3.1",
]

[project.scripts]
etudier = "etudier:main"

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
