Documentation

JSON API & AI Agent Integration

GitExtract is a free service at gitextract.com that converts any GitHub repository into AI-ready text. Use it through the web UI or the JSON API below. No API key needed.

JSON API

One endpoint. Send a GitHub repo URL, get back the full source code, file tree, and symbol index as JSON. No authentication required.

GET https://gitextract.com/api/v1/ingest?url=...

Parameters

Name              Type    Required  Description
url               string  yes       GitHub URL or shorthand like owner/repo
branch            string  no        Branch name (default: the repo's default branch)
subpath           string  no        Subdirectory to focus on (default: entire repo)
include_patterns  string  no        Comma-separated globs to include, e.g. *.py,*.js
exclude_patterns  string  no        Comma-separated globs to exclude, e.g. tests/*,docs/*
max_file_size_kb  int     no        Max file size in KB (default: 10240, range: 64–20480)
include_json      bool    no        Include structured content_json array in response (default: false)

Basic example

GET https://gitextract.com/api/v1/ingest?url=fastapi/fastapi

Example with optional parameters

GET https://gitextract.com/api/v1/ingest?url=fastapi/fastapi&include_patterns=*.py&subpath=fastapi
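
The request above can also be assembled in code. The following is a minimal Python sketch that uses only the standard library. The helper names build_ingest_url and ingest are hypothetical (they are not part of GitExtract), and the range check simply mirrors the limits listed in the parameter table.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "https://gitextract.com/api/v1/ingest"


def build_ingest_url(repo, branch=None, subpath=None, include_patterns=None,
                     exclude_patterns=None, max_file_size_kb=None,
                     include_json=False):
    """Build an ingest URL from the documented query parameters.

    Hypothetical helper: pattern arguments are lists of globs that get
    joined with commas, as the parameter table describes.
    """
    if max_file_size_kb is not None and not 64 <= max_file_size_kb <= 20480:
        raise ValueError("max_file_size_kb must be between 64 and 20480")
    params = {"url": repo}
    if branch:
        params["branch"] = branch
    if subpath:
        params["subpath"] = subpath
    if include_patterns:
        params["include_patterns"] = ",".join(include_patterns)
    if exclude_patterns:
        params["exclude_patterns"] = ",".join(exclude_patterns)
    if max_file_size_kb is not None:
        params["max_file_size_kb"] = str(max_file_size_kb)
    if include_json:
        params["include_json"] = "true"
    # Keep "/", "*" and "," unescaped so the URL reads like the examples above.
    return f"{API_URL}?{urlencode(params, safe='/*,')}"


def ingest(repo, timeout=60, **options):
    """Fetch and decode the JSON response (performs a network request)."""
    with urlopen(build_ingest_url(repo, **options), timeout=timeout) as resp:
        return json.load(resp)
```

Calling build_ingest_url("fastapi/fastapi", subpath="fastapi", include_patterns=["*.py"]) produces a URL with the same parameters as the example above, with subpath listed before include_patterns.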

Example response

{
  "full_name": "fastapi/fastapi",
  "branch": "master",
  "commit": "a1b2c3d4e5f6...",
  "subpath": "/",
  "file_count": 15,
  "total_size": 245760,
  "token_count": 52000,
  "symbol_count": 128,
  "cached": false,
  "content_json": null,
  "summary": "Repository: fastapi/fastapi\nBranch: master\nCommit: a1b2c3d\nFiles analysed: 15\nEstimated tokens: 52,000",
  "tree": "fastapi/\n  __init__.py\n  applications.py\n  routing.py\n  ...",
  "content": "================================================\nFile: fastapi/__init__.py\n================================================\nfrom .applications import FastAPI\n...",
  "symbol_index": "SYMBOL INDEX (128 symbols across 15 files)\n\nFILE: fastapi/applications.py\n  class FastAPI (line 30) | class FastAPI(Starlette):\n    method setup (line 85) | def setup(self) -> None:\n    method add_api_route (line 120) | def add_api_route(...):\n  ..."
}
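
If only the text dump in content is available, it can be split back into files. The sketch below assumes every file is introduced by the three-line header shown in the example response (a line of "=" characters, a "File: <path>" line, and another line of "=" characters). That format is inferred from the example rather than from a published specification, and split_content is a hypothetical helper name.

```python
import re

# Three-line header as it appears in the example response (assumed format).
FILE_HEADER = re.compile(r"^={8,}\nFile: (.+)\n={8,}\n", re.MULTILINE)


def split_content(content):
    """Split the text dump into a {path: text} dict.

    Blank lines surrounding each file body are dropped.
    """
    parts = FILE_HEADER.split(content)
    # parts[0] is any text before the first header; after that the list
    # alternates between a captured path and the body that follows it.
    paths = parts[1::2]
    bodies = parts[2::2]
    return {path: body.strip("\n") for path, body in zip(paths, bodies)}
```

A file whose own text happens to contain a matching header would confuse this approach; the content_json array avoids that problem because each file arrives as a separate object.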

Example with structured JSON

GET https://gitextract.com/api/v1/ingest?url=fastapi/fastapi&include_json=true

When include_json=true, the response includes a content_json array with each file as a structured object:

{
  "content_json": [
    {
      "path": "fastapi/__init__.py",
      "content": "from .applications import FastAPI\n..."
    },
    {
      "path": "fastapi/applications.py",
      "content": "from starlette.applications import Starlette\n..."
    }
  ],
  ...
}

Tip: Results are cached by repo + commit hash. Repeated requests for the same repo at the same commit are served from the cache and return with "cached": true.

Tip: Use include_json=true to get a structured content_json array where each file is a separate {path, content} object — useful for programmatic file access without parsing the text dump.
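
As a small illustration of the structured form, the Python sketch below indexes content_json by path. It assumes the response has already been decoded into a dict, and files_by_path is a hypothetical helper name.

```python
def files_by_path(response):
    """Map each file path to its content using the content_json array.

    Returns an empty dict when content_json is null or absent, which is
    the case when include_json was not set to true.
    """
    entries = response.get("content_json") or []
    return {entry["path"]: entry["content"] for entry in entries}
```

With the example response above, files_by_path(response)["fastapi/__init__.py"] would return the text of that file without any parsing of the text dump.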

AI Agent Skill

AI agents such as OpenClaw, Claude, and Cursor can call the GitExtract API to understand any GitHub codebase. Copy the skill definition below to give your agent access to GitExtract.

What agents can do

Understand a codebase

Extract a repo, review the symbol index for structure, then read specific files from the content.

Compare repositories

Extract two repos and compare their architecture and symbol indexes side by side.

Focus on specific code

Use include/exclude patterns and subpath to extract only the files that matter.
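
The "Compare repositories" workflow could start from the numeric fields of two responses. The Python sketch below assumes both responses are already decoded into dicts; compare_responses and format_comparison are hypothetical helper names, and the field list is taken from the example response.

```python
FIELDS = ("file_count", "total_size", "token_count", "symbol_count")


def compare_responses(left, right, fields=FIELDS):
    """Return (field, left_value, right_value) rows for two ingest responses."""
    return [(field, left.get(field), right.get(field)) for field in fields]


def format_comparison(left, right):
    """Render the rows as a plain-text table, one column per repository."""
    left_name = left.get("full_name", "left")
    right_name = right.get("full_name", "right")
    lines = [f"{'field':<14}{left_name:>24}{right_name:>24}"]
    for field, a, b in compare_responses(left, right):
        # Missing fields show up as None rather than raising an error.
        lines.append(f"{field:<14}{a!s:>24}{b!s:>24}")
    return "\n".join(lines)
```

This only compares sizes and counts. A deeper architectural comparison would still need the agent to read both tree and symbol_index fields.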

SKILL.md

For AI agents like OpenClaw that support skills, create a SKILL.md file with the content below.

---
name: gitextract
description: Extract full source code, file tree, and symbol index from any
  GitHub repository. Free, no API key needed.
---

# GitExtract

Convert any GitHub repo into AI-ready text at https://gitextract.com

## API

GET https://gitextract.com/api/v1/ingest?url={owner/repo}

Optional params: branch, subpath, include_patterns, exclude_patterns, max_file_size_kb, include_json

### Basic example

GET https://gitextract.com/api/v1/ingest?url=fastapi/fastapi

### Example with optional parameters

GET https://gitextract.com/api/v1/ingest?url=fastapi/fastapi&include_patterns=*.py&subpath=fastapi

### Example response

```json
{
  "full_name": "fastapi/fastapi",
  "branch": "master",
  "commit": "a1b2c3d...",
  "file_count": 15,
  "total_size": 245760,
  "token_count": 52000,
  "symbol_count": 128,
  "cached": false,
  "content_json": null,
  "summary": "Repository: fastapi/fastapi\nBranch: master\n...",
  "tree": "fastapi/\n  __init__.py\n  applications.py\n  ...",
  "content": "================================================\nFile: fastapi/__init__.py\n================================================\n...",
  "symbol_index": "SYMBOL INDEX (128 symbols across 15 files)\n\nFILE: fastapi/applications.py\n  class FastAPI (line 30)\n    method setup (line 85)\n  ..."
}
```

Set include_json=true to get content_json as an array of {path, content} objects
for structured file access.

## Symbol Index

The symbol_index field maps functions, classes, methods, constants, and types
via tree-sitter AST parsing. 15 languages: Python, JS, TS, TSX, Go, Java,
Rust, C, C++, C#, Ruby, PHP, Dart, Elixir, SQL.

Symbol Index

Every extraction includes a symbol index — a structural map of the codebase built using tree-sitter AST parsing. It's returned as a separate field alongside the file content, so consumers can choose whether to use it.

Extracted symbol types

functions, classes, methods, constants, types

Supported languages (15)

Python, JavaScript, TypeScript, TSX, Go, Java, Rust, C, C++, C#, Ruby, PHP, Dart, Elixir, SQL

Example output

SYMBOL INDEX (42 symbols across 8 files)

FILE: src/main.py
  class UserService (line 15) | class UserService:
    method login (line 23) | def login(self, username, password):
    method logout (line 45) | def logout(self):
  function main (line 60) | def main():

FILE: src/utils.py
  function parse_config (line 1) | def parse_config(path: str) -> dict:
  constant MAX_RETRIES (line 30) | MAX_RETRIES = 3
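
Because the index is plain text, a consumer may want it as structured records. The Python sketch below parses the layout shown in the example output. The line format is inferred from that example rather than from a formal grammar, parse_symbol_index is a hypothetical helper name, and the " | signature" part is treated as optional.

```python
import re

# Matches lines such as "  class UserService (line 15) | class UserService:".
SYMBOL_LINE = re.compile(
    r"^\s*(?P<kind>\w+) (?P<name>\S+) \(line (?P<line>\d+)\)"
    r"(?: \| (?P<signature>.*))?$"
)


def parse_symbol_index(text):
    """Parse symbol index text into a list of dicts, one per symbol."""
    symbols = []
    current_file = None
    for raw in text.splitlines():
        if raw.startswith("FILE: "):
            current_file = raw[len("FILE: "):].strip()
            continue
        match = SYMBOL_LINE.match(raw)
        # Lines seen before the first FILE: line (such as the summary
        # header) and blank lines are skipped.
        if match and current_file is not None:
            symbols.append({
                "file": current_file,
                "kind": match["kind"],
                "name": match["name"],
                "line": int(match["line"]),
                "signature": match["signature"],
            })
    return symbols
```

Nesting (methods indented under their class) is not preserved in this flat list; indentation depth could be recorded as well if the hierarchy matters.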

Where it's available

  • Web UI: dedicated "Symbol Index" tab on the result page
  • JSON API: symbol_index and symbol_count fields in the response
  • Web UI (Structured JSON tab): dedicated tab on the result page with copy and download buttons
  • JSON API (structured JSON): add include_json=true to get a content_json array with each file as a separate {path, content} object

Built by Nikandr Surkov
