Full Code of coderamp-labs/gitingest for AI

Repository: coderamp-labs/gitingest
Branch: main
Commit: 4e259a02fe72
Files: 110
Total size: 391.9 KB

Directory structure:
gitextract_380b_654/
├── .docker/
│   └── minio/
│       └── setup.sh
├── .dockerignore
├── .github/
│   ├── ISSUE_TEMPLATE/
│   │   ├── bug_report.yml
│   │   └── feature_request.yml
│   └── workflows/
│       ├── ci.yml
│       ├── codeql.yml
│       ├── dependency-review.yml
│       ├── deploy-pr.yml
│       ├── docker-build.ecr.yml
│       ├── docker-build.ghcr.yml
│       ├── pr-title-check.yml
│       ├── publish_to_pypi.yml
│       ├── rebase-needed.yml
│       ├── release-please.yml
│       ├── scorecard.yml
│       └── stale.yml
├── .gitignore
├── .pre-commit-config.yaml
├── .release-please-manifest.json
├── .vscode/
│   └── launch.json
├── CHANGELOG.md
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── Dockerfile
├── LICENSE
├── README.md
├── SECURITY.md
├── compose.yml
├── eslint.config.cjs
├── pyproject.toml
├── release-please-config.json
├── renovate.json
├── requirements-dev.txt
├── requirements.txt
├── src/
│   ├── gitingest/
│   │   ├── __init__.py
│   │   ├── __main__.py
│   │   ├── clone.py
│   │   ├── config.py
│   │   ├── entrypoint.py
│   │   ├── ingestion.py
│   │   ├── output_formatter.py
│   │   ├── query_parser.py
│   │   ├── schemas/
│   │   │   ├── __init__.py
│   │   │   ├── cloning.py
│   │   │   ├── filesystem.py
│   │   │   └── ingestion.py
│   │   └── utils/
│   │       ├── __init__.py
│   │       ├── auth.py
│   │       ├── compat_func.py
│   │       ├── compat_typing.py
│   │       ├── exceptions.py
│   │       ├── file_utils.py
│   │       ├── git_utils.py
│   │       ├── ignore_patterns.py
│   │       ├── ingestion_utils.py
│   │       ├── logging_config.py
│   │       ├── notebook.py
│   │       ├── os_utils.py
│   │       ├── pattern_utils.py
│   │       ├── query_parser_utils.py
│   │       └── timeout_wrapper.py
│   ├── server/
│   │   ├── __init__.py
│   │   ├── __main__.py
│   │   ├── form_types.py
│   │   ├── main.py
│   │   ├── metrics_server.py
│   │   ├── models.py
│   │   ├── query_processor.py
│   │   ├── routers/
│   │   │   ├── __init__.py
│   │   │   ├── dynamic.py
│   │   │   ├── index.py
│   │   │   └── ingest.py
│   │   ├── routers_utils.py
│   │   ├── s3_utils.py
│   │   ├── server_config.py
│   │   ├── server_utils.py
│   │   └── templates/
│   │       ├── base.jinja
│   │       ├── components/
│   │       │   ├── _macros.jinja
│   │       │   ├── footer.jinja
│   │       │   ├── git_form.jinja
│   │       │   ├── navbar.jinja
│   │       │   ├── result.jinja
│   │       │   └── tailwind_components.html
│   │       ├── git.jinja
│   │       ├── index.jinja
│   │       └── swagger_ui.jinja
│   └── static/
│       ├── js/
│       │   ├── git.js
│       │   ├── git_form.js
│       │   ├── index.js
│       │   ├── navbar.js
│       │   ├── posthog.js
│       │   └── utils.js
│       ├── llms.txt
│       └── robots.txt
└── tests/
    ├── .pylintrc
    ├── __init__.py
    ├── conftest.py
    ├── query_parser/
    │   ├── __init__.py
    │   ├── test_git_host_agnostic.py
    │   └── test_query_parser.py
    ├── server/
    │   ├── __init__.py
    │   └── test_flow_integration.py
    ├── test_cli.py
    ├── test_clone.py
    ├── test_git_utils.py
    ├── test_gitignore_feature.py
    ├── test_ingestion.py
    ├── test_notebook_utils.py
    ├── test_pattern_utils.py
    └── test_summary.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .docker/minio/setup.sh
================================================
#!/bin/sh

# Simple script to set up MinIO bucket and user
# Based on example from MinIO issues

# Format bucket name to ensure compatibility
BUCKET_NAME=$(echo "${S3_BUCKET_NAME}" | tr '[:upper:]' '[:lower:]' | tr '_' '-')

# Configure MinIO client
mc alias set myminio http://minio:9000 "${MINIO_ROOT_USER}" "${MINIO_ROOT_PASSWORD}"

# Remove bucket if it exists (for clean setup)
mc rm -r --force "myminio/${BUCKET_NAME}" || true

# Create bucket
mc mb "myminio/${BUCKET_NAME}"

# Set bucket policy to allow downloads
mc anonymous set download "myminio/${BUCKET_NAME}"

# Create user with access and secret keys
mc admin user add myminio "${S3_ACCESS_KEY}" "${S3_SECRET_KEY}" || echo "User already exists"

# Create policy for the bucket
echo '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Action":["s3:*"],"Resource":["arn:aws:s3:::'${BUCKET_NAME}'/*","arn:aws:s3:::'${BUCKET_NAME}'"]}]}' > /tmp/policy.json

# Apply policy
mc admin policy create myminio gitingest-policy /tmp/policy.json || echo "Policy already exists"
mc admin policy attach myminio gitingest-policy --user "${S3_ACCESS_KEY}"

echo "MinIO setup completed successfully"
echo "Bucket: ${BUCKET_NAME}"
echo "Access via console: http://localhost:9001"


================================================
FILE: .dockerignore
================================================
# -------------------------------------------------
# Base: reuse patterns from .gitignore
# -------------------------------------------------

# Operating-system
.DS_Store
Thumbs.db

# Editor / IDE settings
.vscode/
!.vscode/launch.json
.idea/
*.swp

# Python virtual-envs & tooling
.venv*/
.python-version
__pycache__/
*.egg-info/
*.egg
.ruff_cache/

# Test artifacts & coverage
.pytest_cache/
.coverage
coverage.xml
htmlcov/

# Build, distribution & docs
build/
dist/
*.wheel

# Logs & runtime output
*.log
logs/
*.tmp
tmp/

# Project-specific files
history.txt
digest.txt


# -------------------------------------------------
# Extra for Docker
# -------------------------------------------------

# Git history
.git/
.gitignore

# Tests
tests/

# Docs
docs/
*.md
LICENSE

# Local overrides & secrets
.env

# Docker files
.dockerignore
Dockerfile*

# -------------------------------------------------
# Files required during build
# -------------------------------------------------
!pyproject.toml
!src/


================================================
FILE: .github/ISSUE_TEMPLATE/bug_report.yml
================================================
name: Bug report 🐞
description: Report a bug or internal server error when using Gitingest
title: "(bug): "
labels: ["bug"]
body:
  - type: markdown
    attributes:
      value: |
        Thanks for taking the time to report a bug! :lady_beetle:

        Please fill out the following details to help us reproduce and fix the issue. :point_down:

  - type: dropdown
    id: interface
    attributes:
      label: Which interface did you use?
      default: 0
      options:
        - "Select one..."
        - Web UI
        - CLI
        - PyPI package
    validations:
      required: true

  - type: input
    id: repo_url
    attributes:
      label: Repository URL (if public)
      placeholder: e.g., https://github.com/<username>/<repo>/commit_branch_or_tag/blob_or_tree/subdir

  - type: dropdown
    id: git_host
    attributes:
      label: Git host
      description: The Git host of the repository.
      default: 0
      options:
        - "Select one..."
        - GitHub (github.com)
        - GitLab (gitlab.com)
        - Bitbucket (bitbucket.org)
        - Gitea (gitea.com)
        - Codeberg (codeberg.org)
        - Gist (gist.github.com)
        - Kaggle (kaggle.com)
        - GitHub Enterprise (github.company.com)
        - Other (specify below)
    validations:
      required: true

  - type: input
    id: git_host_other
    attributes:
      label: Other Git host
      placeholder: If you selected "Other", please specify the Git host here.

  - type: dropdown
    id: repo_visibility
    attributes:
      label: Repository visibility
      default: 0
      options:
        - "Select one..."
        - public
        - private
    validations:
      required: true

  - type: dropdown
    id: revision
    attributes:
      label: Commit, branch, or tag
      default: 0
      options:
        - "Select one..."
        - default branch
        - commit
        - branch
        - tag
    validations:
      required: true

  - type: dropdown
    id: ingest_scope
    attributes:
      label: Did you ingest the full repository or a subdirectory?
      default: 0
      options:
        - "Select one..."
        - full repository
        - subdirectory
    validations:
      required: true

  - type: dropdown
    id: os
    attributes:
      label: Operating system
      default: 0
      options:
        - "Select one..."
        - Not relevant (Web UI)
        - macOS
        - Windows
        - Linux
    validations:
      required: true

  - type: dropdown
    id: browser
    attributes:
      label: Browser (Web UI only)
      default: 0
      options:
        - "Select one..."
        - Not relevant (CLI / PyPI)
        - Chrome
        - Firefox
        - Safari
        - Edge
        - Other (specify below)
    validations:
      required: true

  - type: input
    id: browser_other
    attributes:
      label: Other browser
      placeholder: If you selected "Other", please specify the browser here.

  - type: input
    id: gitingest_version
    attributes:
      label: Gitingest version
      placeholder: e.g., v0.1.5
      description: Not required if you used the Web UI.

  - type: input
    id: python_version
    attributes:
      label: Python version
      placeholder: e.g., 3.11.5
      description: Not required if you used the Web UI.

  - type: textarea
    id: bug_description
    attributes:
      label: Bug description
      placeholder: Describe the bug here.
      description: A detailed but concise description of the bug.
    validations:
      required: true


  - type: textarea
    id: steps_to_reproduce
    attributes:
      label: Steps to reproduce
      placeholder: Include the exact commands or actions that led to the error.
      description: Include the exact commands or actions that led to the error *(if relevant)*.
      render: shell

  - type: textarea
    id: expected_behavior
    attributes:
      label: Expected behavior
      placeholder: Describe what you expected to happen.
      description: Describe what you expected to happen *(if relevant)*.

  - type: textarea
    id: actual_behavior
    attributes:
      label: Actual behavior
      description: Paste the full error message or stack trace here.

  - type: textarea
    id: additional_context
    attributes:
      label: Additional context, logs, or screenshots
      placeholder: Add any other context, links, or screenshots about the issue here.


================================================
FILE: .github/ISSUE_TEMPLATE/feature_request.yml
================================================
name: Feature request 💡
description: Suggest a new feature or improvement for Gitingest
title: "(feat): "
labels: ["enhancement"]
body:
  - type: markdown
    attributes:
      value: |
        Thanks for taking the time to help us improve **Gitingest**! :sparkles:

        Please fill in the sections below to describe your idea. The more detail you provide, the easier it is for us to evaluate and plan the work. :point_down:

  - type: input
    id: summary
    attributes:
      label: Feature summary
      placeholder: One-sentence description of the feature.
    validations:
      required: true

  - type: textarea
    id: problem
    attributes:
      label: Problem / motivation
      description: What problem does this feature solve? How does it affect your workflow?
      placeholder: Why is this feature important? Describe the pain point or limitation you're facing.
    validations:
      required: true

  - type: textarea
    id: proposal
    attributes:
      label: Proposed solution
      placeholder: Describe what you would like to see happen.
      description: Outline the feature as you imagine it. *(optional)*


  - type: textarea
    id: alternatives
    attributes:
      label: Alternatives considered
      placeholder: List other approaches you've considered or work-arounds you use today.
      description: Feel free to mention why those alternatives don't fully solve the problem.

  - type: dropdown
    id: interface
    attributes:
      label: Which interface would this affect?
      default: 0
      options:
        - "Select one..."
        - Web UI
        - CLI
        - PyPI package
        - CLI + PyPI package
        - All
    validations:
      required: true

  - type: dropdown
    id: priority
    attributes:
      label: How important is this to you?
      default: 0
      options:
        - "Select one..."
        - Nice to have
        - Important
        - Critical
    validations:
      required: true

  - type: dropdown
    id: willingness
    attributes:
      label: Would you like to work on this feature yourself?
      default: 0
      options:
        - "Select one..."
        - Yes, I'd like to implement it
        - Maybe, if I get some guidance
        - No, just requesting (absolutely fine!)
    validations:
      required: true

  - type: dropdown
    id: support_needed
    attributes:
      label: Would you need support from the maintainers (if you're implementing it yourself)?
      default: 0
      options:
        - "Select one..."
        - No, I can handle it solo
        - Yes, I'd need some guidance
        - Not sure yet
        - This is just a suggestion, I'm not planning to implement it myself (absolutely fine!)

  - type: textarea
    id: additional_context
    attributes:
      label: Additional context, screenshots, or examples
      placeholder: Add links, sketches, or any other context that would help us understand and implement the feature.


================================================
FILE: .github/workflows/ci.yml
================================================
name: CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

permissions:
  contents: read

jobs:
  test:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
        python-version: ["3.8", "3.13"]
        include:
          - os: ubuntu-latest
            python-version: "3.13"
            coverage: true

    steps:
      - name: Harden the runner (Audit all outbound calls)
        uses: step-security/harden-runner@ec9f2d5744a09debf3a187a3f4f675c53b671911 # v2.13.0
        with:
          egress-policy: audit

      - uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0

      - name: Set up Python
        uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5.6.0
        with:
          python-version: ${{ matrix.python-version }}
          cache: 'pip'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          python -m pip install ".[dev,server]"

      - name: Cache pytest results
        uses: actions/cache@v4
        with:
          path: .pytest_cache
          key: ${{ runner.os }}-pytest-${{ matrix.python-version }}-${{ hashFiles('**/pytest.ini') }}
          restore-keys: |
            ${{ runner.os }}-pytest-${{ matrix.python-version }}-

      - name: Run tests
        run: pytest

      - name: Run pre-commit hooks
        uses: pre-commit/action@2c7b3805fd2a0fd8c1884dcaebf91fc102a13ecd # v3.0.1
        if: ${{ matrix.python-version == '3.13' && matrix.os == 'ubuntu-latest' }}


================================================
FILE: .github/workflows/codeql.yml
================================================
# For most projects, this workflow file will not need changing; you simply need
# to commit it to your repository.
#
# You may wish to alter this file to override the set of languages analyzed,
# or to provide custom queries or build logic.
#
# ******** NOTE ********
# We have attempted to detect the languages in your repository. Please check
# the `language` matrix defined below to confirm you have the correct set of
# supported CodeQL languages.
#
name: "CodeQL"

on:
  push:
    branches: ["main"]
  pull_request:
    # The branches below must be a subset of the branches above
    branches: ["main"]
  schedule:
    - cron: "0 0 * * 1"

permissions:
  contents: read

jobs:
  analyze:
    name: Analyze
    runs-on: ubuntu-latest
    permissions:
      actions: read
      contents: read
      security-events: write

    strategy:
      fail-fast: false
      matrix:
        language: ["javascript", "python"]
        # CodeQL supports [ $supported-codeql-languages ]
        # Learn more about CodeQL language support at https://aka.ms/codeql-docs/language-support

    steps:
      - name: Harden the runner (Audit all outbound calls)
        uses: step-security/harden-runner@ec9f2d5744a09debf3a187a3f4f675c53b671911 # v2.13.0
        with:
          egress-policy: audit

      - name: Checkout repository
        uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0

      # Initializes the CodeQL tools for scanning.
      - name: Initialize CodeQL
        uses: github/codeql-action/init@df559355d593797519d70b90fc8edd5db049e7a2 # v3.29.9
        with:
          languages: ${{ matrix.language }}
          # If you wish to specify custom queries, you can do so here or in a config file.
          # By default, queries listed here will override any specified in a config file.
          # Prefix the list here with "+" to use these queries and those in the config file.

      # Autobuild attempts to build any compiled languages (C/C++, C#, or Java).
      # If this step fails, then you should remove it and run the build manually (see below)
      - name: Autobuild
        uses: github/codeql-action/autobuild@df559355d593797519d70b90fc8edd5db049e7a2 # v3.29.9

      # ℹ️ Command-line programs to run using the OS shell.
      # 📚 See https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepsrun

      #   If the Autobuild fails above, remove it and uncomment the following three lines.
      #   Modify them (or add more) to build your code; refer to the EXAMPLE below for guidance.

      # - run: |
      #   echo "Run, Build Application using script"
      #   ./location_of_script_within_repo/buildscript.sh

      - name: Perform CodeQL Analysis
        uses: github/codeql-action/analyze@df559355d593797519d70b90fc8edd5db049e7a2 # v3.29.9
        with:
          category: "/language:${{matrix.language}}"


================================================
FILE: .github/workflows/dependency-review.yml
================================================
# Dependency Review Action
#
# This Action will scan dependency manifest files that change as part of a Pull Request,
# surfacing known-vulnerable versions of the packages declared or updated in the PR.
# Once installed, if the workflow run is marked as required,
# PRs introducing known-vulnerable packages will be blocked from merging.
#
# Source repository: https://github.com/actions/dependency-review-action
name: 'Dependency Review'
on: [pull_request]

permissions:
  contents: read

jobs:
  dependency-review:
    runs-on: ubuntu-latest
    steps:
      - name: Harden the runner (Audit all outbound calls)
        uses: step-security/harden-runner@ec9f2d5744a09debf3a187a3f4f675c53b671911 # v2.13.0
        with:
          egress-policy: audit

      - name: 'Checkout Repository'
        uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
      - name: 'Dependency Review'
        uses: actions/dependency-review-action@da24556b548a50705dd671f47852072ea4c105d9 # v4.7.1


================================================
FILE: .github/workflows/deploy-pr.yml
================================================
name: Manage PR Temp Envs
'on':
  pull_request:
    types:
      - labeled
      - unlabeled
      - closed

permissions:
  contents: read
  pull-requests: write

env:
  APP_NAME: gitingest
  FLUX_OWNER: '${{ github.repository_owner }}'
  FLUX_REPO: '${{ secrets.CR_FLUX_REPO }}'

jobs:
  deploy-pr-env:
    if: >-
      ${{ github.event.action == 'labeled' && github.event.label.name ==
      'deploy-pr-temp-env' }}
    runs-on: ubuntu-latest
    steps:
      - name: Create GitHub App token
        uses: actions/create-github-app-token@v2
        id: app-token
        with:
          app-id: '${{ secrets.CR_APP_CI_APP_ID }}'
          private-key: '${{ secrets.CR_APP_CI_PRIVATE_KEY }}'
          owner: '${{ env.FLUX_OWNER }}'
          repositories: '${{ env.FLUX_REPO }}'

      - name: Checkout Flux repo
        uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
        with:
          repository: '${{ env.FLUX_OWNER }}/${{ env.FLUX_REPO }}'
          token: '${{ steps.app-token.outputs.token }}'
          path: flux-repo
          persist-credentials: false

      - name: Export PR ID
        shell: bash
        run: 'echo "PR_ID=${{ github.event.pull_request.number }}" >> $GITHUB_ENV'

      - name: Ensure template exists
        shell: bash
        run: >
          T="flux-repo/pr-template/${APP_NAME}"

          [[ -d "$T" ]] || { echo "Missing $T"; exit 1; }

          [[ $(find "$T" -type f | wc -l) -gt 0 ]] || { echo "No files in $T";
          exit 1; }

      - name: Render & copy template
        shell: bash
        run: |
          SRC="flux-repo/pr-template/${APP_NAME}"
          DST="flux-repo/deployments/prs-${APP_NAME}/${PR_ID}"
          mkdir -p "$DST"
          cp -r "$SRC/." "$DST/"
          find "$DST" -type f -print0 \
            | xargs -0 -n1 sed -i "s|@PR-ID@|${PR_ID}|g"

      - name: Sanity-check rendered output
        shell: bash
        run: >
          E=$(find "flux-repo/pr-template/${APP_NAME}" -type f | wc -l)

          G=$(find "flux-repo/deployments/prs-${APP_NAME}/${PR_ID}" -type f | wc
          -l)

          (( G == E )) || { echo "Expected $E files, got $G"; exit 1; }

      - name: Commit & push creation
        shell: bash
        run: >
          cd flux-repo

          git config user.name  "${{ steps.app-token.outputs.app-slug }}[bot]"

          git config user.email "${{ steps.app-token.outputs.app-slug
          }}[bot]@users.noreply.github.com"

          git add .

          git commit -m "chore(prs-${APP_NAME}): create temp env for PR #${{
          env.PR_ID }} [skip ci]" || echo "Nothing to commit"

          git remote set-url origin \
            https://x-access-token:${{ steps.app-token.outputs.token }}@github.com/${{ env.FLUX_OWNER }}/${{ env.FLUX_REPO }}.git
          git push origin HEAD:main

      - name: Comment preview URL on PR
        uses: thollander/actions-comment-pull-request@v3
        with:
          github-token: '${{ secrets.GITHUB_TOKEN }}'
          pr-number: '${{ github.event.pull_request.number }}'
          comment-tag: 'pr-preview'
          create-if-not-exists: 'true'
          message: |
            🌐 [Preview environment](https://pr-${{ env.PR_ID }}.${{ env.APP_NAME }}.coderamp.dev/) for PR #${{ env.PR_ID }}

            📊 [Log viewer](https://app.datadoghq.eu/logs?query=kube_namespace%3Aprs-gitingest%20version%3Apr-${{ env.PR_ID }})

  remove-pr-env:
    if: >-
      (github.event.action == 'unlabeled' && github.event.label.name ==
      'deploy-pr-temp-env') || (github.event.action == 'closed')
    runs-on: ubuntu-latest
    steps:
      - name: Create GitHub App token
        uses: actions/create-github-app-token@v2
        id: app-token
        with:
          app-id: '${{ secrets.CR_APP_CI_APP_ID }}'
          private-key: '${{ secrets.CR_APP_CI_PRIVATE_KEY }}'
          owner: '${{ env.FLUX_OWNER }}'
          repositories: '${{ env.FLUX_REPO }}'

      - name: Checkout Flux repo
        uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
        with:
          repository: '${{ env.FLUX_OWNER }}/${{ env.FLUX_REPO }}'
          token: '${{ steps.app-token.outputs.token }}'
          path: flux-repo
          persist-credentials: false

      - name: Export PR ID
        shell: bash
        run: 'echo "PR_ID=${{ github.event.pull_request.number }}" >> $GITHUB_ENV'

      - name: Remove deployed directory
        shell: bash
        run: |
          DST="flux-repo/deployments/prs-${APP_NAME}/${PR_ID}"
          if [[ -d "$DST" ]]; then
            rm -rf "$DST"
            echo "✅ Deleted $DST"
          else
            echo "⏭️ Nothing to delete at $DST"
          fi

      - name: Commit & push deletion
        shell: bash
        run: >
          cd flux-repo

          git config user.name  "${{ steps.app-token.outputs.app-slug }}[bot]"

          git config user.email "${{ steps.app-token.outputs.app-slug
          }}[bot]@users.noreply.github.com"

          git add -A

          git commit -m "chore(prs-${APP_NAME}): remove temp env for PR #${{
          env.PR_ID }} [skip ci]" || echo "Nothing to commit"

          git remote set-url origin \
            https://x-access-token:${{ steps.app-token.outputs.token }}@github.com/${{ env.FLUX_OWNER }}/${{ env.FLUX_REPO }}.git
          git push origin HEAD:main

      - name: Comment undeploy notice on PR
        uses: thollander/actions-comment-pull-request@v3
        with:
          github-token: '${{ secrets.GITHUB_TOKEN }}'
          pr-number: '${{ github.event.pull_request.number }}'
          comment-tag: 'pr-preview'
          create-if-not-exists: 'true'
          message: |
            ⚙️ Preview environment was undeployed.


================================================
FILE: .github/workflows/docker-build.ecr.yml
================================================
name: Build & Push Container

on:
  push:
    branches:
      - 'main'
    tags:
      - '*'
  merge_group:
  pull_request:
    types: [labeled, synchronize, reopened, ready_for_review, opened]

env:
  PUSH_FROM_PR: >-
    ${{ github.event_name == 'pull_request' &&
       (
         contains(github.event.pull_request.labels.*.name, 'push-container') ||
         contains(github.event.pull_request.labels.*.name, 'deploy-pr-temp-env')
       )
    }}

jobs:
  terraform:
    name: "ECR"
    runs-on: ubuntu-latest
    if: github.repository == 'coderamp-labs/gitingest'

    permissions:
      id-token: write
      contents: read
      pull-requests: write

    steps:
      - name: Checkout
        uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
        with:
          ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }}

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.CODERAMP_AWS_ECR_REGISTRY_PUSH_ROLE_ARN }}
          role-session-name: GitHub_to_AWS_via_FederatedOIDC
          aws-region: eu-west-1

      - name: Set current timestamp
        id: vars
        run: |
          echo "timestamp=$(date +%s)" >> $GITHUB_OUTPUT
          echo "sha_short=$(git rev-parse --short HEAD)" >> $GITHUB_OUTPUT
          echo "sha_full=$(git rev-parse HEAD)" >> $GITHUB_OUTPUT

      - name: Determine version and deployment context
        id: version
        run: |
          REPO_URL="https://github.com/${{ github.repository }}"

          if [[ "${{ github.ref_type }}" == "tag" ]]; then
            # Tag deployment - display version, link to release
            echo "version=${{ github.ref_name }}" >> $GITHUB_OUTPUT
            echo "app_version=${{ github.ref_name }}" >> $GITHUB_OUTPUT
            echo "app_version_url=${REPO_URL}/releases/tag/${{ github.ref_name }}" >> $GITHUB_OUTPUT
          elif [[ "${{ github.event_name }}" == "pull_request" ]]; then
            # PR deployment - display pr-XXX, link to PR commit
            PR_NUMBER="${{ github.event.pull_request.number }}"
            COMMIT_HASH="${{ steps.vars.outputs.sha_full }}"
            echo "version=${PR_NUMBER}/merge-${COMMIT_HASH}" >> $GITHUB_OUTPUT
            echo "app_version=pr-${PR_NUMBER}" >> $GITHUB_OUTPUT
            echo "app_version_url=${REPO_URL}/pull/${PR_NUMBER}/commits/${COMMIT_HASH}" >> $GITHUB_OUTPUT
          else
            # Branch deployment - display branch name, link to commit
            BRANCH_NAME="${{ github.ref_name }}"
            COMMIT_HASH="${{ steps.vars.outputs.sha_full }}"
            echo "app_version=${BRANCH_NAME}" >> $GITHUB_OUTPUT
            echo "app_version_url=${REPO_URL}/commit/${COMMIT_HASH}" >> $GITHUB_OUTPUT
          fi

      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v2

      - name: Docker Meta
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: |
            ${{ secrets.ECR_REGISTRY_URL }}
          flavor: |
            latest=false
          tags: |
            type=ref,event=branch,branch=main,suffix=-${{ steps.vars.outputs.sha_short }}-${{ steps.vars.outputs.timestamp }}
            type=ref,event=pr,suffix=-${{ steps.vars.outputs.sha_short }}-${{ steps.vars.outputs.timestamp }}
            type=pep440,pattern={{raw}}

      - name: Set up QEMU
        uses: docker/setup-qemu-action@v3

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Build and push
        uses: docker/build-push-action@v6
        with:
          context: .
          platforms: linux/amd64, linux/arm64
          push: ${{ github.event_name != 'pull_request' || env.PUSH_FROM_PR == 'true' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          build-args: |
            APP_REPOSITORY=https://github.com/${{ github.repository }}
            APP_VERSION=${{ steps.version.outputs.app_version }}
            APP_VERSION_URL=${{ steps.version.outputs.app_version_url }}
          cache-from: type=gha
          cache-to: type=gha,mode=max


================================================
FILE: .github/workflows/docker-build.ghcr.yml
================================================
name: Build & Push Container

on:
  push:
    branches:
      - 'main'
    tags:
      - '*'
  merge_group:
  pull_request:
    types: [labeled, synchronize, reopened, ready_for_review, opened]

concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref || github.ref }}
  cancel-in-progress: true

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}
  PUSH_FROM_PR: >-
    ${{ github.event_name == 'pull_request' &&
       (
         contains(github.event.pull_request.labels.*.name, 'push-container') ||
         contains(github.event.pull_request.labels.*.name, 'deploy-pr-temp-env')
       )
    }}

permissions:
  contents: read

jobs:
  docker-build:
    name: "GHCR"
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
      attestations: write
      id-token: write
    steps:
      - name: Harden the runner (Audit all outbound calls)
        uses: step-security/harden-runner@ec9f2d5744a09debf3a187a3f4f675c53b671911 # v2.13.0
        with:
          egress-policy: audit

      - uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
        with:
          ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }}

      - name: Set current timestamp
        id: vars
        run: |
          echo "timestamp=$(date +%s)" >> $GITHUB_OUTPUT
          echo "sha_short=$(git rev-parse --short HEAD)" >> $GITHUB_OUTPUT
          echo "sha_full=$(git rev-parse HEAD)" >> $GITHUB_OUTPUT

      - name: Determine version and deployment context
        id: version
        run: |
          REPO_URL="https://github.com/${{ github.repository }}"

          if [[ "${{ github.ref_type }}" == "tag" ]]; then
            # Tag deployment - display version, link to release
            echo "version=${{ github.ref_name }}" >> $GITHUB_OUTPUT
            echo "app_version=${{ github.ref_name }}" >> $GITHUB_OUTPUT
            echo "app_version_url=${REPO_URL}/releases/tag/${{ github.ref_name }}" >> $GITHUB_OUTPUT
          elif [[ "${{ github.event_name }}" == "pull_request" ]]; then
            # PR deployment - display pr-XXX, link to PR commit
            PR_NUMBER="${{ github.event.pull_request.number }}"
            COMMIT_HASH="${{ steps.vars.outputs.sha_full }}"
            echo "version=${PR_NUMBER}/merge-${COMMIT_HASH}" >> $GITHUB_OUTPUT
            echo "app_version=pr-${PR_NUMBER}" >> $GITHUB_OUTPUT
            echo "app_version_url=${REPO_URL}/pull/${PR_NUMBER}/commits/${COMMIT_HASH}" >> $GITHUB_OUTPUT
          else
            # Branch deployment - display branch name, link to commit
            BRANCH_NAME="${{ github.ref_name }}"
            COMMIT_HASH="${{ steps.vars.outputs.sha_full }}"
            echo "app_version=${BRANCH_NAME}" >> $GITHUB_OUTPUT
            echo "app_version_url=${REPO_URL}/commit/${COMMIT_HASH}" >> $GITHUB_OUTPUT
          fi

      - name: Log in to the Container registry
        uses: docker/login-action@184bdaa0721073962dff0199f1fb9940f07167d1 # v3.5.0
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Docker Meta
        id: meta
        uses: docker/metadata-action@c1e51972afc2121e065aed6d45c65596fe445f3f # v5.8.0
        with:
          images: |
            ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          flavor: |
            latest=false
          tags: |
            type=ref,event=branch,branch=main
            type=ref,event=branch,branch=main,suffix=-${{ steps.vars.outputs.sha_short }}-${{ steps.vars.outputs.timestamp }}
            type=pep440,pattern={{raw}}
            type=ref,event=pr,suffix=-${{ steps.vars.outputs.sha_short }}-${{ steps.vars.outputs.timestamp }}

      - name: Set up QEMU
        uses: docker/setup-qemu-action@29109295f81e9208d7d86ff1c6c12d2833863392 # v3.6.0

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@e468171a9de216ec08956ac3ada2f0791b6bd435 # v3.11.1

      - name: Build and push
        uses: docker/build-push-action@263435318d21b8e681c14492fe198d362a7d2c83 # v6.18.0
        id: push
        with:
          context: .
          platforms: linux/amd64, linux/arm64
          push: ${{ github.event_name != 'pull_request' || env.PUSH_FROM_PR == 'true' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          build-args: |
            APP_REPOSITORY=https://github.com/${{ github.repository }}
            APP_VERSION=${{ steps.version.outputs.app_version }}
            APP_VERSION_URL=${{ steps.version.outputs.app_version_url }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

      - name: Generate artifact attestation
        if: github.event_name != 'pull_request' || env.PUSH_FROM_PR == 'true'
        uses: actions/attest-build-provenance@e8998f949152b193b063cb0ec769d69d929409be # v2.4.0
        with:
          subject-name: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          subject-digest: ${{ steps.push.outputs.digest }}
          push-to-registry: true


================================================
FILE: .github/workflows/pr-title-check.yml
================================================
name: PR Conventional Commit Validation

on:
  pull_request:
    types: [opened, synchronize, reopened, edited]

jobs:
  validate-pr-title:
    runs-on: ubuntu-latest
    steps:
      - name: Harden the runner (Audit all outbound calls)
        uses: step-security/harden-runner@ec9f2d5744a09debf3a187a3f4f675c53b671911 # v2.13.0
        with:
          egress-policy: audit

      - name: PR Conventional Commit Validation
        uses: ytanikin/pr-conventional-commits@b72758283dcbee706975950e96bc4bf323a8d8c0 # 1.4.2
        with:
          task_types: '["feat","fix","docs","test","ci","refactor","perf","chore","revert"]'
          add_label: 'false'


================================================
FILE: .github/workflows/publish_to_pypi.yml
================================================
name: Publish to PyPI

on:
  release:
    types: [created] # Run when you click "Publish release"
  workflow_dispatch: # ... or run it manually from the Actions tab

permissions:
  contents: read

jobs:
  release-build:
    runs-on: ubuntu-latest

    steps:
      - name: Harden the runner (Audit all outbound calls)
        uses: step-security/harden-runner@ec9f2d5744a09debf3a187a3f4f675c53b671911 # v2.13.0
        with:
          egress-policy: audit

      - uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0

      - name: Set up Python 3.13
        uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5.6.0
        with:
          python-version: "3.13"
          cache: pip
          cache-dependency-path: pyproject.toml

      - name: Build package
        run: |
          python -m pip install --upgrade pip
          python -m pip install build twine
          python -m build
          twine check dist/*

      - name: Upload dist artefact
        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
        with:
          name: dist
          path: dist/

  # Publish to PyPI (only if the release-build job succeeded)
  pypi-publish:
    needs: release-build
    runs-on: ubuntu-latest
    environment: pypi

    permissions:
      id-token: write # OIDC token for trusted publishing

    steps:
      - name: Harden the runner (Audit all outbound calls)
        uses: step-security/harden-runner@ec9f2d5744a09debf3a187a3f4f675c53b671911 # v2.13.0
        with:
          egress-policy: audit

      - uses: actions/download-artifact@634f93cb2916e3fdff6788551b99b062d0335ce0 # v5.0.0
        with:
          name: dist
          path: dist/

      - uses: pypa/gh-action-pypi-publish@76f52bc884231f62b9a034ebfe128415bbaabdfc # release/v1
        with:
          verbose: true


================================================
FILE: .github/workflows/rebase-needed.yml
================================================
name: PR Needs Rebase

on:
  workflow_dispatch: {}
  schedule:
    - cron: '0 * * * *'

permissions:
  pull-requests: write

jobs:
  label-rebase-needed:
    runs-on: ubuntu-latest
    if: github.repository == 'coderamp-labs/gitingest'

    concurrency:
      group: ${{ github.workflow }}-${{ github.ref }}
      cancel-in-progress: true

    steps:
      - name: Check for merge conflicts
        uses: eps1lon/actions-label-merge-conflict@v3
        with:
          dirtyLabel: 'rebase needed :construction:'
          repoToken: '${{ secrets.GITHUB_TOKEN }}'
          commentOnClean: This pull request has resolved merge conflicts and is ready for review.
          commentOnDirty: This pull request has merge conflicts that must be resolved before it can be merged.
          retryMax: 30
          continueOnMissingPermissions: false


================================================
FILE: .github/workflows/release-please.yml
================================================
name: release-please
on:
  push:
    branches:
      - main

permissions:
  contents: write
  pull-requests: write

jobs:
  release:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0

      - name: Create GitHub App token
        uses: actions/create-github-app-token@v2
        id: app-token
        with:
          app-id: '${{ secrets.CR_APP_CI_APP_ID }}'
          private-key: '${{ secrets.CR_APP_CI_PRIVATE_KEY }}'
          owner: '${{ env.FLUX_OWNER }}'
          repositories: '${{ env.FLUX_REPO }}'

      - name: Release Please
        uses: googleapis/release-please-action@v4
        with:
          token: '${{ steps.app-token.outputs.token }}'


================================================
FILE: .github/workflows/scorecard.yml
================================================
name: OSSF Scorecard
on:
  branch_protection_rule:
  schedule:
    - cron: '33 11 * * 2'  # Every Tuesday at 11:33 AM UTC
  push:
    branches: [ main ]

permissions: read-all

concurrency: # avoid overlapping runs
  group: scorecard-${{ github.ref }}
  cancel-in-progress: true

jobs:
  analysis:
    name: Scorecard analysis
    runs-on: ubuntu-latest
    permissions:
      security-events: write # upload SARIF to code-scanning
      id-token: write # publish results for the badge

    steps:
      - name: Harden the runner (Audit all outbound calls)
        uses: step-security/harden-runner@ec9f2d5744a09debf3a187a3f4f675c53b671911 # v2.13.0
        with:
          egress-policy: audit

      - name: Checkout
        uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
        with:
          persist-credentials: false

      - name: Run Scorecard
        uses: ossf/scorecard-action@f35c64557cf912815708bb1126d9948f3e459487
        with:
          results_file: results.sarif
          results_format: sarif
          publish_results: true  # enables the public badge

      - name: Upload to code-scanning
        uses: github/codeql-action/upload-sarif@df559355d593797519d70b90fc8edd5db049e7a2 # v3.29.9
        with:
          sarif_file: results.sarif


================================================
FILE: .github/workflows/stale.yml
================================================
name: "Close stale issues and PRs"

on:
  schedule:
    - cron: "0 6 * * *"
  workflow_dispatch: {}

permissions:
  issues: write
  pull-requests: write

jobs:
  stale:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/stale@v9
        with:
          repo-token: ${{ secrets.GITHUB_TOKEN }}
          days-before-stale: 45
          days-before-close: 10
          stale-issue-label: stale
          stale-pr-label: stale
          stale-issue-message: |
            Hi there! We haven’t seen activity here for 45 days, so I’m marking this issue as stale.
            If you’d like to keep it open, please leave a comment within 10 days. Thanks!
          stale-pr-message: |
            Hi there! We haven’t seen activity on this pull request for 45 days, so I’m marking it as stale.
            If you’d like to keep it open, please leave a comment within 10 days. Thanks!
          close-issue-message: |
            Hi there! We haven’t heard anything for 10 days, so I’m closing this issue. Feel free to reopen if you’d like to continue the discussion. Thanks!
          close-pr-message: |
            Hi there! We haven’t heard anything for 10 days, so I’m closing this pull request. Feel free to reopen if you’d like to continue working on it. Thanks!


================================================
FILE: .gitignore
================================================
# Operating-system
.DS_Store
Thumbs.db

# Editor / IDE settings
.vscode/
!.vscode/launch.json
.idea/
*.swp

# Python virtual-envs & tooling
.venv*/
venv/
.python-version
__pycache__/
*.egg-info/
*.egg
.ruff_cache/

# Test artifacts & coverage
.pytest_cache/
.coverage
coverage.xml
htmlcov/

# Build, distribution & docs
build/
dist/
*.whl

# Logs & runtime output
*.log
logs/
*.tmp
tmp/

# Project-specific files
history.txt
digest.txt

# Environment variables
.env


================================================
FILE: .pre-commit-config.yaml
================================================
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v5.0.0
    hooks:
      - id: check-added-large-files
        description: 'Prevent large files from being committed.'
        args: ['--maxkb=10000']

      - id: check-case-conflict
        description: 'Check for files that would conflict in case-insensitive filesystems.'

      - id: fix-byte-order-marker
        description: 'Remove utf-8 byte order marker.'

      - id: mixed-line-ending
        description: 'Replace mixed line ending.'

      - id: destroyed-symlinks
        description: 'Detect symlinks which are changed to regular files with a content of a path which that symlink was pointing to.'

      - id: check-ast
        description: 'Check for parseable syntax.'

      - id: end-of-file-fixer
        description: 'Ensure that a file is either empty, or ends with one newline.'

      - id: trailing-whitespace
        description: 'Trim trailing whitespace.'
        exclude: CHANGELOG.md

      - id: check-docstring-first
        description: 'Check a common error of defining a docstring after code.'

      - id: requirements-txt-fixer
        description: 'Sort entries in requirements.txt.'

  - repo: https://github.com/MarcoGorelli/absolufy-imports
    rev: v0.3.1
    hooks:
      - id: absolufy-imports
        description: 'Automatically convert relative imports to absolute. (Use `args: [--never]` to revert.)'

  - repo: https://github.com/asottile/pyupgrade
    rev: v3.20.0
    hooks:
      - id: pyupgrade
        description: 'Automatically upgrade syntax for newer versions.'
        args: [--py3-plus, --py36-plus]

  - repo: https://github.com/pre-commit/pygrep-hooks
    rev: v1.10.0
    hooks:
      - id: python-check-blanket-noqa
        description: 'Enforce that `# noqa` annotations always occur with specific codes.'

      - id: python-check-blanket-type-ignore
        description: 'Enforce that `# type: ignore` annotations always occur with specific codes.'

      - id: python-use-type-annotations
        description: 'Enforce that python3.6+ type annotations are used instead of type comments.'

  - repo: https://github.com/PyCQA/isort
    rev: 6.0.1
    hooks:
      - id: isort
        description: 'Sort imports alphabetically, and automatically separated into sections and by type.'

  - repo: https://github.com/pre-commit/mirrors-eslint
    rev: v9.30.1
    hooks:
      - id: eslint
        description: 'Lint JavaScript files.'
        files: \.js$
        args: [--max-warnings=0, --fix]
        additional_dependencies:
          [
            'eslint@9.30.1',
            '@eslint/js@9.30.1',
            'eslint-plugin-import@2.32.0',
            'globals@16.3.0',
          ]

  - repo: https://github.com/djlint/djLint
    rev: v1.36.4
    hooks:
      - id: djlint-reformat-jinja

  - repo: https://github.com/igorshubovych/markdownlint-cli
    rev: v0.45.0
    hooks:
      - id: markdownlint
        description: 'Lint markdown files.'
        args: ['--disable=line-length', '--ignore=CHANGELOG.md']

  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.12.2
    hooks:
      - id: ruff-check
      - id: ruff-format

  - repo: https://github.com/jsh9/pydoclint
    rev: 0.6.7
    hooks:
      - id: pydoclint
        name: pydoclint for source
        args: [--style=numpy]
        files: ^src/

  - repo: https://github.com/pycqa/pylint
    rev: v3.3.7
    hooks:
      - id: pylint
        name: pylint for source
        files: ^src/
        additional_dependencies:
          [
            boto3>=1.28.0,
            click>=8.0.0,
            'fastapi[standard]>=0.109.1',
            gitpython>=3.1.0,
            httpx,
            loguru>=0.7.0,
            pathspec>=0.12.1,
            prometheus-client,
            pydantic,
            pytest-asyncio,
            pytest-mock,
            python-dotenv,
            'sentry-sdk[fastapi]',
            slowapi,
            starlette>=0.40.0,
            strenum; python_version < '3.11',
            tiktoken>=0.7.0,
            typing_extensions>= 4.0.0; python_version < '3.10',
            uvicorn>=0.11.7,
          ]

      - id: pylint
        name: pylint for tests
        files: ^tests/
        args:
          - --rcfile=tests/.pylintrc
        additional_dependencies:
          [
            boto3>=1.28.0,
            click>=8.0.0,
            'fastapi[standard]>=0.109.1',
            gitpython>=3.1.0,
            httpx,
            loguru>=0.7.0,
            pathspec>=0.12.1,
            prometheus-client,
            pydantic,
            pytest-asyncio,
            pytest-mock,
            python-dotenv,
            'sentry-sdk[fastapi]',
            slowapi,
            starlette>=0.40.0,
            strenum; python_version < '3.11',
            tiktoken>=0.7.0,
            typing_extensions>= 4.0.0; python_version < '3.10',
            uvicorn>=0.11.7,
          ]

  - repo: meta
    hooks:
      - id: check-hooks-apply
      - id: check-useless-excludes

  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.16.3
    hooks:
      - id: gitleaks


================================================
FILE: .release-please-manifest.json
================================================
{".":"0.3.1"}


================================================
FILE: .vscode/launch.json
================================================
{
    "configurations": [
        {
            "name": "Python Debugger: Module",
            "type": "debugpy",
            "request": "launch",
            "module": "server",
            "args": [],
            "cwd": "${workspaceFolder}/src"
        }
    ]
}


================================================
FILE: CHANGELOG.md
================================================
# Changelog

## [0.3.1](https://github.com/coderamp-labs/gitingest/compare/v0.3.0...v0.3.1) (2025-07-31)


### Bug Fixes

* make cache aware of subpaths ([#481](https://github.com/coderamp-labs/gitingest/issues/481)) ([8b59bef](https://github.com/coderamp-labs/gitingest/commit/8b59bef541f858ef44eba8fce6ace77df9dea01c))

## [0.3.0](https://github.com/coderamp-labs/gitingest/compare/v0.2.1...v0.3.0) (2025-07-30)


### Features

* **logging:** implement loguru ([#473](https://github.com/coderamp-labs/gitingest/issues/473)) ([d061b48](https://github.com/coderamp-labs/gitingest/commit/d061b4877a253ba3f0480d329f025427c7f70177))
* serve cached digest if available ([#462](https://github.com/coderamp-labs/gitingest/issues/462)) ([efe5a26](https://github.com/coderamp-labs/gitingest/commit/efe5a2686142b5ee4984061ebcec23c3bf3495d5))


### Bug Fixes

* handle network errors gracefully in token count estimation ([#437](https://github.com/coderamp-labs/gitingest/issues/437)) ([5fbb445](https://github.com/coderamp-labs/gitingest/commit/5fbb445cd8725e56972f43ec8b5e12cb299e9e83))
* improved server side cleanup after ingest ([#477](https://github.com/coderamp-labs/gitingest/issues/477)) ([2df0eb4](https://github.com/coderamp-labs/gitingest/commit/2df0eb43989731ae40a9dd82d310ff76a794a46d))


### Documentation

* **contributing:** update PR title guidelines to enforce convention ([#476](https://github.com/coderamp-labs/gitingest/issues/476)) ([d1f8a80](https://github.com/coderamp-labs/gitingest/commit/d1f8a80826ca38ec105a1878742fe351d4939d6e))

## [0.2.1](https://github.com/coderamp-labs/gitingest/compare/v0.2.0...v0.2.1) (2025-07-27)


### Bug Fixes

* remove logarithm conversion from the backend and correctly process max file size in kb ([#464](https://github.com/coderamp-labs/gitingest/issues/464)) ([932bfef](https://github.com/coderamp-labs/gitingest/commit/932bfef85db66704985c83f3f7c427756bd14023))

## [0.2.0](https://github.com/coderamp-labs/gitingest/compare/v0.1.5...v0.2.0) (2025-07-26)

### Features

* `include_submodules` option ([#313](https://github.com/coderamp-labs/gitingest/issues/313)) ([38c2317](https://github.com/coderamp-labs/gitingest/commit/38c23171a14556a2cdd05c0af8219f4dc789defd))
* add Tailwind CSS pipeline, tag-aware cloning & overhaul CI/CD ([#352](https://github.com/coderamp-labs/gitingest/issues/352)) ([b683e59](https://github.com/coderamp-labs/gitingest/commit/b683e59b5b1a31d27cc5c6ce8fb62da9b660613b))
* add Tailwind CSS pipeline, tag-aware cloning & overhaul CI/CD ([#352](https://github.com/coderamp-labs/gitingest/issues/352)) ([016817d](https://github.com/coderamp-labs/gitingest/commit/016817d5590c1412498b7532f6e854d20239c6be))
* **ci:** build Docker Image on PRs ([#382](https://github.com/coderamp-labs/gitingest/issues/382)) ([bc8cdb4](https://github.com/coderamp-labs/gitingest/commit/bc8cdb459482948c27e780b733ac7216d822529a))
* implement prometheus exporter ([#406](https://github.com/coderamp-labs/gitingest/issues/406)) ([1016f6e](https://github.com/coderamp-labs/gitingest/commit/1016f6ecb3b1b066d541d1eba1ddffec49b15f16))
* implement S3 integration for storing and retrieving digest files ([#427](https://github.com/coderamp-labs/gitingest/issues/427)) ([414e851](https://github.com/coderamp-labs/gitingest/commit/414e85189fb9055491530ba8c0665c798474451e))
* integrate Sentry for error tracking and performance monitoring ([#408](https://github.com/coderamp-labs/gitingest/issues/408)) ([590e55a](https://github.com/coderamp-labs/gitingest/commit/590e55a4d28a4f5c0beafbd12c525828fa79e221))
* Refactor backend to a rest api ([#346](https://github.com/coderamp-labs/gitingest/issues/346)) ([2b1f228](https://github.com/coderamp-labs/gitingest/commit/2b1f228ae1f6d1f7ee471794d258b13fcac25a96))
* **ui:** add inline PAT info tooltip inside token field ([#348](https://github.com/coderamp-labs/gitingest/issues/348)) ([2592303](https://github.com/coderamp-labs/gitingest/commit/25923037ea6cd2f8ef33a6cf1f0406c2b4f0c9b6))


### Bug Fixes

* enable metrics if env var is defined instead of being "True" ([#407](https://github.com/coderamp-labs/gitingest/issues/407)) ([fa2e192](https://github.com/coderamp-labs/gitingest/commit/fa2e192c05864c8db90bda877e9efb9b03caf098))
* fix docker container not launching ([#449](https://github.com/coderamp-labs/gitingest/issues/449)) ([998cea1](https://github.com/coderamp-labs/gitingest/commit/998cea15b4f79c5d6f840b5d3d916f83c8be3a07))
* frontend directory tree ([#363](https://github.com/coderamp-labs/gitingest/issues/363)) ([0fcf8a9](https://github.com/coderamp-labs/gitingest/commit/0fcf8a956f7ec8403a025177f998f92ddee96de0))
* gitignore and gitingestignore files are now correctly processed … ([#416](https://github.com/coderamp-labs/gitingest/issues/416)) ([74e503f](https://github.com/coderamp-labs/gitingest/commit/74e503fa1140feb74aa5350a32f0025c43097da1))
* Potential fix for code scanning alert no. 75: Uncontrolled data used in path expression ([#421](https://github.com/coderamp-labs/gitingest/issues/421)) ([9ceaf6c](https://github.com/coderamp-labs/gitingest/commit/9ceaf6cbbb0cdefbc79f78c5285406b9188b2d3d))
* reset pattern form when switching between include/exclude patterns ([#417](https://github.com/coderamp-labs/gitingest/issues/417)) ([7085e13](https://github.com/coderamp-labs/gitingest/commit/7085e138a74099b1df189b3bf9b8a333c8769380))
* temp files cleanup after ingest ([#309](https://github.com/coderamp-labs/gitingest/issues/309)) ([e669e44](https://github.com/coderamp-labs/gitingest/commit/e669e444fa1e6130f3f22952dd81f0ca3fe08fa5))
* **ui:** update layout in PAT section to avoid overlaps & overflows ([#331](https://github.com/coderamp-labs/gitingest/issues/331)) ([b39ef54](https://github.com/coderamp-labs/gitingest/commit/b39ef5416c1f8a7993a8249161d2a898b7387595))
* **windows:** warn if Git long path support is disabled, do not fail ([b8e375f](https://github.com/coderamp-labs/gitingest/commit/b8e375f71cae7d980cf431396c4414a6dbd0588c))


### Documentation

* add GitHub Issue Form for bug reports ([#403](https://github.com/coderamp-labs/gitingest/issues/403)) ([4546449](https://github.com/coderamp-labs/gitingest/commit/4546449bbc1e4a7ad0950c4b831b8855a98628fd))
* add GitHub Issue Form for feature requests ([#404](https://github.com/coderamp-labs/gitingest/issues/404)) ([9b1fc58](https://github.com/coderamp-labs/gitingest/commit/9b1fc58900ae18a3416fe3cf9b5e301a65a8e9fd))
* Fix CLI help text accuracy ([#332](https://github.com/coderamp-labs/gitingest/issues/332)) ([fdcbc53](https://github.com/coderamp-labs/gitingest/commit/fdcbc53cadde6a5dc3c3626120df1935b63693b2))


### Code Refactoring

* centralize PAT validation, streamline repo checks & misc cleanup ([#349](https://github.com/coderamp-labs/gitingest/issues/349)) ([cea0edd](https://github.com/coderamp-labs/gitingest/commit/cea0eddce8c6846bc6271cb3a8d15320e103214c))
* centralize PAT validation, streamline repo checks & misc cleanup ([#349](https://github.com/coderamp-labs/gitingest/issues/349)) ([f8d397e](https://github.com/coderamp-labs/gitingest/commit/f8d397e66e3382d12f8a0ed05d291a39db830bda))


================================================
FILE: CODE_OF_CONDUCT.md
================================================
# Contributor Covenant Code of Conduct

## Our Pledge

We as members, contributors, and leaders pledge to make participation in our
community a harassment-free experience for everyone, regardless of age, body
size, visible or invisible disability, ethnicity, sex characteristics, gender
identity and expression, level of experience, education, socio-economic status,
nationality, personal appearance, race, religion, or sexual identity
and orientation.

We pledge to act and interact in ways that contribute to an open, welcoming,
diverse, inclusive, and healthy community.

## Our Standards

Examples of behavior that contributes to a positive environment for our
community include:

* Demonstrating empathy and kindness toward other people
* Being respectful of differing opinions, viewpoints, and experiences
* Giving and gracefully accepting constructive feedback
* Accepting responsibility and apologizing to those affected by our mistakes,
  and learning from the experience
* Focusing on what is best not just for us as individuals, but for the
  overall community

Examples of unacceptable behavior include:

* The use of sexualized language or imagery, and sexual attention or
  advances of any kind
* Trolling, insulting or derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or email
  address, without their explicit permission
* Other conduct which could reasonably be considered inappropriate in a
  professional setting

## Enforcement Responsibilities

Community leaders are responsible for clarifying and enforcing our standards of
acceptable behavior and will take appropriate and fair corrective action in
response to any behavior that they deem inappropriate, threatening, offensive,
or harmful.

Community leaders have the right and responsibility to remove, edit, or reject
comments, commits, code, wiki edits, issues, and other contributions that are
not aligned to this Code of Conduct, and will communicate reasons for moderation
decisions when appropriate.

## Scope

This Code of Conduct applies within all community spaces, and also applies when
an individual is officially representing the community in public spaces.
Examples of representing our community include using an official e-mail address,
posting via an official social media account, or acting as an appointed
representative at an online or offline event.

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported to the community leaders responsible for enforcement at
<romain@coderamp.io>.
All complaints will be reviewed and investigated promptly and fairly.

All community leaders are obligated to respect the privacy and security of the
reporter of any incident.

## Enforcement Guidelines

Community leaders will follow these Community Impact Guidelines in determining
the consequences for any action they deem in violation of this Code of Conduct:

### 1. Correction

**Community Impact**: Use of inappropriate language or other behavior deemed
unprofessional or unwelcome in the community.

**Consequence**: A private, written warning from community leaders, providing
clarity around the nature of the violation and an explanation of why the
behavior was inappropriate. A public apology may be requested.

### 2. Warning

**Community Impact**: A violation through a single incident or series
of actions.

**Consequence**: A warning with consequences for continued behavior. No
interaction with the people involved, including unsolicited interaction with
those enforcing the Code of Conduct, for a specified period of time. This
includes avoiding interactions in community spaces as well as external channels
like social media. Violating these terms may lead to a temporary or
permanent ban.

### 3. Temporary Ban

**Community Impact**: A serious violation of community standards, including
sustained inappropriate behavior.

**Consequence**: A temporary ban from any sort of interaction or public
communication with the community for a specified period of time. No public or
private interaction with the people involved, including unsolicited interaction
with those enforcing the Code of Conduct, is allowed during this period.
Violating these terms may lead to a permanent ban.

### 4. Permanent Ban

**Community Impact**: Demonstrating a pattern of violation of community
standards, including sustained inappropriate behavior, harassment of an
individual, or aggression toward or disparagement of classes of individuals.

**Consequence**: A permanent ban from any sort of public interaction within
the community.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant](https://www.contributor-covenant.org),
version 2.0, available at
<https://www.contributor-covenant.org/version/2/0/code_of_conduct.html>.

Community Impact Guidelines were inspired by [Mozilla's code of conduct
enforcement ladder](https://github.com/mozilla/diversity).

For answers to common questions about this code of conduct, see the FAQ at
<https://www.contributor-covenant.org/faq>. Translations are available at
<https://www.contributor-covenant.org/translations>.


================================================
FILE: CONTRIBUTING.md
================================================
# Contributing to Gitingest

Thanks for your interest in contributing to **Gitingest** 🚀 Our goal is to keep the codebase friendly to first-time
contributors.
If you ever get stuck, reach out on [Discord](https://discord.com/invite/zerRaGK9EC).

---

## How to Contribute (non-technical)

- **Create an Issue** – found a bug or have a feature idea?
  [Open an issue](https://github.com/coderamp-labs/gitingest/issues/new).
- **Spread the Word** – tweet, blog, or tell a friend.
- **Use Gitingest** – real-world usage gives the best feedback. File issues or ping us
  on [Discord](https://discord.com/invite/zerRaGK9EC) with anything you notice.

---

## How to submit a Pull Request

> **Prerequisites**: The project uses **Python 3.9+** and `pre-commit` for development.

1. **Fork** the repository.

2. **Clone** your fork:

   ```bash
   git clone https://github.com/<your-username>/gitingest.git
   cd gitingest
   ```

3. **Set up the dev environment**:

   ```bash
   python -m venv .venv
   source .venv/bin/activate
   pip install -e ".[dev,server]"
   pre-commit install
   ```

4. **Create a branch** for your changes:

   ```bash
   git checkout -b your-branch
   ```

5. **Make your changes** (and add tests when relevant).

6. **Stage** the changes:

   ```bash
   git add .
   ```

7. **Run the backend test suite**:

   ```bash
   pytest
   ```

8. *(Optional)* **Run `pre-commit` on all files** to check hooks without committing:

   ```bash
   pre-commit run --all-files
   ```

9. **Run the local server** to sanity-check:

   ```bash
   python -m server
   ```

   Open [http://localhost:8000](http://localhost:8000) to confirm everything works.

10. **Commit** (signed):

    ```bash
    git commit -S -m "Your commit message"
    ```

    If *pre-commit* complains, fix the problems and repeat **5 – 9**.

11. **Push** your branch:

    ```bash
    git push origin your-branch
    ```

12. **Open a pull request** on GitHub with a clear description.

    > **Important:** Pull request titles **must follow
    the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) specification**. This helps with
    changelogs and automated releases.

13. **Iterate** on any review feedback—update your branch and repeat **6 – 11** as needed.

*(Optional) Invite a maintainer to your branch for easier collaboration.*
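
The Conventional Commits requirement on pull request titles (step 12) can be checked locally before you open the PR. The following is a minimal sketch, not the check used by this repository's CI: the helper name `is_conventional_title` and the list of accepted types are assumptions made for illustration (the specification itself only mandates `feat` and `fix`).

```python
import re

# Assumed set of commonly used types; the project's own check may accept a different list.
ASSUMED_TYPES = (
    "build", "chore", "ci", "docs", "feat", "fix",
    "perf", "refactor", "revert", "style", "test",
)

# <type>[(scope)][!]: <description>
_TITLE_RE = re.compile(
    r"^(?P<type>" + "|".join(ASSUMED_TYPES) + r")"
    r"(?:\((?P<scope>[^()\s]+)\))?"   # optional scope in parentheses
    r"(?P<breaking>!)?"               # optional breaking-change marker
    r": (?P<description>\S.*)$"       # colon, one space, non-empty description
)


def is_conventional_title(title: str) -> bool:
    """Return True if the title has the basic Conventional Commits shape (hypothetical helper)."""
    return _TITLE_RE.match(title) is not None


if __name__ == "__main__":
    for candidate in ("feat(cli): add --output flag", "Add output flag"):
        print(candidate, "->", is_conventional_title(candidate))
```

A title such as `fix!: handle empty repositories` passes this check, while a title without a type prefix does not.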


================================================
FILE: Dockerfile
================================================
# Stage 1: Install Python dependencies
FROM python:3.13.5-slim@sha256:4c2cf9917bd1cbacc5e9b07320025bdb7cdf2df7b0ceaccb55e9dd7e30987419 AS python-builder

WORKDIR /build

RUN set -eux; \
    apt-get update; \
    apt-get install -y --no-install-recommends gcc python3-dev; \
    rm -rf /var/lib/apt/lists/*

COPY pyproject.toml .
COPY src/ ./src/

RUN set -eux; \
    pip install --no-cache-dir --upgrade pip; \
    pip install --no-cache-dir --timeout 1000 .[server,mcp]

# Stage 2: Runtime image
FROM python:3.13.5-slim@sha256:4c2cf9917bd1cbacc5e9b07320025bdb7cdf2df7b0ceaccb55e9dd7e30987419

ARG UID=1000
ARG GID=1000
ARG APP_REPOSITORY=https://github.com/coderamp-labs/gitingest
ARG APP_VERSION=unknown
ARG APP_VERSION_URL=https://github.com/coderamp-labs/gitingest

ENV PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1 \
    APP_REPOSITORY=${APP_REPOSITORY} \
    APP_VERSION=${APP_VERSION} \
    APP_VERSION_URL=${APP_VERSION_URL}

RUN set -eux; \
    apt-get update; \
    apt-get install -y --no-install-recommends git curl; \
    apt-get clean; \
    rm -rf /var/lib/apt/lists/*

WORKDIR /app
RUN set -eux; \
    groupadd -g "$GID" appuser; \
    useradd -m -u "$UID" -g "$GID" appuser

COPY --from=python-builder --chown=$UID:$GID /usr/local/lib/python3.13/site-packages/ /usr/local/lib/python3.13/site-packages/
COPY --chown=$UID:$GID src/ ./

RUN set -eux; \
    chown -R appuser:appuser /app
USER appuser

EXPOSE 8000
EXPOSE 9090
CMD ["python", "-m", "server"]


================================================
FILE: LICENSE
================================================
MIT License

Copyright (c) 2024 Romain Courtois

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


================================================
FILE: README.md
================================================
# Gitingest

[![Screenshot of Gitingest front page](https://raw.githubusercontent.com/coderamp-labs/gitingest/refs/heads/main/docs/frontpage.png)](https://gitingest.com)

<!-- Badges -->
<!-- markdownlint-disable MD033 -->
<p align="center">
  <!-- row 1 — install & compat -->
  <a href="https://pypi.org/project/gitingest"><img src="https://img.shields.io/pypi/v/gitingest.svg" alt="PyPI"></a>
  <a href="https://pypi.org/project/gitingest"><img src="https://img.shields.io/pypi/pyversions/gitingest.svg" alt="Python Versions"></a>
  <br>
  <!-- row 2 — quality & community -->
  <a href="https://github.com/coderamp-labs/gitingest/actions/workflows/ci.yml?query=branch%3Amain"><img src="https://github.com/coderamp-labs/gitingest/actions/workflows/ci.yml/badge.svg?branch=main" alt="CI"></a>

  <a href="https://github.com/astral-sh/ruff"><img src="https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json" alt="Ruff"></a>
  <a href="https://scorecard.dev/viewer/?uri=github.com/coderamp-labs/gitingest"><img src="https://api.scorecard.dev/projects/github.com/coderamp-labs/gitingest/badge" alt="OpenSSF Scorecard"></a>
  <br>
  <a href="https://github.com/coderamp-labs/gitingest/blob/main/LICENSE"><img src="https://img.shields.io/github/license/coderamp-labs/gitingest.svg" alt="License"></a>
  <a href="https://pepy.tech/project/gitingest"><img src="https://pepy.tech/badge/gitingest" alt="Downloads"></a>
  <a href="https://github.com/coderamp-labs/gitingest"><img src="https://img.shields.io/github/stars/coderamp-labs/gitingest" alt="GitHub Stars"></a>
  <a href="https://discord.com/invite/zerRaGK9EC"><img src="https://img.shields.io/badge/Discord-Join_chat-5865F2?logo=discord&logoColor=white" alt="Discord"></a>
  <br>
  <a href="https://trendshift.io/repositories/13519"><img src="https://trendshift.io/api/badge/repositories/13519" alt="Trendshift" height="50"></a>
</p>
<!-- markdownlint-enable MD033 -->

Turn any Git repository into a prompt-friendly text ingest for LLMs.

You can also replace `hub` with `ingest` in any GitHub URL to access the corresponding digest.
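
As a rough illustration of that URL rewrite, the sketch below swaps the host of a GitHub URL for `gitingest.com`. The function name `to_gitingest_url` is hypothetical and is not part of the `gitingest` package.

```python
from urllib.parse import urlsplit, urlunsplit


def to_gitingest_url(github_url: str) -> str:
    """Rewrite a github.com URL to the matching gitingest.com URL (hypothetical helper)."""
    parts = urlsplit(github_url)
    if parts.netloc.lower() not in {"github.com", "www.github.com"}:
        raise ValueError(f"Not a GitHub URL: {github_url!r}")
    # Only the host changes; scheme, path, query, and fragment are kept as-is.
    return urlunsplit(parts._replace(netloc="gitingest.com"))


if __name__ == "__main__":
    print(to_gitingest_url("https://github.com/coderamp-labs/gitingest"))
```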

<!-- Extensions -->
[gitingest.com](https://gitingest.com) · [Chrome Extension](https://chromewebstore.google.com/detail/adfjahbijlkjfoicpjkhjicpjpjfaood) · [Firefox Add-on](https://addons.mozilla.org/firefox/addon/gitingest)

<!-- Languages -->
[Deutsch](https://www.readme-i18n.com/coderamp-labs/gitingest?lang=de) |
[Español](https://www.readme-i18n.com/coderamp-labs/gitingest?lang=es) |
[Français](https://www.readme-i18n.com/coderamp-labs/gitingest?lang=fr) |
[日本語](https://www.readme-i18n.com/coderamp-labs/gitingest?lang=ja) |
[한국어](https://www.readme-i18n.com/coderamp-labs/gitingest?lang=ko) |
[Português](https://www.readme-i18n.com/coderamp-labs/gitingest?lang=pt) |
[Русский](https://www.readme-i18n.com/coderamp-labs/gitingest?lang=ru) |
[中文](https://www.readme-i18n.com/coderamp-labs/gitingest?lang=zh)

## 🚀 Features

- **Easy code context**: Get a text digest from a Git repository URL or a directory
- **Smart Formatting**: Optimized output format for LLM prompts
- **Statistics about**:
  - File and directory structure
  - Size of the extract
  - Token count
- **CLI tool**: Run it as a shell command
- **Python package**: Import it in your code

## 📚 Requirements

- Python 3.8+
- For private repositories: A GitHub Personal Access Token (PAT). [Generate your token **here**!](https://github.com/settings/tokens/new?description=gitingest&scopes=repo)

### 📦 Installation

Gitingest is available on [PyPI](https://pypi.org/project/gitingest/).
You can install it using `pip`:

```bash
pip install gitingest
```

or

```bash
pip install gitingest[server]
```

to include server dependencies for self-hosting.

Alternatively, consider installing it with `pipx`, which keeps the CLI in its own isolated environment.
Install `pipx` with your preferred package manager, for example:

```bash
brew install pipx
apt install pipx
scoop install pipx
...
```

If you are using pipx for the first time, run:

```bash
pipx ensurepath
```

```bash
# install gitingest
pipx install gitingest
```

## 🧩 Browser Extension Usage

<!-- markdownlint-disable MD033 -->
<a href="https://chromewebstore.google.com/detail/adfjahbijlkjfoicpjkhjicpjpjfaood" target="_blank" title="Get Gitingest Extension from Chrome Web Store"><img height="48" src="https://github.com/user-attachments/assets/20a6e44b-fd46-4e6c-8ea6-aad436035753" alt="Available in the Chrome Web Store" /></a>
<a href="https://addons.mozilla.org/firefox/addon/gitingest" target="_blank" title="Get Gitingest Extension from Firefox Add-ons"><img height="48" src="https://github.com/user-attachments/assets/c0e99e6b-97cf-4af2-9737-099db7d3538b" alt="Get The Add-on for Firefox" /></a>
<a href="https://microsoftedge.microsoft.com/addons/detail/nfobhllgcekbmpifkjlopfdfdmljmipf" target="_blank" title="Get Gitingest Extension from Microsoft Edge Add-ons"><img height="48" src="https://github.com/user-attachments/assets/204157eb-4cae-4c0e-b2cb-db514419fd9e" alt="Get from the Edge Add-ons" /></a>
<!-- markdownlint-enable MD033 -->

The extension is open source at [lcandy2/gitingest-extension](https://github.com/lcandy2/gitingest-extension).

Issues and feature requests are welcome in that repository.

## 💡 Command line usage

The `gitingest` command line tool allows you to analyze codebases and create a text dump of their contents.

```bash
# Basic usage (writes to digest.txt by default)
gitingest /path/to/directory

# From URL
gitingest https://github.com/coderamp-labs/gitingest

# or from specific subdirectory
gitingest https://github.com/coderamp-labs/gitingest/tree/main/src/gitingest/utils
```

For private repositories, use the `--token/-t` option.

```bash
# Get your token from https://github.com/settings/personal-access-tokens
gitingest https://github.com/username/private-repo --token github_pat_...

# Or set it as an environment variable
export GITHUB_TOKEN=github_pat_...
gitingest https://github.com/username/private-repo

# Include repository submodules
gitingest https://github.com/username/repo-with-submodules --include-submodules
```

By default, files listed in `.gitignore` are skipped. Use `--include-gitignored` if you
need those files in the digest.

By default, the digest is written to a text file (`digest.txt`) in your current working directory. You can customize the output in two ways:

- Use `--output/-o <filename>` to write to a specific file.
- Use `--output/-o -` to output directly to `STDOUT` (useful for piping to other tools).

See more options and usage details with:

```bash
gitingest --help
```
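
If you want to drive the CLI from a script, the options described above can be assembled into an argument list for `subprocess.run`. This is a minimal sketch: `build_gitingest_command` is a hypothetical helper, and it only covers the flags documented in this section.

```python
from __future__ import annotations


def build_gitingest_command(
    source: str,
    *,
    token: str | None = None,
    output: str | None = None,
    include_submodules: bool = False,
    include_gitignored: bool = False,
) -> list[str]:
    """Build an argv list for the `gitingest` CLI (hypothetical helper)."""
    command = ["gitingest", source]
    if token is not None:
        command += ["--token", token]
    if output is not None:
        # "-" sends the digest to STDOUT instead of a file.
        command += ["--output", output]
    if include_submodules:
        command.append("--include-submodules")
    if include_gitignored:
        command.append("--include-gitignored")
    return command


if __name__ == "__main__":
    print(build_gitingest_command(".", output="-"))
    # To actually run it (requires gitingest to be installed):
    #   import subprocess
    #   result = subprocess.run(build_gitingest_command(".", output="-"),
    #                           check=True, capture_output=True, text=True)
    #   digest = result.stdout
```

Passing an argument list rather than a shell string avoids quoting problems with paths and patterns. For private repositories, setting `GITHUB_TOKEN` in the environment keeps the token out of the command line.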

## 🐍 Python package usage

```python
# Synchronous usage
from gitingest import ingest

summary, tree, content = ingest("path/to/directory")

# or from URL
summary, tree, content = ingest("https://github.com/coderamp-labs/gitingest")

# or from a specific subdirectory
summary, tree, content = ingest("https://github.com/coderamp-labs/gitingest/tree/main/src/gitingest/utils")
```

For private repositories, you can pass a token:

```python
# Using token parameter
summary, tree, content = ingest("https://github.com/username/private-repo", token="github_pat_...")

# Or set it as an environment variable
import os
os.environ["GITHUB_TOKEN"] = "github_pat_..."
summary, tree, content = ingest("https://github.com/username/private-repo")

# Include repository submodules
summary, tree, content = ingest("https://github.com/username/repo-with-submodules", include_submodules=True)
```

By default, `ingest` does not write a file; pass the `output` argument to write the digest to disk.

```python
# Asynchronous usage
from gitingest import ingest_async
import asyncio

result = asyncio.run(ingest_async("path/to/directory"))
```

### Jupyter notebook usage

```python
from gitingest import ingest_async

# Use await directly in Jupyter
summary, tree, content = await ingest_async("path/to/directory")

```

This works because a Jupyter notebook already runs an event loop, so you can `await` at the top level of a cell instead of calling `asyncio.run`.
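
The difference between a plain script and a notebook can be sketched as follows. To keep the example runnable without the package installed, `fake_ingest_async` is a hypothetical stand-in for `ingest_async`; the other helper names are also illustrative only.

```python
from __future__ import annotations

import asyncio


async def fake_ingest_async(source: str) -> tuple[str, str, str]:
    """Hypothetical stand-in for gitingest.ingest_async."""
    await asyncio.sleep(0)
    return (f"summary of {source}", "tree", "content")


def event_loop_is_running() -> bool:
    """Return True when called from inside a running event loop (as in a Jupyter cell)."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return False
    return True


def run_ingest(source: str) -> tuple[str, str, str]:
    """Run the coroutine from synchronous code; refuse if a loop is already running."""
    if event_loop_is_running():
        # asyncio.run() cannot be called from a running loop; use `await` there instead.
        raise RuntimeError("Event loop already running: use `await fake_ingest_async(...)`")
    return asyncio.run(fake_ingest_async(source))


if __name__ == "__main__":
    summary, tree, content = run_ingest("path/to/directory")
    print(summary)
```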

## 🐳 Self-host

### Using Docker

1. Build the image:

   ```bash
   docker build -t gitingest .
   ```

2. Run the container:

   ```bash
   docker run -d --name gitingest -p 8000:8000 gitingest
   ```

The application will be available at `http://localhost:8000`.

If you are hosting it on a domain, you can specify the allowed hostnames via the `ALLOWED_HOSTS` environment variable:

```bash
# Default: "gitingest.com, *.gitingest.com, localhost, 127.0.0.1".
ALLOWED_HOSTS="example.com, localhost, 127.0.0.1"
```

### Environment Variables

The application can be configured using the following environment variables:

- **ALLOWED_HOSTS**: Comma-separated list of allowed hostnames (default: "gitingest.com, *.gitingest.com, localhost, 127.0.0.1")
- **GITINGEST_METRICS_ENABLED**: Enable Prometheus metrics server (set to any value to enable)
- **GITINGEST_METRICS_HOST**: Host for the metrics server (default: "127.0.0.1")
- **GITINGEST_METRICS_PORT**: Port for the metrics server (default: "9090")
- **GITINGEST_SENTRY_ENABLED**: Enable Sentry error tracking (set to any value to enable)
- **GITINGEST_SENTRY_DSN**: Sentry DSN (required if Sentry is enabled)
- **GITINGEST_SENTRY_TRACES_SAMPLE_RATE**: Sampling rate for performance data (default: "1.0", range: 0.0-1.0)
- **GITINGEST_SENTRY_PROFILE_SESSION_SAMPLE_RATE**: Sampling rate for profile sessions (default: "1.0", range: 0.0-1.0)
- **GITINGEST_SENTRY_PROFILE_LIFECYCLE**: Profile lifecycle mode (default: "trace")
- **GITINGEST_SENTRY_SEND_DEFAULT_PII**: Send default personally identifiable information (default: "true")
- **S3_ALIAS_HOST**: Public URL/CDN for accessing S3 resources (default: "127.0.0.1:9000/gitingest-bucket")
- **S3_DIRECTORY_PREFIX**: Optional prefix for S3 file paths (if set, prefixes all S3 paths with this value)
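
To make the value formats above concrete, here is a small sketch of how a comma-separated host list and a sample rate in the documented 0.0-1.0 range could be parsed. It is not the server's actual configuration code; `parse_allowed_hosts` and `parse_sample_rate` are hypothetical names.

```python
from __future__ import annotations

# Default taken from the list above.
DEFAULT_ALLOWED_HOSTS = "gitingest.com, *.gitingest.com, localhost, 127.0.0.1"


def parse_allowed_hosts(raw: str | None) -> list[str]:
    """Split a comma-separated ALLOWED_HOSTS value, ignoring surrounding whitespace."""
    value = raw if raw else DEFAULT_ALLOWED_HOSTS
    return [host.strip() for host in value.split(",") if host.strip()]


def parse_sample_rate(raw: str | None, default: float = 1.0) -> float:
    """Parse a sample rate and reject values outside the documented 0.0-1.0 range."""
    rate = float(raw) if raw else default
    if not 0.0 <= rate <= 1.0:
        raise ValueError(f"Sample rate must be between 0.0 and 1.0, got {rate}")
    return rate


if __name__ == "__main__":
    import os

    print(parse_allowed_hosts(os.environ.get("ALLOWED_HOSTS")))
    print(parse_sample_rate(os.environ.get("GITINGEST_SENTRY_TRACES_SAMPLE_RATE")))
```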

### Using Docker Compose

The project includes a `compose.yml` file that allows you to easily run the application in both development and production environments.

#### Compose File Structure

The `compose.yml` file uses a YAML anchor (`&app-base`) and the merge key (`<<: *app-base`) to define common configuration that is shared between services:

```yaml
# Common base configuration for all services
x-app-base: &app-base
  build:
    context: .
    dockerfile: Dockerfile
  ports:
    - "${APP_WEB_BIND:-8000}:8000"  # Main application port
    - "${GITINGEST_METRICS_HOST:-127.0.0.1}:${GITINGEST_METRICS_PORT:-9090}:9090"  # Metrics port
  # ... other common configurations
```
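
The `${VAR:-default}` syntax in the port mappings falls back to the default when the variable is unset or empty. The sketch below mimics that rule in Python for the metrics port line; it only illustrates the substitution, and the function names are hypothetical.

```python
from __future__ import annotations

from typing import Mapping


def resolve_with_default(name: str, default: str, env: Mapping[str, str]) -> str:
    """Mimic `${NAME:-default}`: use the default when NAME is unset or empty."""
    value = env.get(name)
    return value if value else default


def metrics_port_mapping(env: Mapping[str, str]) -> str:
    """Build the host:host_port:container_port string for the metrics port."""
    host = resolve_with_default("GITINGEST_METRICS_HOST", "127.0.0.1", env)
    port = resolve_with_default("GITINGEST_METRICS_PORT", "9090", env)
    return f"{host}:{port}:9090"


if __name__ == "__main__":
    print(metrics_port_mapping({}))
    print(metrics_port_mapping({"GITINGEST_METRICS_PORT": "9191"}))
```

With no variables set, the metrics port is published only on `127.0.0.1`, so it is not reachable from other machines unless `GITINGEST_METRICS_HOST` is changed.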

#### Services

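The file defines four services: the three described below, plus a `minio-setup` helper (also in the `dev` profile) that creates the MinIO bucket and user.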

1. **app**: Production service configuration
   - Uses the `prod` profile
   - Sets the Sentry environment to "production"
   - Configured for stable operation with `restart: unless-stopped`

2. **app-dev**: Development service configuration
   - Uses the `dev` profile
   - Enables debug mode
   - Mounts the source code for live development
   - Uses hot reloading for faster development

3. **minio**: S3-compatible object storage for development
   - Uses the `dev` profile (only available in development mode)
   - Provides S3-compatible storage for local development
   - Accessible via:
     - API: Port 9000 ([localhost:9000](http://localhost:9000))
     - Web Console: Port 9001 ([localhost:9001](http://localhost:9001))
   - Default admin credentials:
     - Username: `minioadmin`
     - Password: `minioadmin`
   - Configurable via environment variables:
     - `MINIO_ROOT_USER`: Custom admin username (default: minioadmin)
     - `MINIO_ROOT_PASSWORD`: Custom admin password (default: minioadmin)
   - Includes persistent storage via Docker volume
   - Auto-creates a bucket and application-specific credentials:
     - Bucket name: `gitingest-bucket` (configurable via `S3_BUCKET_NAME`)
     - Access key: `gitingest` (configurable via `S3_ACCESS_KEY`)
     - Secret key: `gitingest123` (configurable via `S3_SECRET_KEY`)
   - These credentials are automatically passed to the app-dev service via environment variables:
     - `S3_ENDPOINT`: URL of the MinIO server
     - `S3_ACCESS_KEY`: Access key for the S3 bucket
     - `S3_SECRET_KEY`: Secret key for the S3 bucket
     - `S3_BUCKET_NAME`: Name of the S3 bucket
     - `S3_REGION`: Region for the S3 bucket (default: us-east-1)
     - `S3_ALIAS_HOST`: Public URL/CDN for accessing S3 resources (default: "127.0.0.1:9000/gitingest-bucket")

#### Usage Examples

To run the application in development mode:

```bash
docker compose --profile dev up
```

To run the application in production mode:

```bash
docker compose --profile prod up -d
```

To build and run the application:

```bash
docker compose --profile prod build
docker compose --profile prod up -d
```

## 🤝 Contributing

### Non-technical ways to contribute

- **Create an Issue**: If you find a bug or have an idea for a new feature, please [create an issue](https://github.com/coderamp-labs/gitingest/issues/new) on GitHub. This will help us track and prioritize your request.
- **Spread the Word**: If you like Gitingest, please share it with your friends, colleagues, and on social media. This will help us grow the community and make Gitingest even better.
- **Use Gitingest**: The best feedback comes from real-world usage! If you encounter any issues or have ideas for improvement, please let us know by [creating an issue](https://github.com/coderamp-labs/gitingest/issues/new) on GitHub or by reaching out to us on [Discord](https://discord.com/invite/zerRaGK9EC).

### Technical ways to contribute

Gitingest aims to be friendly for first-time contributors, with a simple Python and HTML codebase. If you need any help while working with the code, reach out to us on [Discord](https://discord.com/invite/zerRaGK9EC). For detailed instructions on how to make a pull request, see [CONTRIBUTING.md](./CONTRIBUTING.md).

## 🛠️ Stack

- [Tailwind CSS](https://tailwindcss.com) - Frontend
- [FastAPI](https://github.com/fastapi/fastapi) - Backend framework
- [Jinja2](https://jinja.palletsprojects.com) - HTML templating
- [tiktoken](https://github.com/openai/tiktoken) - Token estimation
- [posthog](https://github.com/PostHog/posthog) - Amazing analytics
- [Sentry](https://sentry.io) - Error tracking and performance monitoring

### Looking for a JavaScript/Node package?

Check out the NPM alternative 📦 Repomix: <https://github.com/yamadashy/repomix>

## 🚀 Project Growth

[![Star History Chart](https://api.star-history.com/svg?repos=coderamp-labs/gitingest&type=Date)](https://star-history.com/#coderamp-labs/gitingest&Date)


================================================
FILE: SECURITY.md
================================================
# Security Policy

## Reporting a Vulnerability

If you have discovered a vulnerability inside the project, report it privately at <romain@coderamp.io>. This way the maintainer can work on a proper fix without disclosing the problem to the public before it has been solved.


================================================
FILE: compose.yml
================================================
x-base-environment: &base-environment
  # Python Configuration
  PYTHONUNBUFFERED: "1"
  PYTHONDONTWRITEBYTECODE: "1"
  # Host Configuration
  ALLOWED_HOSTS: ${ALLOWED_HOSTS:-gitingest.com,*.gitingest.com,localhost,127.0.0.1}
  # Metrics Configuration
  GITINGEST_METRICS_ENABLED: ${GITINGEST_METRICS_ENABLED:-true}
  GITINGEST_METRICS_HOST: ${GITINGEST_METRICS_HOST:-0.0.0.0}
  GITINGEST_METRICS_PORT: ${GITINGEST_METRICS_PORT:-9090}
  # Sentry Configuration
  GITINGEST_SENTRY_ENABLED: ${GITINGEST_SENTRY_ENABLED:-false}
  GITINGEST_SENTRY_DSN: ${GITINGEST_SENTRY_DSN:-}
  GITINGEST_SENTRY_TRACES_SAMPLE_RATE: ${GITINGEST_SENTRY_TRACES_SAMPLE_RATE:-1.0}
  GITINGEST_SENTRY_PROFILE_SESSION_SAMPLE_RATE: ${GITINGEST_SENTRY_PROFILE_SESSION_SAMPLE_RATE:-1.0}
  GITINGEST_SENTRY_PROFILE_LIFECYCLE: ${GITINGEST_SENTRY_PROFILE_LIFECYCLE:-trace}
  GITINGEST_SENTRY_SEND_DEFAULT_PII: ${GITINGEST_SENTRY_SEND_DEFAULT_PII:-true}

x-prod-environment: &prod-environment
  GITINGEST_SENTRY_ENVIRONMENT: ${GITINGEST_SENTRY_ENVIRONMENT:-production}

x-dev-environment: &dev-environment
  DEBUG: "true"
  LOG_LEVEL: "DEBUG"
  RELOAD: "true"
  GITINGEST_SENTRY_ENVIRONMENT: ${GITINGEST_SENTRY_ENVIRONMENT:-development}
  # S3 Configuration for development
  S3_ENABLED: "true"
  S3_ENDPOINT: http://minio:9000
  S3_ACCESS_KEY: ${S3_ACCESS_KEY:-gitingest}
  S3_SECRET_KEY: ${S3_SECRET_KEY:-gitingest123}
  S3_BUCKET_NAME: ${S3_BUCKET_NAME:-gitingest-bucket}
  S3_REGION: ${S3_REGION:-us-east-1}
  S3_DIRECTORY_PREFIX: ${S3_DIRECTORY_PREFIX:-dev}
  S3_ALIAS_HOST: ${S3_ALIAS_HOST:-http://127.0.0.1:9000/${S3_BUCKET_NAME:-gitingest-bucket}}

x-app-base: &app-base
  ports:
    - "${APP_WEB_BIND:-8000}:8000"  # Main application port
    - "${GITINGEST_METRICS_HOST:-127.0.0.1}:${GITINGEST_METRICS_PORT:-9090}:9090"  # Metrics port
  user: "1000:1000"
  command: ["python", "-m", "server"]

services:
  # Production service configuration
  app:
    <<: *app-base
    image: ghcr.io/coderamp-labs/gitingest:latest
    profiles:
      - prod
    environment:
      <<: [*base-environment, *prod-environment]
    restart: unless-stopped

  # Development service configuration
  app-dev:
    <<: *app-base
    build:
      context: .
      dockerfile: Dockerfile
    profiles:
      - dev
    environment:
      <<: [*base-environment, *dev-environment]
    volumes:
      # Mount source code for live development
      - ./src:/app:ro
    # Use --reload flag for hot reloading during development
    command: ["python", "-m", "server"]
    depends_on:
      minio-setup:
        condition: service_completed_successfully

  # MinIO S3-compatible object storage for development
  minio:
    image: minio/minio:latest
    profiles:
      - dev
    ports:
      - "9000:9000"  # API port
      - "9001:9001"  # Console port
    environment: &minio-environment
      MINIO_ROOT_USER: ${MINIO_ROOT_USER:-minioadmin}
      MINIO_ROOT_PASSWORD: ${MINIO_ROOT_PASSWORD:-minioadmin}
    volumes:
      - minio-data:/data
    command: server /data --console-address ":9001"
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 30s
      start_period: 30s
      start_interval: 1s

  # MinIO setup service to create bucket and user
  minio-setup:
    image: minio/mc
    profiles:
      - dev
    depends_on:
      minio:
        condition: service_healthy
    environment:
      <<: *minio-environment
      S3_ACCESS_KEY: ${S3_ACCESS_KEY:-gitingest}
      S3_SECRET_KEY: ${S3_SECRET_KEY:-gitingest123}
      S3_BUCKET_NAME: ${S3_BUCKET_NAME:-gitingest-bucket}
    volumes:
      - ./.docker/minio/setup.sh:/setup.sh:ro
    entrypoint: sh
    command: -c /setup.sh

volumes:
  minio-data:
    driver: local


================================================
FILE: eslint.config.cjs
================================================
const js = require('@eslint/js');
const globals = require('globals');
const importPlugin = require('eslint-plugin-import');

module.exports = [
  js.configs.recommended,

  {
    files: ['src/static/js/**/*.js'],

    languageOptions: {
      parserOptions: { ecmaVersion: 2021, sourceType: 'module' },
      globals: {
        ...globals.browser,
        changePattern: 'readonly',
        copyFullDigest: 'readonly',
        copyText: 'readonly',
        downloadFullDigest: 'readonly',
        handleSubmit: 'readonly',
        posthog: 'readonly',
        submitExample: 'readonly',
        toggleAccessSettings: 'readonly',
        toggleFile: 'readonly',
      },
    },

    plugins: { import: importPlugin },

    rules: {
      // Import hygiene (eslint-plugin-import)
      'import/no-extraneous-dependencies': 'error',
      'import/no-unresolved': 'error',
      'import/order': ['warn', { alphabetize: { order: 'asc' } }],

      // Safety & bug-catchers
      'consistent-return': 'error',
      'default-case': 'error',
      'no-implicit-globals': 'error',
      'no-shadow': 'error',

      // Maintainability / complexity
      complexity: ['warn', 10],
      'max-depth': ['warn', 4],
      'max-lines': ['warn', 500],
      'max-params': ['warn', 5],

      // Stylistic consistency (auto-fixable)
      'arrow-parens': ['error', 'always'],
      curly: ['error', 'all'],
      indent: ['error', 4, { SwitchCase: 2 }],
      'newline-per-chained-call': ['warn', { ignoreChainWithDepth: 2 }],
      'no-multi-spaces': 'error',
      'object-shorthand': ['error', 'always'],
      'padding-line-between-statements': [
        'warn',
        { blankLine: 'always', prev: '*', next: 'return' },
        { blankLine: 'always', prev: ['const', 'let', 'var'], next: '*' },
        { blankLine: 'any', prev: ['const', 'let', 'var'], next: ['const', 'let', 'var'] },
      ],
      'quote-props': ['error', 'consistent-as-needed'],
      quotes: ['error', 'single', { avoidEscape: true }],
      semi: 'error',

      // Modern / performance tips
      'arrow-body-style': ['warn', 'as-needed'],
      'prefer-arrow-callback': 'error',
      'prefer-exponentiation-operator': 'error',
      'prefer-numeric-literals': 'error',
      'prefer-object-has-own': 'warn',
      'prefer-object-spread': 'error',
      'prefer-template': 'error',
    },
  },
];


================================================
FILE: pyproject.toml
================================================
[project]
name = "gitingest"
version = "0.3.1"
description="CLI tool to analyze and create text dumps of codebases for LLMs"
readme = {file = "README.md", content-type = "text/markdown" }
requires-python = ">= 3.8"
dependencies = [
    "click>=8.0.0",
    "gitpython>=3.1.0",
    "httpx",
    "loguru>=0.7.0",
    "pathspec>=0.12.1",
    "pydantic",
    "python-dotenv",
    "starlette>=0.40.0",  # Minimum safe release (https://osv.dev/vulnerability/GHSA-f96h-pmfr-66vw)
    "strenum; python_version < '3.11'",
    "tiktoken>=0.7.0",  # Support for o200k_base encoding
    "typing_extensions>= 4.0.0; python_version < '3.10'",
]

license = {file = "LICENSE"}
authors = [
    { name = "Romain Courtois", email = "romain@coderamp.io" },
    { name = "Filip Christiansen"},
]
classifiers=[
    "Development Status :: 3 - Alpha",
    "Intended Audience :: Developers",
    "License :: OSI Approved :: MIT License",
    "Programming Language :: Python :: 3.8",
    "Programming Language :: Python :: 3.9",
    "Programming Language :: Python :: 3.10",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
    "Programming Language :: Python :: 3.13",
]

[project.optional-dependencies]
dev = [
    "eval-type-backport",
    "pre-commit",
    "pytest",
    "pytest-asyncio",
    "pytest-mock",
]

server = [
    "boto3>=1.28.0",  # AWS SDK for S3 support
    "fastapi[standard]>=0.109.1",  # Minimum safe release (https://osv.dev/vulnerability/PYSEC-2024-38)
    "prometheus-client",
    "sentry-sdk[fastapi]",
    "slowapi",
    "uvicorn>=0.11.7",  # Minimum safe release (https://osv.dev/vulnerability/PYSEC-2020-150)
]

[project.scripts]
gitingest = "gitingest.__main__:main"

[project.urls]
homepage = "https://gitingest.com"
github = "https://github.com/coderamp-labs/gitingest"

[build-system]
requires = ["setuptools>=61.0", "wheel"]
build-backend = "setuptools.build_meta"

[tool.setuptools]
packages = {find = {where = ["src"]}}
include-package-data = true

# Linting configuration
[tool.pylint.format]
max-line-length = 119

[tool.pylint.'MESSAGES CONTROL']
disable = [
    "too-many-arguments",
    "too-many-positional-arguments",
    "too-many-locals",
    "too-few-public-methods",
    "broad-exception-caught",
    "duplicate-code",
    "fixme",
]

[tool.ruff]
line-length = 119
fix = true

[tool.ruff.lint]
select = ["ALL"]
ignore = [  # https://docs.astral.sh/ruff/rules/...
    "D107", # undocumented-public-init
    "FIX002", # line-contains-todo
    "TD002", # missing-todo-author
    "PLR0913", # too-many-arguments,

    # TODO: fix the following issues:
    "TD003", # missing-todo-link, TODO: add issue links
    "S108", # hardcoded-temp-file, TODO: replace with tempfile
    "BLE001", # blind-except, TODO: replace with specific exceptions
    "FAST003", # fast-api-unused-path-parameter, TODO: fix
]
per-file-ignores = { "tests/**/*.py" = ["S101"] } # Skip the "assert used" warning

[tool.ruff.lint.pylint]
max-returns = 10

[tool.ruff.lint.isort]
order-by-type = true
case-sensitive = true

[tool.pycln]
all = true

# TODO: Remove this once we figure out how to use ruff-isort
[tool.isort]
profile = "black"
line_length = 119
remove_redundant_aliases = true
float_to_top = true  # https://github.com/astral-sh/ruff/issues/6514
order_by_type = true
filter_files = true

# Test configuration
[tool.pytest.ini_options]
pythonpath = ["src"]
testpaths = ["tests/"]
python_files = "test_*.py"
asyncio_mode = "auto"
asyncio_default_fixture_loop_scope = "function"
python_classes = "Test*"
python_functions = "test_*"


================================================
FILE: release-please-config.json
================================================
{
  "$schema": "https://raw.githubusercontent.com/googleapis/release-please/main/schemas/config.json",
  "packages": {
    ".": {
      "release-type": "python",
      "bump-minor-pre-major": true
    }
  }
}


================================================
FILE: renovate.json
================================================
{
  "$schema": "https://docs.renovatebot.com/renovate-schema.json",
  "extends": [
    "config:recommended"
  ]
}


================================================
FILE: requirements-dev.txt
================================================
-r requirements.txt
eval-type-backport
pre-commit
pytest
pytest-asyncio
pytest-cov
pytest-mock


================================================
FILE: requirements.txt
================================================
boto3>=1.28.0  # AWS SDK for S3 support
click>=8.0.0
fastapi[standard]>=0.109.1  # Minimum safe release (https://osv.dev/vulnerability/PYSEC-2024-38)
httpx
loguru>=0.7.0
pathspec>=0.12.1
prometheus-client
pydantic
python-dotenv
sentry-sdk[fastapi]
slowapi
starlette>=0.40.0  # Minimum safe release (https://osv.dev/vulnerability/GHSA-f96h-pmfr-66vw)
tiktoken>=0.7.0  # Support for o200k_base encoding
uvicorn>=0.11.7  # Minimum safe release (https://osv.dev/vulnerability/PYSEC-2020-150)


================================================
FILE: src/gitingest/__init__.py
================================================
"""Gitingest: A package for ingesting data from Git repositories."""

from gitingest.entrypoint import ingest, ingest_async

__all__ = ["ingest", "ingest_async"]


================================================
FILE: src/gitingest/__main__.py
================================================
"""Command-line interface (CLI) for Gitingest."""

# pylint: disable=no-value-for-parameter
from __future__ import annotations

import asyncio
from typing import TypedDict

import click
from typing_extensions import Unpack

from gitingest.config import MAX_FILE_SIZE, OUTPUT_FILE_NAME
from gitingest.entrypoint import ingest_async

# Import logging configuration first to intercept all logging
from gitingest.utils.logging_config import get_logger

# Initialize logger for this module
logger = get_logger(__name__)


class _CLIArgs(TypedDict):
    source: str
    max_size: int
    exclude_pattern: tuple[str, ...]
    include_pattern: tuple[str, ...]
    branch: str | None
    include_gitignored: bool
    include_submodules: bool
    token: str | None
    output: str | None


@click.command()
@click.argument("source", type=str, default=".")
@click.option(
    "--max-size",
    "-s",
    default=MAX_FILE_SIZE,
    show_default=True,
    help="Maximum file size to process in bytes",
)
@click.option("--exclude-pattern", "-e", multiple=True, help="Shell-style patterns to exclude.")
@click.option(
    "--include-pattern",
    "-i",
    multiple=True,
    help="Shell-style patterns to include.",
)
@click.option("--branch", "-b", default=None, help="Branch to clone and ingest")
@click.option(
    "--include-gitignored",
    is_flag=True,
    default=False,
    help="Include files matched by .gitignore and .gitingestignore",
)
@click.option(
    "--include-submodules",
    is_flag=True,
    help="Include repository's submodules in the analysis",
    default=False,
)
@click.option(
    "--token",
    "-t",
    envvar="GITHUB_TOKEN",
    default=None,
    help=(
        "GitHub personal access token (PAT) for accessing private repositories. "
        "If omitted, the CLI will look for the GITHUB_TOKEN environment variable."
    ),
)
@click.option(
    "--output",
    "-o",
    default=None,
    help="Output file path (default: digest.txt in current directory). Use '-' for stdout.",
)
def main(**cli_kwargs: Unpack[_CLIArgs]) -> None:
    """Run the CLI entry point to analyze a repo / directory and dump its contents.

    Parameters
    ----------
    **cli_kwargs : Unpack[_CLIArgs]
        Keyword arguments collected by Click and forwarded to ``_async_main``.

    Notes
    -----
    See ``_async_main`` for a detailed description of each argument.

    Examples
    --------
    Basic usage:
        $ gitingest
        $ gitingest /path/to/repo
        $ gitingest https://github.com/user/repo

    Output to stdout:
        $ gitingest -o -
        $ gitingest https://github.com/user/repo --output -

    With filtering:
        $ gitingest -i "*.py" -e "*.log"
        $ gitingest --include-pattern "*.js" --exclude-pattern "node_modules/*"

    Private repositories:
        $ gitingest https://github.com/user/private-repo -t ghp_token
        $ GITHUB_TOKEN=ghp_token gitingest https://github.com/user/private-repo

    Include submodules:
        $ gitingest https://github.com/user/repo --include-submodules

    """
    asyncio.run(_async_main(**cli_kwargs))


async def _async_main(
    source: str,
    *,
    max_size: int = MAX_FILE_SIZE,
    exclude_pattern: tuple[str, ...] | None = None,
    include_pattern: tuple[str, ...] | None = None,
    branch: str | None = None,
    include_gitignored: bool = False,
    include_submodules: bool = False,
    token: str | None = None,
    output: str | None = None,
) -> None:
    """Analyze a directory or repository and create a text dump of its contents.

    This command scans the specified ``source`` (a local directory or Git repo),
    applies custom include and exclude patterns, and generates a text digest
    (directory tree plus file contents). The digest is written to an output file or
    to ``stdout``, and a short summary is echoed to the console.

    Parameters
    ----------
    source : str
        A directory path or a Git repository URL.
    max_size : int
        Maximum file size in bytes to ingest (default: 10 MB).
    exclude_pattern : tuple[str, ...] | None
        Glob patterns for pruning the file set.
    include_pattern : tuple[str, ...] | None
        Glob patterns for including files in the output.
    branch : str | None
        Git branch to ingest. If ``None``, the repository's default branch is used.
    include_gitignored : bool
        If ``True``, also ingest files matched by ``.gitignore`` or ``.gitingestignore`` (default: ``False``).
    include_submodules : bool
        If ``True``, recursively include all Git submodules within the repository (default: ``False``).
    token : str | None
        GitHub personal access token (PAT) for accessing private repositories.
        Can also be set via the ``GITHUB_TOKEN`` environment variable.
    output : str | None
        The path where the output file will be written (default: ``digest.txt`` in current directory).
        Use ``"-"`` to write to ``stdout``.

    Raises
    ------
    click.Abort
        Raised if an error occurs during execution and the command must be aborted.

    """
    try:
        # Normalise pattern containers (the ingest layer expects sets)
        exclude_patterns = set(exclude_pattern) if exclude_pattern else set()
        include_patterns = set(include_pattern) if include_pattern else set()

        output_target = output if output is not None else OUTPUT_FILE_NAME

        if output_target == "-":
            click.echo("Analyzing source, preparing output for stdout...", err=True)
        else:
            click.echo(f"Analyzing source, output will be written to '{output_target}'...", err=True)

        summary, _, _ = await ingest_async(
            source,
            max_file_size=max_size,
            include_patterns=include_patterns,
            exclude_patterns=exclude_patterns,
            branch=branch,
            include_gitignored=include_gitignored,
            include_submodules=include_submodules,
            token=token,
            output=output_target,
        )
    except Exception as exc:
        # Convert any exception into click.Abort so that the exit status is non-zero
        click.echo(f"Error: {exc}", err=True)
        raise click.Abort from exc

    if output_target == "-":  # stdout
        click.echo("\n--- Summary ---", err=True)
        click.echo(summary, err=True)
        click.echo("--- End Summary ---", err=True)
        click.echo("Analysis complete! Output sent to stdout.", err=True)
    else:  # file
        click.echo(f"Analysis complete! Output written to: {output_target}")
        click.echo("\nSummary:")
        click.echo(summary)


if __name__ == "__main__":
    main()


================================================
FILE: src/gitingest/clone.py
================================================
"""Module containing functions for cloning a Git repository to a local path."""

from __future__ import annotations

from pathlib import Path
from typing import TYPE_CHECKING

import git

from gitingest.config import DEFAULT_TIMEOUT
from gitingest.utils.git_utils import (
    check_repo_exists,
    checkout_partial_clone,
    create_git_repo,
    ensure_git_installed,
    git_auth_context,
    is_github_host,
    resolve_commit,
)
from gitingest.utils.logging_config import get_logger
from gitingest.utils.os_utils import ensure_directory_exists_or_create
from gitingest.utils.timeout_wrapper import async_timeout

if TYPE_CHECKING:
    from gitingest.schemas import CloneConfig

# Initialize logger for this module
logger = get_logger(__name__)


@async_timeout(DEFAULT_TIMEOUT)
async def clone_repo(config: CloneConfig, *, token: str | None = None) -> None:
    """Clone a repository to a local path based on the provided configuration.

    This function handles the process of cloning a Git repository to the local file system.
    It can clone a specific branch, tag, or commit if provided, and it raises exceptions if
    any errors occur during the cloning process.

    Parameters
    ----------
    config : CloneConfig
        The configuration for cloning the repository.
    token : str | None
        GitHub personal access token (PAT) for accessing private repositories.

    Raises
    ------
    ValueError
        If the repository is not found, if the provided URL is invalid, or if the token format is invalid.
    RuntimeError
        If Git operations fail during the cloning process.

    """
    # Extract clone parameters from the config
    url: str = config.url
    local_path: str = config.local_path
    partial_clone: bool = config.subpath != "/"

    logger.info(
        "Starting git clone operation",
        extra={
            "url": url,
            "local_path": local_path,
            "partial_clone": partial_clone,
            "subpath": config.subpath,
            "branch": config.branch,
            "tag": config.tag,
            "commit": config.commit,
            "include_submodules": config.include_submodules,
        },
    )

    logger.debug("Ensuring git is installed")
    await ensure_git_installed()

    logger.debug("Creating local directory", extra={"parent_path": str(Path(local_path).parent)})
    await ensure_directory_exists_or_create(Path(local_path).parent)

    logger.debug("Checking if repository exists", extra={"url": url})
    if not await check_repo_exists(url, token=token):
        logger.error("Repository not found", extra={"url": url})
        msg = "Repository not found. Make sure it is public or that you have provided a valid token."
        raise ValueError(msg)

    logger.debug("Resolving commit reference")
    commit = await resolve_commit(config, token=token)
    logger.debug("Resolved commit", extra={"commit": commit})

    # Clone the repository using GitPython with proper authentication
    logger.info("Executing git clone operation", extra={"url": "<redacted>", "local_path": local_path})
    try:
        clone_kwargs = {
            "single_branch": True,
            "no_checkout": True,
            "depth": 1,
        }

        with git_auth_context(url, token) as (git_cmd, auth_url):
            if partial_clone:
                # For partial clones, use git.Git() with filter and sparse options
                cmd_args = ["--single-branch", "--no-checkout", "--depth=1"]
                cmd_args.extend(["--filter=blob:none", "--sparse"])
                cmd_args.extend([auth_url, local_path])
                git_cmd.clone(*cmd_args)
            elif token and is_github_host(url):
                # For authenticated GitHub repos, use git_cmd with auth URL
                cmd_args = ["--single-branch", "--no-checkout", "--depth=1", auth_url, local_path]
                git_cmd.clone(*cmd_args)
            else:
                # For non-authenticated repos, use the standard GitPython method
                git.Repo.clone_from(url, local_path, **clone_kwargs)

        logger.info("Git clone completed successfully")
    except git.GitCommandError as exc:
        msg = f"Git clone failed: {exc}"
        raise RuntimeError(msg) from exc

    # Checkout the subpath if it is a partial clone
    if partial_clone:
        logger.info("Setting up partial clone for subpath", extra={"subpath": config.subpath})
        await checkout_partial_clone(config, token=token)
        logger.debug("Partial clone setup completed")

    # Perform post-clone operations
    await _perform_post_clone_operations(config, local_path, url, token, commit)

    logger.info("Git clone operation completed successfully", extra={"local_path": local_path})
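

# Illustrative sketch, not used by ``clone_repo``: the argument list that the ``git_cmd.clone`` branches
# above assemble, written as a pure function so the flag combinations are easy to inspect.
# ``_sketch_clone_args`` is a hypothetical helper introduced only for this example.
def _sketch_clone_args(url, local_path, *, partial_clone=False):
    """Return ``git clone`` arguments for a shallow, single-branch clone without checkout."""
    # Every clone is shallow (depth 1), limited to one branch, and defers the checkout.
    args = ["--single-branch", "--no-checkout", "--depth=1"]
    if partial_clone:
        # Sub-path clones additionally skip blobs up front and enable sparse checkout.
        args.extend(["--filter=blob:none", "--sparse"])
    return [*args, url, local_path]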


async def _perform_post_clone_operations(
    config: CloneConfig,
    local_path: str,
    url: str,
    token: str | None,
    commit: str,
) -> None:
    """Perform post-clone operations like fetching, checkout, and submodule updates.

    Parameters
    ----------
    config : CloneConfig
        The configuration for cloning the repository.
    local_path : str
        The local path where the repository was cloned.
    url : str
        The repository URL.
    token : str | None
        GitHub personal access token (PAT) for accessing private repositories.
    commit : str
        The commit SHA to checkout.

    Raises
    ------
    RuntimeError
        If any Git operation fails.

    """
    try:
        repo = create_git_repo(local_path, url, token)

        # Ensure the commit is locally available
        logger.debug("Fetching specific commit", extra={"commit": commit})
        repo.git.fetch("--depth=1", "origin", commit)

        # Write the work-tree at that commit
        logger.info("Checking out commit", extra={"commit": commit})
        repo.git.checkout(commit)

        # Update submodules
        if config.include_submodules:
            logger.info("Updating submodules")
            repo.git.submodule("update", "--init", "--recursive", "--depth=1")
            logger.debug("Submodules updated successfully")
    except git.GitCommandError as exc:
        msg = f"Git operation failed: {exc}"
        raise RuntimeError(msg) from exc


================================================
FILE: src/gitingest/config.py
================================================
"""Configuration file for the project."""

import tempfile
from pathlib import Path

MAX_FILE_SIZE = 10 * 1024 * 1024  # Maximum size of a single file to process (10 MB)
MAX_DIRECTORY_DEPTH = 20  # Maximum depth of directory traversal
MAX_FILES = 10_000  # Maximum number of files to process
MAX_TOTAL_SIZE_BYTES = 500 * 1024 * 1024  # Maximum size of output file (500 MB)
DEFAULT_TIMEOUT = 60  # seconds

OUTPUT_FILE_NAME = "digest.txt"

TMP_BASE_PATH = Path(tempfile.gettempdir()) / "gitingest"


================================================
FILE: src/gitingest/entrypoint.py
================================================
"""Main entry point for ingesting a source and processing its contents."""

from __future__ import annotations

import asyncio
import errno
import shutil
import stat
import sys
from contextlib import asynccontextmanager
from pathlib import Path
from typing import TYPE_CHECKING, AsyncGenerator, Callable
from urllib.parse import urlparse

from gitingest.clone import clone_repo
from gitingest.config import MAX_FILE_SIZE
from gitingest.ingestion import ingest_query
from gitingest.query_parser import parse_local_dir_path, parse_remote_repo
from gitingest.utils.auth import resolve_token
from gitingest.utils.compat_func import removesuffix
from gitingest.utils.ignore_patterns import load_ignore_patterns
from gitingest.utils.logging_config import get_logger
from gitingest.utils.pattern_utils import process_patterns
from gitingest.utils.query_parser_utils import KNOWN_GIT_HOSTS

if TYPE_CHECKING:
    from types import TracebackType

    from gitingest.schemas import IngestionQuery

# Initialize logger for this module
logger = get_logger(__name__)


async def ingest_async(
    source: str,
    *,
    max_file_size: int = MAX_FILE_SIZE,
    include_patterns: str | set[str] | None = None,
    exclude_patterns: str | set[str] | None = None,
    branch: str | None = None,
    tag: str | None = None,
    include_gitignored: bool = False,
    include_submodules: bool = False,
    token: str | None = None,
    output: str | None = None,
) -> tuple[str, str, str]:
    """Ingest a source and process its contents.

    This function analyzes a source (URL or local path), clones the corresponding repository (if applicable),
    and processes its files according to the specified query parameters. It returns a summary, a tree-like
    structure of the files, and the content of the files. The results can optionally be written to an output file.

    Parameters
    ----------
    source : str
        The source to analyze, which can be a URL (for a Git repository) or a local directory path.
    max_file_size : int
        Maximum allowed file size for file ingestion. Files larger than this size are ignored (default: 10 MB).
    include_patterns : str | set[str] | None
        Pattern or set of patterns specifying which files to include. If ``None``, all files are included.
    exclude_patterns : str | set[str] | None
        Pattern or set of patterns specifying which files to exclude. If ``None``, no files are excluded.
    branch : str | None
        The branch to clone and ingest (default: the default branch).
    tag : str | None
        The tag to clone and ingest. If ``None``, no tag is used.
    include_gitignored : bool
        If ``True``, include files ignored by ``.gitignore`` and ``.gitingestignore`` (default: ``False``).
    include_submodules : bool
        If ``True``, recursively include all Git submodules within the repository (default: ``False``).
    token : str | None
        GitHub personal access token (PAT) for accessing private repositories.
        Can also be set via the ``GITHUB_TOKEN`` environment variable.
    output : str | None
        File path where the directory tree and file contents should be written.
        If ``"-"`` (dash), the results are written to ``stdout``.
        If ``None``, the results are not written to a file.

    Returns
    -------
    tuple[str, str, str]
        A tuple containing:
        - A summary string of the analyzed repository or directory.
        - A tree-like string representation of the file structure.
        - The content of the files in the repository or directory.

    """
    logger.info("Starting ingestion process", extra={"source": source})

    token = resolve_token(token)

    source = removesuffix(source.strip(), ".git")

    # Determine the parsing method based on the source type
    if urlparse(source).scheme in ("https", "http") or any(h in source for h in KNOWN_GIT_HOSTS):
        # We either have a full URL or a scheme-less URL that names a known Git host
        logger.info("Parsing remote repository", extra={"source": source})
        query = await parse_remote_repo(source, token=token)
        # Branch/tag overrides and ``include_submodules`` are applied once below, after pattern processing.

    else:
        # Local path scenario
        logger.info("Processing local directory", extra={"source": source})
        query = parse_local_dir_path(source)

    query.max_file_size = max_file_size
    query.ignore_patterns, query.include_patterns = process_patterns(
        exclude_patterns=exclude_patterns,
        include_patterns=include_patterns,
    )

    if query.url:
        _override_branch_and_tag(query, branch=branch, tag=tag)

    query.include_submodules = include_submodules

    logger.debug(
        "Configuration completed",
        extra={
            "max_file_size": query.max_file_size,
            "include_submodules": query.include_submodules,
            "include_gitignored": include_gitignored,
            "has_include_patterns": bool(query.include_patterns),
            "has_exclude_patterns": bool(query.ignore_patterns),
        },
    )

    async with _clone_repo_if_remote(query, token=token):
        if query.url:
            logger.info("Repository cloned, starting file processing")
        else:
            logger.info("Starting local directory processing")

        if not include_gitignored:
            logger.debug("Applying gitignore patterns")
            _apply_gitignores(query)

        logger.info("Processing files and generating output")
        summary, tree, content = ingest_query(query)

        if output:
            logger.debug("Writing output to file", extra={"output_path": output})
        await _write_output(tree, content=content, target=output)

        logger.info("Ingestion completed successfully")
        return summary, tree, content
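

# Illustrative sketch, not called by ``ingest_async``: the remote-versus-local test used above, written
# as a pure function. ``_sketch_is_remote`` is a hypothetical name, and the default ``known_hosts`` tuple
# is an assumption made for this example; the real list lives in ``KNOWN_GIT_HOSTS``.
def _sketch_is_remote(source, known_hosts=("github.com", "gitlab.com")):
    """Return ``True`` if ``source`` would be parsed as a remote repository."""
    # An explicit http(s) scheme always counts as remote.
    if urlparse(source).scheme in ("https", "http"):
        return True
    # Otherwise fall back to a substring match, so ``github.com/user/repo`` is remote too.
    return any(host in source for host in known_hosts)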


def ingest(
    source: str,
    *,
    max_file_size: int = MAX_FILE_SIZE,
    include_patterns: str | set[str] | None = None,
    exclude_patterns: str | set[str] | None = None,
    branch: str | None = None,
    tag: str | None = None,
    include_gitignored: bool = False,
    include_submodules: bool = False,
    token: str | None = None,
    output: str | None = None,
) -> tuple[str, str, str]:
    """Provide a synchronous wrapper around ``ingest_async``.

    This function analyzes a source (URL or local path), clones the corresponding repository (if applicable),
    and processes its files according to the specified query parameters. It returns a summary, a tree-like
    structure of the files, and the content of the files. The results can optionally be written to an output file.

    Parameters
    ----------
    source : str
        The source to analyze, which can be a URL (for a Git repository) or a local directory path.
    max_file_size : int
        Maximum allowed file size for file ingestion. Files larger than this size are ignored (default: 10 MB).
    include_patterns : str | set[str] | None
        Pattern or set of patterns specifying which files to include. If ``None``, all files are included.
    exclude_patterns : str | set[str] | None
        Pattern or set of patterns specifying which files to exclude. If ``None``, no files are excluded.
    branch : str | None
        The branch to clone and ingest (default: the default branch).
    tag : str | None
        The tag to clone and ingest. If ``None``, no tag is used.
    include_gitignored : bool
        If ``True``, include files ignored by ``.gitignore`` and ``.gitingestignore`` (default: ``False``).
    include_submodules : bool
        If ``True``, recursively include all Git submodules within the repository (default: ``False``).
    token : str | None
        GitHub personal access token (PAT) for accessing private repositories.
        Can also be set via the ``GITHUB_TOKEN`` environment variable.
    output : str | None
        File path where the directory tree and file contents should be written.
        If ``"-"`` (dash), the results are written to ``stdout``.
        If ``None``, the results are not written to a file.

    Returns
    -------
    tuple[str, str, str]
        A tuple containing:
        - A summary string of the analyzed repository or directory.
        - A tree-like string representation of the file structure.
        - The content of the files in the repository or directory.

    See Also
    --------
    ``ingest_async`` : The asynchronous version of this function.

    """
    return asyncio.run(
        ingest_async(
            source=source,
            max_file_size=max_file_size,
            include_patterns=include_patterns,
            exclude_patterns=exclude_patterns,
            branch=branch,
            tag=tag,
            include_gitignored=include_gitignored,
            include_submodules=include_submodules,
            token=token,
            output=output,
        ),
    )


def _override_branch_and_tag(query: IngestionQuery, branch: str | None, tag: str | None) -> None:
    """Compare the caller-supplied ``branch`` and ``tag`` with the ones already in ``query``.

    If they differ, update ``query`` to the chosen values and issue a warning.
    If both are specified, the tag wins over the branch.

    Parameters
    ----------
    query : IngestionQuery
        The query to update.
    branch : str | None
        The branch to use.
    tag : str | None
        The tag to use.

    """
    if tag and query.tag and tag != query.tag:
        msg = f"Warning: The specified tag '{tag}' overrides the tag found in the URL '{query.tag}'."
        logger.warning(msg)

    query.tag = tag or query.tag

    if branch and query.branch and branch != query.branch:
        msg = f"Warning: The specified branch '{branch}' overrides the branch found in the URL '{query.branch}'."
        logger.warning(msg)

    query.branch = branch or query.branch

    if tag and branch:
        msg = "Warning: Both tag and branch are specified. The tag will be used."
        logger.warning(msg)

    # Tag wins over branch if both supplied
    if query.tag:
        query.branch = None
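

# Illustrative sketch, not used by the library: the precedence rule implemented above, reduced to a
# pure function without logging. ``_sketch_resolve_ref`` and its parameter names are hypothetical and
# exist only for this example. All four arguments are a string or ``None``.
def _sketch_resolve_ref(url_branch, url_tag, branch=None, tag=None):
    """Return the ``(branch, tag)`` pair that would end up on the query."""
    # Caller-supplied values override whatever was parsed from the URL.
    tag = tag or url_tag
    branch = branch or url_branch
    # A tag, from either origin, always wins over a branch.
    if tag:
        branch = None
    return branch, tag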


def _apply_gitignores(query: IngestionQuery) -> None:
    """Extend ``query.ignore_patterns`` in place with patterns from ``.gitignore`` and ``.gitingestignore``.

    Parameters
    ----------
    query : IngestionQuery
        The query to update.

    """
    for fname in (".gitignore", ".gitingestignore"):
        query.ignore_patterns.update(load_ignore_patterns(query.local_path, filename=fname))


@asynccontextmanager
async def _clone_repo_if_remote(query: IngestionQuery, *, token: str | None) -> AsyncGenerator[None, None]:
    """Async context-manager that clones ``query.url`` if present.

    If ``query.url`` is set, the repo is cloned, control is yielded, and the temp directory is removed on exit.
    If no URL is given, the function simply yields immediately.

    Parameters
    ----------
    query : IngestionQuery
        Parsed query describing the source to ingest.
    token : str | None
        GitHub personal access token (PAT) for accessing private repositories.

    """
    kwargs = {}
    if sys.version_info >= (3, 12):
        kwargs["onexc"] = _handle_remove_readonly
    else:
        kwargs["onerror"] = _handle_remove_readonly

    if query.url:
        clone_config = query.extract_clone_config()
        await clone_repo(clone_config, token=token)
        try:
            yield
        finally:
            shutil.rmtree(query.local_path.parent, **kwargs)
    else:
        yield


def _handle_remove_readonly(
    func: Callable,
    path: str,
    exc_info: BaseException | tuple[type[BaseException], BaseException, TracebackType],
) -> None:
    """Handle permission errors raised by ``shutil.rmtree()``.

    * Makes the target writable (removes the read-only attribute).
    * Retries the original operation (``func``) once.

    """
    # 'onerror' passes a (type, value, tb) tuple; 'onexc' passes the exception
    if isinstance(exc_info, tuple):  # 'onerror' (Python <3.12)
        exc: BaseException = exc_info[1]
    else:  # 'onexc' (Python 3.12+)
        exc = exc_info

    # Handle only 'Permission denied' and 'Operation not permitted'
    if not isinstance(exc, OSError) or exc.errno not in {errno.EACCES, errno.EPERM}:
        raise exc

    # Make the target writable
    Path(path).chmod(stat.S_IWRITE)
    func(path)


async def _write_output(tree: str, content: str, target: str | None) -> None:
    """Write combined output to ``target`` (``"-"`` ⇒ stdout).

    Parameters
    ----------
    tree : str
        The tree-like string representation of the file structure.
    content : str
        The content of the files in the repository or directory.
    target : str | None
        The path to the output file, or ``"-"`` for ``stdout``. If ``None``, nothing is written.

    """
    data = f"{tree}\n{content}"
    loop = asyncio.get_running_loop()
    if target == "-":
        await loop.run_in_executor(None, sys.stdout.write, data)
        await loop.run_in_executor(None, sys.stdout.flush)
    elif target is not None:
        await loop.run_in_executor(None, Path(target).write_text, data, "utf-8")


================================================
FILE: src/gitingest/ingestion.py
================================================
"""Functions to ingest and analyze a codebase directory or single file."""

from __future__ import annotations

from pathlib import Path
from typing import TYPE_CHECKING

from gitingest.config import MAX_DIRECTORY_DEPTH, MAX_FILES, MAX_TOTAL_SIZE_BYTES
from gitingest.output_formatter import format_node
from gitingest.schemas import FileSystemNode, FileSystemNodeType, FileSystemStats
from gitingest.utils.ingestion_utils import _should_exclude, _should_include
from gitingest.utils.logging_config import get_logger

if TYPE_CHECKING:
    from gitingest.schemas import IngestionQuery

# Initialize logger for this module
logger = get_logger(__name__)


def ingest_query(query: IngestionQuery) -> tuple[str, str, str]:
    """Run the ingestion process for a parsed query.

    This is the main entry point for analyzing a codebase directory or single file. It processes the query
    parameters, reads the file or directory content, and generates a summary, directory structure, and file content,
    along with token estimations.

    Parameters
    ----------
    query : IngestionQuery
        The parsed query object containing information about the repository and query parameters.

    Returns
    -------
    tuple[str, str, str]
        A tuple containing the summary, directory structure, and file contents.

    Raises
    ------
    ValueError
        If the path cannot be found, is not a file, or the file has no content.

    """
    logger.info(
        "Starting file ingestion",
        extra={
            "slug": query.slug,
            "subpath": query.subpath,
            "local_path": str(query.local_path),
            "max_file_size": query.max_file_size,
        },
    )

    subpath = Path(query.subpath.strip("/")).as_posix()
    path = query.local_path / subpath

    if not path.exists():
        logger.error("Path not found", extra={"path": str(path), "slug": query.slug})
        msg = f"{query.slug} cannot be found"
        raise ValueError(msg)

    if (query.type and query.type == "blob") or query.local_path.is_file():
        # TODO: We do this wrong! We should still check the branch and commit!
        logger.info("Processing single file", extra={"file_path": str(path)})

        if not path.is_file():
            logger.error("Expected file but found non-file", extra={"path": str(path)})
            msg = f"Path {path} is not a file"
            raise ValueError(msg)

        relative_path = path.relative_to(query.local_path)

        file_node = FileSystemNode(
            name=path.name,
            type=FileSystemNodeType.FILE,
            size=path.stat().st_size,
            file_count=1,
            path_str=str(relative_path),
            path=path,
        )

        if not file_node.content:
            logger.error("File has no content", extra={"file_name": file_node.name})
            msg = f"File {file_node.name} has no content"
            raise ValueError(msg)

        logger.info(
            "Single file processing completed",
            extra={
                "file_name": file_node.name,
                "file_size": file_node.size,
            },
        )
        return format_node(file_node, query=query)

    logger.info("Processing directory", extra={"directory_path": str(path)})

    root_node = FileSystemNode(
        name=path.name,
        type=FileSystemNodeType.DIRECTORY,
        path_str=str(path.relative_to(query.local_path)),
        path=path,
    )

    stats = FileSystemStats()

    _process_node(node=root_node, query=query, stats=stats)

    logger.info(
        "Directory processing completed",
        extra={
            "total_files": root_node.file_count,
            "total_directories": root_node.dir_count,
            "total_size_bytes": root_node.size,
            "stats_total_files": stats.total_files,
            "stats_total_size": stats.total_size,
        },
    )

    return format_node(root_node, query=query)


def _process_node(node: FileSystemNode, query: IngestionQuery, stats: FileSystemStats) -> None:
    """Process a file or directory item within a directory.

    This function handles each file or directory item, checking if it should be included or excluded based on the
    provided patterns. It handles symlinks, directories, and files accordingly.

    Parameters
    ----------
    node : FileSystemNode
        The current directory or file node being processed.
    query : IngestionQuery
        The parsed query object containing information about the repository and query parameters.
    stats : FileSystemStats
        Statistics tracking object for the total file count and size.

    """
    if limit_exceeded(stats, depth=node.depth):
        return

    for sub_path in node.path.iterdir():
        if query.ignore_patterns and _should_exclude(sub_path, query.local_path, query.ignore_patterns):
            continue

        if query.include_patterns and not _should_include(sub_path, query.local_path, query.include_patterns):
            continue

        if sub_path.is_symlink():
            _process_symlink(path=sub_path, parent_node=node, stats=stats, local_path=query.local_path)
        elif sub_path.is_file():
            if sub_path.stat().st_size > query.max_file_size:
                logger.debug(
                    "Skipping file: would exceed max file size limit",
                    extra={
                        "file_path": str(sub_path),
                        "file_size": sub_path.stat().st_size,
                        "max_file_size": query.max_file_size,
                    },
                )
                continue
            _process_file(path=sub_path, parent_node=node, stats=stats, local_path=query.local_path)
        elif sub_path.is_dir():
            child_directory_node = FileSystemNode(
                name=sub_path.name,
                type=FileSystemNodeType.DIRECTORY,
                path_str=str(sub_path.relative_to(query.local_path)),
                path=sub_path,
                depth=node.depth + 1,
            )

            _process_node(node=child_directory_node, query=query, stats=stats)

            if not child_directory_node.children:
                continue

            node.children.append(child_directory_node)
            node.size += child_directory_node.size
            node.file_count += child_directory_node.file_count
            node.dir_count += 1 + child_directory_node.dir_count
        else:
            logger.warning("Unknown file type, skipping", extra={"file_path": str(sub_path)})

    node.sort_children()


def _process_symlink(path: Path, parent_node: FileSystemNode, stats: FileSystemStats, local_path: Path) -> None:
    """Process a symlink in the file system.

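    This function records the symlink as a child of ``parent_node`` and counts it as a file.
    It does not resolve or validate the symlink's target.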

    Parameters
    ----------
    path : Path
        The full path of the symlink.
    parent_node : FileSystemNode
        The parent directory node.
    stats : FileSystemStats
        Statistics tracking object for the total file count and size.
    local_path : Path
        The base path of the repository or directory being processed.

    """
    child = FileSystemNode(
        name=path.name,
        type=FileSystemNodeType.SYMLINK,
        path_str=str(path.relative_to(local_path)),
        path=path,
        depth=parent_node.depth + 1,
    )
    stats.total_files += 1
    parent_node.children.append(child)
    parent_node.file_count += 1


def _process_file(path: Path, parent_node: FileSystemNode, stats: FileSystemStats, local_path: Path) -> None:
    """Process a file in the file system.

    This function enforces the global file-count and total-size limits, updates the statistics,
    and appends a file node to the parent. If a limit would be exceeded, it logs a warning and
    skips the file. The file's content is not read here.

    Parameters
    ----------
    path : Path
        The full path of the file.
    parent_node : FileSystemNode
        The parent directory node to which the file node is appended.
    stats : FileSystemStats
        Statistics tracking object for the total file count and size.
    local_path : Path
        The base path of the repository or directory being processed.

    """
    if stats.total_files + 1 > MAX_FILES:
        logger.warning(
            "Maximum file limit reached",
            extra={
                "current_files": stats.total_files,
                "max_files": MAX_FILES,
                "file_path": str(path),
            },
        )
        return

    file_size = path.stat().st_size
    if stats.total_size + file_size > MAX_TOTAL_SIZE_BYTES:
        logger.warning(
            "Skipping file: would exceed total size limit",
            extra={
                "file_path": str(path),
                "file_size": file_size,
                "current_total_size": stats.total_size,
                "max_total_size": MAX_TOTAL_SIZE_BYTES,
            },
        )
        return

    stats.total_files += 1
    stats.total_size += file_size

    child = FileSystemNode(
        name=path.name,
        type=FileSystemNodeType.FILE,
        size=file_size,
        file_count=1,
        path_str=str(path.relative_to(local_path)),
        path=path,
        depth=parent_node.depth + 1,
    )

    parent_node.children.append(child)
    parent_node.size += file_size
    parent_node.file_count += 1


def limit_exceeded(stats: FileSystemStats, depth: int) -> bool:
    """Check if any of the traversal limits have been exceeded.

    This function checks if the current traversal has exceeded any of the configured limits:
    maximum directory depth, maximum number of files, or maximum total size in bytes.

    Parameters
    ----------
    stats : FileSystemStats
        Statistics tracking object for the total file count and size.
    depth : int
        The current depth of directory traversal.

    Returns
    -------
    bool
        ``True`` if any limit has been exceeded, ``False`` otherwise.

    """
    if depth > MAX_DIRECTORY_DEPTH:
        logger.warning(
            "Maximum directory depth limit reached",
            extra={
                "current_depth": depth,
                "max_depth": MAX_DIRECTORY_DEPTH,
            },
        )
        return True

    if stats.total_files >= MAX_FILES:
        logger.warning(
            "Maximum file limit reached",
            extra={
                "current_files": stats.total_files,
                "max_files": MAX_FILES,
            },
        )
        return True  # TODO: end recursion

    if stats.total_size >= MAX_TOTAL_SIZE_BYTES:
        logger.warning(
            "Maximum total size limit reached",
            extra={
                "current_size_mb": stats.total_size / 1024 / 1024,
                "max_size_mb": MAX_TOTAL_SIZE_BYTES / 1024 / 1024,
            },
        )
        return True  # TODO: end recursion

    return False


================================================
FILE: src/gitingest/output_formatter.py
================================================
"""Functions to format an ingested file system tree into a summary, directory structure, and file contents."""

from __future__ import annotations

import ssl
from typing import TYPE_CHECKING

import requests.exceptions
import tiktoken

from gitingest.schemas import FileSystemNode, FileSystemNodeType
from gitingest.utils.compat_func import readlink
from gitingest.utils.logging_config import get_logger

if TYPE_CHECKING:
    from gitingest.schemas import IngestionQuery

# Initialize logger for this module
logger = get_logger(__name__)

_TOKEN_THRESHOLDS: list[tuple[int, str]] = [
    (1_000_000, "M"),
    (1_000, "k"),
]


def format_node(node: FileSystemNode, query: IngestionQuery) -> tuple[str, str, str]:
    """Generate a summary, directory structure, and file contents for a given file system node.

    If the node represents a directory, the function will recursively process its contents.

    Parameters
    ----------
    node : FileSystemNode
        The file system node to be summarized.
    query : IngestionQuery
        The parsed query object containing information about the repository and query parameters.

    Returns
    -------
    tuple[str, str, str]
        A tuple containing the summary, directory structure, and file contents.

    """
    is_single_file = node.type == FileSystemNodeType.FILE
    summary = _create_summary_prefix(query, single_file=is_single_file)

    if node.type == FileSystemNodeType.DIRECTORY:
        summary += f"Files analyzed: {node.file_count}\n"
    elif node.type == FileSystemNodeType.FILE:
        summary += f"File: {node.name}\n"
        summary += f"Lines: {len(node.content.splitlines()):,}\n"

    tree = "Directory structure:\n" + _create_tree_structure(query, node=node)

    content = _gather_file_contents(node)

    token_estimate = _format_token_count(tree + content)
    if token_estimate:
        summary += f"\nEstimated tokens: {token_estimate}"

    return summary, tree, content


def _create_summary_prefix(query: IngestionQuery, *, single_file: bool = False) -> str:
    """Create a prefix string for summarizing a repository or local directory.

    Includes repository name (if provided), commit/branch details, and subpath if relevant.

    Parameters
    ----------
    query : IngestionQuery
        The parsed query object containing information about the repository and query parameters.
    single_file : bool
        A flag indicating whether the summary is for a single file (default: ``False``).

    Returns
    -------
    str
        A summary prefix string containing repository, commit, branch, and subpath details.

    """
    parts = []

    if query.user_name:
        parts.append(f"Repository: {query.user_name}/{query.repo_name}")
    else:
        # Local scenario
        parts.append(f"Directory: {query.slug}")

    if query.tag:
        parts.append(f"Tag: {query.tag}")
    elif query.branch and query.branch not in ("main", "master"):
        parts.append(f"Branch: {query.branch}")

    if query.commit:
        parts.append(f"Commit: {query.commit}")

    if query.subpath != "/" and not single_file:
        parts.append(f"Subpath: {query.subpath}")

    return "\n".join(parts) + "\n"


def _gather_file_contents(node: FileSystemNode) -> str:
    """Recursively gather contents of all files under the given node.

    This function recursively processes a directory node and gathers the contents of all files
    under that node. It returns the concatenated content of all files as a single string.

    Parameters
    ----------
    node : FileSystemNode
        The current directory or file node being processed.

    Returns
    -------
    str
        The concatenated content of all files under the given node.

    """
    if node.type != FileSystemNodeType.DIRECTORY:
        return node.content_string

    # Recursively gather contents of all files under the current directory
    return "\n".join(_gather_file_contents(child) for child in node.children)


def _create_tree_structure(
    query: IngestionQuery,
    *,
    node: FileSystemNode,
    prefix: str = "",
    is_last: bool = True,
) -> str:
    """Generate a tree-like string representation of the file structure.

    This function generates a string representation of the directory structure, formatted
    as a tree with appropriate indentation for nested directories and files.

    Parameters
    ----------
    query : IngestionQuery
        The parsed query object containing information about the repository and query parameters.
    node : FileSystemNode
        The current directory or file node being processed.
    prefix : str
        A string used for indentation and formatting of the tree structure (default: ``""``).
    is_last : bool
        A flag indicating whether the current node is the last in its directory (default: ``True``).

    Returns
    -------
    str
        A string representing the directory structure formatted as a tree.

    """
    if not node.name:
        # If no name is present, use the slug as the top-level directory name
        node.name = query.slug

    tree_str = ""
    current_prefix = "└── " if is_last else "├── "

    # Indicate directories with a trailing slash
    display_name = node.name
    if node.type == FileSystemNodeType.DIRECTORY:
        display_name += "/"
    elif node.type == FileSystemNodeType.SYMLINK:
        display_name += " -> " + readlink(node.path).name

    tree_str += f"{prefix}{current_prefix}{display_name}\n"

    if node.type == FileSystemNodeType.DIRECTORY and node.children:
        prefix += "    " if is_last else "│   "
        for i, child in enumerate(node.children):
            tree_str += _create_tree_structure(query, node=child, prefix=prefix, is_last=i == len(node.children) - 1)
    return tree_str


def _format_token_count(text: str) -> str | None:
    """Return a human-readable token-count string (e.g. ``"1.2k"``, ``"1.2M"``).

    Parameters
    ----------
    text : str
        The text string for which the token count is to be estimated.

    Returns
    -------
    str | None
        The formatted number of tokens as a string (e.g., ``"1.2k"``, ``"1.2M"``), or ``None`` if an error occurs.

    """
    try:
        encoding = tiktoken.get_encoding("o200k_base")  # gpt-4o, gpt-4o-mini
        total_tokens = len(encoding.encode(text, disallowed_special=()))
    except (ValueError, UnicodeEncodeError) as exc:
        logger.warning("Failed to estimate token size", extra={"error": str(exc)})
        return None
    except (requests.exceptions.RequestException, ssl.SSLError) as exc:
        # On network errors, skip token count estimation instead of erroring out
        logger.warning("Failed to download tiktoken model", extra={"error": str(exc)})
        return None

    for threshold, suffix in _TOKEN_THRESHOLDS:
        if total_tokens >= threshold:
            return f"{total_tokens / threshold:.1f}{suffix}"

    return str(total_tokens)


================================================
FILE: src/gitingest/query_parser.py
================================================
"""Module containing functions to parse and validate input sources and patterns."""

from __future__ import annotations

import uuid
from pathlib import Path
from typing import Literal

from gitingest.config import TMP_BASE_PATH
from gitingest.schemas import IngestionQuery
from gitingest.utils.git_utils import fetch_remote_branches_or_tags, resolve_commit
from gitingest.utils.logging_config import get_logger
from gitingest.utils.query_parser_utils import (
    PathKind,
    _fallback_to_root,
    _get_user_and_repo_from_path,
    _is_valid_git_commit_hash,
    _normalise_source,
)

# Initialize logger for this module
logger = get_logger(__name__)


async def parse_remote_repo(source: str, token: str | None = None) -> IngestionQuery:
    """Parse a repository URL and return an ``IngestionQuery`` object.

    If source is:
      - A fully qualified URL ('https://gitlab.com/...'), parse & verify that domain
      - A URL missing 'https://' ('gitlab.com/...'), add 'https://' and parse
      - A *slug* ('pandas-dev/pandas'), attempt known domains until we find one that exists.

    Parameters
    ----------
    source : str
        The URL or domain-less slug to parse.
    token : str | None
        GitHub personal access token (PAT) for accessing private repositories.

    Returns
    -------
    IngestionQuery
        An ``IngestionQuery`` object containing the parsed details of the repository.

    """
    parsed_url = await _normalise_source(source, token=token)
    host = parsed_url.netloc
    user, repo = _get_user_and_repo_from_path(parsed_url.path)

    _id = uuid.uuid4()
    slug = f"{user}-{repo}"
    local_path = TMP_BASE_PATH / str(_id) / slug
    url = f"https://{host}/{user}/{repo}"

    query = IngestionQuery(
        host=host,
        user_name=user,
        repo_name=repo,
        url=url,
        local_path=local_path,
        slug=slug,
        id=_id,
    )

    path_parts = parsed_url.path.strip("/").split("/")[2:]

    # No path beyond user/repo: fall back to the repository root
    if not path_parts:
        return await _fallback_to_root(query, token=token)

    kind = PathKind(path_parts.pop(0))  # may raise ValueError
    query.type = kind

    # TODO: Handle issues and pull requests
    if query.type in {PathKind.ISSUES, PathKind.PULL}:
        msg = f"Warning: Issues and pull requests are not yet supported: {url}. Returning repository root."
        return await _fallback_to_root(query, token=token, warn_msg=msg)

    # If no extra path parts, just return
    if not path_parts:
        msg = f"Warning: No extra path parts: {url}. Returning repository root."
        return await _fallback_to_root(query, token=token, warn_msg=msg)

    if query.type not in {PathKind.TREE, PathKind.BLOB}:
        # TODO: Handle other types
        msg = f"Warning: Type '{query.type}' is not yet supported: {url}. Returning repository root."
        return await _fallback_to_root(query, token=token, warn_msg=msg)

    # Commit, branch, or tag
    ref = path_parts[0]

    if _is_valid_git_commit_hash(ref):  # Commit
        query.commit = ref
        path_parts.pop(0)  # Consume the commit hash
    else:  # Branch or tag
        # Try to resolve a tag
        query.tag = await _configure_branch_or_tag(
            path_parts,
            url=url,
            ref_type="tags",
            token=token,
        )

        # If no tag found, try to resolve a branch
        if not query.tag:
            query.branch = await _configure_branch_or_tag(
                path_parts,
                url=url,
                ref_type="branches",
                token=token,
            )

    # Only configure subpath if we have identified a commit, branch, or tag.
    if path_parts and (query.commit or query.branch or query.tag):
        query.subpath += "/".join(path_parts)

    query.commit = await resolve_commit(query.extract_clone_config(), token=token)

    return query


def parse_local_dir_path(path_str: str) -> IngestionQuery:
    """Parse the given file path into an ``IngestionQuery`` object.

    Parameters
    ----------
    path_str : str
        The file path to parse.

    Returns
    -------
    IngestionQuery
        An ``IngestionQuery`` object containing the parsed details of the file path.

    """
    path_obj = Path(path_str).resolve()
    slug = path_obj.name if path_str == "." else path_str.strip("/")
    return IngestionQuery(local_path=path_obj, slug=slug, id=uuid.uuid4())


async def _configure_branch_or_tag(
    path_parts: list[str],
    *,
    url: str,
    ref_type: Literal["branches", "tags"],
    token: str | None = None,
) -> str | None:
    """Configure the branch or tag based on the remaining parts of the URL.

    Parameters
    ----------
    path_parts : list[str]
        The remaining path parts of the URL. The parts that form the matched branch or tag are removed in place.
    url : str
        The URL of the repository.
    ref_type : Literal["branches", "tags"]
        The type of reference to configure. Can be "branches" or "tags".
    token : str | None
        GitHub personal access token (PAT) for accessing private repositories.

    Returns
    -------
    str | None
        The branch or tag name if found, otherwise ``None``.

    """
    _ref_type = "tags" if ref_type == "tags" else "branches"

    try:
        # Fetch the list of branches or tags from the remote repository
        branches_or_tags: list[str] = await fetch_remote_branches_or_tags(url, ref_type=_ref_type, token=token)
    except RuntimeError as exc:
        # If remote discovery fails, we optimistically treat the first path segment as the branch/tag.
        msg = f"Warning: Failed to fetch {_ref_type}: {exc}"
        logger.warning(msg)
        return path_parts.pop(0) if path_parts else None

    # Iterate over the path components and try to find a matching branch/tag
    candidate_parts: list[str] = []

    for part in path_parts:
        candidate_parts.append(part)
        candidate_name = "/".join(candidate_parts)
        if candidate_name in branches_or_tags:
            # We found a match — now consume exactly the parts that form the branch/tag
            del path_parts[: len(candidate_parts)]
            return candidate_name

    # No match found; leave path_parts intact
    return None


================================================
FILE: src/gitingest/schemas/__init__.py
================================================
"""Module containing the schemas for the Gitingest package."""

from gitingest.schemas.cloning import CloneConfig
from gitingest.schemas.filesystem import FileSystemNode, FileSystemNodeType, FileSystemStats
from gitingest.schemas.ingestion import IngestionQuery

__all__ = ["CloneConfig", "FileSystemNode", "FileSystemNodeType", "FileSystemStats", "IngestionQuery"]


================================================
FILE: src/gitingest/schemas/cloning.py
================================================
"""Schema for the cloning process."""

from __future__ import annotations

from pydantic import BaseModel, Field


class CloneConfig(BaseModel):  # pylint: disable=too-many-instance-attributes
    """Configuration for cloning a Git repository.

    This model holds the necessary parameters for cloning a repository to a local path, including
    the repository's URL, the target local path, and optional parameters for a specific commit, branch, or tag.

    Attributes
    ----------
    url : str
        The URL of the Git repository to clone.
    local_path : str
        The local directory where the repository will be cloned.
    commit : str | None
        The specific commit hash to check out after cloning.
    branch : str | None
        The branch to clone.
    tag : str | None
        The tag to clone.
    subpath : str
        The subpath to clone from the repository (default: ``"/"``).
    blob : bool
        Whether the URL points to a single file (a blob) rather than a tree (default: ``False``).
    include_submodules : bool
        Whether to clone submodules (default: ``False``).

    """

    url: str
    local_path: str
    commit: str | None = None
    branch: str | None = None
    tag: str | None = None
    subpath: str = Field(default="/")
    blob: bool = Field(default=False)
    include_submodules: bool = Field(default=False)


================================================
FILE: src/gitingest/schemas/filesystem.py
================================================
"""Schema for the filesystem representation."""

from __future__ import annotations

import os
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import TYPE_CHECKING

from gitingest.utils.compat_func import readlink
from gitingest.utils.file_utils import _decodes, _get_preferred_encodings, _read_chunk
from gitingest.utils.notebook import process_notebook

if TYPE_CHECKING:
    from pathlib import Path

SEPARATOR = "=" * 48  # tiktoken (OpenAI's tokenizer) counts 2 tokens for runs longer than 48 characters


class FileSystemNodeType(Enum):
    """Enum representing the type of a file system node (directory, file, or symlink)."""

    DIRECTORY = auto()
    FILE = auto()
    SYMLINK = auto()


@dataclass
class FileSystemStats:
    """Class for tracking statistics during file system traversal."""

    total_files: int = 0
    total_size: int = 0


@dataclass
class FileSystemNode:  # pylint: disable=too-many-instance-attributes
    """Class representing a node in the file system (a file, directory, or symlink).

    Tracks properties of files/directories for comprehensive analysis.
    """

    name: str
    type: FileSystemNodeType
    path_str: str
    path: Path
    size: int = 0
    file_count: int = 0
    dir_count: int = 0
    depth: int = 0
    children: list[FileSystemNode] = field(default_factory=list)

    def sort_children(self) -> None:
        """Sort the children nodes of a directory according to a specific order.

        Order of sorting:
          1. README files (named ``readme`` or ``readme.*``, case-insensitive)
          2. Regular files (not starting with dot)
          3. Hidden files (starting with dot)
          4. Regular directories (not starting with dot)
          5. Hidden directories (starting with dot)

        All groups are sorted alphanumerically within themselves.

        Raises
        ------
        ValueError
            If the node is not a directory.

        """
        if self.type != FileSystemNodeType.DIRECTORY:
            msg = "Cannot sort children of a non-directory node"
            raise ValueError(msg)

        def _sort_key(child: FileSystemNode) -> tuple[int, str]:
            # returns the priority order for the sort function, 0 is first
            # Groups: 0=README, 1=regular file, 2=hidden file, 3=regular dir, 4=hidden dir
            name = child.name.lower()
            if child.type == FileSystemNodeType.FILE:
                if name == "readme" or name.startswith("readme."):
                    return (0, name)
                return (1 if not name.startswith(".") else 2, name)
            return (3 if not name.startswith(".") else 4, name)

        self.children.sort(key=_sort_key)

    @property
    def content_string(self) -> str:
        """Return the content of the node as a string, including path and content.

        Returns
        -------
        str
            A string representation of the node's content.

        """
        parts = [
            SEPARATOR,
            f"{self.type.name}: {str(self.path_str).replace(os.sep, '/')}"
            + (f" -> {readlink(self.path).name}" if self.type == FileSystemNodeType.SYMLINK else ""),
            SEPARATOR,
            f"{self.content}",
        ]

        return "\n".join(parts) + "\n\n"

    @property
    def content(self) -> str:  # pylint: disable=too-many-return-statements
        """Return file content (if text / notebook) or an explanatory placeholder.

        Heuristically decides whether the file is text or binary by decoding a small chunk of the file
        with multiple encodings and checking for common binary markers.

        Returns
        -------
        str
            The content of the file, or an error message if the file could not be read.

        Raises
        ------
        ValueError
            If the node is a directory.

        """
        if self.type == FileSystemNodeType.DIRECTORY:
            msg = "Cannot read content of a directory node"
            raise ValueError(msg)

        if self.type == FileSystemNodeType.SYMLINK:
            return ""  # TODO: are we including the empty content of symlinks?

        if self.path.suffix == ".ipynb":  # Notebook
            try:
                return process_notebook(self.path)
            except Exception as exc:
                return f"Error processing notebook: {exc}"

        chunk = _read_chunk(self.path)

        if chunk is None:
            return "Error reading file"

        if chunk == b"":
            return "[Empty file]"

        if not _decodes(chunk, "utf-8"):
            return "[Binary file]"

        # Find the first encoding that decodes the sample
        good_enc: str | None = next(
            (enc for enc in _get_preferred_encodings() if _decodes(chunk, encoding=enc)),
            None,
        )

        if good_enc is None:
            return "Error: Unable to decode file with available encodings"

        try:
            with self.path.open(encoding=good_enc) as fp:
                return fp.read()
        except (OSError, UnicodeDecodeError) as exc:
            return f"Error reading file with {good_enc!r}: {exc}"


================================================
FILE: src/gitingest/schemas/ingestion.py
================================================
"""Module containing the Pydantic models for the ingestion process."""

from __future__ import annotations

from pathlib import Path  # noqa: TC003 (typing-only-standard-library-import) needed for type checking (pydantic)
from uuid import UUID  # noqa: TC003 (typing-only-standard-library-import) needed for type checking (pydantic)

from pydantic import BaseModel, Field

from gitingest.config import MAX_FILE_SIZE
from gitingest.schemas.cloning import CloneConfig


class IngestionQuery(BaseModel):  # pylint: disable=too-many-instance-attributes
    """Pydantic model to store the parsed details of the repository or file path.

    Attributes
    ----------
    host : str | None
        The host of the repository.
    user_name : str | None
        The username or owner of the repository.
    repo_name : str | None
        The name of the repository.
    local_path : Path
        The local path to the repository or file.
    url : str | None
        The URL of the repository.
    slug : str
        The slug of the repository.
    id : UUID
        The ID of the repository.
    subpath : str
        The subpath to the repository or file (default: ``"/"``).
    type : str | None
        The kind of URL path that was parsed (e.g. ``"tree"`` or ``"blob"``).
    branch : str | None
        The branch of the repository.
    commit : str | None
        The commit of the repository.
    tag : str | None
        The tag of the repository.
    max_file_size : int
        The maximum file size to ingest in bytes (default: 10 MB).
    ignore_patterns : set[str]
        The patterns to ignore (default: ``set()``).
    include_patterns : set[str] | None
        The patterns to include.
    include_submodules : bool
        Whether to include all Git submodules within the repository. (default: ``False``)
    s3_url : str | None
        The S3 URL where the digest is stored if S3 is enabled.

    """

    host: str | None = None
    user_name: str | None = None
    repo_name: str | None = None
    local_path: Path
    url: str | None = None
    slug: str
    id: UUID
    subpath: str = Field(default="/")
    type: str | None = None
    branch: str | None = None
    commit: str | None = None
    tag: str | None = None
    max_file_size: int = Field(default=MAX_FILE_SIZE)
    ignore_patterns: set[str] = Field(default_factory=set)  # TODO: same type for ignore_* and include_* patterns
    include_patterns: set[str] | None = None
    include_submodules: bool = Field(default=False)
    s3_url: str | None = None

    def extract_clone_config(self) -> CloneConfig:
        """Extract the relevant fields for the CloneConfig object.

        Returns
        -------
        CloneConfig
            A CloneConfig object containing the relevant fields.

        Raises
        ------
        ValueError
            If the ``url`` parameter is not provided.

        """
        if not self.url:
            msg = "The 'url' parameter is required."
            raise ValueError(msg)

        return CloneConfig(
            url=self.url,
            local_path=str(self.local_path),
            commit=self.commit,
            branch=self.branch,
            tag=self.tag,
            subpath=self.subpath,
            blob=self.type == "blob",
            include_submodules=self.include_submodules,
        )


================================================
FILE: src/gitingest/utils/__init__.py
================================================
"""Utility functions for the gitingest package."""


================================================
FILE: src/gitingest/utils/auth.py
================================================
"""Utilities for handling authentication."""

from __future__ import annotations

import os

from gitingest.utils.git_utils import validate_github_token


def resolve_token(token: str | None) -> str | None:
    """Resolve the token to use for the query.

    Parameters
    ----------
    token : str | None
        GitHub personal access token (PAT) for accessing private repositories.

    Returns
    -------
    str | None
        The resolved token.

    """
    token = token or os.getenv("GITHUB_TOKEN")
    if token:
        validate_github_token(token)
    return token


================================================
FILE: src/gitingest/utils/compat_func.py
================================================
"""Compatibility functions for Python 3.8."""

import os
from pathlib import Path


def readlink(path: Path) -> Path:
    """Read the target of a symlink.

    Compatible with Python 3.8.

    Parameters
    ----------
    path : Path
        Path to the symlink.

    Returns
    -------
    Path
        The target of the symlink.

    """
    return Path(os.readlink(path))


def removesuffix(s: str, suffix: str) -> str:
    """Remove a suffix from a string.

    Compatible with Python 3.8.

    Parameters
    ----------
    s : str
        String to remove suffix from.
    suffix : str
        Suffix to remove.

    Returns
    -------
    str
        String with suffix removed.

    """
    return s[: -len(suffix)] if s.endswith(suffix) else s


================================================
FILE: src/gitingest/utils/compat_typing.py
================================================
"""Compatibility layer for typing."""

try:
    from enum import StrEnum  # type: ignore[attr-defined]  # Py ≥ 3.11
except ImportError:
    from strenum import StrEnum  # type: ignore[import-untyped] # Py ≤ 3.10

try:
    from typing import ParamSpec, TypeAlias  # type: ignore[attr-defined]  # Py ≥ 3.10
except ImportError:
    from typing_extensions import ParamSpec, TypeAlias  # type: ignore[attr-defined]  # Py ≤ 3.9

try:
    from typing import Annotated  # type: ignore[attr-defined]  # Py ≥ 3.9
except ImportError:
    from typing_extensions import Annotated  # type: ignore[attr-defined]  # Py ≤ 3.8

__all__ = ["Annotated", "ParamSpec", "StrEnum", "TypeAlias"]


================================================
FILE: src/gitingest/utils/exceptions.py
================================================
"""Custom exceptions for the Gitingest package."""


class AsyncTimeoutError(Exception):
    """Exception raised when an async operation exceeds its timeout limit.

    This exception is used by the ``async_timeout`` decorator to signal that the wrapped
    asynchronous function has exceeded the specified time limit for execution.
    """


class InvalidNotebookError(Exception):
    """Exception raised when a Jupyter notebook is invalid or cannot be processed."""

    def __init__(self, message: str) -> None:
        super().__init__(message)


class InvalidGitHubTokenError(ValueError):
    """Exception raised when a GitHub Personal Access Token is malformed."""

    def __init__(self) -> None:
        msg = (
            "Invalid GitHub token format. To generate a token, go to "
            "https://github.com/settings/tokens/new?description=gitingest&scopes=repo."
        )
        super().__init__(msg)


================================================
FILE: src/gitingest/utils/file_utils.py
================================================
"""Utility functions for working with files and directories."""

from __future__ import annotations

import locale
import platform
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from pathlib import Path

try:
    locale.setlocale(locale.LC_ALL, "")
except locale.Error:
    locale.setlocale(locale.LC_ALL, "C")

_CHUNK_SIZE = 1024  # bytes


def _get_preferred_encodings() -> list[str]:
    """Get list of encodings to try, prioritized for the current platform.

    Returns
    -------
    list[str]
        List of encoding names to try in priority order, starting with the
        platform's default encoding followed by common fallback encodings.

    """
    encodings = [locale.getpreferredencoding(), "utf-8", "utf-16", "utf-16le", "utf-8-sig", "latin"]
    if platform.system() == "Windows":
        encodings += ["cp1252", "iso-8859-1"]
    return list(dict.fromkeys(encodings))
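

# Illustrative sketch, not part of the original module: it shows why the candidate list
# above is passed through ``dict.fromkeys``. The helper name ``_demo_dedupe_encodings``
# and any sample values passed to it are assumptions made for this example only.
def _demo_dedupe_encodings(candidates: list) -> list:
    """Drop repeated encoding names while keeping the first-seen priority order."""
    # dict keys are unique and keep insertion order, so the earliest entry wins
    return list(dict.fromkeys(candidates))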


def _read_chunk(path: Path) -> bytes | None:
    """Attempt to read the first ``_CHUNK_SIZE`` bytes of ``path`` in binary mode.

    Parameters
    ----------
    path : Path
        The path to the file to read.

    Returns
    -------
    bytes | None
        The first ``_CHUNK_SIZE`` bytes of ``path``, or ``None`` on any ``OSError``.

    """
    try:
        with path.open("rb") as fp:
            return fp.read(_CHUNK_SIZE)
    except OSError:
        return None


def _decodes(chunk: bytes, encoding: str) -> bool:
    """Return ``True`` if ``chunk`` decodes cleanly with ``encoding``.

    Parameters
    ----------
    chunk : bytes
        The chunk of bytes to decode.
    encoding : str
        The encoding to use to decode the chunk.

    Returns
    -------
    bool
        ``True`` if the chunk decodes cleanly with the encoding, ``False`` otherwise.

    """
    try:
        chunk.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True
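

# Illustrative sketch, not part of the original module: one plausible way a caller could
# combine a byte chunk with a prioritized encoding list. How the real callers use
# ``_decodes`` is not shown here, so the helper name ``_demo_first_clean_encoding`` and
# its default encodings are assumptions.
def _demo_first_clean_encoding(chunk: bytes, encodings: tuple = ("ascii", "utf-8", "latin-1")):
    """Return the first encoding that decodes ``chunk`` without error, else ``None``."""
    for encoding in encodings:
        try:
            chunk.decode(encoding)
        except UnicodeDecodeError:
            continue  # try the next candidate
        return encoding
    return None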


================================================
FILE: src/gitingest/utils/git_utils.py
================================================
"""Utility functions for interacting with Git repositories."""

from __future__ import annotations

import asyncio
import base64
import re
import sys
from contextlib import contextmanager
from pathlib import Path
from typing import TYPE_CHECKING, Final, Generator, Iterable
from urllib.parse import urlparse, urlunparse

import git

from gitingest.utils.compat_func import removesuffix
from gitingest.utils.exceptions import InvalidGitHubTokenError
from gitingest.utils.logging_config import get_logger

if TYPE_CHECKING:
    from gitingest.schemas import CloneConfig

# Initialize logger for this module
logger = get_logger(__name__)

# GitHub Personal-Access tokens (classic + fine-grained).
#   - ghp_ / gho_ / ghu_ / ghs_ / ghr_  → 36 alphanumerics
#   - github_pat_                       → 22 alphanumerics + "_" + 59 alphanumerics
_GITHUB_PAT_PATTERN: Final[str] = r"^(?:gh[pousr]_[A-Za-z0-9]{36}|github_pat_[A-Za-z0-9]{22}_[A-Za-z0-9]{59})$"


def is_github_host(url: str) -> bool:
    """Check if a URL is from a GitHub host (github.com or GitHub Enterprise).

    Parameters
    ----------
    url : str
        The URL to check

    Returns
    -------
    bool
        True if the URL is from a GitHub host, False otherwise

    """
    hostname = urlparse(url).hostname or ""
    return hostname.startswith("github.")


async def run_command(*args: str) -> tuple[bytes, bytes]:
    """Execute a command asynchronously (without a shell) and return (stdout, stderr) bytes.

    This function is kept for backward compatibility with non-git commands.
    Git operations should use GitPython directly.

    Parameters
    ----------
    *args : str
        The command and its arguments to execute.

    Returns
    -------
    tuple[bytes, bytes]
        A tuple containing the stdout and stderr of the command.

    Raises
    ------
    RuntimeError
        If command exits with a non-zero status.

    """
    # Execute the requested command
    proc = await asyncio.create_subprocess_exec(
        *args,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    stdout, stderr = await proc.communicate()
    if proc.returncode != 0:
        msg = f"Command failed: {' '.join(args)}\nError: {stderr.decode().strip()}"
        raise RuntimeError(msg)

    return stdout, stderr


async def ensure_git_installed() -> None:
    """Ensure Git is installed and accessible on the system.

    On Windows, this also checks whether Git is configured to support long file paths.

    Raises
    ------
    RuntimeError
        If Git is not installed or not accessible.

    """
    try:
        # Use GitPython to check git availability
        git_cmd = git.Git()
        git_cmd.version()
    except Exception as exc:
        # Covers ``git.GitCommandError`` and any other failure (e.g. a missing git executable)
        msg = "Git is not installed or not accessible. Please install Git first."
        raise RuntimeError(msg) from exc

    if sys.platform == "win32":
        try:
            longpaths_value = git_cmd.config("core.longpaths")
            if longpaths_value.lower() != "true":
                logger.warning(
                    "Git clone may fail on Windows due to long file paths. "
                    "Consider enabling long path support with: 'git config --global core.longpaths true'. "
                    "Note: This command may require administrator privileges.",
                    extra={"platform": "windows", "longpaths_enabled": False},
                )
        except git.GitCommandError:
            # Ignore if checking 'core.longpaths' fails.
            pass


async def check_repo_exists(url: str, token: str | None = None) -> bool:
    """Check whether a remote Git repository is reachable.

    Parameters
    ----------
    url : str
        URL of the Git repository to check.
    token : str | None
        GitHub personal access token (PAT) for accessing private repositories.

    Returns
    -------
    bool
        ``True`` if the repository exists, ``False`` otherwise.

    """
    try:
        # Try to resolve HEAD - if repo exists, this will work
        await _resolve_ref_to_sha(url, "HEAD", token=token)
    except Exception:
        # Repository doesn't exist, is private without proper auth, or other error
        return False

    return True


def _parse_github_url(url: str) -> tuple[str, str, str]:
    """Parse a GitHub URL and return (hostname, owner, repo).

    Parameters
    ----------
    url : str
        The URL of the GitHub repository to parse.

    Returns
    -------
    tuple[str, str, str]
        A tuple containing the hostname, owner, and repository name.

    Raises
    ------
    ValueError
        If the URL is not a valid GitHub repository URL.

    """
    parsed = urlparse(url)
    if parsed.scheme not in {"http", "https"}:
        msg = f"URL must start with http:// or https://: {url!r}"
        raise ValueError(msg)

    if not parsed.hostname or not parsed.hostname.startswith("github."):
        msg = f"Un-recognised GitHub hostname: {parsed.hostname!r}"
        raise ValueError(msg)

    parts = removesuffix(parsed.path, ".git").strip("/").split("/")
    expected_path_length = 2
    if len(parts) != expected_path_length:
        msg = f"Path must look like /<owner>/<repo>: {parsed.path!r}"
        raise ValueError(msg)

    owner, repo = parts
    return parsed.hostname, owner, repo


async def fetch_remote_branches_or_tags(url: str, *, ref_type: str, token: str | None = None) -> list[str]:
    """Fetch the list of branches or tags from a remote Git repository.

    Parameters
    ----------
    url : str
        The URL of the Git repository to fetch branches or tags from.
    ref_type: str
        The type of reference to fetch. Can be "branches" or "tags".
    token : str | None
        GitHub personal access token (PAT) for accessing private repositories.

    Returns
    -------
    list[str]
        A list of branch or tag names available in the remote repository.

    Raises
    ------
    ValueError
        If the ``ref_type`` parameter is not "branches" or "tags".
    RuntimeError
        If fetching branches or tags from the remote repository fails.

    """
    if ref_type not in ("branches", "tags"):
        msg = f"Invalid fetch type: {ref_type}"
        raise ValueError(msg)

    await ensure_git_installed()

    # Use GitPython to get remote references
    try:
        fetch_tags = ref_type == "tags"
        to_fetch = "tags" if fetch_tags else "heads"

        # Build ls-remote command
        cmd_args = [f"--{to_fetch}"]
        if fetch_tags:
            cmd_args.append("--refs")  # Filter out peeled tag objects
        cmd_args.append(url)

        # Run the command with proper authentication
        with git_auth_context(url, token) as (git_cmd, auth_url):
            # Replace the URL in cmd_args with the authenticated URL
            cmd_args[-1] = auth_url  # URL is the last argument
            output = git_cmd.ls_remote(*cmd_args)

        # Parse output
        return [
            line.split(f"refs/{to_fetch}/", 1)[1]
            for line in output.splitlines()
            if line.strip() and f"refs/{to_fetch}/" in line
        ]
    except git.GitCommandError as exc:
        msg = f"Failed to fetch {ref_type} from {url}: {exc}"
        raise RuntimeError(msg) from exc


def create_git_repo(local_path: str, url: str, token: str | None = None) -> git.Repo:
    """Create a GitPython Repo object with authentication if needed.

    Parameters
    ----------
    local_path : str
        The local path where the git repository is located.
    url : str
        The repository URL to check if it's a GitHub repository.
    token : str | None
        GitHub personal access token (PAT) for accessing private repositories.

    Returns
    -------
    git.Repo
        A GitPython Repo object configured with authentication.

    Raises
    ------
    ValueError
        If the local path is not a valid git repository.

    """
    try:
        repo = git.Repo(local_path)

        # Configure authentication if needed
        if token and is_github_host(url):
            auth_header = create_git_auth_header(token, url=url)
            # Set the auth header in git config for this repo
            key, value = auth_header.split("=", 1)
            repo.git.config(key, value)

    except git.InvalidGitRepositoryError as exc:
        msg = f"Invalid git repository at {local_path}"
        raise ValueError(msg) from exc

    return repo


def create_git_auth_header(token: str, url: str = "https://github.com") -> str:
    """Create a Basic authentication header for GitHub git operations.

    Parameters
    ----------
    token : str
        GitHub personal access token (PAT) for accessing private repositories.
    url : str
        The GitHub URL to create the authentication header for.
        Defaults to "https://github.com" if not provided.

    Returns
    -------
    str
        The git config entry in ``key=value`` form that sets the authentication header.

    Raises
    ------
    ValueError
        If no hostname can be parsed from the URL.

    """
    hostname = urlparse(url).hostname
    if not hostname:
        msg = f"Invalid GitHub URL: {url!r}"
        raise ValueError(msg)

    basic = base64.b64encode(f"x-oauth-basic:{token}".encode()).decode()
    return f"http.https://{hostname}/.extraheader=Authorization: Basic {basic}"
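

# Illustrative sketch, not part of the original module: it rebuilds a ``key=value`` config
# entry of the shape returned above and shows why callers split it with ``maxsplit=1``
# (the base64 payload may itself end in "=" padding). The helper name
# ``_demo_auth_config_entry`` and the dummy token are assumptions for this example.
def _demo_auth_config_entry(token: str = "dummy-token", hostname: str = "github.com") -> tuple:
    """Return the ``(key, value)`` pair of a Basic-auth ``extraheader`` config entry."""
    import base64  # repeated locally so the sketch also runs on its own

    basic = base64.b64encode(f"x-oauth-basic:{token}".encode()).decode()
    entry = f"http.https://{hostname}/.extraheader=Authorization: Basic {basic}"
    # Split only on the first "=" so padding characters in the payload stay intact
    key, value = entry.split("=", 1)
    return key, value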


def create_authenticated_url(url: str, token: str | None = None) -> str:
    """Create an authenticated URL for Git operations.

    This is the safest approach for multi-user environments - no global state.

    Parameters
    ----------
    url : str
        The repository URL.
    token : str | None
        GitHub personal access token (PAT) for accessing private repositories.

    Returns
    -------
    str
        The URL with authentication embedded (for GitHub) or original URL.

    """
    if not (token and is_github_host(url)):
        return url

    parsed = urlparse(url)
    # Embed credentials in the URL: "x-oauth-basic" as the username and the token as the password
    netloc = f"x-oauth-basic:{token}@{parsed.hostname}"
    if parsed.port:
        netloc += f":{parsed.port}"

    return urlunparse(
        (
            parsed.scheme,
            netloc,
            parsed.path,
            parsed.params,
            parsed.query,
            parsed.fragment,
        ),
    )
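

# Illustrative sketch, not part of the original module: the same credential-embedding idea
# expressed with ``ParseResult._replace``, to show what the resulting URL looks like with
# and without an explicit port. The helper name ``_demo_embed_credentials`` and the sample
# hosts used with it are assumptions.
def _demo_embed_credentials(url: str, token: str) -> str:
    """Return ``url`` with ``x-oauth-basic:<token>@`` placed in front of the host."""
    from urllib.parse import urlparse, urlunparse  # repeated so the sketch runs on its own

    parsed = urlparse(url)
    netloc = f"x-oauth-basic:{token}@{parsed.hostname}"
    if parsed.port:
        netloc += f":{parsed.port}"  # keep a non-default port
    return urlunparse(parsed._replace(netloc=netloc))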


@contextmanager
def git_auth_context(url: str, token: str | None = None) -> Generator[tuple[git.Git, str]]:
    """Context manager that provides Git command and authenticated URL.

    Returns both a Git command object and the authenticated URL to use.
    This avoids any global state contamination between users.

    Parameters
    ----------
    url : str
        The repository URL to check if authentication is needed.
    token : str | None
        GitHub personal access token (PAT) for accessing private repositories.

    Yields
    ------
    tuple[git.Git, str]
        Tuple of (Git command object, authenticated URL to use).

    """
    git_cmd = git.Git()
    auth_url = create_authenticated_url(url, token)
    yield git_cmd, auth_url


def validate_github_token(token: str) -> None:
    """Validate the format of a GitHub Personal Access Token.

    Parameters
    ----------
    token : str
        GitHub personal access token (PAT) for accessing private repositories.

    Raises
    ------
    InvalidGitHubTokenError
        If the token format is invalid.

    """
    if not re.fullmatch(_GITHUB_PAT_PATTERN, token):
        raise InvalidGitHubTokenError
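

# Illustrative sketch, not part of the original module: it exercises the token pattern on
# made-up strings to show which shapes pass. The pattern literal is copied from
# ``_GITHUB_PAT_PATTERN`` so the sketch runs on its own; the helper name
# ``_demo_token_shape_ok`` and the sample tokens are assumptions, not real credentials.
def _demo_token_shape_ok(token: str) -> bool:
    """Return ``True`` if ``token`` has the classic or fine-grained PAT shape."""
    import re  # repeated locally so the sketch also runs on its own

    pattern = r"^(?:gh[pousr]_[A-Za-z0-9]{36}|github_pat_[A-Za-z0-9]{22}_[A-Za-z0-9]{59})$"
    return re.fullmatch(pattern, token) is not None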


async def checkout_partial_clone(config: CloneConfig, token: str | None) -> None:
    """Configure sparse-checkout for a partially cloned repository.

    Parameters
    ----------
    config : CloneConfig
        The configuration for cloning the repository, including subpath and blob flag.
    token : str | None
        GitHub personal access token (PAT) for accessing private repositories.

    Raises
    ------
    RuntimeError
        If the sparse-checkout configuration fails.

    """
    subpath = config.subpath.lstrip("/")
    if config.blob:
        # Remove the file name from the subpath when ingesting from a file url (e.g. blob/branch/path/file.txt)
        subpath = str(Path(subpath).parent.as_posix())

    try:
        repo = create_git_repo(config.local_path, config.url, token)
        repo.git.sparse_checkout("set", subpath)
    except git.GitCommandError as exc:
        msg = f"Failed to configure sparse-checkout: {exc}"
        raise RuntimeError(msg) from exc


async def resolve_commit(config: CloneConfig, token: str | None) -> str:
    """Resolve the commit to use for the clone.

    Parameters
    ----------
    config : CloneConfig
        The configuration for cloning the repository.
    token : str | None
        GitHub personal access token (PAT) for accessing private repositories.

    Returns
    -------
    str
        The commit SHA.

    """
    if config.commit:
        commit = config.commit
    elif config.tag:
        commit = await _resolve_ref_to_sha(config.url, pattern=f"refs/tags/{config.tag}*", token=token)
    elif config.branch:
        commit = await _resolve_ref_to_sha(config.url, pattern=f"refs/heads/{config.branch}", token=token)
    else:
        commit = await _resolve_ref_to_sha(config.url, pattern="HEAD", token=token)
    return commit


async def _resolve_ref_to_sha(url: str, pattern: str, token: str | None = None) -> str:
    """Return the commit SHA that ``pattern`` resolves to in ``url``.

    * Branch → first line from ``git ls-remote``.
    * Tag    → if annotated, prefer the peeled ``^{}`` line (commit).

    Parameters
    ----------
    url : str
        The URL of the remote repository.
    pattern : str
        The pattern to use to resolve the commit SHA.
    token : str | None
        GitHub personal access token (PAT) for accessing private repositories.

    Returns
    -------
    str
        The commit SHA.

    Raises
    ------
    ValueError
        If the ref does not exist in the remote repository.

    """
    try:
        # Execute ls-remote command with proper authentication
        with git_auth_context(url, token) as (git_cmd, auth_url):
            output = git_cmd.ls_remote(auth_url, pattern)
        lines = output.splitlines()

        sha = _pick_commit_sha(lines)
        if not sha:
            msg = f"{pattern!r} not found in {url}"
            raise ValueError(msg)

    except git.GitCommandError as exc:
        msg = f"Failed to resolve {pattern} in {url}:\n{exc}"
        raise ValueError(msg) from exc

    return sha


def _pick_commit_sha(lines: Iterable[str]) -> str | None:
    """Return a commit SHA from ``git ls-remote`` output.

    • Annotated tag            →  prefer the peeled line (<sha> refs/tags/x^{})
    • Branch / lightweight tag →  first non-peeled line


    Parameters
    ----------
    lines : Iterable[str]
        The lines of a ``git ls-remote`` output.

    Returns
    -------
    str | None
        The commit SHA, or ``None`` if no commit SHA is found.

    """
    first_non_peeled: str | None = None

    for ln in lines:
        if not ln.strip():
            continue

        sha, ref = ln.split(maxsplit=1)

        if ref.endswith("^{}"):  # peeled commit of annotated tag
            return sha  # ← best match, done

        if first_non_peeled is None:  # remember the first ordinary line
            first_non_peeled = sha

    return first_non_peeled  # branch or lightweight tag (or None)
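

# Illustrative sketch, not part of the original module: the same selection rule applied to
# hand-written ``git ls-remote`` style text, to make the annotated-tag case concrete. The
# helper name ``_demo_pick_sha`` and the shortened placeholder SHAs are assumptions.
def _demo_pick_sha(ls_remote_output: str):
    """Return the peeled SHA if present, else the first SHA, else ``None``."""
    first = None
    for line in ls_remote_output.splitlines():
        if not line.strip():
            continue
        sha, ref = line.split(maxsplit=1)
        if ref.endswith("^{}"):  # peeled line: the commit an annotated tag points to
            return sha
        if first is None:
            first = sha
    return first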


================================================
FILE: src/gitingest/utils/ignore_patterns.py
================================================
"""Default ignore patterns for Gitingest."""

from __future__ import annotations

from pathlib import Path

DEFAULT_IGNORE_PATTERNS: set[str] = {
    # Python
    "*.pyc",
    "*.pyo",
    "*.pyd",
    "__pycache__",
    ".pytest_cache",
    ".coverage",
    ".tox",
    ".nox",
    ".mypy_cache",
    ".ruff_cache",
    ".hypothesis",
    "poetry.lock",
    "Pipfile.lock",
    # JavaScript/Node
    "node_modules",
    "bower_components",
    "package-lock.json",
    "yarn.lock",
    ".npm",
    ".yarn",
    ".pnpm-store",
    "bun.lock",
    "bun.lockb",
    # Java
    "*.class",
    "*.jar",
    "*.war",
    "*.ear",
    "*.nar",
    ".gradle/",
    "build/",
    ".settings/",
    ".classpath",
    "gradle-app.setting",
    "*.gradle",
    # IDEs and editors / Java
    ".project",
    # C/C++
    "*.o",
    "*.obj",
    "*.dll",
    "*.dylib",
    "*.exe",
    "*.lib",
    "*.out",
    "*.a",
    "*.pdb",
    # Binary
    "*.bin",
    # Swift/Xcode
    ".build/",
    "*.xcodeproj/",
    "*.xcworkspace/",
    "*.pbxuser",
    "*.mode1v3",
    "*.mode2v3",
    "*.perspectivev3",
    "*.xcuserstate",
    "xcuserdata/",
    ".swiftpm/",
    # Ruby
    "*.gem",
    ".bundle/",
    "vendor/bundle",
    "Gemfile.lock",
    ".ruby-version",
    ".ruby-gemset",
    ".rvmrc",
    # Rust
    "Cargo.lock",
    "**/*.rs.bk",
    # Java / Rust
    "target/",
    # Go
    "pkg/",
    # .NET/C#
    "obj/",
    "*.suo",
    "*.user",
    "*.userosscache",
    "*.sln.docstates",
    "*.nupkg",
    # Go / .NET / C#
    "bin/",
    # Version control
    ".git",
    ".svn",
    ".hg",
    ".gitignore",
    ".gitattributes",
    ".gitmodules",
    # Images and media
    "*.svg",
    "*.png",
    "*.jpg",
    "*.jpeg",
    "*.gif",
    "*.ico",
    "*.pdf",
    "*.mov",
    "*.mp4",
    "*.mp3",
    "*.wav",
    # Virtual environments
    "venv",
    ".venv",
    "env",
    ".env",
    "virtualenv",
    # IDEs and editors
    ".idea",
    ".vscode",
    ".vs",
    "*.swo",
    "*.swn",
    ".settings",
    "*.sublime-*",
    # Temporary and cache files
    "*.log",
    "*.bak",
    "*.swp",
    "*.tmp",
    "*.temp",
    ".cache",
    ".sass-cache",
    ".eslintcache",
    ".DS_Store",
    "Thumbs.db",
    "desktop.ini",
    # Build directories and artifacts
    "build",
    "dist",
    "target",
    "out",
    "*.egg-info",
    "*.egg",
    "*.whl",
    "*.so",
    # Documentation
    "site-packages",
    ".docusaurus",
    ".next",
    ".nuxt",
    # Database
    "*.db",
    "*.sqlite",
    "*.sqlite3",
    # Other common patterns
    ## Minified files
    "*.min.js",
    "*.min.css",
    ## Source maps
    "*.map",
    ## Terraform
    "*.tfstate*",
    ## Dependencies in various languages
    "vendor/",
    # Gitingest
    "digest.txt",
}


def load_ignore_patterns(root: Path, filename: str) -> set[str]:
    """Load ignore patterns from ``filename`` found under ``root``.

    The loader walks the directory tree, looks for the supplied ``filename``,
    and returns a unified set of patterns. It implements the same parsing rules
    we use for ``.gitignore`` and ``.gitingestignore`` (git-wildmatch syntax with
    support for negation and root-relative paths).

    Parameters
    ----------
    root : Path
        Directory to walk.
    filename : str
        The filename to look for in each directory.

    Returns
    -------
    set[str]
        A set of ignore patterns extracted from the ``filename`` file found under the ``root`` directory.

    """
    patterns: set[str] = set()

    for ignore_file in root.rglob(filename):
        if ignore_file.is_file():
            patterns.update(_parse_ignore_file(ignore_file, root))
    return patterns


def _parse_ignore_file(ignore_file: Path, root: Path) -> set[str]:
    """Parse an ignore file and return a set of ignore patterns.

    Parameters
    ----------
    ignore_file : Path
        The path to the ignore file.
    root : Path
        The root directory of the repository.

    Returns
    -------
    set[str]
        A set of ignore patterns.

    """
    patterns: set[str] = set()

    # Directory containing the ignore file, relative to the repository root
    base_dir = ignore_file.parent.relative_to(root)

    with ignore_file.open(encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#"):  # comments / blank lines
                continue

            # Handle negation ("!foobar")
            negated = line.startswith("!")
            if negated:
                line = line[1:]

            # Handle leading slash ("/foobar")
            if line.startswith("/"):
                line = line.lstrip("/")

            pattern_body = (base_dir / line).as_posix()
            patterns.add(f"!{pattern_body}" if negated else pattern_body)

    return patterns
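

# Illustrative sketch, not part of the original module: the per-line normalization above,
# applied to a single line so the effect of negation, leading slashes and the ignore file's
# directory is easy to see. The helper name ``_demo_normalize_ignore_line`` and the sample
# inputs are assumptions; ``PurePosixPath`` is used so results do not depend on the OS.
def _demo_normalize_ignore_line(raw: str, rel_dir: str = ""):
    """Return the root-relative pattern for ``raw``, or ``None`` for blanks and comments."""
    from pathlib import PurePosixPath  # local import so the sketch runs on its own

    line = raw.strip()
    if not line or line.startswith("#"):
        return None
    negated = line.startswith("!")
    if negated:
        line = line[1:]
    line = line.lstrip("/")  # a leading slash anchors to the ignore file's directory
    body = (PurePosixPath(rel_dir) / line).as_posix()
    return f"!{body}" if negated else body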


================================================
FILE: src/gitingest/utils/ingestion_utils.py
================================================
"""Utility functions for the ingestion process."""

from __future__ import annotations

from typing import TYPE_CHECKING

from pathspec import PathSpec

if TYPE_CHECKING:
    from pathlib import Path


def _should_include(path: Path, base_path: Path, include_patterns: set[str]) -> bool:
    """Return ``True`` if ``path`` matches any of ``include_patterns``.

    Parameters
    ----------
    path : Path
        The absolute path of the file or directory to check.

    base_path : Path
        The base directory from which the relative path is calculated.

    include_patterns : set[str]
        A set of patterns to check against the relative path.

    Returns
    -------
    bool
        ``True`` if the path is a directory inside ``base_path`` or matches any of the include patterns,
        ``False`` otherwise.

    """
    rel_path = _relative_or_none(path, base_path)
    if rel_path is None:  # outside repo → do *not* include
        return False
    if path.is_dir():  # keep directories so children are visited
        return True

    spec = PathSpec.from_lines("gitwildmatch", include_patterns)
    return spec.match_file(str(rel_path))


def _should_exclude(path: Path, base_path: Path, ignore_patterns: set[str]) -> bool:
    """Return ``True`` if ``path`` matches any of ``ignore_patterns``.

    Parameters
    ----------
    path : Path
        The absolute path of the file or directory to check.
    base_path : Path
        The base directory from which the relative path is calculated.
    ignore_patterns : set[str]
        A set of patterns to check against the relative path.

    Returns
    -------
    bool
        ``True`` if the path lies outside ``base_path`` or matches any of the ignore patterns,
        ``False`` otherwise.

    """
    rel_path = _relative_or_none(path, base_path)
    if rel_path is None:  # outside repo → already "excluded"
        return True

    spec = PathSpec.from_lines("gitwildmatch", ignore_patterns)
    return spec.match_file(str(rel_path))


def _relative_or_none(path: Path, base: Path) -> Path | None:
    """Return *path* relative to *base* or ``None`` if *path* is outside *base*.

    Parameters
    ----------
    path : Path
        The absolute path of the file or directory to check.
    base : Path
        The base directory from which the relative path is calculated.

    Returns
    -------
    Path | None
        The relative path of ``path`` to ``base``, or ``None`` if ``path`` is outside ``base``.

    """
    try:
        return path.relative_to(base)
    except ValueError:  # path is not a sub-path of base
        return None
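

# Illustrative sketch, not part of the original module: it shows the ``relative_to``
# behaviour the helper above relies on, namely that a path outside the base raises
# ``ValueError``. The helper name ``_demo_relative_or_none`` and the sample paths are
# assumptions; ``PurePosixPath`` keeps the result independent of the OS.
def _demo_relative_or_none(path: str, base: str):
    """Return ``path`` relative to ``base`` as a POSIX string, or ``None`` if outside."""
    from pathlib import PurePosixPath  # local import so the sketch runs on its own

    try:
        return PurePosixPath(path).relative_to(PurePosixPath(base)).as_posix()
    except ValueError:  # ``path`` is not located under ``base``
        return None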


================================================
FILE: src/gitingest/utils/logging_config.py
================================================
"""Logging configuration for gitingest using loguru.

This module provides structured JSON logging suitable for Kubernetes deployments
while also supporting human-readable logging for development.
"""

from __future__ import annotations

import json
import logging
import os
import sys
from typing import Any

from loguru import logger


def json_sink(message: Any) -> None:  # noqa: ANN401
    """Create JSON formatted log output.

    Parameters
    ----------
    message : Any
        The loguru message record

    """
    record = message.record

    log_entry = {
        "timestamp": record["time"].isoformat(),
        "level": record["level"].name.upper(),
        "logger": record["name"],
        "module": record["module"],
        "function": record["function"],
        "line": record["line"],
        "message": record["message"],
    }

    # Add exception info if present
    if record["exception"]:
        log_entry["exception"] = {
            "type": record["exception"].type.__name__,
            "value": str(record["exception"].value),
            "traceback": record["exception"].traceback,
        }

    # Add extra fields if present
    if record["extra"]:
        log_entry.update(record["extra"])

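    # ``default=str`` keeps non-JSON-serializable values (e.g. traceback objects) from raising ``TypeError``
    sys.stdout.write(json.dumps(log_entry, ensure_ascii=False, separators=(",", ":"), default=str) + "\n")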


def format_extra_fields(record: dict) -> str:
    """Format extra fields as JSON string.

    Parameters
    ----------
    record : dict
        The loguru record dictionary

    Returns
    -------
    str
        JSON formatted extra fields or empty string

    """
    if not record.get("extra"):
        return ""

    # Filter out loguru's internal extra fields
    filtered_extra = {k: v for k, v in record["extra"].items() if not k.startswith("_") and k not in ["name"]}

    # Handle nested extra structure - if there's an 'extra' key, use its contents
    if "extra" in filtered_extra and isinstance(filtered_extra["extra"], dict):
        filtered_extra = filtered_extra["extra"]

    if filtered_extra:
        extra_json = json.dumps(filtered_extra, ensure_ascii=False, separators=(",", ":"))
        return f" | {extra_json}"

    return ""
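

# Illustrative sketch, not part of the original module: a reduced version of the suffix
# formatting above, to show the compact JSON that ends up appended to a human-readable log
# line. The helper name ``_demo_extra_suffix`` and the sample fields are assumptions.
def _demo_extra_suffix(extra: dict) -> str:
    """Return `` | {...}`` for the public keys of ``extra``, or an empty string."""
    import json  # repeated locally so the sketch also runs on its own

    visible = {k: v for k, v in extra.items() if not k.startswith("_")}
    if not visible:
        return ""
    return " | " + json.dumps(visible, ensure_ascii=False, separators=(",", ":"))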


def extra_filter(record: dict) -> dict:
    """Filter function to add extra fields to the message.

    Parameters
    ----------
    record : dict
        The loguru record dictionary

    Returns
    -------
    dict
        Modified record with extra fields appended to message

    """
    extra_str = format_extra_fields(record)
    if extra_str:
        record["message"] = record["message"] + extra_str
    return record


class InterceptHandler(logging.Handler):
    """Intercept standard library logging and redirect to loguru."""

    def emit(self, record: logging.LogRecord) -> None:
        """Emit a record to loguru."""
        # Get corresponding loguru level
        try:
            level = logger.level(record.levelname).name
        except ValueError:
            level = record.levelno

        # Find the caller from which the logged message originated
        frame, depth = logging.currentframe(), 2
        while frame and frame.f_code.co_filename == logging.__file__:
            frame = frame.f_back
            depth += 1

        logger.opt(depth=depth, exception=record.exc_info).log(
            level,
            record.getMessage(),
        )


def configure_logging() -> None:
    """Configure loguru for the application.

    Sets up JSON logging for production/Kubernetes environments
    or human-readable logging for development.
    Intercepts all standard library logging including uvicorn.
    """
    # Remove default handler
    logger.remove()

    # Check if we're in Kubernetes or production environment
    is_k8s = os.getenv("KUBERNETES_SERVICE_HOST") is not None
    log_format = os.getenv("LOG_FORMAT", "json" if is_k8s else "human")
    log_level = os.getenv("LOG_LEVEL", "INFO")

    if log_format.lower() == "json":
        # JSON format for structured logging (Kubernetes/production)
        logger.add(
            json_sink,
            level=log_level,
            enqueue=True,  # Async logging for better performance
            diagnose=False,  # Don't include variable values in exceptions (security)
            backtrace=True,  # Include full traceback
            serialize=True,  # Ensure proper serialization
        )
    else:
        # Human-readable format for development
        logger_format = (
            "<green>{time:YYYY-MM-DD HH:mm:ss.SSS}</green> | "
            "<level>{level: <8}</level> | "
            "<cyan>{name}</cyan>:<cyan>{function}</cyan>:<cyan>{line}</cyan> | "
            "{message}"
        )
        logger.add(
            sys.stderr,
            format=logger_format,
            filter=extra_filter,
            level=log_level,
            enqueue=True,
            diagnose=True,  # Include variable values in development
            backtrace=True,
        )

    # Intercept all standard library logging
    logging.basicConfig(handlers=[InterceptHandler()], level=0, force=True)

    # Intercept specific loggers that might bypass basicConfig
    for name in logging.root.manager.loggerDict:  # pylint: disable=no-member
        logging.getLogger(name).handlers = []
        logging.getLogger(name).propagate = True


def get_logger(name: str | None = None) -> logger.__class__:
    """Get a configured logger instance.

    Parameters
    ----------
    name : str | None, optional
        Logger name to bind; if omitted, the shared loguru logger is returned unchanged

    Returns
    -------
    logger.__class__
        Configured logger instance

    """
    if name:
        return logger.bind(name=name)
    return logger


# Initialize logging when module is imported
configure_logging()


================================================
FILE: src/gitingest/utils/notebook.py
================================================
"""Utilities for processing Jupyter notebooks."""

from __future__ import annotations

import json
from itertools import chain
from typing import TYPE_CHECKING, Any

from gitingest.utils.exceptions import InvalidNotebookError
from gitingest.utils.logging_config import get_logger

if TYPE_CHECKING:
    from pathlib import Path

# Initialize logger for this module
logger = get_logger(__name__)


def process_notebook(file: Path, *, include_output: bool = True) -> str:
    """Process a Jupyter notebook file and return an executable Python script as a string.

    Parameters
    ----------
    file : Path
        The path to the Jupyter notebook file.
    include_output : bool
        Whether to include cell outputs in the generated script (default: ``True``).

    Returns
    -------
    str
        The executable Python script as a string.

    Raises
    ------
    InvalidNotebookError
        If the notebook file is invalid or cannot be processed.

    """
    try:
        with file.open(encoding="utf-8") as f:
            notebook: dict[str, Any] = json.load(f)
    except json.JSONDecodeError as exc:
        msg = f"Invalid JSON in notebook: {file}"
        raise InvalidNotebookError(msg) from exc

    # Check if the notebook contains worksheets
    worksheets = notebook.get("worksheets")
    if worksheets:
        logger.warning(
            "Worksheets are deprecated as of IPEP-17. Consider updating the notebook. "
            "(See: https://github.com/jupyter/nbformat and "
            "https://github.com/ipython/ipython/wiki/IPEP-17:-Notebook-Format-4#remove-multiple-worksheets "
            "for more information.)",
        )

        if len(worksheets) > 1:
            logger.warning(
                "Multiple worksheets detected. Combining all worksheets into a single script.",
            )

        cells = list(chain.from_iterable(ws["cells"] for ws in worksheets))

    else:
        cells = notebook["cells"]

    result = ["# Jupyter notebook converted to Python script."]

    for cell in cells:
        cell_str = _process_cell(cell, include_output=include_output)
        if cell_str:
            result.append(cell_str)

    return "\n\n".join(result) + "\n"


def _process_cell(cell: dict[str, Any], *, include_output: bool) -> str | None:
    """Process a Jupyter notebook cell and return the cell content as a string.

    Parameters
    ----------
    cell : dict[str, Any]
        The cell dictionary from a Jupyter notebook.
    include_output : bool
        Whether to include cell outputs in the generated script.

    Returns
    -------
    str | None
        The cell content as a string, or ``None`` if the cell is empty.

    Raises
    ------
    ValueError
        If an unexpected cell type is encountered.

    """
    cell_type = cell["cell_type"]

    # Validate cell type and handle unexpected types
    if cell_type not in ("markdown", "code", "raw"):
        msg = f"Unknown cell type: {cell_type}"
        raise ValueError(msg)

    cell_str = "".join(cell["source"])

    # Skip empty cells
    if not cell_str:
        return None

    # Convert Markdown and raw cells to multi-line comments
    if cell_type in ("markdown", "raw"):
        return f'"""\n{cell_str}\n"""'

    # Add cell output as comments
    outputs = cell.get("outputs")
    if include_output and outputs:
        # Include cell outputs as comments
        raw_lines: list[str] = []
        for output in outputs:
            raw_lines += _extract_output(output)

        cell_str += "\n# Output:\n#   " + "\n#   ".join(raw_lines)

    return cell_str


def _extract_output(output: dict[str, Any]) -> list[str]:
    """Extract the output from a Jupyter notebook cell.

    Parameters
    ----------
    output : dict[str, Any]
        The output dictionary from a Jupyter notebook cell.

    Returns
    -------
    list[str]
        The output as a list of strings.

    Raises
    ------
    ValueError
        If an unknown output type is encountered.

    """
    output_type = output["output_type"]

    if output_type == "stream":
        return output["text"]

    if output_type in ("execute_result", "display_data"):
        return output["data"]["text/plain"]

    if output_type == "error":
        return [f"Error: {output['ename']}: {output['evalue']}"]

    msg = f"Unknown output type: {output_type}"
    raise ValueError(msg)
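
The conversion rules above can be tried on an in-memory notebook. The following is a minimal sketch, not the package's own code: `render_cell` is a hypothetical, simplified stand-in for `_process_cell` that handles only `stream` outputs and always includes them, and the three sample cells are invented for illustration.

```python
from __future__ import annotations

from typing import Any


def render_cell(cell: dict[str, Any]) -> str | None:
    """Hypothetical, simplified stand-in for ``_process_cell`` (stream outputs only)."""
    source = "".join(cell["source"])
    if not source:
        return None  # empty cells are skipped

    if cell["cell_type"] in ("markdown", "raw"):
        # Prose cells become a bare triple-quoted string in the script
        return f'"""\n{source}\n"""'

    lines: list[str] = []
    for output in cell.get("outputs", []):
        if output["output_type"] == "stream":
            lines += output["text"]
    if lines:
        source += "\n# Output:\n#   " + "\n#   ".join(lines)
    return source


# Invented sample cells: one Markdown cell, one code cell with output, one empty cell
cells = [
    {"cell_type": "markdown", "source": ["# Title"]},
    {
        "cell_type": "code",
        "source": ["print('hi')"],
        "outputs": [{"output_type": "stream", "text": ["hi\n"]}],
    },
    {"cell_type": "code", "source": []},
]

parts = ["# Jupyter notebook converted to Python script."]
parts += [text for cell in cells if (text := render_cell(cell))]
script = "\n\n".join(parts) + "\n"
print(script)
```

In this sketch the Markdown cell ends up as a string literal, the code cell keeps its source followed by a commented `# Output:` section, and the empty cell contributes nothing.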


================================================
FILE: src/gitingest/utils/os_utils.py
================================================
"""Utility functions for working with the operating system."""

from pathlib import Path


async def ensure_directory_exists_or_create(path: Path) -> None:
    """Ensure the directory exists, creating it if necessary.

    Parameters
    ----------
    path : Path
        The path to ensure exists.

    Raises
    ------
    OSError
        If the directory cannot be created.

    """
    try:
        path.mkdir(parents=True, exist_ok=True)
    except OSError as exc:
        msg = f"Failed to create directory {path}: {exc}"
        raise OSError(msg) from exc


================================================
FILE: src/gitingest/utils/pattern_utils.py
================================================
"""Pattern utilities for the Gitingest package."""

from __future__ import annotations

import re
from typing import Iterable

from gitingest.utils.ignore_patterns import DEFAULT_IGNORE_PATTERNS

_PATTERN_SPLIT_RE = re.compile(r"[,\s]+")


def process_patterns(
    exclude_patterns: str | set[str] | None = None,
    include_patterns: str | set[str] | None = None,
) -> tuple[set[str], set[str] | None]:
    """Process include and exclude patterns.

    Parameters
    ----------
    exclude_patterns : str | set[str] | None
        Exclude patterns to process.
    include_patterns : str | set[str] | None
        Include patterns to process.

    Returns
    -------
    tuple[set[str], set[str] | None]
        A tuple containing the processed ignore patterns and include patterns.

    """
    # Combine default ignore patterns + custom patterns
    ignore_patterns_set = DEFAULT_IGNORE_PATTERNS.copy()
    if exclude_patterns:
        ignore_patterns_set.update(_parse_patterns(exclude_patterns))

    # Process include patterns and override ignore patterns accordingly
    if include_patterns:
        parsed_include = _parse_patterns(include_patterns)
        # Override ignore patterns with include patterns
        ignore_patterns_set = set(ignore_patterns_set) - set(parsed_include)
    else:
        parsed_include = None

    return ignore_patterns_set, parsed_include


def _parse_patterns(patterns: str | Iterable[str]) -> set[str]:
    """Normalize a collection of file or directory patterns.

    Parameters
    ----------
    patterns : str | Iterable[str]
        One pattern string or an iterable of pattern strings. Each pattern may contain multiple comma- or
        whitespace-separated sub-patterns, e.g. "src/*, tests *.md".

    Returns
    -------
    set[str]
        Normalized patterns with Windows back-slashes converted to forward-slashes and duplicates removed.

    """
    # Treat a lone string as the iterable [string]
    if isinstance(patterns, str):
        patterns = [patterns]

    # Flatten, split on commas/whitespace, strip empties, normalise slashes
    return {
        part.replace("\\", "/")
        for pat in patterns
        for part in _PATTERN_SPLIT_RE.split(pat.strip())
        if part  # discard empty tokens
    }
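
To illustrate the splitting and the include-over-ignore override described above, here is a small standalone sketch. It reuses the same regular expression as `_PATTERN_SPLIT_RE`, but `parse_patterns` is a local copy written for this example and `DEFAULT_IGNORE` is a hypothetical three-entry stand-in for `DEFAULT_IGNORE_PATTERNS`, whose real contents are not shown here.

```python
from __future__ import annotations

import re
from typing import Iterable

# Same expression as _PATTERN_SPLIT_RE: split on runs of commas and/or whitespace
PATTERN_SPLIT_RE = re.compile(r"[,\s]+")

# Hypothetical stand-in for DEFAULT_IGNORE_PATTERNS (illustrative values only)
DEFAULT_IGNORE = {"*.pyc", "node_modules", "*.md"}


def parse_patterns(patterns: str | Iterable[str]) -> set[str]:
    """Local copy of the normalisation logic, for illustration only."""
    if isinstance(patterns, str):
        patterns = [patterns]
    return {
        part.replace("\\", "/")
        for pat in patterns
        for part in PATTERN_SPLIT_RE.split(pat.strip())
        if part
    }


include = parse_patterns("src\\*.py, *.md")
ignore = DEFAULT_IGNORE - include

print(sorted(include))
print(sorted(ignore))
```

As far as the code above shows, the override is a plain set difference, so an include pattern only cancels an ignore pattern that is spelled identically after normalisation. In this sketch `*.md` is removed from the ignore set, while `src/*.py` cancels nothing because no ignore entry matches that exact string.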


================================================
FILE: src/gitingest/utils/query_parser_utils.py
================================================
"""Utility functions for parsing and validating query parameters."""

from __future__ import annotations

import string
from typing import TYPE_CHECKING, cast
from urllib.parse import ParseResult, unquote, urlparse

from gitingest.utils.compat_typing import StrEnum
from gitingest.utils.git_utils import _resolve_ref_to_sha, check_repo_exists
from gitingest.utils.logging_config import get_logger

if TYPE_CHECKING:
    from gitingest.schemas import IngestionQuery

# Initialize logger for this module
logger = get_logger(__name__)

HEX_DIGITS: set[str] = set(string.hexdigits)

KNOWN_GIT_HOSTS: list[str] = [
    "github.com",
    "gitlab.com",
    "bitbucket.org",
    "gitea.com",
    "codeberg.org",
    "gist.github.com",
]


class PathKind(StrEnum):
    """Path kind enum."""

    TREE = "tree"
    BLOB = "blob"
    ISSUES = "issues"
    PULL = "pull"


async def _fallback_to_root(query: IngestionQuery, token: str | None, warn_msg: str | None = None) -> IngestionQuery:
    """Fall back to the root of the repository if no extra path parts are provided.

    The query is pinned to the commit that ``HEAD`` currently resolves to.

    Parameters
    ----------
    query : IngestionQuery
        The query to update in place.
    token : str | None
        The token to use to access the repository.
    warn_msg : str | None
        Optional warning message to log after falling back.

    Returns
    -------
    IngestionQuery
        The same query, with ``commit`` set to the SHA resolved for ``HEAD``.

    """
    url = cast("str", query.url)
    query.commit = await _resolve_ref_to_sha(url, pattern="HEAD", token=token)
    if warn_msg:
        logger.warning(warn_msg)
    return query


async def _normalise_source(raw: str, token: str | None) -> ParseResult:
    """Return a fully-qualified ParseResult or raise.

    Parameters
    ----------
    raw : str
        The raw URL to parse.
    token : str | None
        The token to use to access the repository.

    Returns
    -------
    ParseResult
        The parsed URL.

    """
    raw = unquote(raw)
    parsed = urlparse(raw)

    if parsed.scheme:
        _validate_url_scheme(parsed.scheme)
        _validate_host(parsed.netloc)
        return parsed

    # no scheme ('host/user/repo' or 'user/repo')
    host = raw.split("/", 1)[0].lower()
    if "." in host:
        _validate_host(host)
        return urlparse(f"https://{raw}")

    # "user/repo" slug
    host = await _try_domains_for_user_and_repo(*_get_user_and_repo_from_path(raw), token=token)

    return urlparse(f"https://{host}/{raw}")


async def _try_domains_for_user_and_repo(user_name: str, repo_name: str, token: str | None = None) -> str:
    """Attempt to find a valid repository host for the given ``user_name`` and ``repo_name``.

    Parameters
    ----------
    user_name : str
        The username or owner of the repository.
    repo_name : str
        The name of the repository.
    token : str | None
        GitHub personal access token (PAT) for accessing private repositories.

    Returns
    -------
    str
        The domain of the valid repository host.

    Raises
    ------
    ValueError
        If no valid repository host is found for the given ``user_name`` and ``repo_name``.

    """
    for domain in KNOWN_GIT_HOSTS:
        candidate = f"https://{domain}/{user_name}/{repo_name}"
        if await check_repo_exists(candidate, token=token if domain.startswith("github.") else None):
            return domain

    msg = f"Could not find a valid repository host for '{user_name}/{repo_name}'."
    raise ValueError(msg)


def _is_valid_git_commit_hash(commit: str) -> bool:
    """Validate if the provided string is a valid Git commit hash.

    This function checks if the commit hash is a 40-character string consisting only
    of hexadecimal digits, which is the standard format for Git commit hashes.

    Parameters
    ----------
    commit : str
        The string to validate as a Git commit hash.

    Returns
    -------
    bool
        ``True`` if the string is a valid 40-character Git commit hash, otherwise ``False``.

    """
    sha_hex_length = 40
    return len(commit) == sha_hex_length and all(c in HEX_DIGITS for c in commit)
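
A quick way to see what the check above accepts and rejects is the standalone sketch below. `is_full_sha` is a hypothetical local copy of the same test, and the sample strings are made up for the example.

```python
import string

HEX_DIGITS = set(string.hexdigits)  # includes both lowercase and uppercase a-f


def is_full_sha(commit: str) -> bool:
    """Hypothetical local copy of the 40-hex-character check, for illustration."""
    return len(commit) == 40 and all(c in HEX_DIGITS for c in commit)


print(is_full_sha("a" * 40))       # full-length, all hex digits
print(is_full_sha("A" * 40))       # uppercase hex digits are accepted too
print(is_full_sha("4e259a02fe72"))  # abbreviated hash: too short
print(is_full_sha("g" * 40))       # right length, but 'g' is not a hex digit
```

Because the length must be exactly 40, abbreviated hashes are rejected by this check even though Git itself accepts them in many commands.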


def _validate_host(host: str) -> None:
    """Validate a hostname.

    The host is accepted if it is either present in the hard-coded ``KNOWN_GIT_HOSTS`` list or if it satisfies the
    simple heuristics in ``_looks_like_git_host``, which try to recognise common self-hosted Git services (e.g. GitLab
    instances on sub-domains such as 'gitlab.example.com' or 'git.example.com').

    Parameters
    ----------
    host : str
        Hostname (case-insensitive).

    Raises
    ------
    ValueError
        If the host cannot be recognised as a probable Git hosting domain.

    """
    host = host.lower()
    if host not in KNOWN_GIT_HOSTS and not _looks_like_git_host(host):
        msg = f"Unknown domain '{host}' in URL"
        raise ValueError(msg)


def _looks_like_git_host(host: str) -> bool:
    """Check if the given host looks like a Git host.

    The current heuristic returns ``True`` when the host starts with ``git.`` (e.g. 'git.example.com'), starts with
   
SYMBOL INDEX (238 symbols across 51 files)

FILE: src/gitingest/__main__.py
  class _CLIArgs (line 22) | class _CLIArgs(TypedDict):
  function main (line 79) | def main(**cli_kwargs: Unpack[_CLIArgs]) -> None:
  function _async_main (line 117) | async def _async_main(

FILE: src/gitingest/clone.py
  function clone_repo (line 32) | async def clone_repo(config: CloneConfig, *, token: str | None = None) -...
  function _perform_post_clone_operations (line 130) | async def _perform_post_clone_operations(

FILE: src/gitingest/entrypoint.py
  function ingest_async (line 35) | async def ingest_async(
  function ingest (line 151) | def ingest(
  function _override_branch_and_tag (line 225) | def _override_branch_and_tag(query: IngestionQuery, branch: str | None, ...
  function _apply_gitignores (line 262) | def _apply_gitignores(query: IngestionQuery) -> None:
  function _clone_repo_if_remote (line 276) | async def _clone_repo_if_remote(query: IngestionQuery, *, token: str | N...
  function _handle_remove_readonly (line 307) | def _handle_remove_readonly(
  function _write_output (line 333) | async def _write_output(tree: str, content: str, target: str | None) -> ...

FILE: src/gitingest/ingestion.py
  function ingest_query (line 21) | def ingest_query(query: IngestionQuery) -> tuple[str, str, str]:
  function _process_node (line 123) | def _process_node(node: FileSystemNode, query: IngestionQuery, stats: Fi...
  function _process_symlink (line 187) | def _process_symlink(path: Path, parent_node: FileSystemNode, stats: Fil...
  function _process_file (line 216) | def _process_file(path: Path, parent_node: FileSystemNode, stats: FileSy...
  function limit_exceeded (line 276) | def limit_exceeded(stats: FileSystemStats, depth: int) -> bool:

FILE: src/gitingest/output_formatter.py
  function format_node (line 27) | def format_node(node: FileSystemNode, query: IngestionQuery) -> tuple[st...
  function _create_summary_prefix (line 65) | def _create_summary_prefix(query: IngestionQuery, *, single_file: bool =...
  function _gather_file_contents (line 105) | def _gather_file_contents(node: FileSystemNode) -> str:
  function _create_tree_structure (line 129) | def _create_tree_structure(
  function _format_token_count (line 181) | def _format_token_count(text: str) -> str | None:

FILE: src/gitingest/query_parser.py
  function parse_remote_repo (line 25) | async def parse_remote_repo(source: str, token: str | None = None) -> In...
  function parse_local_dir_path (line 122) | def parse_local_dir_path(path_str: str) -> IngestionQuery:
  function _configure_branch_or_tag (line 141) | async def _configure_branch_or_tag(

FILE: src/gitingest/schemas/cloning.py
  class CloneConfig (line 8) | class CloneConfig(BaseModel):  # pylint: disable=too-many-instance-attri...

FILE: src/gitingest/schemas/filesystem.py
  class FileSystemNodeType (line 20) | class FileSystemNodeType(Enum):
  class FileSystemStats (line 29) | class FileSystemStats:
  class FileSystemNode (line 37) | class FileSystemNode:  # pylint: disable=too-many-instance-attributes
    method sort_children (line 53) | def sort_children(self) -> None:
    method content_string (line 87) | def content_string(self) -> str:
    method content (line 107) | def content(self) -> str:  # pylint: disable=too-many-return-statements

FILE: src/gitingest/schemas/ingestion.py
  class IngestionQuery (line 14) | class IngestionQuery(BaseModel):  # pylint: disable=too-many-instance-at...
    method extract_clone_config (line 74) | def extract_clone_config(self) -> CloneConfig:

FILE: src/gitingest/utils/auth.py
  function resolve_token (line 10) | def resolve_token(token: str | None) -> str | None:

FILE: src/gitingest/utils/compat_func.py
  function readlink (line 7) | def readlink(path: Path) -> Path:
  function removesuffix (line 26) | def removesuffix(s: str, suffix: str) -> str:

FILE: src/gitingest/utils/exceptions.py
  class AsyncTimeoutError (line 4) | class AsyncTimeoutError(Exception):
  class InvalidNotebookError (line 12) | class InvalidNotebookError(Exception):
    method __init__ (line 15) | def __init__(self, message: str) -> None:
  class InvalidGitHubTokenError (line 19) | class InvalidGitHubTokenError(ValueError):
    method __init__ (line 22) | def __init__(self) -> None:

FILE: src/gitingest/utils/file_utils.py
  function _get_preferred_encodings (line 20) | def _get_preferred_encodings() -> list[str]:
  function _read_chunk (line 36) | def _read_chunk(path: Path) -> bytes | None:
  function _decodes (line 57) | def _decodes(chunk: bytes, encoding: str) -> bool:

FILE: src/gitingest/utils/git_utils.py
  function is_github_host (line 32) | def is_github_host(url: str) -> bool:
  function run_command (line 50) | async def run_command(*args: str) -> tuple[bytes, bytes]:
  function ensure_git_installed (line 86) | async def ensure_git_installed() -> None:
  function check_repo_exists (line 123) | async def check_repo_exists(url: str, token: str | None = None) -> bool:
  function _parse_github_url (line 149) | def _parse_github_url(url: str) -> tuple[str, str, str]:
  function fetch_remote_branches_or_tags (line 187) | async def fetch_remote_branches_or_tags(url: str, *, ref_type: str, toke...
  function create_git_repo (line 246) | def create_git_repo(local_path: str, url: str, token: str | None = None)...
  function create_git_auth_header (line 286) | def create_git_auth_header(token: str, url: str = "https://github.com") ...
  function create_authenticated_url (line 317) | def create_authenticated_url(url: str, token: str | None = None) -> str:
  function git_auth_context (line 357) | def git_auth_context(url: str, token: str | None = None) -> Generator[tu...
  function validate_github_token (line 381) | def validate_github_token(token: str) -> None:
  function checkout_partial_clone (line 399) | async def checkout_partial_clone(config: CloneConfig, token: str | None)...
  function resolve_commit (line 428) | async def resolve_commit(config: CloneConfig, token: str | None) -> str:
  function _resolve_ref_to_sha (line 455) | async def _resolve_ref_to_sha(url: str, pattern: str, token: str | None ...
  function _pick_commit_sha (line 499) | def _pick_commit_sha(lines: Iterable[str]) -> str | None:

FILE: src/gitingest/utils/ignore_patterns.py
  function load_ignore_patterns (line 171) | def load_ignore_patterns(root: Path, filename: str) -> set[str]:
  function _parse_ignore_file (line 200) | def _parse_ignore_file(ignore_file: Path, root: Path) -> set[str]:

FILE: src/gitingest/utils/ingestion_utils.py
  function _should_include (line 13) | def _should_include(path: Path, base_path: Path, include_patterns: set[s...
  function _should_exclude (line 43) | def _should_exclude(path: Path, base_path: Path, ignore_patterns: set[st...
  function _relative_or_none (line 69) | def _relative_or_none(path: Path, base: Path) -> Path | None:

FILE: src/gitingest/utils/logging_config.py
  function json_sink (line 18) | def json_sink(message: Any) -> None:  # noqa: ANN401
  function format_extra_fields (line 54) | def format_extra_fields(record: dict) -> str:
  function extra_filter (line 85) | def extra_filter(record: dict) -> dict:
  class InterceptHandler (line 105) | class InterceptHandler(logging.Handler):
    method emit (line 108) | def emit(self, record: logging.LogRecord) -> None:
  function configure_logging (line 128) | def configure_logging() -> None:
  function get_logger (line 180) | def get_logger(name: str | None = None) -> logger.__class__:

FILE: src/gitingest/utils/notebook.py
  function process_notebook (line 19) | def process_notebook(file: Path, *, include_output: bool = True) -> str:
  function _process_cell (line 77) | def _process_cell(cell: dict[str, Any], *, include_output: bool) -> str ...
  function _extract_output (line 128) | def _extract_output(output: dict[str, Any]) -> list[str]:

FILE: src/gitingest/utils/os_utils.py
  function ensure_directory_exists_or_create (line 6) | async def ensure_directory_exists_or_create(path: Path) -> None:

FILE: src/gitingest/utils/pattern_utils.py
  function process_patterns (line 13) | def process_patterns(
  function _parse_patterns (line 48) | def _parse_patterns(patterns: str | Iterable[str]) -> set[str]:

FILE: src/gitingest/utils/query_parser_utils.py
  class PathKind (line 31) | class PathKind(StrEnum):
  function _fallback_to_root (line 40) | async def _fallback_to_root(query: IngestionQuery, token: str | None, wa...
  function _normalise_source (line 65) | async def _normalise_source(raw: str, token: str | None) -> ParseResult:
  function _try_domains_for_user_and_repo (line 101) | async def _try_domains_for_user_and_repo(user_name: str, repo_name: str,...
  function _is_valid_git_commit_hash (line 133) | def _is_valid_git_commit_hash(commit: str) -> bool:
  function _validate_host (line 154) | def _validate_host(host: str) -> None:
  function _looks_like_git_host (line 178) | def _looks_like_git_host(host: str) -> bool:
  function _validate_url_scheme (line 199) | def _validate_url_scheme(scheme: str) -> None:
  function _get_user_and_repo_from_path (line 219) | def _get_user_and_repo_from_path(path: str) -> tuple[str, str]:

FILE: src/gitingest/utils/timeout_wrapper.py
  function async_timeout (line 14) | def async_timeout(seconds: int) -> Callable[[Callable[P, Awaitable[T]]],...

FILE: src/server/main.py
  function health_check (line 96) | async def health_check() -> dict[str, str]:
  function head_root (line 108) | async def head_root() -> HTMLResponse:
  function robots (line 124) | async def robots() -> FileResponse:
  function llm_txt (line 140) | async def llm_txt() -> FileResponse:
  function custom_swagger_ui (line 156) | async def custom_swagger_ui(request: Request) -> HTMLResponse:
  function openapi_json_get (line 178) | def openapi_json_get() -> JSONResponse:
  function openapi_json (line 195) | def openapi_json() -> JSONResponse:

FILE: src/server/metrics_server.py
  function metrics (line 23) | async def metrics() -> HTMLResponse:
  function start_metrics_server (line 41) | def start_metrics_server(host: str = "127.0.0.1", port: int = 9090) -> N...

FILE: src/server/models.py
  class PatternType (line 18) | class PatternType(str, Enum):
  class IngestRequest (line 25) | class IngestRequest(BaseModel):
    method validate_input_text (line 51) | def validate_input_text(cls, v: str) -> str:
    method validate_pattern (line 60) | def validate_pattern(cls, v: str) -> str:
  class IngestSuccessResponse (line 65) | class IngestSuccessResponse(BaseModel):
  class IngestErrorResponse (line 102) | class IngestErrorResponse(BaseModel):
  class S3Metadata (line 119) | class S3Metadata(BaseModel):
  class QueryForm (line 138) | class QueryForm(BaseModel):
    method as_form (line 163) | def as_form(

FILE: src/server/query_processor.py
  function _cleanup_repository (line 35) | def _cleanup_repository(clone_config: CloneConfig) -> None:
  function _check_s3_cache (line 46) | async def _check_s3_cache(
  function _store_digest_content (line 139) | def _store_digest_content(
  function _generate_digest_url (line 200) | def _generate_digest_url(query: IngestionQuery) -> str:
  function process_query (line 229) | async def process_query(
  function _print_query (line 345) | def _print_query(url: str, max_file_size: int, pattern_type: str, patter...
  function _print_error (line 373) | def _print_error(url: str, exc: Exception, max_file_size: int, pattern_t...
  function _print_success (line 402) | def _print_success(url: str, max_file_size: int, pattern_type: str, patt...

FILE: src/server/routers/dynamic.py
  function catch_all (line 12) | async def catch_all(request: Request, full_path: str) -> HTMLResponse:

FILE: src/server/routers/index.py
  function home (line 12) | async def home(request: Request) -> HTMLResponse:

FILE: src/server/routers/ingest.py
  function api_ingest (line 24) | async def api_ingest(
  function api_ingest_get (line 57) | async def api_ingest_get(
  function download_ingest (line 98) | async def download_ingest(

FILE: src/server/routers_utils.py
  function _perform_ingestion (line 20) | async def _perform_ingestion(

FILE: src/server/s3_utils.py
  class S3UploadError (line 30) | class S3UploadError(Exception):
  function is_s3_enabled (line 34) | def is_s3_enabled() -> bool:
  function get_s3_config (line 39) | def get_s3_config() -> dict[str, str | None]:
  function get_s3_bucket_name (line 50) | def get_s3_bucket_name() -> str:
  function get_s3_alias_host (line 55) | def get_s3_alias_host() -> str | None:
  function generate_s3_file_path (line 60) | def generate_s3_file_path(
  function create_s3_client (line 133) | def create_s3_client() -> BaseClient:
  function upload_to_s3 (line 149) | def upload_to_s3(content: str, s3_file_path: str, ingest_id: UUID) -> str:
  function upload_metadata_to_s3 (line 246) | def upload_metadata_to_s3(metadata: S3Metadata, s3_file_path: str, inges...
  function get_metadata_from_s3 (line 345) | def get_metadata_from_s3(s3_file_path: str) -> S3Metadata | None:
  function _build_s3_url (line 389) | def _build_s3_url(key: str) -> str:
  function _check_object_tags (line 405) | def _check_object_tags(s3_client: BaseClient, bucket_name: str, key: str...
  function check_s3_object_exists (line 415) | def check_s3_object_exists(s3_file_path: str) -> bool:
  function get_s3_url_for_ingest_id (line 486) | def get_s3_url_for_ingest_id(ingest_id: UUID) -> str | None:

FILE: src/server/server_config.py
  function get_version_info (line 31) | def get_version_info() -> dict[str, str]:

FILE: src/server/server_utils.py
  function rate_limit_exception_handler (line 18) | async def rate_limit_exception_handler(request: Request, exc: Exception)...
  class Colors (line 47) | class Colors:

FILE: src/static/js/git.js
  function waitForStars (line 1) | function waitForStars() {

FILE: src/static/js/git_form.js
  function changePattern (line 2) | function changePattern() {
  function toggleAccessSettings (line 24) | function toggleAccessSettings() {

FILE: src/static/js/index.js
  function submitExample (line 1) | function submitExample(repoName) {

FILE: src/static/js/navbar.js
  function formatStarCount (line 2) | function formatStarCount(count) {
  function fetchGitHubStars (line 8) | async function fetchGitHubStars() {

FILE: src/static/js/posthog.js
  function g (line 9) | function g(t, e) {

FILE: src/static/js/utils.js
  function getFileName (line 1) | function getFileName(element) {
  function toggleFile (line 30) | function toggleFile(element) {
  function copyText (line 58) | function copyText(className) {
  function showLoading (line 105) | function showLoading() {
  function showResults (line 110) | function showResults() {
  function showError (line 115) | function showError(msg) {
  function collectFormData (line 125) | function collectFormData(form) {
  function setButtonLoadingState (line 143) | function setButtonLoadingState(submitButton, isLoading) {
  function handleSuccessfulResponse (line 171) | function handleSuccessfulResponse(data) {
  function handleSubmit (line 203) | function handleSubmit(event, showLoadingSpinner = false) {
  function copyFullDigest (line 277) | function copyFullDigest() {
  function downloadFullDigest (line 301) | function downloadFullDigest() {
  function logSliderToSize (line 345) | function logSliderToSize(position) {
  function initializeSlider (line 355) | function initializeSlider() {
  function formatSize (line 378) | function formatSize(sizeInKB) {
  function setupGlobalEnterHandler (line 387) | function setupGlobalEnterHandler() {

FILE: tests/conftest.py
  function get_ensure_git_installed_call_count (line 30) | def get_ensure_git_installed_call_count() -> int:
  function sample_query (line 50) | def sample_query() -> IngestionQuery:
  function temp_directory (line 74) | def temp_directory(tmp_path: Path) -> Path:
  function write_notebook (line 136) | def write_notebook(tmp_path: Path) -> WriteNotebookFunc:
  function stub_resolve_sha (line 162) | def stub_resolve_sha(mocker: MockerFixture) -> dict[str, AsyncMock]:
  function stub_branches (line 182) | def stub_branches(mocker: MockerFixture) -> Callable[[list[str]], None]:
  function repo_exists_true (line 205) | def repo_exists_true(mocker: MockerFixture) -> AsyncMock:
  function run_command_mock (line 211) | def run_command_mock(mocker: MockerFixture) -> AsyncMock:
  function gitpython_mocks (line 227) | def gitpython_mocks(mocker: MockerFixture) -> dict[str, MagicMock]:
  function _setup_gitpython_mocks (line 232) | def _setup_gitpython_mocks(mocker: MockerFixture) -> dict[str, MagicMock]:
  function _fake_run_command (line 275) | async def _fake_run_command(*args: str) -> tuple[bytes, bytes]:

FILE: tests/query_parser/test_git_host_agnostic.py
  function test_parse_query_without_host (line 31) | async def test_parse_query_without_host(

FILE: tests/query_parser/test_query_parser.py
  function test_parse_url_valid_https (line 42) | async def test_parse_url_valid_https(url: str, stub_resolve_sha: dict[st...
  function test_parse_url_valid_http (line 51) | async def test_parse_url_valid_http(url: str, stub_resolve_sha: dict[str...
  function test_parse_url_invalid (line 57) | async def test_parse_url_invalid(stub_resolve_sha: dict[str, AsyncMock])...
  function test_parse_query_basic (line 74) | async def test_parse_query_basic(url: str, stub_resolve_sha: dict[str, A...
  function test_parse_query_mixed_case (line 90) | async def test_parse_query_mixed_case(stub_resolve_sha: dict[str, AsyncM...
  function test_parse_url_with_subpaths (line 106) | async def test_parse_url_with_subpaths(
  function test_parse_url_invalid_repo_structure (line 129) | async def test_parse_url_invalid_repo_structure(stub_resolve_sha: dict[s...
  function test_parse_local_dir_path_local_path (line 144) | async def test_parse_local_dir_path_local_path() -> None:
  function test_parse_local_dir_path_relative_path (line 160) | async def test_parse_local_dir_path_relative_path() -> None:
  function test_parse_remote_repo_empty_source (line 176) | async def test_parse_remote_repo_empty_source(stub_resolve_sha: dict[str...
  function test_parse_url_branch_and_commit_distinction (line 199) | async def test_parse_url_branch_and_commit_distinction(
  function test_parse_local_dir_path_uuid_uniqueness (line 222) | async def test_parse_local_dir_path_uuid_uniqueness() -> None:
  function test_parse_url_with_query_and_fragment (line 237) | async def test_parse_url_with_query_and_fragment(stub_resolve_sha: dict[...
  function test_parse_url_unsupported_host (line 254) | async def test_parse_url_unsupported_host(stub_resolve_sha: dict[str, As...
  function test_parse_query_with_branch (line 270) | async def test_parse_query_with_branch() -> None:
  function test_parse_repo_source_with_various_url_patterns (line 304) | async def test_parse_repo_source_with_various_url_patterns(
  function _assert_basic_repo_fields (line 330) | async def _assert_basic_repo_fields(url: str, sha_mock: AsyncMock) -> In...

FILE: tests/server/test_flow_integration.py
  function test_client (line 21) | def test_client() -> Generator[TestClient, None, None]:
  function mock_static_files (line 29) | def mock_static_files(mocker: MockerFixture) -> None:
  function cleanup_tmp_dir (line 37) | def cleanup_tmp_dir() -> Generator[None, None, None]:
  function test_remote_repository_analysis (line 49) | async def test_remote_repository_analysis(request: pytest.FixtureRequest...
  function test_invalid_repository_url (line 74) | async def test_invalid_repository_url(request: pytest.FixtureRequest) ->...
  function test_large_repository (line 95) | async def test_large_repository(request: pytest.FixtureRequest) -> None:
  function test_concurrent_requests (line 119) | async def test_concurrent_requests(request: pytest.FixtureRequest) -> None:
  function test_large_file_handling (line 148) | async def test_large_file_handling(request: pytest.FixtureRequest) -> None:
  function test_repository_with_patterns (line 171) | async def test_repository_with_patterns(request: pytest.FixtureRequest) ...

FILE: tests/test_cli.py
  function test_cli_writes_file (line 37) | def test_cli_writes_file(
  function test_cli_with_stdout_output (line 62) | def test_cli_with_stdout_output() -> None:
  function _invoke_isolated_cli_runner (line 93) | def _invoke_isolated_cli_runner(args: list[str]) -> Result:

FILE: tests/test_clone.py
  function test_clone_with_commit (line 34) | async def test_clone_with_commit(repo_exists_true: AsyncMock, gitpython_...
  function test_clone_nonexistent_repository (line 70) | async def test_clone_nonexistent_repository(repo_exists_true: AsyncMock)...
  function test_check_repo_exists (line 100) | async def test_check_repo_exists(
  function test_clone_without_commit (line 121) | async def test_clone_without_commit(repo_exists_true: AsyncMock, gitpyth...
  function test_clone_creates_parent_directory (line 149) | async def test_clone_creates_parent_directory(tmp_path: Path, gitpython_...
  function test_clone_with_specific_subpath (line 170) | async def test_clone_with_specific_subpath(gitpython_mocks: dict) -> None:
  function test_clone_with_include_submodules (line 192) | async def test_clone_with_include_submodules(gitpython_mocks: dict) -> N...
  function test_check_repo_exists_with_auth_token (line 209) | async def test_check_repo_exists_with_auth_token(mocker: MockerFixture) ...

FILE: tests/test_git_utils.py
  function test_validate_github_token_valid (line 35) | def test_validate_github_token_valid(token: str) -> None:
  function test_validate_github_token_invalid (line 52) | def test_validate_github_token_invalid(token: str) -> None:
  function test_create_git_repo (line 81) | def test_create_git_repo(
  function test_create_git_auth_header (line 113) | def test_create_git_auth_header(token: str) -> None:
  function test_create_git_repo_helper_calls (line 129) | def test_create_git_repo_helper_calls(
  function test_is_github_host (line 178) | def test_is_github_host(url: str, *, expected: bool) -> None:
  function test_create_git_auth_header_with_ghe_url (line 195) | def test_create_git_auth_header_with_ghe_url(token: str, url: str, expec...
  function test_create_git_repo_with_ghe_urls (line 234) | def test_create_git_repo_with_ghe_urls(
  function test_create_git_repo_ignores_non_github_urls (line 264) | def test_create_git_repo_ignores_non_github_urls(

FILE: tests/test_gitignore_feature.py
  function repo_fixture (line 12) | def repo_fixture(tmp_path: Path) -> Path:
  function test_load_gitignore_patterns (line 35) | def test_load_gitignore_patterns(tmp_path: Path) -> None:
  function test_ingest_with_gitignore (line 52) | async def test_ingest_with_gitignore(repo_path: Path) -> None:

FILE: tests/test_ingestion.py
  function test_run_ingest_query (line 22) | def test_run_ingest_query(temp_directory: Path, sample_query: IngestionQ...
  class PatternScenario (line 55) | class PatternScenario(TypedDict):
  function test_include_ignore_patterns (line 201) | def test_include_ignore_patterns(

FILE: tests/test_notebook_utils.py
  function test_process_notebook_all_cells (line 14) | def test_process_notebook_all_cells(write_notebook: WriteNotebookFunc) -...
  function test_process_notebook_with_worksheets (line 48) | def test_process_notebook_with_worksheets(write_notebook: WriteNotebookF...
  function test_process_notebook_multiple_worksheets (line 80) | def test_process_notebook_multiple_worksheets(write_notebook: WriteNoteb...
  function test_process_notebook_code_only (line 118) | def test_process_notebook_code_only(write_notebook: WriteNotebookFunc) -...
  function test_process_notebook_markdown_only (line 139) | def test_process_notebook_markdown_only(write_notebook: WriteNotebookFun...
  function test_process_notebook_raw_only (line 161) | def test_process_notebook_raw_only(write_notebook: WriteNotebookFunc) ->...
  function test_process_notebook_empty_cells (line 183) | def test_process_notebook_empty_cells(write_notebook: WriteNotebookFunc)...
  function test_process_notebook_invalid_cell_type (line 206) | def test_process_notebook_invalid_cell_type(write_notebook: WriteNoteboo...
  function test_process_notebook_with_output (line 225) | def test_process_notebook_with_output(write_notebook: WriteNotebookFunc)...

FILE: tests/test_pattern_utils.py
  function test_process_patterns_empty_patterns (line 7) | def test_process_patterns_empty_patterns() -> None:
  function test_parse_patterns_valid (line 20) | def test_parse_patterns_valid() -> None:
  function test_process_patterns_include_and_ignore_overlap (line 33) | def test_process_patterns_include_and_ignore_overlap() -> None:

FILE: tests/test_summary.py
  function test_ingest_summary (line 28) | def test_ingest_summary(path_type: str, path: str, ref_type: str, ref: s...
  function _calculate_expected_lines (line 86) | def _calculate_expected_lines(ref_type: str, *, is_main_branch: bool) ->...

Condensed preview: 110 files, each showing path, character count, and a content snippet (full structured content: 427K chars).
[
  {
    "path": ".docker/minio/setup.sh",
    "chars": 1215,
    "preview": "#!/bin/sh\n\n# Simple script to set up MinIO bucket and user\n# Based on example from MinIO issues\n\n# Format bucket name to"
  },
  {
    "path": ".dockerignore",
    "chars": 1009,
    "preview": "# -------------------------------------------------\n# Base: reuse patterns from .gitignore\n# ---------------------------"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/bug_report.yml",
    "chars": 4417,
    "preview": "name: Bug report 🐞\ndescription: Report a bug or internal server error when using Gitingest\ntitle: \"(bug): \"\nlabels: [\"bu"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/feature_request.yml",
    "chars": 2955,
    "preview": "name: Feature request 💡\ndescription: Suggest a new feature or improvement for Gitingest\ntitle: \"(feat): \"\nlabels: [\"enha"
  },
  {
    "path": ".github/workflows/ci.yml",
    "chars": 1831,
    "preview": "name: CI\n\non:\n  push:\n    branches: [main]\n  pull_request:\n    branches: [main]\n\nconcurrency:\n  group: ${{ github.workfl"
  },
  {
    "path": ".github/workflows/codeql.yml",
    "chars": 2925,
    "preview": "# For most projects, this workflow file will not need changing; you simply need\n# to commit it to your repository.\n#\n# Y"
  },
  {
    "path": ".github/workflows/dependency-review.yml",
    "chars": 1001,
    "preview": "# Dependency Review Action\n#\n# This Action will scan dependency manifest files that change as part of a Pull Request,\n# "
  },
  {
    "path": ".github/workflows/deploy-pr.yml",
    "chars": 5722,
    "preview": "name: Manage PR Temp Envs\n'on':\n  pull_request:\n    types:\n      - labeled\n      - unlabeled\n      - closed\n\npermissions"
  },
  {
    "path": ".github/workflows/docker-build.ecr.yml",
    "chars": 4235,
    "preview": "name: Build & Push Container\n\non:\n  push:\n    branches:\n      - 'main'\n    tags:\n      - '*'\n  merge_group:\n  pull_reque"
  },
  {
    "path": ".github/workflows/docker-build.ghcr.yml",
    "chars": 5118,
    "preview": "name: Build & Push Container\n\non:\n  push:\n    branches:\n      - 'main'\n    tags:\n      - '*'\n  merge_group:\n  pull_reque"
  },
  {
    "path": ".github/workflows/pr-title-check.yml",
    "chars": 657,
    "preview": "name: PR Conventional Commit Validation\n\non:\n  pull_request:\n    types: [opened, synchronize, reopened, edited]\n\njobs:\n "
  },
  {
    "path": ".github/workflows/publish_to_pypi.yml",
    "chars": 1847,
    "preview": "name: Publish to PyPI\n\non:\n  release:\n    types: [created] # Run when you click \"Publish release\"\n  workflow_dispatch: #"
  },
  {
    "path": ".github/workflows/rebase-needed.yml",
    "chars": 841,
    "preview": "name: PR Needs Rebase\n\non:\n  workflow_dispatch: {}\n  schedule:\n    - cron: '0 * * * *'\n\npermissions:\n  pull-requests: wr"
  },
  {
    "path": ".github/workflows/release-please.yml",
    "chars": 728,
    "preview": "name: release-please\non:\n  push:\n    branches:\n      - main\n\npermissions:\n  contents: write\n  pull-requests: write\n\njobs"
  },
  {
    "path": ".github/workflows/scorecard.yml",
    "chars": 1289,
    "preview": "name: OSSF Scorecard\non:\n  branch_protection_rule:\n  schedule:\n    - cron: '33 11 * * 2'  # Every Tuesday at 11:33 AM UT"
  },
  {
    "path": ".github/workflows/stale.yml",
    "chars": 1271,
    "preview": "name: \"Close stale issues and PRs\"\n\non:\n  schedule:\n    - cron: \"0 6 * * *\"\n  workflow_dispatch: {}\n\npermissions:\n  issu"
  },
  {
    "path": ".gitignore",
    "chars": 470,
    "preview": "# Operating-system\n.DS_Store\nThumbs.db\n\n# Editor / IDE settings\n.vscode/\n!.vscode/launch.json\n.idea/\n*.swp\n\n# Python vir"
  },
  {
    "path": ".pre-commit-config.yaml",
    "chars": 5111,
    "preview": "repos:\n  - repo: https://github.com/pre-commit/pre-commit-hooks\n    rev: v5.0.0\n    hooks:\n      - id: check-added-large"
  },
  {
    "path": ".release-please-manifest.json",
    "chars": 14,
    "preview": "{\".\":\"0.3.1\"}\n"
  },
  {
    "path": ".vscode/launch.json",
    "chars": 265,
    "preview": "{\n    \"configurations\": [\n        {\n            \"name\": \"Python Debugger: Module\",\n            \"type\": \"debugpy\",\n      "
  },
  {
    "path": "CHANGELOG.md",
    "chars": 7107,
    "preview": "# Changelog\n\n## [0.3.1](https://github.com/coderamp-labs/gitingest/compare/v0.3.0...v0.3.1) (2025-07-31)\n\n\n### Bug Fixes"
  },
  {
    "path": "CODE_OF_CONDUCT.md",
    "chars": 5206,
    "preview": "# Contributor Covenant Code of Conduct\n\n## Our Pledge\n\nWe as members, contributors, and leaders pledge to make participa"
  },
  {
    "path": "CONTRIBUTING.md",
    "chars": 2346,
    "preview": "# Contributing to Gitingest\n\nThanks for your interest in contributing to **Gitingest** 🚀 Our goal is to keep the codebas"
  },
  {
    "path": "Dockerfile",
    "chars": 1475,
    "preview": "# Stage 1: Install Python dependencies\nFROM python:3.13.5-slim@sha256:4c2cf9917bd1cbacc5e9b07320025bdb7cdf2df7b0ceaccb55"
  },
  {
    "path": "LICENSE",
    "chars": 1072,
    "preview": "MIT License\n\nCopyright (c) 2024 Romain Courtois\n\nPermission is hereby granted, free of charge, to any person obtaining a"
  },
  {
    "path": "README.md",
    "chars": 14460,
    "preview": "# Gitingest\n\n[![Screenshot of Gitingest front page](https://raw.githubusercontent.com/coderamp-labs/gitingest/refs/heads"
  },
  {
    "path": "SECURITY.md",
    "chars": 274,
    "preview": "# Security Policy\n\n## Reporting a Vulnerability\n\nIf you have discovered a vulnerability inside the project, report it pr"
  },
  {
    "path": "compose.yml",
    "chars": 3777,
    "preview": "x-base-environment: &base-environment\n  # Python Configuration\n  PYTHONUNBUFFERED: \"1\"\n  PYTHONDONTWRITEBYTECODE: \"1\"\n  "
  },
  {
    "path": "eslint.config.cjs",
    "chars": 2366,
    "preview": "const js = require('@eslint/js');\nconst globals = require('globals');\nconst importPlugin = require('eslint-plugin-import"
  },
  {
    "path": "pyproject.toml",
    "chars": 3572,
    "preview": "[project]\nname = \"gitingest\"\nversion = \"0.3.1\"\ndescription=\"CLI tool to analyze and create text dumps of codebases for L"
  },
  {
    "path": "release-please-config.json",
    "chars": 209,
    "preview": "{\n  \"$schema\": \"https://raw.githubusercontent.com/googleapis/release-please/main/schemas/config.json\",\n  \"packages\": {\n "
  },
  {
    "path": "renovate.json",
    "chars": 114,
    "preview": "{\n  \"$schema\": \"https://docs.renovatebot.com/renovate-schema.json\",\n  \"extends\": [\n    \"config:recommended\"\n  ]\n}\n"
  },
  {
    "path": "requirements-dev.txt",
    "chars": 95,
    "preview": "-r requirements.txt\neval-type-backport\npre-commit\npytest\npytest-asyncio\npytest-cov\npytest-mock\n"
  },
  {
    "path": "requirements.txt",
    "chars": 461,
    "preview": "boto3>=1.28.0  # AWS SDK for S3 support\nclick>=8.0.0\nfastapi[standard]>=0.109.1  # Vulnerable to https://osv.dev/vulnera"
  },
  {
    "path": "src/gitingest/__init__.py",
    "chars": 162,
    "preview": "\"\"\"Gitingest: A package for ingesting data from Git repositories.\"\"\"\n\nfrom gitingest.entrypoint import ingest, ingest_as"
  },
  {
    "path": "src/gitingest/__main__.py",
    "chars": 6600,
    "preview": "\"\"\"Command-line interface (CLI) for Gitingest.\"\"\"\n\n# pylint: disable=no-value-for-parameter\nfrom __future__ import annot"
  },
  {
    "path": "src/gitingest/clone.py",
    "chars": 6233,
    "preview": "\"\"\"Module containing functions for cloning a Git repository to a local path.\"\"\"\n\nfrom __future__ import annotations\n\nfro"
  },
  {
    "path": "src/gitingest/config.py",
    "chars": 497,
    "preview": "\"\"\"Configuration file for the project.\"\"\"\n\nimport tempfile\nfrom pathlib import Path\n\nMAX_FILE_SIZE = 10 * 1024 * 1024  #"
  },
  {
    "path": "src/gitingest/entrypoint.py",
    "chars": 13064,
    "preview": "\"\"\"Main entry point for ingesting a source and processing its contents.\"\"\"\n\nfrom __future__ import annotations\n\nimport a"
  },
  {
    "path": "src/gitingest/ingestion.py",
    "chars": 10789,
    "preview": "\"\"\"Functions to ingest and analyze a codebase directory or single file.\"\"\"\n\nfrom __future__ import annotations\n\nfrom pat"
  },
  {
    "path": "src/gitingest/output_formatter.py",
    "chars": 6917,
    "preview": "\"\"\"Functions to ingest and analyze a codebase directory or single file.\"\"\"\n\nfrom __future__ import annotations\n\nimport s"
  },
  {
    "path": "src/gitingest/query_parser.py",
    "chars": 6162,
    "preview": "\"\"\"Module containing functions to parse and validate input sources and patterns.\"\"\"\n\nfrom __future__ import annotations\n"
  },
  {
    "path": "src/gitingest/schemas/__init__.py",
    "chars": 366,
    "preview": "\"\"\"Module containing the schemas for the Gitingest package.\"\"\"\n\nfrom gitingest.schemas.cloning import CloneConfig\nfrom g"
  },
  {
    "path": "src/gitingest/schemas/cloning.py",
    "chars": 1331,
    "preview": "\"\"\"Schema for the cloning process.\"\"\"\n\nfrom __future__ import annotations\n\nfrom pydantic import BaseModel, Field\n\n\nclass"
  },
  {
    "path": "src/gitingest/schemas/filesystem.py",
    "chars": 5071,
    "preview": "\"\"\"Schema for the filesystem representation.\"\"\"\n\nfrom __future__ import annotations\n\nimport os\nfrom dataclasses import d"
  },
  {
    "path": "src/gitingest/schemas/ingestion.py",
    "chars": 3289,
    "preview": "\"\"\"Module containing the dataclasses for the ingestion process.\"\"\"\n\nfrom __future__ import annotations\n\nfrom pathlib imp"
  },
  {
    "path": "src/gitingest/utils/__init__.py",
    "chars": 51,
    "preview": "\"\"\"Utility functions for the gitingest package.\"\"\"\n"
  },
  {
    "path": "src/gitingest/utils/auth.py",
    "chars": 579,
    "preview": "\"\"\"Utilities for handling authentication.\"\"\"\n\nfrom __future__ import annotations\n\nimport os\n\nfrom gitingest.utils.git_ut"
  },
  {
    "path": "src/gitingest/utils/compat_func.py",
    "chars": 756,
    "preview": "\"\"\"Compatibility functions for Python 3.8.\"\"\"\n\nimport os\nfrom pathlib import Path\n\n\ndef readlink(path: Path) -> Path:\n  "
  },
  {
    "path": "src/gitingest/utils/compat_typing.py",
    "chars": 671,
    "preview": "\"\"\"Compatibility layer for typing.\"\"\"\n\ntry:\n    from enum import StrEnum  # type: ignore[attr-defined]  # Py ≥ 3.11\nexce"
  },
  {
    "path": "src/gitingest/utils/exceptions.py",
    "chars": 919,
    "preview": "\"\"\"Custom exceptions for the Gitingest package.\"\"\"\n\n\nclass AsyncTimeoutError(Exception):\n    \"\"\"Exception raised when an"
  },
  {
    "path": "src/gitingest/utils/file_utils.py",
    "chars": 1878,
    "preview": "\"\"\"Utility functions for working with files and directories.\"\"\"\n\nfrom __future__ import annotations\n\nimport locale\nimpor"
  },
  {
    "path": "src/gitingest/utils/git_utils.py",
    "chars": 15654,
    "preview": "\"\"\"Utility functions for interacting with Git repositories.\"\"\"\n\nfrom __future__ import annotations\n\nimport asyncio\nimpor"
  },
  {
    "path": "src/gitingest/utils/ignore_patterns.py",
    "chars": 4897,
    "preview": "\"\"\"Default ignore patterns for Gitingest.\"\"\"\n\nfrom __future__ import annotations\n\nfrom pathlib import Path\n\nDEFAULT_IGNO"
  },
  {
    "path": "src/gitingest/utils/ingestion_utils.py",
    "chars": 2540,
    "preview": "\"\"\"Utility functions for the ingestion process.\"\"\"\n\nfrom __future__ import annotations\n\nfrom typing import TYPE_CHECKING"
  },
  {
    "path": "src/gitingest/utils/logging_config.py",
    "chars": 5742,
    "preview": "\"\"\"Logging configuration for gitingest using loguru.\n\nThis module provides structured JSON logging suitable for Kubernet"
  },
  {
    "path": "src/gitingest/utils/notebook.py",
    "chars": 4385,
    "preview": "\"\"\"Utilities for processing Jupyter notebooks.\"\"\"\n\nfrom __future__ import annotations\n\nimport json\nfrom itertools import"
  },
  {
    "path": "src/gitingest/utils/os_utils.py",
    "chars": 566,
    "preview": "\"\"\"Utility functions for working with the operating system.\"\"\"\n\nfrom pathlib import Path\n\n\nasync def ensure_directory_ex"
  },
  {
    "path": "src/gitingest/utils/pattern_utils.py",
    "chars": 2272,
    "preview": "\"\"\"Pattern utilities for the Gitingest package.\"\"\"\n\nfrom __future__ import annotations\n\nimport re\nfrom typing import Ite"
  },
  {
    "path": "src/gitingest/utils/query_parser_utils.py",
    "chars": 6635,
    "preview": "\"\"\"Utility functions for parsing and validating query parameters.\"\"\"\n\nfrom __future__ import annotations\n\nimport string\n"
  },
  {
    "path": "src/gitingest/utils/timeout_wrapper.py",
    "chars": 1582,
    "preview": "\"\"\"Utility functions for the Gitingest package.\"\"\"\n\nimport asyncio\nimport functools\nfrom typing import Awaitable, Callab"
  },
  {
    "path": "src/server/__init__.py",
    "chars": 21,
    "preview": "\"\"\"Server module.\"\"\"\n"
  },
  {
    "path": "src/server/__main__.py",
    "chars": 798,
    "preview": "\"\"\"Server module entry point for running with python -m server.\"\"\"\n\nimport os\n\nimport uvicorn\n\n# Import logging configur"
  },
  {
    "path": "src/server/form_types.py",
    "chars": 448,
    "preview": "\"\"\"Reusable form type aliases for FastAPI form parameters.\"\"\"\n\nfrom __future__ import annotations\n\nfrom typing import TY"
  },
  {
    "path": "src/server/main.py",
    "chars": 7413,
    "preview": "\"\"\"Main module for the FastAPI application.\"\"\"\n\nfrom __future__ import annotations\n\nimport os\nimport threading\nfrom path"
  },
  {
    "path": "src/server/metrics_server.py",
    "chars": 1814,
    "preview": "\"\"\"Prometheus metrics server running on a separate port.\"\"\"\n\nimport uvicorn\nfrom fastapi import FastAPI\nfrom fastapi.res"
  },
  {
    "path": "src/server/models.py",
    "chars": 6193,
    "preview": "\"\"\"Pydantic models for the query form.\"\"\"\n\nfrom __future__ import annotations\n\nfrom enum import Enum\nfrom typing import "
  },
  {
    "path": "src/server/query_processor.py",
    "chars": 14392,
    "preview": "\"\"\"Process a query by parsing input, cloning a repository, and generating a summary.\"\"\"\n\nfrom __future__ import annotati"
  },
  {
    "path": "src/server/routers/__init__.py",
    "chars": 261,
    "preview": "\"\"\"Module containing the routers for the FastAPI application.\"\"\"\n\nfrom server.routers.dynamic import router as dynamic\nf"
  },
  {
    "path": "src/server/routers/dynamic.py",
    "chars": 1262,
    "preview": "\"\"\"The dynamic router module defines handlers for dynamic path requests.\"\"\"\n\nfrom fastapi import APIRouter, Request\nfrom"
  },
  {
    "path": "src/server/routers/index.py",
    "chars": 1216,
    "preview": "\"\"\"Module defining the FastAPI router for the home page of the application.\"\"\"\n\nfrom fastapi import APIRouter, Request\nf"
  },
  {
    "path": "src/server/routers/ingest.py",
    "chars": 6211,
    "preview": "\"\"\"Ingest endpoint for the API.\"\"\"\n\nfrom typing import Union\nfrom uuid import UUID\n\nfrom fastapi import APIRouter, HTTPE"
  },
  {
    "path": "src/server/routers_utils.py",
    "chars": 2257,
    "preview": "\"\"\"Utility functions for the ingest endpoints.\"\"\"\n\nfrom __future__ import annotations\n\nfrom typing import Any\n\nfrom fast"
  },
  {
    "path": "src/server/s3_utils.py",
    "chars": 17775,
    "preview": "\"\"\"S3 utility functions for uploading and managing digest files.\"\"\"\n\nfrom __future__ import annotations\n\nimport hashlib\n"
  },
  {
    "path": "src/server/server_config.py",
    "chars": 1849,
    "preview": "\"\"\"Configuration for the server.\"\"\"\n\nfrom __future__ import annotations\n\nimport os\nfrom pathlib import Path\n\nfrom fastap"
  },
  {
    "path": "src/server/server_utils.py",
    "chars": 1892,
    "preview": "\"\"\"Utility functions for the server.\"\"\"\n\nfrom fastapi import Request\nfrom fastapi.responses import Response\nfrom slowapi"
  },
  {
    "path": "src/server/templates/base.jinja",
    "chars": 3391,
    "preview": "<!DOCTYPE html>\n<html lang=\"en\">\n    <head>\n        <meta charset=\"UTF-8\">\n        <meta name=\"viewport\" content=\"width="
  },
  {
    "path": "src/server/templates/components/_macros.jinja",
    "chars": 324,
    "preview": "{# Icon link #}\n{% macro footer_icon_link(href, icon, label) -%}\n    <a href=\"{{ href }}\"\n       target=\"_blank\"\n       "
  },
  {
    "path": "src/server/templates/components/footer.jinja",
    "chars": 1611,
    "preview": "{% from 'components/_macros.jinja' import footer_icon_link %}\n<footer class=\"w-full border-t-[3px] border-gray-900 mt-au"
  },
  {
    "path": "src/server/templates/components/git_form.jinja",
    "chars": 12294,
    "preview": "<div class=\"relative\">\n    <div class=\"w-full h-full absolute inset-0 bg-gray-900 rounded-xl translate-y-2 translate-x-2"
  },
  {
    "path": "src/server/templates/components/navbar.jinja",
    "chars": 1879,
    "preview": "<header class=\"sticky top-0 bg-[#FFFDF8] border-b-[3px] border-gray-900 z-50\">\n    <div class=\"max-w-4xl mx-auto px-4\">\n"
  },
  {
    "path": "src/server/templates/components/result.jinja",
    "chars": 8134,
    "preview": "<div class=\"mt-10\">\n    <!-- Error Message (hidden by default) -->\n    <div id=\"results-error\" style=\"display:none\"></di"
  },
  {
    "path": "src/server/templates/components/tailwind_components.html",
    "chars": 1327,
    "preview": "<style type=\"text/tailwindcss\">\n  @layer components {\n    .badge-new {\n      @apply inline-block -rotate-6 -translate-y-"
  },
  {
    "path": "src/server/templates/git.jinja",
    "chars": 452,
    "preview": "{% extends \"base.jinja\" %}\n{% block content %}\n    {% if error_message %}\n        <div class=\"mb-6 p-4 bg-red-50 border "
  },
  {
    "path": "src/server/templates/index.jinja",
    "chars": 1279,
    "preview": "{% extends \"base.jinja\" %}\n{% block content %}\n    <div class=\"mb-8\">\n        <div class=\"relative w-full flex sm:flex-r"
  },
  {
    "path": "src/server/templates/swagger_ui.jinja",
    "chars": 1422,
    "preview": "{% extends \"base.jinja\" %}\n{% block title %}GitIngest API{% endblock %}\n{% block content %}\n    <div class=\"mb-8\">\n     "
  },
  {
    "path": "src/static/js/git.js",
    "chars": 967,
    "preview": "function waitForStars() {\n    return new Promise((resolve) => {\n        const check = () => {\n            const stars = "
  },
  {
    "path": "src/static/js/git_form.js",
    "chars": 1367,
    "preview": "// Strike-through / un-strike file lines when the pattern-type menu flips.\nfunction changePattern() {\n    const dirPre ="
  },
  {
    "path": "src/static/js/index.js",
    "chars": 258,
    "preview": "function submitExample(repoName) {\n    const input = document.getElementById('input_text');\n\n    if (input) {\n        in"
  },
  {
    "path": "src/static/js/navbar.js",
    "chars": 776,
    "preview": "// Fetch GitHub stars\nfunction formatStarCount(count) {\n    if (count >= 1000) {return `${ (count / 1000).toFixed(1) }k`"
  },
  {
    "path": "src/static/js/posthog.js",
    "chars": 2798,
    "preview": "/* eslint-disable */\n!function (t, e) {\n    let o, n, p, r;\n    if (e.__SV) {return;}                 // already loaded\n"
  },
  {
    "path": "src/static/js/utils.js",
    "chars": 13898,
    "preview": "function getFileName(element) {\n    const indentSize = 4;\n    let path = '';\n    let prevIndentLevel = null;\n\n    while "
  },
  {
    "path": "src/static/llms.txt",
    "chars": 12047,
    "preview": "# GitIngest – **AI Agent Integration Guide**\n\nTurn any Git repository into a prompt-ready text digest. GitIngest fetches"
  },
  {
    "path": "src/static/robots.txt",
    "chars": 69,
    "preview": "User-agent: *\nAllow: /\nAllow: /api/\nAllow: /coderamp-labs/gitingest/\n"
  },
  {
    "path": "tests/.pylintrc",
    "chars": 196,
    "preview": "[MASTER]\ninit-hook=\n    import sys\n    sys.path.append('./src')\n\n[MESSAGES CONTROL]\ndisable=missing-class-docstring,miss"
  },
  {
    "path": "tests/__init__.py",
    "chars": 39,
    "preview": "\"\"\"Tests for the gitingest package.\"\"\"\n"
  },
  {
    "path": "tests/conftest.py",
    "chars": 8720,
    "preview": "\"\"\"Fixtures for tests.\n\nThis file provides shared fixtures for creating sample queries, a temporary directory structure,"
  },
  {
    "path": "tests/query_parser/__init__.py",
    "chars": 34,
    "preview": "\"\"\"Tests for the query parser.\"\"\"\n"
  },
  {
    "path": "tests/query_parser/test_git_host_agnostic.py",
    "chars": 2734,
    "preview": "\"\"\"Tests to verify that the query parser is Git host agnostic.\n\nThese tests confirm that ``parse_query`` correctly ident"
  },
  {
    "path": "tests/query_parser/test_query_parser.py",
    "chars": 11907,
    "preview": "\"\"\"Tests for the ``query_parser`` module.\n\nThese tests cover URL parsing, pattern parsing, and handling of branches/subp"
  },
  {
    "path": "tests/server/__init__.py",
    "chars": 28,
    "preview": "\"\"\"Tests for the server.\"\"\"\n"
  },
  {
    "path": "tests/server/test_flow_integration.py",
    "chars": 6683,
    "preview": "\"\"\"Integration tests covering core functionalities, edge cases, and concurrency handling.\"\"\"\n\nimport shutil\nimport sys\nf"
  },
  {
    "path": "tests/test_cli.py",
    "chars": 3548,
    "preview": "\"\"\"Tests for the Gitingest CLI.\"\"\"\n\nfrom __future__ import annotations\n\nfrom inspect import signature\nfrom pathlib impor"
  },
  {
    "path": "tests/test_clone.py",
    "chars": 7746,
    "preview": "\"\"\"Tests for the ``clone`` module.\n\nThese tests cover various scenarios for cloning repositories, verifying that the app"
  },
  {
    "path": "tests/test_git_utils.py",
    "chars": 9459,
    "preview": "\"\"\"Tests for the ``git_utils`` module.\n\nThese tests validate the ``validate_github_token`` function, which ensures that\n"
  },
  {
    "path": "tests/test_gitignore_feature.py",
    "chars": 2714,
    "preview": "\"\"\"Tests for the gitignore functionality in Gitingest.\"\"\"\n\nfrom pathlib import Path\n\nimport pytest\n\nfrom gitingest.entry"
  },
  {
    "path": "tests/test_ingestion.py",
    "chars": 8861,
    "preview": "\"\"\"Tests for the ``query_ingestion`` module.\n\nThese tests validate directory scanning, file content extraction, notebook"
  },
  {
    "path": "tests/test_notebook_utils.py",
    "chars": 10375,
    "preview": "\"\"\"Tests for the ``notebook`` utils module.\n\nThese tests validate how notebooks are processed into Python-like output, e"
  },
  {
    "path": "tests/test_pattern_utils.py",
    "chars": 1642,
    "preview": "\"\"\"Test pattern utilities.\"\"\"\n\nfrom gitingest.utils.ignore_patterns import DEFAULT_IGNORE_PATTERNS\nfrom gitingest.utils."
  },
  {
    "path": "tests/test_summary.py",
    "chars": 3487,
    "preview": "\"\"\"Test that ``gitingest.ingest()`` emits a concise, 5-or-6-line summary.\"\"\"\n\nimport re\nfrom pathlib import Path\n\nimport"
  }
]
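
Each entry in the preview array above has the same three keys: `path`, `chars`, and `preview`. As a minimal sketch of how such a list could be summarized, the snippet below totals the character counts and groups them by top-level directory. The function name `summarize_preview` and the shortened `preview` strings are illustrative assumptions and are not part of the repository or of the extraction tool. The three sample entries reuse paths and counts from the array above.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def summarize_preview(entries):
    """Return (total_chars, chars_by_top_level) for a list of preview entries.

    Hypothetical helper: each entry is assumed to be a dict with the keys
    "path" and "chars", as in the preview array.
    """
    by_top = defaultdict(int)
    total = 0
    for entry in entries:
        parts = PurePosixPath(entry["path"]).parts
        # Files that sit at the repository root are grouped under ".".
        top = parts[0] if len(parts) > 1 else "."
        by_top[top] += entry["chars"]
        total += entry["chars"]
    return total, dict(by_top)


# Three entries taken from the preview array (preview text shortened here).
sample = [
    {"path": ".gitignore", "chars": 470, "preview": "# Operating-system"},
    {"path": "LICENSE", "chars": 1072, "preview": "MIT License"},
    {"path": "src/gitingest/config.py", "chars": 497, "preview": "Configuration file"},
]

total, by_top = summarize_preview(sample)
print(total)
print(by_top)
```

To run this over the whole array, the list could be read with `json.load` from a file holding the array, assuming it has been saved locally.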

About this extraction

This page contains the full source code of the coderamp-labs/gitingest GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 110 files (391.9 KB), approximately 98.9k tokens, and a symbol index with 238 extracted functions, classes, methods, constants, and types.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.
