Repository: huggingface/ml-intern
Branch: main
Commit: b292d83aa78e
Files: 110
Total size: 948.1 KB
Directory structure:
gitextract_xwbm9csz/
├── .gitattributes
├── .github/
│ └── workflows/
│ ├── claude-review.yml
│ └── claude.yml
├── .gitignore
├── .python-version
├── Dockerfile
├── README.md
├── REVIEW.md
├── agent/
│ ├── README.md
│ ├── __init__.py
│ ├── config.py
│ ├── context_manager/
│ │ ├── __init__.py
│ │ └── manager.py
│ ├── core/
│ │ ├── __init__.py
│ │ ├── agent_loop.py
│ │ ├── doom_loop.py
│ │ ├── effort_probe.py
│ │ ├── hf_router_catalog.py
│ │ ├── llm_params.py
│ │ ├── model_switcher.py
│ │ ├── prompt_caching.py
│ │ ├── session.py
│ │ ├── session_uploader.py
│ │ └── tools.py
│ ├── main.py
│ ├── prompts/
│ │ ├── system_prompt.yaml
│ │ ├── system_prompt_v2.yaml
│ │ └── system_prompt_v3.yaml
│ ├── tools/
│ │ ├── __init__.py
│ │ ├── dataset_tools.py
│ │ ├── docs_tools.py
│ │ ├── edit_utils.py
│ │ ├── github_find_examples.py
│ │ ├── github_list_repos.py
│ │ ├── github_read_file.py
│ │ ├── hf_repo_files_tool.py
│ │ ├── hf_repo_git_tool.py
│ │ ├── jobs_tool.py
│ │ ├── local_tools.py
│ │ ├── papers_tool.py
│ │ ├── plan_tool.py
│ │ ├── private_hf_repo_tools.py
│ │ ├── research_tool.py
│ │ ├── sandbox_client.py
│ │ ├── sandbox_tool.py
│ │ ├── types.py
│ │ └── utilities.py
│ └── utils/
│ ├── __init__.py
│ ├── boot_timing.py
│ ├── braille.py
│ ├── crt_boot.py
│ ├── particle_logo.py
│ ├── reliability_checks.py
│ └── terminal_display.py
├── backend/
│ ├── __init__.py
│ ├── dependencies.py
│ ├── main.py
│ ├── models.py
│ ├── routes/
│ │ ├── __init__.py
│ │ ├── agent.py
│ │ └── auth.py
│ ├── session_manager.py
│ ├── start.sh
│ └── user_quotas.py
├── configs/
│ └── main_agent_config.json
├── frontend/
│ ├── eslint.config.js
│ ├── index.html
│ ├── package.json
│ ├── src/
│ │ ├── App.tsx
│ │ ├── components/
│ │ │ ├── Chat/
│ │ │ │ ├── ActivityStatusBar.tsx
│ │ │ │ ├── AssistantMessage.tsx
│ │ │ │ ├── ChatInput.tsx
│ │ │ │ ├── ExpiredBanner.tsx
│ │ │ │ ├── MarkdownContent.tsx
│ │ │ │ ├── MessageBubble.tsx
│ │ │ │ ├── MessageList.tsx
│ │ │ │ ├── ThinkingIndicator.tsx
│ │ │ │ ├── ToolCallGroup.tsx
│ │ │ │ └── UserMessage.tsx
│ │ │ ├── ClaudeCapDialog.tsx
│ │ │ ├── CodePanel/
│ │ │ │ └── CodePanel.tsx
│ │ │ ├── Layout/
│ │ │ │ └── AppLayout.tsx
│ │ │ ├── SessionChat.tsx
│ │ │ ├── SessionSidebar/
│ │ │ │ └── SessionSidebar.tsx
│ │ │ └── WelcomeScreen/
│ │ │ └── WelcomeScreen.tsx
│ │ ├── hooks/
│ │ │ ├── useAgentChat.ts
│ │ │ ├── useAuth.ts
│ │ │ ├── useOrgMembership.ts
│ │ │ └── useUserQuota.ts
│ │ ├── lib/
│ │ │ ├── backend-message-store.ts
│ │ │ ├── chat-message-store.ts
│ │ │ ├── convert-llm-messages.ts
│ │ │ ├── research-store.ts
│ │ │ └── sse-chat-transport.ts
│ │ ├── main.tsx
│ │ ├── store/
│ │ │ ├── agentStore.ts
│ │ │ ├── layoutStore.ts
│ │ │ └── sessionStore.ts
│ │ ├── theme.ts
│ │ ├── types/
│ │ │ ├── agent.ts
│ │ │ └── events.ts
│ │ ├── utils/
│ │ │ ├── api.ts
│ │ │ ├── logProcessor.ts
│ │ │ ├── logger.ts
│ │ │ └── model.ts
│ │ └── vite-env.d.ts
│ ├── tsconfig.json
│ └── vite.config.ts
├── pyproject.toml
└── tests/
└── unit/
└── test_user_quotas.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .gitattributes
================================================
*.png filter=lfs diff=lfs merge=lfs -text
================================================
FILE: .github/workflows/claude-review.yml
================================================
name: Claude PR Review
on:
pull_request:
types: [opened, synchronize, ready_for_review]
permissions:
contents: read
pull-requests: write
issues: read
id-token: write
concurrency:
group: claude-review-${{ github.event.pull_request.number }}
cancel-in-progress: true
jobs:
review:
if: github.event.pull_request.draft == false
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Compose review prompt
id: compose
run: |
{
printf 'prompt<<PROMPT_EOF\n'
if [ -f REVIEW.md ]; then
echo '# Highest-priority review instructions (from REVIEW.md at the repo root)'
echo 'Follow these rules as the authoritative guide for this review. If anything'
echo 'below contradicts a more generic review habit, follow these.'
echo
cat REVIEW.md
echo
echo '---'
echo
fi
cat <<'BASE'
Review this pull request against the main branch.
Tag every finding with a priority label: P0 (blocks merge), P1 (worth
fixing, not blocking), or P2 (informational / pre-existing). Open the
review body with a one-line tally ("2 P0, 3 P1", or
"No blocking issues — 3 P1", or "LGTM" if nothing). Cite file:line for
every behavior claim. Prefer inline comments over long summaries.
Fallback focus if REVIEW.md is missing: correctness, security (auth,
injection, SSRF), LiteLLM/Bedrock routing breakage, agent loop / streaming
regressions, test coverage for new behavior. Skip anything ruff already
catches.
BASE
printf 'PROMPT_EOF\n'
} >> "$GITHUB_OUTPUT"
- uses: anthropics/claude-code-action@v1
with:
anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
track_progress: true
prompt: ${{ steps.compose.outputs.prompt }}
================================================
FILE: .github/workflows/claude.yml
================================================
name: Claude on Mention
on:
issue_comment:
types: [created]
pull_request_review_comment:
types: [created]
pull_request_review:
types: [submitted]
issues:
types: [opened, assigned]
permissions:
contents: write
pull-requests: write
issues: write
id-token: write
jobs:
claude:
if: |
(github.event_name == 'issue_comment' && contains(github.event.comment.body, '@claude')) ||
(github.event_name == 'pull_request_review_comment' && contains(github.event.comment.body, '@claude')) ||
(github.event_name == 'pull_request_review' && contains(github.event.review.body, '@claude')) ||
(github.event_name == 'issues' && (contains(github.event.issue.body, '@claude') || contains(github.event.issue.title, '@claude')))
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- uses: anthropics/claude-code-action@v1
with:
anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
track_progress: true
================================================
FILE: .gitignore
================================================
# Python-generated files
__pycache__/
*.py[oc]
build/
dist/
wheels/
*.egg-info
.pytest_cache/
.mypy_cache/
.tox/
.coverage
htmlcov/
.ipynb_checkpoints/
# Virtual environments
.venv/
venv/
ENV/
env/
# Environment and Secrets
.env
.env.local
.env.*
!.env.example
*.local
credentials*.json
# OS-specific
.DS_Store
Thumbs.db
*.swp
# IDE-specific
.vscode/
.idea/
.cursor/
.history/
*.sublime-project
*.sublime-workspace
# Frontend (Node.js)
frontend/node_modules/
frontend/dist/
frontend/.cache/
frontend/*.local
frontend/.eslintcache
frontend/npm-debug.log*
frontend/yarn-debug.log*
frontend/yarn-error.log*
# Docker
.docker/
# Eval (stale)
eval/
# Project-specific
session_logs/
/logs
hf-agent-leaderboard/
skills/
.claude/
*.jsonl
*.csv
# ML / Data
data/
datasets/
models/
checkpoint-*/
runs/
wandb/
frontend/tsconfig.tsbuildinfo
================================================
FILE: .python-version
================================================
3.12
================================================
FILE: Dockerfile
================================================
# Stage 1: Build frontend
FROM node:20-alpine AS frontend-builder
WORKDIR /app/frontend
COPY frontend/package.json frontend/package-lock.json ./
RUN npm install
COPY frontend/ ./
RUN npm run build
# Stage 2: Production
FROM python:3.12-slim
# Install uv directly from official image
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
# Create user with UID 1000 (required for HF Spaces)
RUN useradd -m -u 1000 user
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
git \
curl \
&& rm -rf /var/lib/apt/lists/*
# Copy dependency files
COPY pyproject.toml uv.lock ./
# Install dependencies into /app/.venv
# Use --frozen to ensure exact versions from uv.lock
RUN uv sync --no-dev --frozen
# Copy application code
COPY agent/ ./agent/
COPY backend/ ./backend/
COPY configs/ ./configs/
# Copy built frontend
COPY --from=frontend-builder /app/frontend/dist ./static/
# Create directories and set ownership
RUN mkdir -p /app/session_logs && \
chown -R user:user /app
# Switch to non-root user
USER user
# Set environment
ENV HOME=/home/user \
PYTHONUNBUFFERED=1 \
PYTHONPATH=/app \
PATH="/app/.venv/bin:$PATH"
# Expose port
EXPOSE 7860
# Run the application from backend directory
WORKDIR /app/backend
CMD ["bash", "start.sh"]
================================================
FILE: README.md
================================================
<p align="center">
<img src="frontend/public/smolagents.webp" alt="smolagents logo" width="160" />
</p>
# ML Intern
An ML intern that autonomously researches, writes, and ships good-quality ML-related code using the Hugging Face ecosystem — with deep access to docs, papers, datasets, and cloud compute.
## Quick Start
### Installation
```bash
git clone git@github.com:huggingface/ml-intern.git
cd ml-intern
uv sync
uv tool install -e .
```
#### That's it. Now `ml-intern` works from any directory:
```bash
ml-intern
```
Create a `.env` file in the project root (or export these in your shell):
```bash
ANTHROPIC_API_KEY=<your-anthropic-api-key> # if using anthropic models
HF_TOKEN=<your-hugging-face-token>
GITHUB_TOKEN=<github-personal-access-token>
```
If no `HF_TOKEN` is set, the CLI will prompt you to paste one on first launch. To get a `GITHUB_TOKEN`, follow the tutorial [here](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens#creating-a-fine-grained-personal-access-token).
### Usage
**Interactive mode** (start a chat session):
```bash
ml-intern
```
**Headless mode** (single prompt, auto-approve):
```bash
ml-intern "fine-tune llama on my dataset"
```
**Options:**
```bash
ml-intern --model anthropic/claude-opus-4-6 "your prompt"
ml-intern --max-iterations 100 "your prompt"
ml-intern --no-stream "your prompt"
```
## Architecture
### Component Overview
```
┌─────────────────────────────────────────────────────────────┐
│ User/CLI │
└────────────┬─────────────────────────────────────┬──────────┘
│ Operations │ Events
↓ (user_input, exec_approval, ↑
submission_queue interrupt, compact, ...) event_queue
│ │
↓ │
┌────────────────────────────────────────────────────┐ │
│ submission_loop (agent_loop.py) │ │
│ ┌──────────────────────────────────────────────┐ │ │
│ │ 1. Receive Operation from queue │ │ │
│ │ 2. Route to handler (run_agent/compact/...) │ │ │
│ └──────────────────────────────────────────────┘ │ │
│ ↓ │ │
│ ┌──────────────────────────────────────────────┐ │ │
│ │ Handlers.run_agent() │ ├──┤
│ │ │ │ │
│ │ ┌────────────────────────────────────────┐ │ │ │
│ │ │ Agentic Loop (max 300 iterations) │ │ │ │
│ │ │ │ │ │ │
│ │ │ ┌──────────────────────────────────┐ │ │ │ │
│ │ │ │ Session │ │ │ │ │
│ │ │ │ ┌────────────────────────────┐ │ │ │ │ │
│ │ │ │ │ ContextManager │ │ │ │ │ │
│ │ │ │ │ • Message history │ │ │ │ │ │
│ │ │ │ │ (litellm.Message[]) │ │ │ │ │ │
│ │ │ │ │ • Auto-compaction (170k) │ │ │ │ │ │
│ │ │ │ │ • Session upload to HF │ │ │ │ │ │
│ │ │ │ └────────────────────────────┘ │ │ │ │ │
│ │ │ │ │ │ │ │ │
│ │ │ │ ┌────────────────────────────┐ │ │ │ │ │
│ │ │ │ │ ToolRouter │ │ │ │ │ │
│ │ │ │ │ ├─ HF docs & research │ │ │ │ │ │
│ │ │ │ │ ├─ HF repos, datasets, │ │ │ │ │ │
│ │ │ │ │ │ jobs, papers │ │ │ │ │ │
│ │ │ │ │ ├─ GitHub code search │ │ │ │ │ │
│ │ │ │ │ ├─ Sandbox & local tools │ │ │ │ │ │
│ │ │ │ │ ├─ Planning │ │ │ │ │ │
│ │ │ │ │ └─ MCP server tools │ │ │ │ │ │
│ │ │ │ └────────────────────────────┘ │ │ │ │ │
│ │ │ └──────────────────────────────────┘ │ │ │ │
│ │ │ │ │ │ │
│ │ │ ┌──────────────────────────────────┐ │ │ │ │
│ │ │ │ Doom Loop Detector │ │ │ │ │
│ │ │ │ • Detects repeated tool patterns │ │ │ │ │
│ │ │ │ • Injects corrective prompts │ │ │ │ │
│ │ │ └──────────────────────────────────┘ │ │ │ │
│ │ │ │ │ │ │
│ │ │ Loop: │ │ │ │
│ │ │ 1. LLM call (litellm.acompletion) │ │ │ │
│ │ │ ↓ │ │ │ │
│ │ │ 2. Parse tool_calls[] │ │ │ │
│ │ │ ↓ │ │ │ │
│ │ │ 3. Approval check │ │ │ │
│ │ │ (jobs, sandbox, destructive ops) │ │ │ │
│ │ │ ↓ │ │ │ │
│ │ │ 4. Execute via ToolRouter │ │ │ │
│ │ │ ↓ │ │ │ │
│ │ │ 5. Add results to ContextManager │ │ │ │
│ │ │ ↓ │ │ │ │
│ │ │ 6. Repeat if tool_calls exist │ │ │ │
│ │ └────────────────────────────────────────┘ │ │ │
│ └──────────────────────────────────────────────┘ │ │
└────────────────────────────────────────────────────┴──┘
```
### Agentic Loop Flow
```
User Message
↓
[Add to ContextManager]
↓
╔═══════════════════════════════════════════╗
║ Iteration Loop (max 300) ║
║ ║
║ Get messages + tool specs ║
║ ↓ ║
║ litellm.acompletion() ║
║ ↓ ║
║ Has tool_calls? ──No──> Done ║
║ │ ║
║ Yes ║
║ ↓ ║
║ Add assistant msg (with tool_calls) ║
║ ↓ ║
║ Doom loop check ║
║ ↓ ║
║ For each tool_call: ║
║ • Needs approval? ──Yes──> Wait for ║
║ │ user confirm ║
║ No ║
║ ↓ ║
║ • ToolRouter.execute_tool() ║
║ • Add result to ContextManager ║
║ ↓ ║
║ Continue loop ─────────────────┐ ║
║ ↑ │ ║
║ └───────────────────────┘ ║
╚═══════════════════════════════════════════╝
```
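The same flow as a minimal Python sketch. This is illustrative only: the `execute_tool` call signature is assumed, and streaming, retries, approval gating, and doom-loop handling are omitted (the real loop lives in `agent/core/agent_loop.py`).
```python
import json
from litellm import Message, acompletion

async def agentic_loop(context_manager, tool_router, model_name: str, max_iterations: int = 300) -> str | None:
    """Simplified sketch of the iteration loop shown above."""
    for _ in range(max_iterations):
        messages = context_manager.get_messages()            # history incl. system prompt
        tools = tool_router.get_tool_specs_for_llm()          # OpenAI-format tool specs
        response = await acompletion(model=model_name, messages=messages, tools=tools, tool_choice="auto")
        msg = response.choices[0].message

        if not msg.tool_calls:                                # no tool calls -> turn is done
            context_manager.add_message(Message(role="assistant", content=msg.content))
            return msg.content

        context_manager.add_message(msg)                      # keep tool_calls in history
        for tc in msg.tool_calls:                             # execute each tool, record its result
            result = await tool_router.execute_tool(tc.function.name, json.loads(tc.function.arguments))
            context_manager.add_message(Message(
                role="tool",
                content=str(result),
                tool_call_id=tc.id,
                name=tc.function.name,
            ))
    return None
```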
## Events
The agent emits the following events via `event_queue`:
- `processing` - Starting to process user input
- `ready` - Agent is ready for input
- `assistant_chunk` - Streaming token chunk
- `assistant_message` - Complete LLM response text
- `assistant_stream_end` - Token stream finished
- `tool_call` - Tool being called with arguments
- `tool_output` - Tool execution result
- `tool_log` - Informational tool log message
- `tool_state_change` - Tool execution state transition
- `approval_required` - Requesting user approval for sensitive operations
- `turn_complete` - Agent finished processing
- `error` - Error occurred during processing
- `interrupted` - Agent was interrupted
- `compacted` - Context was compacted
- `undo_complete` - Undo operation completed
- `shutdown` - Agent shutting down
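A minimal consumer for these events, assuming `event_queue` is an `asyncio.Queue` of `Event` objects carrying `event_type` and `data` (as constructed in `agent/core/agent_loop.py`):
```python
import asyncio

async def consume_events(event_queue: asyncio.Queue) -> None:
    """Render streamed text and tool activity until the agent shuts down."""
    while True:
        event = await event_queue.get()
        if event.event_type == "assistant_chunk":
            print(event.data["content"], end="", flush=True)   # streaming token
        elif event.event_type == "tool_call":
            print(f"\n[tool] {event.data['tool']}")             # tool being invoked
        elif event.event_type == "error":
            print(f"\n[error] {event.data}")
        elif event.event_type == "shutdown":
            break
```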
## Development
### Adding Built-in Tools
Edit `agent/core/tools.py`:
```python
def create_builtin_tools() -> list[ToolSpec]:
return [
ToolSpec(
name="your_tool",
description="What your tool does",
parameters={
"type": "object",
"properties": {
"param": {"type": "string", "description": "Parameter description"}
},
"required": ["param"]
},
handler=your_async_handler
),
# ... existing tools
]
```
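The `handler` is an async callable. A hypothetical handler for the example above might look like this (the exact calling convention is defined by `ToolSpec` in `agent/core/tools.py`; the keyword-argument shape and string return type here are assumptions):
```python
async def your_async_handler(param: str) -> str:
    # Hypothetical body: assumes the tool's JSON arguments arrive as keyword
    # arguments and that the returned string becomes the tool result message.
    return f"Processed: {param}"
```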
### Adding MCP Servers
Edit `configs/main_agent_config.json`:
```json
{
"model_name": "anthropic/claude-sonnet-4-5-20250929",
"mcpServers": {
"your-server-name": {
"transport": "http",
"url": "https://example.com/mcp",
"headers": {
"Authorization": "Bearer ${YOUR_TOKEN}"
}
}
}
}
```
Note: Environment variables like `${YOUR_TOKEN}` are auto-substituted from `.env`.
================================================
FILE: REVIEW.md
================================================
# Review instructions
These rules override the default review guidance. Treat them as the highest-priority
instruction block for any review of this repo. If something here contradicts a more
generic review habit, follow these.
## Severity levels
Every finding carries one of three priority labels:
- **P0** — blocks merge.
- **P1** — worth fixing, not blocking.
- **P2** — informational.
Write labels as plain text (`P0`, `P1`, `P2`) in finding headers. Do not use
emoji or colored markers. Use judgment on what belongs at which level — this
repo does not enumerate P0 cases; read the code and decide.
## Default bias: rigor
Reviews gate merges. This is an open-source repo that takes PRs from anyone; the
maintainer team is small and relies on the review to catch what they don't have
time to verify themselves. **Default bias is rigor, not speed.** When in doubt
on a P0-class concern, investigate further before deciding whether to flag — a
false negative ships a bug to production, a false positive costs the contributor
one round trip.
Rigor is not nitpicking. The P1 cap, "do not report" skip list, and verification
bar all still apply. Rigor means going deep on a small number of real concerns,
not surfacing a large number of shallow ones. Prefer one well-investigated P0
over three speculative P1s.
**Hold the line on P0.** If the author pushes back on a P0 finding without a fix
that actually addresses the root cause, re-state the concern with added
citations. Only accept the pushback if the author points to code or behavior you
missed. Do not soften a P0 because the contributor is polite or new to the repo.
For P1 and P2: if the author defers or pushes back without fixing, accept it
silently — do not re-flag on subsequent commits. P1/P2 are informational; the
author may defer to a follow-up issue at their discretion.
If Claude and the author repeatedly disagree on the same class of finding, the
signal is that REVIEW.md is missing a rule; note it once in the PR summary as
`suggest-rule: <short description>` and stop.
## Investigate before posting
The depth of your analysis determines the strength of your finding. For any
P0-class concern, before writing it up:
- Read the relevant callers and callees, not just the diff. Use Read and Grep
to open files the diff doesn't touch but the changed code interacts with.
- Trace the full chain end-to-end for routing, auth, and agent-loop findings.
Cite each hop by `file:line`, not just the suspicious line.
- Check whether the codebase already has an established pattern for this kind
of change (`grep` for similar call sites, similar tool definitions, similar
route guards). If the PR introduces a new approach where an established
pattern exists, flag that — divergence from the existing pattern is usually a
regression vector even when the new code "works."
- Confirm the specific behavior you're claiming. "This breaks X" must be
grounded in either the code handling X or a test exercising X, not in
inference from naming or structure.
A finding you "spotted" by scanning the diff is more likely to be a false
positive than a finding you verified by reading the code around it.
## P1 cap
Report at most **3** P1 findings per review. If you found more, say "plus N
similar items" in the summary. If everything you found is P1 or below, open the
summary with "No blocking issues."
## Re-review convergence
If this PR has already received a Claude review (there is a prior review comment
by the `claude` bot), suppress new P1 findings and post only P0 ones. Do not
re-post P1s that were already flagged on earlier commits. If the author pushed a
fix for a previously flagged issue, acknowledge it in one line rather than
re-flagging.
## Do not report
Anything in these paths — skip entirely:
- `frontend/node_modules/**`, `**/*.lock`, `uv.lock`, `package-lock.json`
- `hf_agent.egg-info/**`, `.ruff_cache/**`, `.pytest_cache/**`, `.venv/**`
- `session_logs/**`, `reports/**`
- Anything under a `gen/` or `generated/` path
Anything speculative — do not post:
- "This might be slow" without a concrete complexity claim tied to a specific
input size
- Hypothetical race conditions without a concrete interleaving
## Dependency PRs
For PRs whose diff is only a lockfile bump, a `pyproject.toml` change, or a
new dependency, the code rules above don't apply — risks shift to provenance
and framing. Every claim in the title or body (CVE IDs, version numbers,
behavior fixes) must match what the diff actually does, and any new
transitive dep needs justification. A PR that lies in its framing is P0
regardless of whether the code change is safe in isolation.
## Verification bar
Every behavior claim in a finding must cite `file:line`. "This breaks X" is not
actionable without a line reference. If you cannot cite a line, do not post
the finding.
## Summary shape
Open the review body with a single-line tally and an explicit merge verdict, on
two lines:
```
2 P0, 3 P1
Verdict: changes requested
```
Valid verdicts:
- **Verdict: ready to merge** — no P0 findings, contributor can merge as-is
once any CI passes
- **Verdict: changes requested** — at least one P0 that must be addressed
before merging
- **Verdict: needs discussion** — a design-level concern the maintainer should
weigh in on before the contributor iterates (use sparingly)
If it's a clean review, write `LGTM` followed by `Verdict: ready to merge`.
Then a **What I checked** bullet list — one line per major area you examined,
regardless of whether you found anything. This gives the maintainer visible
coverage at a glance and lets them decide whether to spot-check areas you
didn't touch.
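Put together, an illustrative review opener might look like (areas are examples only):
```
No blocking issues — 2 P1
Verdict: ready to merge

What I checked:
- Agent loop changes in agent/core/agent_loop.py and their callers
- Auth and routing guards in backend/routes/
- Test coverage for the new behavior in tests/unit/
```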
================================================
FILE: agent/README.md
================================================
# Agent
Async agent loop with LiteLLM.
## Architecture
**Queue-based async system:**
- Submissions in (user input) → Agent Loop → Events out for possible UI updates (see the sketch below)
- Session maintains state (context + tools) for possible future Context Engineering
- Handlers process operations like USER_INPUT, INTERRUPT, COMPACT, UNDO, and SHUTDOWN for possible UI control
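A self-contained sketch of that queue pattern, with a stand-in loop (the real entry point is `agent.core.agent_loop.submission_loop`; the operation and event dict shapes below are assumptions for illustration):
```python
import asyncio

async def agent_loop(submissions: asyncio.Queue, events: asyncio.Queue) -> None:
    # Stand-in for submission_loop: consume operations, emit events for the UI.
    while True:
        op = await submissions.get()
        if op["type"] == "SHUTDOWN":
            await events.put({"event_type": "shutdown", "data": {}})
            return
        await events.put({"event_type": "assistant_message", "data": {"content": f"handled {op['type']}"}})
        await events.put({"event_type": "turn_complete", "data": {}})

async def main() -> None:
    submissions: asyncio.Queue = asyncio.Queue()   # operations in (USER_INPUT, INTERRUPT, ...)
    events: asyncio.Queue = asyncio.Queue()        # events out (assistant_message, tool_call, ...)
    task = asyncio.create_task(agent_loop(submissions, events))
    await submissions.put({"type": "USER_INPUT", "text": "hello"})
    while (ev := await events.get())["event_type"] != "turn_complete":
        print(ev)
    await submissions.put({"type": "SHUTDOWN"})
    await task

asyncio.run(main())
```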
## Components
| Component | Purpose | Long Term Goal |
|-----------|---------|----------------|
| **`agent_loop.py`** | Core agentic loop: processes user input, calls LLM via LiteLLM, executes tool calls iteratively until completion, emits events | Support parallel tool execution, streaming responses, and advanced reasoning patterns |
| **`session.py`** | Maintains session state and interaction with potential UI (context, config, event queue), handles interrupts, assigns unique session IDs for tracing | Enable plugging in different UIs (CLI, web, API, programmatic etc.) |
| **`tools.py`** | `ToolRouter` manages potential built-in tools (e.g. bash, read_file, write_file, which are dummy implementations right now) + MCP tools, converts specs to OpenAI format | Be the place for tools that can be used by the agent. All crazy tool design happens here. |
| **`context_manager/`** | Manages conversation history, very rudimentary context engineering support | Implement intelligent context engineering to keep the agent on track |
| **`config.py`** | Loads JSON config for the agent | Support different configs etc. |
| **`main.py`** | Interactive CLI with async queue architecture (submission→agent, agent→events) (simple way to interact with the agent now)| Serve as reference implementation for other UIs (web, API, programmatic) |
================================================
FILE: agent/__init__.py
================================================
"""
HF Agent - Main agent module
"""
import litellm
# Global LiteLLM behavior — set once at package import so both CLI and
# backend entries share the same config.
# drop_params: quietly drop unsupported params rather than raising
# suppress_debug_info: hide the noisy "Give Feedback" banner on errors
# modify_params: let LiteLLM patch Anthropic's tool-call requirements
# (synthesize a dummy tool spec when we call completion on a history
# that contains tool_calls but aren't passing `tools=` — happens
# during summarization / session seeding).
litellm.drop_params = True
litellm.suppress_debug_info = True
litellm.modify_params = True
from agent.core.agent_loop import submission_loop # noqa: E402
__all__ = ["submission_loop"]
================================================
FILE: agent/config.py
================================================
import json
import os
import re
from pathlib import Path
from typing import Any, Union
from dotenv import load_dotenv
# Project root: two levels up from this file (agent/config.py -> project root)
_PROJECT_ROOT = Path(__file__).resolve().parent.parent
from fastmcp.mcp_config import (
RemoteMCPServer,
StdioMCPServer,
)
from pydantic import BaseModel
# These two are the canonical server config types for MCP servers.
MCPServerConfig = Union[StdioMCPServer, RemoteMCPServer]
class Config(BaseModel):
"""Configuration manager"""
model_name: str
mcpServers: dict[str, MCPServerConfig] = {}
save_sessions: bool = True
session_dataset_repo: str = "akseljoonas/hf-agent-sessions"
auto_save_interval: int = 3 # Save every N user turns (0 = disabled)
yolo_mode: bool = False # Auto-approve all tool calls without confirmation
max_iterations: int = 300 # Max LLM calls per agent turn (-1 = unlimited)
# Permission control parameters
confirm_cpu_jobs: bool = True
auto_file_upload: bool = False
# Reasoning effort *preference* — the ceiling the user wants. The probe
# on `/model` walks a cascade down from here (``max`` → ``xhigh`` → ``high``
# → …) and caches per-model what the provider actually accepted in
# ``Session.model_effective_effort``. Default ``max`` because we'd rather
# burn tokens thinking than ship a wrong ML recipe; the cascade lands on
# whichever level the model supports (``high`` for GPT-5 / HF router,
# ``xhigh`` or ``max`` for Anthropic 4.6 / 4.7). ``None`` = thinking off.
# Valid values: None | "minimal" | "low" | "medium" | "high" | "xhigh" | "max"
reasoning_effort: str | None = "max"
def substitute_env_vars(obj: Any) -> Any:
"""
Recursively substitute environment variables in any data structure.
Supports ${VAR_NAME} syntax for required variables and ${VAR_NAME:-default} for optional.
"""
if isinstance(obj, str):
pattern = r"\$\{([^}:]+)(?::(-)?([^}]*))?\}"
def replacer(match):
var_name = match.group(1)
has_default = match.group(2) is not None
default_value = match.group(3) if has_default else None
env_value = os.environ.get(var_name)
if env_value is not None:
return env_value
elif has_default:
return default_value or ""
else:
raise ValueError(
f"Environment variable '{var_name}' is not set. "
f"Add it to your .env file."
)
return re.sub(pattern, replacer, obj)
elif isinstance(obj, dict):
return {key: substitute_env_vars(value) for key, value in obj.items()}
elif isinstance(obj, list):
return [substitute_env_vars(item) for item in obj]
return obj
def load_config(config_path: str = "config.json") -> Config:
"""
Load configuration with environment variable substitution.
Use ${VAR_NAME} in your JSON for any secret.
Automatically loads from .env file.
"""
# Load .env from project root first (so it works from any directory),
# then CWD .env can override if present
load_dotenv(_PROJECT_ROOT / ".env")
load_dotenv(override=False)
with open(config_path, "r") as f:
raw_config = json.load(f)
config_with_env = substitute_env_vars(raw_config)
return Config.model_validate(config_with_env)
================================================
FILE: agent/context_manager/__init__.py
================================================
"""
Context manager for handling conversation history
"""
from agent.context_manager.manager import ContextManager
__all__ = ["ContextManager"]
================================================
FILE: agent/context_manager/manager.py
================================================
"""
Context management for conversation history
"""
import logging
import os
import zoneinfo
from datetime import datetime
from pathlib import Path
from typing import Any
import yaml
from jinja2 import Template
from litellm import Message, acompletion
from agent.core.prompt_caching import with_prompt_caching
logger = logging.getLogger(__name__)
_HF_WHOAMI_URL = "https://huggingface.co/api/whoami-v2"
_HF_WHOAMI_TIMEOUT = 5 # seconds
def _get_hf_username(hf_token: str | None = None) -> str:
"""Return the HF username for the given token.
Uses subprocess + curl to avoid Python HTTP client IPv6 issues that
cause 40+ second hangs (httpx/urllib try IPv6 first which times out
at OS level before falling back to IPv4 — the "Happy Eyeballs" problem).
"""
import json
import subprocess
import time as _t
if not hf_token:
logger.warning("No hf_token provided, using 'unknown' as username")
return "unknown"
t0 = _t.monotonic()
try:
result = subprocess.run(
[
"curl",
"-s",
"-4", # force IPv4
"-m",
str(_HF_WHOAMI_TIMEOUT), # max time
"-H",
f"Authorization: Bearer {hf_token}",
_HF_WHOAMI_URL,
],
capture_output=True,
text=True,
timeout=_HF_WHOAMI_TIMEOUT + 2,
)
t1 = _t.monotonic()
if result.returncode == 0 and result.stdout:
data = json.loads(result.stdout)
username = data.get("name", "unknown")
logger.info(f"HF username resolved to '{username}' in {t1 - t0:.2f}s")
return username
else:
logger.warning(
f"curl whoami failed (rc={result.returncode}) in {t1 - t0:.2f}s"
)
return "unknown"
except Exception as e:
t1 = _t.monotonic()
logger.warning(f"HF whoami failed in {t1 - t0:.2f}s: {e}")
return "unknown"
_COMPACT_PROMPT = (
"Please provide a concise summary of the conversation above, focusing on "
"key decisions, the 'why' behind the decisions, problems solved, and "
"important context needed for developing further. Your summary will be "
"given to someone who has never worked on this project before and they "
will have to be filled in."
)
# Used when seeding a brand-new session from prior browser-cached messages.
# Here we're writing a note to *ourselves* — so preserve the tool-call trail,
# files produced, and planned next steps in first person. Optimized for
# continuity, not brevity.
_RESTORE_PROMPT = (
"You're about to be restored into a fresh session with no memory of the "
"conversation above. Write a first-person note to your future self so "
"you can continue right where you left off. Include:\n"
" • What the user originally asked for and what progress you've made.\n"
" • Every tool you called, with arguments and a one-line result summary.\n"
" • Any code, files, scripts, or artifacts you produced (with paths).\n"
" • Key decisions and the reasoning behind them.\n"
" • What you were planning to do next.\n\n"
"Don't be cute. Be specific. This is the only context you'll have."
)
async def summarize_messages(
messages: list[Message],
model_name: str,
hf_token: str | None = None,
max_tokens: int = 2000,
tool_specs: list[dict] | None = None,
prompt: str = _COMPACT_PROMPT,
) -> tuple[str, int]:
"""Run a summarization prompt against a list of messages.
``prompt`` defaults to the compaction prompt (terse, decision-focused).
Callers seeding a new session after a restart should pass ``_RESTORE_PROMPT``
instead — it preserves the tool-call trail so the agent can answer
follow-up questions about what it did.
Returns ``(summary_text, completion_tokens)``.
"""
from agent.core.llm_params import _resolve_llm_params
prompt_messages = list(messages) + [Message(role="user", content=prompt)]
llm_params = _resolve_llm_params(model_name, hf_token, reasoning_effort="high")
prompt_messages, tool_specs = with_prompt_caching(
prompt_messages, tool_specs, llm_params.get("model")
)
response = await acompletion(
messages=prompt_messages,
max_completion_tokens=max_tokens,
tools=tool_specs,
**llm_params,
)
summary = response.choices[0].message.content or ""
completion_tokens = response.usage.completion_tokens if response.usage else 0
return summary, completion_tokens
class ContextManager:
"""Manages conversation context and message history for the agent"""
def __init__(
self,
model_max_tokens: int = 180_000,
compact_size: float = 0.1,
untouched_messages: int = 5,
tool_specs: list[dict[str, Any]] | None = None,
prompt_file_suffix: str = "system_prompt_v3.yaml",
hf_token: str | None = None,
local_mode: bool = False,
):
self.system_prompt = self._load_system_prompt(
tool_specs or [],
prompt_file_suffix="system_prompt_v3.yaml",
hf_token=hf_token,
local_mode=local_mode,
)
# The model's real input-token ceiling (from litellm.get_model_info).
# Compaction triggers at _COMPACT_THRESHOLD_RATIO below it — see
# the compaction_threshold property.
self.model_max_tokens = model_max_tokens
self.compact_size = int(model_max_tokens * compact_size)
# Running count of tokens the last LLM call reported. Drives the
# compaction gate; updated in add_message() with each response's
# usage.total_tokens.
self.running_context_usage = 0
self.untouched_messages = untouched_messages
self.items: list[Message] = [Message(role="system", content=self.system_prompt)]
def _load_system_prompt(
self,
tool_specs: list[dict[str, Any]],
prompt_file_suffix: str = "system_prompt.yaml",
hf_token: str | None = None,
local_mode: bool = False,
):
"""Load and render the system prompt from YAML file with Jinja2"""
prompt_file = Path(__file__).parent.parent / "prompts" / f"{prompt_file_suffix}"
with open(prompt_file, "r") as f:
prompt_data = yaml.safe_load(f)
template_str = prompt_data.get("system_prompt", "")
# Get current date and time
tz = zoneinfo.ZoneInfo("Europe/Paris")
now = datetime.now(tz)
current_date = now.strftime("%d-%m-%Y")
current_time = now.strftime("%H:%M:%S.%f")[:-3]
current_timezone = f"{now.strftime('%Z')} (UTC{now.strftime('%z')[:3]}:{now.strftime('%z')[3:]})"
# Get HF user info from OAuth token
hf_user_info = _get_hf_username(hf_token)
template = Template(template_str)
static_prompt = template.render(
tools=tool_specs,
num_tools=len(tool_specs),
)
# CLI-specific context for local mode
if local_mode:
import os
cwd = os.getcwd()
local_context = (
f"\n\n# CLI / Local mode\n\n"
f"You are running as a local CLI tool on the user's machine. "
f"There is NO sandbox — bash, read, write, and edit operate directly "
f"on the local filesystem.\n\n"
f"Working directory: {cwd}\n"
f"Use absolute paths or paths relative to the working directory. "
f"Do NOT use /app/ paths — that is a sandbox convention that does not apply here.\n"
f"The sandbox_create tool is NOT available. Run code directly with bash."
)
static_prompt += local_context
return (
f"{static_prompt}\n\n"
f"[Session context: Date={current_date}, Time={current_time}, "
f"Timezone={current_timezone}, User={hf_user_info}, "
f"Tools={len(tool_specs)}]"
)
def add_message(self, message: Message, token_count: int | None = None) -> None:
"""Add a message to the history"""
if token_count:
self.running_context_usage = token_count
self.items.append(message)
def get_messages(self) -> list[Message]:
"""Get all messages for sending to LLM.
Patches any dangling tool_calls (assistant messages with tool_calls
that have no matching tool-result message) so the LLM API doesn't
reject the request.
"""
self._patch_dangling_tool_calls()
return self.items
@staticmethod
def _normalize_tool_calls(msg: Message) -> None:
"""Ensure msg.tool_calls contains proper ToolCall objects, not dicts.
litellm's Message has validate_assignment=False (Pydantic v2 default),
so direct attribute assignment (e.g. inside litellm's streaming handler)
can leave raw dicts. Re-assigning via the constructor fixes this.
"""
from litellm import ChatCompletionMessageToolCall as ToolCall
tool_calls = getattr(msg, "tool_calls", None)
if not tool_calls:
return
needs_fix = any(isinstance(tc, dict) for tc in tool_calls)
if not needs_fix:
return
msg.tool_calls = [
tc if not isinstance(tc, dict) else ToolCall(**tc) for tc in tool_calls
]
def _patch_dangling_tool_calls(self) -> None:
"""Add stub tool results for any tool_calls that lack a matching result.
Scans backwards to find the last assistant message with tool_calls,
which may not be items[-1] if some tool results were already added.
"""
if not self.items:
return
# Find the last assistant message with tool_calls
assistant_msg = None
for i in range(len(self.items) - 1, -1, -1):
msg = self.items[i]
if getattr(msg, "role", None) == "assistant" and getattr(
msg, "tool_calls", None
):
assistant_msg = msg
break
# Stop scanning once we hit a user message — anything before
# that belongs to a previous (complete) turn.
if getattr(msg, "role", None) == "user":
break
if not assistant_msg:
return
self._normalize_tool_calls(assistant_msg)
answered_ids = {
getattr(m, "tool_call_id", None)
for m in self.items
if getattr(m, "role", None) == "tool"
}
for tc in assistant_msg.tool_calls:
if tc.id not in answered_ids:
self.items.append(
Message(
role="tool",
content="Tool was not executed (interrupted or error).",
tool_call_id=tc.id,
name=tc.function.name,
)
)
def undo_last_turn(self) -> bool:
"""Remove the last complete turn (user msg + all assistant/tool msgs that follow).
Pops from the end until the last user message is removed, keeping the
tool_use/tool_result pairing valid. Never removes the system message.
Returns True if a user message was found and removed.
"""
if len(self.items) <= 1:
return False
while len(self.items) > 1:
msg = self.items.pop()
if getattr(msg, "role", None) == "user":
return True
return False
def truncate_to_user_message(self, user_message_index: int) -> bool:
"""Truncate history to just before the Nth user message (0-indexed).
Removes that user message and everything after it.
System message (index 0) is never removed.
Returns True if the target user message was found and removed.
"""
count = 0
for i, msg in enumerate(self.items):
if i == 0:
continue # skip system message
if getattr(msg, "role", None) == "user":
if count == user_message_index:
self.items = self.items[:i]
return True
count += 1
return False
# Compaction fires at 90% of model_max_tokens so there's headroom for
# the next turn's prompt + response before we actually hit the ceiling.
_COMPACT_THRESHOLD_RATIO = 0.9
@property
def compaction_threshold(self) -> int:
"""Token count at which `compact()` kicks in."""
return int(self.model_max_tokens * self._COMPACT_THRESHOLD_RATIO)
@property
def needs_compaction(self) -> bool:
return self.running_context_usage > self.compaction_threshold and bool(self.items)
async def compact(
self,
model_name: str,
tool_specs: list[dict] | None = None,
hf_token: str | None = None,
) -> None:
"""Remove old messages to keep history under target size"""
if not self.needs_compaction:
return
system_msg = (
self.items[0] if self.items and self.items[0].role == "system" else None
)
# Preserve the first user message (task prompt) — never summarize it
first_user_msg = None
first_user_idx = 1
for i in range(1, len(self.items)):
if getattr(self.items[i], "role", None) == "user":
first_user_msg = self.items[i]
first_user_idx = i
break
# Don't summarize a certain number of just-preceding messages
# Walk back to find a user message to make sure we keep an assistant -> user ->
# assistant general conversation structure
idx = len(self.items) - self.untouched_messages
while idx > 1 and self.items[idx].role != "user":
idx -= 1
recent_messages = self.items[idx:]
messages_to_summarize = self.items[first_user_idx + 1:idx]
# improbable, messages would have to be very long
if not messages_to_summarize:
return
summary, completion_tokens = await summarize_messages(
messages_to_summarize,
model_name=model_name,
hf_token=hf_token,
max_tokens=self.compact_size,
tool_specs=tool_specs,
prompt=_COMPACT_PROMPT,
)
summarized_message = Message(role="assistant", content=summary)
# Reconstruct: system + first user msg + summary + recent messages
head = [system_msg] if system_msg else []
if first_user_msg:
head.append(first_user_msg)
self.items = head + [summarized_message] + recent_messages
# Count the actual post-compact context — system prompt + first user
# turn + summary + the preserved tail all contribute, not just the
# summary. litellm.token_counter uses the model's real tokenizer.
from litellm import token_counter
try:
self.running_context_usage = token_counter(
model=model_name,
messages=[m.model_dump() for m in self.items],
)
except Exception as e:
logger.warning("token_counter failed post-compact (%s); falling back to rough estimate", e)
self.running_context_usage = len(self.system_prompt) // 4 + completion_tokens
================================================
FILE: agent/core/__init__.py
================================================
"""
Core agent implementation
Contains the main agent logic, decision-making, and orchestration
"""
from agent.core.tools import ToolRouter, ToolSpec, create_builtin_tools
__all__ = [
"ToolRouter",
"ToolSpec",
"create_builtin_tools",
]
================================================
FILE: agent/core/agent_loop.py
================================================
"""loop
Main agent implementation with integrated tool system and MCP support
"""
import asyncio
import json
import logging
import os
from dataclasses import dataclass
from litellm import ChatCompletionMessageToolCall, Message, acompletion
from litellm.exceptions import ContextWindowExceededError
from agent.config import Config
from agent.core.doom_loop import check_for_doom_loop
from agent.core.llm_params import _resolve_llm_params
from agent.core.prompt_caching import with_prompt_caching
from agent.core.session import Event, OpType, Session
from agent.core.tools import ToolRouter
from agent.tools.jobs_tool import CPU_FLAVORS
logger = logging.getLogger(__name__)
ToolCall = ChatCompletionMessageToolCall
def _validate_tool_args(tool_args: dict) -> tuple[bool, str | None]:
"""
Validate tool arguments structure.
Returns:
(is_valid, error_message)
"""
args = tool_args.get("args", {})
# Sometimes LLM passes args as string instead of dict
if isinstance(args, str):
return (
False,
f"Tool call error: 'args' must be a JSON object, not a string. You passed: {repr(args)}",
)
if not isinstance(args, dict) and args is not None:
return (
False,
f"Tool call error: 'args' must be a JSON object. You passed type: {type(args).__name__}",
)
return True, None
def _needs_approval(
tool_name: str, tool_args: dict, config: Config | None = None
) -> bool:
"""Check if a tool call requires user approval before execution."""
# Yolo mode: skip all approvals
if config and config.yolo_mode:
return False
# If args are malformed, skip approval (validation error will be shown later)
args_valid, _ = _validate_tool_args(tool_args)
if not args_valid:
return False
if tool_name == "sandbox_create":
return True
if tool_name == "hf_jobs":
operation = tool_args.get("operation", "")
if operation not in ["run", "uv", "scheduled run", "scheduled uv"]:
return False
# Check if this is a CPU-only job
# hardware_flavor is at top level of tool_args, not nested in args
hardware_flavor = (
tool_args.get("hardware_flavor")
or tool_args.get("flavor")
or tool_args.get("hardware")
or "cpu-basic"
)
is_cpu_job = hardware_flavor in CPU_FLAVORS
if is_cpu_job:
if config and not config.confirm_cpu_jobs:
return False
return True
return True
# Check for file upload operations (hf_private_repos or other tools)
if tool_name == "hf_private_repos":
operation = tool_args.get("operation", "")
if operation == "upload_file":
if config and config.auto_file_upload:
return False
return True
# Other operations (create_repo, etc.) always require approval
if operation in ["create_repo"]:
return True
# hf_repo_files: upload (can overwrite) and delete require approval
if tool_name == "hf_repo_files":
operation = tool_args.get("operation", "")
if operation in ["upload", "delete"]:
return True
# hf_repo_git: destructive operations require approval
if tool_name == "hf_repo_git":
operation = tool_args.get("operation", "")
if operation in [
"delete_branch",
"delete_tag",
"merge_pr",
"create_repo",
"update_repo",
]:
return True
return False
# -- LLM retry constants --------------------------------------------------
_MAX_LLM_RETRIES = 3
_LLM_RETRY_DELAYS = [5, 15, 30] # seconds between retries
def _is_transient_error(error: Exception) -> bool:
"""Return True for errors that are likely transient and worth retrying."""
err_str = str(error).lower()
transient_patterns = [
"timeout", "timed out",
"429", "rate limit", "rate_limit",
"503", "service unavailable",
"502", "bad gateway",
"500", "internal server error",
"overloaded", "capacity",
"connection reset", "connection refused", "connection error",
"eof", "broken pipe",
]
return any(pattern in err_str for pattern in transient_patterns)
def _is_effort_config_error(error: Exception) -> bool:
"""Catch the two 400s the effort probe also handles — thinking
unsupported for this model, or the specific effort level invalid.
This is our safety net for the case where ``/effort`` was changed
mid-conversation (which clears the probe cache) and the new level
doesn't work for the current model. We heal the cache and retry once.
"""
from agent.core.effort_probe import _is_invalid_effort, _is_thinking_unsupported
return _is_thinking_unsupported(error) or _is_invalid_effort(error)
async def _heal_effort_and_rebuild_params(
session: Session, error: Exception, llm_params: dict,
) -> dict:
"""Update the session's effort cache based on ``error`` and return new
llm_params. Called only when ``_is_effort_config_error(error)`` is True.
Two branches:
• thinking-unsupported → cache ``None`` for this model, next call
strips thinking entirely
• invalid-effort → re-run the full cascade probe; the result lands
in the cache
"""
from agent.core.effort_probe import ProbeInconclusive, _is_thinking_unsupported, probe_effort
model = session.config.model_name
if _is_thinking_unsupported(error):
session.model_effective_effort[model] = None
logger.info("healed: %s doesn't support thinking — stripped", model)
else:
try:
outcome = await probe_effort(
model, session.config.reasoning_effort, session.hf_token,
)
session.model_effective_effort[model] = outcome.effective_effort
logger.info(
"healed: %s effort cascade → %s", model, outcome.effective_effort,
)
except ProbeInconclusive:
# Transient during healing — strip thinking for safety, next
# call will either succeed or surface the real error.
session.model_effective_effort[model] = None
logger.info("healed: %s probe inconclusive — stripped", model)
return _resolve_llm_params(
model,
session.hf_token,
reasoning_effort=session.effective_effort_for(model),
)
def _friendly_error_message(error: Exception) -> str | None:
"""Return a user-friendly message for known error types, or None to fall back to traceback."""
err_str = str(error).lower()
if "authentication" in err_str or "unauthorized" in err_str or "invalid x-api-key" in err_str:
return (
"Authentication failed — your API key is missing or invalid.\n\n"
"To fix this, set the API key for your model provider:\n"
" • Anthropic: export ANTHROPIC_API_KEY=sk-...\n"
" • OpenAI: export OPENAI_API_KEY=sk-...\n"
" • HF Router: export HF_TOKEN=hf_...\n\n"
"You can also add it to a .env file in the project root.\n"
"To switch models, use the /model command."
)
if "insufficient" in err_str and "credit" in err_str:
return (
"Insufficient API credits. Please check your account balance "
"at your model provider's dashboard."
)
if "not supported by provider" in err_str or "no provider supports" in err_str:
return (
"The model isn't served by the provider you pinned.\n\n"
"Drop the ':<provider>' suffix to let the HF router auto-pick a "
"provider, or use '/model' (no arg) to see which providers host "
"which models."
)
if "model_not_found" in err_str or (
"model" in err_str
and ("not found" in err_str or "does not exist" in err_str)
):
return (
"Model not found. Use '/model' to list suggestions, or paste an "
"HF model id like 'MiniMaxAI/MiniMax-M2.7'. Availability is shown "
"when you switch."
)
return None
async def _compact_and_notify(session: Session) -> None:
"""Run compaction and send event if context was reduced."""
cm = session.context_manager
old_usage = cm.running_context_usage
logger.debug(
"Compaction check: usage=%d, max=%d, threshold=%d, needs_compact=%s",
old_usage, cm.model_max_tokens, cm.compaction_threshold, cm.needs_compaction,
)
await cm.compact(
model_name=session.config.model_name,
tool_specs=session.tool_router.get_tool_specs_for_llm(),
hf_token=session.hf_token,
)
new_usage = cm.running_context_usage
if new_usage != old_usage:
logger.warning(
"Context compacted: %d -> %d tokens (max=%d, %d messages)",
old_usage, new_usage, cm.model_max_tokens, len(cm.items),
)
await session.send_event(
Event(
event_type="compacted",
data={"old_tokens": old_usage, "new_tokens": new_usage},
)
)
async def _cleanup_on_cancel(session: Session) -> None:
"""Kill sandbox processes and cancel HF jobs when the user interrupts."""
# Kill active sandbox processes
sandbox = getattr(session, "sandbox", None)
if sandbox:
try:
await asyncio.to_thread(sandbox.kill_all)
logger.info("Killed sandbox processes on cancel")
except Exception as e:
logger.warning("Failed to kill sandbox processes: %s", e)
# Cancel running HF jobs
job_ids = list(session._running_job_ids)
if job_ids:
from huggingface_hub import HfApi
api = HfApi(token=session.hf_token)
for job_id in job_ids:
try:
await asyncio.to_thread(api.cancel_job, job_id=job_id)
logger.info("Cancelled HF job %s on interrupt", job_id)
except Exception as e:
logger.warning("Failed to cancel HF job %s: %s", job_id, e)
session._running_job_ids.clear()
@dataclass
class LLMResult:
"""Result from an LLM call (streaming or non-streaming)."""
content: str | None
tool_calls_acc: dict[int, dict]
token_count: int
finish_reason: str | None
async def _call_llm_streaming(session: Session, messages, tools, llm_params) -> LLMResult:
"""Call the LLM with streaming, emitting assistant_chunk events."""
response = None
_healed_effort = False # one-shot safety net per call
messages, tools = with_prompt_caching(messages, tools, llm_params.get("model"))
for _llm_attempt in range(_MAX_LLM_RETRIES):
try:
response = await acompletion(
messages=messages,
tools=tools,
tool_choice="auto",
stream=True,
stream_options={"include_usage": True},
timeout=600,
**llm_params,
)
break
except ContextWindowExceededError:
raise
except Exception as e:
if not _healed_effort and _is_effort_config_error(e):
_healed_effort = True
llm_params = await _heal_effort_and_rebuild_params(session, e, llm_params)
await session.send_event(Event(
event_type="tool_log",
data={"tool": "system", "log": "Reasoning effort not supported for this model — adjusting and retrying."},
))
continue
if _llm_attempt < _MAX_LLM_RETRIES - 1 and _is_transient_error(e):
_delay = _LLM_RETRY_DELAYS[_llm_attempt]
logger.warning(
"Transient LLM error (attempt %d/%d): %s — retrying in %ds",
_llm_attempt + 1, _MAX_LLM_RETRIES, e, _delay,
)
await session.send_event(Event(
event_type="tool_log",
data={"tool": "system", "log": f"LLM connection error, retrying in {_delay}s..."},
))
await asyncio.sleep(_delay)
continue
raise
full_content = ""
tool_calls_acc: dict[int, dict] = {}
token_count = 0
finish_reason = None
async for chunk in response:
if session.is_cancelled:
tool_calls_acc.clear()
break
choice = chunk.choices[0] if chunk.choices else None
if not choice:
if hasattr(chunk, "usage") and chunk.usage:
token_count = chunk.usage.total_tokens
continue
delta = choice.delta
if choice.finish_reason:
finish_reason = choice.finish_reason
if delta.content:
full_content += delta.content
await session.send_event(
Event(event_type="assistant_chunk", data={"content": delta.content})
)
if delta.tool_calls:
for tc_delta in delta.tool_calls:
idx = tc_delta.index
if idx not in tool_calls_acc:
tool_calls_acc[idx] = {
"id": "", "type": "function",
"function": {"name": "", "arguments": ""},
}
if tc_delta.id:
tool_calls_acc[idx]["id"] = tc_delta.id
if tc_delta.function:
if tc_delta.function.name:
tool_calls_acc[idx]["function"]["name"] += tc_delta.function.name
if tc_delta.function.arguments:
tool_calls_acc[idx]["function"]["arguments"] += tc_delta.function.arguments
if hasattr(chunk, "usage") and chunk.usage:
token_count = chunk.usage.total_tokens
return LLMResult(
content=full_content or None,
tool_calls_acc=tool_calls_acc,
token_count=token_count,
finish_reason=finish_reason,
)
async def _call_llm_non_streaming(session: Session, messages, tools, llm_params) -> LLMResult:
"""Call the LLM without streaming, emit assistant_message at the end."""
response = None
_healed_effort = False
messages, tools = with_prompt_caching(messages, tools, llm_params.get("model"))
for _llm_attempt in range(_MAX_LLM_RETRIES):
try:
response = await acompletion(
messages=messages,
tools=tools,
tool_choice="auto",
stream=False,
timeout=600,
**llm_params,
)
break
except ContextWindowExceededError:
raise
except Exception as e:
if not _healed_effort and _is_effort_config_error(e):
_healed_effort = True
llm_params = await _heal_effort_and_rebuild_params(session, e, llm_params)
await session.send_event(Event(
event_type="tool_log",
data={"tool": "system", "log": "Reasoning effort not supported for this model — adjusting and retrying."},
))
continue
if _llm_attempt < _MAX_LLM_RETRIES - 1 and _is_transient_error(e):
_delay = _LLM_RETRY_DELAYS[_llm_attempt]
logger.warning(
"Transient LLM error (attempt %d/%d): %s — retrying in %ds",
_llm_attempt + 1, _MAX_LLM_RETRIES, e, _delay,
)
await session.send_event(Event(
event_type="tool_log",
data={"tool": "system", "log": f"LLM connection error, retrying in {_delay}s..."},
))
await asyncio.sleep(_delay)
continue
raise
choice = response.choices[0]
message = choice.message
content = message.content or None
finish_reason = choice.finish_reason
token_count = response.usage.total_tokens if response.usage else 0
# Build tool_calls_acc in the same format as streaming
tool_calls_acc: dict[int, dict] = {}
if message.tool_calls:
for idx, tc in enumerate(message.tool_calls):
tool_calls_acc[idx] = {
"id": tc.id,
"type": "function",
"function": {
"name": tc.function.name,
"arguments": tc.function.arguments,
},
}
# Emit the full message as a single event
if content:
await session.send_event(
Event(event_type="assistant_message", data={"content": content})
)
return LLMResult(
content=content,
tool_calls_acc=tool_calls_acc,
token_count=token_count,
finish_reason=finish_reason,
)
class Handlers:
"""Handler functions for each operation type"""
@staticmethod
async def _abandon_pending_approval(session: Session) -> None:
"""Cancel pending approval tools when the user continues the conversation.
Injects rejection tool-result messages into the LLM context (so the
history stays valid) and notifies the frontend that those tools were
abandoned.
"""
tool_calls = session.pending_approval.get("tool_calls", [])
for tc in tool_calls:
tool_name = tc.function.name
abandon_msg = (
"Task abandoned — user continued the conversation without approving."
)
# Keep LLM context valid: every tool_call needs a tool result
tool_msg = Message(
role="tool",
content=abandon_msg,
tool_call_id=tc.id,
name=tool_name,
)
session.context_manager.add_message(tool_msg)
await session.send_event(
Event(
event_type="tool_state_change",
data={
"tool_call_id": tc.id,
"tool": tool_name,
"state": "abandoned",
},
)
)
session.pending_approval = None
logger.info("Abandoned %d pending approval tool(s)", len(tool_calls))
@staticmethod
async def run_agent(
session: Session, text: str,
) -> str | None:
"""
Handle user input (like user_input_or_turn in codex.rs:1291)
Returns the final assistant response content, if any.
"""
# Clear any stale cancellation flag from a previous run
session.reset_cancel()
# If there's a pending approval and the user sent a new message,
# abandon the pending tools so the LLM context stays valid.
if text and session.pending_approval:
await Handlers._abandon_pending_approval(session)
# Add user message to history only if there's actual content
if text:
user_msg = Message(role="user", content=text)
session.context_manager.add_message(user_msg)
# Send event that we're processing
await session.send_event(
Event(event_type="processing", data={"message": "Processing user input"})
)
# Agentic loop - continue until model doesn't call tools or max iterations is reached
iteration = 0
final_response = None
errored = False
max_iterations = session.config.max_iterations
while max_iterations == -1 or iteration < max_iterations:
# ── Cancellation check: before LLM call ──
if session.is_cancelled:
break
# Compact before calling the LLM if context is near the limit
await _compact_and_notify(session)
# Doom-loop detection: break out of repeated tool call patterns
doom_prompt = check_for_doom_loop(session.context_manager.items)
if doom_prompt:
session.context_manager.add_message(
Message(role="user", content=doom_prompt)
)
await session.send_event(
Event(
event_type="tool_log",
data={
"tool": "system",
"log": "Doom loop detected — injecting corrective prompt",
},
)
)
messages = session.context_manager.get_messages()
tools = session.tool_router.get_tool_specs_for_llm()
try:
# ── Call the LLM (streaming or non-streaming) ──
# Pull the per-model probed effort from the session cache when
# available; fall back to the raw preference for models we
# haven't probed yet (e.g. research sub-model).
llm_params = _resolve_llm_params(
session.config.model_name,
session.hf_token,
reasoning_effort=session.effective_effort_for(session.config.model_name),
)
if session.stream:
llm_result = await _call_llm_streaming(session, messages, tools, llm_params)
else:
llm_result = await _call_llm_non_streaming(session, messages, tools, llm_params)
content = llm_result.content
tool_calls_acc = llm_result.tool_calls_acc
token_count = llm_result.token_count
finish_reason = llm_result.finish_reason
# If output was truncated, all tool call args are garbage.
# Inject a system hint so the LLM retries with smaller content.
if finish_reason == "length" and tool_calls_acc:
dropped_names = [
tc["function"]["name"]
for tc in tool_calls_acc.values()
if tc["function"]["name"]
]
logger.warning(
"Output truncated (finish_reason=length) — dropping tool calls: %s",
dropped_names,
)
tool_calls_acc.clear()
# Tell the agent what happened so it can retry differently
truncation_hint = (
"Your previous response was truncated because the output hit the "
"token limit. The following tool calls were lost: "
f"{dropped_names}. "
"IMPORTANT: Do NOT retry with the same large content. Instead:\n"
" • For 'write': use bash with cat<<'HEREDOC' to write the file, "
"or split into several smaller edit calls.\n"
" • For other tools: reduce the size of your arguments or use bash."
)
if content:
assistant_msg = Message(role="assistant", content=content)
session.context_manager.add_message(assistant_msg, token_count)
session.context_manager.add_message(
Message(role="user", content=f"[SYSTEM: {truncation_hint}]")
)
if session.stream:
await session.send_event(
Event(event_type="assistant_stream_end", data={})
)
await session.send_event(
Event(
event_type="tool_log",
data={"tool": "system", "log": f"Output truncated — retrying with smaller content ({dropped_names})"},
)
)
iteration += 1
continue # retry this iteration
# Build tool_calls list from accumulated deltas
tool_calls: list[ToolCall] = []
for idx in sorted(tool_calls_acc.keys()):
tc_data = tool_calls_acc[idx]
tool_calls.append(
ToolCall(
id=tc_data["id"],
type="function",
function={
"name": tc_data["function"]["name"],
"arguments": tc_data["function"]["arguments"],
},
)
)
# Signal end of streaming to the frontend
if session.stream:
await session.send_event(
Event(event_type="assistant_stream_end", data={})
)
# If no tool calls, add assistant message and we're done
if not tool_calls:
logger.debug(
"Agent loop ending: no tool calls. "
"finish_reason=%s, token_count=%d, "
"usage=%d, model_max_tokens=%d, "
"iteration=%d/%d, "
"response_text=%s",
finish_reason,
token_count,
session.context_manager.running_context_usage,
session.context_manager.model_max_tokens,
iteration,
max_iterations,
(content or "")[:500],
)
if content:
assistant_msg = Message(role="assistant", content=content)
session.context_manager.add_message(assistant_msg, token_count)
final_response = content
break
                # Validate tool call args (a single json.loads per call)
                # and split them into good vs bad
good_tools: list[tuple[ToolCall, str, dict]] = []
bad_tools: list[ToolCall] = []
for tc in tool_calls:
try:
args = json.loads(tc.function.arguments)
good_tools.append((tc, tc.function.name, args))
except (json.JSONDecodeError, TypeError, ValueError):
logger.warning(
"Malformed arguments for tool_call %s (%s) — skipping",
tc.id, tc.function.name,
)
tc.function.arguments = "{}"
bad_tools.append(tc)
# Add assistant message with all tool calls to context
assistant_msg = Message(
role="assistant",
content=content,
tool_calls=tool_calls,
)
session.context_manager.add_message(assistant_msg, token_count)
# Add error results for bad tool calls so the LLM
# knows what happened and can retry differently
for tc in bad_tools:
error_msg = (
f"ERROR: Tool call to '{tc.function.name}' had malformed JSON "
f"arguments and was NOT executed. Retry with smaller content — "
f"for 'write', split into multiple smaller writes using 'edit'."
)
session.context_manager.add_message(Message(
role="tool",
content=error_msg,
tool_call_id=tc.id,
name=tc.function.name,
))
await session.send_event(Event(
event_type="tool_call",
data={"tool": tc.function.name, "arguments": {}, "tool_call_id": tc.id},
))
await session.send_event(Event(
event_type="tool_output",
data={"tool": tc.function.name, "tool_call_id": tc.id, "output": error_msg, "success": False},
))
# ── Cancellation check: before tool execution ──
if session.is_cancelled:
break
# Separate good tools into approval-required vs auto-execute
approval_required_tools: list[tuple[ToolCall, str, dict]] = []
non_approval_tools: list[tuple[ToolCall, str, dict]] = []
for tc, tool_name, tool_args in good_tools:
if _needs_approval(tool_name, tool_args, session.config):
approval_required_tools.append((tc, tool_name, tool_args))
else:
non_approval_tools.append((tc, tool_name, tool_args))
# Execute non-approval tools (in parallel when possible)
if non_approval_tools:
# 1. Validate args upfront
parsed_tools: list[
tuple[ToolCall, str, dict, bool, str]
] = []
for tc, tool_name, tool_args in non_approval_tools:
args_valid, error_msg = _validate_tool_args(tool_args)
parsed_tools.append(
(tc, tool_name, tool_args, args_valid, error_msg)
)
# 2. Send all tool_call events upfront (so frontend shows them all)
for tc, tool_name, tool_args, args_valid, _ in parsed_tools:
if args_valid:
await session.send_event(
Event(
event_type="tool_call",
data={
"tool": tool_name,
"arguments": tool_args,
"tool_call_id": tc.id,
},
)
)
# 3. Execute all valid tools in parallel, cancellable
async def _exec_tool(
tc: ToolCall,
name: str,
args: dict,
valid: bool,
err: str,
) -> tuple[ToolCall, str, dict, str, bool]:
if not valid:
return (tc, name, args, err, False)
out, ok = await session.tool_router.call_tool(
name, args, session=session, tool_call_id=tc.id
)
return (tc, name, args, out, ok)
gather_task = asyncio.ensure_future(asyncio.gather(
*[
_exec_tool(tc, name, args, valid, err)
for tc, name, args, valid, err in parsed_tools
]
))
cancel_task = asyncio.ensure_future(session._cancelled.wait())
done, _ = await asyncio.wait(
[gather_task, cancel_task],
return_when=asyncio.FIRST_COMPLETED,
)
if cancel_task in done:
gather_task.cancel()
try:
await gather_task
except asyncio.CancelledError:
pass
# Notify frontend that in-flight tools were cancelled
for tc, name, _args, valid, _ in parsed_tools:
if valid:
await session.send_event(Event(
event_type="tool_state_change",
data={"tool_call_id": tc.id, "tool": name, "state": "cancelled"},
))
await _cleanup_on_cancel(session)
break
cancel_task.cancel()
results = gather_task.result()
# 4. Record results and send outputs (order preserved)
for tc, tool_name, tool_args, output, success in results:
tool_msg = Message(
role="tool",
content=output,
tool_call_id=tc.id,
name=tool_name,
)
session.context_manager.add_message(tool_msg)
await session.send_event(
Event(
event_type="tool_output",
data={
"tool": tool_name,
"tool_call_id": tc.id,
"output": output,
"success": success,
},
)
)
# If there are tools requiring approval, ask for batch approval
if approval_required_tools:
# Prepare batch approval data
tools_data = []
for tc, tool_name, tool_args in approval_required_tools:
# Resolve sandbox file paths for hf_jobs scripts so the
# frontend can display & edit the actual file content.
if tool_name == "hf_jobs" and isinstance(tool_args.get("script"), str):
from agent.tools.sandbox_tool import resolve_sandbox_script
sandbox = getattr(session, "sandbox", None)
resolved, _ = await resolve_sandbox_script(sandbox, tool_args["script"])
if resolved:
tool_args = {**tool_args, "script": resolved}
tools_data.append({
"tool": tool_name,
"arguments": tool_args,
"tool_call_id": tc.id,
})
await session.send_event(Event(
event_type="approval_required",
data={"tools": tools_data, "count": len(tools_data)},
))
# Store all approval-requiring tools (ToolCall objects for execution)
session.pending_approval = {
"tool_calls": [tc for tc, _, _ in approval_required_tools],
}
# Return early - wait for EXEC_APPROVAL operation
return None
iteration += 1
except ContextWindowExceededError:
# Force compact and retry this iteration
cm = session.context_manager
logger.warning(
"ContextWindowExceededError at iteration %d — forcing compaction "
"(usage=%d, model_max_tokens=%d, messages=%d)",
iteration, cm.running_context_usage, cm.model_max_tokens, len(cm.items),
)
cm.running_context_usage = cm.model_max_tokens + 1
await _compact_and_notify(session)
continue
except Exception as e:
import traceback
error_msg = _friendly_error_message(e)
if error_msg is None:
error_msg = str(e) + "\n" + traceback.format_exc()
await session.send_event(
Event(
event_type="error",
data={"error": error_msg},
)
)
errored = True
break
if session.is_cancelled:
await _cleanup_on_cancel(session)
await session.send_event(Event(event_type="interrupted"))
elif not errored:
await session.send_event(
Event(
event_type="turn_complete",
data={"history_size": len(session.context_manager.items)},
)
)
# Increment turn counter and check for auto-save
session.increment_turn()
await session.auto_save_if_needed()
return final_response
@staticmethod
async def undo(session: Session) -> None:
"""Remove the last complete turn and notify the frontend."""
removed = session.context_manager.undo_last_turn()
if not removed:
logger.warning("Undo: no user message found to remove")
await session.send_event(Event(event_type="undo_complete"))
@staticmethod
async def exec_approval(session: Session, approvals: list[dict]) -> None:
"""Handle batch job execution approval"""
if not session.pending_approval:
await session.send_event(
Event(
event_type="error",
data={"error": "No pending approval to process"},
)
)
return
tool_calls = session.pending_approval.get("tool_calls", [])
if not tool_calls:
await session.send_event(
Event(
event_type="error",
data={"error": "No pending tool calls found"},
)
)
return
# Create a map of tool_call_id -> approval decision
approval_map = {a["tool_call_id"]: a for a in approvals}
for a in approvals:
if a.get("edited_script"):
logger.info(
f"Received edited script for tool_call {a['tool_call_id']} ({len(a['edited_script'])} chars)"
)
# Separate approved and rejected tool calls
approved_tasks = []
rejected_tasks = []
for tc in tool_calls:
tool_name = tc.function.name
try:
tool_args = json.loads(tc.function.arguments)
except (json.JSONDecodeError, TypeError) as e:
# Malformed arguments — treat as failed, notify agent
logger.warning(f"Malformed tool arguments for {tool_name}: {e}")
tool_msg = Message(
role="tool",
content=f"Malformed arguments: {e}",
tool_call_id=tc.id,
name=tool_name,
)
session.context_manager.add_message(tool_msg)
await session.send_event(
Event(
event_type="tool_output",
data={
"tool": tool_name,
"tool_call_id": tc.id,
"output": f"Malformed arguments: {e}",
"success": False,
},
)
)
continue
approval_decision = approval_map.get(tc.id, {"approved": False})
if approval_decision.get("approved", False):
edited_script = approval_decision.get("edited_script")
was_edited = False
if edited_script and "script" in tool_args:
tool_args["script"] = edited_script
was_edited = True
logger.info(f"Using user-edited script for {tool_name} ({tc.id})")
approved_tasks.append((tc, tool_name, tool_args, was_edited))
else:
rejected_tasks.append((tc, tool_name, approval_decision))
# Clear pending approval immediately so a page refresh during
# execution won't re-show the approval dialog.
session.pending_approval = None
# Notify frontend of approval decisions immediately (before execution)
for tc, tool_name, tool_args, _was_edited in approved_tasks:
await session.send_event(
Event(
event_type="tool_state_change",
data={
"tool_call_id": tc.id,
"tool": tool_name,
"state": "approved",
},
)
)
for tc, tool_name, approval_decision in rejected_tasks:
await session.send_event(
Event(
event_type="tool_state_change",
data={
"tool_call_id": tc.id,
"tool": tool_name,
"state": "rejected",
},
)
)
        # Helper to execute a single approved tool; the concurrent, cancellable
        # run over all approved tools happens below.
async def execute_tool(tc, tool_name, tool_args, was_edited):
"""Execute a single tool and return its result.
The TraceLog already exists on the frontend (created by
approval_required), so we send tool_state_change instead of
tool_call to avoid creating a duplicate.
"""
await session.send_event(
Event(
event_type="tool_state_change",
data={
"tool_call_id": tc.id,
"tool": tool_name,
"state": "running",
},
)
)
output, success = await session.tool_router.call_tool(
tool_name, tool_args, session=session, tool_call_id=tc.id
)
return (tc, tool_name, output, success, was_edited)
# Execute all approved tools concurrently (cancellable)
if approved_tasks:
gather_task = asyncio.ensure_future(asyncio.gather(
*[
execute_tool(tc, tool_name, tool_args, was_edited)
for tc, tool_name, tool_args, was_edited in approved_tasks
],
return_exceptions=True,
))
cancel_task = asyncio.ensure_future(session._cancelled.wait())
done, _ = await asyncio.wait(
[gather_task, cancel_task],
return_when=asyncio.FIRST_COMPLETED,
)
if cancel_task in done:
gather_task.cancel()
try:
await gather_task
except asyncio.CancelledError:
pass
# Notify frontend that approved tools were cancelled
for tc, tool_name, _args, _was_edited in approved_tasks:
await session.send_event(Event(
event_type="tool_state_change",
data={"tool_call_id": tc.id, "tool": tool_name, "state": "cancelled"},
))
await _cleanup_on_cancel(session)
await session.send_event(Event(event_type="interrupted"))
session.increment_turn()
await session.auto_save_if_needed()
return
cancel_task.cancel()
results = gather_task.result()
            # Process results and add to context. asyncio.gather preserves input
            # order, so results line up one-to-one with approved_tasks.
            for (tc, tool_name, _tool_args, _was_edited), result in zip(approved_tasks, results):
                if isinstance(result, Exception):
                    # Record the failure as the tool result so this tool_call
                    # isn't left without a response in the LLM context.
                    logger.error(f"Tool execution error: {result}")
                    output, success, was_edited = (
                        f"ERROR: tool execution failed: {result}", False, False,
                    )
                else:
                    tc, tool_name, output, success, was_edited = result
if was_edited:
output = f"[Note: The user edited the script before execution. The output below reflects the user-modified version, not your original script.]\n\n{output}"
# Add tool result to context
tool_msg = Message(
role="tool",
content=output,
tool_call_id=tc.id,
name=tool_name,
)
session.context_manager.add_message(tool_msg)
await session.send_event(
Event(
event_type="tool_output",
data={
"tool": tool_name,
"tool_call_id": tc.id,
"output": output,
"success": success,
},
)
)
# Process rejected tools
for tc, tool_name, approval_decision in rejected_tasks:
rejection_msg = "Job execution cancelled by user"
user_feedback = approval_decision.get("feedback")
if user_feedback:
# Ensure feedback is a string and sanitize any problematic characters
feedback_str = str(user_feedback).strip()
# Remove any control characters that might break JSON parsing
feedback_str = "".join(
char for char in feedback_str if ord(char) >= 32 or char in "\n\t"
)
rejection_msg += f". User feedback: {feedback_str}"
# Ensure rejection_msg is a clean string
rejection_msg = str(rejection_msg).strip()
tool_msg = Message(
role="tool",
content=rejection_msg,
tool_call_id=tc.id,
name=tool_name,
)
session.context_manager.add_message(tool_msg)
await session.send_event(
Event(
event_type="tool_output",
data={
"tool": tool_name,
"tool_call_id": tc.id,
"output": rejection_msg,
"success": False,
},
)
)
# Continue agent loop with empty input to process the tool results
await Handlers.run_agent(session, "")
@staticmethod
async def shutdown(session: Session) -> bool:
"""Handle shutdown (like shutdown in codex.rs:1329)"""
# Save session trajectory if enabled (fire-and-forget, returns immediately)
if session.config.save_sessions:
logger.info("Saving session...")
repo_id = session.config.session_dataset_repo
_ = session.save_and_upload_detached(repo_id)
session.is_running = False
await session.send_event(Event(event_type="shutdown"))
return True
async def process_submission(session: Session, submission) -> bool:
"""
Process a single submission and return whether to continue running.
Returns:
bool: True to continue, False to shutdown
"""
op = submission.operation
logger.debug("Received operation: %s", op.op_type.value)
if op.op_type == OpType.USER_INPUT:
text = op.data.get("text", "") if op.data else ""
await Handlers.run_agent(session, text)
return True
if op.op_type == OpType.COMPACT:
await _compact_and_notify(session)
return True
if op.op_type == OpType.UNDO:
await Handlers.undo(session)
return True
if op.op_type == OpType.EXEC_APPROVAL:
approvals = op.data.get("approvals", []) if op.data else []
await Handlers.exec_approval(session, approvals)
return True
if op.op_type == OpType.SHUTDOWN:
return not await Handlers.shutdown(session)
logger.warning(f"Unknown operation: {op.op_type}")
return True
async def submission_loop(
submission_queue: asyncio.Queue,
event_queue: asyncio.Queue,
config: Config | None = None,
tool_router: ToolRouter | None = None,
session_holder: list | None = None,
hf_token: str | None = None,
local_mode: bool = False,
stream: bool = True,
) -> None:
"""
Main agent loop - processes submissions and dispatches to handlers.
This is the core of the agent (like submission_loop in codex.rs:1259-1340)
"""
# Create session with tool router
session = Session(
event_queue, config=config, tool_router=tool_router, hf_token=hf_token,
local_mode=local_mode, stream=stream,
)
if session_holder is not None:
session_holder[0] = session
logger.info("Agent loop started")
# Retry any failed uploads from previous sessions (fire-and-forget)
if config and config.save_sessions:
Session.retry_failed_uploads_detached(
directory="session_logs", repo_id=config.session_dataset_repo
)
try:
# Main processing loop
async with tool_router:
# Emit ready event after initialization
await session.send_event(
Event(event_type="ready", data={
"message": "Agent initialized",
"tool_count": len(tool_router.tools),
})
)
while session.is_running:
submission = await submission_queue.get()
try:
should_continue = await process_submission(session, submission)
if not should_continue:
break
except asyncio.CancelledError:
logger.warning("Agent loop cancelled")
break
except Exception as e:
logger.error(f"Error in agent loop: {e}")
await session.send_event(
Event(event_type="error", data={"error": str(e)})
)
logger.info("Agent loop exited")
finally:
# Emergency save if session saving is enabled and shutdown wasn't called properly
if session.config.save_sessions and session.is_running:
logger.info("Emergency save: preserving session before exit...")
try:
local_path = session.save_and_upload_detached(
session.config.session_dataset_repo
)
if local_path:
logger.info("Emergency save successful, upload in progress")
except Exception as e:
logger.error(f"Emergency save failed: {e}")
================================================
FILE: agent/core/doom_loop.py
================================================
"""
Doom-loop detection for repeated tool call patterns.
Detects when the agent is stuck calling the same tools repeatedly
and injects a corrective prompt to break the cycle.
"""
import hashlib
import json
import logging
from dataclasses import dataclass
from litellm import Message
logger = logging.getLogger(__name__)
@dataclass(frozen=True)
class ToolCallSignature:
"""Hashable signature for a single tool call (name + args hash)."""
name: str
args_hash: str
def _hash_args(args_str: str) -> str:
"""Return a short hash of the JSON arguments string."""
return hashlib.md5(args_str.encode()).hexdigest()[:12]
def extract_recent_tool_signatures(
messages: list[Message], lookback: int = 30
) -> list[ToolCallSignature]:
"""Extract tool call signatures from recent assistant messages."""
signatures: list[ToolCallSignature] = []
recent = messages[-lookback:] if len(messages) > lookback else messages
for msg in recent:
if getattr(msg, "role", None) != "assistant":
continue
tool_calls = getattr(msg, "tool_calls", None)
if not tool_calls:
continue
for tc in tool_calls:
fn = getattr(tc, "function", None)
if not fn:
continue
name = getattr(fn, "name", "") or ""
args_str = getattr(fn, "arguments", "") or ""
signatures.append(ToolCallSignature(name=name, args_hash=_hash_args(args_str)))
return signatures
def detect_identical_consecutive(
signatures: list[ToolCallSignature], threshold: int = 3
) -> str | None:
"""Return the tool name if threshold+ identical consecutive calls are found."""
if len(signatures) < threshold:
return None
count = 1
for i in range(1, len(signatures)):
if signatures[i] == signatures[i - 1]:
count += 1
if count >= threshold:
return signatures[i].name
else:
count = 1
return None
def detect_repeating_sequence(
signatures: list[ToolCallSignature],
) -> list[ToolCallSignature] | None:
"""Detect repeating patterns like [A,B,A,B] for sequences of length 2-5 with 2+ reps."""
n = len(signatures)
for seq_len in range(2, 6):
min_required = seq_len * 2
if n < min_required:
continue
# Check the tail of the signatures list
tail = signatures[-min_required:]
pattern = tail[:seq_len]
# Count how many full repetitions from the end
reps = 0
for start in range(n - seq_len, -1, -seq_len):
chunk = signatures[start : start + seq_len]
if chunk == pattern:
reps += 1
else:
break
if reps >= 2:
return pattern
return None
def check_for_doom_loop(messages: list[Message]) -> str | None:
"""Check for doom loop patterns. Returns a corrective prompt or None."""
signatures = extract_recent_tool_signatures(messages, lookback=30)
if len(signatures) < 3:
return None
# Check for identical consecutive calls
tool_name = detect_identical_consecutive(signatures, threshold=3)
if tool_name:
logger.warning("Doom loop detected: %d+ identical consecutive calls to '%s'", 3, tool_name)
return (
f"[SYSTEM: DOOM LOOP DETECTED] You have called '{tool_name}' with the same "
f"arguments multiple times in a row, getting the same result each time. "
f"STOP repeating this approach — it is not working. "
f"Step back and try a fundamentally different strategy. "
f"Consider: using a different tool, changing your arguments significantly, "
f"or explaining to the user what you're stuck on and asking for guidance."
)
# Check for repeating sequences
pattern = detect_repeating_sequence(signatures)
if pattern:
pattern_desc = " → ".join(s.name for s in pattern)
logger.warning("Doom loop detected: repeating sequence [%s]", pattern_desc)
return (
f"[SYSTEM: DOOM LOOP DETECTED] You are stuck in a repeating cycle of tool calls: "
f"[{pattern_desc}]. This pattern has repeated multiple times without progress. "
f"STOP this cycle and try a fundamentally different approach. "
f"Consider: breaking down the problem differently, using alternative tools, "
f"or explaining to the user what you're stuck on and asking for guidance."
)
return None
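# Example usage (illustrative sketch, not part of the original module).
# Assumes only that messages expose ``.role`` and
# ``.tool_calls[*].function.{name, arguments}`` — everything above is
# duck-typed via getattr, so SimpleNamespace can stand in for litellm's
# Message here.
if __name__ == "__main__":
    from types import SimpleNamespace
    def _assistant_call(name: str, arguments: str) -> SimpleNamespace:
        """Build a fake assistant message containing a single tool call."""
        fn = SimpleNamespace(name=name, arguments=arguments)
        return SimpleNamespace(role="assistant", tool_calls=[SimpleNamespace(function=fn)])
    # Three identical consecutive calls trigger the corrective prompt.
    stuck = [_assistant_call("read_file", '{"path": "README.md"}') for _ in range(3)]
    assert check_for_doom_loop(stuck) is not None
    # An A, B, A, B cycle trips the repeating-sequence detector instead.
    cycle = [
        _assistant_call("read_file", '{"path": "a.py"}'),
        _assistant_call("bash", '{"cmd": "ls"}'),
    ] * 2
    assert check_for_doom_loop(cycle) is not None
    # A varied, non-repeating history stays quiet.
    varied = [_assistant_call("bash", f'{{"cmd": "step {i}"}}') for i in range(3)]
    assert check_for_doom_loop(varied) is None
    print("doom-loop detection examples passed")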
================================================
FILE: agent/core/effort_probe.py
================================================
"""Probe-and-cascade for reasoning effort on /model switch.
We don't maintain a per-model capability table. Instead, the first time a
user picks a model we fire a tiny ping request (capped at ``_PROBE_MAX_TOKENS``
output tokens) with the same params we'd use for real and walk down a cascade
(``max`` → ``xhigh`` → ``high`` → …)
until the provider stops rejecting us. The result is cached per-model on
the session, so real messages don't pay the probe cost again.
Three outcomes, classified from the 400 error text:
* success → cache the effort that worked
* ``"thinking ... not supported"`` → model doesn't do thinking at all;
cache ``None`` so we stop sending thinking params
* ``"effort ... invalid"`` / synonyms → cascade walks down and retries
Transient errors (5xx, timeout, connection reset) bubble out as
``ProbeInconclusive`` so the caller can complete the switch with a
warning instead of blocking on a flaky provider.
"""
from __future__ import annotations
import asyncio
import logging
from dataclasses import dataclass
from litellm import acompletion
from agent.core.llm_params import UnsupportedEffortError, _resolve_llm_params
logger = logging.getLogger(__name__)
# Cascade: for each user-stated preference, the ordered list of levels to
# try. First success wins. ``max`` / ``xhigh`` are Anthropic-only; providers
# that don't accept them raise ``UnsupportedEffortError`` synchronously (no
# wasted network round-trip) and we advance to the next level.
_EFFORT_CASCADE: dict[str, list[str]] = {
"max": ["max", "xhigh", "high", "medium", "low"],
"xhigh": ["xhigh", "high", "medium", "low"],
"high": ["high", "medium", "low"],
"medium": ["medium", "low"],
"minimal": ["minimal", "low"],
"low": ["low"],
}
_PROBE_TIMEOUT = 15.0
_PROBE_MAX_TOKENS = 16
class ProbeInconclusive(Exception):
"""The probe couldn't reach a verdict (transient network / provider error).
Caller should complete the switch with a warning — the next real call
will re-surface the error if it's persistent.
"""
@dataclass
class ProbeOutcome:
"""What the probe learned. ``effective_effort`` semantics match the cache:
* str → send this level
* None → model doesn't support thinking; strip it
"""
effective_effort: str | None
attempts: int
elapsed_ms: int
note: str | None = None # e.g. "max not supported, falling back"
def _is_thinking_unsupported(e: Exception) -> bool:
"""Model rejected any thinking config.
Matches Anthropic's 'thinking.type.enabled is not supported for this
model' as well as the adaptive variant. Substring-match because the
exact wording shifts across API versions.
"""
s = str(e).lower()
return "thinking" in s and "not supported" in s
def _is_invalid_effort(e: Exception) -> bool:
"""The requested effort level isn't accepted for this model.
Covers both API responses (Anthropic/OpenAI 400 with "invalid", "must
be one of", etc.) and LiteLLM's local validation that fires *before*
the request (e.g. "effort='max' is only supported by Claude Opus 4.6"
— LiteLLM knows max is Opus-4.6-only and raises synchronously). The
cascade walks down on either.
Explicitly returns False when the message is really about thinking
itself (e.g. Anthropic's 4.7 error mentions ``output_config.effort``
in its fix hint, but the actual failure is ``thinking.type.enabled``
being unsupported). That case is caught by ``_is_thinking_unsupported``.
"""
if _is_thinking_unsupported(e):
return False
s = str(e).lower()
if "effort" not in s and "output_config" not in s:
return False
return any(
phrase in s
for phrase in (
"invalid", "not supported", "must be one of", "not a valid",
"unrecognized", "unknown",
# LiteLLM's own pre-flight validation phrasing.
"only supported by", "is only supported",
)
)
def _is_transient(e: Exception) -> bool:
"""Network / provider-side flake. Keep in sync with agent_loop's list.
Also matches by type for ``asyncio.TimeoutError`` — its ``str(e)`` is
empty, so substring matching alone misses it.
"""
if isinstance(e, (asyncio.TimeoutError, TimeoutError)):
return True
s = str(e).lower()
return any(
p in s
for p in (
"timeout", "timed out", "429", "rate limit",
"503", "service unavailable", "502", "bad gateway",
"500", "internal server error", "overloaded", "capacity",
"connection reset", "connection refused", "connection error",
"eof", "broken pipe",
)
)
async def probe_effort(
model_name: str,
preference: str | None,
hf_token: str | None,
) -> ProbeOutcome:
"""Walk the cascade for ``preference`` on ``model_name``.
Returns the first effort the provider accepts, or ``None`` if it
rejects thinking altogether. Raises ``ProbeInconclusive`` only for
transient errors (5xx, timeout) — persistent 4xx that aren't thinking/
effort related bubble as the original exception so callers can surface
them (auth, model-not-found, quota, etc.).
"""
loop = asyncio.get_event_loop()
start = loop.time()
attempts = 0
if not preference:
# User explicitly turned effort off — nothing to probe. A bare
# ping with no thinking params is pointless; just report "off".
return ProbeOutcome(effective_effort=None, attempts=0, elapsed_ms=0)
cascade = _EFFORT_CASCADE.get(preference, [preference])
skipped: list[str] = [] # levels the provider rejected synchronously
last_error: Exception | None = None
for effort in cascade:
try:
params = _resolve_llm_params(
model_name, hf_token, reasoning_effort=effort, strict=True,
)
except UnsupportedEffortError:
# Provider can't even accept this effort name (e.g. "max" on
# HF router). Skip without a network call.
skipped.append(effort)
continue
attempts += 1
try:
await asyncio.wait_for(
acompletion(
messages=[{"role": "user", "content": "ping"}],
max_tokens=_PROBE_MAX_TOKENS,
stream=False,
**params,
),
timeout=_PROBE_TIMEOUT,
)
except Exception as e:
last_error = e
if _is_thinking_unsupported(e):
elapsed = int((loop.time() - start) * 1000)
return ProbeOutcome(
effective_effort=None,
attempts=attempts,
elapsed_ms=elapsed,
note="model doesn't support reasoning, dropped",
)
if _is_invalid_effort(e):
logger.debug("probe: %s rejected effort=%s, trying next", model_name, effort)
continue
if _is_transient(e):
raise ProbeInconclusive(str(e)) from e
# Persistent non-thinking 4xx (auth, quota, model-not-found) —
# let the caller classify & surface.
raise
else:
elapsed = int((loop.time() - start) * 1000)
note = None
if effort != preference:
note = f"{preference} not supported, using {effort}"
return ProbeOutcome(
effective_effort=effort,
attempts=attempts,
elapsed_ms=elapsed,
note=note,
)
# Cascade exhausted without a success. This only happens when every
# level was either rejected synchronously (``UnsupportedEffortError``,
# e.g. preference=max on HF and we also somehow filtered all others)
# or the provider 400'd ``invalid effort`` on every level.
elapsed = int((loop.time() - start) * 1000)
if last_error is not None and not _is_invalid_effort(last_error):
raise last_error
note = (
"no effort level accepted — proceeding without thinking"
if not skipped
else f"provider rejected all efforts ({', '.join(skipped)})"
)
return ProbeOutcome(
effective_effort=None,
attempts=attempts,
elapsed_ms=elapsed,
note=note,
)
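# Example of how probe errors are classified (illustrative sketch only).
# The error strings below are invented stand-ins for provider responses,
# not captured API output; they merely exercise the substring heuristics
# defined above.
if __name__ == "__main__":
    thinking_err = Exception("thinking.type.enabled is not supported for this model")
    effort_err = Exception("Invalid effort value: xhigh")
    flaky_err = Exception("503 Service Unavailable")
    assert _is_thinking_unsupported(thinking_err)  # probe caches effort=None
    assert _is_invalid_effort(effort_err)          # cascade walks down a level
    assert not _is_invalid_effort(thinking_err)    # thinking errors take priority
    assert _is_transient(flaky_err)                # surfaces as ProbeInconclusive
    print("effort-probe error classification examples passed")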
================================================
FILE: agent/core/hf_router_catalog.py
================================================
"""Fetch and cache the HF Inference Router model catalog.
The router exposes an OpenAI-compatible listing at
``https://router.huggingface.co/v1/models`` with per-provider availability,
pricing, context length, and tool-use support. We use it to:
• Validate ``/model`` switches with live data instead of a hard-coded allowlist.
• Show the user which providers serve a model, at what price, and whether they
support tool calls.
• Derive a reasonable context-window limit for any routed model.
The listing is cached in-memory for a few minutes so repeated lookups during a
session are free. On fetch failure we return stale data if we have it, or an
empty catalog otherwise.
"""
import logging
import time
from dataclasses import dataclass
from difflib import get_close_matches
from typing import Optional
import httpx
logger = logging.getLogger(__name__)
_CATALOG_URL = "https://router.huggingface.co/v1/models"
_CACHE_TTL_SECONDS = 300
_HTTP_TIMEOUT_SECONDS = 5.0
_cache: Optional[dict] = None
_cache_time: float = 0.0
@dataclass
class ProviderInfo:
provider: str
status: str
context_length: Optional[int]
input_price: Optional[float]
output_price: Optional[float]
supports_tools: bool
supports_structured_output: bool
@dataclass
class ModelInfo:
id: str
providers: list[ProviderInfo]
@property
def live_providers(self) -> list[ProviderInfo]:
return [p for p in self.providers if p.status == "live"]
@property
def max_context_length(self) -> Optional[int]:
lengths = [p.context_length for p in self.live_providers if p.context_length]
return max(lengths) if lengths else None
@property
def any_supports_tools(self) -> bool:
return any(p.supports_tools for p in self.live_providers)
def _fetch_catalog(force: bool = False) -> dict:
global _cache, _cache_time
now = time.time()
if not force and _cache is not None and now - _cache_time < _CACHE_TTL_SECONDS:
return _cache
try:
resp = httpx.get(_CATALOG_URL, timeout=_HTTP_TIMEOUT_SECONDS)
resp.raise_for_status()
_cache = resp.json()
_cache_time = now
except Exception as e:
logger.warning("Failed to fetch HF router catalog: %s", e)
if _cache is None:
_cache = {"data": []}
_cache_time = now
return _cache
def _parse_entry(entry: dict) -> ModelInfo:
providers = []
for p in entry.get("providers", []) or []:
pricing = p.get("pricing") or {}
providers.append(
ProviderInfo(
provider=p.get("provider", ""),
status=p.get("status", ""),
context_length=p.get("context_length"),
input_price=pricing.get("input"),
output_price=pricing.get("output"),
supports_tools=bool(p.get("supports_tools", False)),
supports_structured_output=bool(p.get("supports_structured_output", False)),
)
)
return ModelInfo(id=entry.get("id", ""), providers=providers)
def lookup(model_id: str) -> Optional[ModelInfo]:
"""Find a model in the router catalog.
Accepts ``<org>/<model>`` or ``<org>/<model>:<tag>`` — the tag is stripped
for lookup. Returns ``None`` if the model isn't listed.
"""
bare = model_id.split(":", 1)[0]
catalog = _fetch_catalog()
for entry in catalog.get("data", []):
if entry.get("id") == bare:
return _parse_entry(entry)
return None
def fuzzy_suggest(model_id: str, limit: int = 3) -> list[str]:
"""Return the closest model ids from the catalog."""
bare = model_id.split(":", 1)[0]
catalog = _fetch_catalog()
ids = [e.get("id", "") for e in catalog.get("data", []) if e.get("id")]
return get_close_matches(bare, ids, n=limit, cutoff=0.4)
def prewarm() -> None:
"""Fetch the catalog so subsequent lookups are instant. Safe to call from
a background task — swallows failures."""
try:
_fetch_catalog(force=False)
except Exception:
pass
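# Example of parsing a catalog entry (illustrative sketch).
# ``sample_entry`` is a hand-written stand-in shaped like the fields
# ``_parse_entry`` reads (id, providers[].status / pricing / context_length /
# supports_tools); the model id, provider names and prices are invented,
# not live router data.
if __name__ == "__main__":
    sample_entry = {
        "id": "example-org/example-model",
        "providers": [
            {
                "provider": "provider-a",
                "status": "live",
                "context_length": 131072,
                "pricing": {"input": 0.5, "output": 1.5},
                "supports_tools": True,
                "supports_structured_output": True,
            },
            {"provider": "provider-b", "status": "offline", "pricing": {}},
        ],
    }
    info = _parse_entry(sample_entry)
    assert [p.provider for p in info.live_providers] == ["provider-a"]
    assert info.max_context_length == 131072
    assert info.any_supports_tools
    print("catalog parsing example passed")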
================================================
FILE: agent/core/llm_params.py
================================================
"""LiteLLM kwargs resolution for the model ids this agent accepts.
Kept separate from ``agent_loop`` so tools (research, context compaction, etc.)
can import it without pulling in the whole agent loop / tool router and
creating circular imports.
"""
import os
def _patch_litellm_effort_validation() -> None:
"""Neuter LiteLLM 1.83's hardcoded effort-level validation.
Context: at ``litellm/llms/anthropic/chat/transformation.py:~1443`` the
Anthropic adapter validates ``output_config.effort ∈ {high, medium,
low, max}`` and gates ``max`` behind an ``_is_opus_4_6_model`` check
that only matches the substring ``opus-4-6`` / ``opus_4_6``. Result:
* ``xhigh`` — valid on Anthropic's real API for Claude 4.7 — is
rejected pre-flight with "Invalid effort value: xhigh".
* ``max`` on Opus 4.7 is rejected with "effort='max' is only supported
by Claude Opus 4.6", even though Opus 4.7 accepts it in practice.
We don't want to maintain a parallel model table, so we let the
Anthropic API itself be the validator: widen ``_is_opus_4_6_model``
to also match ``opus-4-7``+ families, and drop the valid-effort-set
check entirely. If Anthropic rejects an effort level, we see a 400
and the cascade walks down — exactly the behavior we want for any
future model family.
Removable once litellm ships 1.83.8-stable (which merges PR #25867,
"Litellm day 0 opus 4.7 support") — see commit 0868a82 on their main
branch. Until then, this one-time patch is the escape hatch.
"""
try:
from litellm.llms.anthropic.chat import transformation as _t
except Exception:
return
cfg = getattr(_t, "AnthropicConfig", None)
if cfg is None:
return
original = getattr(cfg, "_is_opus_4_6_model", None)
if original is None or getattr(original, "_hf_agent_patched", False):
return
def _widened(model: str) -> bool:
m = model.lower()
# Original 4.6 match plus any future Opus >= 4.6. We only need this
# to return True for families where "max" / "xhigh" are acceptable
# at the API; the cascade handles the case when they're not.
return any(
v in m for v in (
"opus-4-6", "opus_4_6", "opus-4.6", "opus_4.6",
"opus-4-7", "opus_4_7", "opus-4.7", "opus_4.7",
)
)
_widened._hf_agent_patched = True # type: ignore[attr-defined]
cfg._is_opus_4_6_model = staticmethod(_widened)
_patch_litellm_effort_validation()
# Effort levels accepted on the wire.
# Anthropic (4.6+): low | medium | high | xhigh | max (output_config.effort)
# OpenAI direct: minimal | low | medium | high (reasoning_effort top-level)
# HF router: low | medium | high (extra_body.reasoning_effort)
#
# We validate *shape* here and let the probe cascade walk down on rejection;
# we deliberately do NOT maintain a per-model capability table.
_ANTHROPIC_EFFORTS = {"low", "medium", "high", "xhigh", "max"}
_OPENAI_EFFORTS = {"minimal", "low", "medium", "high"}
_HF_EFFORTS = {"low", "medium", "high"}
class UnsupportedEffortError(ValueError):
"""The requested effort isn't valid for this provider's API surface.
Raised synchronously before any network call so the probe cascade can
skip levels the provider can't accept (e.g. ``max`` on HF router).
"""
def _resolve_llm_params(
model_name: str,
session_hf_token: str | None = None,
reasoning_effort: str | None = None,
strict: bool = False,
) -> dict:
"""
Build LiteLLM kwargs for a given model id.
• ``anthropic/<model>`` — native thinking config. We bypass LiteLLM's
``reasoning_effort`` → ``thinking`` mapping (which lags new Claude
releases like 4.7 and sends the wrong API shape). Instead we pass
both ``thinking={"type": "adaptive"}`` and ``output_config=
{"effort": <level>}`` as top-level kwargs — LiteLLM's Anthropic
adapter forwards unknown top-level kwargs into the request body
verbatim (confirmed by live probe; ``extra_body`` does NOT work
here because Anthropic's API rejects it as "Extra inputs are not
permitted"). This is the stable API for 4.6 and 4.7. Older
extended-thinking models that only accept ``thinking.type.enabled``
will reject this; the probe's cascade catches that and falls back
to no thinking.
• ``openai/<model>`` — ``reasoning_effort`` forwarded as a top-level
kwarg (GPT-5 / o-series). LiteLLM uses the user's ``OPENAI_API_KEY``.
• Anything else is treated as a HuggingFace router id. We hit the
auto-routing OpenAI-compatible endpoint at
``https://router.huggingface.co/v1``. The id can be bare or carry an
HF routing suffix (``:fastest`` / ``:cheapest`` / ``:<provider>``).
A leading ``huggingface/`` is stripped. ``reasoning_effort`` is
forwarded via ``extra_body`` (LiteLLM's OpenAI adapter refuses it as
a top-level kwarg for non-OpenAI models). "minimal" normalizes to
"low".
``strict=True`` raises ``UnsupportedEffortError`` when the requested
effort isn't in the provider's accepted set, instead of silently
dropping it. The probe cascade uses strict mode so it can walk down
(``max`` → ``xhigh`` → ``high`` …) without making an API call. Regular
runtime callers leave ``strict=False``, so a stale cached effort
can't crash a turn — it just doesn't get sent.
Token precedence (first non-empty wins):
1. INFERENCE_TOKEN env — shared key on the hosted Space (inference is
free for users, billed to the Space owner via ``X-HF-Bill-To``).
2. session.hf_token — the user's own token (CLI / OAuth / cache file).
3. HF_TOKEN env — belt-and-suspenders fallback for CLI users.
"""
if model_name.startswith("anthropic/"):
params: dict = {"model": model_name}
if reasoning_effort:
level = reasoning_effort
if level == "minimal":
level = "low"
if level not in _ANTHROPIC_EFFORTS:
if strict:
raise UnsupportedEffortError(
f"Anthropic doesn't accept effort={level!r}"
)
else:
# Adaptive thinking + output_config.effort is the stable
# Anthropic API for Claude 4.6 / 4.7. Both kwargs are
# passed top-level: LiteLLM forwards unknown params into
# the request body for Anthropic, so ``output_config``
# reaches the API. ``extra_body`` does NOT work here —
# Anthropic rejects it as "Extra inputs are not
# permitted".
params["thinking"] = {"type": "adaptive"}
params["output_config"] = {"effort": level}
return params
if model_name.startswith("bedrock/"):
# LiteLLM routes ``bedrock/...`` through the Converse adapter, which
# picks up AWS credentials from the standard env vars
# (``AWS_ACCESS_KEY_ID`` / ``AWS_SECRET_ACCESS_KEY`` / ``AWS_REGION``).
# The Anthropic thinking/effort shape is not forwarded through Converse
# the same way, so we leave it off for now.
return {"model": model_name}
if model_name.startswith("openai/"):
params = {"model": model_name}
if reasoning_effort:
if reasoning_effort not in _OPENAI_EFFORTS:
if strict:
raise UnsupportedEffortError(
f"OpenAI doesn't accept effort={reasoning_effort!r}"
)
else:
params["reasoning_effort"] = reasoning_effort
return params
hf_model = model_name.removeprefix("huggingface/")
api_key = (
os.environ.get("INFERENCE_TOKEN")
or session_hf_token
or os.environ.get("HF_TOKEN")
)
params = {
"model": f"openai/{hf_model}",
"api_base": "https://router.huggingface.co/v1",
"api_key": api_key,
}
if os.environ.get("INFERENCE_TOKEN"):
bill_to = os.environ.get("HF_BILL_TO", "smolagents")
params["extra_headers"] = {"X-HF-Bill-To": bill_to}
if reasoning_effort:
hf_level = "low" if reasoning_effort == "minimal" else reasoning_effort
if hf_level not in _HF_EFFORTS:
if strict:
raise UnsupportedEffortError(
f"HF router doesn't accept effort={hf_level!r}"
)
else:
params["extra_body"] = {"reasoning_effort": hf_level}
return params
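# Example of the kwargs this resolver produces (illustrative sketch).
# No network call is made; the model ids are examples only, and whether an
# api_key ends up set for the HF-router case depends on the env vars
# described in the docstring above.
if __name__ == "__main__":
    anthropic_params = _resolve_llm_params("anthropic/claude-opus-4-6", reasoning_effort="high")
    assert anthropic_params["thinking"] == {"type": "adaptive"}
    assert anthropic_params["output_config"] == {"effort": "high"}
    hf_params = _resolve_llm_params("moonshotai/Kimi-K2.6:cheapest", reasoning_effort="minimal")
    assert hf_params["model"] == "openai/moonshotai/Kimi-K2.6:cheapest"
    assert hf_params["extra_body"] == {"reasoning_effort": "low"}  # "minimal" normalized
    # Non-strict mode silently drops an effort the provider can't accept...
    assert "reasoning_effort" not in _resolve_llm_params("openai/gpt-5", reasoning_effort="xhigh")
    # ...while strict mode (used by the probe cascade) raises instead.
    raised = False
    try:
        _resolve_llm_params("openai/gpt-5", reasoning_effort="xhigh", strict=True)
    except UnsupportedEffortError:
        raised = True
    assert raised
    print("llm-params resolution examples passed")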
================================================
FILE: agent/core/model_switcher.py
================================================
"""Model-switching logic for the interactive CLI's ``/model`` command.
Split out of ``agent.main`` so the REPL dispatcher stays focused on input
parsing. Exposes:
* ``SUGGESTED_MODELS`` — the short list shown by ``/model`` with no arg.
* ``is_valid_model_id`` — loose format check on user input.
* ``probe_and_switch_model`` — async: checks routing, fires a 1-token
probe to resolve the effort cascade, then commits the switch (or
rejects it on hard error).
The probe's cascade lives in ``agent.core.effort_probe``; this module
glues it to CLI output + session state.
"""
from __future__ import annotations
from agent.core.effort_probe import ProbeInconclusive, probe_effort
# Suggested models shown by `/model` (not a gate). Users can paste any HF
# model id (e.g. "MiniMaxAI/MiniMax-M2.7") or an `anthropic/` / `openai/`
# prefix for direct API access. For HF ids, append ":fastest" /
# ":cheapest" / ":preferred" / ":<provider>" to override the default
# routing policy (auto = fastest with failover).
SUGGESTED_MODELS = [
{"id": "bedrock/us.anthropic.claude-opus-4-7", "label": "Claude Opus 4.7"},
{"id": "bedrock/us.anthropic.claude-opus-4-6-v1", "label": "Claude Opus 4.6"},
{"id": "MiniMaxAI/MiniMax-M2.7", "label": "MiniMax M2.7"},
{"id": "moonshotai/Kimi-K2.6", "label": "Kimi K2.6"},
{"id": "zai-org/GLM-5.1", "label": "GLM 5.1"},
]
_ROUTING_POLICIES = {"fastest", "cheapest", "preferred"}
def is_valid_model_id(model_id: str) -> bool:
"""Loose format check — lets users pick any model id.
Accepts:
• anthropic/<model>
• openai/<model>
• <org>/<model>[:<tag>] (HF router; tag = provider or policy)
• huggingface/<org>/<model>[:<tag>] (same, accepts legacy prefix)
Actual availability is verified against the HF router catalog on
switch, and by the provider on the probe's ping call.
"""
if not model_id or "/" not in model_id:
return False
head = model_id.split(":", 1)[0]
parts = head.split("/")
return len(parts) >= 2 and all(parts)
def _print_hf_routing_info(model_id: str, console) -> bool:
"""Show HF router catalog info (providers, price, context, tool support)
for an HF-router model id. Returns ``True`` to signal the caller can
proceed with the switch, ``False`` to indicate a hard problem the user
should notice before we fire the effort probe.
Anthropic / OpenAI ids return ``True`` without printing anything —
the probe below covers "does this model exist".
"""
if model_id.startswith(("anthropic/", "openai/")):
return True
from agent.core import hf_router_catalog as cat
bare, _, tag = model_id.partition(":")
info = cat.lookup(bare)
if info is None:
console.print(
f"[bold red]Warning:[/bold red] '{bare}' isn't in the HF router "
"catalog. Checking anyway — first call may fail."
)
suggestions = cat.fuzzy_suggest(bare)
if suggestions:
console.print(f"[dim]Did you mean: {', '.join(suggestions)}[/dim]")
return True
live = info.live_providers
if not live:
console.print(
f"[bold red]Warning:[/bold red] '{bare}' has no live providers "
"right now. First call will likely fail."
)
return True
if tag and tag not in _ROUTING_POLICIES:
matched = [p for p in live if p.provider == tag]
if not matched:
names = ", ".join(p.provider for p in live)
console.print(
f"[bold red]Warning:[/bold red] provider '{tag}' doesn't serve "
f"'{bare}'. Live providers: {names}. Checking anyway."
)
if not info.any_supports_tools:
console.print(
f"[bold red]Warning:[/bold red] no provider for '{bare}' advertises "
"tool-call support. This agent relies on tool calls — expect errors."
)
if tag in _ROUTING_POLICIES:
policy = tag
elif tag:
policy = f"pinned to {tag}"
else:
policy = "auto (fastest)"
console.print(f" [dim]routing: {policy}[/dim]")
for p in live:
price = (
f"${p.input_price:g}/${p.output_price:g} per M tok"
if p.input_price is not None and p.output_price is not None
else "price n/a"
)
ctx = f"{p.context_length:,} ctx" if p.context_length else "ctx n/a"
tools = "tools" if p.supports_tools else "no tools"
console.print(
f" [dim]{p.provider}: {price}, {ctx}, {tools}[/dim]"
)
return True
def print_model_listing(config, console) -> None:
"""Render the default ``/model`` (no-arg) view: current + suggested."""
current = config.model_name if config else ""
console.print("[bold]Current model:[/bold]")
console.print(f" {current}")
console.print("\n[bold]Suggested:[/bold]")
for m in SUGGESTED_MODELS:
marker = " [dim]<-- current[/dim]" if m["id"] == current else ""
console.print(f" {m['id']} [dim]({m['label']})[/dim]{marker}")
console.print(
"\n[dim]Paste any HF model id (e.g. 'MiniMaxAI/MiniMax-M2.7').\n"
"Add ':fastest', ':cheapest', ':preferred', or ':<provider>' to override routing.\n"
"Use 'anthropic/<model>' or 'openai/<model>' for direct API access.[/dim]"
)
def print_invalid_id(arg: str, console) -> None:
console.print(f"[bold red]Invalid model id format:[/bold red] {arg}")
console.print(
"[dim]Expected:\n"
" • <org>/<model>[:tag] (HF router — paste from huggingface.co)\n"
" • anthropic/<model>\n"
" • openai/<model>[/dim]"
)
async def probe_and_switch_model(
model_id: str,
config,
session,
console,
hf_token: str | None,
) -> None:
"""Validate model+effort with a 1-token ping, cache the effective effort,
then commit the switch.
Three visible outcomes:
* ✓ ``effort: <level>`` — model accepted the preferred effort (or a
fallback from the cascade; the note explains if so)
* ✓ ``effort: off`` — model doesn't support thinking; we'll strip it
* ✗ hard error (auth, model-not-found, quota) — we reject the switch
and keep the current model so the user isn't stranded
Transient errors (5xx, timeout) complete the switch with a yellow
warning; the next real call re-surfaces the error if it's persistent.
"""
preference = config.reasoning_effort
if not _print_hf_routing_info(model_id, console):
return
if not preference:
        # With effort off, a probe wouldn't validate anything that the first
        # real call wouldn't surface just as cheaply. Skip it entirely.
_commit_switch(model_id, config, session, effective=None, cache=False)
console.print(f"[green]Model switched to {model_id}[/green] [dim](effort: off)[/dim]")
return
console.print(f"[dim]checking {model_id} (effort: {preference})...[/dim]")
try:
outcome = await probe_effort(model_id, preference, hf_token)
except ProbeInconclusive as e:
_commit_switch(model_id, config, session, effective=None, cache=False)
console.print(
f"[yellow]Model switched to {model_id}[/yellow] "
f"[dim](couldn't validate: {e}; will verify on first message)[/dim]"
)
return
except Exception as e:
# Hard persistent error — auth, unknown model, quota. Don't switch.
console.print(f"[bold red]Switch failed:[/bold red] {e}")
console.print(f"[dim]Keeping current model: {config.model_name}[/dim]")
return
_commit_switch(
model_id, config, session,
effective=outcome.effective_effort, cache=True,
)
effort_label = outcome.effective_effort or "off"
suffix = f" — {outcome.note}" if outcome.note else ""
console.print(
f"[green]Model switched to {model_id}[/green] "
f"[dim](effort: {effort_label}{suffix}, {outcome.elapsed_ms}ms)[/dim]"
)
def _commit_switch(model_id, config, session, effective, cache: bool) -> None:
"""Apply the switch to the session (or bare config if no session yet).
``effective`` is the probe's resolved effort; ``cache=True`` stores it
in the session's per-model cache so real calls use the resolved level
instead of re-probing. ``cache=False`` (inconclusive probe / effort
off) leaves the cache untouched — next call falls back to preference.
"""
if session is not None:
session.update_model(model_id)
if cache:
session.model_effective_effort[model_id] = effective
else:
session.model_effective_effort.pop(model_id, None)
else:
config.model_name = model_id
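# Example of the loose id check used by ``/model`` (illustrative sketch).
# Only the format is checked here; real availability is verified later
# against the router catalog and by the provider during the effort probe.
if __name__ == "__main__":
    assert is_valid_model_id("moonshotai/Kimi-K2.6:cheapest")
    assert is_valid_model_id("anthropic/claude-opus-4-6")
    assert is_valid_model_id("huggingface/zai-org/GLM-5.1")
    assert not is_valid_model_id("gpt-4")          # no <org>/ component
    assert not is_valid_model_id("/missing-org")   # empty path component
    print("model-id format examples passed")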
================================================
FILE: agent/core/prompt_caching.py
================================================
"""Anthropic prompt caching breakpoints for outgoing LLM requests.
Caching is GA on Anthropic's API and natively supported by litellm >=1.83
via ``cache_control`` blocks. We apply two breakpoints (out of 4 allowed):
1. The tool block — caches all tool definitions as a single prefix.
2. The system message — caches the rendered system prompt.
Together these cover the ~4-5K static tokens that were being re-billed on
every turn. Subsequent turns within the 5-minute TTL hit cache_read pricing
(~10% of input cost) instead of full input.
Non-Anthropic models (HF router, OpenAI) are passed through unchanged.
"""
from typing import Any
def with_prompt_caching(
messages: list[Any],
tools: list[dict] | None,
model_name: str | None,
) -> tuple[list[Any], list[dict] | None]:
"""Return (messages, tools) with cache_control breakpoints for Anthropic.
No-op for non-Anthropic models. Original objects are not mutated; a fresh
list with replaced first message and last tool is returned, so callers
that share the underlying ``ContextManager.items`` list don't see their
persisted history rewritten.
"""
if not model_name or "anthropic" not in model_name:
return messages, tools
if tools:
new_tools = list(tools)
last = dict(new_tools[-1])
last["cache_control"] = {"type": "ephemeral"}
new_tools[-1] = last
tools = new_tools
if messages:
first = messages[0]
role = first.get("role") if isinstance(first, dict) else getattr(first, "role", None)
if role == "system":
content = (
first.get("content")
if isinstance(first, dict)
else getattr(first, "content", None)
)
if isinstance(content, str) and content:
cached_block = [{
"type": "text",
"text": content,
"cache_control": {"type": "ephemeral"},
}]
new_first = {"role": "system", "content": cached_block}
messages = [new_first] + list(messages[1:])
return messages, tools
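# Example usage (illustrative sketch). Dict-shaped messages and a minimal
# tool spec are used purely for demonstration; the model ids are examples.
if __name__ == "__main__":
    msgs = [
        {"role": "system", "content": "You are a helpful agent."},
        {"role": "user", "content": "hi"},
    ]
    tool_specs = [{"type": "function", "function": {"name": "bash", "parameters": {}}}]
    # Non-Anthropic models pass through unchanged.
    same_msgs, same_tools = with_prompt_caching(msgs, tool_specs, "openai/gpt-5")
    assert same_msgs is msgs and same_tools is tool_specs
    # Anthropic models get cache_control on the last tool and the system prompt.
    new_msgs, new_tools = with_prompt_caching(msgs, tool_specs, "anthropic/claude-opus-4-6")
    assert new_tools[-1]["cache_control"] == {"type": "ephemeral"}
    assert new_msgs[0]["content"][0]["cache_control"] == {"type": "ephemeral"}
    assert "cache_control" not in tool_specs[-1]  # originals are not mutated
    print("prompt-caching examples passed")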
================================================
FILE: agent/core/session.py
================================================
import asyncio
import json
import logging
import subprocess
import sys
import uuid
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from pathlib import Path
from typing import Any, Optional
from agent.config import Config
from agent.context_manager.manager import ContextManager
logger = logging.getLogger(__name__)
_DEFAULT_MAX_TOKENS = 200_000
def _get_max_tokens_safe(model_name: str) -> int:
"""Return the max input-context tokens for a model.
Primary source: ``litellm.get_model_info(model)['max_input_tokens']`` —
LiteLLM maintains an upstream catalog that knows Claude Opus 4.6 is
1M, GPT-5 is 272k, Sonnet 4.5 is 200k, and so on. Strips any HF routing
suffix / huggingface/ prefix so tagged ids ('moonshotai/Kimi-K2.6:cheapest')
look up the bare model. Falls back to a conservative 200k default for
models not in the catalog (typically HF-router-only models).
"""
from litellm import get_model_info
candidates = [model_name]
stripped = model_name.removeprefix("huggingface/").split(":", 1)[0]
if stripped != model_name:
candidates.append(stripped)
for candidate in candidates:
try:
info = get_model_info(candidate)
max_input = info.get("max_input_tokens") if info else None
if isinstance(max_input, int) and max_input > 0:
return max_input
except Exception:
continue
logger.info(
"No litellm.get_model_info entry for %s, falling back to %d",
model_name, _DEFAULT_MAX_TOKENS,
)
return _DEFAULT_MAX_TOKENS
class OpType(Enum):
USER_INPUT = "user_input"
EXEC_APPROVAL = "exec_approval"
INTERRUPT = "interrupt"
UNDO = "undo"
COMPACT = "compact"
SHUTDOWN = "shutdown"
@dataclass
class Event:
event_type: str
data: Optional[dict[str, Any]] = None
class Session:
"""
Maintains agent session state
Similar to Session in codex-rs/core/src/codex.rs
"""
def __init__(
self,
event_queue: asyncio.Queue,
config: Config | None = None,
tool_router=None,
context_manager: ContextManager | None = None,
hf_token: str | None = None,
local_mode: bool = False,
stream: bool = True,
):
        self.hf_token: Optional[str] = hf_token
        self.tool_router = tool_router
        self.stream = stream
        # Resolve the config first: the context manager below needs the model
        # name for its context-window limit, and ``config`` may be None.
        self.config = config or Config(
            model_name="bedrock/us.anthropic.claude-sonnet-4-5-20250929-v1:0",
        )
        tool_specs = tool_router.get_tool_specs_for_llm() if tool_router else []
        self.context_manager = context_manager or ContextManager(
            model_max_tokens=_get_max_tokens_safe(self.config.model_name),
            compact_size=0.1,
            untouched_messages=5,
            tool_specs=tool_specs,
            hf_token=hf_token,
            local_mode=local_mode,
        )
        self.event_queue = event_queue
        self.session_id = str(uuid.uuid4())
self.is_running = True
self._cancelled = asyncio.Event()
self.pending_approval: Optional[dict[str, Any]] = None
self.sandbox = None
self._running_job_ids: set[str] = set() # HF job IDs currently executing
# Session trajectory logging
self.logged_events: list[dict] = []
self.session_start_time = datetime.now().isoformat()
self.turn_count: int = 0
self.last_auto_save_turn: int = 0
# Per-model probed reasoning-effort cache. Populated by the probe
# on /model switch, read by ``effective_effort_for`` below. Keys are
# raw model ids (including any ``:tag``). Values:
# str → the effort level to send (may be a downgrade from the
# preference, e.g. "high" when user asked for "max")
# None → model rejected all efforts in the cascade; send no
# thinking params at all
# Key absent → not probed yet; fall back to the raw preference.
self.model_effective_effort: dict[str, str | None] = {}
async def send_event(self, event: Event) -> None:
"""Send event back to client and log to trajectory"""
await self.event_queue.put(event)
# Log event to trajectory
self.logged_events.append(
{
"timestamp": datetime.now().isoformat(),
"event_type": event.event_type,
"data": event.data,
}
)
def cancel(self) -> None:
"""Signal cancellation to the running agent loop."""
self._cancelled.set()
def reset_cancel(self) -> None:
"""Clear the cancellation flag before a new run."""
self._cancelled.clear()
@property
def is_cancelled(self) -> bool:
return self._cancelled.is_set()
def update_model(self, model_name: str) -> None:
"""Switch the active model and update the context window limit."""
self.config.model_name = model_name
self.context_manager.model_max_tokens = _get_max_tokens_safe(model_name)
def effective_effort_for(self, model_name: str) -> str | None:
"""Resolve the effort level to actually send for ``model_name``.
Returns the probed result when we have one (may be ``None`` meaning
"model doesn't do thinking, strip it"), else the raw preference.
Unknown-model case falls back to the preference so a stale cache
from a prior ``/model`` can't poison research sub-calls that use a
different model id.
"""
if model_name in self.model_effective_effort:
return self.model_effective_effort[model_name]
return self.config.reasoning_effort
def increment_turn(self) -> None:
"""Increment turn counter (called after each user interaction)"""
self.turn_count += 1
async def auto_save_if_needed(self) -> None:
"""Check if auto-save should trigger and save if so (completely non-blocking)"""
if not self.config.save_sessions:
return
interval = self.config.auto_save_interval
if interval <= 0:
return
turns_since_last_save = self.turn_count - self.last_auto_save_turn
if turns_since_last_save >= interval:
logger.info(f"Auto-saving session (turn {self.turn_count})...")
# Fire-and-forget save - returns immediately
self.save_and_upload_detached(self.config.session_dataset_repo)
self.last_auto_save_turn = self.turn_count
def get_trajectory(self) -> dict:
"""Serialize complete session trajectory for logging"""
return {
"session_id": self.session_id,
"session_start_time": self.session_start_time,
"session_end_time": datetime.now().isoformat(),
"model_name": self.config.model_name,
"messages": [msg.model_dump() for msg in self.context_manager.items],
"events": self.logged_events,
}
def save_trajectory_local(
self,
directory: str = "session_logs",
upload_status: str = "pending",
dataset_url: Optional[str] = None,
) -> Optional[str]:
"""
Save trajectory to local JSON file as backup with upload status
Args:
directory: Directory to save logs (default: "session_logs")
upload_status: Status of upload attempt ("pending", "success", "failed")
dataset_url: URL of dataset if upload succeeded
Returns:
Path to saved file if successful, None otherwise
"""
try:
log_dir = Path(directory)
log_dir.mkdir(parents=True, exist_ok=True)
trajectory = self.get_trajectory()
# Add upload metadata
trajectory["upload_status"] = upload_status
trajectory["upload_url"] = dataset_url
trajectory["last_save_time"] = datetime.now().isoformat()
filename = f"session_{self.session_id}_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json"
filepath = log_dir / filename
with open(filepath, "w") as f:
json.dump(trajectory, f, indent=2)
return str(filepath)
except Exception as e:
logger.error(f"Failed to save session locally: {e}")
return None
def update_local_save_status(
self, filepath: str, upload_status: str, dataset_url: Optional[str] = None
) -> bool:
"""Update the upload status of an existing local save file"""
try:
with open(filepath, "r") as f:
data = json.load(f)
data["upload_status"] = upload_status
data["upload_url"] = dataset_url
data["last_save_time"] = datetime.now().isoformat()
with open(filepath, "w") as f:
json.dump(data, f, indent=2)
return True
except Exception as e:
logger.error(f"Failed to update local save status: {e}")
return False
def save_and_upload_detached(self, repo_id: str) -> Optional[str]:
"""
Save session locally and spawn detached subprocess for upload (fire-and-forget)
Args:
repo_id: HuggingFace dataset repo ID
Returns:
Path to local save file
"""
# Save locally first (fast, synchronous)
local_path = self.save_trajectory_local(upload_status="pending")
if not local_path:
return None
# Spawn detached subprocess for upload (fire-and-forget)
try:
uploader_script = Path(__file__).parent / "session_uploader.py"
# Use Popen with detached process
subprocess.Popen(
[sys.executable, str(uploader_script), "upload", local_path, repo_id],
stdin=subprocess.DEVNULL,
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
start_new_session=True, # Detach from parent
)
except Exception as e:
logger.warning(f"Failed to spawn upload subprocess: {e}")
return local_path
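    # Hedged usage sketch (the repo id is a placeholder): the call returns the
    # local backup path immediately, and the detached subprocess later rewrites
    # that file's upload_status field to "success" or "failed".
    #
    #   local_path = session.save_and_upload_detached("my-org/agent-sessions")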
@staticmethod
def retry_failed_uploads_detached(
directory: str = "session_logs", repo_id: Optional[str] = None
) -> None:
"""
Spawn detached subprocess to retry failed/pending uploads (fire-and-forget)
Args:
directory: Directory containing session logs
repo_id: Target dataset repo ID
"""
if not repo_id:
return
try:
uploader_script = Path(__file__).parent / "session_uploader.py"
# Spawn detached subprocess for retry
subprocess.Popen(
[sys.executable, str(uploader_script), "retry", directory, repo_id],
stdin=subprocess.DEVNULL,
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
start_new_session=True, # Detach from parent
)
except Exception as e:
logger.warning(f"Failed to spawn retry subprocess: {e}")
================================================
FILE: agent/core/session_uploader.py
================================================
#!/usr/bin/env python3
"""
Standalone script for uploading session trajectories to HuggingFace.
This runs as a separate process to avoid blocking the main agent.
Uses individual file uploads to avoid race conditions.
"""
import json
import os
import sys
from datetime import datetime
from pathlib import Path
from dotenv import load_dotenv
load_dotenv()
# Token for session uploads — loaded from env var (never hardcode tokens in source)
_SESSION_TOKEN = os.environ.get("HF_SESSION_UPLOAD_TOKEN", "")
def upload_session_as_file(
session_file: str, repo_id: str, max_retries: int = 3
) -> bool:
"""
Upload a single session as an individual JSONL file (no race conditions)
Args:
session_file: Path to local session JSON file
repo_id: HuggingFace dataset repo ID
max_retries: Number of retry attempts
Returns:
True if successful, False otherwise
"""
try:
from huggingface_hub import HfApi
except ImportError:
print("Error: huggingface_hub library not available", file=sys.stderr)
return False
try:
# Load session data
with open(session_file, "r") as f:
data = json.load(f)
# Check if already uploaded
upload_status = data.get("upload_status")
if upload_status == "success":
return True
# Use dedicated session upload token (write-only access to session dataset)
hf_token = _SESSION_TOKEN
if not hf_token:
# Update status to failed
data["upload_status"] = "failed"
with open(session_file, "w") as f:
json.dump(data, f, indent=2)
return False
# Prepare JSONL content (single line)
# Store messages and events as JSON strings to avoid schema conflicts
session_row = {
"session_id": data["session_id"],
"session_start_time": data["session_start_time"],
"session_end_time": data["session_end_time"],
"model_name": data["model_name"],
"messages": json.dumps(data["messages"]),
"events": json.dumps(data["events"]),
}
# Create temporary JSONL file
import tempfile
with tempfile.NamedTemporaryFile(
mode="w", suffix=".jsonl", delete=False
) as tmp:
json.dump(session_row, tmp) # Single line JSON
tmp_path = tmp.name
try:
# Generate unique path in repo: sessions/YYYY-MM-DD/session_id.jsonl
session_id = data["session_id"]
date_str = datetime.fromisoformat(data["session_start_time"]).strftime(
"%Y-%m-%d"
)
repo_path = f"sessions/{date_str}/{session_id}.jsonl"
# Upload with retries
api = HfApi()
for attempt in range(max_retries):
try:
# Try to create repo if it doesn't exist (idempotent)
try:
api.create_repo(
repo_id=repo_id,
repo_type="dataset",
private=False,
token=hf_token,
exist_ok=True, # Don't fail if already exists
)
except Exception:
# Repo might already exist, continue
pass
# Upload the session file
api.upload_file(
path_or_fileobj=tmp_path,
path_in_repo=repo_path,
repo_id=repo_id,
repo_type="dataset",
token=hf_token,
commit_message=f"Add session {session_id}",
)
# Update local status to success
data["upload_status"] = "success"
data["upload_url"] = f"https://huggingface.co/datasets/{repo_id}"
with open(session_file, "w") as f:
json.dump(data, f, indent=2)
return True
except Exception:
if attempt < max_retries - 1:
import time
wait_time = 2**attempt
time.sleep(wait_time)
else:
# Final attempt failed
data["upload_status"] = "failed"
with open(session_file, "w") as f:
json.dump(data, f, indent=2)
return False
finally:
# Clean up temp file
try:
os.unlink(tmp_path)
except Exception:
pass
except Exception as e:
print(f"Error uploading session: {e}", file=sys.stderr)
return False
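# Illustrative shape of one uploaded row (all values are placeholders). Note
# that "messages" and "events" are JSON-encoded strings rather than nested
# objects, so sessions with different message schemas coexist in one dataset:
#
#   sessions/2025-01-01/abc123.jsonl
#   {"session_id": "abc123", "session_start_time": "...", "session_end_time": "...",
#    "model_name": "...", "messages": "[{...}]", "events": "[{...}]"}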
def retry_failed_uploads(directory: str, repo_id: str):
"""Retry all failed/pending uploads in a directory"""
log_dir = Path(directory)
if not log_dir.exists():
return
session_files = list(log_dir.glob("session_*.json"))
for filepath in session_files:
try:
with open(filepath, "r") as f:
data = json.load(f)
upload_status = data.get("upload_status", "unknown")
# Only retry pending or failed uploads
if upload_status in ["pending", "failed"]:
upload_session_as_file(str(filepath), repo_id)
except Exception:
pass
if __name__ == "__main__":
if len(sys.argv) < 3:
print("Usage: session_uploader.py <command> <args...>")
sys.exit(1)
command = sys.argv[1]
if command == "upload":
# python session_uploader.py upload <session_file> <repo_id>
if len(sys.argv) < 4:
print("Usage: session_uploader.py upload <session_file> <repo_id>")
sys.exit(1)
session_file = sys.argv[2]
repo_id = sys.argv[3]
success = upload_session_as_file(session_file, repo_id)
sys.exit(0 if success else 1)
elif command == "retry":
# python session_uploader.py retry <directory> <repo_id>
if len(sys.argv) < 4:
print("Usage: session_uploader.py retry <directory> <repo_id>")
sys.exit(1)
directory = sys.argv[2]
repo_id = sys.argv[3]
retry_failed_uploads(directory, repo_id)
sys.exit(0)
else:
print(f"Unknown command: {command}")
sys.exit(1)
================================================
FILE: agent/core/tools.py
================================================
"""
Tool system for the agent
Provides ToolSpec and ToolRouter for managing both built-in and MCP tools
"""
import logging
import warnings
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Optional
logger = logging.getLogger(__name__)
from fastmcp import Client
from fastmcp.exceptions import ToolError
from mcp.types import EmbeddedResource, ImageContent, TextContent
from agent.config import MCPServerConfig
from agent.tools.dataset_tools import (
HF_INSPECT_DATASET_TOOL_SPEC,
hf_inspect_dataset_handler,
)
from agent.tools.docs_tools import (
EXPLORE_HF_DOCS_TOOL_SPEC,
HF_DOCS_FETCH_TOOL_SPEC,
explore_hf_docs_handler,
hf_docs_fetch_handler,
)
from agent.tools.github_find_examples import (
GITHUB_FIND_EXAMPLES_TOOL_SPEC,
github_find_examples_handler,
)
from agent.tools.github_list_repos import (
GITHUB_LIST_REPOS_TOOL_SPEC,
github_list_repos_handler,
)
from agent.tools.github_read_file import (
GITHUB_READ_FILE_TOOL_SPEC,
github_read_file_handler,
)
from agent.tools.hf_repo_files_tool import (
HF_REPO_FILES_TOOL_SPEC,
hf_repo_files_handler,
)
from agent.tools.hf_repo_git_tool import (
HF_REPO_GIT_TOOL_SPEC,
hf_repo_git_handler,
)
from agent.tools.jobs_tool import HF_JOBS_TOOL_SPEC, hf_jobs_handler
from agent.tools.papers_tool import HF_PAPERS_TOOL_SPEC, hf_papers_handler
from agent.tools.plan_tool import PLAN_TOOL_SPEC, plan_tool_handler
from agent.tools.research_tool import RESEARCH_TOOL_SPEC, research_handler
from agent.tools.sandbox_tool import get_sandbox_tools
# NOTE: Private HF repo tool disabled - replaced by hf_repo_files and hf_repo_git
# from agent.tools.private_hf_repo_tools import (
# PRIVATE_HF_REPO_TOOL_SPEC,
# private_hf_repo_handler,
# )
# Suppress aiohttp deprecation warning
warnings.filterwarnings(
"ignore", category=DeprecationWarning, module="aiohttp.connector"
)
NOT_ALLOWED_TOOL_NAMES = ["hf_jobs", "hf_doc_search", "hf_doc_fetch", "hf_whoami"]
def convert_mcp_content_to_string(content: list) -> str:
"""
Convert MCP content blocks to a string format compatible with LLM messages.
Based on FastMCP documentation, content can be:
- TextContent: has .text field
- ImageContent: has .data and .mimeType fields
- EmbeddedResource: has .resource field with .text or .blob
Args:
content: List of MCP content blocks
Returns:
String representation of the content suitable for LLM consumption
"""
if not content:
return ""
parts = []
for item in content:
if isinstance(item, TextContent):
# Extract text from TextContent blocks
parts.append(item.text)
elif isinstance(item, ImageContent):
# TODO: Handle images
# For images, include a description with MIME type
parts.append(f"[Image: {item.mimeType}]")
elif isinstance(item, EmbeddedResource):
# TODO: Handle embedded resources
# For embedded resources, try to extract text
resource = item.resource
if hasattr(resource, "text") and resource.text:
parts.append(resource.text)
elif hasattr(resource, "blob") and resource.blob:
parts.append(
f"[Binary data: {resource.mimeType if hasattr(resource, 'mimeType') else 'unknown'}]"
)
else:
parts.append(
f"[Resource: {resource.uri if hasattr(resource, 'uri') else 'unknown'}]"
)
else:
# Fallback: try to convert to string
parts.append(str(item))
return "\n".join(parts)
@dataclass
class ToolSpec:
"""Tool specification for LLM"""
name: str
description: str
parameters: dict[str, Any]
handler: Optional[Callable[[dict[str, Any]], Awaitable[tuple[str, bool]]]] = None
class ToolRouter:
"""
Routes tool calls to appropriate handlers.
Based on codex-rs/core/src/tools/router.rs
"""
def __init__(self, mcp_servers: dict[str, MCPServerConfig], hf_token: str | None = None, local_mode: bool = False):
self.tools: dict[str, ToolSpec] = {}
self.mcp_servers: dict[str, dict[str, Any]] = {}
for tool in create_builtin_tools(local_mode=local_mode):
self.register_tool(tool)
self.mcp_client: Client | None = None
if mcp_servers:
mcp_servers_payload = {}
for name, server in mcp_servers.items():
data = server.model_dump()
if hf_token:
data.setdefault("headers", {})["Authorization"] = f"Bearer {hf_token}"
mcp_servers_payload[name] = data
self.mcp_client = Client({"mcpServers": mcp_servers_payload})
self._mcp_initialized = False
def register_tool(self, tool: ToolSpec) -> None:
self.tools[tool.name] = tool
async def register_mcp_tools(self) -> None:
tools = await self.mcp_client.list_tools()
registered_names = []
skipped_count = 0
for tool in tools:
if tool.name in NOT_ALLOWED_TOOL_NAMES:
skipped_count += 1
continue
registered_names.append(tool.name)
self.register_tool(
ToolSpec(
name=tool.name,
description=tool.description,
parameters=tool.inputSchema,
handler=None,
)
)
logger.info(
f"Loaded {len(registered_names)} MCP tools: {', '.join(registered_names)} ({skipped_count} disabled)"
)
async def register_openapi_tool(self) -> None:
"""Register the OpenAPI search tool (requires async initialization)"""
from agent.tools.docs_tools import (
_get_api_search_tool_spec,
search_openapi_handler,
)
try:
openapi_spec = await _get_api_search_tool_spec()
self.register_tool(
ToolSpec(
name=openapi_spec["name"],
description=openapi_spec["description"],
parameters=openapi_spec["parameters"],
handler=search_openapi_handler,
)
)
logger.info(f"Loaded OpenAPI search tool: {openapi_spec['name']}")
except Exception as e:
logger.warning("Failed to load OpenAPI search tool: %s", e)
def get_tool_specs_for_llm(self) -> list[dict[str, Any]]:
"""Get tool specifications in OpenAI format"""
specs = []
for tool in self.tools.values():
specs.append(
{
"type": "function",
"function": {
"name": tool.name,
"description": tool.description,
"parameters": tool.parameters,
},
}
)
return specs
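    # Example of the emitted shape for a single tool (field values are
    # placeholders, not a real registered tool):
    #
    #   {
    #       "type": "function",
    #       "function": {
    #           "name": "example_tool",
    #           "description": "...",
    #           "parameters": {"type": "object", "properties": {}},
    #       },
    #   }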
async def __aenter__(self) -> "ToolRouter":
if self.mcp_client is not None:
try:
await self.mcp_client.__aenter__()
await self.mcp_client.initialize()
await self.register_mcp_tools()
self._mcp_initialized = True
except Exception as e:
logger.warning("MCP connection failed, continuing without MCP tools: %s", e)
self.mcp_client = None
await self.register_openapi_tool()
total_tools = len(self.tools)
logger.info(f"Agent ready with {total_tools} tools total")
return self
async def __aexit__(self, exc_type, exc, tb) -> None:
if self.mcp_client is not None:
await self.mcp_client.__aexit__(exc_type, exc, tb)
self._mcp_initialized = False
async def call_tool(
self,
tool_name: str,
arguments: dict[str, Any],
session: Any = None,
tool_call_id: str | None = None,
) -> tuple[str, bool]:
"""
Call a tool and return (output_string, success_bool).
For MCP tools, converts the CallToolResult content blocks to a string.
For built-in tools, calls their handler directly.
"""
# Check if this is a built-in tool with a handler
tool = self.tools.get(tool_name)
if tool and tool.handler:
import inspect
# Check if handler accepts session argument
sig = inspect.signature(tool.handler)
if "session" in sig.parameters:
# Check if handler also accepts tool_call_id parameter
if "tool_call_id" in sig.parameters:
return await tool.handler(
arguments, session=session, tool_call_id=tool_call_id
)
return await tool.handler(arguments, session=session)
return await tool.handler(arguments)
# Otherwise, use MCP client
if self._mcp_initialized:
try:
result = await self.mcp_client.call_tool(tool_name, arguments)
output = convert_mcp_content_to_string(result.content)
return output, not result.is_error
except ToolError as e:
# Catch MCP tool errors and return them to the agent
error_msg = f"Tool error: {str(e)}"
return error_msg, False
return "MCP client not initialized", False
# ============================================================================
# BUILT-IN TOOL HANDLERS
# ============================================================================
def create_builtin_tools(local_mode: bool = False) -> list[ToolSpec]:
"""Create built-in tool specifications"""
# in order of importance
tools = [
# Research sub-agent (delegates to read-only tools in independent context)
ToolSpec(
name=RESEARCH_TOOL_SPEC["name"],
description=RESEARCH_TOOL_SPEC["description"],
parameters=RESEARCH_TOOL_SPEC["parameters"],
handler=research_handler,
),
# Documentation search tools
ToolSpec(
name=EXPLORE_HF_DOCS_TOOL_SPEC["name"],
description=EXPLORE_HF_DOCS_TOOL_SPEC["description"],
parameters=EXPLORE_HF_DOCS_TOOL_SPEC["parameters"],
handler=explore_hf_docs_handler,
),
ToolSpec(
name=HF_DOCS_FETCH_TOOL_SPEC["name"],
description=HF_DOCS_FETCH_TOOL_SPEC["description"],
parameters=HF_DOCS_FETCH_TOOL_SPEC["parameters"],
handler=hf_docs_fetch_handler,
),
# Paper discovery and reading
ToolSpec(
name=HF_PAPERS_TOOL_SPEC["name"],
description=HF_PAPERS_TOOL_SPEC["description"],
parameters=HF_PAPERS_TOOL_SPEC["parameters"],
handler=hf_papers_handler,
),
# Dataset inspection tool (unified)
ToolSpec(
name=HF_INSPECT_DATASET_TOOL_SPEC["name"],
description=HF_INSPECT_DATASET_TOOL_SPEC["description"],
parameters=HF_INSPECT_DATASET_TOOL_SPEC["parameters"],
handler=hf_inspect_dataset_handler,
),
# Planning and job management tools
ToolSpec(
name=PLAN_TOOL_SPEC["name"],
description=PLAN_TOOL_SPEC["description"],
parameters=PLAN_TOOL_SPEC["parameters"],
handler=plan_tool_handler,
),
ToolSpec(
name=HF_JOBS_TOOL_SPEC["name"],
description=HF_JOBS_TOOL_SPEC["description"],
parameters=HF_JOBS_TOOL_SPEC["parameters"],
handler=hf_jobs_handler,
),
# HF Repo management tools
ToolSpec(
name=HF_REPO_FILES_TOOL_SPEC["name"],
description=HF_REPO_FILES_TOOL_SPEC["description"],
parameters=HF_REPO_FILES_TOOL_SPEC["parameters"],
handler=hf_repo_files_handler,
),
ToolSpec(
name=HF_REPO_GIT_TOOL_SPEC["name"],
description=HF_REPO_GIT_TOOL_SPEC["description"],
parameters=HF_REPO_GIT_TOOL_SPEC["parameters"],
handler=hf_repo_git_handler,
),
ToolSpec(
name=GITHUB_FIND_EXAMPLES_TOOL_SPEC["name"],
description=GITHUB_FIND_EXAMPLES_TOOL_SPEC["description"],
parameters=GITHUB_FIND_EXAMPLES_TOOL_SPEC["parameters"],
handler=github_find_examples_handler,
),
ToolSpec(
name=GITHUB_LIST_REPOS_TOOL_SPEC["name"],
description=GITHUB_LIST_REPOS_TOOL_SPEC["description"],
parameters=GITHUB_LIST_REPOS_TOOL_SPEC["parameters"],
handler=github_list_repos_handler,
),
ToolSpec(
name=GITHUB_READ_FILE_TOOL_SPEC["name"],
description=GITHUB_READ_FILE_TOOL_SPEC["description"],
parameters=GITHUB_READ_FILE_TOOL_SPEC["parameters"],
handler=github_read_file_handler,
),
]
# Sandbox or local tools (highest priority)
if local_mode:
from agent.tools.local_tools import get_local_tools
tools = get_local_tools() + tools
else:
tools = get_sandbox_tools() + tools
tool_names = ", ".join([t.name for t in tools])
logger.info(f"Loaded {len(tools)} built-in tools: {tool_names}")
return tools
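# Hedged usage sketch (the server config, token, and extra tool are placeholders):
#
#   router = ToolRouter(config.mcpServers, hf_token=hf_token, local_mode=True)
#   router.register_tool(ToolSpec(
#       name="echo",
#       description="Echo the input back",
#       parameters={"type": "object", "properties": {"text": {"type": "string"}}},
#       handler=my_echo_handler,  # async, returns (output, success)
#   ))
#   async with router:            # connects MCP servers and registers their tools
#       specs = router.get_tool_specs_for_llm()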
================================================
FILE: agent/main.py
================================================
"""
Interactive CLI chat with the agent
Supports two modes:
Interactive: python -m agent.main
Headless: python -m agent.main "find me bird datasets"
"""
import argparse
import asyncio
import json
import os
import signal
import sys
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Optional
import litellm
from prompt_toolkit import PromptSession
from agent.config import load_config
from agent.core.agent_loop import submission_loop
from agent.core import model_switcher
from agent.core.session import OpType
from agent.core.tools import ToolRouter
from agent.utils.reliability_checks import check_training_script_save_pattern
from agent.utils.terminal_display import (
get_console,
print_approval_header,
print_approval_item,
print_banner,
print_compacted,
print_error,
print_help,
print_init_done,
print_interrupted,
print_markdown,
print_plan,
print_tool_call,
print_tool_log,
print_tool_output,
print_turn_complete,
print_yolo_approve,
)
litellm.drop_params = True
# Suppress the "Give Feedback / Get Help" banner LiteLLM prints to stderr
# on every error — users don't need it, and our friendly errors cover the case.
litellm.suppress_debug_info = True
def _safe_get_args(arguments: dict) -> dict:
"""Safely extract args dict from arguments, handling cases where LLM passes string."""
args = arguments.get("args", {})
# Sometimes LLM passes args as string instead of dict
if isinstance(args, str):
return {}
return args if isinstance(args, dict) else {}
def _get_hf_token() -> str | None:
"""Get HF token from environment, huggingface_hub API, or cached token file."""
token = os.environ.get("HF_TOKEN")
if token:
return token
try:
from huggingface_hub import HfApi
api = HfApi()
token = api.token
if token:
return token
except Exception:
pass
# Fallback: read the cached token file directly
token_path = Path.home() / ".cache" / "huggingface" / "token"
if token_path.exists():
token = token_path.read_text().strip()
if token:
return token
return None
async def _prompt_and_save_hf_token(prompt_session: PromptSession) -> str:
"""Prompt user for HF token, validate it, save via huggingface_hub.login(). Loops until valid."""
from prompt_toolkit.formatted_text import HTML
from huggingface_hub import HfApi, login
print("\nA Hugging Face token is required.")
print("Get one at: https://huggingface.co/settings/tokens\n")
while True:
try:
token = await prompt_session.prompt_async(
HTML("<b>Paste your HF token: </b>")
)
except (EOFError, KeyboardInterrupt):
print("\nToken is required to continue.")
continue
token = token.strip()
if not token:
print("Token cannot be empty.")
continue
# Validate token against the API
try:
api = HfApi(token=token)
user_info = api.whoami()
username = user_info.get("name", "unknown")
print(f"Token valid (user: {username})")
except Exception:
print("Invalid token. Please try again.")
continue
# Save for future sessions
try:
login(token=token, add_to_git_credential=False)
print("Token saved to ~/.cache/huggingface/token")
except Exception as e:
print(f"Warning: could not persist token ({e}), using for this session only.")
return token
@dataclass
class Operation:
"""Operation to be executed by the agent"""
op_type: OpType
data: Optional[dict[str, Any]] = None
@dataclass
class Submission:
"""Submission to the agent loop"""
id: str
operation: Operation
def _create_rich_console():
"""Get the shared rich Console."""
return get_console()
class _ThinkingShimmer:
"""Animated shiny/shimmer thinking indicator — a bright gradient sweeps across the text."""
_BASE = (90, 90, 110) # dim base color
_HIGHLIGHT = (255, 200, 80) # bright shimmer highlight (warm gold)
_WIDTH = 5 # shimmer width in characters
_FPS = 24
def __init__(self, console):
self._console = console
self._task = None
self._running = False
def start(self):
if self._running:
return
self._running = True
self._task = asyncio.ensure_future(self._animate())
def stop(self):
if not self._running:
return # no-op when never started (e.g. headless mode)
self._running = False
if self._task:
self._task.cancel()
self._task = None
# Clear the shimmer line
self._console.file.write("\r\033[K")
self._console.file.flush()
def _render_frame(self, text: str, offset: float) -> str:
"""Render one frame: a bright spot sweeps left-to-right across `text`."""
out = []
n = len(text)
for i, ch in enumerate(text):
# Distance from the shimmer center (wraps around)
dist = abs(i - offset)
wrap_dist = abs(i - offset + n + self._WIDTH)
dist = min(dist, wrap_dist, abs(i - offset - n - self._WIDTH))
# Blend factor: 1.0 at center, 0.0 beyond _WIDTH
t = max(0.0, 1.0 - dist / self._WIDTH)
t = t * t * (3 - 2 * t) # smoothstep
r = int(self._BASE[0] + (self._HIGHLIGHT[0] - self._BASE[0]) * t)
g = int(self._BASE[1] + (self._HIGHLIGHT[1] - self._BASE[1]) * t)
b = int(self._BASE[2] + (self._HIGHLIGHT[2] - self._BASE[2]) * t)
out.append(f"\033[38;2;{r};{g};{b}m{ch}")
out.append("\033[0m")
return "".join(out)
async def _animate(self):
text = "Thinking..."
n = len(text)
speed = 0.45 # characters per frame
pos = 0.0
try:
while self._running:
frame = self._render_frame(text, pos)
self._console.file.write(f"\r {frame}")
self._console.file.flush()
pos = (pos + speed) % (n + self._WIDTH)
await asyncio.sleep(1.0 / self._FPS)
except asyncio.CancelledError:
pass
class _StreamBuffer:
"""Accumulates streamed tokens, renders markdown block-by-block as complete
blocks appear. A "block" is everything up to a paragraph break (\\n\\n).
Unclosed code fences (odd count of ```) hold back flushing until closed so
a code block is always rendered as one unit."""
def __init__(self, console):
self._console = console
self._buffer = ""
def add_chunk(self, text: str):
self._buffer += text
def _pop_block(self) -> str | None:
"""Extract the next complete block, or return None if nothing complete."""
if self._buffer.count("```") % 2 == 1:
return None # inside an open code fence — wait for close
idx = self._buffer.find("\n\n")
if idx == -1:
return None
block = self._buffer[:idx]
self._buffer = self._buffer[idx + 2:]
return block
async def flush_ready(
self,
cancel_event: "asyncio.Event | None" = None,
instant: bool = False,
):
"""Render any complete blocks that have accumulated; leave the tail."""
while True:
if cancel_event is not None and cancel_event.is_set():
return
block = self._pop_block()
if block is None:
return
if block.strip():
await print_markdown(block, cancel_event=cancel_event, instant=instant)
async def finish(
self,
cancel_event: "asyncio.Event | None" = None,
instant: bool = False,
):
"""Flush complete blocks, then render whatever incomplete tail remains."""
await self.flush_ready(cancel_event=cancel_event, instant=instant)
if self._buffer.strip():
await print_markdown(self._buffer, cancel_event=cancel_event, instant=instant)
self._buffer = ""
def discard(self):
self._buffer = ""
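# Illustration of the block-flushing behaviour (strings are made up): after
# add_chunk("First paragraph.\n\nSecond para"), _pop_block() returns
# "First paragraph." and leaves "Second para" buffered; if the buffer contains
# an unclosed ``` fence, nothing is flushed until the closing fence arrives.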
async def event_listener(
event_queue: asyncio.Queue,
submission_queue: asyncio.Queue,
turn_complete_event: asyncio.Event,
ready_event: asyncio.Event,
prompt_session: PromptSession,
config=None,
session_holder=None,
) -> None:
"""Background task that listens for events and displays them"""
submission_id = [1000]
last_tool_name = [None]
console = _create_rich_console()
shimmer = _ThinkingShimmer(console)
stream_buf = _StreamBuffer(console)
def _cancel_event():
"""Return the session's cancellation Event so print_markdown can abort
its typewriter loop mid-stream when Ctrl+C fires."""
s = session_holder[0] if session_holder else None
return s._cancelled if s is not None else None
while True:
try:
event = await event_queue.get()
if event.event_type == "ready":
tool_count = event.data.get("tool_count", 0) if event.data else 0
print_init_done(tool_count=tool_count)
ready_event.set()
elif event.event_type == "assistant_message":
shimmer.stop()
content = event.data.get("content", "") if event.data else ""
if content:
await print_markdown(content, cancel_event=_cancel_event())
elif event.event_type == "assistant_chunk":
content = event.data.get("content", "") if event.data else ""
if content:
stream_buf.add_chunk(content)
# Flush any complete markdown blocks progressively so the
# user sees paragraphs appear as they're produced, not just
# at the end of the whole response.
shimmer.stop()
await stream_buf.flush_ready(cancel_event=_cancel_event())
elif event.event_type == "assistant_stream_end":
shimmer.stop()
await stream_buf.finish(cancel_event=_cancel_event())
elif event.event_type == "tool_call":
shimmer.stop()
stream_buf.discard()
tool_name = event.data.get("tool", "") if event.data else ""
arguments = event.data.get("arguments", {}) if event.data else {}
if tool_name:
last_tool_name[0] = tool_name
# Skip printing research tool_call — the tool_log handler shows it
if tool_name != "research":
args_str = json.dumps(arguments)[:80]
print_tool_call(tool_name, args_str)
elif event.event_type == "tool_output":
output = event.data.get("output", "") if event.data else ""
success = event.data.get("success", False) if event.data else False
# Only show output for plan_tool — everything else is noise
if last_tool_name[0] == "plan_tool" and output:
print_tool_output(output, success, truncate=False)
shimmer.start()
elif event.event_type == "turn_complete":
shimmer.stop()
stream_buf.discard()
print_turn_complete()
print_plan()
turn_complete_event.set()
elif event.event_type == "interrupted":
shimmer.stop()
stream_buf.discard()
print_interrupted()
turn_complete_event.set()
elif event.event_type == "undo_complete":
console.print("[dim]Undone.[/dim]")
turn_complete_event.set()
elif event.event_type == "tool_log":
tool = event.data.get("tool", "") if event.data else ""
log = event.data.get("log", "") if event.data else ""
if log:
agent_id = event.data.get("agent_id", "") if event.data else ""
label = event.data.get("label", "") if event.data else ""
print_tool_log(tool, log, agent_id=agent_id, label=label)
elif event.event_type == "tool_state_change":
pass # visual noise — approval flow handles this
elif event.event_type == "error":
shimmer.stop()
stream_buf.discard()
error = event.data.get("error", "Unknown error") if event.data else "Unknown error"
print_error(error)
turn_complete_event.set()
elif event.event_type == "shutdown":
shimmer.stop()
stream_buf.discard()
break
elif event.event_type == "processing":
shimmer.start()
elif event.event_type == "compacted":
old_tokens = event.data.get("old_tokens", 0) if event.data else 0
new_tokens = event.data.get("new_tokens", 0) if event.data else 0
print_compacted(old_tokens, new_tokens)
elif event.event_type == "approval_required":
# Handle batch approval format
tools_data = event.data.get("tools", []) if event.data else []
count = event.data.get("count", 0) if event.data else 0
# If yolo mode is active, auto-approve everything
if config and config.yolo_mode:
approvals = [
{
"tool_call_id": t.get("tool_call_id", ""),
"approved": True,
"feedback": None,
}
for t in tools_data
]
print_yolo_approve(count)
submission_id[0] += 1
approval_submission = Submission(
id=f"approval_{submission_id[0]}",
operation=Operation(
op_type=OpType.EXEC_APPROVAL,
data={"approvals": approvals},
),
)
await submission_queue.put(approval_submission)
continue
print_approval_header(count)
approvals = []
# Ask for approval for each tool
for i, tool_info in enumerate(tools_data, 1):
tool_name = tool_info.get("tool", "")
arguments = tool_info.get("arguments", {})
tool_call_id = tool_info.get("tool_call_id", "")
# Handle case where arguments might be a JSON string
if isinstance(arguments, str):
try:
arguments = json.loads(arguments)
except json.JSONDecodeError:
print(f"Warning: Failed to parse arguments for {tool_name}")
arguments = {}
operation = arguments.get("operation", "")
print_approval_item(i, count, tool_name, operation)
# Handle different tool types
if tool_name == "hf_jobs":
# Check if this is Python mode (script) or Docker mode (command)
script = arguments.get("script")
command = arguments.get("command")
if script:
# Python mode
dependencies = arguments.get("dependencies", [])
python_version = arguments.get("python")
script_args = arguments.get("script_args", [])
# Show full script
print(f"Script:\n{script}")
if dependencies:
print(f"Dependencies: {', '.join(dependencies)}")
if python_version:
print(f"Python version: {python_version}")
if script_args:
print(f"Script args: {' '.join(script_args)}")
# Run reliability checks on the full script (not truncated)
check_message = check_training_script_save_pattern(script)
if check_message:
print(check_message)
elif command:
# Docker mode
image = arguments.get("image", "python:3.12")
command_str = (
" ".join(command)
if isinstance(command, list)
else str(command)
)
print(f"Docker image: {image}")
print(f"Command: {command_str}")
# Common parameters for jobs
hardware_flavor = arguments.get("hardware_flavor", "cpu-basic")
timeout = arguments.get("timeout", "30m")
env = arguments.get("env", {})
schedule = arguments.get("schedule")
print(f"Hardware: {hardware_flavor}")
print(f"Timeout: {timeout}")
if env:
env_keys = ", ".join(env.keys())
print(f"Environment variables: {env_keys}")
if schedule:
print(f"Schedule: {schedule}")
elif tool_name == "hf_private_repos":
# Handle private repo operations
args = _safe_get_args(arguments)
if operation in ["create_repo", "upload_file"]:
repo_id = args.get("repo_id", "")
repo_type = args.get("repo_type", "dataset")
                            # Build repo URL (models live at the bare repo path;
                            # datasets/spaces get a pluralized type prefix)
                            if repo_type == "model":
                                repo_url = f"https://huggingface.co/{repo_id}"
                            else:
                                repo_url = f"https://huggingface.co/{repo_type}s/{repo_id}"
print(f"Repository: {repo_id}")
print(f"Type: {repo_type}")
print("Private: Yes")
print(f"URL: {repo_url}")
# Show file preview for upload_file operation
if operation == "upload_file":
path_in_repo = args.get("path_in_repo", "")
file_content = args.get("file_content", "")
print(f"File: {path_in_repo}")
if isinstance(file_content, str):
# Calculate metrics
all_lines = file_content.split("\n")
line_count = len(all_lines)
size_bytes = len(file_content.encode("utf-8"))
size_kb = size_bytes / 1024
size_mb = size_kb / 1024
print(f"Line count: {line_count}")
if size_kb < 1024:
print(f"Size: {size_kb:.2f} KB")
else:
print(f"Size: {size_mb:.2f} MB")
# Show preview
preview_lines = all_lines[:5]
preview = "\n".join(preview_lines)
print(
f"Content preview (first 5 lines):\n{preview}"
)
if len(all_lines) > 5:
print("...")
elif tool_name == "hf_repo_files":
# Handle repo files operations (upload, delete)
repo_id = arguments.get("repo_id", "")
repo_type = arguments.get("repo_type", "model")
revision = arguments.get("revision", "main")
# Build repo URL
if repo_type == "model":
repo_url = f"https://huggingface.co/{repo_id}"
else:
repo_url = f"https://huggingface.co/{repo_type}s/{repo_id}"
print(f"Repository: {repo_id}")
print(f"Type: {repo_type}")
print(f"Branch: {revision}")
print(f"URL: {repo_url}")
if operation == "upload":
path = arguments.get("path", "")
content = arguments.get("content", "")
create_pr = arguments.get("create_pr", False)
print(f"File: {path}")
if create_pr:
print("Mode: Create PR")
if isinstance(content, str):
all_lines = content.split("\n")
line_count = len(all_lines)
size_bytes = len(content.encode("utf-8"))
size_kb = size_bytes / 1024
print(f"Lines: {line_count}")
if size_kb < 1024:
print(f"Size: {size_kb:.2f} KB")
else:
print(f"Size: {size_kb / 1024:.2f} MB")
# Show full content
print(f"Content:\n{content}")
elif operation == "delete":
patterns = arguments.get("patterns", [])
if isinstance(patterns, str):
patterns = [patterns]
print(f"Patterns to delete: {', '.join(patterns)}")
elif tool_name == "hf_repo_git":
# Handle git operations (branches, tags, PRs, repo management)
repo_id = arguments.get("repo_id", "")
repo_type = arguments.get("repo_type", "model")
# Build repo URL
if repo_type == "model":
repo_url = f"https://huggingface.co/{repo_id}"
else:
repo_url = f"https://huggingface.co/{repo_type}s/{repo_id}"
print(f"Repository: {repo_id}")
print(f"Type: {repo_type}")
print(f"URL: {repo_url}")
if operation == "delete_branch":
branch = arguments.get("branch", "")
print(f"Branch to delete: {branch}")
elif operation == "delete_tag":
tag = arguments.get("tag", "")
print(f"Tag to delete: {tag}")
elif operation == "merge_pr":
pr_num = arguments.get("pr_num", "")
print(f"PR to merge: #{pr_num}")
elif operation == "create_repo":
private = arguments.get("private", False)
space_sdk = arguments.get("space_sdk")
print(f"Private: {private}")
if space_sdk:
print(f"Space SDK: {space_sdk}")
elif operation == "update_repo":
private = arguments.get("private")
gated = arguments.get("gated")
if private is not None:
print(f"Private: {private}")
if gated is not None:
print(f"Gated: {gated}")
# Get user decision for this item. Ctrl+C / EOF here is
# treated as "reject remaining" (matches Codex's modal
# priority and Forgecode's approval-cancel path). Without
# this, KeyboardInterrupt kills the event listener and
# the main loop deadlocks waiting for turn_complete.
try:
response = await prompt_session.prompt_async(
f"Approve item {i}? (y=yes, yolo=approve all, n=no, or provide feedback): "
)
except (KeyboardInterrupt, EOFError):
get_console().print("[dim]Approval cancelled — rejecting remaining items[/dim]")
approvals.append(
{
"tool_call_id": tool_call_id,
"approved": False,
"feedback": "User cancelled approval",
}
)
for remaining in tools_data[i:]:
approvals.append(
{
"tool_call_id": remaining.get("tool_call_id", ""),
"approved": False,
"feedback": None,
}
)
break
response = response.strip().lower()
# Handle yolo mode activation
if response == "yolo":
config.yolo_mode = True
print(
"YOLO MODE ACTIVATED - Auto-approving all future tool calls"
)
# Auto-approve this item and all remaining
approvals.append(
{
"tool_call_id": tool_call_id,
"approved": True,
"feedback": None,
}
)
for remaining in tools_data[i:]:
approvals.append(
{
"tool_call_id": remaining.get("tool_call_id", ""),
"approved": True,
"feedback": None,
}
)
break
approved = response in ["y", "yes"]
feedback = None if approved or response in ["n", "no"] else response
approvals.append(
{
"tool_call_id": tool_call_id,
"approved": approved,
"feedback": feedback,
}
)
# Submit batch approval
submission_id[0] += 1
approval_submission = Submission(
id=f"approval_{submission_id[0]}",
operation=Operation(
op_type=OpType.EXEC_APPROVAL,
data={"approvals": approvals},
),
)
await submission_queue.put(approval_submission)
console.print() # spacing after approval
# Silently ignore other events
except asyncio.CancelledError:
break
except Exception as e:
print(f"Event listener error: {e}")
async def get_user_input(prompt_session: PromptSession) -> str:
"""Get user input asynchronously"""
from prompt_toolkit.formatted_text import HTML
return await prompt_session.prompt_async(HTML("\n<b><cyan>></cyan></b> "))
# ── Slash command helpers ────────────────────────────────────────────────
# Slash commands are defined in terminal_display
async def _handle_slash_command(
cmd: str,
config,
session_holder: list,
submission_queue: asyncio.Queue,
submission_id: list[int],
) -> Submission | None:
"""
Handle a slash command. Returns a Submission to enqueue, or None if
the command was handled locally (caller should set turn_complete_event).
Async because ``/model`` fires a probe ping to validate the model+effort
combo before committing the switch.
"""
parts = cmd.strip().split(None, 1)
command = parts[0].lower()
arg = parts[1].strip() if len(parts) > 1 else ""
if command == "/help":
print_help()
return None
if command == "/undo":
submission_id[0] += 1
return Submission(
id=f"sub_{submission_id[0]}",
operation=Operation(op_type=OpType.UNDO),
)
if command == "/compact":
submission_id[0] += 1
return Submission(
id=f"sub_{submission_id[0]}",
operation=Operation(op_type=OpType.COMPACT),
)
if command == "/model":
console = get_console()
if not arg:
model_switcher.print_model_listing(config, console)
return None
if not model_switcher.is_valid_model_id(arg):
model_switcher.print_invalid_id(arg, console)
return None
normalized = arg.removeprefix("huggingface/")
session = session_holder[0] if session_holder else None
await model_switcher.probe_and_switch_model(
normalized, config, session, console, _get_hf_token(),
)
return None
if command == "/yolo":
config.yolo_mode = not config.yolo_mode
state = "ON" if config.yolo_mode else "OFF"
print(f"YOLO mode: {state}")
return None
if command == "/effort":
console = get_console()
valid = {"minimal", "low", "medium", "high", "xhigh", "max", "off"}
session = session_holder[0] if session_holder else None
if not arg:
current = config.reasoning_effort or "off"
console.print(f"[bold]Reasoning effort preference:[/bold] {current}")
if session and session.model_effective_effort:
console.print("[dim]Probed per model:[/dim]")
for m, eff in session.model_effective_effort.items():
console.print(f" [dim]{m}: {eff or 'off'}[/dim]")
console.print(
"[dim]Set with '/effort minimal|low|medium|high|xhigh|max|off'. "
"'max' and 'xhigh' are Anthropic-only; the cascade falls back "
"to whatever the model actually accepts.[/dim]"
)
return None
level = arg.lower()
if level not in valid:
console.print(f"[bold red]Invalid level:[/bold red] {arg}")
console.print(f"[dim]Expected one of: {', '.join(sorted(valid))}[/dim]")
return None
config.reasoning_effort = None if level == "off" else level
# Drop the per-model probe cache — the new preference may resolve
# differently. Next ``/model`` (or the retry safety net) reprobes.
if session is not None:
session.model_effective_effort.clear()
console.print(f"[green]Reasoning effort: {level}[/green]")
if session is not None:
console.print(
"[dim]run /model <current> to re-probe, or send a message — "
"the agent adjusts automatically if the new level isn't supported.[/dim]"
)
return None
if command == "/status":
session = session_holder[0] if session_holder else None
print(f"Model: {config.model_name}")
print(f"Reasoning effort: {config.reasoning_effort or 'off'}")
if session:
print(f"Turns: {session.turn_count}")
print(f"Context items: {len(session.context_manager.items)}")
return None
print(f"Unknown command: {command}. Type /help for available commands.")
return None
async def main():
"""Interactive chat with the agent"""
# Clear screen
os.system("clear" if os.name != "nt" else "cls")
# Create prompt session for input (needed early for token prompt)
prompt_session = PromptSession()
# HF token — required, prompt if missing
hf_token = _get_hf_token()
if not hf_token:
hf_token = await _prompt_and_save_hf_token(prompt_session)
# Resolve username for banner
hf_user = None
try:
from huggingface_hub import HfApi
hf_user = HfApi(token=hf_token).whoami().get("name")
except Exception:
pass
print_banner(hf_user=hf_user)
# Pre-warm the HF router catalog in the background so /model switches
# don't block on a network fetch.
from agent.core import hf_router_catalog
asyncio.create_task(asyncio.to_thread(hf_router_catalog.prewarm))
# Create queues for communication
submission_queue = asyncio.Queue()
event_queue = asyncio.Queue()
# Events to signal agent state
turn_complete_event = asyncio.Event()
turn_complete_event.set()
ready_event = asyncio.Event()
# Start agent loop in background
config_path = Path(__file__).parent.parent / "configs" / "main_agent_config.json"
config = load_config(config_path)
# Create tool router with local mode
tool_router = ToolRouter(config.mcpServers, hf_token=hf_token, local_mode=True)
# Session holder for interrupt/model/status access
session_holder = [None]
agent_task = asyncio.create_task(
submission_loop(
submission_queue,
event_queue,
config=config,
tool_router=tool_router,
session_holder=session_holder,
hf_token=hf_token,
local_mode=True,
stream=True,
)
)
# Start event listener in background
listener_task = asyncio.create_task(
event_listener(
event_queue,
submission_queue,
turn_complete_event,
ready_event,
prompt_session,
config,
session_holder=session_holder,
)
)
await ready_event.wait()
submission_id = [0]
# Mirrors codex-rs/tui/src/bottom_pane/mod.rs:137
# (`QUIT_SHORTCUT_TIMEOUT = Duration::from_secs(1)`). Two Ctrl+C presses
# within this window quit; a single press cancels the in-flight turn.
CTRL_C_QUIT_WINDOW = 1.0
# Hint string matches codex-rs/tui/src/bottom_pane/footer.rs:746
# (`" again to quit"` prefixed with the key binding, rendered dim).
CTRL_C_HINT = "[dim]ctrl + c again to quit[/dim]"
interrupt_state = {"last": 0.0, "exit": False}
loop = asyncio.get_running_loop()
def _on_sigint() -> None:
"""SIGINT handler — fires while the agent is generating (terminal is
in cooked mode between prompts). Mirrors Codex's `on_ctrl_c` in
codex-rs/tui/src/chatwidget.rs: first press cancels active work and
arms the quit hint; second press within the window quits."""
now = time.monotonic()
session = session_holder[0]
if now - interrupt_state["last"] < CTRL_C_QUIT_WINDOW:
interrupt_state["exit"] = True
if session:
session.cancel()
# Wake the main loop out of turn_complete_event.wait()
turn_complete_event.set()
return
interrupt_state["last"] = now
if session and not session.is_cancelled:
session.cancel()
get_console().print(f"\n{CTRL_C_HINT}")
def _install_sigint() -> bool:
try:
loop.add_signal_handler(signal.SIGINT, _on_sigint)
return True
except (NotImplementedError, RuntimeError):
return False # Windows or non-main thread
# prompt_toolkit's prompt_async installs its own SIGINT handler and, on
# exit
SYMBOL INDEX (671 symbols across 81 files)
FILE: agent/config.py
class Config (line 21) | class Config(BaseModel):
function substitute_env_vars (line 47) | def substitute_env_vars(obj: Any) -> Any:
function load_config (line 84) | def load_config(config_path: str = "config.json") -> Config:
FILE: agent/context_manager/manager.py
function _get_hf_username (line 24) | def _get_hf_username(hf_token: str | None = None) -> str:
function summarize_messages (line 98) | async def summarize_messages(
class ContextManager (line 133) | class ContextManager:
method __init__ (line 136) | def __init__(
method _load_system_prompt (line 164) | def _load_system_prompt(
method add_message (line 217) | def add_message(self, message: Message, token_count: int = None) -> None:
method get_messages (line 223) | def get_messages(self) -> list[Message]:
method _normalize_tool_calls (line 234) | def _normalize_tool_calls(msg: Message) -> None:
method _patch_dangling_tool_calls (line 253) | def _patch_dangling_tool_calls(self) -> None:
method undo_last_turn (line 296) | def undo_last_turn(self) -> bool:
method truncate_to_user_message (line 314) | def truncate_to_user_message(self, user_message_index: int) -> bool:
method compaction_threshold (line 338) | def compaction_threshold(self) -> int:
method needs_compaction (line 343) | def needs_compaction(self) -> bool:
method compact (line 346) | async def compact(
FILE: agent/core/agent_loop.py
function _validate_tool_args (line 27) | def _validate_tool_args(tool_args: dict) -> tuple[bool, str | None]:
function _needs_approval (line 49) | def _needs_approval(
function _is_transient_error (line 124) | def _is_transient_error(error: Exception) -> bool:
function _is_effort_config_error (line 140) | def _is_effort_config_error(error: Exception) -> bool:
function _heal_effort_and_rebuild_params (line 152) | async def _heal_effort_and_rebuild_params(
function _friendly_error_message (line 192) | def _friendly_error_message(error: Exception) -> str | None:
function _compact_and_notify (line 234) | async def _compact_and_notify(session: Session) -> None:
function _cleanup_on_cancel (line 261) | async def _cleanup_on_cancel(session: Session) -> None:
class LLMResult (line 288) | class LLMResult:
function _call_llm_streaming (line 296) | async def _call_llm_streaming(session: Session, messages, tools, llm_par...
function _call_llm_non_streaming (line 391) | async def _call_llm_non_streaming(session: Session, messages, tools, llm...
class Handlers (line 465) | class Handlers:
method _abandon_pending_approval (line 469) | async def _abandon_pending_approval(session: Session) -> None:
method run_agent (line 507) | async def run_agent(
method undo (line 906) | async def undo(session: Session) -> None:
method exec_approval (line 914) | async def exec_approval(session: Session, approvals: list[dict]) -> None:
method shutdown (line 1150) | async def shutdown(session: Session) -> bool:
function process_submission (line 1163) | async def process_submission(session: Session, submission) -> bool:
function submission_loop (line 1198) | async def submission_loop(
FILE: agent/core/doom_loop.py
class ToolCallSignature (line 19) | class ToolCallSignature:
function _hash_args (line 26) | def _hash_args(args_str: str) -> str:
function extract_recent_tool_signatures (line 31) | def extract_recent_tool_signatures(
function detect_identical_consecutive (line 55) | def detect_identical_consecutive(
function detect_repeating_sequence (line 74) | def detect_repeating_sequence(
function check_for_doom_loop (line 103) | def check_for_doom_loop(messages: list[Message]) -> str | None:
FILE: agent/core/effort_probe.py
class ProbeInconclusive (line 51) | class ProbeInconclusive(Exception):
class ProbeOutcome (line 60) | class ProbeOutcome:
function _is_thinking_unsupported (line 72) | def _is_thinking_unsupported(e: Exception) -> bool:
function _is_invalid_effort (line 83) | def _is_invalid_effort(e: Exception) -> bool:
function _is_transient (line 113) | def _is_transient(e: Exception) -> bool:
function probe_effort (line 134) | async def probe_effort(
FILE: agent/core/hf_router_catalog.py
class ProviderInfo (line 36) | class ProviderInfo:
class ModelInfo (line 47) | class ModelInfo:
method live_providers (line 52) | def live_providers(self) -> list[ProviderInfo]:
method max_context_length (line 56) | def max_context_length(self) -> Optional[int]:
method any_supports_tools (line 61) | def any_supports_tools(self) -> bool:
function _fetch_catalog (line 65) | def _fetch_catalog(force: bool = False) -> dict:
function _parse_entry (line 83) | def _parse_entry(entry: dict) -> ModelInfo:
function lookup (line 101) | def lookup(model_id: str) -> Optional[ModelInfo]:
function fuzzy_suggest (line 115) | def fuzzy_suggest(model_id: str, limit: int = 3) -> list[str]:
function prewarm (line 123) | def prewarm() -> None:
FILE: agent/core/llm_params.py
function _patch_litellm_effort_validation (line 11) | def _patch_litellm_effort_validation() -> None:
class UnsupportedEffortError (line 79) | class UnsupportedEffortError(ValueError):
function _resolve_llm_params (line 87) | def _resolve_llm_params(
FILE: agent/core/model_switcher.py
function is_valid_model_id (line 38) | def is_valid_model_id(model_id: str) -> bool:
function _print_hf_routing_info (line 57) | def _print_hf_routing_info(model_id: str, console) -> bool:
function print_model_listing (line 127) | def print_model_listing(config, console) -> None:
function print_invalid_id (line 143) | def print_invalid_id(arg: str, console) -> None:
function probe_and_switch_model (line 153) | async def probe_and_switch_model(
function _commit_switch (line 213) | def _commit_switch(model_id, config, session, effective, cache: bool) ->...
FILE: agent/core/prompt_caching.py
function with_prompt_caching (line 19) | def with_prompt_caching(
FILE: agent/core/session.py
function _get_max_tokens_safe (line 21) | def _get_max_tokens_safe(model_name: str) -> int:
class OpType (line 52) | class OpType(Enum):
class Event (line 62) | class Event:
class Session (line 67) | class Session:
method __init__ (line 73) | def __init__(
method send_event (line 122) | async def send_event(self, event: Event) -> None:
method cancel (line 135) | def cancel(self) -> None:
method reset_cancel (line 139) | def reset_cancel(self) -> None:
method is_cancelled (line 144) | def is_cancelled(self) -> bool:
method update_model (line 147) | def update_model(self, model_name: str) -> None:
method effective_effort_for (line 152) | def effective_effort_for(self, model_name: str) -> str | None:
method increment_turn (line 165) | def increment_turn(self) -> None:
method auto_save_if_needed (line 169) | async def auto_save_if_needed(self) -> None:
method get_trajectory (line 185) | def get_trajectory(self) -> dict:
method save_trajectory_local (line 196) | def save_trajectory_local(
method update_local_save_status (line 235) | def update_local_save_status(
method save_and_upload_detached (line 255) | def save_and_upload_detached(self, repo_id: str) -> Optional[str]:
method retry_failed_uploads_detached (line 288) | def retry_failed_uploads_detached(
FILE: agent/core/session_uploader.py
function upload_session_as_file (line 22) | def upload_session_as_file(
function retry_failed_uploads (line 150) | def retry_failed_uploads(directory: str, repo_id: str):
FILE: agent/core/tools.py
function convert_mcp_content_to_string (line 68) | def convert_mcp_content_to_string(content: list) -> str:
class ToolSpec (line 117) | class ToolSpec:
class ToolRouter (line 126) | class ToolRouter:
method __init__ (line 132) | def __init__(self, mcp_servers: dict[str, MCPServerConfig], hf_token: ...
method register_tool (line 150) | def register_tool(self, tool: ToolSpec) -> None:
method register_mcp_tools (line 153) | async def register_mcp_tools(self) -> None:
method register_openapi_tool (line 174) | async def register_openapi_tool(self) -> None:
method get_tool_specs_for_llm (line 195) | def get_tool_specs_for_llm(self) -> list[dict[str, Any]]:
method __aenter__ (line 211) | async def __aenter__(self) -> "ToolRouter":
method __aexit__ (line 229) | async def __aexit__(self, exc_type, exc, tb) -> None:
method call_tool (line 234) | async def call_tool(
function create_builtin_tools (line 282) | def create_builtin_tools(local_mode: bool = False) -> list[ToolSpec]:
FILE: agent/main.py
function _safe_get_args (line 53) | def _safe_get_args(arguments: dict) -> dict:
function _get_hf_token (line 62) | def _get_hf_token() -> str | None:
function _prompt_and_save_hf_token (line 84) | async def _prompt_and_save_hf_token(prompt_session: PromptSession) -> str:
class Operation (line 126) | class Operation:
class Submission (line 134) | class Submission:
function _create_rich_console (line 141) | def _create_rich_console():
class _ThinkingShimmer (line 146) | class _ThinkingShimmer:
method __init__ (line 154) | def __init__(self, console):
method start (line 159) | def start(self):
method stop (line 165) | def stop(self):
method _render_frame (line 176) | def _render_frame(self, text: str, offset: float) -> str:
method _animate (line 195) | async def _animate(self):
class _StreamBuffer (line 211) | class _StreamBuffer:
method __init__ (line 217) | def __init__(self, console):
method add_chunk (line 221) | def add_chunk(self, text: str):
method _pop_block (line 224) | def _pop_block(self) -> str | None:
method flush_ready (line 235) | async def flush_ready(
method finish (line 250) | async def finish(
method discard (line 261) | def discard(self):
function event_listener (line 265) | async def event_listener(
function get_user_input (line 692) | async def get_user_input(prompt_session: PromptSession) -> str:
function _handle_slash_command (line 704) | async def _handle_slash_command(
function main (line 809) | async def main():
function headless_main (line 1029) | async def headless_main(
function cli (line 1218) | def cli():
FILE: agent/tools/dataset_tools.py
class SplitConfig (line 21) | class SplitConfig(TypedDict):
function _get_headers (line 28) | def _get_headers(token: str | None = None) -> dict:
function inspect_dataset (line 35) | async def inspect_dataset(
function _format_status (line 148) | def _format_status(data: dict) -> str:
function _extract_configs (line 160) | def _extract_configs(splits_data: dict) -> list[SplitConfig]:
function _format_structure (line 171) | def _format_structure(configs: list[SplitConfig], max_rows: int = 10) ->...
function _format_schema (line 199) | def _format_schema(info: dict, config: str) -> str:
function _get_type_str (line 209) | def _get_type_str(col_info: dict) -> str:
function _format_samples (line 220) | def _format_samples(rows_data: dict, config: str, split: str, limit: int...
function _format_messages_structure (line 250) | def _format_messages_structure(messages_data: Any) -> str | None:
function _format_parquet_files (line 353) | def _format_parquet_files(data: dict, max_rows: int = 10) -> str | None:
function hf_inspect_dataset_handler (line 426) | async def hf_inspect_dataset_handler(arguments: dict[str, Any], session=...
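
The docstring preview for this module (further down in this document) says it combines the datasets-server /is-valid, /splits, /info, and /first-rows endpoints into a single call. A hedged sketch of that combination follows, assuming httpx and showing only two of the endpoints; the real tool's parameter handling and output formatting are far richer.

# Illustrative only: combine two public datasets-server endpoints into one result.
from typing import Any

import httpx

DATASETS_SERVER = "https://datasets-server.huggingface.co"


def _get_headers(token: str | None = None) -> dict:
    return {"Authorization": f"Bearer {token}"} if token else {}


async def inspect_dataset(dataset: str, token: str | None = None) -> dict[str, Any]:
    async with httpx.AsyncClient(headers=_get_headers(token), timeout=30) as client:
        valid = await client.get(f"{DATASETS_SERVER}/is-valid", params={"dataset": dataset})
        splits = await client.get(f"{DATASETS_SERVER}/splits", params={"dataset": dataset})
        return {
            "status": valid.json(),
            "splits": splits.json().get("splits", []),
        }
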
FILE: agent/tools/docs_tools.py
function _fetch_gradio_docs (line 66) | async def _fetch_gradio_docs(query: str | None = None) -> str:
function _fetch_endpoint_docs (line 99) | async def _fetch_endpoint_docs(hf_token: str, endpoint: str) -> list[dic...
function _get_docs (line 143) | async def _get_docs(hf_token: str, endpoint: str) -> list[dict[str, str]]:
function _build_search_index (line 173) | async def _build_search_index(
function _search_docs (line 216) | async def _search_docs(
function _format_results (line 251) | def _format_results(
function explore_hf_docs_handler (line 289) | async def explore_hf_docs_handler(
function hf_docs_fetch_handler (line 382) | async def hf_docs_fetch_handler(
function _fetch_openapi_spec (line 420) | async def _fetch_openapi_spec() -> dict[str, Any]:
function _extract_all_tags (line 434) | def _extract_all_tags(spec: dict[str, Any]) -> list[str]:
function _extract_all_endpoints (line 448) | def _extract_all_endpoints(spec: dict[str, Any]) -> list[dict[str, Any]]:
function _build_openapi_index (line 487) | async def _build_openapi_index() -> tuple[Any, MultifieldParser, list[di...
function _search_openapi (line 541) | async def _search_openapi(
function _generate_curl_example (line 579) | def _generate_curl_example(endpoint: dict[str, Any]) -> str:
function _format_parameters (line 620) | def _format_parameters(parameters: list[dict[str, Any]]) -> str:
function _format_response_info (line 655) | def _format_response_info(responses: dict[str, Any]) -> str:
function _format_openapi_results (line 673) | def _format_openapi_results(
function search_openapi_handler (line 737) | async def search_openapi_handler(arguments: dict[str, Any]) -> tuple[str...
function _get_api_search_tool_spec (line 786) | async def _get_api_search_tool_spec() -> dict[str, Any]:
FILE: agent/tools/edit_utils.py
function _normalize_unicode (line 28) | def _normalize_unicode(s: str) -> str:
function fuzzy_find (line 35) | def fuzzy_find(content: str, pattern: str) -> tuple[int | None, str | No...
function _map_back (line 92) | def _map_back(
function fuzzy_find_original_match (line 117) | def fuzzy_find_original_match(content: str, pattern: str) -> tuple[str |...
function apply_edit (line 157) | def apply_edit(
function validate_python (line 233) | def validate_python(content: str, path: str = "") -> list[str]:
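
The helper names in this file point at two small techniques worth illustrating: unicode normalisation before matching (so smart quotes or compatibility characters do not defeat an exact search) and an AST round-trip to validate edited Python. The sketch below shows only the idea under those assumptions; the real fuzzy_find additionally maps matches back to the original, un-normalised text.

# Hedged sketch, not the repository's implementation.
import ast
import unicodedata


def _normalize_unicode(s: str) -> str:
    # NFKC folds visually similar characters (smart quotes, ligatures) together.
    return unicodedata.normalize("NFKC", s)


def fuzzy_find(content: str, pattern: str) -> tuple[int | None, str | None]:
    # Return (match index, error message); matching happens in normalised space.
    idx = _normalize_unicode(content).find(_normalize_unicode(pattern))
    if idx == -1:
        return None, "pattern not found"
    return idx, None


def validate_python(content: str, path: str = "") -> list[str]:
    # Parse the edited file and report syntax errors as human-readable strings.
    try:
        ast.parse(content, filename=path or "<edited file>")
        return []
    except SyntaxError as exc:
        return [f"{path}:{exc.lineno}: {exc.msg}"]
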
FILE: agent/tools/github_find_examples.py
function _get_repo_tree (line 56) | def _get_repo_tree(org: str, repo: str, token: str) -> tuple[List[Dict[s...
function _search_similar_repos (line 112) | def _search_similar_repos(org: str, repo: str, token: str) -> List[Dict[...
function _score_against_example_patterns (line 151) | def _score_against_example_patterns(file_path: str) -> int:
function _score_against_keyword (line 160) | def _score_against_keyword(file_path: str, keyword: str) -> int:
function _get_pattern_priority (line 171) | def _get_pattern_priority(file_path: str) -> tuple[int, int, int]:
function _handle_repo_tree_errors (line 210) | def _handle_repo_tree_errors(
function find_examples (line 267) | def find_examples(
function github_find_examples_handler (line 448) | async def github_find_examples_handler(arguments: Dict[str, Any]) -> tup...
FILE: agent/tools/github_list_repos.py
function list_repos (line 15) | def list_repos(
function github_list_repos_handler (line 275) | async def github_list_repos_handler(arguments: Dict[str, Any]) -> tuple[...
FILE: agent/tools/github_read_file.py
function _convert_ipynb_to_markdown (line 20) | def _convert_ipynb_to_markdown(content: str) -> str:
function read_file (line 67) | def read_file(
function github_read_file_handler (line 290) | async def github_read_file_handler(arguments: Dict[str, Any]) -> tuple[s...
FILE: agent/tools/hf_repo_files_tool.py
function _async_call (line 18) | async def _async_call(func, *args, **kwargs):
function _build_repo_url (line 23) | def _build_repo_url(repo_id: str, repo_type: str = "model") -> str:
function _format_size (line 30) | def _format_size(size_bytes: int) -> str:
class HfRepoFilesTool (line 39) | class HfRepoFilesTool:
method __init__ (line 42) | def __init__(self, hf_token: Optional[str] = None):
method execute (line 45) | async def execute(self, args: Dict[str, Any]) -> ToolResult:
method _help (line 73) | def _help(self) -> ToolResult:
method _list (line 89) | async def _list(self, args: Dict[str, Any]) -> ToolResult:
method _read (line 125) | async def _read(self, args: Dict[str, Any]) -> ToolResult:
method _upload (line 166) | async def _upload(self, args: Dict[str, Any]) -> ToolResult:
method _delete (line 205) | async def _delete(self, args: Dict[str, Any]) -> ToolResult:
method _error (line 236) | def _error(self, message: str) -> ToolResult:
function hf_repo_files_handler (line 315) | async def hf_repo_files_handler(arguments: Dict[str, Any], session=None)...
FILE: agent/tools/hf_repo_git_tool.py
function _async_call (line 24) | async def _async_call(func, *args, **kwargs):
function _build_repo_url (line 29) | def _build_repo_url(repo_id: str, repo_type: str = "model") -> str:
class HfRepoGitTool (line 36) | class HfRepoGitTool:
method __init__ (line 39) | def __init__(self, hf_token: Optional[str] = None):
method execute (line 42) | async def execute(self, args: Dict[str, Any]) -> ToolResult:
method _help (line 79) | def _help(self) -> ToolResult:
method _create_branch (line 111) | async def _create_branch(self, args: Dict[str, Any]) -> ToolResult:
method _delete_branch (line 136) | async def _delete_branch(self, args: Dict[str, Any]) -> ToolResult:
method _create_tag (line 161) | async def _create_tag(self, args: Dict[str, Any]) -> ToolResult:
method _delete_tag (line 188) | async def _delete_tag(self, args: Dict[str, Any]) -> ToolResult:
method _list_refs (line 213) | async def _list_refs(self, args: Dict[str, Any]) -> ToolResult:
method _create_pr (line 250) | async def _create_pr(self, args: Dict[str, Any]) -> ToolResult:
method _list_prs (line 278) | async def _list_prs(self, args: Dict[str, Any]) -> ToolResult:
method _get_pr (line 314) | async def _get_pr(self, args: Dict[str, Any]) -> ToolResult:
method _merge_pr (line 358) | async def _merge_pr(self, args: Dict[str, Any]) -> ToolResult:
method _close_pr (line 382) | async def _close_pr(self, args: Dict[str, Any]) -> ToolResult:
method _comment_pr (line 406) | async def _comment_pr(self, args: Dict[str, Any]) -> ToolResult:
method _change_pr_status (line 432) | async def _change_pr_status(self, args: Dict[str, Any]) -> ToolResult:
method _create_repo (line 464) | async def _create_repo(self, args: Dict[str, Any]) -> ToolResult:
method _update_repo (line 495) | async def _update_repo(self, args: Dict[str, Any]) -> ToolResult:
method _error (line 526) | def _error(self, message: str) -> ToolResult:
function hf_repo_git_handler (line 656) | async def hf_repo_git_handler(arguments: Dict[str, Any], session=None) -...
FILE: agent/tools/jobs_tool.py
function _filter_uv_install_output (line 82) | def _filter_uv_install_output(logs: list[str]) -> list[str]:
function _strip_ansi (line 123) | def _strip_ansi(text: str) -> str:
function _add_default_env (line 136) | def _add_default_env(params: Dict[str, Any] | None) -> Dict[str, Any]:
function _add_environment_variables (line 143) | def _add_environment_variables(
function _build_uv_command (line 163) | def _build_uv_command(
function _wrap_inline_script (line 189) | def _wrap_inline_script(
function _ensure_hf_transfer_dependency (line 204) | def _ensure_hf_transfer_dependency(deps: list[str] | None) -> list[str]:
function _resolve_uv_command (line 216) | def _resolve_uv_command(
function _async_call (line 236) | async def _async_call(func, *args, **kwargs):
function _job_info_to_dict (line 241) | def _job_info_to_dict(job_info) -> Dict[str, Any]:
function _scheduled_job_info_to_dict (line 255) | def _scheduled_job_info_to_dict(scheduled_job_info) -> Dict[str, Any]:
class HfJobsTool (line 294) | class HfJobsTool:
method __init__ (line 297) | def __init__(
method execute (line 312) | async def execute(self, params: Dict[str, Any]) -> ToolResult:
method _wait_for_job_completion (line 382) | async def _wait_for_job_completion(
method _run_job (line 491) | async def _run_job(self, args: Dict[str, Any]) -> ToolResult:
method _list_jobs (line 608) | async def _list_jobs(self, args: Dict[str, Any]) -> ToolResult:
method _get_logs (line 645) | async def _get_logs(self, args: Dict[str, Any]) -> ToolResult:
method _inspect_job (line 683) | async def _inspect_job(self, args: Dict[str, Any]) -> ToolResult:
method _cancel_job (line 717) | async def _cancel_job(self, args: Dict[str, Any]) -> ToolResult:
method _scheduled_run (line 740) | async def _scheduled_run(self, args: Dict[str, Any]) -> ToolResult:
method _list_scheduled_jobs (line 813) | async def _list_scheduled_jobs(self, args: Dict[str, Any]) -> ToolResult:
method _inspect_scheduled_job (line 849) | async def _inspect_scheduled_job(self, args: Dict[str, Any]) -> ToolRe...
method _delete_scheduled_job (line 875) | async def _delete_scheduled_job(self, args: Dict[str, Any]) -> ToolRes...
method _suspend_scheduled_job (line 898) | async def _suspend_scheduled_job(self, args: Dict[str, Any]) -> ToolRe...
method _resume_scheduled_job (line 921) | async def _resume_scheduled_job(self, args: Dict[str, Any]) -> ToolRes...
function hf_jobs_handler (line 1059) | async def hf_jobs_handler(
FILE: agent/tools/local_tools.py
function _resolve_path (line 31) | def _resolve_path(path: str) -> str:
function _atomic_write (line 38) | def _atomic_write(path: Path, content: str) -> None:
function _strip_ansi (line 65) | def _strip_ansi(text: str) -> str:
function _truncate_output (line 69) | def _truncate_output(output: str, max_chars: int = MAX_OUTPUT_CHARS, hea...
function _bash_handler (line 96) | async def _bash_handler(args: dict[str, Any], **_kw) -> tuple[str, bool]:
function _read_handler (line 129) | async def _read_handler(args: dict[str, Any], **_kw) -> tuple[str, bool]:
function _write_handler (line 159) | async def _write_handler(args: dict[str, Any], **_kw) -> tuple[str, bool]:
function _edit_handler (line 185) | async def _edit_handler(args: dict[str, Any], **_kw) -> tuple[str, bool]:
function get_local_tools (line 409) | def get_local_tools():
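
The _atomic_write name suggests the standard temp-file-then-rename trick, so a crash mid-write never leaves a half-written file behind. A sketch under that assumption; the module's actual error handling and fsync policy may differ.

# Illustrative atomic write: stage in a temp file, then replace the target.
import os
import tempfile
from pathlib import Path


def _atomic_write(path: Path, content: str) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=f".{path.name}.")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(content)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_name, path)  # atomic on POSIX within one filesystem
    except BaseException:
        os.unlink(tmp_name)
        raise
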
FILE: agent/tools/papers_tool.py
function _s2_paper_id (line 51) | def _s2_paper_id(arxiv_id: str) -> str:
function _s2_cache_key (line 56) | def _s2_cache_key(path: str, params: dict | None) -> str:
function _s2_request (line 62) | async def _s2_request(
function _s2_get_json (line 104) | async def _s2_get_json(
function _s2_get_paper (line 121) | async def _s2_get_paper(
function _parse_paper_html (line 137) | def _parse_paper_html(html: str) -> dict[str, Any]:
function _find_section (line 213) | def _find_section(sections: list[dict], query: str) -> dict | None:
function _clean_description (line 245) | def _clean_description(text: str) -> str:
function _truncate (line 252) | def _truncate(text: str, max_len: int) -> str:
function _format_paper_list (line 258) | def _format_paper_list(
function _format_paper_detail (line 294) | def _format_paper_detail(paper: dict, s2_data: dict | None = None) -> str:
function _format_read_paper_toc (line 349) | def _format_read_paper_toc(parsed: dict[str, Any], arxiv_id: str) -> str:
function _format_read_paper_section (line 371) | def _format_read_paper_section(section: dict, arxiv_id: str) -> str:
function _format_datasets (line 387) | def _format_datasets(datasets: list, arxiv_id: str, sort: str) -> str:
function _format_datasets_compact (line 414) | def _format_datasets_compact(datasets: list) -> str:
function _format_models (line 425) | def _format_models(models: list, arxiv_id: str, sort: str) -> str:
function _format_models_compact (line 449) | def _format_models_compact(models: list) -> str:
function _format_collections (line 462) | def _format_collections(collections: list, arxiv_id: str) -> str:
function _format_collections_compact (line 484) | def _format_collections_compact(collections: list) -> str:
function _error (line 501) | def _error(message: str) -> ToolResult:
function _validate_arxiv_id (line 510) | def _validate_arxiv_id(args: dict) -> str | None:
function _op_trending (line 515) | async def _op_trending(args: dict[str, Any], limit: int) -> ToolResult:
function _format_s2_paper_list (line 558) | def _format_s2_paper_list(papers: list[dict], title: str) -> str:
function _s2_bulk_search (line 589) | async def _s2_bulk_search(query: str, args: dict[str, Any], limit: int) ...
function _op_search (line 640) | async def _op_search(args: dict[str, Any], limit: int) -> ToolResult:
function _op_paper_details (line 675) | async def _op_paper_details(args: dict[str, Any], limit: int) -> ToolRes...
function _op_read_paper (line 692) | async def _op_read_paper(args: dict[str, Any], limit: int) -> ToolResult:
function _format_citation_entry (line 757) | def _format_citation_entry(entry: dict, show_context: bool = False) -> str:
function _format_citation_graph (line 783) | def _format_citation_graph(
function _op_citation_graph (line 813) | async def _op_citation_graph(args: dict[str, Any], limit: int) -> ToolRe...
function _op_find_datasets (line 854) | async def _op_find_datasets(args: dict[str, Any], limit: int) -> ToolRes...
function _op_find_models (line 889) | async def _op_find_models(args: dict[str, Any], limit: int) -> ToolResult:
function _op_find_collections (line 924) | async def _op_find_collections(args: dict[str, Any], limit: int) -> Tool...
function _op_find_all_resources (line 949) | async def _op_find_all_resources(args: dict[str, Any], limit: int) -> To...
function _format_snippets (line 1017) | def _format_snippets(snippets: list[dict], query: str) -> str:
function _op_snippet_search (line 1046) | async def _op_snippet_search(args: dict[str, Any], limit: int) -> ToolRe...
function _op_recommend (line 1093) | async def _op_recommend(args: dict[str, Any], limit: int) -> ToolResult:
function hf_papers_handler (line 1274) | async def hf_papers_handler(arguments: dict[str, Any]) -> tuple[str, bool]:
FILE: agent/tools/plan_tool.py
class PlanTool (line 12) | class PlanTool:
method __init__ (line 15) | def __init__(self, session: Any = None):
method execute (line 18) | async def execute(self, params: Dict[str, Any]) -> ToolResult:
function get_current_plan (line 79) | def get_current_plan() -> List[Dict[str, str]]:
function plan_tool_handler (line 126) | async def plan_tool_handler(
FILE: agent/tools/private_hf_repo_tools.py
function _async_call (line 24) | async def _async_call(func, *args, **kwargs):
function _build_repo_url (line 29) | def _build_repo_url(repo_id: str, repo_type: str = "dataset") -> str:
function _content_to_bytes (line 35) | def _content_to_bytes(content: str | bytes) -> bytes:
class PrivateHfRepoTool (line 42) | class PrivateHfRepoTool:
method __init__ (line 45) | def __init__(self, hf_token: Optional[str] = None):
method execute (line 48) | async def execute(self, params: Dict[str, Any]) -> ToolResult:
method _show_help (line 101) | def _show_help(self) -> ToolResult:
method _show_operation_help (line 232) | def _show_operation_help(self, operation: str) -> ToolResult:
method _upload_file (line 237) | async def _upload_file(self, args: Dict[str, Any]) -> ToolResult:
method _create_repo (line 338) | async def _create_repo(self, args: Dict[str, Any]) -> ToolResult:
method _check_repo (line 407) | async def _check_repo(self, args: Dict[str, Any]) -> ToolResult:
method _list_files (line 461) | async def _list_files(self, args: Dict[str, Any]) -> ToolResult:
method _read_file (line 514) | async def _read_file(self, args: Dict[str, Any]) -> ToolResult:
function private_hf_repo_handler (line 643) | async def private_hf_repo_handler(arguments: Dict[str, Any]) -> tuple[st...
FILE: agent/tools/research_tool.py
function _get_research_model (line 217) | def _get_research_model(main_model: str) -> str:
function research_handler (line 225) | async def research_handler(
FILE: agent/tools/sandbox_client.py
class ToolResult (line 460) | class ToolResult:
method __str__ (line 465) | def __str__(self):
method to_dict (line 470) | def to_dict(self) -> dict:
class Sandbox (line 475) | class Sandbox:
method __post_init__ (line 493) | def __post_init__(self):
class Cancelled (line 508) | class Cancelled(Exception):
method create (line 512) | def create(
method _setup_server (line 630) | def _setup_server(space_id: str, api: HfApi, *, log: Callable[[str], o...
method connect (line 651) | def connect(cls, space_id: str, *, token: str | None = None) -> Sandbox:
method _wait_for_api (line 661) | def _wait_for_api(self, timeout: int = API_WAIT_TIMEOUT, log: Callable...
method delete (line 681) | def delete(self):
method pause (line 693) | def pause(self):
method restart (line 697) | def restart(self):
method url (line 703) | def url(self) -> str:
method status (line 708) | def status(self) -> str:
method __enter__ (line 712) | def __enter__(self) -> Sandbox:
method __exit__ (line 715) | def __exit__(self, *exc):
method _call (line 725) | def _call(
method bash (line 786) | def bash(
method read (line 804) | def read(
method write (line 817) | def write(self, path: str, content: str) -> ToolResult:
method edit (line 833) | def edit(
method kill_all (line 855) | def kill_all(self) -> ToolResult:
method tool_definitions (line 1026) | def tool_definitions(cls) -> list[dict]:
method call_tool (line 1029) | def call_tool(self, name: str, arguments: dict[str, Any]) -> ToolResult:
FILE: agent/tools/sandbox_tool.py
function _looks_like_path (line 24) | def _looks_like_path(script: str) -> bool:
function resolve_sandbox_script (line 38) | async def resolve_sandbox_script(
function _ensure_sandbox (line 68) | async def _ensure_sandbox(
function sandbox_create_handler (line 203) | async def sandbox_create_handler(
function _make_tool_handler (line 237) | def _make_tool_handler(sandbox_tool_name: str):
function get_sandbox_tools (line 264) | def get_sandbox_tools():
FILE: agent/tools/types.py
class ToolResult (line 10) | class ToolResult(TypedDict, total=False):
FILE: agent/tools/utilities.py
function truncate (line 13) | def truncate(text: str, max_length: int) -> str:
function format_date (line 20) | def format_date(date_str: Optional[str]) -> str:
function format_command (line 31) | def format_command(command: Optional[List[str]]) -> str:
function get_image_or_space (line 38) | def get_image_or_space(job: Dict[str, Any]) -> str:
function format_jobs_table (line 47) | def format_jobs_table(jobs: List[Dict[str, Any]]) -> str:
function format_scheduled_jobs_table (line 85) | def format_scheduled_jobs_table(jobs: List[Dict[str, Any]]) -> str:
function format_job_details (line 129) | def format_job_details(jobs: Any) -> str:
function format_scheduled_job_details (line 137) | def format_scheduled_job_details(jobs: Any) -> str:
FILE: agent/utils/boot_timing.py
function settle_curve (line 6) | def settle_curve(progress: float, sharpness: float = 3.0) -> float:
function warm_gold_from_white (line 12) | def warm_gold_from_white(progress: float) -> tuple[int, int, int]:
FILE: agent/utils/braille.py
class BrailleCanvas (line 19) | class BrailleCanvas:
method __init__ (line 22) | def __init__(self, term_width: int, term_height: int):
method clear (line 29) | def clear(self) -> None:
method set_pixel (line 33) | def set_pixel(self, x: int, y: int) -> None:
method render (line 39) | def render(self) -> list[str]:
function _define_font (line 55) | def _define_font() -> None:
function text_to_pixels (line 102) | def text_to_pixels(text: str, scale: int = 1) -> list[tuple[int, int]]:
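
BrailleCanvas packs terminal pixels into Unicode braille cells: each character cell covers a 2x4 dot grid, and each dot sets one bit above the U+2800 base code point. A minimal sketch of that mapping, mirroring the listed method names; it is illustrative, not the repository's implementation.

# Bit assigned to pixel (dx, dy) inside one cell, per the Unicode braille dot layout.
_DOT_BITS = [
    [0x01, 0x08],  # row 0: dots 1, 4
    [0x02, 0x10],  # row 1: dots 2, 5
    [0x04, 0x20],  # row 2: dots 3, 6
    [0x40, 0x80],  # row 3: dots 7, 8
]


class BrailleCanvas:
    def __init__(self, term_width: int, term_height: int):
        self.width = term_width    # cells across
        self.height = term_height  # cells down
        self.cells = [[0] * term_width for _ in range(term_height)]

    def clear(self) -> None:
        for row in self.cells:
            for i in range(len(row)):
                row[i] = 0

    def set_pixel(self, x: int, y: int) -> None:
        # Pixel space is 2x wider and 4x taller than cell space.
        cx, cy = x // 2, y // 4
        if 0 <= cx < self.width and 0 <= cy < self.height:
            self.cells[cy][cx] |= _DOT_BITS[y % 4][x % 2]

    def render(self) -> list[str]:
        # 0x2800 is the blank braille pattern; OR-ed bits select the lit dots.
        return ["".join(chr(0x2800 + c) for c in row) for row in self.cells]
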
FILE: agent/utils/crt_boot.py
function _glitch_text (line 17) | def _glitch_text(text: str, intensity: float, rng: random.Random) -> str:
function run_boot_sequence (line 27) | def run_boot_sequence(console: Console, boot_lines: list[tuple[str, str]...
FILE: agent/utils/particle_logo.py
class Particle (line 23) | class Particle:
method __init__ (line 26) | def __init__(self, x: float, y: float, target_x: float, target_y: floa...
method update_converge (line 36) | def update_converge(self, t: float, strength: float = 0.08, damping: f...
method at_target (line 61) | def at_target(self) -> bool:
function run_particle_logo (line 65) | def run_particle_logo(console: Console, hold_seconds: float = 1.5) -> None:
FILE: agent/utils/reliability_checks.py
function check_training_script_save_pattern (line 4) | def check_training_script_save_pattern(script: str) -> str | None:
FILE: agent/utils/terminal_display.py
class _LeftHeading (line 13) | class _LeftHeading(Heading):
method __rich_console__ (line 17) | def __rich_console__(self, console, options):
function _clip_to_width (line 28) | def _clip_to_width(s: str, width: int) -> str:
function get_console (line 84) | def get_console() -> Console:
function print_banner (line 90) | def print_banner(model: str | None = None, hf_user: str | None = None) -...
function print_init_done (line 123) | def print_init_done(tool_count: int = 0) -> None:
function print_tool_call (line 146) | def print_tool_call(tool_name: str, args_preview: str) -> None:
function print_tool_output (line 161) | def print_tool_output(output: str, success: bool, truncate: bool = True)...
class SubAgentDisplayManager (line 170) | class SubAgentDisplayManager:
method __init__ (line 180) | def __init__(self):
method start (line 185) | def start(self, agent_id: str, label: str = "research") -> None:
method set_tokens (line 199) | def set_tokens(self, agent_id: str, tokens: int) -> None:
method set_tool_count (line 203) | def set_tool_count(self, agent_id: str, count: int) -> None:
method add_call (line 207) | def add_call(self, agent_id: str, tool_desc: str) -> None:
method clear (line 212) | def clear(self, agent_id: str) -> None:
method _render_completion_line (line 233) | def _render_completion_line(agent: dict) -> str:
method _tick (line 242) | async def _tick(self) -> None:
method _format_stats (line 253) | def _format_stats(agent: dict) -> str:
method _erase (line 267) | def _erase(self) -> None:
method _render_agent_lines (line 274) | def _render_agent_lines(self, agent: dict, compact: bool = False) -> l...
method _redraw (line 302) | def _redraw(self) -> None:
function print_tool_log (line 320) | def print_tool_log(tool: str, log: str, agent_id: str = "", label: str =...
function print_markdown (line 340) | async def print_markdown(
function print_error (line 403) | def print_error(message: str) -> None:
function print_turn_complete (line 407) | def print_turn_complete() -> None:
function print_interrupted (line 411) | def print_interrupted() -> None:
function print_compacted (line 415) | def print_compacted(old_tokens: int, new_tokens: int) -> None:
function print_approval_header (line 421) | def print_approval_header(count: int) -> None:
function print_approval_item (line 427) | def print_approval_item(index: int, total: int, tool_name: str, operatio...
function print_yolo_approve (line 431) | def print_yolo_approve(count: int) -> None:
function print_help (line 449) | def print_help() -> None:
function format_plan_display (line 457) | def format_plan_display() -> str:
function print_plan (line 482) | def print_plan() -> None:
function format_plan_tool_output (line 490) | def format_plan_tool_output(todos: list) -> str:
function _truncate (line 512) | def _truncate(text: str, max_lines: int = 6) -> str:
FILE: backend/dependencies.py
function _validate_token (line 40) | async def _validate_token(token: str) -> dict[str, Any] | None:
function _user_from_info (line 72) | def _user_from_info(user_info: dict[str, Any]) -> dict[str, Any]:
function _normalize_plan (line 83) | def _normalize_plan(whoami: dict[str, Any]) -> str:
function _fetch_user_plan (line 118) | async def _fetch_user_plan(token: str) -> str:
function _extract_user_from_token (line 155) | async def _extract_user_from_token(token: str) -> dict[str, Any] | None:
function check_org_membership (line 165) | async def check_org_membership(token: str, org_name: str) -> bool:
function get_current_user (line 190) | async def get_current_user(request: Request) -> dict[str, Any]:
function _extract_token (line 224) | def _extract_token(request: Request) -> str | None:
function require_huggingface_org_member (line 235) | async def require_huggingface_org_member(request: Request) -> bool:
FILE: backend/main.py
function lifespan (line 27) | async def lifespan(app: FastAPI):
function api_root (line 69) | async def api_root():
FILE: backend/models.py
class OpType (line 9) | class OpType(str, Enum):
class Operation (line 20) | class Operation(BaseModel):
class Submission (line 27) | class Submission(BaseModel):
class ToolApproval (line 34) | class ToolApproval(BaseModel):
class ApprovalRequest (line 43) | class ApprovalRequest(BaseModel):
class SubmitRequest (line 50) | class SubmitRequest(BaseModel):
class TruncateRequest (line 57) | class TruncateRequest(BaseModel):
class SessionResponse (line 63) | class SessionResponse(BaseModel):
class PendingApprovalTool (line 70) | class PendingApprovalTool(BaseModel):
class SessionInfo (line 78) | class SessionInfo(BaseModel):
class HealthResponse (line 91) | class HealthResponse(BaseModel):
class LLMHealthResponse (line 99) | class LLMHealthResponse(BaseModel):
FILE: backend/routes/agent.py
function _is_anthropic_model (line 71) | def _is_anthropic_model(model_id: str) -> bool:
function _require_hf_for_anthropic (line 75) | async def _require_hf_for_anthropic(request: Request, model_id: str) -> ...
function _enforce_claude_quota (line 100) | async def _enforce_claude_quota(
function _check_session_access (line 139) | def _check_session_access(session_id: str, user: dict[str, Any]) -> None:
function health_check (line 149) | async def health_check() -> HealthResponse:
function llm_health_check (line 159) | async def llm_health_check() -> LLMHealthResponse:
function get_model (line 212) | async def get_model() -> dict:
function generate_title (line 224) | async def generate_title(
function create_session (line 278) | async def create_session(
function restore_session_summary (line 332) | async def restore_session_summary(
function get_session (line 387) | async def get_session(
function set_session_model (line 397) | async def set_session_model(
function get_user_quota (line 432) | async def get_user_quota(user: dict = Depends(get_current_user)) -> dict:
function list_sessions (line 446) | async def list_sessions(user: dict = Depends(get_current_user)) -> list[...
function delete_session (line 453) | async def delete_session(
function submit_input (line 465) | async def submit_input(
function submit_approval (line 480) | async def submit_approval(
function chat_sse (line 501) | async def chat_sse(
function _sse_response (line 572) | def _sse_response(broadcaster, event_queue, sub_id) -> StreamingResponse:
function subscribe_events (line 606) | async def subscribe_events(
function interrupt_session (line 627) | async def interrupt_session(
function get_session_messages (line 639) | async def get_session_messages(
function undo_session (line 651) | async def undo_session(session_id: str, user: dict = Depends(get_current...
function truncate_session (line 661) | async def truncate_session(
function compact_session (line 673) | async def compact_session(
function shutdown_session (line 685) | async def shutdown_session(
FILE: backend/routes/auth.py
function _cleanup_expired_states (line 29) | def _cleanup_expired_states() -> None:
function get_redirect_uri (line 37) | def get_redirect_uri(request: Request) -> str:
function oauth_login (line 48) | async def oauth_login(request: Request) -> RedirectResponse:
function oauth_callback (line 83) | async def oauth_callback(
function logout (line 152) | async def logout() -> RedirectResponse:
function auth_status (line 160) | async def auth_status() -> dict:
function get_me (line 166) | async def get_me(user: dict = Depends(get_current_user)) -> dict:
function org_membership (line 178) | async def org_membership(
FILE: backend/session_manager.py
class Operation (line 23) | class Operation:
class Submission (line 31) | class Submission:
class EventBroadcaster (line 41) | class EventBroadcaster:
method __init__ (line 49) | def __init__(self, event_queue: asyncio.Queue):
method subscribe (line 54) | def subscribe(self) -> tuple[int, asyncio.Queue]:
method unsubscribe (line 62) | def unsubscribe(self, sub_id: int) -> None:
method run (line 65) | async def run(self) -> None:
class AgentSession (line 80) | class AgentSession:
class SessionCapacityError (line 100) | class SessionCapacityError(Exception):
method __init__ (line 103) | def __init__(self, message: str, error_type: str = "global") -> None:
class SessionManager (line 117) | class SessionManager:
method __init__ (line 120) | def __init__(self, config_path: str | None = None) -> None:
method _count_user_sessions (line 125) | def _count_user_sessions(self, user_id: str) -> int:
method create_session (line 133) | async def create_session(
method seed_from_summary (line 225) | async def seed_from_summary(self, session_id: str, messages: list[dict...
method _cleanup_sandbox (line 289) | async def _cleanup_sandbox(session: Session) -> None:
method _run_session (line 299) | async def _run_session(
method submit (line 365) | async def submit(self, session_id: str, operation: Operation) -> bool:
method submit_user_input (line 378) | async def submit_user_input(self, session_id: str, text: str) -> bool:
method submit_approval (line 383) | async def submit_approval(
method interrupt (line 392) | async def interrupt(self, session_id: str) -> bool:
method undo (line 400) | async def undo(self, session_id: str) -> bool:
method truncate (line 405) | async def truncate(self, session_id: str, user_message_index: int) -> ...
method compact (line 413) | async def compact(self, session_id: str) -> bool:
method shutdown_session (line 418) | async def shutdown_session(self, session_id: str) -> bool:
method delete_session (line 435) | async def delete_session(self, session_id: str) -> bool:
method get_session_owner (line 456) | def get_session_owner(self, session_id: str) -> str | None:
method verify_session_access (line 463) | def verify_session_access(self, session_id: str, user_id: str) -> bool:
method get_session_info (line 477) | def get_session_info(self, session_id: str) -> dict[str, Any] | None:
method list_sessions (line 511) | def list_sessions(self, user_id: str | None = None) -> list[dict[str, ...
method active_session_count (line 529) | def active_session_count(self) -> int:
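
EventBroadcaster's subscribe/unsubscribe/run trio reads like a simple fan-out: one source queue feeds every subscriber queue so multiple SSE clients can follow the same session. A sketch of that shape under those assumptions; the real class likely handles shutdown and backpressure differently.

# Illustrative fan-out broadcaster matching the listed method names; assumed shape only.
import asyncio
from typing import Any


class EventBroadcaster:
    def __init__(self, event_queue: asyncio.Queue):
        self._source = event_queue
        self._subscribers: dict[int, asyncio.Queue] = {}
        self._next_id = 0

    def subscribe(self) -> tuple[int, asyncio.Queue]:
        sub_id, queue = self._next_id, asyncio.Queue()
        self._next_id += 1
        self._subscribers[sub_id] = queue
        return sub_id, queue

    def unsubscribe(self, sub_id: int) -> None:
        self._subscribers.pop(sub_id, None)

    async def run(self) -> None:
        # Copy every event from the source queue to all current subscribers.
        while True:
            event: Any = await self._source.get()
            for queue in list(self._subscribers.values()):
                queue.put_nowait(event)
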
FILE: backend/user_quotas.py
function _today (line 29) | def _today() -> str:
function daily_cap_for (line 33) | def daily_cap_for(plan: str | None) -> int:
function get_claude_used_today (line 38) | async def get_claude_used_today(user_id: str) -> int:
function increment_claude (line 52) | async def increment_claude(user_id: str) -> int:
function refund_claude (line 64) | async def refund_claude(user_id: str) -> None:
function _reset_for_tests (line 81) | def _reset_for_tests() -> None:
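
Together with the test names further down, these signatures outline an in-memory per-user daily counter. The sketch below is an assumed reconstruction: the cap values (1 for free, 20 for pro) are taken from the test names and the frontend PRO_CAP constant, and the storage layout is invented for illustration.

# Hedged sketch of an in-memory daily quota; cap values and storage are assumptions.
import asyncio
from datetime import date

_DAILY_CAPS = {"free": 1, "pro": 20}      # assumed from test names and PRO_CAP
_store: dict[str, tuple[str, int]] = {}   # user_id -> (day, count)
_lock = asyncio.Lock()


def _today() -> str:
    return date.today().isoformat()


def daily_cap_for(plan: str | None) -> int:
    # Unknown or missing plans fall back to the free cap.
    return _DAILY_CAPS.get(plan or "free", _DAILY_CAPS["free"])


async def get_claude_used_today(user_id: str) -> int:
    async with _lock:
        day, count = _store.get(user_id, (_today(), 0))
        return count if day == _today() else 0


async def increment_claude(user_id: str) -> int:
    async with _lock:
        day, count = _store.get(user_id, (_today(), 0))
        count = count + 1 if day == _today() else 1  # a stale day resets the count
        _store[user_id] = (_today(), count)
        return count


async def refund_claude(user_id: str) -> None:
    async with _lock:
        day, count = _store.get(user_id, (_today(), 0))
        if day != _today() or count <= 1:
            _store.pop(user_id, None)  # drop the entry at zero, never underflow
        else:
            _store[user_id] = (day, count - 1)
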
FILE: frontend/src/App.tsx
function App (line 5) | function App() {
FILE: frontend/src/components/Chat/ActivityStatusBar.tsx
constant TOOL_LABELS (line 11) | const TOOL_LABELS: Record<string, string> = {
function formatResearchStatus (line 24) | function formatResearchStatus(raw: string): string {
function statusLabel (line 99) | function statusLabel(status: ActivityStatus): string {
function ActivityStatusBar (line 119) | function ActivityStatusBar() {
FILE: frontend/src/components/Chat/AssistantMessage.tsx
type AssistantMessageProps (line 8) | interface AssistantMessageProps {
type DynamicToolPart (line 18) | type DynamicToolPart = Extract<UIMessage['parts'][number], { type: 'dyna...
function groupParts (line 20) | function groupParts(parts: UIMessage['parts']) {
function AssistantMessage (line 46) | function AssistantMessage({ message, isStreaming = false, approveTools }...
FILE: frontend/src/components/Chat/ChatInput.tsx
type ModelOption (line 13) | interface ModelOption {
constant MODEL_OPTIONS (line 27) | const MODEL_OPTIONS: ModelOption[] = [
type ChatInputProps (line 64) | interface ChatInputProps {
function ChatInput (line 76) | function ChatInput({ sessionId, onSend, onStop, isProcessing = false, di...
FILE: frontend/src/components/Chat/ExpiredBanner.tsx
type Props (line 16) | interface Props {
function ExpiredBanner (line 20) | function ExpiredBanner({ sessionId }: Props) {
FILE: frontend/src/components/Chat/MarkdownContent.tsx
type MarkdownContentProps (line 7) | interface MarkdownContentProps {
function useThrottledValue (line 117) | function useThrottledValue(value: string, isStreaming: boolean, interval...
function MarkdownContent (line 163) | function MarkdownContent({ content, sx, isStreaming = false }: MarkdownC...
FILE: frontend/src/components/Chat/MessageBubble.tsx
type MessageBubbleProps (line 5) | interface MessageBubbleProps {
function MessageBubble (line 15) | function MessageBubble({
FILE: frontend/src/components/Chat/MessageList.tsx
type MessageListProps (line 8) | interface MessageListProps {
function getGreeting (line 16) | function getGreeting(): string {
function WelcomeGreeting (line 23) | function WelcomeGreeting() {
function MessageList (line 60) | function MessageList({ messages, isProcessing, approveTools, onUndoLastT...
FILE: frontend/src/components/Chat/ThinkingIndicator.tsx
function ThinkingIndicator (line 4) | function ThinkingIndicator() {
FILE: frontend/src/components/Chat/ToolCallGroup.tsx
type DynamicToolPart (line 19) | type DynamicToolPart = Extract<UIMessage['parts'][number], { type: 'dyna...
type ToolPartState (line 21) | type ToolPartState = DynamicToolPart['state'];
function isCancelledTool (line 24) | function isCancelledTool(tool: DynamicToolPart): boolean {
type ToolCallGroupProps (line 30) | interface ToolCallGroupProps {
function useSecondTick (line 42) | function useSecondTick(enabled: boolean): void {
function computeElapsed (line 52) | function computeElapsed(startedAt: number | null): number | null {
function formatTokens (line 58) | function formatTokens(tokens: number): string {
function formatElapsed (line 63) | function formatElapsed(seconds: number): string {
function researchChipLabel (line 69) | function researchChipLabel(
function parseStepArgs (line 84) | function parseStepArgs(step: string): Record<string, string> {
function formatResearchStep (line 114) | function formatResearchStep(raw: string): { label: string } {
function ResearchSteps (line 182) | function ResearchSteps({ steps }: { steps: string[] }) {
constant HARDWARE_PRICING (line 226) | const HARDWARE_PRICING: Record<string, string> = {
function costLabel (line 245) | function costLabel(hardware: string): string | null {
function StatusIcon (line 253) | function StatusIcon({ state, cancelled, isRejected }: { state: ToolPartS...
function statusLabel (line 275) | function statusLabel(state: ToolPartState): string | null {
function statusColor (line 287) | function statusColor(state: ToolPartState): string {
function InlineApproval (line 302) | function InlineApproval({
constant EMPTY_AGENTS (line 517) | const EMPTY_AGENTS: Record<string, ResearchAgentState> = {};
function ToolCallGroup (line 519) | function ToolCallGroup({ tools, approveTools }: ToolCallGroupProps) {
FILE: frontend/src/components/Chat/UserMessage.tsx
type UserMessageProps (line 9) | interface UserMessageProps {
function extractText (line 17) | function extractText(message: UIMessage): string {
function UserMessage (line 24) | function UserMessage({
FILE: frontend/src/components/ClaudeCapDialog.tsx
constant HF_PRICING_URL (line 13) | const HF_PRICING_URL = 'https://huggingface.co/pricing';
constant PRO_CAP (line 14) | const PRO_CAP = 20;
type ClaudeCapDialogProps (line 16) | interface ClaudeCapDialogProps {
function ClaudeCapDialog (line 24) | function ClaudeCapDialog({
FILE: frontend/src/components/CodePanel/CodePanel.tsx
function PlanStatusIcon (line 24) | function PlanStatusIcon({ status }: { status: string }) {
function ViewToggle (line 91) | function ViewToggle({ view, icon, label, isActive, onClick }: {
function CodePanel (line 130) | function CodePanel() {
FILE: frontend/src/components/Layout/AppLayout.tsx
constant DRAWER_WIDTH (line 29) | const DRAWER_WIDTH = 260;
function AppLayout (line 31) | function AppLayout() {
FILE: frontend/src/components/SessionChat.tsx
type SessionChatProps (line 18) | interface SessionChatProps {
function SessionChat (line 24) | function SessionChat({ sessionId, isActive, onSessionDead }: SessionChat...
FILE: frontend/src/components/SessionSidebar/SessionSidebar.tsx
type SessionSidebarProps (line 23) | interface SessionSidebarProps {
function SessionSidebar (line 27) | function SessionSidebar({ onClose }: SessionSidebarProps) {
FILE: frontend/src/components/WelcomeScreen/WelcomeScreen.tsx
constant HF_ORANGE (line 20) | const HF_ORANGE = '#FF9D00';
constant ORG_JOIN_URL (line 21) | const ORG_JOIN_URL =
type StepStatus (line 28) | type StepStatus = 'completed' | 'active' | 'locked';
type ChecklistStepProps (line 30) | interface ChecklistStepProps {
function StepIndicator (line 44) | function StepIndicator({ status, stepNumber }: { status: StepStatus; ste...
function ChecklistStep (line 69) | function ChecklistStep({
function WelcomeScreen (line 185) | function WelcomeScreen() {
FILE: frontend/src/hooks/useAgentChat.ts
type UseAgentChatOptions (line 24) | interface UseAgentChatOptions {
function useAgentChat (line 32) | function useAgentChat({ sessionId, isActive, onReady, onError, onSession...
FILE: frontend/src/hooks/useAuth.ts
function isInIframe (line 16) | function isInIframe(): boolean {
function triggerLogin (line 25) | function triggerLogin(): void {
function useAuth (line 33) | function useAuth() {
FILE: frontend/src/hooks/useOrgMembership.ts
constant POLL_INTERVAL_MS (line 9) | const POLL_INTERVAL_MS = 3000;
function useOrgMembership (line 15) | function useOrgMembership(enabled: boolean) {
FILE: frontend/src/hooks/useUserQuota.ts
type PlanTier (line 12) | type PlanTier = 'free' | 'pro' | 'org';
type UserQuota (line 14) | interface UserQuota {
function useUserQuota (line 21) | function useUserQuota() {
FILE: frontend/src/lib/backend-message-store.ts
constant STORAGE_KEY (line 9) | const STORAGE_KEY = 'hf-agent-backend-messages';
constant MAX_SESSIONS (line 10) | const MAX_SESSIONS = 50;
type MessagesMap (line 12) | type MessagesMap = Record<string, unknown[]>;
function readAll (line 14) | function readAll(): MessagesMap {
function writeAll (line 28) | function writeAll(map: MessagesMap): void {
function loadBackendMessages (line 37) | function loadBackendMessages(sessionId: string): unknown[] {
function saveBackendMessages (line 42) | function saveBackendMessages(sessionId: string, messages: unknown[]): vo...
function moveBackendMessages (line 55) | function moveBackendMessages(fromId: string, toId: string): void {
function deleteBackendMessages (line 63) | function deleteBackendMessages(sessionId: string): void {
FILE: frontend/src/lib/chat-message-store.ts
constant STORAGE_KEY (line 11) | const STORAGE_KEY = 'hf-agent-messages';
constant MAX_SESSIONS (line 12) | const MAX_SESSIONS = 50;
type MessagesMap (line 14) | type MessagesMap = Record<string, UIMessage[]>;
function readAll (line 16) | function readAll(): MessagesMap {
function writeAll (line 31) | function writeAll(map: MessagesMap): void {
function loadMessages (line 39) | function loadMessages(sessionId: string): UIMessage[] {
function saveMessages (line 45) | function saveMessages(sessionId: string, messages: UIMessage[]): void {
function deleteMessages (line 59) | function deleteMessages(sessionId: string): void {
function moveMessages (line 65) | function moveMessages(fromId: string, toId: string): void {
FILE: frontend/src/lib/convert-llm-messages.ts
type LLMToolCall (line 6) | interface LLMToolCall {
type LLMMessage (line 11) | interface LLMMessage {
function nextId (line 22) | function nextId(): string {
function llmMessagesToUIMessages (line 33) | function llmMessagesToUIMessages(
type ToolPart (line 148) | interface ToolPart {
function joinText (line 158) | function joinText(parts: UIMessage['parts']): string {
function stringifyOutput (line 165) | function stringifyOutput(output: unknown): string {
function uiMessagesToLLMMessages (line 185) | function uiMessagesToLLMMessages(uiMessages: UIMessage[]): LLMMessage[] {
FILE: frontend/src/lib/research-store.ts
constant RESEARCH_MAX_STEPS (line 8) | const RESEARCH_MAX_STEPS = 4;
constant STORAGE_KEY (line 10) | const STORAGE_KEY = 'hf-agent-research';
type ResearchState (line 12) | type ResearchState = {
type ResearchMap (line 17) | type ResearchMap = Record<string, ResearchState>;
function readAll (line 19) | function readAll(): ResearchMap {
function writeAll (line 28) | function writeAll(map: ResearchMap): void {
function saveResearch (line 34) | function saveResearch(
function loadResearch (line 47) | function loadResearch(sessionId: string): ResearchState | null {
function clearResearch (line 52) | function clearResearch(sessionId: string): void {
FILE: frontend/src/lib/sse-chat-transport.ts
type SideChannelCallbacks (line 17) | interface SideChannelCallbacks {
function nextPartId (line 41) | function nextPartId(prefix: string): string {
function createSSEParserStream (line 46) | function createSSEParserStream(): TransformStream<string, AgentEvent> {
function createEventToChunkStream (line 79) | function createEventToChunkStream(sideChannel: SideChannelCallbacks): Tr...
class SSEChatTransport (line 274) | class SSEChatTransport implements ChatTransport<UIMessage> {
method constructor (line 278) | constructor(sessionId: string, sideChannel: SideChannelCallbacks) {
method updateSideChannel (line 286) | updateSideChannel(sideChannel: SideChannelCallbacks): void {
method destroy (line 290) | destroy(): void {
method sendMessages (line 296) | async sendMessages(
method reconnectToStream (line 381) | async reconnectToStream(): Promise<ReadableStream<UIMessageChunk> | nu...
FILE: frontend/src/main.tsx
function Root (line 9) | function Root() {
FILE: frontend/src/store/agentStore.ts
type PlanItem (line 21) | interface PlanItem {
type PanelSection (line 27) | interface PanelSection {
type PanelData (line 32) | interface PanelData {
type PanelView (line 40) | type PanelView = 'script' | 'output';
type LLMHealthError (line 42) | interface LLMHealthError {
type ActivityStatus (line 48) | type ActivityStatus =
type ResearchAgentStats (line 56) | interface ResearchAgentStats {
type ResearchAgentState (line 63) | interface ResearchAgentState {
type PerSessionState (line 70) | interface PerSessionState {
type AgentStore (line 99) | interface AgentStore {
function syncSnapshot (line 191) | function syncSnapshot(
function loadToolErrors (line 206) | function loadToolErrors(): Record<string, boolean> {
function saveToolErrors (line 216) | function saveToolErrors(errors: Record<string, boolean>): void {
function loadRejectedTools (line 225) | function loadRejectedTools(): Record<string, boolean> {
function saveRejectedTools (line 235) | function saveRejectedTools(rejected: Record<string, boolean>): void {
FILE: frontend/src/store/layoutStore.ts
type ThemeMode (line 4) | type ThemeMode = 'dark' | 'light';
type LayoutStore (line 6) | interface LayoutStore {
FILE: frontend/src/store/sessionStore.ts
type SessionStore (line 7) | interface SessionStore {
FILE: frontend/src/theme.ts
function makeCssBaseline (line 109) | function makeCssBaseline(vars: Record<string, string>) {
function makeDrawer (line 145) | function makeDrawer() {
function makeTextField (line 156) | function makeTextField() {
FILE: frontend/src/types/agent.ts
type MessageMeta (line 9) | interface MessageMeta {
type SessionMeta (line 13) | interface SessionMeta {
type ToolApproval (line 26) | interface ToolApproval {
type User (line 32) | interface User {
FILE: frontend/src/types/events.ts
type EventType (line 5) | type EventType =
type AgentEvent (line 24) | interface AgentEvent {
type ReadyEventData (line 29) | interface ReadyEventData {
type ProcessingEventData (line 33) | interface ProcessingEventData {
type AssistantMessageEventData (line 37) | interface AssistantMessageEventData {
type ToolCallEventData (line 41) | interface ToolCallEventData {
type ToolOutputEventData (line 46) | interface ToolOutputEventData {
type ToolLogEventData (line 52) | interface ToolLogEventData {
type PlanUpdateEventData (line 57) | interface PlanUpdateEventData {
type ApprovalRequiredEventData (line 61) | interface ApprovalRequiredEventData {
type ApprovalToolItem (line 66) | interface ApprovalToolItem {
type TurnCompleteEventData (line 72) | interface TurnCompleteEventData {
type CompactedEventData (line 76) | interface CompactedEventData {
type ErrorEventData (line 81) | interface ErrorEventData {
FILE: frontend/src/utils/api.ts
function apiFetch (line 11) | async function apiFetch(
FILE: frontend/src/utils/logProcessor.ts
function processLogs (line 1) | function processLogs(logs: string): string {
FILE: frontend/src/utils/model.ts
constant CLAUDE_MODEL_PATH (line 10) | const CLAUDE_MODEL_PATH = 'anthropic/claude-opus-4-6';
constant FIRST_FREE_MODEL_PATH (line 11) | const FIRST_FREE_MODEL_PATH = 'moonshotai/Kimi-K2.6';
function isClaudePath (line 13) | function isClaudePath(modelPath: string | undefined): boolean {
FILE: tests/unit/test_user_quotas.py
function _reset_store (line 21) | def _reset_store():
function test_daily_cap_for_known_plans (line 28) | def test_daily_cap_for_known_plans():
function test_daily_cap_for_unknown_or_missing_defaults_to_free (line 34) | def test_daily_cap_for_unknown_or_missing_defaults_to_free():
function test_increment_and_read_back_same_day (line 44) | async def test_increment_and_read_back_same_day():
function test_independent_users_do_not_share_counts (line 52) | async def test_independent_users_do_not_share_counts():
function test_stale_day_resets_before_next_read (line 61) | async def test_stale_day_resets_before_next_read():
function test_concurrent_increments_under_lock_do_not_lose_writes (line 71) | async def test_concurrent_increments_under_lock_do_not_lose_writes():
function test_refund_decrements_and_drops_entry_at_zero (line 78) | async def test_refund_decrements_and_drops_entry_at_zero():
function test_refund_on_nonexistent_user_is_noop (line 87) | async def test_refund_on_nonexistent_user_is_noop():
function test_refund_on_stale_day_resets_rather_than_underflow (line 93) | async def test_refund_on_stale_day_resets_rather_than_underflow():
function test_free_user_cap_reached_at_one (line 101) | async def test_free_user_cap_reached_at_one():
function test_pro_user_cap_reached_at_twenty (line 109) | async def test_pro_user_cap_reached_at_twenty():
Condensed preview — 110 files, each showing path, character count, and a content snippet (full structured content: 1,033K chars).
[
{
"path": ".gitattributes",
"chars": 42,
"preview": "*.png filter=lfs diff=lfs merge=lfs -text\n"
},
{
"path": ".github/workflows/claude-review.yml",
"chars": 2045,
"preview": "name: Claude PR Review\n\non:\n pull_request:\n types: [opened, synchronize, ready_for_review]\n\npermissions:\n contents:"
},
{
"path": ".github/workflows/claude.yml",
"chars": 1037,
"preview": "name: Claude on Mention\n\non:\n issue_comment:\n types: [created]\n pull_request_review_comment:\n types: [created]\n "
},
{
"path": ".gitignore",
"chars": 837,
"preview": "# Python-generated files\n__pycache__/\n*.py[oc]\nbuild/\ndist/\nwheels/\n*.egg-info\n.pytest_cache/\n.mypy_cache/\n.tox/\n.covera"
},
{
"path": ".python-version",
"chars": 5,
"preview": "3.12\n"
},
{
"path": "Dockerfile",
"chars": 1326,
"preview": "# Stage 1: Build frontend\nFROM node:20-alpine AS frontend-builder\nWORKDIR /app/frontend\nCOPY frontend/package.json front"
},
{
"path": "README.md",
"chars": 8518,
"preview": "<p align=\"center\">\n <img src=\"frontend/public/smolagents.webp\" alt=\"smolagents logo\" width=\"160\" />\n</p>\n\n# ML Intern\n\n"
},
{
"path": "REVIEW.md",
"chars": 5683,
"preview": "# Review instructions\n\nThese rules override the default review guidance. Treat them as the highest-priority\ninstruction "
},
{
"path": "agent/README.md",
"chars": 1689,
"preview": "# Agent\n\nAsync agent loop with LiteLLM.\n\n## Architecture\n\n**Queue-based async system:**\n- Submissions in (user input) → "
},
{
"path": "agent/__init__.py",
"chars": 756,
"preview": "\"\"\"\nHF Agent - Main agent module\n\"\"\"\n\nimport litellm\n\n# Global LiteLLM behavior — set once at package import so both CLI"
},
{
"path": "agent/config.py",
"chars": 3457,
"preview": "import json\nimport os\nimport re\nfrom pathlib import Path\nfrom typing import Any, Union\n\nfrom dotenv import load_dotenv\n\n"
},
{
"path": "agent/context_manager/__init__.py",
"chars": 146,
"preview": "\"\"\"\nContext manager for handling conversation history\n\"\"\"\n\nfrom agent.context_manager.manager import ContextManager\n\n__a"
},
{
"path": "agent/context_manager/manager.py",
"chars": 15497,
"preview": "\"\"\"\nContext management for conversation history\n\"\"\"\n\nimport logging\nimport os\nimport zoneinfo\nfrom datetime import datet"
},
{
"path": "agent/core/__init__.py",
"chars": 250,
"preview": "\"\"\"\nCore agent implementation\nContains the main agent logic, decision-making, and orchestration\n\"\"\"\n\nfrom agent.core.too"
},
{
"path": "agent/core/agent_loop.py",
"chars": 50662,
"preview": "\"\"\"loop\nMain agent implementation with integrated tool system and MCP support\n\"\"\"\n\nimport asyncio\nimport json\nimport log"
},
{
"path": "agent/core/doom_loop.py",
"chars": 4565,
"preview": "\"\"\"\nDoom-loop detection for repeated tool call patterns.\n\nDetects when the agent is stuck calling the same tools repeate"
},
{
"path": "agent/core/effort_probe.py",
"chars": 8347,
"preview": "\"\"\"Probe-and-cascade for reasoning effort on /model switch.\n\nWe don't maintain a per-model capability table. Instead, th"
},
{
"path": "agent/core/hf_router_catalog.py",
"chars": 4084,
"preview": "\"\"\"Fetch and cache the HF Inference Router model catalog.\n\nThe router exposes an OpenAI-compatible listing at\n``https://"
},
{
"path": "agent/core/llm_params.py",
"chars": 8649,
"preview": "\"\"\"LiteLLM kwargs resolution for the model ids this agent accepts.\n\nKept separate from ``agent_loop`` so tools (research"
},
{
"path": "agent/core/model_switcher.py",
"chars": 8760,
"preview": "\"\"\"Model-switching logic for the interactive CLI's ``/model`` command.\n\nSplit out of ``agent.main`` so the REPL dispatch"
},
{
"path": "agent/core/prompt_caching.py",
"chars": 2162,
"preview": "\"\"\"Anthropic prompt caching breakpoints for outgoing LLM requests.\n\nCaching is GA on Anthropic's API and natively suppor"
},
{
"path": "agent/core/session.py",
"chars": 11176,
"preview": "import asyncio\nimport json\nimport logging\nimport subprocess\nimport sys\nimport uuid\nfrom dataclasses import dataclass\nfro"
},
{
"path": "agent/core/session_uploader.py",
"chars": 6598,
"preview": "#!/usr/bin/env python3\n\"\"\"\nStandalone script for uploading session trajectories to HuggingFace.\nThis runs as a separate "
},
{
"path": "agent/core/tools.py",
"chars": 13518,
"preview": "\"\"\"\nTool system for the agent\nProvides ToolSpec and ToolRouter for managing both built-in and MCP tools\n\"\"\"\n\nimport logg"
},
{
"path": "agent/main.py",
"chars": 50274,
"preview": "\"\"\"\nInteractive CLI chat with the agent\n\nSupports two modes:\n Interactive: python -m agent.main\n Headless: python"
},
{
"path": "agent/prompts/system_prompt.yaml",
"chars": 8245,
"preview": "system_prompt: |\n You are Hugging Face Agent, a skilled AI assistant for machine learning engineering. Hugging Face is "
},
{
"path": "agent/prompts/system_prompt_v2.yaml",
"chars": 20774,
"preview": "system_prompt: |\n You are Hugging Face Agent, a skilled AI assistant for machine learning engineering with deep experti"
},
{
"path": "agent/prompts/system_prompt_v3.yaml",
"chars": 12364,
"preview": "system_prompt: |\n You are Hugging Face Agent, an ML engineering assistant with {{ num_tools }} tools for training, fine"
},
{
"path": "agent/tools/__init__.py",
"chars": 1055,
"preview": "\"\"\"\nHugging Face tools for the agent\n\"\"\"\n\nfrom agent.tools.dataset_tools import (\n HF_INSPECT_DATASET_TOOL_SPEC,\n "
},
{
"path": "agent/tools/dataset_tools.py",
"chars": 14590,
"preview": "\"\"\"\nDataset Inspection Tool - Comprehensive dataset analysis in one call\n\nCombines /is-valid, /splits, /info, /first-row"
},
{
"path": "agent/tools/docs_tools.py",
"chars": 36918,
"preview": "\"\"\"\nDocumentation search tools for exploring HuggingFace and Gradio documentation.\n\"\"\"\n\nimport asyncio\nimport json\nfrom "
},
{
"path": "agent/tools/edit_utils.py",
"chars": 9728,
"preview": "\"\"\"\nShared utilities for file editing tools — fuzzy matching, syntax validation,\nand richer edit operations.\n\nUsed by bo"
},
{
"path": "agent/tools/github_find_examples.py",
"chars": 15633,
"preview": "\"\"\"\nGitHub Find Examples Tool - Discover examples, tutorials, and guides for any library\n\nLists all files in a repositor"
},
{
"path": "agent/tools/github_list_repos.py",
"chars": 10311,
"preview": "\"\"\"\nGitHub List Repositories Tool - List and sort repositories for any user or organization\n\nEfficiently discover reposi"
},
{
"path": "agent/tools/github_read_file.py",
"chars": 10023,
"preview": "\"\"\"\nGitHub Read File Tool - Read file contents from any GitHub repository with line range support\n\nFetch exact file cont"
},
{
"path": "agent/tools/hf_repo_files_tool.py",
"chars": 11941,
"preview": "\"\"\"\nHF Repo Files Tool - File operations on Hugging Face repositories\n\nOperations: list, read, upload, delete\n\"\"\"\n\nimpor"
},
{
"path": "agent/tools/hf_repo_git_tool.py",
"chars": 24702,
"preview": "\"\"\"\nHF Repo Git Tool - Git-like operations on Hugging Face repositories\n\nOperations: branches, tags, PRs, repo managemen"
},
{
"path": "agent/tools/jobs_tool.py",
"chars": 40903,
"preview": "\"\"\"\nHugging Face Jobs Tool - Using huggingface-hub library\n\nRefactored to use official huggingface-hub library instead o"
},
{
"path": "agent/tools/local_tools.py",
"chars": 15870,
"preview": "\"\"\"\nLocal tool implementations — bash/read/write/edit running on the user's machine.\n\nDrop-in replacement for sandbox to"
},
{
"path": "agent/tools/papers_tool.py",
"chars": 46527,
"preview": "\"\"\"\nHF Papers Tool — Discover papers, read their contents, and find linked resources.\n\nOperations: trending, search, pap"
},
{
"path": "agent/tools/plan_tool.py",
"chars": 4536,
"preview": "from typing import Any, Dict, List\n\nfrom agent.core.session import Event\nfrom agent.utils.terminal_display import format"
},
{
"path": "agent/tools/private_hf_repo_tools.py",
"chars": 21212,
"preview": "\"\"\"\nPrivate HF Repos Tool - Manage private Hugging Face repositories\n\nPRIMARY USE: Store job outputs, training scripts, "
},
{
"path": "agent/tools/research_tool.py",
"chars": 20412,
"preview": "\"\"\"\nResearch subagent tool — spawns a cheap LLM call with a focused\nresearch task and returns a summary. The subagent ge"
},
{
"path": "agent/tools/sandbox_client.py",
"chars": 39967,
"preview": "#!/usr/bin/env python3\n# /// script\n# requires-python = \">=3.10\"\n# dependencies = [\"huggingface_hub>=0.20.0\", \"httpx>=0."
},
{
"path": "agent/tools/sandbox_tool.py",
"chars": 10120,
"preview": "\"\"\"\nSandbox tools — expose the Sandbox client as agent tools.\n\n5 tools total:\n sandbox_create — explicit sandbox creati"
},
{
"path": "agent/tools/types.py",
"chars": 294,
"preview": "\"\"\"\nTypes for Hugging Face tools\n\nPorted from: hf-mcp-server/packages/mcp/src/types/\n\"\"\"\n\nfrom typing import TypedDict\n\n"
},
{
"path": "agent/tools/utilities.py",
"chars": 5482,
"preview": "\"\"\"\nUtility functions for Hugging Face tools\n\nPorted from: hf-mcp-server/packages/mcp/src/jobs/formatters.ts\nIncludes GP"
},
{
"path": "agent/utils/__init__.py",
"chars": 38,
"preview": "\"\"\"\nUtility functions and helpers\n\"\"\"\n"
},
{
"path": "agent/utils/boot_timing.py",
"chars": 521,
"preview": "\"\"\"Shared timing and color helpers for startup visual effects.\"\"\"\n\nimport math\n\n\ndef settle_curve(progress: float, sharp"
},
{
"path": "agent/utils/braille.py",
"chars": 5261,
"preview": "\"\"\"Braille-character canvas for high-resolution terminal graphics.\n\nEach terminal cell maps to a 2x4 dot grid using Unic"
},
{
"path": "agent/utils/crt_boot.py",
"chars": 4146,
"preview": "\"\"\"CRT / glitch boot sequence effect for CLI startup.\n\nSimulates an old CRT terminal booting up: text appearing characte"
},
{
"path": "agent/utils/particle_logo.py",
"chars": 8138,
"preview": "\"\"\"Particle coalesce effect for the HUGGING FACE ML INTERN logo.\n\nRandom particles swirl in from the edges, converge to "
},
{
"path": "agent/utils/reliability_checks.py",
"chars": 603,
"preview": "\"\"\"Reliability checks for job submissions and other operations\"\"\"\n\n\ndef check_training_script_save_pattern(script: str) "
},
{
"path": "agent/utils/terminal_display.py",
"chars": 17375,
"preview": "\"\"\"\nTerminal display utilities — rich-powered CLI formatting.\n\"\"\"\n\nimport re\n\nfrom rich.console import Console\nfrom rich"
},
{
"path": "backend/__init__.py",
"chars": 45,
"preview": "# Backend package for HF Agent web interface\n"
},
{
"path": "backend/dependencies.py",
"chars": 8551,
"preview": "\"\"\"Authentication dependencies for FastAPI routes.\n\n- In dev mode (OAUTH_CLIENT_ID not set): auth is bypassed, returns a"
},
{
"path": "backend/main.py",
"chars": 2099,
"preview": "\"\"\"FastAPI application for HF Agent web interface.\"\"\"\n\nimport logging\nimport os\nfrom contextlib import asynccontextmanag"
},
{
"path": "backend/models.py",
"chars": 2220,
"preview": "\"\"\"Pydantic models for API requests and responses.\"\"\"\n\nfrom enum import Enum\nfrom typing import Any\n\nfrom pydantic impor"
},
{
"path": "backend/routes/__init__.py",
"chars": 17,
"preview": "# Routes package\n"
},
{
"path": "backend/routes/agent.py",
"chars": 25506,
"preview": "\"\"\"Agent API routes — REST + SSE endpoints.\n\nAll routes (except /health) require authentication via the get_current_user"
},
{
"path": "backend/routes/auth.py",
"chars": 6259,
"preview": "\"\"\"Authentication routes for HF OAuth.\n\nHandles the OAuth 2.0 authorization code flow with HF as provider.\nAfter success"
},
{
"path": "backend/session_manager.py",
"chars": 20141,
"preview": "\"\"\"Session manager for handling multiple concurrent agent sessions.\"\"\"\n\nimport asyncio\nimport logging\nimport uuid\nfrom d"
},
{
"path": "backend/start.sh",
"chars": 578,
"preview": "#!/bin/bash\n# Entrypoint for HF Spaces dev mode compatibility.\n# Dev mode spawns CMD multiple times simultaneously on re"
},
{
"path": "backend/user_quotas.py",
"chars": 2731,
"preview": "\"\"\"In-memory daily quota for Claude session creations.\n\nTracks per-user Claude session starts against a daily cap derive"
},
{
"path": "configs/main_agent_config.json",
"chars": 351,
"preview": "{\n \"model_name\": \"bedrock/us.anthropic.claude-opus-4-6-v1\",\n \"save_sessions\": true,\n \"session_dataset_repo\": \"akseljo"
},
{
"path": "frontend/eslint.config.js",
"chars": 734,
"preview": "import js from '@eslint/js'\nimport globals from 'globals'\nimport reactHooks from 'eslint-plugin-react-hooks'\nimport reac"
},
{
"path": "frontend/index.html",
"chars": 734,
"preview": "<!DOCTYPE html>\n<html lang=\"en\">\n <head>\n <meta charset=\"UTF-8\" />\n <link rel=\"icon\" type=\"image/webp\" href=\"/smo"
},
{
"path": "frontend/package.json",
"chars": 1059,
"preview": "{\n \"name\": \"hf-agent-frontend\",\n \"private\": true,\n \"version\": \"1.0.0\",\n \"type\": \"module\",\n \"scripts\": {\n \"dev\": "
},
{
"path": "frontend/src/App.tsx",
"chars": 427,
"preview": "import { Box } from '@mui/material';\nimport AppLayout from '@/components/Layout/AppLayout';\nimport { useAuth } from '@/h"
},
{
"path": "frontend/src/components/Chat/ActivityStatusBar.tsx",
"chars": 5315,
"preview": "import { Box, Typography } from '@mui/material';\nimport { keyframes } from '@mui/system';\nimport { useAgentStore, type A"
},
{
"path": "frontend/src/components/Chat/AssistantMessage.tsx",
"chars": 3662,
"preview": "import { useMemo } from 'react';\nimport { Box, Stack, Typography } from '@mui/material';\nimport MarkdownContent from './"
},
{
"path": "frontend/src/components/Chat/ChatInput.tsx",
"chars": 14770,
"preview": "import { useState, useCallback, useEffect, useRef, KeyboardEvent } from 'react';\nimport { Box, TextField, IconButton, Ci"
},
{
"path": "frontend/src/components/Chat/ExpiredBanner.tsx",
"chars": 3969,
"preview": "/**\n * Shown inline in a chat when the backend no longer recognizes the\n * session id (typically: Space was restarted). "
},
{
"path": "frontend/src/components/Chat/MarkdownContent.tsx",
"chars": 4633,
"preview": "import { useMemo, useRef, useState, useEffect } from 'react';\nimport { Box } from '@mui/material';\nimport ReactMarkdown "
},
{
"path": "frontend/src/components/Chat/MessageBubble.tsx",
"chars": 1170,
"preview": "import UserMessage from './UserMessage';\nimport AssistantMessage from './AssistantMessage';\nimport type { UIMessage } fr"
},
{
"path": "frontend/src/components/Chat/MessageList.tsx",
"chars": 4437,
"preview": "import { useCallback, useEffect, useRef, useMemo } from 'react';\nimport { Box, Stack, Typography } from '@mui/material';"
},
{
"path": "frontend/src/components/Chat/ThinkingIndicator.tsx",
"chars": 1326,
"preview": "import { Box, Typography } from '@mui/material';\n\n/** Pulsing dots shown while the agent is processing. */\nexport defaul"
},
{
"path": "frontend/src/components/Chat/ToolCallGroup.tsx",
"chars": 45114,
"preview": "import { useCallback, useEffect, useMemo, useRef, useState } from 'react';\nimport { Box, Stack, Typography, Chip, Button"
},
{
"path": "frontend/src/components/Chat/UserMessage.tsx",
"chars": 6707,
"preview": "import { useState, useRef, useEffect } from 'react';\nimport { Box, Stack, Typography, IconButton, Tooltip, TextField } f"
},
{
"path": "frontend/src/components/ClaudeCapDialog.tsx",
"chars": 3718,
"preview": "import {\n Box,\n Button,\n Dialog,\n DialogActions,\n DialogContent,\n DialogContentText,\n DialogTitle,\n Typography,\n"
},
{
"path": "frontend/src/components/CodePanel/CodePanel.tsx",
"chars": 20368,
"preview": "import { useRef, useEffect, useMemo, useState, useCallback } from 'react';\nimport { Box, Stack, Typography, IconButton, "
},
{
"path": "frontend/src/components/Layout/AppLayout.tsx",
"chars": 14306,
"preview": "import { useCallback, useRef, useEffect, useState } from 'react';\nimport {\n Avatar,\n Box,\n Drawer,\n Typography,\n Ic"
},
{
"path": "frontend/src/components/SessionChat.tsx",
"chars": 4721,
"preview": "/**\n * Per-session chat component.\n *\n * Each session renders its own SessionChat. The hook (useAgentChat) always\n * run"
},
{
"path": "frontend/src/components/SessionSidebar/SessionSidebar.tsx",
"chars": 13981,
"preview": "import { useCallback, useState } from 'react';\nimport {\n Alert,\n Box,\n Button,\n Dialog,\n DialogActions,\n DialogCon"
},
{
"path": "frontend/src/components/WelcomeScreen/WelcomeScreen.tsx",
"chars": 14573,
"preview": "import { useState, useCallback, useEffect, useRef, type ReactNode } from 'react';\nimport {\n Box,\n Typography,\n Button"
},
{
"path": "frontend/src/hooks/useAgentChat.ts",
"chars": 31978,
"preview": "/**\n * Central hook wiring the Vercel AI SDK's useChat with our SSE-based\n * ChatTransport.\n *\n * In the per-session arc"
},
{
"path": "frontend/src/hooks/useAuth.ts",
"chars": 2289,
"preview": "/**\n * Authentication hook — simple server-side OAuth.\n *\n * - Hors iframe: /auth/login redirect (cookies work fine)\n * "
},
{
"path": "frontend/src/hooks/useOrgMembership.ts",
"chars": 1514,
"preview": "/**\n * Polls backend for org membership status.\n * When membership is detected, updates the user in the agent store\n * a"
},
{
"path": "frontend/src/hooks/useUserQuota.ts",
"chars": 1529,
"preview": "/**\n * Reads the current user's Claude daily quota + plan tier from the backend.\n *\n * Fetches once when the user become"
},
{
"path": "frontend/src/lib/backend-message-store.ts",
"chars": 1853,
"preview": "/**\n * localStorage cache of raw backend (litellm Message) dicts keyed by\n * session ID. Used to restore a session into "
},
{
"path": "frontend/src/lib/chat-message-store.ts",
"chars": 1885,
"preview": "/**\n * Lightweight localStorage persistence for UIMessage arrays,\n * keyed by session ID.\n *\n * Uses the same storage na"
},
{
"path": "frontend/src/lib/convert-llm-messages.ts",
"chars": 7918,
"preview": "/**\n * Convert backend LLM messages (litellm format) to Vercel AI SDK UIMessage format.\n */\nimport type { UIMessage } fr"
},
{
"path": "frontend/src/lib/research-store.ts",
"chars": 1333,
"preview": "/**\n * Persist research sub-agent state (steps + stats) per session.\n * Survives page refresh so the rolling display isn"
},
{
"path": "frontend/src/lib/sse-chat-transport.ts",
"chars": 15720,
"preview": "/**\n * SSE-based ChatTransport that bridges our backend event protocol\n * to the Vercel AI SDK's UIMessageChunk streamin"
},
{
"path": "frontend/src/main.tsx",
"chars": 668,
"preview": "import { StrictMode } from 'react';\nimport { createRoot } from 'react-dom/client';\nimport { ThemeProvider } from '@mui/m"
},
{
"path": "frontend/src/store/agentStore.ts",
"chars": 15698,
"preview": "/**\n * Agent store — manages UI state that is NOT handled by the Vercel AI SDK.\n *\n * Message state (messages, streaming"
},
{
"path": "frontend/src/store/layoutStore.ts",
"chars": 1355,
"preview": "import { create } from 'zustand';\nimport { persist } from 'zustand/middleware';\n\nexport type ThemeMode = 'dark' | 'light"
},
{
"path": "frontend/src/store/sessionStore.ts",
"chars": 4186,
"preview": "import { create } from 'zustand';\nimport { persist } from 'zustand/middleware';\nimport type { SessionMeta } from '@/type"
},
{
"path": "frontend/src/theme.ts",
"chars": 7027,
"preview": "import { createTheme, type ThemeOptions } from '@mui/material/styles';\n\n// ── Shared tokens ────────────────────────────"
},
{
"path": "frontend/src/types/agent.ts",
"chars": 934,
"preview": "/**\n * Agent-related types.\n *\n * Message and tool-call types are now provided by the Vercel AI SDK\n * (UIMessage, UIMes"
},
{
"path": "frontend/src/types/events.ts",
"chars": 1483,
"preview": "/**\n * Event types from the agent backend\n */\n\nexport type EventType =\n | 'ready'\n | 'processing'\n | 'assistant_messa"
},
{
"path": "frontend/src/utils/api.ts",
"chars": 1131,
"preview": "/**\n * Centralized API utilities.\n *\n * In production: HttpOnly cookie (hf_access_token) is sent automatically.\n * In de"
},
{
"path": "frontend/src/utils/logProcessor.ts",
"chars": 2255,
"preview": "export function processLogs(logs: string): string {\n if (!logs) return '';\n\n // 1. Handle \\r (Carriage Return) for pro"
},
{
"path": "frontend/src/utils/logger.ts",
"chars": 691,
"preview": "/**\n * Lightweight logger that silences verbose output in production.\n *\n * - `log` / `debug` are only emitted when `imp"
},
{
"path": "frontend/src/utils/model.ts",
"chars": 588,
"preview": "/**\n * Shared model-id constants used by session-create call sites and the\n * ClaudeCapDialog \"Use a free model\" escape "
},
{
"path": "frontend/src/vite-env.d.ts",
"chars": 38,
"preview": "/// <reference types=\"vite/client\" />\n"
},
{
"path": "frontend/tsconfig.json",
"chars": 616,
"preview": "{\n \"compilerOptions\": {\n \"target\": \"ES2020\",\n \"useDefineForClassFields\": true,\n \"lib\": [\"ES2020\", \"DOM\", \"DOM."
},
{
"path": "frontend/vite.config.ts",
"chars": 603,
"preview": "import { defineConfig } from 'vite'\nimport react from '@vitejs/plugin-react'\nimport path from 'path'\n\nexport default def"
},
{
"path": "pyproject.toml",
"chars": 1214,
"preview": "[project]\nname = \"hf-agent\"\nversion = \"0.1.0\"\ndescription = \"Add your description here\"\nreadme = \"README.md\"\nrequires-py"
},
{
"path": "tests/unit/test_user_quotas.py",
"chars": 4400,
"preview": "\"\"\"Tests for backend/user_quotas.py — the in-memory Claude daily-quota store.\"\"\"\n\nimport asyncio\nimport os\nimport sys\nfr"
}
]
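The file index above is a JSON array of objects with "path", "chars", and "preview" fields. As a purely illustrative sketch (not part of the repository), the Python snippet below shows one way to load that index and rank files by size, for example to decide which files to feed to an LLM first. The filename "file_index.json" is an assumed placeholder for wherever you save the array.

import json

# Illustrative only: assumes the file-index array above was saved on its own
# as "file_index.json" (a placeholder name, not produced by the extraction).
with open("file_index.json", encoding="utf-8") as f:
    index = json.load(f)  # list of {"path": str, "chars": int, "preview": str}

# Rank files by size, e.g. to pick what fits into a model's context window.
for entry in sorted(index, key=lambda e: e["chars"], reverse=True)[:10]:
    print(f"{entry['chars']:>7}  {entry['path']}")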
About this extraction
This document contains the full source code of the huggingface/ml-intern GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 110 files (948.1 KB), approximately 226.4k tokens, and a symbol index with 671 extracted functions, classes, methods, constants, and types.