Repository: langchain-ai/open_deep_research
Branch: main
Commit: b847e54be485
Files: 43
Total size: 4.3 MB

Directory structure:
gitextract_qsg5gbbh/

├── .github/
│   ├── dependabot.yml
│   └── workflows/
│       ├── claude-code-review.yml
│       └── claude.yml
├── .gitignore
├── CLAUDE.md
├── LICENSE
├── README.md
├── examples/
│   ├── arxiv.md
│   ├── inference-market-gpt45.md
│   ├── inference-market.md
│   └── pubmed.md
├── langgraph.json
├── pyproject.toml
├── src/
│   ├── legacy/
│   │   ├── CLAUDE.md
│   │   ├── __init__.py
│   │   ├── configuration.py
│   │   ├── files/
│   │   │   └── vibe_code.md
│   │   ├── graph.ipynb
│   │   ├── graph.py
│   │   ├── legacy.md
│   │   ├── multi_agent.ipynb
│   │   ├── multi_agent.py
│   │   ├── prompts.py
│   │   ├── state.py
│   │   ├── tests/
│   │   │   ├── conftest.py
│   │   │   ├── run_test.py
│   │   │   └── test_report_quality.py
│   │   └── utils.py
│   ├── open_deep_research/
│   │   ├── configuration.py
│   │   ├── deep_researcher.py
│   │   ├── prompts.py
│   │   ├── state.py
│   │   └── utils.py
│   └── security/
│       └── auth.py
└── tests/
    ├── evaluators.py
    ├── expt_results/
    │   ├── deep_research_bench_claude4-sonnet.jsonl
    │   ├── deep_research_bench_gpt-4.1.jsonl
    │   └── deep_research_bench_gpt-5.jsonl
    ├── extract_langsmith_data.py
    ├── pairwise_evaluation.py
    ├── prompts.py
    ├── run_evaluate.py
    └── supervisor_parallel_evaluation.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .github/dependabot.yml
================================================
# To get started with Dependabot version updates, you'll need to specify which
# package ecosystems to update and where the package manifests are located.
# Please see the documentation for all configuration options:
# https://docs.github.com/code-security/dependabot/dependabot-version-updates/configuration-options-for-the-dependabot.yml-file

version: 2
# One entry per package ecosystem, all under a single top-level `updates` list.
updates:
  - package-ecosystem: "pip" # See documentation for possible values
    directory: "/" # Location of package manifests
    schedule:
      interval: "weekly"
  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "weekly"


================================================
FILE: .github/workflows/claude-code-review.yml
================================================
name: Claude Code Review

on:
  pull_request:
    types: [opened, synchronize]
    # Optional: Only run on specific file changes
    # paths:
    #   - "src/**/*.ts"
    #   - "src/**/*.tsx"
    #   - "src/**/*.js"
    #   - "src/**/*.jsx"

jobs:
  claude-review:
    # Optional: Filter by PR author
    # if: |
    #   github.event.pull_request.user.login == 'external-contributor' ||
    #   github.event.pull_request.user.login == 'new-developer' ||
    #   github.event.pull_request.author_association == 'FIRST_TIME_CONTRIBUTOR'
    
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: read
      issues: read
      id-token: write
    
    steps:
      - name: Checkout repository
        uses: actions/checkout@v6
        with:
          fetch-depth: 1

      - name: Run Claude Code Review
        id: claude-review
        uses: anthropics/claude-code-action@beta
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}

          # Optional: Specify model (defaults to Claude Sonnet 4, uncomment for Claude Opus 4.1)
          # model: "claude-opus-4-1-20250805"

          # Direct prompt for automated review (no @claude mention needed)
          direct_prompt: |
            Please review this pull request and provide feedback on:
            - Code quality and best practices
            - Potential bugs or issues
            - Performance considerations
            - Security concerns
            - Test coverage
            
            Be constructive and helpful in your feedback.

          # Optional: Use sticky comments to make Claude reuse the same comment on subsequent pushes to the same PR
          # use_sticky_comment: true
          
          # Optional: Customize review based on file types
          # direct_prompt: |
          #   Review this PR focusing on:
          #   - For TypeScript files: Type safety and proper interface usage
          #   - For API endpoints: Security, input validation, and error handling
          #   - For React components: Performance, accessibility, and best practices
          #   - For tests: Coverage, edge cases, and test quality
          
          # Optional: Different prompts for different authors
          # direct_prompt: |
          #   ${{ github.event.pull_request.author_association == 'FIRST_TIME_CONTRIBUTOR' && 
          #   'Welcome! Please review this PR from a first-time contributor. Be encouraging and provide detailed explanations for any suggestions.' ||
          #   'Please provide a thorough code review focusing on our coding standards and best practices.' }}
          
          # Optional: Add specific tools for running tests or linting
          # allowed_tools: "Bash(npm run test),Bash(npm run lint),Bash(npm run typecheck)"
          
          # Optional: Skip review for certain conditions
          # if: |
          #   !contains(github.event.pull_request.title, '[skip-review]') &&
          #   !contains(github.event.pull_request.title, '[WIP]')



================================================
FILE: .github/workflows/claude.yml
================================================
name: Claude Code

on:
  issue_comment:
    types: [created]
  pull_request_review_comment:
    types: [created]
  issues:
    types: [opened, assigned]
  pull_request_review:
    types: [submitted]

jobs:
  claude:
    if: |
      (github.event_name == 'issue_comment' && contains(github.event.comment.body, '@claude')) ||
      (github.event_name == 'pull_request_review_comment' && contains(github.event.comment.body, '@claude')) ||
      (github.event_name == 'pull_request_review' && contains(github.event.review.body, '@claude')) ||
      (github.event_name == 'issues' && (contains(github.event.issue.body, '@claude') || contains(github.event.issue.title, '@claude')))
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: read
      issues: read
      id-token: write
      actions: read # Required for Claude to read CI results on PRs
    steps:
      - name: Checkout repository
        uses: actions/checkout@v6
        with:
          fetch-depth: 1

      - name: Run Claude Code
        id: claude
        uses: anthropics/claude-code-action@beta
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}

          # This is an optional setting that allows Claude to read CI results on PRs
          additional_permissions: |
            actions: read
          
          # Optional: Specify model (defaults to Claude Sonnet 4, uncomment for Claude Opus 4.1)
          # model: "claude-opus-4-1-20250805"
          
          # Optional: Customize the trigger phrase (default: @claude)
          # trigger_phrase: "/claude"
          
          # Optional: Trigger when specific user is assigned to an issue
          # assignee_trigger: "claude-bot"
          
          # Optional: Allow Claude to run specific commands
          # allowed_tools: "Bash(npm install),Bash(npm run build),Bash(npm run test:*),Bash(npm run lint:*)"
          
          # Optional: Add custom instructions for Claude to customize its behavior for your project
          # custom_instructions: |
          #   Follow our coding standards
          #   Ensure all new code has tests
          #   Use TypeScript for new files
          
          # Optional: Custom environment variables for Claude
          # claude_env: |
          #   NODE_ENV: test



================================================
FILE: .gitignore
================================================

*.egg-info
*.pyc

# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg

# Virtual environments
venv/
env/
ENV/
.env

# IDE specific files
.idea/
.vscode/
*.swp
*.swo
.DS_Store

# Jupyter Notebook
.ipynb_checkpoints

# Testing
.coverage
htmlcov/
.pytest_cache/
.tox/

# Logs
*.log
logs/

# Local development
.env.local
.env.development.local
.env.test.local
.env.production.local

# Dependencies
node_modules/

# LangGraph specific
.langgraph/

# Temporary files
tmp/
temp/

.langgraph_api


================================================
FILE: CLAUDE.md
================================================
# Open Deep Research Repository Overview

## Project Description
Open Deep Research is a configurable, fully open-source deep research agent that works across multiple model providers, search tools, and MCP (Model Context Protocol) servers. It enables automated research with parallel processing and comprehensive report generation.

## Repository Structure

### Root Directory
- `README.md` - Comprehensive project documentation with quickstart guide
- `pyproject.toml` - Python project configuration and dependencies
- `langgraph.json` - LangGraph configuration defining the main graph entry point
- `uv.lock` - UV package manager lock file
- `LICENSE` - MIT license
- `.env.example` - Environment variables template (not tracked)

### Core Implementation (`src/open_deep_research/`)
- `deep_researcher.py` - Main LangGraph implementation (entry point: `deep_researcher`)
- `configuration.py` - Configuration management and settings
- `state.py` - Graph state definitions and data structures  
- `prompts.py` - System prompts and prompt templates
- `utils.py` - Utility functions and helpers
- `files/` - Research output and example files

### Legacy Implementations (`src/legacy/`)
Contains two earlier research implementations:
- `graph.py` - Plan-and-execute workflow with human-in-the-loop
- `multi_agent.py` - Supervisor-researcher multi-agent architecture
- `legacy.md` - Documentation for legacy implementations
- `CLAUDE.md` - Legacy-specific Claude instructions
- `tests/` - Legacy-specific tests

### Security (`src/security/`)
- `auth.py` - Authentication handler for LangGraph deployment

### Testing (`tests/`)
- `run_evaluate.py` - Main evaluation script configured to run on deep research bench
- `evaluators.py` - Specialized evaluation functions  
- `prompts.py` - Evaluation prompts and criteria
- `pairwise_evaluation.py` - Comparative evaluation tools
- `supervisor_parallel_evaluation.py` - Multi-threaded evaluation

### Examples (`examples/`)
- `arxiv.md` - ArXiv research example
- `pubmed.md` - PubMed research example
- `inference-market.md` - Inference market analysis examples

## Key Technologies
- **LangGraph** - Workflow orchestration and graph execution
- **LangChain** - LLM integration and tool calling
- **Multiple LLM Providers** - OpenAI, Anthropic, Google, Groq, DeepSeek support
- **Search APIs** - Tavily, OpenAI/Anthropic native search, DuckDuckGo, Exa
- **MCP Servers** - Model Context Protocol for extended capabilities

## Development Commands
- `uvx langgraph dev` - Start development server with LangGraph Studio
- `python tests/run_evaluate.py` - Run comprehensive evaluations
- `ruff check` - Code linting
- `mypy` - Type checking

## Configuration
All settings configurable via:
- Environment variables (`.env` file)
- Web UI in LangGraph Studio
- Direct configuration modification

Key settings include model selection, search API choice, concurrency limits, and MCP server configurations.
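A minimal Python sketch of how the three configuration sources might be merged for a single run. It assumes that environment variables named after each field in upper case take precedence over per-run values set in Studio, which in turn override the coded defaults. `ResearchConfig` and its three fields are hypothetical stand-ins; check `configuration.py` for the real field list and precedence.

```python
import os
from dataclasses import dataclass, fields
from typing import Any, Mapping, Optional


@dataclass
class ResearchConfig:
    # Hypothetical subset of settings; the real fields live in configuration.py.
    research_model: str = "openai:gpt-4.1"
    search_api: str = "tavily"
    max_concurrent_research_units: int = 5

    @classmethod
    def resolve(
        cls,
        configurable: Optional[Mapping[str, Any]] = None,
        environ: Optional[Mapping[str, str]] = None,
    ) -> "ResearchConfig":
        """Merge defaults, per-run (Studio) values, and environment variables.

        Assumed precedence: FIELD_NAME env var > per-run value > default.
        """
        configurable = configurable or {}
        environ = os.environ if environ is None else environ
        values: dict[str, Any] = {}
        for f in fields(cls):
            raw = environ.get(f.name.upper(), configurable.get(f.name))
            if raw is None:
                continue  # nothing set anywhere: keep the dataclass default
            # Env values arrive as strings, so coerce to the default's type.
            # This naive coercion would need special handling for bool fields.
            values[f.name] = type(f.default)(raw)
        return cls(**values)
```

For example, `ResearchConfig.resolve({"search_api": "openai"}, {"MAX_CONCURRENT_RESEARCH_UNITS": "8"})` keeps the per-run search API choice but takes the concurrency limit from the environment, converted to an `int`.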

================================================
FILE: LICENSE
================================================
MIT License

Copyright (c) 2025 LangChain

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

================================================
FILE: README.md
================================================
# 🔬 Open Deep Research

<img width="1388" height="298" alt="full_diagram" src="https://github.com/user-attachments/assets/12a2371b-8be2-4219-9b48-90503eb43c69" />

Deep research has broken out as one of the most popular agent applications. This is a simple, configurable, fully open source deep research agent that works across many model providers, search tools, and MCP servers. Its performance is on par with many popular deep research agents ([see Deep Research Bench leaderboard](https://huggingface.co/spaces/Ayanami0730/DeepResearch-Leaderboard)).

<img width="817" height="666" alt="Screenshot 2025-07-13 at 11 21 12 PM" src="https://github.com/user-attachments/assets/052f2ed3-c664-4a4f-8ec2-074349dcaa3f" />

### 🔥 Recent Updates

**August 14, 2025**: See our free course [here](https://academy.langchain.com/courses/deep-research-with-langgraph) (and course repo [here](https://github.com/langchain-ai/deep_research_from_scratch)) on building open deep research.

**August 7, 2025**: Added GPT-5 and updated the Deep Research Bench evaluation w/ GPT-5 results.

**August 2, 2025**: Achieved #6 ranking on the [Deep Research Bench Leaderboard](https://huggingface.co/spaces/Ayanami0730/DeepResearch-Leaderboard) with an overall score of 0.4344. 

**July 30, 2025**: Read about the evolution from our original implementations to the current version in our [blog post](https://rlancemartin.github.io/2025/07/30/bitter_lesson/).

**July 16, 2025**: Read more in our [blog](https://blog.langchain.com/open-deep-research/) and watch our [video](https://www.youtube.com/watch?v=agGiWUpxkhg) for a quick overview.

### 🚀 Quickstart

1. Clone the repository and activate a virtual environment:
```bash
git clone https://github.com/langchain-ai/open_deep_research.git
cd open_deep_research
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```

2. Install dependencies:
```bash
uv sync
# or
uv pip install -r pyproject.toml
```

3. Set up your `.env` file to customize the environment variables (for model selection, search tools, and other configuration settings):
```bash
cp .env.example .env
```

4. Launch agent with the LangGraph server locally:

```bash
# Install dependencies and start the LangGraph server
uvx --refresh --from "langgraph-cli[inmem]" --with-editable . --python 3.11 langgraph dev --allow-blocking
```

This will open the LangGraph Studio UI in your browser.

```
- 🚀 API: http://127.0.0.1:2024
- 🎨 Studio UI: https://smith.langchain.com/studio/?baseUrl=http://127.0.0.1:2024
- 📚 API Docs: http://127.0.0.1:2024/docs
```

Ask a question in the `messages` input field and click `Submit`. Select a different configuration in the "Manage Assistants" tab.
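Step 3 relies on the right API keys ending up in your environment. A small pre-flight check can catch a missing key before the server starts. This sketch assumes the default setup (OpenAI models plus Tavily search), so it looks for `OPENAI_API_KEY` and `TAVILY_API_KEY`; `missing_api_keys` is a hypothetical helper, and it only inspects the current process environment rather than parsing `.env` itself.

```python
import os
from typing import Iterable, Mapping, Optional

# Assumed key names for the default setup (OpenAI models + Tavily search).
# Adjust this tuple if you switch providers or search APIs.
DEFAULT_REQUIRED_KEYS = ("OPENAI_API_KEY", "TAVILY_API_KEY")


def missing_api_keys(
    required: Iterable[str] = DEFAULT_REQUIRED_KEYS,
    environ: Optional[Mapping[str, str]] = None,
) -> list[str]:
    """Return the required variables that are unset or blank."""
    environ = os.environ if environ is None else environ
    return [key for key in required if not environ.get(key, "").strip()]
```

Call `missing_api_keys()` from a shell that has your keys exported; an empty list means both default keys are present.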

### ⚙️ Configurations

#### LLM :brain:

Open Deep Research supports a wide range of LLM providers via the [init_chat_model() API](https://python.langchain.com/docs/how_to/chat_models_universal_init/). It uses LLMs for the tasks listed below. See the corresponding model fields in the [configuration.py](https://github.com/langchain-ai/open_deep_research/blob/main/src/open_deep_research/configuration.py) file for more details; these can also be set in the LangGraph Studio UI.

- **Summarization** (default: `openai:gpt-4.1-mini`): Summarizes search API results
- **Research** (default: `openai:gpt-4.1`): Powers the search agent
- **Compression** (default: `openai:gpt-4.1`): Compresses research findings
- **Final Report Model** (default: `openai:gpt-4.1`): Writes the final report

> Note: the selected model will need to support [structured outputs](https://python.langchain.com/docs/integrations/chat/) and [tool calling](https://python.langchain.com/docs/how_to/tool_calling/).

> Note: For OpenRouter, follow [this guide](https://github.com/langchain-ai/open_deep_research/issues/75#issuecomment-2811472408); for local models via Ollama, see the [setup instructions](https://github.com/langchain-ai/open_deep_research/issues/65#issuecomment-2743586318).
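The model fields take `provider:model` strings such as `openai:gpt-4.1`, which `init_chat_model()` resolves into a provider and a model name. The sketch below mirrors that convention using only the standard library, so a malformed model string can be caught before a long research run. The role names in `DEFAULT_MODELS` and both helpers are hypothetical; the default values are the ones listed above.

```python
from typing import NamedTuple


class ModelSpec(NamedTuple):
    provider: str
    model: str


# Defaults as listed in this README; the role keys are illustrative names.
DEFAULT_MODELS = {
    "summarization": "openai:gpt-4.1-mini",
    "research": "openai:gpt-4.1",
    "compression": "openai:gpt-4.1",
    "final_report": "openai:gpt-4.1",
}


def parse_model_string(spec: str) -> ModelSpec:
    """Split a 'provider:model' identifier at the first colon."""
    provider, sep, model = spec.partition(":")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider:model', got {spec!r}")
    return ModelSpec(provider, model)


def with_overrides(overrides: dict[str, str]) -> dict[str, ModelSpec]:
    """Apply per-role overrides on top of the defaults and validate each entry."""
    merged = {**DEFAULT_MODELS, **overrides}
    return {role: parse_model_string(spec) for role, spec in merged.items()}
```

In the project itself these strings are passed straight to `init_chat_model()`, which does its own parsing; this helper is only a local sanity check. Splitting at the first colon keeps model names that themselves contain colons intact.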

#### Search API :mag:

Open Deep Research supports a wide range of search tools. By default it uses the [Tavily](https://www.tavily.com/) search API. It is fully MCP-compatible and also works with native web search for Anthropic and OpenAI models. See the `search_api` and `mcp_config` fields in the [configuration.py](https://github.com/langchain-ai/open_deep_research/blob/main/src/open_deep_research/configuration.py) file for more details; these can also be set in the LangGraph Studio UI.
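A sketch of validating a `search_api` choice before a run. The option values (`tavily`, `openai`, `anthropic`, `none`) are assumptions about what the field accepts, so confirm them in `configuration.py`. The provider check encodes one reasonable rule: native web search is a provider-side tool, so OpenAI or Anthropic search only makes sense with a research model from that same provider. `SearchAPI` and `select_search_api` here are hypothetical.

```python
from enum import Enum


class SearchAPI(Enum):
    # Assumed option values; confirm against the search_api field in configuration.py.
    TAVILY = "tavily"
    OPENAI = "openai"        # OpenAI native web search
    ANTHROPIC = "anthropic"  # Anthropic native web search
    NONE = "none"            # no built-in search, e.g. rely on MCP tools


def select_search_api(value: str, research_model: str) -> SearchAPI:
    """Parse a search_api value and check it against the research model's provider."""
    api = SearchAPI(value.lower())  # raises ValueError for unknown values
    provider = research_model.split(":", 1)[0]
    # Native search runs inside the provider's API, so the providers must match.
    if api in (SearchAPI.OPENAI, SearchAPI.ANTHROPIC) and provider != api.value:
        raise ValueError(
            f"{api.value} native search needs a {api.value}:* research model, "
            f"got {research_model!r}"
        )
    return api
```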

#### Other 

See the fields in the [configuration.py](https://github.com/langchain-ai/open_deep_research/blob/main/src/open_deep_research/configuration.py) for various other settings to customize the behavior of Open Deep Research. 

### 📊 Evaluation

Open Deep Research is configured for evaluation with [Deep Research Bench](https://huggingface.co/spaces/Ayanami0730/DeepResearch-Leaderboard). This benchmark has 100 PhD-level research tasks (50 English, 50 Chinese), crafted by domain experts across 22 fields (e.g., Science & Tech, Business & Finance) to mirror real-world deep-research needs. It defines two evaluation metrics, and the leaderboard ranks by the RACE score, which uses an LLM-as-a-judge (Gemini) to evaluate research reports against a golden set of expert-compiled reports across a set of metrics.
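To make the RACE score easier to interpret, here is a simplified sketch of reference-relative scoring as we understand it: the judge scores both the generated report and the expert reference on several weighted criteria, and the final value is the candidate's weighted score divided by the sum of both weighted scores, so 0.5 means parity with the reference. This is an illustration, not the benchmark's code; criterion names, weights, and scores are hypothetical.

```python
from typing import Mapping


def weighted_score(scores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted average of per-criterion judge scores."""
    total_weight = sum(weights.values())
    return sum(scores[name] * w for name, w in weights.items()) / total_weight


def relative_score(
    candidate: Mapping[str, float],
    reference: Mapping[str, float],
    weights: Mapping[str, float],
) -> float:
    """Candidate score relative to a reference report; 0.5 means parity."""
    cand = weighted_score(candidate, weights)
    ref = weighted_score(reference, weights)
    return cand / (cand + ref)
```

Under this reading, the scores below 0.5 in the Results table mean the judge rated the generated reports somewhat below the expert references overall.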

#### Usage

> Warning: Running across the 100 examples can cost ~$20-$100 depending on the model selection.

The dataset is available on [LangSmith via this link](https://smith.langchain.com/public/c5e7a6ad-fdba-478c-88e6-3a388459ce8b/d). To kick off evaluation, run the following command:

```bash
# Run comprehensive evaluation on LangSmith datasets
python tests/run_evaluate.py
```

This prints a link to a LangSmith experiment, referred to below as `YOUR_EXPERIMENT_NAME`. Once the run finishes, extract the results to a JSONL file that can be submitted to Deep Research Bench.

```bash
python tests/extract_langsmith_data.py --project-name "YOUR_EXPERIMENT_NAME" --model-name "your-model-name" --dataset-name "deep_research_bench"
```

This creates `tests/expt_results/deep_research_bench_model-name.jsonl` with the required format. Move the generated JSONL file to a local clone of the Deep Research Bench repository and follow their [Quick Start guide](https://github.com/Ayanami0730/deep_research_bench?tab=readme-ov-file#quick-start) for evaluation submission.
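The extraction step writes one JSON object per line. The sketch below shows that JSONL shape and the output file naming described above. The record fields `id`, `prompt`, and `article` are an assumption about what Deep Research Bench ingests; match them to what `tests/extract_langsmith_data.py` actually emits and to the benchmark's Quick Start before submitting. `write_bench_jsonl` is a hypothetical helper, not part of the repository.

```python
import json
from pathlib import Path
from typing import Iterable, Mapping


def write_bench_jsonl(
    records: Iterable[Mapping[str, object]],
    model_name: str,
    out_dir: Path = Path("tests/expt_results"),
) -> Path:
    """Write one JSON object per line to deep_research_bench_<model_name>.jsonl."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"deep_research_bench_{model_name}.jsonl"
    with path.open("w", encoding="utf-8") as fh:
        for rec in records:
            # ensure_ascii=False keeps the Chinese-language tasks human-readable.
            fh.write(json.dumps(rec, ensure_ascii=False) + "\n")
    return path
```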

#### Results 

| Name | Commit | Summarization | Research | Compression | Total Cost | Total Tokens | RACE Score | Experiment |
|------|--------|---------------|----------|-------------|------------|--------------|------------|------------|
| GPT-5 | [ca3951d](https://github.com/langchain-ai/open_deep_research/pull/168/commits) | openai:gpt-4.1-mini | openai:gpt-5 | openai:gpt-4.1 |  | 204,640,896 | 0.4943 | [Link](https://smith.langchain.com/o/ebbaf2eb-769b-4505-aca2-d11de10372a4/datasets/6e4766ca-613c-4bda-8bde-f64f0422bbf3/compare?selectedSessions=4d5941c8-69ce-4f3d-8b3e-e3c99dfbd4cc&baseline=undefined) |
| Defaults | [6532a41](https://github.com/langchain-ai/open_deep_research/commit/6532a4176a93cc9bb2102b3d825dcefa560c85d9) | openai:gpt-4.1-mini | openai:gpt-4.1 | openai:gpt-4.1 | $45.98 | 58,015,332 | 0.4309 | [Link](https://smith.langchain.com/o/ebbaf2eb-769b-4505-aca2-d11de10372a4/datasets/6e4766ca-6[…]ons=cf4355d7-6347-47e2-a774-484f290e79bc&baseline=undefined) |
| Claude Sonnet 4 | [f877ea9](https://github.com/langchain-ai/open_deep_research/pull/163/commits/f877ea93641680879c420ea991e998b47aab9bcc) | openai:gpt-4.1-mini | anthropic:claude-sonnet-4-20250514 | openai:gpt-4.1 | $187.09 | 138,917,050 | 0.4401 | [Link](https://smith.langchain.com/o/ebbaf2eb-769b-4505-aca2-d11de10372a4/datasets/6e4766ca-6[…]ons=04f6002d-6080-4759-bcf5-9a52e57449ea&baseline=undefined) |
| Deep Research Bench Submission | [c0a160b](https://github.com/langchain-ai/open_deep_research/commit/c0a160b57a9b5ecd4b8217c3811a14d8eff97f72) | openai:gpt-4.1-nano | openai:gpt-4.1 | openai:gpt-4.1 | $87.83 | 207,005,549 | 0.4344 | [Link](https://smith.langchain.com/o/ebbaf2eb-769b-4505-aca2-d11de10372a4/datasets/6e4766ca-6[…]ons=e6647f74-ad2f-4cb9-887e-acb38b5f73c0&baseline=undefined) |
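The Total Cost and Total Tokens columns cover a whole benchmark run. Assuming each run covered all 100 tasks exactly once, a per-task breakdown is simple arithmetic on the table's own figures (the GPT-5 row is skipped because it has no recorded cost):

```python
# Figures copied from the Results table; the GPT-5 row has no recorded cost.
RESULTS = {
    "Defaults": (45.98, 58_015_332),
    "Claude Sonnet 4": (187.09, 138_917_050),
    "Deep Research Bench Submission": (87.83, 207_005_549),
}
NUM_TASKS = 100  # Deep Research Bench size; assumes one pass over every task


def per_task(total_cost: float, total_tokens: int, n: int = NUM_TASKS) -> tuple[float, float]:
    """Average USD cost and token count per benchmark task."""
    return total_cost / n, total_tokens / n


for name, (cost, tokens) in RESULTS.items():
    avg_cost, avg_tokens = per_task(cost, tokens)
    print(f"{name}: ${avg_cost:.2f} and {avg_tokens:,.0f} tokens per task")
```

For the Defaults row this works out to roughly $0.46 and about 580K tokens per task.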

### 🚀 Deployments and Usage

#### LangGraph Studio

Follow the [quickstart](#-quickstart) to start LangGraph server locally and test the agent out on LangGraph Studio.

#### Hosted deployment
 
You can easily deploy to [LangGraph Platform](https://langchain-ai.github.io/langgraph/concepts/#deployment-options). 

#### Open Agent Platform

Open Agent Platform (OAP) is a UI from which non-technical users can build and configure their own agents. OAP is great for allowing users to configure the Deep Researcher with different MCP tools and search APIs that are best suited to their needs and the problems that they want to solve.

We've deployed Open Deep Research to our public demo instance of OAP. All you need to do is add your API keys, and you can test out the Deep Researcher for yourself! Try it out [here](https://oap.langchain.com).

You can also deploy your own instance of OAP, and make your own custom agents (like Deep Researcher) available on it to your users.
1. [Deploy Open Agent Platform](https://docs.oap.langchain.com/quickstart)
2. [Add Deep Researcher to OAP](https://docs.oap.langchain.com/setup/agents)

### Legacy Implementations 🏛️

The `src/legacy/` folder contains two earlier implementations that provide alternative approaches to automated research. They are less performant than the current implementation, but they are useful for understanding the different approaches to deep research.

#### 1. Workflow Implementation (`legacy/graph.py`)
- **Plan-and-Execute**: Structured workflow with human-in-the-loop planning
- **Sequential Processing**: Creates sections one by one with reflection
- **Interactive Control**: Allows feedback and approval of report plans
- **Quality Focused**: Emphasizes accuracy through iterative refinement

#### 2. Multi-Agent Implementation (`legacy/multi_agent.py`)  
- **Supervisor-Researcher Architecture**: Coordinated multi-agent system
- **Parallel Processing**: Multiple researchers work simultaneously
- **Speed Optimized**: Faster report generation through concurrency
- **MCP Support**: Extensive Model Context Protocol integration
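The practical difference between the two legacy designs is scheduling. The plain-asyncio sketch below contrasts them: the workflow style researches one section after another, while the supervisor style fans out to several researchers at once, capped by a semaphore (in the spirit of the concurrency limits mentioned in CLAUDE.md). It is not how `legacy/graph.py` or `legacy/multi_agent.py` are written, since both use LangGraph; `fake_researcher` is a hypothetical stand-in for a search-plus-LLM researcher.

```python
import asyncio
from typing import Awaitable, Callable, Sequence

# Hypothetical researcher signature: takes a sub-topic, returns findings.
Researcher = Callable[[str], Awaitable[str]]


async def research_sequential(topics: Sequence[str], research: Researcher) -> list[str]:
    """Workflow style: each section finishes before the next one starts."""
    results = []
    for topic in topics:
        results.append(await research(topic))
    return results


async def research_parallel(
    topics: Sequence[str], research: Researcher, max_concurrency: int = 3
) -> list[str]:
    """Supervisor style: fan out to researchers, at most max_concurrency at once."""
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(topic: str) -> str:
        async with sem:
            return await research(topic)

    # gather returns results in input order even if tasks finish out of order.
    return list(await asyncio.gather(*(bounded(t) for t in topics)))


async def fake_researcher(topic: str) -> str:
    await asyncio.sleep(0.05)  # simulate search and LLM latency
    return f"findings on {topic}"
```

With four 50 ms tasks, the sequential version takes about 200 ms, while `research_parallel(..., max_concurrency=4)` finishes in roughly 50 ms and still returns the findings in topic order.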


================================================
FILE: examples/arxiv.md
================================================
# Obesity Among Young Adults in the United States: A Growing Public Health Challenge

The obesity epidemic among young adults in the United States represents a complex public health crisis shaped by interconnected social, economic, and environmental factors. Recent research reveals that over one-third of US adults suffer from obesity, with rates disproportionately affecting disadvantaged communities. This health challenge extends beyond individual choices, as built environment characteristics and socioeconomic conditions explain up to 90% of obesity prevalence variation across American cities. Understanding these systemic influences is crucial for developing effective interventions that address both individual and community-level factors contributing to obesity among young adults.

## Obesity Prevalence and Trends in US Young Adults

**Over one-third of US adults suffer from obesity, with the condition showing strong correlations to socioeconomic and environmental factors that disproportionately affect disadvantaged communities.** National data reveals systematic variations in obesity rates that map closely to neighborhood characteristics and built environment features.

Advanced analysis using satellite imagery and machine learning has demonstrated that built environment characteristics explain 72-90% of obesity prevalence variation at the census tract level across major US cities. These correlations are particularly pronounced in disadvantaged neighborhoods where multiple social determinants of health intersect.

Key factors associated with higher adult obesity rates include:
- Lower median household income
- Limited health insurance coverage
- Higher concentration of rental housing
- Reduced access to physical activity resources
- Higher poverty rates

A comprehensive study in Shelby County, Tennessee exemplifies these patterns, showing significantly higher obesity prevalence in areas with multiple socioeconomic challenges. The findings suggest that addressing structural and environmental factors may be as crucial as individual interventions for reducing obesity rates.

### Sources
- Association Between Neighborhood Factors and Adult Obesity in Shelby County, Tennessee (2022): http://arxiv.org/abs/2208.05335v1
- Using Deep Learning to Examine the Association between the Built Environment and Neighborhood Adult Obesity Prevalence (2017): http://arxiv.org/abs/1711.00885v1
- Progress of the anti-obesity of Berberine (2025): http://arxiv.org/abs/2501.02282v1

## Socioeconomic Determinants of Obesity in Young Adults

**Social and economic disparities create stark differences in obesity prevalence among young adults, with disadvantaged neighborhoods showing up to 90% higher rates compared to affluent areas.** Research from Shelby County, Tennessee demonstrates how multiple socioeconomic factors intersect to influence obesity risk through both direct and indirect pathways.

Key social determinants shaping obesity outcomes include:
* Median household income - Affects access to healthy food options
* Insurance status - Determines preventive care availability
* Housing conditions - Influences exposure to obesity-promoting environments
* Education level - Impacts health literacy and dietary choices
* Geographic location - Correlates with neighborhood resources

Advanced geospatial analysis reveals that built environment characteristics explain 72-90% of obesity variation across cities. In Shelby County, census tracts with higher percentages of uninsured residents, home renters, and individuals living below the poverty level demonstrated significantly elevated obesity rates.

These findings emphasize the need for obesity interventions that address systemic inequalities rather than focusing solely on individual behavior modification. Public health initiatives must consider how social determinants create barriers to healthy weight maintenance.

### Sources
- Association Between Neighborhood Factors and Adult Obesity in Shelby County, Tennessee: http://arxiv.org/abs/2208.05335v1
- Using Deep Learning to Examine the Association between the Built Environment and Neighborhood Adult Obesity Prevalence: http://arxiv.org/abs/1711.00885v1

## Built Environment's Impact on Obesity

**The physical design of urban spaces significantly influences obesity rates, with walkability and food accessibility emerging as critical factors that can increase obesity risk by up to 42% in underserved areas.** Research demonstrates that neighborhood characteristics create complex ecosystems affecting dietary health and physical activity patterns.

The built environment shapes obesity risk through three primary mechanisms: food accessibility, physical activity opportunities, and socioeconomic factors. Studies reveal that areas with limited walkability and higher concentrations of fast-food establishments, particularly through online food delivery platforms, create "cyber food swamps" that contribute to unhealthy dietary choices. A 10% increase in accessible fast-food options raises the probability of unhealthy food orders by 22%.

Key built environment factors affecting obesity include:
* Walking infrastructure and neighborhood walkability
* Distance to healthy food retailers versus fast food
* Availability of recreational facilities
* Transportation access
* Socioeconomic status of the area

Recent research in tertiary education campuses demonstrates that improving walkability can increase positive walking experiences by 9.75%, suggesting that targeted modifications to the built environment could help reduce obesity rates.

### Sources
- Using Tableau and Google Map API for Understanding the Impact of Walkability on Dublin City: http://arxiv.org/abs/2310.07563v1
- Exploring the Causal Relationship between Walkability and Affective Walking Experience: http://arxiv.org/abs/2311.06262v1
- Cyber Food Swamps: Investigating the Impacts of Online-to-Offline Food Delivery Platforms: http://arxiv.org/abs/2409.16601v2
- The association between neighborhood obesogenic factors and prostate cancer risk and mortality: http://arxiv.org/abs/2405.18456v1

## Machine Learning Applications in Obesity Analysis

**Advanced machine learning and deep learning techniques are revolutionizing obesity research by uncovering complex patterns in environmental, behavioral, and socioeconomic factors, with prediction accuracies reaching up to 88% for adolescent obesity risk.**

Recent studies using deep learning analysis of satellite imagery have demonstrated that built environment features can explain 72-90% of obesity prevalence variation across U.S. cities. This breakthrough enables automated assessment of neighborhood characteristics that influence obesity rates at the census tract level.

Machine learning models have identified key social determinants of health strongly correlated with adult obesity, including:
* Median household income
* Housing status (rental vs. ownership)
* Insurance coverage
* Race and ethnicity demographics
* Age distribution
* Marital status

Novel applications include DeepHealthNet, which achieves 88.4% accuracy in adolescent obesity prediction by analyzing physical activity patterns and health metrics. Similarly, recurrent neural networks analyzing longitudinal patient records and wearable device data have achieved 77-86% accuracy in predicting obesity status improvements.

These insights are particularly valuable for public health decision-making, enabling targeted interventions in disadvantaged neighborhoods where obesity prevalence is significantly higher.

### Sources
- Using Deep Learning to Examine the Built Environment and Neighborhood Adult Obesity: http://arxiv.org/abs/1711.00885v1
- DeepHealthNet: Adolescent Obesity Prediction System: http://arxiv.org/abs/2308.14657v2
- Association Between Neighborhood Factors and Adult Obesity in Shelby County, Tennessee: http://arxiv.org/abs/2208.05335v1
- Recurrent Neural Networks based Obesity Status Prediction: http://arxiv.org/abs/1809.07828v1

## Current Interventions and Policy Recommendations

**Current obesity interventions targeting young adults must shift from individual-focused approaches to addressing systemic neighborhood-level factors that drive health disparities.** Research demonstrates that built environment characteristics explain up to 90% of obesity prevalence variation across cities, highlighting the critical role of structural determinants.

Recent geospatial analyses have identified key social determinants that shape obesity rates in disadvantaged communities, including housing stability, food access, and neighborhood infrastructure. The Shelby County, Tennessee case study reveals significant associations between obesity prevalence and multiple socioeconomic factors, particularly in areas with lower median household incomes and higher percentages of uninsured residents.

To develop more effective interventions, policymakers should prioritize:
* Implementing zoning policies that promote physical activity
* Improving access to healthy food options in underserved areas
* Addressing housing stability through rental assistance programs
* Expanding health insurance coverage in high-risk communities
* Investing in neighborhood infrastructure improvements

These evidence-based policy measures represent a crucial shift toward addressing the root causes of obesity through coordinated community-level interventions rather than focusing solely on individual behavior change.

### Sources
- Association Between Neighborhood Factors and Adult Obesity in Shelby County, Tennessee: http://arxiv.org/abs/2208.05335v1
- Using Deep Learning to Examine the Built Environment and Neighborhood Adult Obesity Prevalence: http://arxiv.org/abs/1711.00885v1
- Structured psychosocial stress and the US obesity epidemic: http://arxiv.org/abs/q-bio/0312011v1

# Obesity in Young Adults: A Complex Public Health Challenge

The rising prevalence of obesity among young adults in the United States represents a critical public health challenge shaped by interconnected social, economic, and environmental factors. Recent research reveals that over one-third of US adults have obesity, with rates disproportionately affecting disadvantaged communities. Deep learning analysis shows that neighborhood characteristics and built environment features explain up to 90% of obesity prevalence variation across major cities, highlighting how systemic inequalities create barriers to maintaining a healthy weight.

## Key Findings and Future Directions

The evidence demonstrates that obesity in young adults stems from complex interactions between built environment, socioeconomic factors, and healthcare access. Machine learning analyses have sharpened our understanding of these relationships, achieving prediction accuracies of up to 88% for obesity risk. The research points to critical areas requiring immediate intervention:

* Built Environment Modifications
  - Improve neighborhood walkability
  - Increase access to recreational facilities
  - Address food desert challenges
  - Regulate "cyber food swamps"

* Policy Interventions
  - Expand health insurance coverage
  - Implement supportive housing policies
  - Develop targeted community programs
  - Enhance public transportation access

Success in reducing obesity rates will require coordinated efforts that address these systemic factors rather than focusing solely on individual behavior change. Future initiatives must prioritize evidence-based structural interventions that promote health equity across all communities.

================================================
FILE: examples/inference-market-gpt45.md
================================================
# Introduction

The AI inference market is rapidly expanding, driven by growing demand for real-time data processing and advancements in specialized hardware and cloud-based solutions. This report examines three innovative companies—Fireworks AI, Together.ai, and Groq—that are shaping the competitive landscape. Fireworks AI offers flexible, multimodal inference solutions; Together.ai emphasizes optimized performance for open-source models; and Groq delivers unmatched speed through custom hardware. By analyzing their technologies, market positioning, and performance metrics, this report provides insights into how these key players are influencing the future of AI inference.

## Market Overview of AI Inference

**The global AI inference server market is experiencing rapid growth, projected to expand from USD 38.4 billion in 2023 to USD 166.7 billion by 2031, at a CAGR of 18%.** This growth is driven by increasing demand for real-time data processing, advancements in AI technologies, and widespread adoption of cloud-based and edge computing solutions.
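The endpoints and the stated growth rate can be cross-checked with the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. This minimal sketch assumes an eight-year span from 2023 to 2031. Under that assumption the two dollar figures imply roughly 20% growth per year, while compounding 18% from USD 38.4 billion reaches only about USD 144 billion, so the source's numbers may rest on a different base year or rounding convention.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1 / years) - 1


def project(start: float, rate: float, years: int) -> float:
    """Value reached by compounding `start` at `rate` for `years` years."""
    return start * (1 + rate) ** years


start_2023, end_2031 = 38.4, 166.7  # USD billions, as reported above
years = 2031 - 2023  # assumed forecast span

print(f"Implied CAGR: {cagr(start_2023, end_2031, years):.1%}")       # about 20.1%
print(f"2031 value at 18%: {project(start_2023, 0.18, years):.1f}B")  # about 144.3B
```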

North America currently dominates the market, accounting for approximately 38% of global revenue, due to its advanced technological infrastructure, significant R&D investments, and presence of major industry players such as NVIDIA, Intel, and Dell. Asia-Pacific is expected to exhibit the highest growth rate, driven by rapid digital transformation initiatives and government support for AI adoption, particularly in China, India, and Japan.

Key factors influencing market growth include:

- Rising adoption of AI-driven applications in healthcare, finance, automotive, and retail sectors.
- Increased deployment of specialized hardware (GPUs, TPUs, FPGAs) optimized for AI workloads.
- Growing preference for cloud-based deployment models due to scalability and cost-effectiveness.

However, high initial implementation costs, complexity of integration, and data privacy concerns remain significant challenges.

### Sources

- AI Inference Server Market Size, Scope, Growth, and Forecast : https://www.verifiedmarketresearch.com/product/ai-inference-server-market/
- AI Server Market Size & Share, Growth Forecasts Report 2032 : https://www.gminsights.com/industry-analysis/ai-server-market
- AI Inference Server Market Forecast To 2032 : https://www.businessresearchinsights.com/market-reports/ai-inference-server-market-118293

## Deep Dive: Fireworks AI

**Fireworks AI provides a flexible inference platform optimized for deploying and fine-tuning large language models (LLMs), emphasizing ease of use, scalability, and performance customization.**

The platform supports two primary deployment modes: serverless inference and dedicated deployments. Serverless inference allows quick experimentation with popular pre-deployed models like Llama 3.1 405B, billed per token without guaranteed SLAs. Dedicated deployments offer private, GPU-based infrastructure with performance guarantees, supporting both base models and efficient Low-Rank Adaptation (LoRA) add-ons.

Fireworks AI's Document Inlining feature notably extends text-based models into multimodal capabilities, enabling visual reasoning tasks by seamlessly integrating image and PDF content. Performance optimization techniques include quantization, batching, and caching, tailored to specific use cases such as chatbots and coding assistants requiring low latency.

Fireworks AI competes with providers such as OpenAI and Cohere. It recently raised a $52M Series B round, bringing total funding to $77M, and its annual recurring revenue (ARR) is estimated at around $6M.

- Founded: 2022
- Headquarters: Redwood City, CA
- Employees: ~60
- Key Investors: Sequoia Capital, NVIDIA, AMD Ventures

### Sources
- Overview - Fireworks AI Docs : https://docs.fireworks.ai/models/overview  
- Performance optimization - Fireworks AI Docs : https://docs.fireworks.ai/faq/deployment/performance/optimization  
- DeepSeek R1 Just Got Eyes with Fireworks AI Document Inlining : https://fireworks.ai/blog/deepseek-r1-got-eyes  
- Fireworks AI 2025 Company Profile: Valuation, Funding & Investors : https://pitchbook.com/profiles/company/561272-14  
- Fireworks AI: Contact Details, Revenue, Funding, Employees and Company Profile : https://siliconvalleyjournals.com/company/fireworks-ai/  
- Fireworks AI - Overview, News & Similar companies - ZoomInfo : https://www.zoominfo.com/c/fireworks-ai-inc/5000025791  
- Fireworks AI Stock Price, Funding, Valuation, Revenue & Financial : https://www.cbinsights.com/company/fireworks-ai/financials

## Deep Dive: Together.ai

**Together.ai differentiates itself in the AI inference market through its comprehensive cloud platform, optimized for rapid inference, extensive model selection, and flexible GPU infrastructure.**

Together.ai provides a robust cloud-based solution for training, fine-tuning, and deploying generative AI models, emphasizing high-performance inference. Its inference engine builds on techniques such as FlashAttention-3 and speculative decoding, achieving inference speeds up to four times faster than competitors. The platform supports over 100 open-source models, including popular large language models (LLMs) like Llama-2 and RedPajama, enabling developers to quickly experiment with and deploy tailored AI solutions.

Together.ai's flexible GPU clusters, featuring NVIDIA H100 and H200 GPUs interconnected via high-speed InfiniBand networks, facilitate scalable distributed training and inference workloads. This infrastructure positions Together.ai competitively against GPU cloud providers like CoreWeave and Lambda Labs, particularly for startups and enterprises requiring variable compute resources.

Financially, Together.ai has demonstrated rapid growth, reaching an estimated $130M ARR in 2024, driven by increasing demand for generative AI applications and developer-friendly tooling.

### Sources
- Together AI: Reviews, Features, Pricing, Guides, and Alternatives : https://aipure.ai/products/together-ai
- Together AI revenue, valuation & growth rate | Sacra : https://sacra.com/c/together-ai/
- AI Solutions with Together.ai: Inference, Fine-Tuning & Models : https://pwraitools.com/generative-ai-tools/ai-solutions-with-together-ai-inference-fine-tuning-and-models/

## Deep Dive: Groq

**Groq's vertically integrated Tensor Streaming Processor (TSP) architecture delivers unmatched inference performance and energy efficiency, significantly outperforming traditional GPUs.**

Groq's TSP chip achieves inference speeds of 500-700 tokens per second on large language models, representing a 5-10x improvement over NVIDIA's latest GPUs. Independent benchmarks show Groq's LPU (Language Processing Unit) reaching 276 tokens per second on Meta's Llama 3.3 70B model, with consistent performance across varying context lengths and without the usual latency trade-offs.

Groq's unique hardware-software co-design eliminates external memory dependencies, embedding memory directly on-chip. This approach reduces data movement, resulting in up to 10x greater energy efficiency compared to GPUs. GroqCloud, the company's cloud inference platform, supports popular open-source models and has attracted over 360,000 developers.

Financially, Groq has raised $640 million in a Series D round at a $2.8 billion valuation, reflecting strong market confidence. Groq plans to deploy over 108,000 LPUs by early 2025, positioning itself as a leading provider of low-latency AI inference infrastructure.

### Sources
- Groq revenue, valuation & funding | Sacra : https://sacra.com/c/groq/
- Groq Raises $640M To Meet Soaring Demand for Fast AI Inference : https://groq.com/news_press/groq-raises-640m-to-meet-soaring-demand-for-fast-ai-inference/
- New AI Inference Speed Benchmark for Llama 3.3 70B, Powered by Groq : https://groq.com/new-ai-inference-speed-benchmark-for-llama-3-3-70b-powered-by-groq/
- Groq Inference Performance, Quality, & Cost Savings : https://groq.com/inference/
- GroqThoughts PowerPaper 2024 : https://groq.com/wp-content/uploads/2024/07/GroqThoughts_PowerPaper_2024.pdf

## Comparative Analysis

**Fireworks AI, Together.ai, and Groq each offer distinct strengths in AI inference, targeting different market segments and performance needs.**

Fireworks AI emphasizes speed and scalability through its proprietary FireAttention inference engine, delivering multi-modal capabilities (text, image, audio) with low latency. It prioritizes data privacy, maintaining HIPAA and SOC2 compliance, and offers flexible deployment options including serverless and on-demand models.

Together.ai differentiates itself by providing optimized inference for over 200 open-source large language models (LLMs). It achieves sub-100ms latency through automated infrastructure optimizations such as token caching, load balancing, and model quantization. Its cost-effective approach makes it attractive for developers requiring extensive model variety and scalability.

Groq specializes in hardware-accelerated inference, leveraging its custom Tensor Streaming Processor (TSP) chip architecture. GroqCloud delivers high-throughput, low-latency inference (500-700 tokens/second), significantly outperforming traditional GPUs. Groq targets latency-sensitive enterprise applications, including conversational AI and autonomous systems, with both cloud and on-premises deployment options.

| Feature             | Fireworks AI                 | Together.ai                  | Groq                          |
|---------------------|------------------------------|------------------------------|-------------------------------|
| Technology          | Proprietary inference engine | Optimized open-source models | Custom hardware (TSP chips)   |
| Market Positioning  | Multi-modal, privacy-focused | Cost-effective, scalable     | Ultra-low latency enterprise  |
| Revenue Estimates   | ~$6M ARR (est.)              | ~$130M ARR (2024 est.)       | $3.4M (2023)                  |
| Performance Metrics | Low latency, multi-modal     | Sub-100ms latency            | 500-700 tokens/sec inference  |

### Sources
- Fireworks AI vs GroqCloud Platform Comparison 2025 | PeerSpot : https://www.peerspot.com/products/comparisons/fireworks-ai_vs_groqcloud-platform
- Fireworks AI vs Together Inference Comparison 2025 | PeerSpot : https://www.peerspot.com/products/comparisons/fireworks-ai_vs_together-inference
- Top 10 AI Inference Platforms in 2025 - DEV Community : https://dev.to/lina_lam_9ee459f98b67e9d5/top-10-ai-inference-platforms-in-2025-56kd
- Groq revenue, valuation & funding | Sacra : https://sacra.com/c/groq/

## Conclusion and Synthesis

The AI inference market is rapidly expanding, projected to reach $166.7 billion by 2031, driven by demand for real-time processing and specialized hardware. Fireworks AI, Together.ai, and Groq each offer distinct competitive advantages:

| Feature            | Fireworks AI                      | Together.ai                      | Groq                             |
|--------------------|-----------------------------------|----------------------------------|----------------------------------|
| Core Strength      | Multi-modal, privacy-focused      | Extensive open-source support    | Custom hardware, ultra-low latency |
| Technology         | Proprietary inference engine      | Optimized GPU infrastructure     | Tensor Streaming Processor (TSP) |
| Revenue Estimates  | ~$6M ARR                          | ~$130M ARR                       | $3.4M revenue (2023)             |
| Performance        | Low latency, flexible deployment  | Sub-100ms latency                | 500-700 tokens/sec inference     |

Next steps include monitoring Groq's hardware adoption, evaluating Together.ai's scalability for diverse models, and assessing Fireworks AI's multimodal capabilities for specialized enterprise applications.

================================================
FILE: examples/inference-market.md
================================================
# The AI Inference Market: Analyzing Emerging Leaders

The AI inference market is experiencing unprecedented growth, projected to reach $133.2 billion by 2034, as specialized providers challenge traditional semiconductor dominance. While established chip manufacturers control over 80% of the market, new entrants like Fireworks, Together.ai, and Groq are reshaping the competitive landscape through innovative approaches to inference optimization and pricing.

This analysis examines how these emerging players are disrupting the market through differentiated technologies, aggressive pricing strategies, and superior performance metrics, particularly in the rapidly expanding cloud-based inference segment that now represents 55% of total market share. Their success highlights a fundamental shift in how AI computation is being delivered and monetized.

## AI Inference Market Overview

**The global AI inference market is experiencing unprecedented growth, projected to reach $133.2 billion by 2034, with a transformative shift occurring in market dynamics as new specialized providers challenge traditional semiconductor dominance.**

While established chip manufacturers (NVIDIA, AMD, Intel) control 80-82% of the market, emerging players are gaining traction through differentiated approaches. The market expansion is particularly evident in cloud-based deployments, which now represent 55% of total market share.

Key factors driving market evolution include:
* Increasing demand for real-time processing capabilities
* Shift toward token-based pricing models
* Rising adoption of specialized AI hardware
* Growth in open-source model deployment
* Integration of edge computing solutions

North America maintains market leadership with 38% global share, generating $9.34 billion in revenue (2024). This dominance stems from robust digital infrastructure and concentrated presence of technology companies, particularly in the United States where revenue reaches $8.6 billion.
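The regional figures also pin down an implied global market size. A minimal sketch, assuming the 38% share and the $9.34 billion both describe 2024 and that the $133.2 billion projection covers the same market ten years later: dividing regional revenue by its share gives roughly $24.6 billion globally, which implies growth of about 18% per year through 2034.

```python
def implied_total(regional_value: float, regional_share: float) -> float:
    """Back out a total market size from one region's value and its share."""
    return regional_value / regional_share


def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1 / years) - 1


# Assumption: both regional figures refer to 2024; values in USD billions.
global_2024 = implied_total(9.34, 0.38)
growth = cagr(global_2024, 133.2, 2034 - 2024)

print(f"Implied 2024 global market: ${global_2024:.1f}B")  # about $24.6B
print(f"Implied CAGR 2024-2034: {growth:.1%}")              # about 18.4%
```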

The market shows sustained growth potential, supported by ongoing infrastructure investments and technological innovation, particularly in cloud-based deployments where North America maintains clear leadership.

### Sources
- AI Inference Server Market Forecast : https://www.einpresswire.com/article/779610673/ai-inference-server-market-supports-new-technology-with-usd-133-2-billion-by-2034-regional-growth-at-usd-9-34-billion
- SemiAnalysis Market Report : https://semianalysis.com/2024/02/21/groq-inference-tokenomics-speed-but/
- Markets and Markets AI Inference Report : https://www.marketsandmarkets.com/Market-Reports/ai-inference-market-189921964.html

## Fireworks.ai Profile

**Fireworks.ai has emerged as a significant AI inference provider by focusing on performance optimization, reaching a $552M valuation in 2024 with an estimated $44M in annual revenue.** Their platform serves over 25 billion tokens daily to more than 23,000 developers through a tiered pricing structure that scales with usage.

The company's technical differentiation comes from custom optimizations such as FireAttention. Benchmark tests show up to 5.6x higher throughput and 12.2x lower latency than vLLM for Mixtral 8x7B models in FP8 format.

Their pricing model combines usage-based tiers with flexible deployment options:
* Basic tier: $50/month spending limit
* Growth tier: $500/month spending limit
* Scale tier: $5,000/month spending limit
* Enterprise tier: Custom limits with dedicated support
* On-demand GPU deployments: $2.90-$9.99 per hour

Notable enterprise customers, including DoorDash, Quora, and Upwork, validate their approach. Since its founding in 2022, Fireworks has secured $77M in funding from investors such as Benchmark and Sequoia Capital.

### Sources
- Fireworks AI Valued at $552M: https://www.pymnts.com/news/investment-tracker/2024/fireworks-ai-valued-552-million-dollars-after-new-funding-round/
- FireAttention v3 Performance Metrics: https://fireworks.ai/blog/fireattention-v3
- AWS Case Study: https://aws.amazon.com/solutions/case-studies/fireworks-ai-case-study/

## Together.ai Profile

**Together.ai has established itself as a major AI inference provider by combining competitive pricing with superior technical performance, reaching a $3.3B valuation in early 2024.** Their platform supports over 200 open-source models and serves both individual developers and enterprise customers through a tiered pricing structure.

The company's technical advantage stems from their integrated inference stack, which delivers up to 400 tokens per second on Llama models. This performance translates into cost savings: their 70B-parameter models are priced at $0.88 per million tokens, below typical market rates.

Their pricing strategy segments customers into three tiers:
- Build: Pay-as-you-go with $1 free credit for developers
- Scale: Reserved GPU instances for production workloads
- Enterprise: Private deployments with custom optimization

Notable enterprise adoption includes Salesforce, Zoom, and The Washington Post, validating their platform's capabilities. Together.ai's recent $305M Series B funding demonstrates strong market confidence in their approach to democratizing AI infrastructure.

### Sources
- Together.ai Series B Announcement: https://www.together.ai/blog/together-ai-announcing-305m-series-b
- Together.ai Pricing Strategy: https://canvasbusinessmodel.com/blogs/marketing-strategy/together-ai-marketing-strategy
- Salesforce Ventures Investment: https://salesforceventures.com/perspectives/welcome-together-ai/

## Groq Profile

**Groq's Language Processing Unit (LPU) represents a radical departure from traditional GPU architectures, delivering superior inference performance at significantly lower costs.** Their proprietary tensor-streaming processor achieves 241 tokens per second for Llama 2 Chat (70B), more than double competing solutions, while maintaining exceptional energy efficiency at 1-3 joules per token.

The company's aggressive pricing strategy undercuts competitors, offering Mixtral 8x7B inference at $0.24 per million tokens compared to Fireworks' $0.50. This pricing advantage stems from lower manufacturing costs ($6,000 per 14nm wafer vs. $16,000 per 5nm wafer for NVIDIA's H100) and architectural efficiencies.
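Per-million-token prices translate directly into workload cost. This hypothetical example assumes a flat 2 billion tokens per month and applies each quoted Mixtral 8x7B price uniformly to all tokens; real price sheets often charge input and output tokens differently, so treat it as an illustration only.

```python
def workload_cost(tokens: float, usd_per_million: float) -> float:
    """Cost in USD of processing `tokens` at a per-million-token price."""
    return tokens / 1_000_000 * usd_per_million


monthly_tokens = 2_000_000_000  # hypothetical workload: 2B tokens per month
groq = workload_cost(monthly_tokens, 0.24)
fireworks = workload_cost(monthly_tokens, 0.50)

print(f"Groq: ${groq:,.2f}  Fireworks: ${fireworks:,.2f}")  # $480.00 vs $1,000.00
print(f"Groq saving: {1 - groq / fireworks:.0%}")           # 52%
```

Because both costs scale linearly with volume, the 52% relative saving holds at any token count under this flat-price assumption.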

Key competitive advantages:
- Superior inference speed: Up to 18x faster than cloud competitors
- Cost efficiency: $20,000 per LPU vs $25,000+ for NVIDIA H100
- Hardware throughput: 80 TB/s memory bandwidth with 750 TOPS at INT8

Recently valued at $2.8 billion after raising $640M, Groq has gained significant traction with over 360,000 developers on GroqCloud. While 2023 revenue was modest at $3.4M, planned deployment of 108,000 LPUs by Q1 2025 positions them for substantial growth in the expanding inference market.

### Sources
- Groq Report Analysis: https://notice-reports.s3.amazonaws.com/Groq%20Report%202024.12.23_17.58.23.pdf
- SemiAnalysis Pricing Study: https://semianalysis.com/2024/02/21/groq-inference-tokenomics-speed-but/
- Groq Funding Announcement: https://www.prnewswire.com/news-releases/groq-raises-640m-to-meet-soaring-demand-for-fast-ai-inference-302214097.html

## Comparative Performance Analysis

**Recent benchmarks reveal Groq as the current performance leader in LLM inference, with Together.ai and Fireworks competing for second position across key metrics.** Independent testing from ArtificialAnalysis.ai shows significant variations in core performance indicators:

| Provider | TTFT (seconds) | Tokens/Second | Cost (per 1M tokens) |
|----------|---------------|---------------|---------------------|
| Groq | 0.22 | 241 | $0.27 |
| Together | 0.50 | 117 | $0.88 |
| Fireworks | 0.40 | 98 | $0.90 |
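TTFT and throughput can be combined into a rough end-to-end latency estimate for a single response: latency ≈ TTFT + output tokens / tokens per second. The sketch below applies this simplified model, which assumes a constant decode rate and ignores network overhead, to the figures in the table above for a hypothetical 500-token response.

```python
# Reported figures from the benchmark table: (TTFT in seconds, tokens/second)
PROVIDERS = {
    "Groq": (0.22, 241),
    "Together": (0.50, 117),
    "Fireworks": (0.40, 98),
}


def estimated_latency(ttft_s: float, tokens_per_s: float, output_tokens: int) -> float:
    """Simplified end-to-end latency: time to first token plus decode time.

    Assumes a constant decode rate, which real services only approximate.
    """
    return ttft_s + output_tokens / tokens_per_s


for name, (ttft, tps) in PROVIDERS.items():
    print(f"{name}: {estimated_latency(ttft, tps, 500):.2f} s for 500 tokens")
# Groq: 2.29 s, Together: 4.77 s, Fireworks: 5.50 s
```

Under these assumptions TTFT matters most for short replies: Fireworks' lower TTFT would put it ahead of Together only for responses shorter than about 60 tokens, after which Together's higher throughput wins.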

Performance advantages can vary significantly based on specific workloads and model sizes. Together.ai's Inference Engine 2.0 demonstrates strong performance with smaller models, while Fireworks maintains consistent performance across their model range.

A notable limitation emerges with larger inputs: Groq's TTFT rises by 560% (to about 6.6 times its baseline) when processing 10K rather than 1K input tokens. This suggests optimal use cases may differ between providers despite headline performance metrics.

The competitive landscape remains dynamic, with providers regularly releasing optimization updates that can significantly impact these metrics.

### Sources
- Is the New Cerebras API the Fastest LLM Service Provider? (W&B report): https://wandb.ai/capecape/benchmark_llama_70b/reports/Is-the-new-Cerebras-API-the-fastest-LLM-service-provider
- Comparative Analysis of AI API Providers: https://friendli.ai/blog/comparative-analysis-ai-api-provider
- Together Inference Engine Analysis: https://www.together.ai/blog/together-inference-engine-v1

## Conclusion and Market Outlook

The AI inference market is rapidly evolving with specialized providers challenging traditional semiconductor dominance. Our analysis reveals distinct competitive advantages among emerging leaders:

| Provider | Key Strength | Performance | Pricing | Market Position |
|----------|--------------|-------------|----------|-----------------|
| Groq | Custom LPU Architecture | 241 tokens/sec | $0.24/M tokens | $2.8B valuation, disruptive hardware |
| Together.ai | Model Variety | 117 tokens/sec | $0.88/M tokens | $3.3B valuation, broad adoption |
| Fireworks | Optimization Tech | 98 tokens/sec | $0.90/M tokens | $552M valuation, developer focus |

Looking ahead, Groq's superior performance metrics and aggressive pricing position them to capture significant market share, particularly in high-throughput applications. Together.ai's extensive model support and enterprise relationships suggest continued growth in the mid-market segment, while Fireworks' optimization technology provides a strong foundation for specialized use cases. As the market expands toward $133.2B by 2034, these providers are well-positioned to challenge NVIDIA's dominance through differentiated approaches to inference delivery.

================================================
FILE: examples/pubmed.md
================================================
# Diabetic Nephropathy Treatment: Current Approaches and Future Directions

Diabetic nephropathy has emerged as the leading cause of end-stage renal disease worldwide, affecting approximately 40% of diabetes patients. The condition's progressive nature and complex pathophysiology demand early intervention through comprehensive treatment strategies. Recent advances in therapeutic options, from SGLT2 inhibitors to non-steroidal mineralocorticoid receptor antagonists, have transformed the management landscape. This report examines current treatment protocols, emerging therapies, and diagnostic approaches, with particular emphasis on the growing importance of personalized medicine and integrated care models in improving patient outcomes.

## Key Treatment Advances and Future Directions

Modern diabetic nephropathy management has evolved into a sophisticated, multi-faceted approach that combines established treatments with innovative therapies. The emergence of the four-pillar treatment strategy, incorporating RAS blockers, SGLT2 inhibitors, GLP-1 receptor agonists, and finerenone, represents a significant advancement in care standards. Technological progress in diagnostic tools, particularly multiparametric MRI and novel biomarkers, enables earlier intervention and more precise monitoring of disease progression.

Key developments driving treatment evolution:
* Integration of multiple therapeutic agents for enhanced outcomes
* Adoption of personalized medicine approaches using proteomics
* Implementation of comprehensive care models showing cost-effective results
* Advanced imaging techniques enabling non-invasive monitoring
* Emergence of novel biomarkers for earlier detection

The future of diabetic nephropathy treatment lies in closing the evidence-to-practice gap and expanding access to these advanced therapeutic options.

## Prevalence and Mechanisms of Diabetic Nephropathy

**Diabetic nephropathy has become the leading cause of end-stage renal disease worldwide, affecting approximately 40% of diabetes patients and accounting for 38% of renal disease cases in the Philippines.**

The pathogenesis involves complex interactions between metabolic and hemodynamic factors. Hyperglycemia triggers increased production of advanced glycation end-products (AGEs) and activates inflammatory pathways, while concurrent hypertension amplifies kidney damage through elevated glomerular pressure. The condition typically develops over 10-15 years as these mechanisms progressively damage the kidney's filtering system.

Key risk factors that accelerate nephropathy progression include:
* Poorly controlled blood glucose (HbA1c >7%)
* Sustained hypertension (>130/80 mmHg)
* Genetic variants in ACE and APOL1 genes
* Obesity and smoking
* Limited access to regular screening

Recent guidelines from KDIGO emphasize the importance of early detection and holistic care through multidisciplinary teams. The initial presentation typically involves microalbuminuria, which can progress to overt proteinuria and declining glomerular filtration rate without intervention. Research shows that aggressive early treatment can delay or prevent progression, particularly when addressing both glycemic control and blood pressure management.

### Sources
- Diabetic Nephropathy: StatPearls : https://pubmed.ncbi.nlm.nih.gov/30480939/
- Current status of diabetes mellitus care in the Philippines : https://pubmed.ncbi.nlm.nih.gov/38382166/
- Lifestyle Modifications in Delaying CKD Progression : https://pubmed.ncbi.nlm.nih.gov/36874334/

## Biomarkers for Early Detection of Diabetic Nephropathy

**The landscape of diabetic nephropathy detection is rapidly evolving beyond traditional microalbuminuria testing, as emerging biomarkers offer more precise and earlier disease identification.** While microalbuminuria remains the clinical standard, its limited predictive power has driven research into more sophisticated detection methods.

Recent studies have identified several promising biomarker categories that can detect kidney damage before albumin changes become apparent. These include markers of specific nephron damage sites, oxidative stress indicators, and inflammatory signals. A comprehensive 2024 review highlighted five key biomarker categories:

- Glomerular damage markers
- Tubular damage indicators
- Oxidative stress biomarkers
- Inflammatory biomarkers
- Novel molecular markers (miRNAs, proteomics, metabolomics)

A significant advancement comes from combining multiple biomarker types. For example, integrating serum creatinine with cystatin C measurements has demonstrated superior accuracy in detecting early kidney dysfunction, particularly when using newer race-free prediction equations. This multi-marker approach reflects the complex pathophysiology of diabetic kidney disease and enables more personalized intervention strategies.

### Sources
- Insights into the Novel Biomarkers Expressed in Diabetic Nephropathy (2024): https://pubmed.ncbi.nlm.nih.gov/39415582/
- Diagnostic challenges of diabetic kidney disease (2023): https://pubmed.ncbi.nlm.nih.gov/37545693/
- Urinary biomarkers for early diabetic nephropathy (2014): https://pubmed.ncbi.nlm.nih.gov/25060761/

## Treatment Protocols for Diabetic Nephropathy

**Modern diabetic nephropathy management requires a comprehensive approach combining established treatments with emerging therapeutic options to effectively slow disease progression and protect kidney function.** The foundation remains strict glycemic control (HbA1c <7%) and blood pressure management (<130/80 mmHg in patients with albuminuria).

Renin-angiotensin system (RAS) blockers, particularly ACE inhibitors and ARBs, continue as first-line treatments for their dual action on blood pressure and nephroprotection. Recent evidence supports combination therapy with newer agents for enhanced outcomes.

Key therapeutic advances include:
* SGLT2 inhibitors (dapagliflozin, empagliflozin) - slow disease progression; studies in diabetic mice also suggest they promote urinary potassium excretion and help normalize plasma potassium levels
* Non-steroidal mineralocorticoid receptor antagonists (finerenone) - decrease albuminuria and cardiovascular complications
* Lifestyle modifications - Mediterranean diet adherence and regular exercise show significant benefits
* Antioxidant interventions - target oxidative stress mechanisms

The SONAR trial demonstrated that atrasentan, an endothelin receptor antagonist, significantly decreased renal events in diabetic kidney disease patients. Regular monitoring of kidney function, albuminuria, and electrolyte levels remains essential for optimizing treatment outcomes.

### Sources
- What Not to Overlook in the Management of Patients with Type 2 Diabetes Mellitus: https://pubmed.ncbi.nlm.nih.gov/39062970/
- Lifestyle Modifications and Nutritional and Therapeutic Interventions: https://pubmed.ncbi.nlm.nih.gov/36874334/
- Diabetic Kidney Disease: https://pubmed.ncbi.nlm.nih.gov/25905328/
- Impaired distal renal potassium handling in diabetic mice: https://pubmed.ncbi.nlm.nih.gov/38779755/

## Recent Advances in Diabetic Nephropathy Treatment

**The emergence of a four-pillar treatment approach represents a paradigm shift in diabetic nephropathy management, moving beyond the traditional reliance on RAS blockade alone to include multiple complementary therapeutic agents.** This comprehensive strategy has demonstrated superior cardiorenal protection compared to single-agent approaches.

The four essential pillars of modern treatment include:

* RAS blockers (ACE inhibitors/ARBs) as foundational therapy
* SGLT2 inhibitors for reducing kidney disease progression
* GLP-1 receptor agonists for glycemic control and renoprotection
* Finerenone, a non-steroidal mineralocorticoid receptor antagonist, for additional protection

Recent clinical trials suggest that combining these therapies may provide additive benefits, though ongoing studies are still evaluating optimal combinations. The PRIORITY study exemplifies the movement toward personalized medicine, using urinary proteomics to predict treatment response and guide therapy selection.

Implementation challenges persist, with many eligible patients not receiving recommended combinations. Healthcare systems are addressing this through specialized clinics and electronic health record-based decision support tools to narrow the evidence-to-practice gap.

### Sources
- Finerenone: Do We Really Need an Additional Therapy in Type 2 Diabetes Mellitus and Kidney Disease?: https://pubmed.ncbi.nlm.nih.gov/39862018/
- Slowing the Progression of Chronic Kidney Disease in Patients with Type 2 Diabetes Using Four Pillars of Therapy: https://pubmed.ncbi.nlm.nih.gov/39259460/
- Updated evidence on cardiovascular and renal effects of GLP-1 receptor agonists: https://pubmed.ncbi.nlm.nih.gov/39548500/

## Noninvasive MRI Techniques for Diabetic Nephropathy Assessment

**Multiparametric MRI represents a breakthrough in noninvasive renal assessment, enabling detailed evaluation of kidney structure and function without radiation or contrast agents.** This technology combines multiple specialized imaging sequences to provide comprehensive insights into kidney health.

The diffusion-weighted imaging (DWI) sequence measures water molecule movement, offering early detection of interstitial fibrosis and predictive value for renal function deterioration in diabetic nephropathy. Blood oxygen level-dependent (BOLD) MRI assesses tissue oxygenation by detecting deoxyhemoglobin levels, proving particularly valuable for monitoring chronic kidney disease progression.

Key MRI sequences and their clinical applications:
- T1/T2 Relaxometry: Evaluates tissue water content and fibrosis; corticomedullary changes correlate with filtration rate
- DWI: Measures microstructural changes and fibrosis development
- BOLD: Monitors tissue oxygenation and predicts functional decline
- Arterial Spin Labeling: Assesses renal hemodynamics without contrast

While these techniques show promise for early disease detection and monitoring, further clinical trials are needed before widespread implementation. The technology's potential for personalized treatment decisions and virtual biopsy capabilities represents a significant advance in diabetic nephropathy management.

### Sources
- Multiparametric MRI: can we assess renal function differently? (2024): https://pubmed.ncbi.nlm.nih.gov/40008350/
- Noninvasive Assessment of Diabetic Kidney Disease With MRI: Hype or Hope? (2023): https://pubmed.ncbi.nlm.nih.gov/37675919/

## Integrated Care and Systemic Challenges in Diabetic Nephropathy Management

**Quality improvement collaboratives in integrated diabetes care settings can significantly improve patient outcomes while remaining cost-effective, with studies showing increased life expectancy of nearly one year for male patients and 0.76 years for female patients.** The success of such integrated approaches demonstrates the critical importance of coordinated care between specialists in managing diabetic nephropathy.

However, implementing effective integrated care faces several systemic barriers that must be addressed:

* Limited specialist availability in rural regions
* Poor communication between healthcare providers
* Insurance coverage restrictions
* Lack of standardized protocols
* Delayed specialist referrals

A notable example comes from a Netherlands study of integrated diabetes care across 37 general practices and 13 outpatient clinics. Its collaborative care model reduced cardiovascular event risk (hazard ratio: 0.83 for men, 0.98 for women) and cardiovascular mortality (hazard ratio: 0.78 for men, 0.88 for women). The program cost approximately €22 per patient initially, and lifetime costs rose by €860 for men and €645 for women, making it highly cost-effective at under €2,000 per quality-adjusted life year.

### Sources
- Cost-effectiveness of a quality improvement collaborative focusing on patients with diabetes: https://pubmed.ncbi.nlm.nih.gov/20808258/

# Diabetic Nephropathy Treatment: Current Approaches and Future Directions

Diabetic nephropathy has emerged as the leading cause of end-stage renal disease globally, affecting roughly 40% of patients with diabetes and demanding increasingly sophisticated treatment approaches. The evolution of treatment strategies from single-agent protocols to comprehensive four-pillar approaches, combined with advances in early detection and monitoring, has transformed the management landscape. This report examines current best practices, emerging therapies, and the critical role of integrated care in improving patient outcomes.

## Key Findings and Treatment Framework

Modern diabetic nephropathy management has evolved into a multi-faceted approach requiring careful coordination of therapeutic strategies. The evidence supports a structured treatment framework that combines established protocols with emerging innovations.

* Foundation Treatments
  - Glycemic control (HbA1c <7%)
  - Blood pressure management (<130/80 mmHg)
  - RAS blockers (ACE inhibitors/ARBs)
  - Lifestyle modifications

* Emerging Therapeutic and Diagnostic Advances
  - SGLT2 inhibitors to slow disease progression
  - Non-steroidal mineralocorticoid receptor antagonists (finerenone)
  - GLP-1 receptor agonists
  - Multiparametric MRI for monitoring

The path forward requires addressing implementation challenges through integrated care models while leveraging new diagnostic tools and biomarkers for earlier intervention. Success depends on bridging the evidence-to-practice gap through specialized clinics and improved coordination among healthcare providers.

================================================
FILE: langgraph.json
================================================
{
    "dockerfile_lines": [],
    "graphs": {
      "Deep Researcher": "./src/open_deep_research/deep_researcher.py:deep_researcher"
    },
    "python_version": "3.11",
    "env": "./.env",
    "dependencies": [
      "."
    ],
    "auth": {
      "path": "./src/security/auth.py:auth"
    }
}

================================================
FILE: pyproject.toml
================================================
[project]
name = "open_deep_research"
version = "0.0.16"
description = "Planning, research, and report generation."
authors = [
    { name = "Lance Martin" }
]
readme = "README.md"
license = { text = "MIT" }
requires-python = ">=3.10"
dependencies = [
    "langgraph>=0.5.4",
    "langchain-community>=0.3.9",
    "langchain-openai>=0.3.28",
    "langchain-anthropic>=0.3.15",
    "langchain-mcp-adapters>=0.1.6",
    "langchain-deepseek>=0.1.2",
    "langchain-tavily",
    "langchain-groq>=0.2.4",
    "openai>=1.99.2",
    "tavily-python>=0.5.0",
    "arxiv>=2.1.3",
    "pymupdf>=1.25.3",
    "xmltodict>=0.14.2",
    "linkup-sdk>=0.2.3",
    "duckduckgo-search>=3.0.0",
    "exa-py>=1.8.8",
    "requests>=2.32.3",
    "beautifulsoup4==4.14.3",
    "python-dotenv>=1.0.1",
    "pytest",
    "httpx>=0.24.0",
    "markdownify>=0.11.6",
    "azure-identity>=1.21.0",
    "azure-search>=1.0.0b2",
    "azure-search-documents>=11.5.2",
    "rich>=13.0.0",
    "langgraph-cli[inmem]>=0.3.1",
    "langsmith>=0.3.37",
    "langchain-google-vertexai>=2.0.25",
    "langchain-google-genai>=2.1.5",
    "ipykernel>=6.29.5",
    "supabase>=2.15.3",
    "mcp>=1.9.4",
    "langchain-aws>=0.2.28",
    "pandas>=2.3.1",
]

[project.optional-dependencies]
dev = ["mypy>=1.11.1", "ruff>=0.6.1"]

[build-system]
requires = ["setuptools>=73.0.0", "wheel"]
build-backend = "setuptools.build_meta"

[tool.setuptools]
packages = ["open_deep_research", "legacy", "tests"]

[tool.setuptools.package-dir]
"open_deep_research" = "src/open_deep_research"
"legacy" = "src/legacy"
"tests" = "tests"

[tool.setuptools.package-data]
"*" = ["py.typed"]

[tool.ruff]
lint.select = [
    "E",    # pycodestyle
    "F",    # pyflakes
    "I",    # isort
    "D",    # pydocstyle
    "D401", # First line should be in imperative mood
    "T201",
    "UP",
]
lint.ignore = [
    "UP006",
    "UP007",
    "UP035",
    "D417",
    "E501",
]

[tool.ruff.lint.per-file-ignores]
"tests/*" = ["D", "UP"]

[tool.ruff.lint.pydocstyle]
convention = "google"


================================================
FILE: src/legacy/CLAUDE.md
================================================
# Open Deep Research

## About Open Deep Research

Open Deep Research is an experimental, fully open-source research assistant that automates deep research and produces comprehensive reports on any topic. It's designed to help researchers, analysts, and curious individuals generate detailed, well-sourced reports without manual research overhead.

### Key Features
- **Automated Research**: Searches multiple sources (web, academic papers, specialized databases)
- **Comprehensive Reports**: Generates structured markdown reports with proper citations
- **Multiple Search APIs**: Supports Tavily, Perplexity, Exa, ArXiv, PubMed, DuckDuckGo, and more
- **Flexible Models**: Compatible with any LLM that supports the `init_chat_model()` API
- **Quality Evaluation**: Built-in evaluation systems to assess report quality

## Two Research Implementations

Open Deep Research offers two distinct approaches to automated research, each with unique advantages:

### 1. Graph-based Workflow Implementation

The **graph-based implementation** (`src/legacy/graph.py`) follows a structured plan-and-execute workflow:

**Characteristics:**
- **Interactive Planning**: Uses a planner model to generate a structured report outline
- **Human-in-the-Loop**: Allows review and feedback on the report plan before execution
- **Sequential Process**: Creates sections one by one with reflection between iterations
- **Quality Focus**: Emphasizes report accuracy and structure through iterative refinement

**Best for:**
- High-stakes research where accuracy is critical
- Reports requiring specific structure or customization
- Situations where you want control over the research process
- Academic or professional research contexts
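The plan-and-execute loop can be sketched in a few lines of Python. This is a minimal illustration, not the code in `graph.py`: every helper below (`plan_sections`, `generate_queries`, `search`, `write_section`, `reflect`) is a hypothetical stub standing in for planner/writer model calls and a search API. The `number_of_queries` and `max_search_depth` parameters mirror the like-named fields in `src/legacy/configuration.py`, and skipping research for the introduction and conclusion follows that file's default report structure.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    name: str
    research: bool = True  # introduction and conclusion need no research
    content: str = ""
    sources: list[str] = field(default_factory=list)


# --- Hypothetical stubs standing in for planner/writer LLM calls and a search API ---
def plan_sections(topic: str) -> list[Section]:
    return [
        Section("Introduction", research=False),
        Section(f"Background on {topic}"),
        Section(f"Current state of {topic}"),
        Section("Conclusion", research=False),
    ]


def generate_queries(section: Section, n: int) -> list[str]:
    return [f"{section.name} (query {i + 1})" for i in range(n)]


def search(query: str) -> str:
    return f"result for {query!r}"


def write_section(section: Section, results: list[str]) -> None:
    section.sources.extend(results)
    section.content = f"{section.name}: drafted from {len(section.sources)} sources"


def reflect(section: Section) -> bool:
    # Stub grader: passes once at least four sources have been gathered.
    return len(section.sources) >= 4


def run_workflow(topic: str, number_of_queries: int = 2, max_search_depth: int = 2) -> list[Section]:
    sections = plan_sections(topic)  # a human could review or edit this plan before continuing
    for section in sections:  # sequential: one section at a time
        if not section.research:
            continue
        for _ in range(max_search_depth):  # bounded search + reflection iterations
            results = [search(q) for q in generate_queries(section, number_of_queries)]
            write_section(section, results)
            if reflect(section):  # stop early once the grader is satisfied
                break
    # Non-research sections are written last, from the completed body sections.
    body_count = sum(s.research for s in sections)
    for section in sections:
        if not section.research:
            section.content = f"{section.name}: synthesized from {body_count} body sections"
    return sections


if __name__ == "__main__":
    for s in run_workflow("inference markets"):
        print(f"{s.name}: {len(s.sources)} sources")
```

With the defaults, each body section runs two iterations (two queries each) before the stub grader passes; lowering `max_search_depth` to 1 caps it at a single round.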

### 2. Multi-Agent Implementation

The **multi-agent implementation** (`src/legacy/multi_agent.py`) uses a supervisor-researcher architecture:

**Characteristics:**
- **Supervisor Agent**: Manages overall research process and assembles final report
- **Parallel Research**: Multiple researcher agents work simultaneously on different sections
- **Speed Optimized**: Significantly faster due to parallel processing
- **Tool Specialization**: Each agent has specific tools for their role

**Best for:**
- Quick research and rapid report generation
- Exploratory research where speed matters
- Situations with less need for human oversight
- Business intelligence and market research
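The speed advantage comes from a fan-out/fan-in pattern: the supervisor launches one researcher per section and waits for all of them together. The sketch below shows only that concurrency pattern with `asyncio.gather`; `research_section` and `supervisor` are hypothetical names, `asyncio.sleep` stands in for the network-bound search and model calls a real researcher agent would make, and none of LangGraph's own APIs are used.

```python
import asyncio
import time


async def research_section(topic: str, section: str, latency: float = 0.1) -> str:
    """Hypothetical researcher agent; sleeps to simulate search/LLM latency."""
    await asyncio.sleep(latency)
    return f"## {section}\n\nFindings on {section.lower()} for {topic}."


async def supervisor(topic: str, sections: list[str]) -> str:
    # Fan out: one researcher coroutine per section, all awaited concurrently.
    drafts = await asyncio.gather(*(research_section(topic, s) for s in sections))
    # Fan in: gather() returns results in input order, so the planned order is preserved.
    return f"# {topic}\n\n" + "\n\n".join(drafts)


if __name__ == "__main__":
    start = time.perf_counter()
    report = asyncio.run(supervisor("Inference market", ["Players", "Pricing", "Outlook"]))
    elapsed = time.perf_counter() - start
    print(report)
    # Roughly one latency period rather than three, because the sections overlap.
    print(f"elapsed: {elapsed:.2f}s")
```

Concurrency pays off here because each researcher spends most of its time waiting on I/O; the cost is fewer natural checkpoints for human review between steps.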

## Quality Evaluation

This guide explains how to quickly test and evaluate the quality of reports generated by Open Deep Research using the pytest evaluation system. The pytest evaluation system provides an easy way to:
- Test both research agent implementations (multi-agent and graph-based)
- Get immediate visual feedback with rich console output
- Verify report quality against 9 comprehensive criteria
- Compare different model configurations
- Track results in LangSmith for analysis

### Test Specific Agent
```bash
# Test only the multi-agent implementation (run from the repository root)
python src/legacy/tests/run_test.py --agent multi_agent

# Test only the graph-based implementation
python src/legacy/tests/run_test.py --agent graph
```

## Understanding the Output

### Console Output
The evaluation provides rich visual feedback including:

1. **Test Configuration Panel**: Shows which agent and search API are being tested
2. **Model Configuration Table**: Displays all model settings in a formatted table
3. **Report Generation Status**: Real-time feedback during report creation
4. **Generated Report Display**: Full report rendered in markdown format
5. **Evaluation Results**: 
   - **PASSED/FAILED** status in color-coded panel
   - **Report Structure Analysis**: Table showing section headers
   - **Evaluation Justification**: Detailed explanation from the evaluator

### What Gets Evaluated

The system checks reports against 9 quality criteria:

1. **Topic Relevance (Overall)**: Does the report address the input topic thoroughly?
2. **Section Relevance (Critical)**: Are all sections directly relevant to the main topic?
3. **Structure and Flow**: Do sections flow logically and create a cohesive narrative?
4. **Introduction Quality**: Does the introduction provide context and scope?
5. **Conclusion Quality**: Does the conclusion summarize key findings?
6. **Structural Elements**: Proper use of tables, lists, etc.
7. **Section Headers**: Correct Markdown formatting (# for title, ## for sections)
8. **Citations**: Proper source citation in each main body section
9. **Overall Quality**: Well-researched, accurate, and professionally written
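Some criteria, such as Section Headers (7), are mechanical enough to check without a model. `check_headers` below is a hypothetical helper, not part of the repository's test suite; it assumes ATX-style headings (up to three leading spaces, one to six `#` characters, then whitespace and text, as in CommonMark) and ignores `#` lines inside backtick-fenced code so Python comments are not mistaken for headings.

```python
import re

FENCE = "`" * 3  # opening/closing marker of a fenced code block
# ATX heading: up to 3 leading spaces, 1-6 '#' characters, whitespace, then text.
HEADING = re.compile(r"^ {0,3}(#{1,6})\s+\S")


def check_headers(report: str) -> list[str]:
    """Return problems with the report's heading structure; an empty list means it passes."""
    levels: list[int] = []
    in_fence = False
    for line in report.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            continue
        if in_fence:
            continue  # '#' inside code (e.g. Python comments) is not a heading
        match = HEADING.match(line)
        if match:
            levels.append(len(match.group(1)))

    problems: list[str] = []
    titles = levels.count(1)
    if titles != 1:
        problems.append(f"expected exactly one '#' title, found {titles}")
    elif levels[0] != 1:
        problems.append("the '#' title should be the first heading")
    if 2 not in levels:
        problems.append("no '##' section headings found")
    return problems


report = "# Title\n\nIntro.\n\n## Findings\n\n" + FENCE + "python\n# not a heading\n" + FENCE + "\n\n## Conclusion\n"
print(check_headers(report))  # []
print(check_headers("## Findings\n\nNo title."))  # ["expected exactly one '#' title, found 0"]
```

A check like this complements rather than replaces the evaluator's judgment, since it says nothing about whether sections are relevant or well sourced.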

================================================
FILE: src/legacy/__init__.py
================================================
"""Planning, research, and report generation."""

__version__ = "0.0.15"

================================================
FILE: src/legacy/configuration.py
================================================
import os
from enum import Enum
from dataclasses import dataclass, fields
from typing import Any, Optional, Dict, Literal

from langchain_core.runnables import RunnableConfig

DEFAULT_REPORT_STRUCTURE = """Use this structure to create a report on the user-provided topic:

1. Introduction (no research needed)
   - Brief overview of the topic area

2. Main Body Sections:
   - Each section should focus on a sub-topic of the user-provided topic
   
3. Conclusion
   - Aim for 1 structural element (either a list or table) that distills the main body sections 
   - Provide a concise summary of the report"""

class SearchAPI(Enum):
    PERPLEXITY = "perplexity"
    TAVILY = "tavily"
    EXA = "exa"
    ARXIV = "arxiv"
    PUBMED = "pubmed"
    LINKUP = "linkup"
    DUCKDUCKGO = "duckduckgo"
    GOOGLESEARCH = "googlesearch"
    NONE = "none"

@dataclass(kw_only=True)
class Configuration:
    """Configuration for the workflow/graph-based implementation (graph.py)."""
    # Common configuration
    report_structure: str = DEFAULT_REPORT_STRUCTURE
    search_api: SearchAPI = SearchAPI.TAVILY
    search_api_config: Optional[Dict[str, Any]] = None
    process_search_results: Literal["summarize", "split_and_rerank"] | None = None
    summarization_model_provider: str = "openai"
    summarization_model: str = "gpt-4.1"
    max_structured_output_retries: int = 3
    include_source_str: bool = False
    
    # Workflow-specific configuration
    number_of_queries: int = 2 # Number of search queries to generate per iteration
    max_search_depth: int = 2 # Maximum number of reflection + search iterations
    planner_provider: str = "anthropic"
    planner_model: str = "claude-3-7-sonnet-latest"
    planner_model_kwargs: Optional[Dict[str, Any]] = None
    writer_provider: str = "openai"
    writer_model: str = "gpt-4.1"
    writer_model_kwargs: Optional[Dict[str, Any]] = None

    @classmethod
    def from_runnable_config(
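        cls, config: Optional[RunnableConfig] = None
    ) -> "Configuration":
        """Create a Configuration instance from a RunnableConfig.

        For each field, an environment variable named after the upper-cased
        field name takes precedence over ``config["configurable"]``. Values
        are passed through unconverted (environment values arrive as strings),
        and fields with no value keep their dataclass default.
        """
        configurable = (
            config["configurable"] if config and "configurable" in config else {}
        )
        values: dict[str, Any] = {
            f.name: os.environ.get(f.name.upper(), configurable.get(f.name))
            for f in fields(cls)
            if f.init
        }
        # Drop only missing values so explicit falsy settings (0, False) are kept.
        return cls(**{k: v for k, v in values.items() if v is not None})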

@dataclass(kw_only=True)
class MultiAgentConfiguration:
    """Configuration for the multi-agent implementation (multi_agent.py)."""
    # Common configuration
    search_api: SearchAPI = SearchAPI.TAVILY
    search_api_config: Optional[Dict[str, Any]] = None
    process_search_results: Literal["summarize", "split_and_rerank"] | None = None
    summarization_model_provider: str = "openai"
    summarization_model: str = "gpt-4.1"
    include_source_str: bool = False
    
    # Multi-agent specific configuration
    number_of_queries: int = 2 # Number of search queries to generate per section
    supervisor_model: str = "anthropic:claude-sonnet-4-20250514"
    researcher_model: str = "anthropic:claude-sonnet-4-20250514"
    ask_for_clarification: bool = False # Whether to ask for clarification from the user
    # MCP server configuration
    mcp_server_config: Optional[Dict[str, Any]] = None
    mcp_prompt: Optional[str] = None
    mcp_tools_to_include: Optional[list[str]] = None

    @classmethod
    def from_runnable_config(
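        cls, config: Optional[RunnableConfig] = None
    ) -> "MultiAgentConfiguration":
        """Create a MultiAgentConfiguration instance from a RunnableConfig.

        For each field, an environment variable named after the upper-cased
        field name takes precedence over ``config["configurable"]``. Values
        are passed through unconverted (environment values arrive as strings),
        and fields with no value keep their dataclass default.
        """
        configurable = (
            config["configurable"] if config and "configurable" in config else {}
        )
        values: dict[str, Any] = {
            f.name: os.environ.get(f.name.upper(), configurable.get(f.name))
            for f in fields(cls)
            if f.init
        }
        # Drop only missing values so explicit falsy settings (0, False) are kept.
        return cls(**{k: v for k, v in values.items() if v is not None})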

# Keep the old Configuration class for backward compatibility
Configuration = Configuration


================================================
FILE: src/legacy/files/vibe_code.md
================================================
# Vibe coding MenuGen

Andrej Karpathy

Very often, I sit down at a restaurant, look through their menu, and feel... kind of stuck. What is Pâté again? What is a Tagine? Cavatappi... that's a pasta, right? Sweetbread sounds delicious (I have a huge sweet tooth). It can get really out of hand sometimes. "Confit tubers folded with matured curd and finished with a beurre noisette infusion." okay so... what is this exactly? I've spent so much of my life googling pictures of foods that when the time came to attend a recent vibe coding hackathon, I knew it was the perfect opportunity to finally build the app I always wanted, but could not find anywhere. And here it is in the flesh, I call it... 🥁🥁🥁 ... MenuGen:


MenuGen is super simple. You take a picture of a menu and it generates images for all the menu items. It visualizes the menu. Obviously it's not exactly what you will be served in that specific restaurant, but it gives you the basic idea: Some of these dishes are salads, this is a fish, this is a soup, etc. I found it so helpful in my personal use that after the hackathon (where I got the first version to work on localhost) I continued vibe coding a bit to deploy it, add authentication, payments, and generally make it real. So here it is, give it a shot the next time you go out :): menugen.app!

MenuGen is my first end-to-end vibe coded app, where I (someone who tinkers but has little to no actual web development experience) went from scratch all the way to a real product that people can sign up for, pay for, get utility out of, and where I pocket some good and honest 10% markup. It's pretty cool. But in addition to the utility of the app, MenuGen was interesting to me as an exploration of vibe coding apps and how feasible it is today. As such, I did not write any code directly; 100% of the code was written by Cursor+Claude and I basically don't really know how MenuGen works in the conventional sense that I am used to. So now that the project is "done" (as in the first version seems to work), I wanted to write up this quick post on my experience - what it looks like today for a non-webdev to vibe code a web app.

First, local version. In what is a relatively common experience in vibe coding, the very first prototype of the app running on my local machine took very little time. I took Cursor + Claude 3.7, I gave it the description of the app, and it wrote all the React frontend components very quickly, laying out a beautiful web page with smooth, multicolored fonts, little CSS animations, responsive design and all that, except for the actual backend functionality. Seeing a new website materialize so quickly is a strong hook. I felt like I was 80% done but (foreshadowing...) it was a bit closer to 20%.

OpenAI API. Around here is where some of the troubles started. I needed to call OpenAI APIs to OCR the menu items from the image. I had to get the OpenAI API keys. I had to navigate slightly convoluted menus asking me about "projects" and detailed permissions. Claude kept hallucinating deprecated APIs, model names, and input/output conventions that have all changed recently, which was confusing, but it resolved them after I copy pasted the docs back and forth for a while. Once the individual API calls were working, I immediately ran into some heavy rate limiting of the API calls, allowing me to only issue a few queries every 10 minutes.

Replicate API. Next, I needed to generate images given the descriptions. I signed up for a new Replicate API key and ran into similar issues relatively quickly. My queries didn't work because the LLM's knowledge was outdated, but in addition, this time even the official docs were a little bit out of date due to recent changes in the API, which now doesn't return the JSON directly but instead some kind of a Streaming object that neither Claude nor I understood. I then faced rate limiting on the API so it was difficult to debug the app. I was told later that these are common protection measures by these services to mitigate fraud, but they also make it harder to get started with new, legitimate accounts. I'm told Replicate is moving to a different approach where you pre-purchase credits, which might help going forward.

Vercel deploy. At this point at least, the app was working locally so I was quite happy. It was time to deploy the basic first version. Sign up for Vercel, add project, configure it, point it at my GitHub repo, push to master, watch a new Deployment build and... ERROR. The logs showed some linting errors due to unused variables and other basic things like that, but it was hard to understand or debug because everything worked fine on local and only broke on Vercel build, so I debugged the issues by pushing fake debugging commits to master to force redeploys. Once I fixed these issues, the site still refused to work. I asked Claude. I asked ChatGPT. I consulted docs. I googled around. 1 hour later I finally realized my silly mistake - My .env.local file stored the API keys to OpenAI and Replicate, but this file is (correctly!) part of .gitignore and doesn't get pushed to git, so you have to manually navigate to Vercel project settings, find the right place, and add your environment keys manually. I kind of understood the issue relatively quickly, but I could see an aspiring vibe coder get stuck on this for a while. Once the deployment finally succeeded, Vercel happily offered a URL. This surprised me again because my project was a private git repo that was not ready to see the light of day. I didn't realize that Vercel will take your !private! repo of an unfinished project and auto-deploy it on a totally public and easy to guess url just like that, hah.

Clerk authentication. Claude suggested that we use Clerk for authentication, so I went along with it. Signed up for Clerk, configured the project, got my API keys. At this point Claude hallucinated about 1000 lines of code that appeared to be deprecated Clerk APIs. I had to copy paste a lot of the docs back and forth to get things gradually unstuck. Up to this point, Clerk was running in a "Development" deployment. To move to a "Production" deployment, there were more hoops to jump through. Clerk demands that you host your app on a custom domain that you own. menugen.vercel.app will not work. So I had to purchase the domain name menugen.app. Then I had to wire the domain to my Vercel project. Then I had to change the DNS records. Then I had to pick an OAuth provider (I went with Google). Setting that up was its own configuration adventure. I had to enable an "SSO connection". I had to go over to Google Cloud Console and create a new project, and add a new OAuth Credential. I had to wait some time for an approval process around here. I then had to go back and forth between the nested settings of all of Vercel, Clerk and Google for a while to wire it up properly. I thought of quitting the project around here, but I felt better when I woke up the next morning.

Stripe payments. Next I wanted to add payments so that people can purchase credits. This means another website, another account, more docs, more keys. I select "Next.js" as the backend, copy paste the very first snippet of code from the "getting started" docs into my app and... ERROR. I realized later that Stripe gives you JavaScript code when you select Next.js, but my app is built in TypeScript, so every time I pasted a snippet of code it made Cursor unhappy with linter errors, but Claude patched things up ok over time after I told it to "fix errors" a few times and after I threatened to switch to ChatGPT. Then back in the Stripe dashboard we create a Product, we create a Price, we find the price key (not the product key!), copy paste all the keys around. Around here, I caught Claude using a really bad approach to match up a successful Stripe payment to user credits (it tried to match up the email addresses, but the email the user might give in the Stripe checkout may not be the email of the Google account they signed up with, so the user might not actually get the credits that they purchased). I point this out to Claude and it immediately apologizes and rewrites it correctly by passing around unique user ids in the request metadata. It thanks me for pointing out the issue and tells me that it will do it correctly in the future, which I know is just gaslighting. But since our quick test works, only a few more clicks to upgrade the deployment from Development to Production, now re-do a new Product, re-do a new Price, re-copy paste all the keys and ids, locally and in the Vercel settings... and then it worked :)

Database? Work queues? So far, all of the processing is done "in the moment" - it's just requests and results right there and then, nothing is cached or saved. So the results are ephemeral and if the response takes too long (e.g. because the menu is too long and has too many items, or because the APIs show too much latency), the request can time out and break. If you refresh the page, everything is gone too. The correct way to do this is to have a database where we register and keep track of work, and the client just displays the latest state as it's ready. I realized I'd have to connect a database from the Marketplace, something like Supabase PostgreSQL (even though Claude pitched me on using Vercel KV, which I know is actually deprecated). And then we'd also need some queue service like Upstash or so to run the actual processing. It would mean more services. More logins. More API keys. More configurations. More docs. More suffering. It was too much to bear. Leave as future work.

TLDR. Vibe coding menugen was an exhilarating and fun escapade as a local demo, but a bit of a painful slog as a deployed, real app. Building a modern app is a bit like assembling IKEA furniture. There are all these services, docs, API keys, configurations, dev/prod deployments, team and security features, rate limits, pricing tiers... Meanwhile the LLMs have slightly outdated knowledge of everything, they make subtle but critical design mistakes when you watch them closely, and sometimes they hallucinate or gaslight you about solutions. But the most interesting part to me was that I didn't even spend all that much time in the code editor itself. I spent most of it in the browser, moving between tabs and settings and configuring and gluing a monster. All of this work and state is not even accessible or manipulatable by an LLM - how are we supposed to be automating society by 2027 like this?

Going forward. As an exploration of what it's like to vibe code an app today if you have little to no web dev background, I'm left with an equal mix of amazement (it's actually possible and much easier/faster than what was possible before!) and a bit of frustration of what could be. Part of the pain of course is that none of this infrastructure was really designed to be used like this. The intended target audience are teams of professional web developers living in a pre-LLM world. Not vibe coding solo devs prototyping apps. Some thoughts on solutions that could make super simple apps like MenuGen a lot easier to create:

- Some app development platform could come with all the batteries included. Something that looks like the opposite of Vercel Marketplace. Something opinionated, concrete, preconfigured with all the basics that everyone wants: domain, hosting, authentication, payments, database, server functions. If some service made these easy and "just work" out of the box, it could be amazing.
- All of these services could become more LLM friendly. Everything you tell the user will be basically right away copy pasted to an LLM, so you might as well talk directly to the LLM. Your service could have a CLI tool. The backend could be configured with curl commands. The docs could be Markdown. All of these are ergonomically a lot friendlier surfaces and abstractions for an LLM. Don't talk to a developer. Don't ask a developer to visit, look, or click. Instruct and empower their LLM.
- For my next app I'm considering rolling with basic HTML/CSS/JS + Python backend (FastAPI + Fly.io style or so?), something a lot simpler than the serverless multiverse of "modern web development". It's possible that a simple app like MenuGen (or apps like it) could have been significantly easier in that paradigm.
- Finally, it's quite likely that MenuGen shouldn't be a full-featured app at all. The "app" is simply one call to GPT to OCR a menu, and then a for loop over results to generate the images for each item and present them nicely to the user. This almost sounds like a simple custom GPT (in the terminology of the original GPT "app store" that OpenAI released earlier). Could MenuGen be just a prompt? Could the LLM respond not with text but with a simple webpage to present the results, along the lines of Artifacts? Could many other apps look like this too? Could I publish it as an app on a store and earn markup in the same way?

For now, I'm pretty happy to have vibe coded my first super custom app through the finish line of something that is real, solves a need I've had for a long time, and is shareable with friends. Thank you to all the services above that I've used to build it. In principle, it could earn some $ if others like it too, in a completely passive way - the @levelsio dream. Ultimately, vibe coding full web apps today is kind of messy and not a good idea for anything of actual importance. But there are clear hints of greatness and I think the industry just needs a bit of time to adapt to the new world of LLMs. I'm personally quite excited to see the barrier to app creation drop to ~zero, where anyone could build and publish an app just as easily as they can make a TikTok. These kinds of hyper-custom automations could become a beautiful new canvas for human creativity.

================================================
FILE: src/legacy/graph.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Research Workflow\n",
    "\n",
    "This notebook demonstrates the research [workflow](https://langchain-ai.github.io/langgraph/tutorials/workflows/) that creates comprehensive reports through a series of focused steps. The system:\n",
    "\n",
    "1. Uses a **graph workflow** with specialized nodes for each report creation stage\n",
    "2. Enables user **feedback and approval** at critical planning points \n",
    "3. Produces a well-structured report with introduction, researched body sections, and conclusion\n",
    "\n",
    "## From repo "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "/Users/rlm/Desktop/Code/open_deep_research/src\n"
     ]
    }
   ],
   "source": [
    "%cd ..\n",
    "%load_ext autoreload\n",
    "%autoreload 2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## From package "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m25.0.1\u001b[0m\n",
      "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n"
     ]
    }
   ],
   "source": [
    "! pip install -U -q open-deep-research"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Compile the Graph-Based Research Workflow\n",
    "\n",
    "The next step is to compile the LangGraph workflow that orchestrates the report creation process. This defines the sequence of operations and decision points in the research pipeline."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import required modules and initialize the builder from open_deep_research\n",
    "import uuid \n",
    "import os, getpass\n",
    "import open_deep_research   \n",
    "print(open_deep_research.__version__) \n",
    "from IPython.display import Image, display, Markdown\n",
    "from langgraph.types import Command\n",
    "from langgraph.checkpoint.memory import MemorySaver\n",
    "from open_deep_research.graph import builder"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a memory-based checkpointer and compile the graph\n",
    "# This enables state persistence and tracking throughout the workflow execution\n",
    "\n",
    "memory = MemorySaver()\n",
    "graph = builder.compile(checkpointer=memory)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Visualize the graph structure\n",
    "# This shows the nodes and edges in the research workflow\n",
    "\n",
    "display(Image(graph.get_graph(xray=1).draw_mermaid_png()))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Helper function to set environment variables for API keys\n",
    "# This ensures all necessary credentials are available for various services\n",
    "\n",
    "def _set_env(var: str):\n",
    "    if not os.environ.get(var):\n",
    "        os.environ[var] = getpass.getpass(f\"{var}: \")\n",
    "\n",
    "# Set the API keys used for any model or search tool selections below, such as:\n",
    "_set_env(\"OPENAI_API_KEY\")\n",
    "_set_env(\"ANTHROPIC_API_KEY\")\n",
    "_set_env(\"TAVILY_API_KEY\")\n",
    "_set_env(\"GROQ_API_KEY\")\n",
    "_set_env(\"PERPLEXITY_API_KEY\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define report structure template and configure the research workflow\n",
    "# This sets parameters for models, search tools, and report organization\n",
    "\n",
    "REPORT_STRUCTURE = \"\"\"Use this structure to create a report on the user-provided topic:\n",
    "\n",
    "1. Introduction (no research needed)\n",
    "   - Brief overview of the topic area\n",
    "\n",
    "2. Main Body Sections:\n",
    "   - Each section should focus on a sub-topic of the user-provided topic\n",
    "   \n",
    "3. Conclusion\n",
    "   - Aim for 1 structural element (either a list of table) that distills the main body sections \n",
    "   - Provide a concise summary of the report\"\"\"\n",
    "\n",
    "# Configuration option 1: Claude 3.7 Sonnet for planning with perplexity search\n",
    "thread = {\"configurable\": {\"thread_id\": str(uuid.uuid4()),\n",
    "                           \"search_api\": \"perplexity\",\n",
    "                           \"planner_provider\": \"anthropic\",\n",
    "                           \"planner_model\": \"claude-3-7-sonnet-latest\",\n",
    "                           # \"planner_model_kwargs\": {\"temperature\":0.8}, # if set custom parameters\n",
    "                           \"writer_provider\": \"anthropic\",\n",
    "                           \"writer_model\": \"claude-3-5-sonnet-latest\",\n",
    "                           # \"writer_model_kwargs\": {\"temperature\":0.8}, # if set custom parameters\n",
    "                           \"max_search_depth\": 2,\n",
    "                           \"report_structure\": REPORT_STRUCTURE,\n",
    "                           }}\n",
    "\n",
    "# Configuration option 2: DeepSeek-R1-Distill-Llama-70B for planning and llama-3.3-70b-versatile for writing\n",
    "thread = {\"configurable\": {\"thread_id\": str(uuid.uuid4()),\n",
    "                           \"search_api\": \"tavily\",\n",
    "                           \"planner_provider\": \"groq\",\n",
    "                           \"planner_model\": \"deepseek-r1-distill-llama-70b\",\n",
    "                           \"writer_provider\": \"groq\",\n",
    "                           \"writer_model\": \"llama-3.3-70b-versatile\",\n",
    "                           \"report_structure\": REPORT_STRUCTURE,\n",
    "                           \"max_search_depth\": 1,}\n",
    "                           }\n",
    "\n",
    "# Configuration option 3: Use OpenAI o3 for both planning and writing (selected option)\n",
    "thread = {\"configurable\": {\"thread_id\": str(uuid.uuid4()),\n",
    "                           \"search_api\": \"tavily\",\n",
    "                           \"planner_provider\": \"openai\",\n",
    "                           \"planner_model\": \"o3\",\n",
    "                           \"writer_provider\": \"openai\",\n",
    "                           \"writer_model\": \"o3\",\n",
    "                           \"max_search_depth\": 2,\n",
    "                           \"report_structure\": REPORT_STRUCTURE,\n",
    "                           }}\n",
    "\n",
    "# Define research topic about Model Context Protocol\n",
    "topic = \"Overview of Model Context Protocol (MCP), an Anthropic‑backed open standard for integrating external context and tools with LLMs. Give an architectural overview for developers, tell me about interesting MCP servers, and compare to google Agent2Agent (A2A) protocol.\"\n",
    "\n",
    "# Run the graph workflow until first interruption (waiting for user feedback)\n",
    "async for event in graph.astream({\"topic\":topic,}, thread, stream_mode=\"updates\"):\n",
    "    if '__interrupt__' in event:\n",
    "        interrupt_value = event['__interrupt__'][0].value\n",
    "        display(Markdown(interrupt_value))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# User Feedback Phase\n",
    "\n",
    "* This allows for providing directed feedback on the initial report plan\n",
    "* The user can review the proposed report structure and provide specific guidance\n",
    "* The system will incorporate this feedback into the final report plan"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Submit feedback on the report plan\n",
    "# The system will continue execution with the updated requirements\n",
    "\n",
    "# Provide specific feedback to focus and refine the report structure\n",
    "async for event in graph.astream(Command(resume=\"Looks great! Just do one section related to Agent2Agent (A2A) protocol, introducing it and comparing to MCP.\"), thread, stream_mode=\"updates\"):\n",
    "    if '__interrupt__' in event:\n",
    "        interrupt_value = event['__interrupt__'][0].value\n",
    "        display(Markdown(interrupt_value))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Final Approval Phase\n",
    "* After incorporating feedback, approve the plan to start content generation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Approve the final plan and execute the report generation\n",
    "# This triggers the research and writing phases for all sections\n",
    "\n",
    "# The system will now:\n",
    "# 1. Research each section topic\n",
    "# 2. Generate content with citations\n",
    "# 3. Create introduction and conclusion\n",
    "# 4. Compile the final report\n",
    "\n",
    "async for event in graph.astream(Command(resume=True), thread, stream_mode=\"updates\"):\n",
    "    print(event)\n",
    "    print(\"\\n\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "# Introduction  \n",
       "Large language models excel at reasoning, but without structured access to the outside world they remain isolated. The Model Context Protocol (MCP) bridges this gap, defining an open, vendor‑neutral way for models to tap files, databases, APIs, and other tools through simple JSON‑RPC exchanges. This report walks developers through the protocol’s architecture, surveys real‑world MCP servers that showcase its flexibility, and contrasts MCP with Google’s emerging Agent‑to‑Agent (A2A) standard. By the end, you should know when, why, and how to weave MCP into your own agentic systems.\n",
       "\n",
       "## MCP Architectural Overview for Developers\n",
       "\n",
       "MCP uses a client‑host‑server model: a host process spawns isolated clients, and every client keeps a 1‑to‑1, stateful session with a single server that exposes prompts, resources, and tools through JSON‑RPC 2.0 messages [1][5].  \n",
       "\n",
       "A session passes through three phases — initialize, operation, shutdown. The client begins with an initialize request that lists its protocolVersion and capabilities; the server replies with a compatible version and its own capabilities. After the client’s initialized notification, both sides may exchange requests, responses, or one‑way notifications under the agreed capabilities [2].  \n",
       "\n",
       "Two official transports exist. Stdio is ideal for local child processes, while HTTP (SSE/“streamable HTTP”) supports multi‑client, remote scenarios. Both must preserve JSON‑RPC framing, and servers should validate Origin headers, bind to localhost where possible, and apply TLS or authentication to block DNS‑rebind or similar attacks [1][3].  \n",
       "\n",
       "To integrate MCP, developers can:  \n",
       "1) implement a server that registers needed primitives and advertises them in initialize.result.capabilities;  \n",
       "2) validate all inputs and set reasonable timeouts;  \n",
       "3) or consume existing servers via SDKs—select a transport, send initialize, then invoke or subscribe to tools/resources exactly as negotiated [4][5].  \n",
       "\n",
       "### Sources  \n",
       "[1] MCP Protocol Specification: https://www.claudemcp.com/specification  \n",
       "[2] Lifecycle – Model Context Protocol: https://modelcontextprotocol.info/specification/draft/basic/lifecycle/  \n",
       "[3] Transports – Model Context Protocol: https://modelcontextprotocol.io/specification/2025-03-26/basic/transports  \n",
       "[4] Core Architecture – Model Context Protocol: https://modelcontextprotocol.io/docs/concepts/architecture  \n",
       "[5] Architecture – Model Context Protocol Specification: https://spec.modelcontextprotocol.io/specification/2025-03-26/architecture/\n",
       "\n",
       "## Ecosystem Spotlight: Notable MCP Servers\n",
       "\n",
       "Hundreds of MCP servers now exist, spanning core data access, commercial platforms, and hobby projects—proof that the protocol can wrap almost any tool or API [1][2].\n",
       "\n",
       "Reference servers maintained by Anthropic demonstrate the basics.  Filesystem, PostgreSQL, Git, and Slack servers cover file I/O, SQL queries, repository ops, and chat workflows.  Developers can launch them in seconds with commands like  \n",
       "`npx -y @modelcontextprotocol/server-filesystem` (TypeScript) or `uvx mcp-server-git` (Python) and then point any MCP‑aware client, such as Claude Desktop, at the spawned process [1].\n",
       "\n",
       "Platform vendors are adding “first‑party” connectors.  Microsoft cites the GitHub MCP Server and a Playwright browser‑automation server as popular examples that let C# or .NET apps drive code reviews or end‑to‑end tests through a uniform interface [3].  Other partner servers—e.g., Cloudflare for edge resources or Stripe for payments—expose full product APIs while still enforcing user approval through MCP’s tool‑calling flow [2].\n",
       "\n",
       "Community builders rapidly fill remaining gaps.  Docker and Kubernetes servers give agents controlled shell access; Snowflake, Neon, and Qdrant handle cloud databases; Todoist and Obsidian servers tackle personal productivity.  Because every server follows the same JSON‑RPC schema and ships as a small CLI, developers can fork an existing TypeScript or Python implementation and swap in their own SDK calls to create new connectors in hours, not weeks [2].  \n",
       "\n",
       "### Sources  \n",
       "[1] Example Servers – Model Context Protocol: https://modelcontextprotocol.io/examples  \n",
       "[2] Model Context Protocol Servers Repository: https://github.com/madhukarkumar/anthropic-mcp-servers  \n",
       "[3] Microsoft partners with Anthropic to create official C# SDK for Model Context Protocol: https://devblogs.microsoft.com/blog/microsoft-partners-with-anthropic-to-create-official-c-sdk-for-model-context-protocol\n",
       "\n",
       "## Agent‑to‑Agent (A2A) Protocol and Comparison with MCP  \n",
       "\n",
       "Google’s Agent‑to‑Agent (A2A) protocol, announced in April 2025, gives autonomous agents a common way to talk directly across vendors and clouds [2]. Its goal is to let one “client” agent delegate work to a “remote” agent without sharing internal code or memory, enabling true multi‑agent systems.  \n",
       "\n",
       "Discovery starts with a JSON Agent Card served at /.well‑known/agent.json, which lists version, skills and endpoints [3]. After discovery, the client opens a Task—an atomic unit that moves through states and exchanges Messages and multimodal Artifacts. HTTP request/response, Server‑Sent Events, or push notifications are chosen based on task length to stream progress safely [2].  \n",
       "\n",
       "Anthropic’s Model Context Protocol (MCP) tackles a different layer: it links a single language model to external tools and data through a Host‑Client‑Server triad, exposing Resources, Tools and Prompts over JSON‑RPC [1]. Communication is model‑to‑tool, not agent‑to‑agent.  \n",
       "\n",
       "Google therefore calls A2A “complementary” to MCP: use MCP to give each agent the data and actions it needs; use A2A to let those empowered agents discover one another, coordinate plans and exchange results [1]. In practice, developers might pipe an A2A task that, mid‑flow, invokes an MCP tool or serve an MCP connector as an A2A remote agent, showing the standards can interlock instead of compete.  \n",
       "\n",
       "### Sources  \n",
       "[1] MCP vs A2A: Comprehensive Comparison of AI Agent Protocols: https://www.toolworthy.ai/blog/mcp-vs-a2a-protocol-comparison  \n",
       "[2] Google A2A vs MCP: The New Protocol Standard Developers Need to Know: https://www.trickle.so/blog/google-a2a-vs-mcp  \n",
       "[3] A2A vs MCP: Comparing AI Standards for Agent Interoperability: https://www.ikangai.com/a2a-vs-mcp-ai-standards/\n",
       "\n",
       "## Conclusion\n",
       "\n",
       "Model Context Protocol (MCP) secures a model’s immediate tool belt, while Google’s Agent‑to‑Agent (A2A) protocol enables those empowered agents to find and hire one another. Their scopes differ but interlock, giving developers a layered recipe for robust, multi‑agent applications.\n",
       "\n",
       "| Aspect | MCP | A2A |\n",
       "| --- | --- | --- |\n",
       "| Layer | Model‑to‑tool RPC | Agent‑to‑agent orchestration |\n",
       "| Session start | `initialize` handshake | Task creation lifecycle |\n",
       "| Discovery | Client‑supplied server URI | `/.well‑known/agent.json` card |\n",
       "| Streaming | Stdio or HTTP/SSE | HTTP, SSE, or push |\n",
       "| Best fit | Embed filesystems, DBs, SaaS APIs into one agent | Delegate subtasks across clouds or vendors |\n",
       "\n",
       "Next steps: prototype an A2A task that internally calls an MCP PostgreSQL server; harden both layers with TLS and capability scoping; finally, contribute a new open‑source MCP connector to accelerate community adoption."
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Display the final generated report\n",
    "# Retrieve the completed report from the graph's state and format it for display\n",
    "\n",
    "final_state = graph.get_state(thread)\n",
    "report = final_state.values.get('final_report')\n",
    "Markdown(report)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Trace: \n",
    "\n",
    "> Note: uses 80k tokens \n",
    "\n",
    "https://smith.langchain.com/public/31eca7c9-beae-42a3-bef4-5bce9488d7be/r"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "open-deep-research-env",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}


================================================
FILE: src/legacy/graph.py
================================================
from typing import Literal

from langchain.chat_models import init_chat_model
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_core.runnables import RunnableConfig

from langgraph.constants import Send
from langgraph.graph import START, END, StateGraph
from langgraph.types import interrupt, Command

from legacy.state import (
    ReportStateInput,
    ReportStateOutput,
    Sections,
    ReportState,
    SectionState,
    SectionOutputState,
    Queries,
    Feedback
)

from legacy.prompts import (
    report_planner_query_writer_instructions,
    report_planner_instructions,
    query_writer_instructions, 
    section_writer_instructions,
    final_section_writer_instructions,
    section_grader_instructions,
    section_writer_inputs
)

from legacy.configuration import Configuration
from legacy.utils import (
    format_sections, 
    get_config_value, 
    get_search_params, 
    select_and_execute_search,
    get_today_str
)

## Nodes -- 

async def generate_report_plan(state: ReportState, config: RunnableConfig):
    """Generate the initial report plan with sections.
    
    This node:
    1. Gets configuration for the report structure and search parameters
    2. Generates search queries to gather context for planning
    3. Performs web searches using those queries
    4. Uses an LLM to generate a structured plan with sections
    
    Args:
        state: Current graph state containing the report topic
        config: Configuration for models, search APIs, etc.
        
    Returns:
        Dict containing the generated sections
    """

    # Inputs
    topic = state["topic"]

    # Get list of feedback on the report plan
    feedback_list = state.get("feedback_on_report_plan", [])

    # Concatenate feedback on the report plan into a single string
    feedback = " /// ".join(feedback_list) if feedback_list else ""

    # Get configuration
    configurable = Configuration.from_runnable_config(config)
    report_structure = configurable.report_structure
    number_of_queries = configurable.number_of_queries
    search_api = get_config_value(configurable.search_api)
    search_api_config = configurable.search_api_config or {}  # Get the config dict, default to empty
    params_to_pass = get_search_params(search_api, search_api_config)  # Filter parameters

    # Convert JSON object to string if necessary
    if isinstance(report_structure, dict):
        report_structure = str(report_structure)

    # Set writer model (model used for query writing)
    writer_provider = get_config_value(configurable.writer_provider)
    writer_model_name = get_config_value(configurable.writer_model)
    writer_model_kwargs = get_config_value(configurable.writer_model_kwargs or {})
    writer_model = init_chat_model(model=writer_model_name, model_provider=writer_provider, model_kwargs=writer_model_kwargs) 
    structured_llm = writer_model.with_structured_output(Queries)

    # Format system instructions
    system_instructions_query = report_planner_query_writer_instructions.format(
        topic=topic,
        report_organization=report_structure,
        number_of_queries=number_of_queries,
        today=get_today_str()
    )

    # Generate queries  
    results = await structured_llm.ainvoke([SystemMessage(content=system_instructions_query),
                                     HumanMessage(content="Generate search queries that will help with planning the sections of the report.")])

    # Web search
    query_list = [query.search_query for query in results.queries]

    # Search the web with parameters
    source_str = await select_and_execute_search(search_api, query_list, params_to_pass)

    # Format system instructions
    system_instructions_sections = report_planner_instructions.format(topic=topic, report_organization=report_structure, context=source_str, feedback=feedback)

    # Set the planner
    planner_provider = get_config_value(configurable.planner_provider)
    planner_model = get_config_value(configurable.planner_model)
    planner_model_kwargs = get_config_value(configurable.planner_model_kwargs or {})

    # Report planner instructions
    planner_message = """Generate the sections of the report. Your response must include a 'sections' field containing a list of sections. 
                        Each section must have: name, description, research, and content fields."""

    # Run the planner
    if planner_model == "claude-3-7-sonnet-latest":
        # Allocate a thinking budget for claude-3-7-sonnet-latest as the planner model
        planner_llm = init_chat_model(model=planner_model, 
                                      model_provider=planner_provider, 
                                      max_tokens=20_000, 
                                      thinking={"type": "enabled", "budget_tokens": 16_000})

    else:
        # With other models, thinking tokens are not specifically allocated
        planner_llm = init_chat_model(model=planner_model, 
                                      model_provider=planner_provider,
                                      model_kwargs=planner_model_kwargs)
    
    # Generate the report sections
    structured_llm = planner_llm.with_structured_output(Sections)
    report_sections = await structured_llm.ainvoke([SystemMessage(content=system_instructions_sections),
                                             HumanMessage(content=planner_message)])

    # Get sections
    sections = report_sections.sections

    return {"sections": sections}

def human_feedback(state: ReportState, config: RunnableConfig) -> Command[Literal["generate_report_plan","build_section_with_web_research"]]:
    """Get human feedback on the report plan and route to next steps.
    
    This node:
    1. Formats the current report plan for human review
    2. Gets feedback via an interrupt
    3. Routes to either:
       - Section writing if plan is approved
       - Plan regeneration if feedback is provided
    
    Args:
        state: Current graph state with sections to review
        config: Configuration for the workflow
        
    Returns:
        Command to either regenerate plan or start section writing
    """

    # Get the topic and the planned sections
    topic = state["topic"]
    sections = state['sections']
    sections_str = "\n\n".join(
        f"Section: {section.name}\n"
        f"Description: {section.description}\n"
        f"Research needed: {'Yes' if section.research else 'No'}\n"
        for section in sections
    )

    # Get feedback on the report plan from interrupt
    interrupt_message = f"""Please provide feedback on the following report plan. 
                        \n\n{sections_str}\n
                        \nDoes the report plan meet your needs?\nPass 'true' to approve the report plan.\nOr, provide feedback to regenerate the report plan:"""
    
    feedback = interrupt(interrupt_message)

    # If the user approves the report plan, kick off section writing
    if isinstance(feedback, bool) and feedback is True:
        # Treat this as approve and kick off section writing
        return Command(goto=[
            Send("build_section_with_web_research", {"topic": topic, "section": s, "search_iterations": 0}) 
            for s in sections 
            if s.research
        ])
    
    # If the user provides feedback, regenerate the report plan 
    elif isinstance(feedback, str):
        # Treat this as feedback and append it to the existing list
        return Command(goto="generate_report_plan", 
                       update={"feedback_on_report_plan": [feedback]})
    else:
        raise TypeError(f"Interrupt value of type {type(feedback)} is not supported.")
    
async def generate_queries(state: SectionState, config: RunnableConfig):
    """Generate search queries for researching a specific section.
    
    This node uses an LLM to generate targeted search queries based on the 
    section topic and description.
    
    Args:
        state: Current state containing section details
        config: Configuration including number of queries to generate
        
    Returns:
        Dict containing the generated search queries
    """

    # Get state 
    topic = state["topic"]
    section = state["section"]

    # Get configuration
    configurable = Configuration.from_runnable_config(config)
    number_of_queries = configurable.number_of_queries

    # Initialize the writer model with structured output for query generation
    writer_provider = get_config_value(configurable.writer_provider)
    writer_model_name = get_config_value(configurable.writer_model)
    writer_model_kwargs = get_config_value(configurable.writer_model_kwargs or {})
    writer_model = init_chat_model(model=writer_model_name, model_provider=writer_provider, model_kwargs=writer_model_kwargs) 
    structured_llm = writer_model.with_structured_output(Queries)

    # Format system instructions
    system_instructions = query_writer_instructions.format(topic=topic, 
                                                           section_topic=section.description, 
                                                           number_of_queries=number_of_queries,
                                                           today=get_today_str())

    # Generate queries  
    queries = await structured_llm.ainvoke([SystemMessage(content=system_instructions),
                                     HumanMessage(content="Generate search queries on the provided topic.")])

    return {"search_queries": queries.queries}

async def search_web(state: SectionState, config: RunnableConfig):
    """Execute web searches for the section queries.
    
    This node:
    1. Takes the generated queries
    2. Executes searches using configured search API
    3. Formats results into usable context
    
    Args:
        state: Current state with search queries
        config: Search API configuration
        
    Returns:
        Dict with search results and updated iteration count
    """

    # Get state
    search_queries = state["search_queries"]

    # Get configuration
    configurable = Configuration.from_runnable_config(config)
    search_api = get_config_value(configurable.search_api)
    search_api_config = configurable.search_api_config or {}  # Get the config dict, default to empty
    params_to_pass = get_search_params(search_api, search_api_config)  # Filter parameters

    # Web search
    query_list = [query.search_query for query in search_queries]

    # Search the web with parameters
    source_str = await select_and_execute_search(search_api, query_list, params_to_pass)

    return {"source_str": source_str, "search_iterations": state["search_iterations"] + 1}

async def write_section(state: SectionState, config: RunnableConfig) -> Command[Literal[END, "search_web"]]:
    """Write a section of the report and evaluate if more research is needed.
    
    This node:
    1. Writes section content using search results
    2. Evaluates the quality of the section
    3. Either:
       - Completes the section if quality passes
       - Triggers more research if quality fails
    
    Args:
        state: Current state with search results and section info
        config: Configuration for writing and evaluation
        
    Returns:
        Command to either complete section or do more research
    """

    # Get state 
    topic = state["topic"]
    section = state["section"]
    source_str = state["source_str"]

    # Get configuration
    configurable = Configuration.from_runnable_config(config)

    # Format the section writer inputs (sent as the human message)
    section_writer_inputs_formatted = section_writer_inputs.format(topic=topic, 
                                                             section_name=section.name, 
                                                             section_topic=section.description, 
                                                             context=source_str, 
                                                             section_content=section.content)

    # Generate section  
    writer_provider = get_config_value(configurable.writer_provider)
    writer_model_name = get_config_value(configurable.writer_model)
    writer_model_kwargs = get_config_value(configurable.writer_model_kwargs or {})
    writer_model = init_chat_model(model=writer_model_name, model_provider=writer_provider, model_kwargs=writer_model_kwargs) 

    section_content = await writer_model.ainvoke([SystemMessage(content=section_writer_instructions),
                                           HumanMessage(content=section_writer_inputs_formatted)])
    
    # Write content to the section object  
    section.content = section_content.content

    # Grade prompt 
    section_grader_message = ("Grade the report and consider follow-up questions for missing information. "
                              "If the grade is 'pass', return empty strings for all follow-up queries. "
                              "If the grade is 'fail', provide specific search queries to gather missing information.")
    
    section_grader_instructions_formatted = section_grader_instructions.format(topic=topic, 
                                                                               section_topic=section.description,
                                                                               section=section.content, 
                                                                               number_of_follow_up_queries=configurable.number_of_queries)

    # Use planner model for reflection
    planner_provider = get_config_value(configurable.planner_provider)
    planner_model = get_config_value(configurable.planner_model)
    planner_model_kwargs = get_config_value(configurable.planner_model_kwargs or {})

    if planner_model == "claude-3-7-sonnet-latest":
        # Allocate a thinking budget for claude-3-7-sonnet-latest as the planner model
        reflection_model = init_chat_model(model=planner_model, 
                                           model_provider=planner_provider, 
                                           max_tokens=20_000, 
                                           thinking={"type": "enabled", "budget_tokens": 16_000}).with_structured_output(Feedback)
    else:
        reflection_model = init_chat_model(model=planner_model, 
                                           model_provider=planner_provider, model_kwargs=planner_model_kwargs).with_structured_output(Feedback)
    # Generate feedback
    feedback = await reflection_model.ainvoke([SystemMessage(content=section_grader_instructions_formatted),
                                        HumanMessage(content=section_grader_message)])

    # If the section is passing or the max search depth is reached, publish the section to completed sections 
    if feedback.grade == "pass" or state["search_iterations"] >= configurable.max_search_depth:
        # Publish the section to completed sections 
        update = {"completed_sections": [section]}
        if configurable.include_source_str:
            update["source_str"] = source_str
        return Command(update=update, goto=END)

    # Update the existing section with new content and update search queries
    else:
        return Command(
            update={"search_queries": feedback.follow_up_queries, "section": section},
            goto="search_web"
        )
    
async def write_final_sections(state: SectionState, config: RunnableConfig):
    """Write sections that don't require research using completed sections as context.
    
    This node handles sections like conclusions or summaries that build on
    the researched sections rather than requiring direct research.
    
    Args:
        state: Current state with completed sections as context
        config: Configuration for the writing model
        
    Returns:
        Dict containing the newly written section
    """

    # Get configuration
    configurable = Configuration.from_runnable_config(config)

    # Get state 
    topic = state["topic"]
    section = state["section"]
    completed_report_sections = state["report_sections_from_research"]
    
    # Format system instructions
    system_instructions = final_section_writer_instructions.format(topic=topic, section_name=section.name, section_topic=section.description, context=completed_report_sections)

    # Generate section  
    writer_provider = get_config_value(configurable.writer_provider)
    writer_model_name = get_config_value(configurable.writer_model)
    writer_model_kwargs = get_config_value(configurable.writer_model_kwargs or {})
    writer_model = init_chat_model(model=writer_model_name, model_provider=writer_provider, model_kwargs=writer_model_kwargs) 
    
    section_content = await writer_model.ainvoke([SystemMessage(content=system_instructions),
                                           HumanMessage(content="Generate a report section based on the provided sources.")])
    
    # Write content to section 
    section.content = section_content.content

    # Write the updated section to completed sections
    return {"completed_sections": [section]}

def gather_completed_sections(state: ReportState):
    """Format completed sections as context for writing final sections.
    
    This node takes all completed research sections and formats them into
    a single context string for writing summary sections.
    
    Args:
        state: Current state with completed sections
        
    Returns:
        Dict with formatted sections as context
    """

    # List of completed sections
    completed_sections = state["completed_sections"]

    # Format completed section to str to use as context for final sections
    completed_report_sections = format_sections(completed_sections)

    return {"report_sections_from_research": completed_report_sections}

def compile_final_report(state: ReportState, config: RunnableConfig):
    """Compile all sections into the final report.
    
    This node:
    1. Gets all completed sections
    2. Orders them according to original plan
    3. Combines them into the final report
    
    Args:
        state: Current state with all completed sections
        config: Configuration; `include_source_str` controls whether `source_str` is returned
        
    Returns:
        Dict containing the complete report
    """

    # Get configuration
    configurable = Configuration.from_runnable_config(config)

    # Get sections
    sections = state["sections"]
    completed_sections = {s.name: s.content for s in state["completed_sections"]}

    # Update sections with completed content while maintaining original order
    for section in sections:
        section.content = completed_sections[section.name]

    # Compile final report
    all_sections = "\n\n".join([s.content for s in sections])

    if configurable.include_source_str:
        return {"final_report": all_sections, "source_str": state["source_str"]}
    else:
        return {"final_report": all_sections}

def initiate_final_section_writing(state: ReportState):
    """Create parallel tasks for writing non-research sections.
    
    This edge function identifies sections that don't need research and
    creates parallel writing tasks for each one.
    
    Args:
        state: Current state with all sections and research context
        
    Returns:
        List of Send commands for parallel section writing
    """

    # Kick off section writing in parallel via Send() API for any sections that do not require research
    return [
        Send("write_final_sections", {"topic": state["topic"], "section": s, "report_sections_from_research": state["report_sections_from_research"]}) 
        for s in state["sections"] 
        if not s.research
    ]

# Report section sub-graph -- 

# Add nodes 
section_builder = StateGraph(SectionState, output=SectionOutputState)
section_builder.add_node("generate_queries", generate_queries)
section_builder.add_node("search_web", search_web)
section_builder.add_node("write_section", write_section)

# Add edges
section_builder.add_edge(START, "generate_queries")
section_builder.add_edge("generate_queries", "search_web")
section_builder.add_edge("search_web", "write_section")

# Outer graph for initial report plan compiling results from each section -- 

# Add nodes
builder = StateGraph(ReportState, input=ReportStateInput, output=ReportStateOutput, config_schema=Configuration)
builder.add_node("generate_report_plan", generate_report_plan)
builder.add_node("human_feedback", human_feedback)
builder.add_node("build_section_with_web_research", section_builder.compile())
builder.add_node("gather_completed_sections", gather_completed_sections)
builder.add_node("write_final_sections", write_final_sections)
builder.add_node("compile_final_report", compile_final_report)

# Add edges
builder.add_edge(START, "generate_report_plan")
builder.add_edge("generate_report_plan", "human_feedback")
builder.add_edge("build_section_with_web_research", "gather_completed_sections")
builder.add_conditional_edges("gather_completed_sections", initiate_final_section_writing, ["write_final_sections"])
builder.add_edge("write_final_sections", "compile_final_report")
builder.add_edge("compile_final_report", END)

graph = builder.compile()


================================================
FILE: src/legacy/legacy.md
================================================
# Open Deep Research

Open Deep Research is an experimental, fully open-source research assistant that automates deep research and produces comprehensive reports on any topic. It features two implementations - a [workflow](https://langchain-ai.github.io/langgraph/tutorials/workflows/) and a multi-agent architecture. You can customize the entire research and writing process with specific models, prompts, report structure, and search tools.

#### Workflow

![open-deep-research-overview](https://github.com/user-attachments/assets/a171660d-b735-4587-ab2f-cd771f773756)

#### Multi-agent

![multi-agent-researcher](https://github.com/user-attachments/assets/3c734c3c-57aa-4bc0-85dd-74e2ec2c0880)


### 🚀 Quickstart

Clone the repository:
```bash
git clone https://github.com/langchain-ai/open_deep_research.git
cd open_deep_research
```

Then copy `.env.example` to `.env` and edit it to customize the environment variables (for model selection, search tools, and other configuration settings):
```bash
cp .env.example .env
```

Launch the assistant with the LangGraph server locally, which will open in your browser:

#### Mac

```bash
# Install uv package manager
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install dependencies and start the LangGraph server
uvx --refresh --from "langgraph-cli[inmem]" --with-editable . --python 3.11 langgraph dev --allow-blocking
```

#### Windows / Linux

```powershell
# Install dependencies 
pip install -e .
pip install -U "langgraph-cli[inmem]" 

# Start the LangGraph server
langgraph dev
```

Once the server is running, it prints these URLs; open the Studio UI link in your browser:
```
- 🚀 API: http://127.0.0.1:2024
- 🎨 Studio UI: https://smith.langchain.com/studio/?baseUrl=http://127.0.0.1:2024
- 📚 API Docs: http://127.0.0.1:2024/docs
```

#### Multi-agent

(1) Chat with the agent about your topic of interest, and it will initiate report generation:

<img width="1326" alt="input" src="https://github.com/user-attachments/assets/dc8f59dd-14b3-4a62-ac18-d2f99c8bbe83" />

(2) The report is produced as markdown.

#### Workflow

(1) Provide a `Topic`:

<img width="1326" alt="input" src="https://github.com/user-attachments/assets/de264b1b-8ea5-4090-8e72-e1ef1230262f" />

(2) This will generate a report plan and present it to the user for review.

(3) We can pass a string (`"..."`) with feedback to regenerate the plan based on the feedback.

<img width="1326" alt="feedback" src="https://github.com/user-attachments/assets/c308e888-4642-4c74-bc78-76576a2da919" />

(4) Or, we can just pass `true` in the JSON input box in Studio to accept the plan.

<img width="1480" alt="accept" src="https://github.com/user-attachments/assets/ddeeb33b-fdce-494f-af8b-bd2acc1cef06" />

(5) Once accepted, the report sections will be generated.

<img width="1326" alt="report_gen" src="https://github.com/user-attachments/assets/74ff01cc-e7ed-47b8-bd0c-4ef615253c46" />

The report is produced as markdown.

<img width="1326" alt="report" src="https://github.com/user-attachments/assets/92d9f7b7-3aea-4025-be99-7fb0d4b47289" />
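Steps (3) and (4) boil down to a routing decision on the value you resume with. The stdlib sketch below illustrates it. `route_plan_feedback` is a hypothetical helper, not the package's code: in the real graph this decision is made inside the `human_feedback` node of `src/legacy/graph.py`, whose body is not shown in this excerpt. The returned names are the node names registered in that file, and treating any value other than `true` or a string as an error is an assumption of this sketch.

```python
from typing import Optional, Union


def route_plan_feedback(feedback: Union[bool, str]) -> tuple[str, Optional[str]]:
    """Pick the next node from the reviewer's reply (illustrative sketch).

    True       -> accept the plan and start researching sections.
    any string -> regenerate the plan, carrying the feedback text along.
    """
    if feedback is True:
        return "build_section_with_web_research", None
    if isinstance(feedback, str):
        return "generate_report_plan", feedback
    # Assumption: anything else (e.g. False or a number) is rejected.
    raise TypeError(f"expected True or a string, got {feedback!r}")


print(route_plan_feedback(True))
# ('build_section_with_web_research', None)
print(route_plan_feedback("Add a section on pricing models"))
# ('generate_report_plan', 'Add a section on pricing models')
```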

### Search Tools

Available search tools:

* [Tavily API](https://tavily.com/) - General web search
* [Perplexity API](https://www.perplexity.ai/hub/blog/introducing-the-sonar-pro-api) - General web search
* [Exa API](https://exa.ai/) - Powerful neural search for web content
* [ArXiv](https://arxiv.org/) - Academic papers in physics, mathematics, computer science, and more
* [PubMed](https://pubmed.ncbi.nlm.nih.gov/) - Biomedical literature from MEDLINE, life science journals, and online books
* [Linkup API](https://www.linkup.so/) - General web search
* [DuckDuckGo API](https://duckduckgo.com/) - General web search
* [Google Search API/Scraper](https://google.com/) - Create a custom search engine [here](https://programmablesearchengine.google.com/controlpanel/all) and get an API key [here](https://developers.google.com/custom-search/v1/introduction)
* [Microsoft Azure AI Search](https://azure.microsoft.com/en-us/products/ai-services/ai-search) - Cloud-based vector database solution 

Open Deep Research is compatible with many different LLMs: 

* You can select any model that is integrated [with the `init_chat_model()` API](https://python.langchain.com/docs/how_to/chat_models_universal_init/)
* See full list of supported integrations [here](https://python.langchain.com/api_reference/langchain/chat_models/langchain.chat_models.base.init_chat_model.html)

### Using the package

```bash
pip install open-deep-research
```

See [src/legacy/graph.ipynb](src/legacy/graph.ipynb) and [src/legacy/multi_agent.ipynb](src/legacy/multi_agent.ipynb) for example usage in Jupyter notebooks.

## Open Deep Research Implementations

Open Deep Research features two distinct implementation approaches, each with its own strengths:

## 1. Graph-based Workflow Implementation (`src/legacy/graph.py`)

The graph-based implementation follows a structured plan-and-execute workflow:

- **Planning Phase**: Uses a planner model to analyze the topic and generate a structured report plan
- **Human-in-the-Loop**: Allows for human feedback and approval of the report plan before proceeding
- **Sequential Research Process**: Creates sections one by one with reflection between search iterations
- **Section-Specific Research**: Each section has dedicated search queries and content retrieval
- **Supports Multiple Search Tools**: Works with all search providers (Tavily, Perplexity, Exa, ArXiv, PubMed, Linkup, etc.)

This implementation provides a more interactive experience with greater control over the report structure, making it ideal for situations where report quality and accuracy are critical.
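The reflection loop can be sketched without any LLM calls. In `src/legacy/graph.py`, `search_web` increments `search_iterations`, and `write_section` publishes the section once the grader returns `pass` or `search_iterations >= max_search_depth`; otherwise it loops back to `search_web` with the grader's follow-up queries. The sketch below replays that stopping rule with a stubbed grader. `simulate_section_loop` is hypothetical, and it assumes the counter starts at 0 for each section.

```python
from typing import Callable


def simulate_section_loop(grade_for_iteration: Callable[[int], str],
                          max_search_depth: int) -> int:
    """Return how many searches run before the section is published.

    Mirrors the rule in write_section: publish when the grade is "pass"
    or when search_iterations >= max_search_depth; otherwise search again.
    """
    search_iterations = 0  # assumption: each section starts at 0
    while True:
        # search_web runs and bumps the counter
        search_iterations += 1
        # write_section grades the new draft (stubbed instead of an LLM)
        grade = grade_for_iteration(search_iterations)
        if grade == "pass" or search_iterations >= max_search_depth:
            return search_iterations


# A grader that always fails still stops after max_search_depth searches.
print(simulate_section_loop(lambda i: "fail", max_search_depth=2))  # 2
# A grader that passes the first draft stops after a single search.
print(simulate_section_loop(lambda i: "pass", max_search_depth=2))  # 1
```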

You can customize the research assistant workflow through several parameters:

- `report_structure`: Define a custom structure for your report (defaults to a standard research report format)
- `number_of_queries`: Number of search queries to generate per section (default: 2)
- `max_search_depth`: Maximum number of reflection and search iterations (default: 2)
- `planner_provider`: Model provider for planning phase (default: "anthropic", but can be any provider from supported integrations with `init_chat_model` as listed [here](https://python.langchain.com/api_reference/langchain/chat_models/langchain.chat_models.base.init_chat_model.html))
- `planner_model`: Specific model for planning (default: "claude-3-7-sonnet-latest")
- `planner_model_kwargs`: Additional keyword arguments passed to the planner model through `init_chat_model`'s `model_kwargs`
- `writer_provider`: Model provider for writing phase (default: "anthropic", but can be any provider from supported integrations with `init_chat_model` as listed [here](https://python.langchain.com/api_reference/langchain/chat_models/langchain.chat_models.base.init_chat_model.html))
- `writer_model`: Model for writing the report (default: "claude-3-5-sonnet-latest")
- `writer_model_kwargs`: Additional keyword arguments passed to the writer model through `init_chat_model`'s `model_kwargs`
- `search_api`: API to use for web searches (default: "tavily", options include "perplexity", "exa", "arxiv", "pubmed", "linkup")
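These parameters travel in the `configurable` section of the run config. Below is a minimal sketch that assembles one from the documented defaults; `make_workflow_config` is a hypothetical helper, and the `thread_id` entry only matters when the graph is compiled with a checkpointer, which LangGraph needs in order to pause for human feedback.

```python
import uuid
from typing import Any


def make_workflow_config(**overrides: Any) -> dict[str, dict[str, Any]]:
    """Build a thread config for the legacy workflow graph (illustrative sketch)."""
    configurable: dict[str, Any] = {
        "thread_id": str(uuid.uuid4()),  # one id per checkpointed run
        "search_api": "tavily",
        "number_of_queries": 2,
        "max_search_depth": 2,
        "planner_provider": "anthropic",
        "planner_model": "claude-3-7-sonnet-latest",
        "writer_provider": "anthropic",
        "writer_model": "claude-3-5-sonnet-latest",
    }
    configurable.update(overrides)  # caller-supplied values win over the defaults
    return {"configurable": configurable}


# Switch both phases to OpenAI o3 and allow only one reflection round.
thread = make_workflow_config(planner_provider="openai", planner_model="o3",
                              writer_provider="openai", writer_model="o3",
                              max_search_depth=1)
print(thread["configurable"]["planner_model"], thread["configurable"]["max_search_depth"])
# o3 1
```

You would then pass `thread` as the config when running the graph, e.g. `graph.astream({"topic": "..."}, thread)`, assuming a graph compiled with a checkpointer and an input schema that takes a `topic` key.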

## 2. Multi-Agent Implementation (`src/legacy/multi_agent.py`)

The multi-agent implementation uses a supervisor-researcher architecture:

- **Supervisor Agent**: Manages the overall research process, plans sections, and assembles the final report
- **Researcher Agents**: Multiple independent agents work in parallel, each responsible for researching and writing a specific section
- **Parallel Processing**: All sections are researched simultaneously, significantly reducing report generation time
- **Specialized Tool Design**: Each agent has access to specific tools for its role (search for researchers, section planning for supervisors)
- **Search and MCP Support**: Works with Tavily/DuckDuckGo for web search, MCP servers for local/external data access, or can operate without search tools using only MCP tools

This implementation focuses on efficiency and parallelization, making it ideal for faster report generation with less direct user involvement.
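To illustrate why parallel researchers cut wall-clock time, here is a toy sketch using `asyncio.gather`. `research_section` is a hypothetical stand-in that only sleeps; it is not the package's researcher agent, and the package dispatches researchers through LangGraph rather than through this code.

```python
import asyncio
import time


async def research_section(name: str, delay: float) -> str:
    # Hypothetical stand-in for one researcher agent: pretend to search and write.
    await asyncio.sleep(delay)
    return f"## {name}\n\n(section content)"


async def research_all(sections: list[str], delay: float = 0.2) -> list[str]:
    # Launch every section at once; gather returns results in input order.
    return list(await asyncio.gather(*(research_section(s, delay) for s in sections)))


start = time.perf_counter()
results = asyncio.run(research_all(["Background", "Architecture", "Adoption"]))
elapsed = time.perf_counter() - start
# The three 0.2 s tasks overlap, so elapsed is close to 0.2 s rather than 0.6 s.
print(len(results), round(elapsed, 1))
```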

You can customize the multi-agent implementation through several parameters:

- `supervisor_model`: Model for the supervisor agent (default: "anthropic:claude-3-5-sonnet-latest")
- `researcher_model`: Model for researcher agents (default: "anthropic:claude-3-5-sonnet-latest") 
- `number_of_queries`: Number of search queries to generate per section (default: 2)
- `search_api`: API to use for web searches (default: "tavily", options include "duckduckgo", "none")
- `ask_for_clarification`: Whether the supervisor should ask clarifying questions before research (default: false) - **Important**: Set to `true` to enable the Question tool for the supervisor agent
- `mcp_server_config`: Configuration for MCP servers (optional)
- `mcp_prompt`: Additional instructions for using MCP tools (optional)
- `mcp_tools_to_include`: Specific MCP tools to include (optional)
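The multi-agent settings take a single `provider:model` string, whereas the workflow splits provider and model into separate fields (`planner_provider` / `planner_model`). LangChain's `init_chat_model` accepts the combined form directly; the hypothetical helper below (`split_model_spec`, not part of the package) just shows how one form maps onto the other.

```python
def split_model_spec(spec: str) -> tuple[str, str]:
    """Split 'provider:model' into (provider, model).

    Only the first colon separates the provider, so model names that
    themselves contain colons (e.g. 'llama3:8b') stay intact.
    """
    provider, sep, model = spec.partition(":")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider:model', got {spec!r}")
    return provider, model


print(split_model_spec("anthropic:claude-3-5-sonnet-latest"))
# ('anthropic', 'claude-3-5-sonnet-latest')
```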

## MCP (Model Context Protocol) Support

The multi-agent implementation (`src/legacy/multi_agent.py`) supports MCP servers to extend research capabilities beyond web search. MCP tools are available to research agents alongside or instead of traditional search tools, enabling access to local files, databases, APIs, and other data sources.

**Note**: MCP support is currently only available in the multi-agent implementation (`src/legacy/multi_agent.py`), not in the workflow implementation (`src/legacy/graph.py`).

### Key Features

- **Tool Integration**: MCP tools are seamlessly integrated with existing search and section-writing tools
- **Research Agent Access**: Only research agents (not supervisors) have access to MCP tools
- **Flexible Configuration**: Use MCP tools alone or combined with web search
- **Disable Default Search**: Set `search_api: "none"` to disable web search tools entirely
- **Custom Prompts**: Add specific instructions for using MCP tools

### Filesystem Server Example

#### SDK

```python
config = {
    "configurable": {
        "search_api": "none",  # Use "tavily" or "duckduckgo" to combine with web search
        "mcp_server_config": {
            "filesystem": {
                "command": "npx",
                "args": [
                    "-y",
                    "@modelcontextprotocol/server-filesystem",
                    "/path/to/your/files"
                ],
                "transport": "stdio"
            }
        },
        "mcp_prompt": "Step 1: Use the `list_allowed_directories` tool to get the list of allowed directories. Step 2: Use the `read_file` tool to read files in the allowed directory.",
        "mcp_tools_to_include": ["list_allowed_directories", "list_directory", "read_file"]  # Optional: specify which tools to include
    }
}
```

#### Studio

MCP server config: 
```
{
  "filesystem": {
    "command": "npx",
    "args": [
      "-y",
      "@modelcontextprotocol/server-filesystem",
      "/Users/rlm/Desktop/Code/open_deep_research/src/legacy/files"
    ],
    "transport": "stdio"
  }
}
```

MCP prompt: 
```
CRITICAL: You MUST follow this EXACT sequence when using filesystem tools:

1. FIRST: Call `list_allowed_directories` tool to discover allowed directories
2. SECOND: Call `list_directory` tool on a specific directory from step 1 to see available files  
3. THIRD: Call `read_file` tool to read specific files found in step 2

DO NOT call `list_directory` or `read_file` until you have first called `list_allowed_directories`. You must discover the allowed directories before attempting to browse or read files.
```

MCP tools: 
```
list_allowed_directories
list_directory 
read_file
```

Here is an example topic and follow-up reply you can provide to make the agent reference the included file:

Topic:
```
I want an overview of vibe coding
```

Follow-up to the question asked by the research agent: 

```
I just want a single section report on vibe coding that highlights an interesting / fun example
```

Resulting trace: 

https://smith.langchain.com/public/d871311a-f288-4885-8f70-440ab557c3cf/r

### Configuration Options

- **`mcp_server_config`**: Dictionary defining MCP server configurations (see [langchain-mcp-adapters examples](https://github.com/langchain-ai/langchain-mcp-adapters#client-1))
- **`mcp_prompt`**: Optional instructions added to research agent prompts for using MCP tools
- **`mcp_tools_to_include`**: Optional list of specific MCP tool names to include (if not set, all tools from all servers are included)
- **`search_api`**: Set to `"none"` to use only MCP tools, or keep existing search APIs to combine both
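`mcp_tools_to_include` acts as an allow-list over tool names. A minimal sketch of that filtering, assuming LangChain-style tool objects that expose a `name` attribute; `Tool` and `select_mcp_tools` are hypothetical stand-ins, not the package's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tool:
    # Stand-in for a loaded MCP tool; only the name matters for filtering.
    name: str


def select_mcp_tools(tools: list[Tool], include: Optional[list[str]] = None) -> list[Tool]:
    """Return allow-listed tools, or every tool when include is None."""
    if include is None:
        return list(tools)
    allowed = set(include)
    return [tool for tool in tools if tool.name in allowed]


loaded = [Tool("list_allowed_directories"), Tool("list_directory"),
          Tool("read_file"), Tool("write_file")]
chosen = select_mcp_tools(loaded, ["list_allowed_directories", "list_directory", "read_file"])
print([tool.name for tool in chosen])
# ['list_allowed_directories', 'list_directory', 'read_file']
```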

### Common Use Cases

- **Local Documentation**: Access project documentation, code files, or knowledge bases
- **Database Queries**: Connect to databases for specific data retrieval
- **API Integration**: Access external APIs and services
- **File Analysis**: Read and analyze local files during research

The MCP integration allows research agents to incorporate local knowledge and external data sources into their research process, creating more comprehensive and context-aware reports.

## Search API Configuration

Not all search APIs support additional configuration parameters. Here are the ones that do:

- **Exa**: `max_characters`, `num_results`, `include_domains`, `exclude_domains`, `subpages`
  - Note: `include_domains` and `exclude_domains` cannot be used together
  - Particularly useful when you need to narrow your research to specific trusted sources, ensure information accuracy, or when your research requires using specified domains (e.g., academic journals, government sites)
  - Provides AI-generated summaries tailored to your specific query, making it easier to extract relevant information from search results
- **ArXiv**: `load_max_docs`, `get_full_documents`, `load_all_available_meta`
- **PubMed**: `top_k_results`, `email`, `api_key`, `doc_content_chars_max`
- **Linkup**: `depth`

Example with Exa configuration:
```python
import uuid

thread = {"configurable": {"thread_id": str(uuid.uuid4()),
                           "search_api": "exa",
                           "search_api_config": {
                               "num_results": 5,
                               "include_domains": ["nature.com", "sciencedirect.com"]
                           },
                           # Other configuration...
                           }}
```
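The search node in `src/legacy/graph.py` passes `search_api_config` through `get_search_params(search_api, search_api_config)` before searching. Here is a hedged sketch of that kind of filter, built only from the parameter lists above: `ACCEPTED_PARAMS` and `filter_search_params` are hypothetical names, and the repository's helper may behave differently (for example, it may not enforce the Exa domain rule checked here).

```python
from typing import Any, Optional

# Accepted extra parameters per search API, copied from the list above.
ACCEPTED_PARAMS: dict[str, list[str]] = {
    "exa": ["max_characters", "num_results", "include_domains", "exclude_domains", "subpages"],
    "arxiv": ["load_max_docs", "get_full_documents", "load_all_available_meta"],
    "pubmed": ["top_k_results", "email", "api_key", "doc_content_chars_max"],
    "linkup": ["depth"],
}


def filter_search_params(search_api: str, config: Optional[dict[str, Any]]) -> dict[str, Any]:
    """Drop keys the chosen API does not accept; unlisted APIs get no extras."""
    config = config or {}
    accepted = ACCEPTED_PARAMS.get(search_api, [])
    params = {key: value for key, value in config.items() if key in accepted}
    # Per the note above, Exa's include_domains and exclude_domains are exclusive.
    if search_api == "exa" and "include_domains" in params and "exclude_domains" in params:
        raise ValueError("include_domains and exclude_domains cannot be used together")
    return params


print(filter_search_params("exa", {"num_results": 5, "depth": "deep"}))
# {'num_results': 5}
print(filter_search_params("tavily", {"num_results": 5}))
# {}
```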

## Model Considerations

(1) You can use models supported with [the `init_chat_model()` API](https://python.langchain.com/docs/how_to/chat_models_universal_init/). See full list of supported integrations [here](https://python.langchain.com/api_reference/langchain/chat_models/langchain.chat_models.base.init_chat_model.html).

(2) ***The workflow planner and writer models need to support structured outputs***: Check whether structured outputs are supported by the model you are using [here](https://python.langchain.com/docs/integrations/chat/).

(3) ***The agent models need to support tool calling:*** Ensure tool calling is well supported; tests have been run with Claude 3.7, o3, o3-mini, and GPT-4.1. See [here](https://smith.langchain.com/public/adc5d60c-97ee-4aa0-8b2c-c776fb0d7bd6/d).

(4) With Groq, there are tokens-per-minute (TPM) limits if you are on the `on_demand` service tier:
- The `on_demand` service tier has a limit of `6000 TPM`
- You will want a [paid plan](https://github.com/cline/cline/issues/47#issuecomment-2640992272) for section writing with Groq models

(5) `deepseek-R1` [is not strong at function calling](https://api-docs.deepseek.com/guides/reasoning_model), which the assistant uses to generate structured outputs for report sections and report section grading. See example traces [here](https://smith.langchain.com/public/07d53997-4a6d-4ea8-9a1f-064a85cd6072/r).  
- Consider providers that are strong at function calling such as OpenAI, Anthropic, and certain OSS models like Groq's `llama-3.3-70b-versatile`.
- If you see the following error, it is likely due to the model not being able to produce structured outputs (see [trace](https://smith.langchain.com/public/8a6da065-3b8b-4a92-8df7-5468da336cbe/r)):
```
groq.APIError: Failed to call a function. Please adjust your prompt. See 'failed_generation' for more details.
```

(6) To use Open Deep Research with OpenRouter, follow the steps [here](https://github.com/langchain-ai/open_deep_research/issues/75#issuecomment-2811472408).

(7) For working with local models via Ollama, see [here](https://github.com/langchain-ai/open_deep_research/issues/65#issuecomment-2743586318).
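To make point (2) concrete: the workflow's section grader calls `with_structured_output(Feedback)`, and `src/legacy/graph.py` then reads `feedback.grade` (compared against `"pass"`) and `feedback.follow_up_queries`, each query exposing `.search_query`. The stdlib sketch below mirrors that shape and rejects a reply that does not match. The field types are inferred from `graph.py`; the real `Feedback` class lives in the state module (not shown here) and may carry more detail.

```python
import json
from dataclasses import dataclass


@dataclass
class SearchQuery:
    search_query: str


@dataclass
class Feedback:
    grade: str  # "pass" or "fail"
    follow_up_queries: list[SearchQuery]


def parse_feedback(raw: str) -> Feedback:
    """Parse a model's JSON reply into Feedback, failing loudly on bad shapes."""
    data = json.loads(raw)
    grade = data["grade"]
    if grade not in ("pass", "fail"):
        raise ValueError(f"unexpected grade: {grade!r}")
    queries = [SearchQuery(search_query=q["search_query"]) for q in data["follow_up_queries"]]
    return Feedback(grade=grade, follow_up_queries=queries)


reply = '{"grade": "fail", "follow_up_queries": [{"search_query": "MCP transport options"}]}'
fb = parse_feedback(reply)
print(fb.grade, [q.search_query for q in fb.follow_up_queries])
# fail ['MCP transport options']
```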

## Evaluation Systems

Open Deep Research includes two comprehensive evaluation systems to assess report quality and performance:

### 1. Pytest-based Evaluation System

A developer-friendly testing framework that provides immediate feedback during development and testing cycles.

#### **Features:**
- **Rich Console Output**: Formatted tables, progress indicators, and color-coded results
- **Binary Pass/Fail Testing**: Clear success/failure criteria for CI/CD integration
- **LangSmith Integration**: Automatic experiment tracking and logging
- **Flexible Configuration**: Extensive CLI options for different testing scenarios
- **Real-time Feedback**: Live output during test execution

#### **Evaluation Criteria:**
The system evaluates reports against 9 comprehensive quality dimensions:
- Topic relevance (overall and section-level)
- Structure and logical flow
- Introduction and conclusion quality
- Proper use of structural elements (headers, citations)
- Markdown formatting compliance
- Citation quality and source attribution
- Overall research depth and accuracy

#### **Usage:**
```bash
# Run all agents with default settings
python src/legacy/tests/run_test.py --all

# Test specific agent with custom models
python src/legacy/tests/run_test.py --agent multi_agent \
  --supervisor-model "anthropic:claude-3-7-sonnet-latest" \
  --search-api tavily

# Test with OpenAI o3 models
python src/legacy/tests/run_test.py --all \
  --supervisor-model "openai:o3" \
  --researcher-model "openai:o3" \
  --planner-provider "openai" \
  --planner-model "o3" \
  --writer-provider "openai" \
  --writer-model "o3" \
  --eval-model "openai:o3" \
  --search-api "tavily"
```

#### **Key Files:**
- `src/legacy/tests/run_test.py`: Main test runner with rich CLI interface
- `src/legacy/tests/test_report_quality.py`: Core test implementation
- `src/legacy/tests/conftest.py`: Pytest configuration and CLI options

### 2. LangSmith Evaluate API System

A comprehensive batch evaluation system designed for detailed analysis and comparative studies.

#### **Features:**
- **Multi-dimensional Scoring**: Four specialized evaluators with 1-5 scale ratings
- **Weighted Criteria**: Detailed scoring with customizable weights for different quality aspects
- **Dataset-driven Evaluation**: Batch processing across multiple test cases
- **Performance Optimization**: Caching with extended TTL for evaluator prompts
- **Professional Reporting**: Structured analysis with improvement recommendations

#### **Evaluation Dimensions:**

1. **Overall Quality** (7 weighted criteria):
   - Research depth and source quality (20%)
   - Analytical rigor and critical thinking (15%)
   - Structure and organization (20%)
   - Practical value and actionability (10%)
   - Balance and objectivity (15%)
   - Writing quality and clarity (10%)
   - Professional presentation (10%)

2. **Relevance**: Section-by-section topic relevance analysis with strict criteria

3. **Structure**: Assessment of logical flow, formatting, and citation practices

4. **Groundedness**: Evaluation of alignment with retrieved context and sources
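As a worked example of how the seven Overall Quality weights combine, the sketch below takes hypothetical 1-5 scores per criterion and computes their weighted average. The weights come from the list above (they sum to 100%); the criterion keys, scores, and function name are illustrative, and the actual evaluator may aggregate differently.

```python
# Overall Quality weights from the list above, as fractions of 1.0.
WEIGHTS: dict[str, float] = {
    "research_depth_and_sources": 0.20,
    "analytical_rigor": 0.15,
    "structure_and_organization": 0.20,
    "practical_value": 0.10,
    "balance_and_objectivity": 0.15,
    "writing_quality": 0.10,
    "professional_presentation": 0.10,
}
assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9  # the seven weights cover 100%


def weighted_overall(scores: dict[str, int]) -> float:
    """Combine per-criterion 1-5 scores into one weighted score on the same scale."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    if any(not 1 <= scores[name] <= 5 for name in WEIGHTS):
        raise ValueError("every score must be between 1 and 5")
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


example = {
    "research_depth_and_sources": 4,
    "analytical_rigor": 3,
    "structure_and_organization": 5,
    "practical_value": 4,
    "balance_and_objectivity": 4,
    "writing_quality": 5,
    "professional_presentation": 4,
}
print(round(weighted_overall(example), 2))  # 4.15
```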

#### **Usage:**
```bash
# Run comprehensive evaluation on LangSmith datasets
python tests/evals/run_evaluate.py
```

#### **Key Files:**
- `tests/evals/run_evaluate.py`: Main evaluation script
- `tests/evals/evaluators.py`: Four specialized evaluator functions
- `tests/evals/prompts.py`: Detailed evaluation prompts for each dimension
- `tests/evals/target.py`: Report generation workflows

### When to Use Each System

**Use Pytest System for:**
- Development and debugging cycles
- CI/CD pipeline integration
- Quick model comparison experiments
- Interactive testing with immediate feedback
- Gate-keeping before production deployments

**Use LangSmith System for:**
- Comprehensive model evaluation across datasets
- Research and analysis of system performance
- Detailed performance profiling and benchmarking
- Comparative studies between different configurations
- Production monitoring and quality assurance

Both evaluation systems complement each other and provide comprehensive coverage for different use cases and development stages.

## UX

### Local deployment

Follow the [quickstart](#-quickstart) to start LangGraph server locally.

### Hosted deployment
 
You can easily deploy to [LangGraph Platform](https://langchain-ai.github.io/langgraph/concepts/#deployment-options). 


================================================
FILE: src/legacy/multi_agent.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Multi-Agent Researcher\n",
    "\n",
    "This notebook demonstrates the multi-agent research approach, which uses a supervisor-researcher collaborative pattern to create comprehensive reports. The system consists of:\n",
    "\n",
    "1. A **Supervisor Agent** that plans the overall report structure and coordinates work\n",
    "2. Multiple **Research Agents** that investigate specific topics in parallel\n",
    "3. A workflow that produces a structured report with introduction, body sections, and conclusion\n",
    "\n",
    "## From repo "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "/Users/rlm/Desktop/Code/open_deep_research/src\n"
     ]
    }
   ],
   "source": [
    "%cd ..\n",
    "%load_ext autoreload\n",
    "%autoreload 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "! pip install -U -q open-deep-research"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Compile the multi-agent graph\n",
    "\n",
    "Next, we'll compile the LangGraph workflow for the multi-agent research approach. This step creates the orchestration layer that manages communication between the supervisor and research agents."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.0.14\n"
     ]
    }
   ],
   "source": [
    "import uuid\n",
    "import os\n",
    "import getpass\n",
    "\n",
    "from IPython.display import Image, display, Markdown\n",
    "from langgraph.checkpoint.memory import MemorySaver\n",
    "\n",
    "import open_deep_research\n",
    "from open_deep_research.multi_agent import supervisor_builder\n",
    "\n",
    "print(open_deep_research.__version__)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a MemorySaver for checkpointing the agent's state\n",
    "# This enables tracking and debugging of the multi-agent interaction\n",
    "checkpointer = MemorySaver()\n",
    "agent = supervisor_builder.compile(name=\"research_team\", checkpointer=checkpointer)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Visualize the graph structure\n",
    "# This shows how supervisor and research agents are connected in the workflow\n",
    "display(Image(agent.get_graph(xray=1).draw_mermaid_png(max_retries=5)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Configure and run the multi-agent system\n",
    "# This sets up the model configuration and executes the research workflow\n",
    "\n",
    "# Configure models and search API for both supervisor and researcher roles\n",
    "config = {\n",
    "    \"thread_id\": str(uuid.uuid4()),\n",
    "    \"search_api\": \"tavily\",\n",
    "    \"supervisor_model\": \"openai:o3\",\n",
    "    \"researcher_model\": \"openai:o3\",\n",
    "}\n",
    "\n",
    "# Set up thread configuration with the specified parameters\n",
    "thread_config = {\"configurable\": config}\n",
    "\n",
    "# Define the research topic as a user message\n",
    "msg = [{\"role\": \"user\", \"content\": \"What is model context protocol?\"}]\n",
    "\n",
    "# Run the multi-agent workflow with the specified configuration\n",
    "response = await agent.ainvoke({\"messages\": msg}, config=thread_config)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "==================================\u001b[1m Ai Message \u001b[0m==================================\n",
      "\n",
      "Here’s what I’ve gathered so far:\n",
      "\n",
      "• Model Context Protocol (MCP) is an open, client‑server standard created to let large‑language‑model assistants (Claude, Azure OpenAI, Copilot Studio, etc.) securely “plug into” external data sources and tools.  \n",
      "• It works like a “USB‑C for AI” — an LLM (the “host”) connects through an MCP client to one or more lightweight “MCP servers.” Each server exposes specific capabilities (APIs, files, databases, SaaS apps, etc.) in a uniform JSON schema, so the model can fetch context or invoke actions.  \n",
      "• The spec is public and already ships with SDKs (Python, TypeScript, Kotlin). Typical transports are STDIO, Server‑Sent Events, and WebSocket.  \n",
      "• Early adopters include Anthropic’s Claude Desktop, Microsoft Azure OpenAI, and Copilot Studio. There’s a growing catalogue of pre‑built servers for Slack, GitHub, Postgres, Google Drive, etc.\n",
      "\n",
      "Before I outline the report, I’d like to be sure I’m targeting the right depth and angle for you.\n",
      "\n",
      "1. What audience should the explanation assume? (e.g., non‑technical execs, software architects, hands‑on developers)  \n",
      "2. Which aspects interest you most?  \n",
      "   a. High‑level concept & benefits  \n",
      "   b. Detailed architecture, data flow, and transport layers  \n",
      "   c. SDK usage & code snippets  \n",
      "   d. Security / compliance considerations  \n",
      "   e. Real‑world adoption stories and roadmap  \n",
      "3. Do you need comparisons with alternative approaches (e.g., “function calling,” LangChain tool interfaces, RAG pipelines without MCP)?  \n",
      "4. Any length or format constraints (slide deck notes, white‑paper style, quick FAQ)?  \n",
      "5. Timeline sensitivity: should we cover only the open‑sourced spec (Nov 2024) or include 2025 Microsoft integrations as well?\n",
      "\n",
      "Let me know, and I’ll tailor the research plan accordingly.\n"
     ]
    }
   ],
   "source": [
    "messages = agent.get_state(thread_config).values['messages']\n",
    "messages[-1].pretty_print()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "================================\u001b[1m Human Message \u001b[0m=================================\n",
      "\n",
      "What is model context protocol?\n",
      "==================================\u001b[1m Ai Message \u001b[0m==================================\n",
      "Tool Calls:\n",
      "  tavily_search (call_UAovh3IIwOUUGLXUu56A6VhD)\n",
      " Call ID: call_UAovh3IIwOUUGLXUu56A6VhD\n",
      "  Args:\n",
      "    queries: ['\"model context protocol\" MCP']\n",
      "=================================\u001b[1m Tool Message \u001b[0m=================================\n",
      "Name: tavily_search\n",
      "\n",
      "Search results: \n",
      "\n",
      "\n",
      "\n",
      "--- SOURCE 1: Unleashing the Power of Model Context Protocol (MCP): A Game-Changer in ... ---\n",
      "URL: https://techcommunity.microsoft.com/blog/educatordeveloperblog/unleashing-the-power-of-model-context-protocol-mcp-a-game-changer-in-ai-integrat/4397564\n",
      "\n",
      "SUMMARY:\n",
      "What is Model Context Protocol (MCP)? MCP is a protocol designed to enable AI models, such as Azure OpenAI models, to interact seamlessly with external tools and services. Think of MCP as a universal USB-C connector for AI, allowing language models to fetch information, interact with APIs, and execute tasks beyond their built-in knowledge. Key\n",
      "\n",
      "FULL CONTENT:\n",
      "Published Time: 3/27/2025, 7:57:10 AM\n",
      "Unleashing the Power of Model Context Protocol (MCP): A Game-Changer in AI Integration | Microsoft Community Hub\n",
      "Unleashing the Power of Model Context Protocol (MCP): A Game-Changer in AI Integration\n",
      "\n",
      "Sharda_Kaur\n",
      "Brass Contributor\n",
      "Mar 27, 2025\n",
      "Artificial Intelligence is evolving rapidly, and one of the most pressing challenges is enabling AI models to interact effectively with external tools, data sources, and APIs. The Model Context Protocol (MCP) solves this problem by acting as a bridge between AI models and external services, creating a standardized communication framework that enhances tool integration, accessibility, and AI reasoning capabilities.\n",
      "What is Model Context Protocol (MCP)?\n",
      "MCP is a protocol designed to enable AI models, such as Azure OpenAI models, to interact seamlessly with external tools and services. Think of MCP as a universal USB-C connector for AI, allowing language models to fetch information, interact with APIs, and execute tasks beyond their built-in knowledge.\n",
      "\n",
      "Key Features of MCP\n",
      "\n",
      "Standardized Communication – MCP provides a structured way for AI models to interact with various tools.\n",
      "Tool Access & Expansion – AI assistants can now utilize external tools for real-time insights.\n",
      "Secure & Scalable – Enables safe and scalable integration with enterprise applications.\n",
      "Multi-Modal Integration – Supports STDIO, SSE (Server-Sent Events), and WebSocket communication methods.\n",
      "\n",
      "MCP Architecture & How It Works\n",
      "MCP follows a client-server architecture that allows AI models to interact with external tools efficiently. Here’s how it works:\n",
      "Components of MCP\n",
      "\n",
      "MCP Host – The AI model (e.g., Azure OpenAI GPT) requesting data or actions.\n",
      "MCP Client – An intermediary service that forwards the AI model's requests to MCP servers.\n",
      "MCP Server – Lightweight applications that expose specific capabilities (APIs, databases, files, etc.).\n",
      "Data Sources – Various backend systems, including local storage, cloud databases, and external APIs.\n",
      "\n",
      "Data Flow in MCP\n",
      "\n",
      "The AI model sends a request (e.g., \"fetch user profile data\").\n",
      "The MCP client forwards the request to the appropriate MCP server.\n",
      "The MCP server retrieves the required data from a database or API.\n",
      "The response is sent back to the AI model via the MCP client.\n",
      "\n",
      "Integrating MCP with Azure OpenAI Services\n",
      "Microsoft has integrated MCP with Azure OpenAI Services, allowing GPT models to interact with external services and fetch live data. This means AI models are no longer limited to static knowledge but can access real-time information.\n",
      "Benefits of Azure OpenAI Services + MCP Integration\n",
      "✔ Real-time Data Fetching – AI assistants can retrieve fresh information from APIs, databases, and internal systems.\n",
      "✔ Contextual AI Responses – Enhances AI responses by providing accurate, up-to-date information.\n",
      "✔ Enterprise-Ready – Secure and scalable for business applications, including finance, healthcare, and retail.\n",
      "Hands-On Tools for MCP Implementation\n",
      "To implement MCP effectively, Microsoft provides two powerful tools: Semantic Workbench and AI Gateway.\n",
      "Microsoft Semantic Workbench\n",
      "A development environment for prototyping AI-powered assistants and integrating MCP-based functionalities.\n",
      "Features:\n",
      "\n",
      "Build and test multi-agent AI assistants.\n",
      "Configure settings and interactions between AI models and external tools.\n",
      "Supports GitHub Codespaces for cloud-based development.\n",
      "\n",
      "Explore Semantic Workbench\n",
      "Workbench interface examples\n",
      "\n",
      "Microsoft AI Gateway\n",
      "A plug-and-play interface that allows developers to experiment with MCP using Azure API Management.\n",
      "Features:\n",
      "\n",
      "Credential Manager – Securely handle API credentials.\n",
      "Live Experimentation – Test AI model interactions with external tools.\n",
      "Pre-built Labs – Hands-on learning for developers.\n",
      "\n",
      "Explore AI Gateway\n",
      "Setting Up MCP with Azure OpenAI Services\n",
      "Step 1: Create a Virtual Environment\n",
      "First, create a virtual environment using Python:\n",
       "python -m venv .venv\n",
       "Activate the environment:\n",
       "# Windows\n",
       ".venv\\Scripts\\activate\n",
       "# MacOS/Linux\n",
       "source .venv/bin/activate\n",
       "Step 2: Install Required Libraries\n",
       "Create a requirements.txt file and add the following dependencies:\n",
       "```text\n",
       "langchain-mcp-adapters\n",
       "langgraph\n",
       "langchain-openai\n",
       "```\n",
       "Then, install the required libraries:\n",
       "pip install -r requirements.txt\n",
       "Step 3: Set Up OpenAI API Key\n",
       "Ensure you have your OpenAI API key set up:\n",
       "# Windows\n",
       "setx OPENAI_API_KEY \"<your_api_key>\"\n",
       "# MacOS/Linux\n",
       "export OPENAI_API_KEY=<your_api_key>\n",
      "Building an MCP Server\n",
      "This server performs basic mathematical operations like addition and multiplication.\n",
      "Create the Server File\n",
      "First, create a new Python file:\n",
       "touch math_server.py\n",
       "Then, implement the server:\n",
       "```python\n",
       "from mcp.server.fastmcp import FastMCP\n",
       "\n",
       "# Initialize the server\n",
       "mcp = FastMCP(\"Math\")\n",
       "\n",
       "@mcp.tool()\n",
       "def add(a: int, b: int) -> int:\n",
       "    \"\"\"Add two numbers.\"\"\"\n",
       "    return a + b\n",
       "\n",
       "@mcp.tool()\n",
       "def multiply(a: int, b: int) -> int:\n",
       "    \"\"\"Multiply two numbers.\"\"\"\n",
       "    return a * b\n",
       "\n",
       "if __name__ == \"__main__\":\n",
       "    # Serve over stdio so a client can launch this script as a subprocess\n",
       "    mcp.run(transport=\"stdio\")\n",
       "```\n",
      "Your MCP server is now ready to run.\n",
      "Building an MCP Client\n",
      "This client connects to the MCP server and interacts with it.\n",
      "Create the Client File\n",
      "First, create a new file:\n",
       "touch client.py\n",
       "Then, implement the client:\n",
       "```python\n",
       "import asyncio\n",
       "\n",
       "from langchain_mcp_adapters.tools import load_mcp_tools\n",
       "from langchain_openai import ChatOpenAI\n",
       "from langgraph.prebuilt import create_react_agent\n",
       "from mcp import ClientSession, StdioServerParameters\n",
       "from mcp.client.stdio import stdio_client\n",
       "\n",
       "# Define server parameters: the client launches math_server.py as a subprocess\n",
       "server_params = StdioServerParameters(\n",
       "    command=\"python\",\n",
       "    args=[\"math_server.py\"],\n",
       ")\n",
       "\n",
       "# Define the model\n",
       "model = ChatOpenAI(model=\"gpt-4o\")\n",
       "\n",
       "async def run_agent():\n",
       "    async with stdio_client(server_params) as (read, write):\n",
       "        async with ClientSession(read, write) as session:\n",
       "            # Perform the MCP handshake before listing tools\n",
       "            await session.initialize()\n",
       "            # Convert the server's MCP tools into LangChain tools\n",
       "            tools = await load_mcp_tools(session)\n",
       "            agent = create_react_agent(model, tools)\n",
       "            agent_response = await agent.ainvoke({\"messages\": \"what's (4 + 6) x 14?\"})\n",
       "            # The final answer is the last message in the conversation\n",
       "            return agent_response[\"messages\"][-1].content\n",
       "\n",
       "if __name__ == \"__main__\":\n",
       "    result = asyncio.run(run_agent())\n",
       "    print(result)\n",
       "```\n",
      "Your client is now set up and ready to interact with the MCP server.\n",
      "Running the MCP Server and Client\n",
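       "Step 1 (optional): Start the MCP Server\n",
       "Open a terminal and run:\n",
       "python math_server.py\n",
       "This starts the server on the stdio transport, where it waits for a client on standard input. Because client.py launches math_server.py as its own subprocess via StdioServerParameters, this separate process is not required; it only confirms that the server starts without errors.\n",
       "Step 2: Run the MCP Client\n",
       "In a terminal, run:\n",
       "python client.py\n",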
      "Expected Output\n",
      "140\n",
      "This means the AI agent correctly computed (4 + 6) x 14 using both the MCP server and GPT-4o.\n",
      "Conclusion\n",
      "Integrating MCP with Azure OpenAI Services enables AI applications to securely interact with external tools, enhancing functionality beyond text-based responses. With standardized communication and improved AI capabilities, developers can build smarter and more interactive AI-powered solutions. By following this guide, you can set up an MCP server and client, unlocking the full potential of AI with structured external interactions.\n",
      "Next Steps:\n",
      "\n",
      "Explore more MCP tools and integrations.\n",
      "Extend your MCP setup to work with additional APIs.\n",
      "Deploy your solution in a cloud environment for broader accessibility.\n",
      "\n",
      "For further details, visit the GitHub repository for MCP integration examples and best practices.\n",
      "\n",
      "MCP GitHub Repository\n",
      "MCP Documentation\n",
      "Semantic Workbench\n",
      "AI Gateway\n",
      "MCP Video Walkthrough\n",
      "MCP Blog\n",
      "MCP Github End to End Demo\n",
      "\n",
      "Updated Mar 27, 2025\n",
      "Version 1.0\n",
      "azure\n",
      "Azure AI Agents\n",
      "Azure AI Model\n",
      "mcp\n",
      "openai\n",
       "Semantic Kernel\n",
      "Semantic Search\n",
      "\n",
      "Sharda_Kaur\n",
      "Brass Contributor\n",
      "Joined December 16, 2023\n",
      "\n",
      "Educator Developer Blog\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "What's new\n",
      "\n",
      "Surface Pro 9\n",
      "Surface Laptop 5\n",
      "Surface Studio 2+\n",
      "Surface Laptop Go 2\n",
      "Surface Laptop Studio\n",
      "Surface Duo 2\n",
      "Microsoft 365\n",
      "Windows 11 apps\n",
      "\n",
      "Microsoft Store\n",
      "\n",
      "Account profile\n",
      "Download Center\n",
      "Microsoft Store support\n",
      "Returns\n",
      "Order tracking\n",
      "Virtual workshops and training\n",
      "Microsoft Store Promise\n",
      "Flexible Payments\n",
      "\n",
      "Education\n",
      "\n",
      "Microsoft in education\n",
      "Devices for education\n",
      "Microsoft Teams for Education\n",
      "Microsoft 365 Education\n",
      "Education consultation appointment\n",
      "Educator training and development\n",
      "Deals for students and parents\n",
      "Azure for students\n",
      "\n",
      "Business\n",
      "\n",
      "Microsoft Cloud\n",
      "Microsoft Security\n",
      "Dynamics 365\n",
      "Microsoft 365\n",
      "Microsoft Power Platform\n",
      "Microsoft Teams\n",
      "Microsoft Industry\n",
      "Small Business\n",
      "\n",
      "Developer & IT\n",
      "\n",
      "Azure\n",
      "Developer Center\n",
      "Documentation\n",
      "Microsoft Learn\n",
      "Microsoft Tech Community\n",
      "Azure Marketplace\n",
      "AppSource\n",
      "Visual Studio\n",
      "\n",
      "Company\n",
      "\n",
      "Careers\n",
      "About Microsoft\n",
      "Company news\n",
      "Privacy at Microsoft\n",
      "Investors\n",
      "Diversity and inclusion\n",
      "Accessibility\n",
      "Sustainability\n",
      "\n",
      "Your Privacy Choices\n",
      "\n",
      "Sitemap\n",
      "Contact Microsoft\n",
      "Privacy\n",
      "Manage cookies\n",
      "Terms of use\n",
      "Trademarks\n",
      "Safety & eco\n",
      "About our ads\n",
      "© Microsoft 2024\n",
      "\n",
      "\"}},\"componentScriptGroups({\\\"componentId\\\":\\\"custom.widget.MicrosoftFooter\\\"})\":{\"__typename\":\"ComponentScriptGroups\",\"scriptGroups\":{\"__typename\":\"ComponentScriptGroupsDefinition\",\"afterInteractive\":{\"__typename\":\"PageScriptGroupDefinition\",\"group\":\"AFTER_INTERACTIVE\",\"scriptIds\":[]},\"lazyOnLoad\":{\"__typename\":\"PageScriptGroupDefinition\",\"group\":\"LAZY_ON_LOAD\",\"scriptIds\":[]}},\"componentScripts\":[]},\"cachedText({\\\"lastModified\\\":\\\"1743151752454\\\",\\\"locale\\\":\\\"en-US\\\",\\\"namespaces\\\":[\\\"components/community/NavbarDropdownToggle\\\"]})\":[{\"__ref\":\"CachedAsset:text:en_US-components/community/NavbarDropdownToggle-1743151752454\"}],\"cachedText({\\\"lastModified\\\":\\\"1743151752454\\\",\\\"locale\\\":\\\"en-US\\\",\\\"namespaces\\\":[\\\"shared/client/components/common/QueryHandler\\\"]})\":[{\"__ref\":\"CachedAsset:text:en_US-shared/client/components/common/QueryHandler-1743151752454\"}],\"cachedText({\\\"lastModified\\\":\\\"1743151752454\\\",\\\"locale\\\":\\\"en-US\\\",\\\"namespaces\\\":[\\\"components/messages/MessageCoverImage\\\"]})\":[{\"__ref\":\"CachedAsset:text:en_US-components/messages/MessageCoverImage-1743151752454\"}],\"cachedText({\\\"lastModified\\\":\\\"1743151752454\\\",\\\"locale\\\":\\\"en-US\\\",\\\"namespaces\\\":[\\\"shared/client/components/nodes/NodeTitle\\\"]})\":[{\"__ref\":\"CachedAsset:text:en_US-shared/client/components/nodes/NodeTitle-1743151752454\"}],\"cachedText({\\\"lastModified\\\":\\\"1743151752454\\\",\\\"locale\\\":\\\"en-US\\\",\\\"namespaces\\\":[\\\"components/messages/MessageTimeToRead\\\"]})\":[{\"__ref\":\"CachedAsset:text:en_US-components/messages/MessageTimeToRead-1743151752454\"}],\"cachedText({\\\"lastModified\\\":\\\"1743151752454\\\",\\\"locale\\\":\\\"en-US\\\",\\\"namespaces\\\":[\\\"components/messages/MessageSubject\\\"]})\":[{\"__ref\":\"CachedAsset:text:en_US-components/messages/MessageSubject-1743151752454\"}],\"cachedText({\\\"l
astModified\\\":\\\"1743151752454\\\",\\\"locale\\\":\\\"en-US\\\",\\\"namespaces\\\":[\\\"components/users/UserLink\\\"]})\":[{\"__ref\":\"CachedAsset:text:en_US-components/users/UserLink-1743151752454\"}],\"cachedText({\\\"lastModified\\\":\\\"1743151752454\\\",\\\"locale\\\":\\\"en-US\\\",\\\"namespaces\\\":[\\\"shared/client/components/users/UserRank\\\"]})\":[{\"__ref\":\"CachedAsset:text:en_US-shared/client/components/users/UserRank-1743151752454\"}],\"cachedText({\\\"lastModified\\\":\\\"1743151752454\\\",\\\"locale\\\":\\\"en-US\\\",\\\"namespaces\\\":[\\\"components/messages/MessageTime\\\"]})\":[{\"__ref\":\"CachedAsset:text:en_US-components/messages/MessageTime-1743151752454\"}],\"cachedText({\\\"lastModified\\\":\\\"1743151752454\\\",\\\"locale\\\":\\\"en-US\\\",\\\"namespaces\\\":[\\\"components/messages/MessageBody\\\"]})\":[{\"__ref\":\"CachedAsset:text:en_US-components/messages/MessageBody-1743151752454\"}],\"cachedText({\\\"lastModified\\\":\\\"1743151752454\\\",\\\"locale\\\":\\\"en-US\\\",\\\"namespaces\\\":[\\\"components/messages/MessageCustomFields\\\"]})\":[{\"__ref\":\"CachedAsset:text:en_US-components/messages/MessageCustomFields-1743151752454\"}],\"cachedText({\\\"lastModified\\\":\\\"1743151752454\\\",\\\"locale\\\":\\\"en-US\\\",\\\"namespaces\\\":[\\\"components/messages/MessageRevision\\\"]})\":[{\"__ref\":\"CachedAsset:text:en_US-components/messages/MessageRevision-1743151752454\"}],\"cachedText({\\\"lastModified\\\":\\\"1743151752454\\\",\\\"locale\\\":\\\"en-US\\\",\\\"namespaces\\\":[\\\"components/messages/MessageReplyButton\\\"]})\":[{\"__ref\":\"CachedAsset:text:en_US-components/messages/MessageReplyButton-1743151752454\"}],\"cachedText({\\\"lastModified\\\":\\\"1743151752454\\\",\\\"locale\\\":\\\"en-US\\\",\\\"namespaces\\\":[\\\"components/messages/MessageAuthorBio\\\"]})\":[{\"__ref\":\"CachedAsset:text:en_US-components/messages/MessageAuthorBio-1743151752454\"}],\"cachedText({\\\"lastModified\\\":\\\"1743151752454\\\",\\\"lo
cale\\\":\\\"en-US\\\",\\\"namespaces\\\":[\\\"shared/client/components/users/UserAvatar\\\"]})\":[{\"__ref\":\"CachedAsset:text:en_US-shared/client/components/users/UserAvatar-1743151752454\"}],\"cachedText({\\\"lastModified\\\":\\\"1743151752454\\\",\\\"locale\\\":\\\"en-US\\\",\\\"namespaces\\\":[\\\"shared/client/components/ranks/UserRankLabel\\\"]})\":[{\"__ref\":\"CachedAsset:text:en_US-shared/client/components/ranks/UserRankLabel-1743151752454\"}],\"cachedText({\\\"lastModified\\\":\\\"1743151752454\\\",\\\"locale\\\":\\\"en-US\\\",\\\"namespaces\\\":[\\\"components/users/UserRegistrationDate\\\"]})\":[{\"__ref\":\"CachedAsset:text:en_US-components/users/UserRegistrationDate-1743151752454\"}],\"cachedText({\\\"lastModified\\\":\\\"1743151752454\\\",\\\"locale\\\":\\\"en-US\\\",\\\"namespaces\\\":[\\\"shared/client/components/nodes/NodeAvatar\\\"]})\":[{\"__ref\":\"CachedAsset:text:en_US-shared/client/components/nodes/NodeAvatar-1743151752454\"}],\"cachedText({\\\"lastModified\\\":\\\"1743151752454\\\",\\\"locale\\\":\\\"en-US\\\",\\\"namespaces\\\":[\\\"shared/client/components/nodes/NodeDescription\\\"]})\":[{\"__ref\":\"CachedAsset:text:en_US-shared/client/components/nodes/NodeDescription-1743151752454\"}],\"cachedText({\\\"lastModified\\\":\\\"1743151752454\\\",\\\"locale\\\":\\\"en-US\\\",\\\"namespaces\\\":[\\\"components/tags/TagView/TagViewChip\\\"]})\":[{\"__ref\":\"CachedAsset:text:en_US-components/tags/TagView/TagViewChip-1743151752454\"}],\"cachedText({\\\"lastModified\\\":\\\"1743151752454\\\",\\\"locale\\\":\\\"en-US\\\",\\\"namespaces\\\":[\\\"shared/client/components/nodes/NodeIcon\\\"]})\":[{\"__ref\":\"CachedAsset:text:en_US-shared/client/components/nodes/NodeIcon-1743151752454\"}]},\"CachedAsset:pages-1743057517558\":{\"__typename\":\"CachedAsset\",\"id\":\"pages-1743057517558\",\"value\":[{\"lastUpdatedTime\":1743057517558,\"localOverride\":null,\"page\":{\"id\":\"BlogViewAllPostsPage\",\"type\":\"BLOG\",\"urlPath\":\"/category/:categoryId/
blog/:boardId/all-posts/(/:after|/:before)?\",\"__typename\":\"PageDescriptor\"},\"__typename\":\"PageResource\"},{\"lastUpdatedTime\":1743057517558,\"localOverride\":null,\"page\":{\"id\":\"CasePortalPage\",\"type\":\"CASE_PORTAL\",\"urlPath\":\"/caseportal\",\"__typename\":\"PageDescriptor\"},\"__typename\":\"PageResource\"},{\"lastUpdatedTime\":1743057517558,\"localOverride\":null,\"page\":{\"id\":\"CreateGroupHubPage\",\"type\":\"GROUP_HUB\",\"urlPath\":\"/groups/create\",\"__typename\":\"PageDescriptor\"},\"__typename\":\"PageResource\"},{\"lastUpdatedTime\":1743057517558,\"localOverride\":null,\"page\":{\"id\":\"CaseViewPage\",\"type\":\"CASE_DETAILS\",\"urlPath\":\"/case/:caseId/:caseNumber\",\"__typename\":\"PageDescriptor\"},\"__typename\":\"PageResource\"},{\"lastUpdatedTime\":1743057517558,\"localOverride\":null,\"page\":{\"id\":\"InboxPage\",\"type\":\"COMMUNITY\",\"urlPath\":\"/inbox\",\"__typename\":\"PageDescriptor\"},\"__typename\":\"PageResource\"},{\"lastUpdatedTime\":1743057517558,\"localOverride\":null,\"page\":{\"id\":\"HelpFAQPage\",\"type\":\"COMMUNITY\",\"urlPath\":\"/help\",\"__typename\":\"PageDescriptor\"},\"__typename\":\"PageResource\"},{\"lastUpdatedTime\":1743057517558,\"localOverride\":null,\"page\":{\"id\":\"IdeaMessagePage\",\"type\":\"IDEA_POST\",\"urlPath\":\"/idea/:boardId/:messageSubject/:messageId\",\"__typename\":\"PageDescriptor\"},\"__typename\":\"PageResource\"},{\"lastUpdatedTime\":1743057517558,\"localOverride\":null,\"page\":{\"id\":\"IdeaViewAllIdeasPage\",\"type\":\"IDEA\",\"urlPath\":\"/category/:categoryId/ideas/:boardId/all-ideas/(/:after|/:before)?\",\"__typename\":\"PageDescriptor\"},\"__typename\":\"PageResource\"},{\"lastUpdatedTime\":1743057517558,\"localOverride\":null,\"page\":{\"id\":\"LoginPage\",\"type\":\"USER\",\"urlPath\":\"/signin\",\"__typename\":\"PageDescriptor\"},\"__typename\":\"PageResource\"},{\"lastUpdatedTime\":1743057517558,\"localOverride\":null,\"page\":{\"id\":\"BlogPostPage\",\"type\":\"B
LOG\",\"urlPath\":\"/category/:categoryId/blogs/:boardId/create\",\"__typename\":\"PageDescriptor\"},\"__typename\":\"PageResource\"},{\"lastUpdatedTime\":1743057517558,\"localOverride\":null,\"page\":{\"id\":\"UserBlogPermissions.Page\",\"type\":\"COMMUNITY\",\"urlPath\":\"/c/user-blog-permissions/page\",\"__typename\":\"PageDescriptor\"},\"__typename\":\"PageResource\"},{\"lastUpdatedTime\":1743057517558,\"localOverride\":null,\"page\":{\"id\":\"ThemeEditorPage\",\"type\":\"COMMUNITY\",\"urlPath\":\"/designer/themes\",\"__typename\":\"PageDescriptor\"},\"__typename\":\"PageResource\"},{\"lastUpdatedTime\":1743057517558,\"localOverride\":null,\"page\":{\"id\":\"TkbViewAllArticlesPage\",\"type\":\"TKB\",\"urlPath\":\"/category/:categoryId/kb/:boardId/all-articles/(/:after|/:before)?\",\"__typename\":\"PageDescriptor\"},\"__typename\":\"PageResource\"},{\"lastUpdatedTime\":1730819800000,\"localOverride\":null,\"page\":{\"id\":\"AllEvents\",\"type\":\"CUSTOM\",\"urlPath\":\"/Events\",\"__typename\":\"PageDescriptor\"},\"__typename\":\"PageResource\"},{\"lastUpdatedTime\":1743057517558,\"localOverride\":null,\"page\":{\"id\":\"OccasionEditPage\",\"type\":\"EVENT\",\"urlPath\":\"/event/:boardId/:messageSubject/:messageId/edit\",\"__typename\":\"PageDescriptor\"},\"__typename\":\"PageResource\"},{\"lastUpdatedTime\":1743057517558,\"localOverride\":null,\"page\":{\"id\":\"OAuthAuthorizationAllowPage\",\"type\":\"USER\",\"urlPath\":\"/auth/authorize/allow\",\"__typename\":\"PageDescriptor\"},\"__typename\":\"PageResource\"},{\"lastUpdatedTime\":1743057517558,\"localOverride\":null,\"page\":{\"id\":\"PageEditorPage\",\"type\":\"COMMUNITY\",\"urlPath\":\"/designer/pages\",\"__typename\":\"PageDescriptor\"},\"__typename\":\"PageResource\"},{\"lastUpdatedTime\":1743057517558,\"localOverride\":null,\"page\":{\"id\":\"PostPage\",\"type\":\"COMMUNITY\",\"urlPath\":\"/category/:categoryId/:boardId/create\",\"__typename\":\"PageDescriptor\"},\"__typename\":\"PageResource\"},{\"last
gitextract_qsg5gbbh/

├── .github/
│   ├── dependabot.yml
│   └── workflows/
│       ├── claude-code-review.yml
│       └── claude.yml
├── .gitignore
├── CLAUDE.md
├── LICENSE
├── README.md
├── examples/
│   ├── arxiv.md
│   ├── inference-market-gpt45.md
│   ├── inference-market.md
│   └── pubmed.md
├── langgraph.json
├── pyproject.toml
├── src/
│   ├── legacy/
│   │   ├── CLAUDE.md
│   │   ├── __init__.py
│   │   ├── configuration.py
│   │   ├── files/
│   │   │   └── vibe_code.md
│   │   ├── graph.ipynb
│   │   ├── graph.py
│   │   ├── legacy.md
│   │   ├── multi_agent.ipynb
│   │   ├── multi_agent.py
│   │   ├── prompts.py
│   │   ├── state.py
│   │   ├── tests/
│   │   │   ├── conftest.py
│   │   │   ├── run_test.py
│   │   │   └── test_report_quality.py
│   │   └── utils.py
│   ├── open_deep_research/
│   │   ├── configuration.py
│   │   ├── deep_researcher.py
│   │   ├── prompts.py
│   │   ├── state.py
│   │   └── utils.py
│   └── security/
│       └── auth.py
└── tests/
    ├── evaluators.py
    ├── expt_results/
    │   ├── deep_research_bench_claude4-sonnet.jsonl
    │   ├── deep_research_bench_gpt-4.1.jsonl
    │   └── deep_research_bench_gpt-5.jsonl
    ├── extract_langsmith_data.py
    ├── pairwise_evaluation.py
    ├── prompts.py
    ├── run_evaluate.py
    └── supervisor_parallel_evaluation.py
SYMBOL INDEX (160 symbols across 18 files)

FILE: src/legacy/configuration.py
  class SearchAPI (line 20) | class SearchAPI(Enum):
  class Configuration (line 32) | class Configuration:
    method from_runnable_config (line 55) | def from_runnable_config(
  class MultiAgentConfiguration (line 70) | class MultiAgentConfiguration:
    method from_runnable_config (line 91) | def from_runnable_config(

FILE: src/legacy/graph.py
  function generate_report_plan (line 43) | async def generate_report_plan(state: ReportState, config: RunnableConfig):
  function human_feedback (line 142) | def human_feedback(state: ReportState, config: RunnableConfig) -> Comman...
  function generate_queries (line 194) | async def generate_queries(state: SectionState, config: RunnableConfig):
  function search_web (line 235) | async def search_web(state: SectionState, config: RunnableConfig):
  function write_section (line 268) | async def write_section(state: SectionState, config: RunnableConfig) -> ...
  function write_final_sections (line 356) | async def write_final_sections(state: SectionState, config: RunnableConf...
  function gather_completed_sections (line 396) | def gather_completed_sections(state: ReportState):
  function compile_final_report (line 417) | def compile_final_report(state: ReportState, config: RunnableConfig):
  function initiate_final_section_writing (line 451) | def initiate_final_section_writing(state: ReportState):

FILE: src/legacy/multi_agent.py
  function get_search_tool (line 26) | def get_search_tool(config: RunnableConfig):
  class Section (line 51) | class Section(BaseModel):
  class Sections (line 63) | class Sections(BaseModel):
  class Introduction (line 69) | class Introduction(BaseModel):
  class Conclusion (line 78) | class Conclusion(BaseModel):
  class Question (line 87) | class Question(BaseModel):
  class FinishResearch (line 94) | class FinishResearch(BaseModel):
  class FinishReport (line 98) | class FinishReport(BaseModel):
  class ReportStateOutput (line 102) | class ReportStateOutput(MessagesState):
  class ReportState (line 108) | class ReportState(MessagesState):
  class SectionState (line 116) | class SectionState(MessagesState):
  class SectionOutputState (line 123) | class SectionOutputState(TypedDict):
  function _load_mcp_tools (line 130) | async def _load_mcp_tools(
  function get_supervisor_tools (line 161) | async def get_supervisor_tools(config: RunnableConfig) -> list[BaseTool]:
  function get_research_tools (line 176) | async def get_research_tools(config: RunnableConfig) -> list[BaseTool]:
  function supervisor (line 188) | async def supervisor(state: ReportState, config: RunnableConfig):
  function supervisor_tools (line 240) | async def supervisor_tools(state: ReportState, config: RunnableConfig)  ...
  function supervisor_should_continue (line 340) | async def supervisor_should_continue(state: ReportState) -> str:
  function research_agent (line 353) | async def research_agent(state: SectionState, config: RunnableConfig):
  function research_agent_tools (line 396) | async def research_agent_tools(state: SectionState, config: RunnableConf...
  function research_agent_should_continue (line 447) | async def research_agent_should_continue(state: SectionState) -> str:

FILE: src/legacy/state.py
  class Section (line 5) | class Section(BaseModel):
  class Sections (line 19) | class Sections(BaseModel):
  class SearchQuery (line 24) | class SearchQuery(BaseModel):
  class Queries (line 27) | class Queries(BaseModel):
  class Feedback (line 32) | class Feedback(BaseModel):
  class ReportStateInput (line 40) | class ReportStateInput(TypedDict):
  class ReportStateOutput (line 43) | class ReportStateOutput(TypedDict):
  class ReportState (line 49) | class ReportState(TypedDict):
  class SectionState (line 60) | class SectionState(TypedDict):
  class SectionOutputState (line 69) | class SectionOutputState(TypedDict):

FILE: src/legacy/tests/conftest.py
  function pytest_addoption (line 7) | def pytest_addoption(parser):

FILE: src/legacy/tests/run_test.py
  function main (line 21) | def main():
  function run_test (line 103) | def run_test(agent, agent_config, args):
  function add_model_configs (line 144) | def add_model_configs(cmd, args):

FILE: src/legacy/tests/test_report_quality.py
  class CriteriaGrade (line 26) | class CriteriaGrade(BaseModel):
  function get_evaluation_llm (line 32) | def get_evaluation_llm(eval_model=None):
  function research_agent (line 84) | def research_agent(request):
  function search_api (line 89) | def search_api(request):
  function eval_model (line 94) | def eval_model(request):
  function models (line 99) | def models(request, research_agent):
  function test_response_criteria_evaluation (line 140) | def test_response_criteria_evaluation(research_agent, search_api, models...

FILE: src/legacy/utils.py
  function get_config_value (line 46) | def get_config_value(value):
  function get_search_params (line 57) | def get_search_params(search_api: str, search_api_config: Optional[Dict[...
  function deduplicate_and_format_sources (line 89) | def deduplicate_and_format_sources(
  function format_sections (line 153) | def format_sections(sections: list[Section]) -> str:
  function tavily_search_async (line 173) | async def tavily_search_async(search_queries, max_results: int = 5, topi...
  function azureaisearch_search_async (line 219) | async def azureaisearch_search_async(search_queries: list[str], max_resu...
  function perplexity_search (line 279) | def perplexity_search(search_queries):
  function exa_search (line 374) | async def exa_search(search_queries, max_characters: Optional[int] = Non...
  function arxiv_search_async (line 577) | async def arxiv_search_async(search_queries, load_max_docs=5, get_full_d...
  function pubmed_search_async (line 734) | async def pubmed_search_async(search_queries, top_k_results=5, email=Non...
  function linkup_search (line 882) | async def linkup_search(search_queries, depth: Optional[str] = "standard"):
  function google_search_async (line 928) | async def google_search_async(search_queries: Union[str, List[str]], max...
  function scrape_pages (line 1188) | async def scrape_pages(titles: List[str], urls: List[str]) -> str:
  function duckduckgo_search (line 1248) | async def duckduckgo_search(search_queries: List[str]):
  function tavily_search (line 1364) | async def tavily_search(
  function azureaisearch_search (line 1457) | async def azureaisearch_search(queries: List[str], max_results: int = 5,...
  function select_and_execute_search (line 1501) | async def select_and_execute_search(search_api: str, query_list: list[st...
  class Summary (line 1542) | class Summary(BaseModel):
  function summarize_webpage (line 1547) | async def summarize_webpage(model: BaseChatModel, webpage_content: str) ...
  function split_and_rerank_search_results (line 1573) | def split_and_rerank_search_results(embeddings: Embeddings, query: str, ...
  function stitch_documents_by_url (line 1596) | def stitch_documents_by_url(documents: list[Document]) -> list[Document]:
  function get_today_str (line 1621) | def get_today_str() -> str:
  function load_mcp_server_config (line 1626) | async def load_mcp_server_config(path: str) -> dict:

FILE: src/open_deep_research/configuration.py
  class SearchAPI (line 11) | class SearchAPI(Enum):
  class MCPConfig (line 19) | class MCPConfig(BaseModel):
  class Configuration (line 38) | class Configuration(BaseModel):
    method from_runnable_config (line 237) | def from_runnable_config(
    class Config (line 249) | class Config:

FILE: src/open_deep_research/deep_researcher.py
  function clarify_with_user (line 60) | async def clarify_with_user(state: AgentState, config: RunnableConfig) -...
  function write_research_brief (line 118) | async def write_research_brief(state: AgentState, config: RunnableConfig...
  function supervisor (line 178) | async def supervisor(state: SupervisorState, config: RunnableConfig) -> ...
  function supervisor_tools (line 225) | async def supervisor_tools(state: SupervisorState, config: RunnableConfi...
  function researcher (line 365) | async def researcher(state: ResearcherState, config: RunnableConfig) -> ...
  function execute_tool_safely (line 427) | async def execute_tool_safely(tool, args, config):
  function researcher_tools (line 435) | async def researcher_tools(state: ResearcherState, config: RunnableConfi...
  function compress_research (line 511) | async def compress_research(state: ResearcherState, config: RunnableConf...
  function final_report_generation (line 607) | async def final_report_generation(state: AgentState, config: RunnableCon...

FILE: src/open_deep_research/state.py
  class ConductResearch (line 15) | class ConductResearch(BaseModel):
  class ResearchComplete (line 21) | class ResearchComplete(BaseModel):
  class Summary (line 24) | class Summary(BaseModel):
  class ClarifyWithUser (line 30) | class ClarifyWithUser(BaseModel):
  class ResearchQuestion (line 43) | class ResearchQuestion(BaseModel):
  function override_reducer (line 55) | def override_reducer(current_value, new_value):
  class AgentInputState (line 62) | class AgentInputState(MessagesState):
  class AgentState (line 65) | class AgentState(MessagesState):
  class SupervisorState (line 74) | class SupervisorState(TypedDict):
  class ResearcherState (line 83) | class ResearcherState(TypedDict):
  class ResearcherOutputState (line 92) | class ResearcherOutputState(BaseModel):

FILE: src/open_deep_research/utils.py
  function tavily_search (line 44) | async def tavily_search(
  function tavily_search_async (line 138) | async def tavily_search_async(
  function summarize_webpage (line 175) | async def summarize_webpage(model: BaseChatModel, webpage_content: str) ...
  function think_tool (line 220) | def think_tool(reflection: str) -> str:
  function get_mcp_access_token (line 250) | async def get_mcp_access_token(
  function get_tokens (line 293) | async def get_tokens(config: RunnableConfig):
  function set_tokens (line 331) | async def set_tokens(config: RunnableConfig, tokens: dict[str, Any]):
  function fetch_tokens (line 352) | async def fetch_tokens(config: RunnableConfig) -> dict[str, Any]:
  function wrap_mcp_authenticate_tool (line 385) | def wrap_mcp_authenticate_tool(tool: StructuredTool) -> StructuredTool:
  function load_mcp_tools (line 449) | async def load_mcp_tools(
  function get_search_tool (line 531) | async def get_search_tool(search_api: SearchAPI):
  function get_all_tools (line 569) | async def get_all_tools(config: RunnableConfig):
  function get_notes_from_tool_calls (line 599) | def get_notes_from_tool_calls(messages: list[MessageLikeRepresentation]):
  function anthropic_websearch_called (line 607) | def anthropic_websearch_called(response):
  function openai_websearch_called (line 639) | def openai_websearch_called(response):
  function is_token_limit_exceeded (line 665) | def is_token_limit_exceeded(exception: Exception, model_name: str = None...
  function _check_openai_token_limit (line 703) | def _check_openai_token_limit(exception: Exception, error_str: str) -> b...
  function _check_anthropic_token_limit (line 736) | def _check_anthropic_token_limit(exception: Exception, error_str: str) -...
  function _check_gemini_token_limit (line 759) | def _check_gemini_token_limit(exception: Exception, error_str: str) -> b...
  function get_model_token_limit (line 831) | def get_model_token_limit(model_string):
  function remove_up_to_last_ai_message (line 848) | def remove_up_to_last_ai_message(messages: list[MessageLikeRepresentatio...
  function get_today_str (line 872) | def get_today_str() -> str:
  function get_config_value (line 881) | def get_config_value(value):
  function get_api_key_for_model (line 892) | def get_api_key_for_model(model_name: str, config: RunnableConfig):
  function get_tavily_api_key (line 916) | def get_tavily_api_key(config: RunnableConfig):

FILE: src/security/auth.py
  function get_current_user (line 22) | async def get_current_user(authorization: str | None) -> Auth.types.Mini...
  function on_thread_create (line 74) | async def on_thread_create(
  function on_thread_read (line 98) | async def on_thread_read(
  function on_assistants_create (line 115) | async def on_assistants_create(
  function on_assistants_read (line 132) | async def on_assistants_read(
  function authorize_store (line 150) | async def authorize_store(ctx: Auth.types.AuthContext, value: dict):

FILE: tests/evaluators.py
  function _format_input_query (line 12) | def _format_input_query(inputs: dict) -> str:
  class OverallQualityScore (line 25) | class OverallQualityScore(BaseModel):
  function eval_overall_quality (line 34) | def eval_overall_quality(inputs: dict, outputs: dict):
  class RelevanceScore (line 58) | class RelevanceScore(BaseModel):
  function eval_relevance (line 63) | def eval_relevance(inputs: dict, outputs: dict):
  class StructureScore (line 81) | class StructureScore(BaseModel):
  function eval_structure (line 86) | def eval_structure(inputs: dict, outputs: dict):
  class CorrectnessScore (line 103) | class CorrectnessScore(BaseModel):
  function eval_correctness (line 108) | def eval_correctness(inputs: dict, outputs: dict, reference_outputs: dict):
  class GroundednessClaim (line 125) | class GroundednessClaim(BaseModel):
  class GroundednessScore (line 130) | class GroundednessScore(BaseModel):
  function eval_groundedness (line 134) | def eval_groundedness(inputs: dict, outputs: dict):
  class CompletenessScore (line 154) | class CompletenessScore(BaseModel):
  function eval_completeness (line 159) | def eval_completeness(inputs: dict, outputs: dict):

FILE: tests/extract_langsmith_data.py
  function extract_langsmith_data (line 13) | def extract_langsmith_data(project_name, model_name, dataset_name, api_k...
  function main (line 61) | def main():

FILE: tests/pairwise_evaluation.py
  class HeadToHeadRanking (line 30) | class HeadToHeadRanking(BaseModel):
  function head_to_head_evaluator (line 35) | def head_to_head_evaluator(inputs: dict, outputs: list[dict]) -> list:
  class Rankings (line 86) | class Rankings(BaseModel):
  function free_for_all_evaluator (line 92) | def free_for_all_evaluator(inputs: dict, outputs: list[dict]) -> list:

FILE: tests/run_evaluate.py
  function target (line 32) | async def target(
  function main (line 63) | async def main():

FILE: tests/supervisor_parallel_evaluation.py
  function right_parallelism_evaluator (line 10) | def right_parallelism_evaluator(
  function target (line 19) | async def target(inputs: dict):
  function main (line 50) | async def main():
Condensed preview: 43 files, each showing path, character count, and a content snippet (full structured content: 4,649K chars).
[
  {
    "path": ".github/dependabot.yml",
    "chars": 633,
    "preview": "# To get started with Dependabot version updates, you'll need to specify which\n# package ecosystems to update and where "
  },
  {
    "path": ".github/workflows/claude-code-review.yml",
    "chars": 3021,
    "preview": "name: Claude Code Review\n\non:\n  pull_request:\n    types: [opened, synchronize]\n    # Optional: Only run on specific file"
  },
  {
    "path": ".github/workflows/claude.yml",
    "chars": 2301,
    "preview": "name: Claude Code\n\non:\n  issue_comment:\n    types: [created]\n  pull_request_review_comment:\n    types: [created]\n  issue"
  },
  {
    "path": ".gitignore",
    "chars": 616,
    "preview": "\n*.egg-info\n*.pyc\n\n# Python\n__pycache__/\n*.py[cod]\n*$py.class\n*.so\n.Python\nbuild/\ndevelop-eggs/\ndist/\ndownloads/\neggs/\n."
  },
  {
    "path": "CLAUDE.md",
    "chars": 2942,
    "preview": "# Open Deep Research Repository Overview\n\n## Project Description\nOpen Deep Research is a configurable, fully open-source"
  },
  {
    "path": "LICENSE",
    "chars": 1065,
    "preview": "MIT License\n\nCopyright (c) 2025 LangChain\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\n"
  },
  {
    "path": "README.md",
    "chars": 10224,
    "preview": "# 🔬 Open Deep Research\n\n<img width=\"1388\" height=\"298\" alt=\"full_diagram\" src=\"https://github.com/user-attachments/asset"
  },
  {
    "path": "examples/arxiv.md",
    "chars": 11569,
    "preview": "# Obesity Among Young Adults in the United States: A Growing Public Health Challenge\n\nThe obesity epidemic among young a"
  },
  {
    "path": "examples/inference-market-gpt45.md",
    "chars": 11892,
    "preview": "# Introduction\n\nThe AI inference market is rapidly expanding, driven by growing demand for real-time data processing and"
  },
  {
    "path": "examples/inference-market.md",
    "chars": 10009,
    "preview": "# The AI Inference Market: Analyzing Emerging Leaders\n\nThe AI inference market is experiencing unprecedented growth, pro"
  },
  {
    "path": "examples/pubmed.md",
    "chars": 13636,
    "preview": "# Diabetic Nephropathy Treatment: Current Approaches and Future Directions\n\nDiabetic nephropathy has emerged as the lead"
  },
  {
    "path": "langgraph.json",
    "chars": 295,
    "preview": "{\n    \"dockerfile_lines\": [],\n    \"graphs\": {\n      \"Deep Researcher\": \"./src/open_deep_research/deep_researcher.py:deep"
  },
  {
    "path": "pyproject.toml",
    "chars": 2020,
    "preview": "[project]\nname = \"open_deep_research\"\nversion = \"0.0.16\"\ndescription = \"Planning, research, and report generation.\"\nauth"
  },
  {
    "path": "src/legacy/CLAUDE.md",
    "chars": 4545,
    "preview": "# Open Deep Research\n\n## About Open Deep Research\n\nOpen Deep Research is an experimental, fully open-source research ass"
  },
  {
    "path": "src/legacy/__init__.py",
    "chars": 72,
    "preview": "\"\"\"Planning, research, and report generation.\"\"\"\n\n__version__ = \"0.0.15\""
  },
  {
    "path": "src/legacy/configuration.py",
    "chars": 4100,
    "preview": "import os\nfrom enum import Enum\nfrom dataclasses import dataclass, fields\nfrom typing import Any, Optional, Dict, Litera"
  },
  {
    "path": "src/legacy/files/vibe_code.md",
    "chars": 13841,
    "preview": "# Vibe coding MenuGen\n\nAndrej Karpathy\n\nVery often, I sit down at a restaurant, look through their menu, and feel... kin"
  },
  {
    "path": "src/legacy/graph.ipynb",
    "chars": 19549,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Research Workflow\\n\",\n    \"\\n\",\n "
  },
  {
    "path": "src/legacy/graph.py",
    "chars": 21121,
    "preview": "from typing import Literal\n\nfrom langchain.chat_models import init_chat_model\nfrom langchain_core.messages import HumanM"
  },
  {
    "path": "src/legacy/legacy.md",
    "chars": 20682,
    "preview": "# Open Deep Research\n\nOpen Deep Research is an experimental, fully open-source research assistant that automates deep re"
  },
  {
    "path": "src/legacy/multi_agent.ipynb",
    "chars": 108302,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Multi-Agent Researcher\\n\",\n    \"\\"
  },
  {
    "path": "src/legacy/multi_agent.py",
    "chars": 19435,
    "preview": "from typing import List, Annotated, TypedDict, Literal, cast\nfrom pydantic import BaseModel, Field\nimport operator\nimpor"
  },
  {
    "path": "src/legacy/prompts.py",
    "chars": 20670,
    "preview": "report_planner_query_writer_instructions=\"\"\"You are performing research for a report. \n\n<Report topic>\n{topic}\n</Report "
  },
  {
    "path": "src/legacy/state.py",
    "chars": 3011,
    "preview": "from typing import Annotated, List, TypedDict, Literal\nfrom pydantic import BaseModel, Field\nimport operator\n\nclass Sect"
  },
  {
    "path": "src/legacy/tests/conftest.py",
    "chars": 1035,
    "preview": "\"\"\"\nPytest configuration for open_deep_research tests.\n\"\"\"\n\nimport pytest\n\ndef pytest_addoption(parser):\n    \"\"\"Add comm"
  },
  {
    "path": "src/legacy/tests/run_test.py",
    "chars": 6980,
    "preview": "#!/usr/bin/env python\nimport os\nimport subprocess\nimport sys\nimport argparse\nfrom rich.console import Console\nfrom rich."
  },
  {
    "path": "src/legacy/tests/test_report_quality.py",
    "chars": 12804,
    "preview": "#!/usr/bin/env python\n\nimport os\nimport uuid\nimport pytest\nimport asyncio\nfrom pydantic import BaseModel, Field\nfrom lan"
  },
  {
    "path": "src/legacy/utils.py",
    "chars": 69610,
    "preview": "import os\nimport asyncio\nimport json\nimport datetime\nimport requests\nimport random \nimport concurrent\nimport hashlib\nimp"
  },
  {
    "path": "src/open_deep_research/configuration.py",
    "chars": 8234,
    "preview": "\"\"\"Configuration management for the Open Deep Research system.\"\"\"\n\nimport os\nfrom enum import Enum\nfrom typing import An"
  },
  {
    "path": "src/open_deep_research/deep_researcher.py",
    "chars": 30117,
    "preview": "\"\"\"Main LangGraph implementation for the Deep Research agent.\"\"\"\n\nimport asyncio\nfrom typing import Literal\n\nfrom langch"
  },
  {
    "path": "src/open_deep_research/prompts.py",
    "chars": 21208,
    "preview": "\"\"\"System prompts and prompt templates for the Deep Research agent.\"\"\"\n\nclarify_with_user_instructions=\"\"\"\nThese are the"
  },
  {
    "path": "src/open_deep_research/state.py",
    "chars": 3273,
    "preview": "\"\"\"Graph state definitions and data structures for the Deep Research agent.\"\"\"\n\nimport operator\nfrom typing import Annot"
  },
  {
    "path": "src/open_deep_research/utils.py",
    "chars": 33199,
    "preview": "\"\"\"Utility functions and helpers for the Deep Research agent.\"\"\"\n\nimport asyncio\nimport logging\nimport os\nimport warning"
  },
  {
    "path": "src/security/auth.py",
    "chars": 4949,
    "preview": "import os\nimport asyncio\nfrom langgraph_sdk import Auth\nfrom langgraph_sdk.auth.types import StudioUser\nfrom supabase im"
  },
  {
    "path": "tests/evaluators.py",
    "chars": 9689,
    "preview": "from typing import cast\nfrom pydantic import BaseModel, Field\nfrom langchain_openai import ChatOpenAI\nfrom langchain_ant"
  },
  {
    "path": "tests/expt_results/deep_research_bench_claude4-sonnet.jsonl",
    "chars": 1252858,
    "preview": "{\"id\": 59, \"prompt\": \"In ecology, how do birds achieve precise location and direction navigation during migration? What "
  },
  {
    "path": "tests/expt_results/deep_research_bench_gpt-4.1.jsonl",
    "chars": 1078519,
    "preview": "{\"id\": 57, \"prompt\": \"Summarize the global investments, key initiatives, and outputs related to Artificial Intelligence "
  },
  {
    "path": "tests/expt_results/deep_research_bench_gpt-5.jsonl",
    "chars": 1695912,
    "preview": "{\"id\": 57, \"prompt\": \"Summarize the global investments, key initiatives, and outputs related to Artificial Intelligence "
  },
  {
    "path": "tests/extract_langsmith_data.py",
    "chars": 2975,
    "preview": "#!/usr/bin/env python3\n\"\"\"Extract data from LangSmith and save to JSONL file with configurable dataset.\"\"\"\n\nimport os\nim"
  },
  {
    "path": "tests/pairwise_evaluation.py",
    "chars": 6491,
    "preview": "from langchain_anthropic import ChatAnthropic\nfrom langsmith.evaluation import evaluate_comparative\nfrom pydantic import"
  },
  {
    "path": "tests/prompts.py",
    "chars": 9338,
    "preview": "OVERALL_QUALITY_PROMPT = \"\"\"You are an expert evaluator tasked with assessing the quality of research reports. Please ev"
  },
  {
    "path": "tests/run_evaluate.py",
    "chars": 4048,
    "preview": "from langsmith import Client\nfrom tests.evaluators import eval_overall_quality, eval_relevance, eval_structure, eval_cor"
  },
  {
    "path": "tests/supervisor_parallel_evaluation.py",
    "chars": 2273,
    "preview": "from open_deep_research.deep_researcher import deep_researcher_builder\nfrom langgraph.checkpoint.memory import MemorySav"
  }
]

About this extraction

This page contains the full source code of the langchain-ai/open_deep_research GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 43 files (4.3 MB), approximately 1.1M tokens, and a symbol index with 160 extracted functions, classes, methods, constants, and types.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.
