Full Code of foreveryh/langgraph-deep-research for AI

Repository: foreveryh/langgraph-deep-research
Branch: main
Commit: 3757c39b6fa2
Files: 52
Total size: 454.6 KB

Directory structure:
gitextract_ckbl8hy3/

├── .gitignore
├── Dockerfile
├── LICENSE
├── Makefile
├── README.md
├── backend/
│   ├── .gitignore
│   ├── LICENSE
│   ├── Makefile
│   ├── langgraph.json
│   ├── pyproject.toml
│   ├── src/
│   │   └── agent/
│   │       ├── __init__.py
│   │       ├── app.py
│   │       ├── configuration.py
│   │       ├── content_enhancement_decision.py
│   │       ├── enhanced_graph_nodes.py
│   │       ├── graph.py
│   │       ├── prompts.py
│   │       ├── report_level_enhancement.py
│   │       ├── state.py
│   │       ├── tools_and_schemas.py
│   │       └── utils.py
│   └── test-agent.ipynb
├── docker-compose.yml
├── docs/
│   ├── document-generation-flow-ZH.md
│   └── document-generation-flow.md
└── frontend/
    ├── .gitignore
    ├── components.json
    ├── eslint.config.js
    ├── index.html
    ├── package.json
    ├── src/
    │   ├── App.tsx
    │   ├── components/
    │   │   ├── ActivityTimeline.tsx
    │   │   ├── ChatMessagesView.tsx
    │   │   ├── InputForm.tsx
    │   │   ├── ResearchThinkPanel.tsx
    │   │   ├── WelcomeScreen.tsx
    │   │   └── ui/
    │   │       ├── badge.tsx
    │   │       ├── button.tsx
    │   │       ├── card.tsx
    │   │       ├── input.tsx
    │   │       ├── scroll-area.tsx
    │   │       ├── select.tsx
    │   │       ├── tabs.tsx
    │   │       └── textarea.tsx
    │   ├── global.css
    │   ├── lib/
    │   │   └── utils.ts
    │   ├── main.tsx
    │   ├── utils/
    │   │   └── dataTransformer.ts
    │   └── vite-env.d.ts
    ├── tsconfig.json
    ├── tsconfig.node.json
    └── vite.config.ts

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
# Node / Frontend
node_modules/
frontend/dist/
frontend/.vite/
frontend/coverage/
.DS_Store
*.local

# Logs
logs
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*
pnpm-debug.log*
lerna-debug.log*

# OS generated files
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db

# IDE files
.idea/
.vscode/
*.suo
*.ntvs*
*.njsproj
*.sln
*.sw?

# Optional backend venv (if created in root)
#.venv/ 

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
uv.lock

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
#   For a library or package, you might want to ignore these files since the code is
#   intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
#   However, in case of collaboration, if having platform-specific dependencies or dependencies
#   having no cross-platform support, pipenv may install dependencies that don't work, or not
#   install all needed dependencies.
#Pipfile.lock

# poetry
#   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
#   This is especially recommended for binary packages to ensure reproducibility, and is more
#   commonly ignored for libraries.
#   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
#   in version control.
#   https://pdm.fming.dev/latest/usage/project/#working-with-version-control
.pdm.toml
.pdm-python
.pdm-build/

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
#  and can be added to the global gitignore or merged into this file.  For a more nuclear
#  option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

backend/.langgraph_api
modify/

================================================
FILE: Dockerfile
================================================
# Stage 1: Build React Frontend
FROM node:20-alpine AS frontend-builder

# Set working directory for frontend
WORKDIR /app/frontend

# Copy frontend package files and install dependencies
COPY frontend/package.json ./
COPY frontend/package-lock.json ./
# If you use yarn or pnpm, adjust accordingly (e.g., copy yarn.lock or pnpm-lock.yaml and use yarn install or pnpm install)
RUN npm install

# Copy the rest of the frontend source code
COPY frontend/ ./

# Build the frontend
RUN npm run build

# Stage 2: Python Backend
FROM docker.io/langchain/langgraph-api:3.11

# -- Install UV --
# First install curl, then install UV using the standalone installer
RUN apt-get update && apt-get install -y curl && \
    curl -LsSf https://astral.sh/uv/install.sh | sh && \
    apt-get clean && rm -rf /var/lib/apt/lists/*
ENV PATH="/root/.local/bin:$PATH"
# -- End of UV installation --

# -- Copy built frontend from builder stage --
# The app.py expects the frontend build to be at ../frontend/dist relative to its own location.
# If app.py is at /deps/backend/src/agent/app.py, then ../frontend/dist resolves to /deps/frontend/dist.
COPY --from=frontend-builder /app/frontend/dist /deps/frontend/dist
# -- End of copying built frontend --

# -- Adding local package . --
ADD backend/ /deps/backend
# -- End of local package . --

# -- Installing all local dependencies using UV --
# First, we need to ensure pip is available for UV to use
RUN uv pip install --system pip setuptools wheel
# Install dependencies with UV, respecting constraints
RUN cd /deps/backend && \
    PYTHONDONTWRITEBYTECODE=1 UV_SYSTEM_PYTHON=1 uv pip install --system -c /api/constraints.txt -e .
# -- End of local dependencies install --
ENV LANGGRAPH_HTTP='{"app": "/deps/backend/src/agent/app.py:app"}'
ENV LANGSERVE_GRAPHS='{"agent": "/deps/backend/src/agent/graph.py:graph"}'

# -- Ensure user deps didn't inadvertently overwrite langgraph-api
# Create all required directories that the langgraph-api package expects
RUN mkdir -p /api/langgraph_api /api/langgraph_runtime /api/langgraph_license /api/langgraph_storage && \
    touch /api/langgraph_api/__init__.py /api/langgraph_runtime/__init__.py /api/langgraph_license/__init__.py /api/langgraph_storage/__init__.py
# Use pip for this specific package as it has poetry-based build requirements
RUN PYTHONDONTWRITEBYTECODE=1 pip install --no-cache-dir --no-deps -e /api
# -- End of ensuring user deps didn't inadvertently overwrite langgraph-api --
# -- Removing pip from the final image (but keeping UV) --
RUN uv pip uninstall --system pip setuptools wheel && \
    rm -rf /usr/local/lib/python*/site-packages/pip* /usr/local/lib/python*/site-packages/setuptools* /usr/local/lib/python*/site-packages/wheel* && \
    find /usr/local/bin -name "pip*" -delete
# -- End of pip removal --

WORKDIR /deps/backend


================================================
FILE: LICENSE
================================================
                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!)  The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright [yyyy] [name of copyright owner]

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.


================================================
FILE: Makefile
================================================
.PHONY: help dev-frontend dev-backend dev

help:
	@echo "Available commands:"
	@echo "  make dev-frontend    - Starts the frontend development server (Vite)"
	@echo "  make dev-backend     - Starts the backend development server (Uvicorn with reload)"
	@echo "  make dev             - Starts both frontend and backend development servers"

dev-frontend:
	@echo "Starting frontend development server..."
	@cd frontend && pnpm run dev

dev-backend:
	@echo "Starting backend development server..."
	@cd backend && langgraph dev

# Run frontend and backend concurrently
dev:
	@echo "Starting both frontend and backend development servers..."
	@$(MAKE) dev-frontend & $(MAKE) dev-backend

================================================
FILE: README.md
================================================
# 🚀 Enhanced Version

> Based on the original project, I have optimized the Agent workflow and frontend display effects.

## **Agent Workflow Comparison**
<table>
<tr>
<td align="center"><b>Optimized Agent</b></td>
<td align="center"><b>Original Agent</b></td>
</tr>
<tr>
<td><img src="./agent_new.png" width="400"/></td>
<td><img src="./agent.png" width="400"/></td>
</tr>
</table>

## **Frontend Display Enhancement** 
<table>
<tr>
<td align="center"><b>Enhanced Frontend</b></td>
<td align="center"><b>Original Frontend</b></td>
</tr>
<tr>
<td><img src="./frontend.png" width="400"/></td>
<td><img src="./app.png" width="400"/></td>
</tr>
</table>

## **Technical Documentation**
For detailed technical implementation and architecture analysis, please refer to:
- 📖 [`docs/document-generation-flow.md`](docs/document-generation-flow.md) - English Technical Documentation
- 📖 [`docs/document-generation-flow-ZH.md`](docs/document-generation-flow-ZH.md) - Chinese Technical Documentation

## **📞 Contact & Support**

**Author: Peng.G**

If you have any questions about this project or Agent development, or are interested in business collaboration opportunities, feel free to reach out:

<div align="center">

| Platform | Contact Information |
|----------|-------------------|
| **𝕏 (Twitter)** | [@Stephen4171127](https://x.com/Stephen4171127) |
| **📝 Blog** | [https://me.deeptoai.com](https://me.deeptoai.com) |
| **💬 WeChat** | `browncony999` |
| **📧 Email** | [foreveryh@gmail.com](mailto:foreveryh@gmail.com) |

</div>

---

## **Getting Started**
The setup process remains the same. Please follow the original project's official guidance below.

---

# Gemini Fullstack LangGraph Quickstart

This project demonstrates a fullstack application using a React frontend and a LangGraph-powered backend agent. The agent is designed to perform comprehensive research on a user's query by dynamically generating search terms, querying the web using Google Search, reflecting on the results to identify knowledge gaps, and iteratively refining its search until it can provide a well-supported answer with citations. This application serves as an example of building research-augmented conversational AI using LangGraph and Google's Gemini models.

![Gemini Fullstack LangGraph](./app.png)

## Features

- 💬 Fullstack application with a React frontend and LangGraph backend.
- 🧠 Powered by a LangGraph agent for advanced research and conversational AI.
- 🔍 Dynamic search query generation using Google Gemini models.
- 🌐 Integrated web research via Google Search API.
- 🤔 Reflective reasoning to identify knowledge gaps and refine searches.
- 📄 Generates answers with citations from gathered sources.
- 🔄 Hot-reloading for both frontend and backend during development.

## Project Structure

The project is divided into two main directories:

-   `frontend/`: Contains the React application built with Vite.
-   `backend/`: Contains the LangGraph/FastAPI application, including the research agent logic.

## Getting Started: Development and Local Testing

Follow these steps to get the application running locally for development and testing.

**1. Prerequisites:**

-   Node.js and npm (or yarn/pnpm)
-   Python 3.8+
-   **`GEMINI_API_KEY`**: The backend agent requires a Google Gemini API key.
    1.  Navigate to the `backend/` directory.
    2.  Create a file named `.env` by copying the `backend/.env.example` file.
    3.  Open the `.env` file and add your Gemini API key: `GEMINI_API_KEY="YOUR_ACTUAL_API_KEY"`
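
A quick way to confirm the key is visible before starting the agent is sketched below. This is a minimal illustration, not code from this repository: the helper name `read_gemini_key` is hypothetical, and it assumes the contents of `.env` have already been loaded into the process environment by whatever tooling you use.

```python
import os
from typing import Mapping, Optional


def read_gemini_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Gemini API key, or raise with a hint if it is missing.

    `read_gemini_key` is a hypothetical helper for illustration only.
    It reads from `env` when given, otherwise from the process environment.
    """
    env = os.environ if env is None else env
    key = env.get("GEMINI_API_KEY", "").strip()
    # Treat the README placeholder as "not configured".
    if not key or key == "YOUR_ACTUAL_API_KEY":
        raise RuntimeError("GEMINI_API_KEY is not set; add it to backend/.env")
    return key
```

Calling `read_gemini_key()` with no arguments checks the current process environment; passing a dictionary makes the check easy to exercise in isolation.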

**2. Install Dependencies:**

**Backend:**

```bash
cd backend
pip install .
```

**Frontend:**

```bash
cd frontend
npm install
```

**3. Run Development Servers:**

**Backend & Frontend:**

```bash
make dev
```
This will run the backend and frontend development servers. Open your browser and navigate to the frontend development server URL (e.g., `http://localhost:5173/app`).

_Alternatively, you can run the backend and frontend development servers separately. For the backend, open a terminal in the `backend/` directory and run `langgraph dev`. The backend API will be available at `http://127.0.0.1:2024`. It will also open a browser window to the LangGraph UI. For the frontend, open a terminal in the `frontend/` directory and run `npm run dev`. The frontend will be available at `http://localhost:5173`._

## How the Backend Agent Works (High-Level)

The core of the backend is a LangGraph agent defined in `backend/src/agent/graph.py`. It follows these steps:

![Agent Flow](./agent.png)

1.  **Generate Initial Queries:** Based on your input, it generates a set of initial search queries using a Gemini model.
2.  **Web Research:** For each query, it uses the Gemini model with the Google Search API to find relevant web pages.
3.  **Reflection & Knowledge Gap Analysis:** The agent analyzes the search results to determine if the information is sufficient or if there are knowledge gaps. It uses a Gemini model for this reflection process.
4.  **Iterative Refinement:** If gaps are found or the information is insufficient, it generates follow-up queries and repeats the web research and reflection steps (up to a configured maximum number of loops).
5.  **Finalize Answer:** Once the research is deemed sufficient, the agent synthesizes the gathered information into a coherent answer, including citations from the web sources, using a Gemini model.
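
The five steps above can be sketched as a plain Python loop. This is a simplified illustration under stated assumptions, not the graph defined in `backend/src/agent/graph.py`: the names `run_research`, `Reflection`, and the four callables are hypothetical stand-ins for the Gemini-backed nodes, and the loop limit is passed as an ordinary argument.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Reflection:
    """Hypothetical result of the reflection step."""
    is_sufficient: bool
    follow_up_queries: List[str] = field(default_factory=list)


def run_research(
    question: str,
    generate_queries: Callable[[str], List[str]],
    web_research: Callable[[str], str],
    reflect: Callable[[str, List[str]], Reflection],
    finalize: Callable[[str, List[str]], str],
    max_loops: int = 2,
) -> str:
    """Run the research loop with injected step functions (all assumed)."""
    queries = generate_queries(question)  # 1. generate initial queries
    findings: List[str] = []
    for _ in range(max_loops):
        # 2. web research: one result per pending query
        findings.extend(web_research(q) for q in queries)
        # 3. reflection: is the gathered material sufficient?
        reflection = reflect(question, findings)
        if reflection.is_sufficient or not reflection.follow_up_queries:
            break
        # 4. iterative refinement: research the follow-up queries next
        queries = reflection.follow_up_queries
    # 5. finalize: synthesize an answer from everything gathered
    return finalize(question, findings)
```

Passing the steps in as callables keeps the control flow testable without network access; in the real agent each step is a graph node that calls a Gemini model.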

## Deployment

In production, the backend server serves the optimized static frontend build. LangGraph requires a Redis instance and a Postgres database. Redis is used as a pub-sub broker to stream real-time output from background runs. Postgres is used to store assistants, threads, and runs, to persist thread state and long-term memory, and to manage the state of the background task queue with 'exactly once' semantics. For more details on how to deploy the backend server, see the [LangGraph Documentation](https://langchain-ai.github.io/langgraph/concepts/deployment_options/). Below is an example of how to build a Docker image that includes the optimized frontend build and the backend server, and how to run it via `docker-compose`.

_Note: For the docker-compose.yml example you need a LangSmith API key; you can get one from [LangSmith](https://smith.langchain.com/settings)._

_Note: If you are not running the docker-compose.yml example, or you are exposing the backend server to the public internet, update the `apiUrl` in `frontend/src/App.tsx` to point to your host. Currently the `apiUrl` is set to `http://localhost:8123` for docker-compose or `http://localhost:2024` for development._

**1. Build the Docker Image:**

   Run the following command from the **project root directory**:
   ```bash
   docker build -t gemini-fullstack-langgraph -f Dockerfile .
   ```
**2. Run the Production Server:**

   ```bash
   GEMINI_API_KEY=<your_gemini_api_key> LANGSMITH_API_KEY=<your_langsmith_api_key> docker-compose up
   ```

Open your browser and navigate to `http://localhost:8123/app/` to see the application. The API will be available at `http://localhost:8123`.

## Technologies Used

- [React](https://reactjs.org/) (with [Vite](https://vitejs.dev/)) - For the frontend user interface.
- [Tailwind CSS](https://tailwindcss.com/) - For styling.
- [Shadcn UI](https://ui.shadcn.com/) - For components.
- [LangGraph](https://github.com/langchain-ai/langgraph) - For building the backend research agent.
- [Google Gemini](https://ai.google.dev/models/gemini) - LLM for query generation, reflection, and answer synthesis.

## License

This project is licensed under the Apache License 2.0. See the [LICENSE](LICENSE) file for details. 

================================================
FILE: backend/.gitignore
================================================
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
uv.lock

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
#   For a library or package, you might want to ignore these files since the code is
#   intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
#   However, in case of collaboration, if having platform-specific dependencies or dependencies
#   having no cross-platform support, pipenv may install dependencies that don't work, or not
#   install all needed dependencies.
#Pipfile.lock

# poetry
#   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
#   This is especially recommended for binary packages to ensure reproducibility, and is more
#   commonly ignored for libraries.
#   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
#   in version control.
#   https://pdm.fming.dev/latest/usage/project/#working-with-version-control
.pdm.toml
.pdm-python
.pdm-build/

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
#  and can be added to the global gitignore or merged into this file.  For a more nuclear
#  option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
ai_test/

================================================
FILE: backend/LICENSE
================================================
MIT License

Copyright (c) 2025 Philipp Schmid

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


================================================
FILE: backend/Makefile
================================================
.PHONY: all format lint test tests test_watch integration_tests docker_tests help extended_tests

# Default target executed when no arguments are given to make.
all: help

# Define a variable for the test file path.
TEST_FILE ?= tests/unit_tests/

test:
	uv run --with-editable . pytest $(TEST_FILE)

test_watch:
	uv run --with-editable . ptw --snapshot-update --now . -- -vv tests/unit_tests

test_profile:
	uv run --with-editable . pytest -vv tests/unit_tests/ --profile-svg

extended_tests:
	uv run --with-editable . pytest --only-extended $(TEST_FILE)


######################
# LINTING AND FORMATTING
######################

# Define a variable for Python and notebook files.
PYTHON_FILES=src/
MYPY_CACHE=.mypy_cache
lint format: PYTHON_FILES=.
lint_diff format_diff: PYTHON_FILES=$(shell git diff --name-only --diff-filter=d main | grep -E '\.py$$|\.ipynb$$')
lint_package: PYTHON_FILES=src
lint_tests: PYTHON_FILES=tests
lint_tests: MYPY_CACHE=.mypy_cache_test

lint lint_diff lint_package lint_tests:
	uv run ruff check .
	[ "$(PYTHON_FILES)" = "" ] || uv run ruff format $(PYTHON_FILES) --diff
	[ "$(PYTHON_FILES)" = "" ] || uv run ruff check --select I $(PYTHON_FILES)
	[ "$(PYTHON_FILES)" = "" ] || uv run mypy --strict $(PYTHON_FILES)
	[ "$(PYTHON_FILES)" = "" ] || (mkdir -p $(MYPY_CACHE) && uv run mypy --strict $(PYTHON_FILES) --cache-dir $(MYPY_CACHE))

format format_diff:
	uv run ruff format $(PYTHON_FILES)
	uv run ruff check --select I --fix $(PYTHON_FILES)

spell_check:
	codespell --toml pyproject.toml

spell_fix:
	codespell --toml pyproject.toml -w

######################
# HELP
######################

help:
	@echo '----'
	@echo 'format                       - run code formatters'
	@echo 'lint                         - run linters'
	@echo 'test                         - run unit tests'
	@echo 'tests                        - run unit tests'
	@echo 'test TEST_FILE=<test_file>   - run all tests in file'
	@echo 'test_watch                   - run unit tests in watch mode'



================================================
FILE: backend/langgraph.json
================================================
{
  "dependencies": ["."],
  "graphs": {
    "agent": "./src/agent/graph.py:graph"
  },
  "http": {
    "app": "./src/agent/app.py:app"
  },
  "env": ".env"
}


================================================
FILE: backend/pyproject.toml
================================================
[project]
name = "agent"
version = "0.0.1"
description = "Backend for the LangGraph agent"
authors = [
    { name = "Philipp Schmid", email = "schmidphilipp1995@gmail.com" },
]
readme = "README.md"
license = { text = "MIT" }
requires-python = ">=3.11,<4.0"
dependencies = [
    "langgraph>=0.2.6",
    "langchain>=0.3.19",
    "langchain-google-genai",
    "python-dotenv>=1.0.1",
    "langgraph-sdk>=0.1.57",
    "langgraph-cli",
    "langgraph-api",
    "fastapi",
    "google-genai",
    "tiktoken>=0.8.0",
    "firecrawl-py>=2.7.0",
]


[project.optional-dependencies]
dev = ["mypy>=1.11.1", "ruff>=0.6.1"]

[build-system]
requires = ["setuptools>=73.0.0", "wheel"]
build-backend = "setuptools.build_meta"

[tool.ruff]
lint.select = [
    "E",    # pycodestyle
    "F",    # pyflakes
    "I",    # isort
    "D",    # pydocstyle
    "D401", # First line should be in imperative mood
    "T201",
    "UP",
]
lint.ignore = [
    "UP006",
    "UP007",
    # We actually do want to import from typing_extensions
    "UP035",
    # Relax the convention by _not_ requiring documentation for every function parameter.
    "D417",
    "E501",
]
[tool.ruff.lint.per-file-ignores]
"tests/*" = ["D", "UP"]
[tool.ruff.lint.pydocstyle]
convention = "google"

[dependency-groups]
dev = [
    "langgraph-cli[inmem]>=0.1.71",
    "pytest>=8.3.5",
]


================================================
FILE: backend/src/agent/__init__.py
================================================
from agent.graph import graph

__all__ = ["graph"]


================================================
FILE: backend/src/agent/app.py
================================================
# mypy: disable-error-code="no-untyped-def,misc"
import pathlib
from fastapi import FastAPI, Request, Response
from fastapi.staticfiles import StaticFiles
import fastapi.responses

# Define the FastAPI app
app = FastAPI()


def create_frontend_router(build_dir="../frontend/dist"):
    """Create a router to serve the React frontend.

    Args:
        build_dir: Path to the React build directory, relative to the backend directory.

    Returns:
        A FastAPI sub-application serving the frontend, or a fallback route that
        returns 503 when the build directory is missing or incomplete.
    """
    build_path = pathlib.Path(__file__).parent.parent.parent / build_dir
    static_files_path = build_path / "assets"  # Vite uses 'assets' subdir

    if not build_path.is_dir() or not (build_path / "index.html").is_file():
        print(
            f"WARN: Frontend build directory not found or incomplete at {build_path}. Serving frontend will likely fail."
        )
        # Return a dummy router if build isn't ready
        from starlette.routing import Route

        async def dummy_frontend(request):
            return Response(
                "Frontend not built. Run 'npm run build' in the frontend directory.",
                media_type="text/plain",
                status_code=503,
            )

        return Route("/{path:path}", endpoint=dummy_frontend)

    build_dir = pathlib.Path(build_dir)

    react = FastAPI(openapi_url="")
    react.mount(
        "/assets", StaticFiles(directory=static_files_path), name="static_assets"
    )

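    @react.get("/{path:path}")
    async def handle_catch_all(request: Request, path: str):
        # Resolve the requested path and serve it only if it stays inside the
        # build directory; anything else falls back to the SPA entry point.
        root = build_path.resolve()
        fp = (root / path).resolve()
        if not fp.is_relative_to(root) or not fp.is_file():
            fp = root / "index.html"
        return fastapi.responses.FileResponse(fp)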

    return react


# Mount the frontend under /app to not conflict with the LangGraph API routes
app.mount(
    "/app",
    create_frontend_router(),
    name="frontend",
)


================================================
FILE: backend/src/agent/configuration.py
================================================
import os
from pydantic import BaseModel, Field
from typing import Any, Optional

from langchain_core.runnables import RunnableConfig


class Configuration(BaseModel):
    """The configuration for the agent."""

    query_generator_model: str = Field(
        default="gemini-2.5-flash-preview-04-17",
        description="The name of the language model to use for the agent's query generation.",
    )

    reflection_model: str = Field(
        default="gemini-2.5-flash-preview-04-17",
        description="The name of the language model to use for the agent's reflection.",
    )

    answer_model: str = Field(
        default="gemini-2.5-flash-preview-04-17",
        description="The name of the language model to use for the agent's answer.",
    )

    number_of_initial_queries: int = Field(
        default=6,
        description="The number of initial search queries to generate.",
    )

    max_research_loops: int = Field(
        default=8,
        description="The maximum number of research loops to perform.",
    )

    @classmethod
    def from_runnable_config(
        cls, config: Optional[RunnableConfig] = None
    ) -> "Configuration":
        """Create a Configuration instance from a RunnableConfig."""
        configurable = (
            config["configurable"] if config and "configurable" in config else {}
        )

        # Get raw values from environment or config
        raw_values: dict[str, Any] = {
            name: os.environ.get(name.upper(), configurable.get(name))
            for name in cls.model_fields.keys()
        }

        # Filter out None values
        values = {k: v for k, v in raw_values.items() if v is not None}

        return cls(**values)


================================================
FILE: backend/src/agent/content_enhancement_decision.py
================================================
"""
Intelligent content enhancement decision module: decides when to use Firecrawl for deep content scraping.
"""

import os
from typing import Dict, List, Any, Optional
from dataclasses import dataclass
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.runnables import RunnableConfig
from firecrawl import FirecrawlApp

@dataclass
class EnhancementDecision:
    """Result of a content enhancement decision."""
    needs_enhancement: bool
    priority_urls: List[Dict[str, Any]]
    reasoning: str
    confidence_score: float  # 0-1
    enhancement_type: str  # "none", "selective", "comprehensive"


class ContentEnhancementDecisionMaker:
    """Intelligent content enhancement decision maker, similar to the reflection mechanism."""
    
    def __init__(self):
        self.firecrawl_app = None
        if os.getenv("FIRECRAWL_API_KEY"):
            self.firecrawl_app = FirecrawlApp(api_key=os.getenv("FIRECRAWL_API_KEY"))
    
    def analyze_enhancement_need(
        self, 
        research_topic: str,
        current_findings: List[str],
        grounding_sources: List[Dict[str, Any]],
        config: RunnableConfig
    ) -> EnhancementDecision:
        """Decide whether the research needs content enhancement, using an LLM.

        Similar to the reflection mechanism: the LLM assesses the quality of the
        current research and decides whether deep scraping is needed.
        """

        # Build the analysis prompt
        analysis_prompt = self._build_analysis_prompt(
            research_topic, current_findings, grounding_sources
        )

        # Let the LLM make the judgment
        from agent.configuration import Configuration
        configurable = Configuration.from_runnable_config(config)

        llm = ChatGoogleGenerativeAI(
            model=configurable.reflection_model,  # Same model as the reflection step
            temperature=0.3,  # Low temperature for more consistent decisions
            max_retries=2,
            api_key=os.getenv("GEMINI_API_KEY"),
        )

        response = llm.invoke(analysis_prompt)
        decision_text = response.content if hasattr(response, 'content') else str(response)

        # Parse the LLM's decision
        return self._parse_llm_decision(decision_text, grounding_sources)
    
    def _build_analysis_prompt(
        self, 
        research_topic: str, 
        current_findings: List[str], 
        grounding_sources: List[Dict[str, Any]]
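    ) -> str:
        """Build the analysis prompt."""

        findings_summary = "\n---\n".join(current_findings[-3:])  # The three most recent findings

        sources_list = "\n".join([
            f"- {source.get('title', 'N/A')}: {source.get('url', 'N/A')}"
            for source in grounding_sources[:5]  # The first five sources
        ])

        return f"""You are an expert in assessing research quality. Analyze the quality of the current research results and decide whether deep content enhancement is needed.

Research topic: {research_topic}

Current research findings:
{findings_summary}

Available sources:
{sources_list}

Evaluate against the following criteria:

1. **Signs of insufficient depth**:
   - No concrete data, statistics, or case studies
   - Descriptions are too general and lack technical detail
   - Important companies, projects, or implementations are not mentioned
   - Sources are of low quality (non-authoritative sites)

2. **When deep scraping is needed**:
   - The research topic requires detailed technical explanation
   - The current results lack key supporting data
   - An authoritative source exists but its content is truncated
   - A complete report or study needs to be retrieved

3. **Value of the current sources**:
   - Official sites/documentation: high value
   - Academic papers/research reports: high value
   - Wikipedia/encyclopedias: medium value
   - News coverage: depends on the level of detail
   - Blogs/forums: low value

Answer in the following format:

**Decision**: [ENHANCE/NO_ENHANCE]
**Confidence**: [0.1-1.0]
**Enhancement type**: [selective/comprehensive/none]
**Recommended number of URLs**: [0-3]
**Reasoning**:
[Explain your judgment in detail, including the shortcomings of the current content and the expected improvement]

**Priority URLs** (if enhancement is needed):
[Choose the URLs from the sources that are most worth deep scraping, in priority order]
"""

    def _parse_llm_decision(
        self,
        decision_text: str,
        grounding_sources: List[Dict[str, Any]]
    ) -> EnhancementDecision:
        """Parse the LLM's decision."""
        import re

        # Match case-insensitively, but keep the original text for the reasoning field
        normalized = decision_text.lower()

        # Parse the basic decision from the labelled line (English or Chinese label)
        decision_match = re.search(
            r"(?:decision|决策)\W*(no[_\s-]?enhance|enhance)\b", normalized
        )
        if decision_match:
            needs_enhancement = decision_match.group(1) == "enhance"
        else:
            # Fall back to a whole-word search when the labelled line is missing
            needs_enhancement = (
                re.search(r"\benhance\b", normalized) is not None
                and re.search(r"no[_\s-]?enhance", normalized) is None
            )

        # Parse the confidence score, clamped to [0, 1]
        confidence_score = 0.5  # Default when no value can be parsed
        confidence_match = re.search(
            r"(?:confidence|置信度)[^\d\n]*?([01](?:\.\d+)?)", normalized
        )
        if confidence_match:
            confidence_score = min(max(float(confidence_match.group(1)), 0.0), 1.0)

        # Parse the enhancement type; a "no enhance" decision always maps to "none"
        enhancement_type = "none"
        if needs_enhancement:
            if "selective" in normalized:
                enhancement_type = "selective"
            elif "comprehensive" in normalized:
                enhancement_type = "comprehensive"
            else:
                enhancement_type = "selective"  # Default to selective enhancement

        # Choose priority URLs (simplified; could later be delegated to the LLM)
        priority_urls = []
        if needs_enhancement and grounding_sources:
            # Simple priority heuristic
            scored_sources = []
            for source in grounding_sources:
                score = self._calculate_url_priority(source)
                scored_sources.append((source, score))

            # Sort by score and keep the top 2-3
            scored_sources.sort(key=lambda x: x[1], reverse=True)
            max_urls = 3 if enhancement_type == "comprehensive" else 2

            priority_urls = [
                {
                    "title": source.get("title", ""),
                    "url": source.get("url", ""),
                    "priority_score": score,
                    "reasoning": f"Score: {score:.2f}"
                }
                for source, score in scored_sources[:max_urls]
                if score > 0.3  # Keep only the higher-scoring sources
            ]

        return EnhancementDecision(
            needs_enhancement=needs_enhancement,
            priority_urls=priority_urls,
            reasoning=decision_text,
            confidence_score=confidence_score,
            enhancement_type=enhancement_type
        )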
    
    def _calculate_url_priority(self, source: Dict[str, Any]) -> float:
        """Compute a priority score for a URL."""
        score = 0.0

        url = source.get("url", "").lower()
        title = source.get("title", "").lower()

        # Official sites and documentation
        if any(domain in url for domain in [".gov", ".edu", ".org"]):
            score += 0.4

        # Well-known platforms
        if any(platform in url for platform in ["wikipedia", "arxiv", "ieee", "acm"]):
            score += 0.3

        # Indicators of technical content
        if any(keyword in title for keyword in ["report", "study", "research", "analysis", "technical"]):
            score += 0.2

        # Company websites
        if any(company in url for company in ["google", "microsoft", "amazon", "tesla", "nvidia"]):
            score += 0.2

        # Base score
        score += 0.1

        return min(score, 1.0)
    
    async def enhance_content_with_firecrawl(
        self, 
        priority_urls: List[Dict[str, Any]]
    ) -> List[Dict[str, Any]]:
        """Enhance content using Firecrawl."""
        
        if not self.firecrawl_app:
            return []
        
        enhanced_results = []
        
        for url_info in priority_urls:
            url = url_info.get("url")
            if not url:
                continue
            
            try:
                print(f"🔥 Firecrawl enhancement: {url_info.get('title', 'Unknown')}")
                
                result = self.firecrawl_app.scrape_url(url)
                
                if result and result.success:
                    markdown_content = result.markdown or ''
                    
                    enhanced_results.append({
                        "url": url,
                        "title": url_info.get("title", ""),
                        "original_priority": url_info.get("priority_score", 0),
                        "enhanced_content": markdown_content,
                        "content_length": len(markdown_content),
                        "enhancement_quality": self._assess_enhancement_quality(markdown_content),
                        "source_type": "firecrawl_enhanced"
                    })
                    
                    print(f"  ✅ Enhancement succeeded: {len(markdown_content)} characters")
                else:
                    print(f"  ❌ Enhancement failed: {result.error if hasattr(result, 'error') else 'unknown error'}")

            except Exception as e:
                print(f"  ❌ Enhancement raised an exception: {str(e)}")
                continue
        
        return enhanced_results
    
    def _assess_enhancement_quality(self, content: str) -> str:
        """Assess the quality of the enhanced content."""
        if not content:
            return "poor"
        
        length = len(content)
        has_data = any(char.isdigit() for char in content)
        has_structure = any(marker in content for marker in ['#', '##', '###'])
        
        if length > 5000 and has_data and has_structure:
            return "excellent"
        elif length > 1000 and (has_data or has_structure):
            return "good"
        elif length > 300:
            return "fair"
        else:
            return "poor"


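# Lazy initialization helper, to avoid circular imports
def get_content_enhancement_decision_maker():
    """Return the shared decision-maker instance, creating it on first use."""
    if not hasattr(get_content_enhancement_decision_maker, '_instance'):
        get_content_enhancement_decision_maker._instance = ContentEnhancementDecisionMaker()
    return get_content_enhancement_decision_maker._instance

# Original global name kept for backward compatibility. Nothing in this module
# assigns it, so call get_content_enhancement_decision_maker() to get the instance.
content_enhancement_decision_maker = None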

================================================
FILE: backend/src/agent/enhanced_graph_nodes.py
================================================
"""
Enhanced graph nodes: integrate intelligent Firecrawl content enhancement.
"""

import os
import json
from typing import List, Dict, Any
from datetime import datetime
from langchain_core.runnables import RunnableConfig
from langchain_core.messages import AIMessage

from agent.state import OverallState, ReflectionState
from agent.content_enhancement_decision import (
    get_content_enhancement_decision_maker,
    EnhancementDecision
)
from agent.utils import get_research_topic


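def content_enhancement_analysis(state: OverallState, config: RunnableConfig) -> dict:
    """Content enhancement analysis node: decide whether deep scraping with Firecrawl is needed.

    This node:
    1. Analyzes the quality of the current research results
    2. Assesses whether deep content enhancement is needed
    3. Selects priority URLs for Firecrawl to scrape
    4. Runs the content enhancement (if needed)
    5. Merges the enhanced content into the research results
    """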
    
    try:
        # Get the current research context
        plan = state.get("plan", [])
        current_pointer = state.get("current_task_pointer", 0)

        # Determine the research topic
        if plan and current_pointer < len(plan):
            research_topic = plan[current_pointer]["description"]
        else:
            research_topic = state.get("user_query") or get_research_topic(state["messages"])

        # Get the current research findings
        current_findings = state.get("web_research_result", [])

        # Collect grounding sources from the most recent search results
        grounding_sources = []
        sources_gathered = state.get("sources_gathered", [])
        for source in sources_gathered[-10:]:  # The 10 most recent sources
            if isinstance(source, dict):
                grounding_sources.append({
                    "title": source.get("title", ""),
                    "url": source.get("url", ""),
                    "snippet": source.get("snippet", "")
                })

        print("🤔 Analyzing content enhancement needs...")
        print(f"  Research topic: {research_topic}")
        print(f"  Current findings: {len(current_findings)}")
        print(f"  Available sources: {len(grounding_sources)}")

        # Run the analysis with the decision maker
        decision = get_content_enhancement_decision_maker().analyze_enhancement_need(
            research_topic=research_topic,
            current_findings=current_findings,
            grounding_sources=grounding_sources,
            config=config
        )
        
        print("📊 Enhancement decision:")
        print(f"  Needs enhancement: {decision.needs_enhancement}")
        print(f"  Confidence: {decision.confidence_score:.2f}")
        print(f"  Enhancement type: {decision.enhancement_type}")
        print(f"  Priority URLs: {len(decision.priority_urls)}")

        # Save the decision to the state
        state_update = {
            "enhancement_decision": {
                "needs_enhancement": decision.needs_enhancement,
                "confidence_score": decision.confidence_score,
                "enhancement_type": decision.enhancement_type,
                "reasoning": decision.reasoning,
                "priority_urls": decision.priority_urls
            }
        }
        
        # Return early if no enhancement is needed
        if not decision.needs_enhancement:
            print("✅ Current content is sufficient; no enhancement needed")
            state_update["enhancement_status"] = "skipped"
            return state_update

        # Skip enhancement if there is no Firecrawl API key
        if not get_content_enhancement_decision_maker().firecrawl_app:
            print("⚠️ FIRECRAWL_API_KEY is missing; skipping content enhancement")
            state_update["enhancement_status"] = "skipped_no_api"
            return state_update

        # Run the content enhancement
        print("🔥 Running Firecrawl content enhancement...")
        enhanced_results = []

        # Synchronous calls for now; could be made asynchronous later
        for url_info in decision.priority_urls:
            url = url_info.get("url")
            if not url:
                continue
            
            try:
                print(f"  Scraping: {url_info.get('title', 'Unknown')}")
                
                result = get_content_enhancement_decision_maker().firecrawl_app.scrape_url(url)
                
                if result and result.success:
                    markdown_content = result.markdown or ''
                    
                    enhanced_results.append({
                        "url": url,
                        "title": url_info.get("title", ""),
                        "original_priority": url_info.get("priority_score", 0),
                        "enhanced_content": markdown_content,
                        "content_length": len(markdown_content),
                        "source_type": "firecrawl_enhanced",
                        "timestamp": datetime.now().isoformat()
                    })
                    
                    print(f"    ✅ Success: {len(markdown_content)} characters")
                else:
                    print(f"    ❌ Failed: {result.error if hasattr(result, 'error') else 'unknown error'}")

            except Exception as e:
                print(f"    ❌ Exception: {str(e)}")
                continue
        
        if enhanced_results:
            # Add the enhanced content to the research results
            enhanced_contents = []
            for result in enhanced_results:
                # Format the enhanced content
                formatted_content = f"""

## Deep Content Enhancement - {result['title']}

Source: {result['url']}
Content length: {result['content_length']} characters

{result['enhanced_content'][:3000]}{'...' if len(result['enhanced_content']) > 3000 else ''}

---
"""
                enhanced_contents.append(formatted_content)

            state_update.update({
                "enhanced_content_results": enhanced_results,
                "web_research_result": enhanced_contents,  # Added to the research results
                "enhancement_status": "completed",
                "enhanced_sources_count": len(enhanced_results)
            })

            print(f"✅ Content enhancement complete: {len(enhanced_results)} sources")
        else:
            print("❌ Content enhancement failed; no content was scraped")
            state_update["enhancement_status"] = "failed"
        
        return state_update
        
    except Exception as e:
        error_message = f"Content enhancement analysis node failed: {str(e)}"
        print(f"❌ {error_message}")
        return {
            "enhancement_status": "error",
            "enhancement_error": error_message
        }


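def should_enhance_content(state: OverallState) -> str:
    """Conditional-edge function that decides whether to enter the content enhancement flow.

    Enhancement is attempted only when all of the following hold:
    1. A Firecrawl API key is configured
    2. At least one research loop has completed
    3. Enhancement has not already been completed or skipped
    4. There is at least one existing research finding
    """

    # Check Firecrawl availability
    if not os.getenv("FIRECRAWL_API_KEY"):
        print("⚠️ Skipping content enhancement: FIRECRAWL_API_KEY is not configured")
        return "continue_without_enhancement"

    # Check the research loop count (avoid enhancing in early loops)
    research_loop_count = state.get("research_loop_count", 0)
    if research_loop_count < 1:  # Do at least one round of research before considering enhancement
        print(f"⚠️ Skipping content enhancement: not enough research loops ({research_loop_count})")
        return "continue_without_enhancement"

    # Check whether enhancement has already run (avoid repeating it)
    if state.get("enhancement_status") in ["completed", "skipped"]:
        print("⚠️ Skipping content enhancement: already completed or skipped")
        return "continue_without_enhancement"

    # Check the number of findings (some baseline content is required)
    current_findings = state.get("web_research_result", [])
    if len(current_findings) < 1:
        print("⚠️ Skipping content enhancement: no baseline research content")
        return "continue_without_enhancement"

    print("✅ Enhancement conditions met; entering content enhancement analysis")
    return "analyze_enhancement_need"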


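def enhanced_reflection(state: OverallState, config: RunnableConfig) -> ReflectionState:
    """Enhanced reflection node: the original reflection, adjusted for content enhancement results."""

    # Run the original reflection logic first
    from agent.graph import reflection
    reflection_result = reflection(state, config)

    # If content was enhanced, adjust the reflection verdict
    enhancement_status = state.get("enhancement_status")
    enhanced_sources_count = state.get("enhanced_sources_count", 0)

    if enhancement_status == "completed" and enhanced_sources_count > 0:
        print("📈 Content enhancement completed; adjusting the reflection verdict")
        print(f"  Enhanced {enhanced_sources_count} sources")

        # Successfully enhanced content makes it more likely that the information is
        # sufficient, but the LLM's verdict still carries weight
        if not reflection_result["is_sufficient"]:
            # Give the enhanced content a "bonus"
            enhancement_boost = min(enhanced_sources_count * 0.3, 0.8)
            print(f"  Raising the sufficiency assessment because of content enhancement (+{enhancement_boost:.1f})")

            # With a large enough boost (two or more enhanced sources),
            # flip "insufficient" to "sufficient"
            if enhancement_boost >= 0.6:
                print("  ✅ Information judged sufficient based on the enhancement results")
                reflection_result["is_sufficient"] = True
                reflection_result["knowledge_gap"] = "Content has been sufficiently supplemented through deep scraping"

    elif enhancement_status == "skipped":
        print("📝 Content enhancement was skipped; using the original reflection result")

    elif enhancement_status == "failed":
        print("⚠️ Content enhancement failed; more research loops may be needed")

    return reflection_result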


# Helper: format an enhancement decision for logging
def format_enhancement_decision_log(decision: EnhancementDecision) -> str:
    """Format an enhancement decision for log output."""

    log_lines = [
        "📊 Content enhancement decision report:",
        f"  Decision: {'enhance' if decision.needs_enhancement else 'no enhancement needed'}",
        f"  Confidence: {decision.confidence_score:.2f}",
        f"  Enhancement type: {decision.enhancement_type}",
        f"  Priority URLs: {len(decision.priority_urls)}"
    ]

    if decision.priority_urls:
        log_lines.append("  Priority URLs:")
        for i, url_info in enumerate(decision.priority_urls, 1):
            log_lines.append(f"    {i}. {url_info.get('title', 'N/A')} (score: {url_info.get('priority_score', 0):.2f})")

    log_lines.append(f"  Reasoning: {decision.reasoning[:200]}...")

    return "\n".join(log_lines)

================================================
FILE: backend/src/agent/graph.py
================================================
import os
import json
from typing import List
from datetime import datetime

from agent.tools_and_schemas import SearchQueryList, Reflection, ResearchPlan, LedgerEntry
from dotenv import load_dotenv
from langchain_core.messages import AIMessage
from langgraph.types import Send
from langgraph.graph import StateGraph
from langgraph.graph import START, END
from langchain_core.runnables import RunnableConfig
from google.genai import Client
import tiktoken  # Make sure tiktoken is installed in the environment

from agent.state import (
    OverallState,
    QueryGenerationState,
    ReflectionState,
    WebSearchState,
)
from agent.configuration import Configuration
from agent.prompts import (
    get_current_date,
    query_writer_instructions,
    web_searcher_instructions,
    reflection_instructions,
    answer_instructions,
    planning_instructions,
    integrated_report_instructions,
)
from langchain_google_genai import ChatGoogleGenerativeAI
from agent.utils import (
    get_citations,
    get_research_topic,
    insert_citation_markers,
    resolve_urls,
)
# Import intelligent content enhancement modules
from agent.enhanced_graph_nodes import (
    content_enhancement_analysis,
    should_enhance_content
)

load_dotenv()

if os.getenv("GEMINI_API_KEY") is None:
    raise ValueError("GEMINI_API_KEY is not set")

# Used for Google Search API
genai_client = Client(api_key=os.getenv("GEMINI_API_KEY"))


# Nodes
def generate_query(state: OverallState, config: RunnableConfig) -> QueryGenerationState:
    """LangGraph node that generates search queries based on the current research task from the plan."""
    configurable = Configuration.from_runnable_config(config)

    # check for custom initial search query count
    if state.get("initial_search_query_count") is None:
        state["initial_search_query_count"] = configurable.number_of_initial_queries

    # Initialize the configured query-generator model (a Gemini model by default)
    llm = ChatGoogleGenerativeAI(
        model=configurable.query_generator_model,
        temperature=1.0,
        max_retries=2,
        api_key=os.getenv("GEMINI_API_KEY"),
    )
    structured_llm = llm.with_structured_output(SearchQueryList)

    # New logic: prioritize generating queries based on current plan task
    plan = state.get("plan")
    pointer = state.get("current_task_pointer")
    if plan and pointer is not None and pointer < len(plan):
        research_topic = plan[pointer]["description"]
    else:
        # Fallback to user_query or messages
        research_topic = state.get("user_query") or get_research_topic(state["messages"])

    current_date = get_current_date()
    formatted_prompt = query_writer_instructions.format(
        current_date=current_date,
        research_topic=research_topic,
        number_queries=state["initial_search_query_count"],
    )
    result = structured_llm.invoke(formatted_prompt)
    
    return {
        "query_list": result.query,
        "plan": state.get("plan", []),
        "current_task_pointer": state.get("current_task_pointer", 0)
    }


def continue_to_web_research(state: QueryGenerationState):
    """LangGraph node that sends the search queries to the web research node.

    This is used to spawn n number of web research nodes, one for each search query.
    """
    # Get current task info
    plan = state.get("plan", [])
    current_pointer = state.get("current_task_pointer", 0)
    current_task_id = "unknown"
    
    if plan and current_pointer < len(plan):
        current_task_id = plan[current_pointer]["id"]
    
    return [
        Send("web_research", {
            "search_query": search_query, 
            "id": int(idx),
            "current_task_id": current_task_id
        })
        for idx, search_query in enumerate(state["query_list"])
    ]


def web_research(state: WebSearchState, config: RunnableConfig) -> OverallState:
    """LangGraph node that performs web research using the native Google Search API tool.

    Executes a web search using the native Google Search API tool in combination with the configured Gemini model.

    Args:
        state: Current graph state containing the search query and research loop count
        config: Configuration for the runnable, including search API settings

    Returns:
        Dictionary with state update, including sources_gathered, research_loop_count, and web_research_result
    """
    try:
        # Configure
        configurable = Configuration.from_runnable_config(config)
        formatted_prompt = web_searcher_instructions.format(
            current_date=get_current_date(),
            research_topic=state["search_query"],
        )

        # Uses the google genai client as the langchain client doesn't return grounding metadata
        response = genai_client.models.generate_content(
            model=configurable.query_generator_model,
            contents=formatted_prompt,
            config={
                "tools": [{"google_search": {}}],
                "temperature": 0,
            },
        )
        
        # Error handling for empty response
        if not response.candidates or not response.candidates[0].grounding_metadata:
            current_task_id = state.get("current_task_id", "unknown")
            error_content = f"No results found for query: {state['search_query']}"
            
            detailed_finding = {
                "task_id": current_task_id,
                "query_id": state["id"],
                "content": error_content,
                "source": None,
                "timestamp": datetime.now().isoformat()
            }
            
            task_specific_result = {
                "task_id": current_task_id,
                "content": error_content,
                "sources": [],
                "timestamp": datetime.now().isoformat()
            }
            
            return {
                "sources_gathered": [],
                "executed_search_queries": [state["search_query"]],
                "web_research_result": [error_content],
                "current_task_detailed_findings": [detailed_finding],
                "task_specific_results": [task_specific_result]
            }

        # resolve the urls to short urls for saving tokens and time
        resolved_urls = resolve_urls(
            response.candidates[0].grounding_metadata.grounding_chunks, state["id"]
        )
        
        # Gets the citations and adds them to the generated text
        citations = get_citations(response, resolved_urls)
        modified_text = insert_citation_markers(response.text, citations)
        sources_gathered = [item for citation in citations for item in citation["segments"]]

        # Create detailed findings entry with task ID
        current_task_id = state.get("current_task_id", "unknown")
        detailed_finding = {
            "task_id": current_task_id,
            "query_id": state["id"],
            "content": modified_text,
            "source": sources_gathered[0] if sources_gathered else None,
            "timestamp": datetime.now().isoformat()
        }

        # Add task-specific metadata to the research result
        task_specific_result = {
            "task_id": current_task_id,
            "content": modified_text,
            "sources": sources_gathered,
            "timestamp": datetime.now().isoformat()
        }

        return {
            "sources_gathered": sources_gathered,
            "executed_search_queries": [state["search_query"]],
            "web_research_result": [modified_text],
            "current_task_detailed_findings": [detailed_finding],
            "task_specific_results": [task_specific_result]
        }
    except Exception as e:
        # Error handling for API or processing errors
        current_task_id = state.get("current_task_id", "unknown")
        error_message = f"Error during web research: {str(e)}"
        
        detailed_finding = {
            "task_id": current_task_id,
            "query_id": state["id"],
            "content": error_message,
            "source": None,
            "timestamp": datetime.now().isoformat()
        }
        
        task_specific_result = {
            "task_id": current_task_id,
            "content": error_message,
            "sources": [],
            "timestamp": datetime.now().isoformat()
        }
        
        return {
            "sources_gathered": [],
            "executed_search_queries": [state["search_query"]],
            "web_research_result": [error_message],
            "current_task_detailed_findings": [detailed_finding],
            "task_specific_results": [task_specific_result]
        }


def reflection(state: OverallState, config: RunnableConfig) -> OverallState:
    """LangGraph node that identifies knowledge gaps and generates potential follow-up queries.

    This is where we check if our search results are sufficient to answer the research question.
    If not, we generate follow-up queries to address the knowledge gap.
    """
    try:
        configurable = Configuration.from_runnable_config(config)
        
        # Increment research loop counter
        state["research_loop_count"] = state.get("research_loop_count", 0) + 1
        
        reasoning_model = configurable.reasoning_model
        current_date = get_current_date()
        research_topic = get_research_topic(state["messages"])
        
        # Safely retrieve web research results and truncate overly long content
        web_research_results = state.get("web_research_result", [])
        
        # Content truncation: limit total characters to avoid API limits
        MAX_CHARS = 50000  # Approximately 12500 tokens
        truncated_results = []
        total_chars = 0
        
        for result in web_research_results:
            result_str = str(result)
            if total_chars + len(result_str) <= MAX_CHARS:
                truncated_results.append(result_str)
                total_chars += len(result_str)
            else:
                # Partially truncate the last result
                remaining_chars = MAX_CHARS - total_chars
                if remaining_chars > 500:  # Keep at least 500 characters
                    truncated_results.append(result_str[:remaining_chars] + "...[truncated]")
                break
        
        print(f"🔍 Reflection analysis: {len(web_research_results)} results, {len(truncated_results)} after truncation, {total_chars} characters")
        
        formatted_prompt = reflection_instructions.format(
            current_date=current_date,
            research_topic=research_topic,
            summaries="\n\n---\n\n".join(truncated_results),
        )
        
        # Check prompt length
        prompt_length = len(formatted_prompt)
        print(f"📏 Reflection prompt length: {prompt_length} characters")
        
        if prompt_length > 100000:  # If still too long, further truncate
            print("⚠️ Prompt too long, further truncating summaries section")
            truncated_summaries = "\n\n---\n\n".join(truncated_results[:3])  # Keep only first 3 results
            formatted_prompt = reflection_instructions.format(
                current_date=current_date,
                research_topic=research_topic,
                summaries=truncated_summaries,
            )
        
        # Initialize LLM
        llm = ChatGoogleGenerativeAI(
            model=reasoning_model,
            temperature=1.0,
            max_retries=3,  # Increase retry count
            api_key=os.getenv("GEMINI_API_KEY"),
        )
        
        # Try structured output
        try:
            print("🤖 Calling Gemini API for reflection analysis...")
            result = llm.with_structured_output(Reflection).invoke(formatted_prompt)
            print("✅ Reflection analysis completed successfully")
            
        except Exception as api_error:
            print(f"❌ Structured output failed: {str(api_error)}")
            print("🔄 Trying fallback approach...")
            
            # Fallback: use simple text generation instead of structured output
            simple_prompt = f"""Based on the research topic: {research_topic}
            
Research results summary: {len(truncated_results)} sources analyzed.

Please evaluate if this research is sufficient and respond in this exact JSON format:
{{
  "is_sufficient": true,
  "knowledge_gap": "Research appears comprehensive based on available sources",
  "follow_up_queries": []
}}

Important: Respond only with valid JSON."""
            
            try:
                fallback_response = llm.invoke(simple_prompt)
                import json
                # Try to parse the JSON response
                response_text = fallback_response.content if hasattr(fallback_response, 'content') else str(fallback_response)
                # Extract the JSON portion
                import re
                json_match = re.search(r'\{.*\}', response_text, re.DOTALL)
                if json_match:
                    result_dict = json.loads(json_match.group())
                    # Build the Reflection object
                    result = Reflection(
                        is_sufficient=result_dict.get("is_sufficient", True),
                        knowledge_gap=result_dict.get("knowledge_gap", "Analysis completed with available data"),
                        follow_up_queries=result_dict.get("follow_up_queries", [])
                    )
                    print("✅ Fallback approach succeeded")
                else:
                    raise ValueError("Unable to parse JSON response")
                    
            except Exception as fallback_error:
                print(f"❌ Fallback approach also failed: {str(fallback_error)}")
                print("🛡️ Using default reflection result")
                
                # Final fallback: a simple decision based on the number of results
                has_sufficient_results = len(web_research_results) >= 3
                result = Reflection(
                    is_sufficient=has_sufficient_results,
                    knowledge_gap="Analysis completed with available research data" if has_sufficient_results else "Limited research data available",
                    follow_up_queries=[] if has_sufficient_results else [f"additional information about {research_topic}"]
                )
                print(f"🛡️ Default decision: sufficient={has_sufficient_results}, based on {len(web_research_results)} search results")

    except Exception as e:
        error_message = f"Reflection node encountered critical error: {str(e)}"
        print(f"💥 {error_message}")
        
        # Emergency fallback: always consider current results sufficient to avoid flow interruption
        result = Reflection(
            is_sufficient=True,
            knowledge_gap="Analysis completed despite technical difficulties",
            follow_up_queries=[]
        )
        print("🚨 Using emergency fallback, marking as sufficient to continue flow")

    # Return updated state with reflection results
    return {
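        # Use .get() so an error raised before the counter was incremented cannot cause a KeyError here
        "research_loop_count": state.get("research_loop_count", 0),
        "reflection_is_sufficient": result.is_sufficient,  # Stores the reflection verdict
        "reflection_knowledge_gap": result.knowledge_gap,  # Stores the identified knowledge gap
        "reflection_follow_up_queries": result.follow_up_queries,  # Stores the follow-up queries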
        "number_of_ran_queries": len(state.get("executed_search_queries", [])),
        "plan": state.get("plan", []),
        "current_task_pointer": state.get("current_task_pointer", 0)
    }


def evaluate_research_enhanced(state: OverallState, config: RunnableConfig) -> dict:
    """
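    Enhanced research evaluation node that writes the evaluation results to state.
    
    This function only updates state; it makes no routing decisions.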
    """
    configurable = Configuration.from_runnable_config(config)
    
    # Read the reflection results
    research_loop_count = state.get("research_loop_count", 0)
    max_research_loops = configurable.max_research_loops
    reflection_is_sufficient = state.get("reflection_is_sufficient", False)
    reflection_follow_up_queries = state.get("reflection_follow_up_queries", [])
    
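    # Check whether enhancement has completed and how effective it was
    enhancement_status = state.get("enhancement_status")
    enhanced_sources_count = state.get("enhanced_sources_count", 0)
    
    # Decision: weigh the reflection result against the enhancement outcome
    is_sufficient = reflection_is_sufficient
    
    # If reflection judged the results insufficient but content enhancement succeeded, re-evaluate
    if not is_sufficient and enhancement_status == "completed" and enhanced_sources_count > 0:
        print(f"📈 Content enhancement completed ({enhanced_sources_count} sources), raising the sufficiency assessment")
        # Give enhanced content a bonus: 0.3 per source, capped at 0.8
        enhancement_boost = min(enhanced_sources_count * 0.3, 0.8)
        if enhancement_boost >= 0.6:
            print("  ✅ Information judged sufficient based on content enhancement results")
            is_sufficient = True
    
    # Prepare follow-up queries (in case research needs to continue)
    follow_up_queries = reflection_follow_up_queries or []
    if not follow_up_queries and not is_sufficient:
        # No follow-up queries but the information is insufficient: generate a simple query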
        plan = state.get("plan", [])
        current_pointer = state.get("current_task_pointer", 0)
        if plan and current_pointer < len(plan):
            task_description = plan[current_pointer]["description"]
            follow_up_queries = [f"more details about {task_description}"]
    
    # Record the evaluation results in state
    final_decision = is_sufficient or research_loop_count >= max_research_loops
    
    print(f"🏁 Research evaluation complete - sufficient: {is_sufficient}, loops: {research_loop_count}/{max_research_loops}")
    if enhancement_status == "completed":
        print(f"  🔥 This round included Firecrawl content enhancement: {enhanced_sources_count} sources")
    
    return {
        "evaluation_is_sufficient": is_sufficient,
        "evaluation_should_continue": not final_decision,
        "evaluation_follow_up_queries": follow_up_queries,
        "evaluation_research_complete": final_decision,
        "evaluation_enhancement_boost": enhanced_sources_count if enhancement_status == "completed" else 0
    }


def decide_next_research_step(state: OverallState):
    """
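    Conditional edge function that decides whether research is complete or should continue.
    Returns either a string route or a list of Send objects.
    """
    # Read the evaluation results from state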
    should_continue = state.get("evaluation_should_continue", False)
    research_complete = state.get("evaluation_research_complete", False)
    
    if research_complete or not should_continue:
        print("🏁 Research flow complete, recording task results")
        return "record_task_completion"
    else:
        print("🔄 Continuing research with follow-up queries")
        # Build Send objects for the follow-up queries
        follow_up_queries = state.get("evaluation_follow_up_queries", [])
        
        if not follow_up_queries:
            print("⚠️ No follow-up queries, finishing directly")
            return "record_task_completion"
        
        # Get current task info for follow-up research
        plan = state.get("plan", [])
        current_pointer = state.get("current_task_pointer", 0)
        current_task_id = "unknown"
        
        if plan and current_pointer < len(plan):
            current_task_id = plan[current_pointer]["id"]
        
        print(f"🔄 Generating {len(follow_up_queries)} follow-up queries")
        
        # Return the list of Send objects for the follow-up queries
        from langgraph.types import Send
        return [
            Send(
                "web_research",
                {
                    "search_query": follow_up_query,
                    "id": state.get("number_of_ran_queries", 0) + int(idx),
                    "current_task_id": current_task_id
                },
            )
            for idx, follow_up_query in enumerate(follow_up_queries)
        ]


def finalize_answer(state: OverallState, config: RunnableConfig) -> dict:
    """
    Generate the final research report using holistic integration of all research findings.
    
    OPTIMIZATION STRATEGY:
    This function implements a comprehensive refactor from the previous task-segmented approach
    to a unified holistic integration strategy. Instead of concatenating individual task sections,
    it synthesizes all research data through a single LLM call for coherent narrative flow.
    
    KEY IMPROVEMENTS:
    1. Cross-task data aggregation: Combines findings from all research streams
    2. Thematic organization: Structures content by analytical themes, not task boundaries  
    3. Executive-grade synthesis: Generates consulting-quality integrated reports
    4. Narrative coherence: Maintains unified strategic perspective throughout
    
    INPUT SOURCES:
    - Task-specific research results from ledger
    - Detailed research content from task_specific_results
    - Source attribution from sources_gathered
    - Original user query and research plan context
    
    OUTPUT:
    Unified professional research report with integrated analysis across all investigation areas.
    """
    try:
        configurable = Configuration.from_runnable_config(config)
        llm = ChatGoogleGenerativeAI(
            model=configurable.reflection_model,
            temperature=0.3,
            max_retries=2,
            api_key=os.getenv("GEMINI_API_KEY"),
        )
        
        plan = state.get("plan", [])
        user_query = state.get("user_query", "Research Analysis")
        
        if not plan:
            return {
                "messages": [AIMessage(content="No research plan available to generate report")],
                "final_report_markdown": "No research plan available to generate report"
            }
        
        # Build comprehensive research dataset from all sources
        ledger = state.get("ledger", [])
        task_specific_results = state.get("task_specific_results", [])
        sources_gathered = state.get("sources_gathered", [])
        
        # Create research plan summary for context
        research_plan_summary = "\n".join([
            f"• {task['description']}" for task in plan
        ])
        
        # Aggregate all research findings with proper attribution
        comprehensive_research_data = []
        
        # Add comprehensive findings from ledger with all available detail
        for entry in ledger:
            detailed_snippets = entry.get('detailed_snippets', [])
            citations = entry.get('citations_for_snippets', [])
            
            # Build comprehensive task context with all available information
            task_context = f"""
RESEARCH FOCUS: {entry['description']}
KEY FINDINGS: {entry['findings_summary']}

DETAILED RESEARCH CONTENT:
{chr(10).join(detailed_snippets)}

SUPPORTING CITATIONS:
{chr(10).join([f"- {cite.get('snippet', '')[:200]}... [Source: {cite.get('source', 'Unknown')}]" for cite in citations[:5]])}
"""
            comprehensive_research_data.append(task_context)
        
        # Add task-specific detailed results with enhanced context
        for result in task_specific_results:
            sources_info = ""
            if result.get('sources'):
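                # Entries in sources_gathered are read elsewhere via "label"/"value", so fall back to those keys
                sources_list = [f"- {source.get('title') or source.get('label', 'Unknown')} ({source.get('url') or source.get('value', 'N/A')})" 
                              for source in result.get('sources', [])[:3]]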
                sources_info = f"\nSOURCES:\n{chr(10).join(sources_list)}"
            
            task_detail = f"""
RESEARCH STREAM: {result.get('task_id', 'Unknown')}
CONTENT: {result.get('content', '')}
TIMESTAMP: {result.get('timestamp', '')}{sources_info}
"""
            comprehensive_research_data.append(task_detail)
        
        # Build source mapping for citation conversion
        source_mapping = build_source_mapping(sources_gathered)
        
        # Combine all research data
        # Parenthesize the separator so it is placed between every entry, not only at the start
        research_dataset = ("\n" + "="*80 + "\n").join(comprehensive_research_data)
        
        # Convert citations to readable format
        research_dataset = convert_citations_to_readable(research_dataset, source_mapping)
        
        # Apply token limits to prevent API overload
        research_dataset_batches = split_by_tokens([research_dataset], max_tokens=120000)
        final_research_data = "\n\n".join(research_dataset_batches[0]) if research_dataset_batches else ""
        
        # REPORT-LEVEL ENHANCEMENT: Analyze if additional targeted content is needed
        try:
            from agent.report_level_enhancement import integrate_report_enhancement_into_finalize
            
            # Convert sources_gathered to the format expected by report enhancement
            available_sources = []
            for source in sources_gathered:
                if isinstance(source, dict):
                    available_sources.append({
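                        # Fall back to the "label"/"value" keys that build_source_mapping reads
                        'title': source.get('title') or source.get('label', ''),
                        'url': source.get('url') or source.get('value', ''),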
                        'snippet': source.get('snippet', '')
                    })
            
            print("🎯 Starting report-level enhancement analysis...")
            enhanced_research_data, enhancement_results = integrate_report_enhancement_into_finalize(
                user_query=user_query,
                research_plan=plan,
                aggregated_research_data=final_research_data,
                available_sources=available_sources,
                config=config
            )
            
            # Use enhanced data if available
            final_research_data = enhanced_research_data
            
            # Log enhancement results
            successful_enhancements = [r for r in enhancement_results if r.success]
            if successful_enhancements:
                print(f"✅ Report-level enhancement succeeded: {len(successful_enhancements)} enhancements")
                for result in successful_enhancements:
                    print(f"   - Quality: {result.enhancement_quality}, sources: {len(result.sources_used)}")
            else:
                print("ℹ️  Report-level enhancement: not run or no effective enhancements")
                
        except Exception as e:
            print(f"⚠️ Report-level enhancement failed, continuing with original data: {str(e)}")
            # Continue with original data if enhancement fails
        
        # Generate integrated report using the enhanced holistic approach
        formatted_prompt = integrated_report_instructions.format(
            user_query=user_query,
            research_plan_summary=research_plan_summary,
            comprehensive_research_data=final_research_data
        )
        
        print(f"🔄 Generating integrated report for: {user_query}")
        print(f"📊 Research data length: {len(final_research_data)} characters")
        print(f"📋 Tasks integrated: {len(plan)} research streams")
        
        # Generate the final integrated report
        integrated_report = llm.invoke(formatted_prompt).content
        
        # Apply final quality improvements
        integrated_report = clean_generated_content(integrated_report)
        integrated_report = remove_prompt_remnants(integrated_report)
        integrated_report = final_quality_check(integrated_report)
        
        print(f"✅ Integrated report generated: {len(integrated_report)} characters")
        
        return {
            "messages": [AIMessage(content=integrated_report)],
            "final_report_markdown": integrated_report
        }
        
    except Exception as e:
        error_message = f"Error generating integrated report: {str(e)}"
        print(f"❌ {error_message}")
        return {
            "messages": [AIMessage(content=error_message)],
            "final_report_markdown": error_message
        }

def build_source_mapping(sources_gathered):
    """Build a source mapping used for citation conversion."""
    mapping = {}
    for i, source in enumerate(sources_gathered):
        # Extract domain from URL for readable citation
        original_url = source.get("value", "")
        domain = extract_domain(original_url)
        label = source.get("label", domain)
        
        # Create mapping for different citation formats
        short_url = source.get("short_url", "")
        if short_url:
            # Extract ID from short URL
            import re
            id_match = re.search(r'/id/([^/]+)', short_url)
            if id_match:
                citation_id = id_match.group(1)
                mapping[citation_id] = {
                    "label": label,
                    "domain": domain,
                    "value": original_url if original_url and not original_url.startswith('https://vertexaisearch') else ""
                }
        
        # Also try direct URL mapping if available
        if original_url and not original_url.startswith('https://vertexaisearch'):
            # Create a simple mapping using domain as key
            domain_key = domain.lower().replace(' ', '')
            mapping[domain_key] = {
                "label": label,
                "domain": domain,  
                "value": original_url
            }
    
    return mapping

def extract_domain(url):
    """Extract a readable domain name from a URL."""
    import re
    if not url:
        return "Unknown"
    
    # Extract domain from URL
    domain_match = re.search(r'https?://(?:www\.)?([^/]+)', url)
    if domain_match:
        domain = domain_match.group(1)
        # Simplify common domains
        if "google.com" in domain:
            return "Google"
        elif "wikipedia" in domain:
            return "Wikipedia" 
        elif "youtube" in domain:
            return "YouTube"
        else:
            return domain.split('.')[0].title()
    return "Web Source"

def convert_citations_to_readable(content, source_mapping):
    """Convert raw citation markers to readable, verifiable citation formats with complete source information"""
    import re
    
    def replace_citation(match):
        citation_id = match.group(1)
        if citation_id in source_mapping:
            source_info = source_mapping[citation_id]
            # Create comprehensive citation with verifiable information
            domain = source_info.get('domain', 'Unknown Source')
            url = source_info.get('value', '')
            label = source_info.get('label', domain)
            
            # Format: [Source: Domain (URL)] for verifiability
            if url and url.startswith('http') and 'vertexaisearch.cloud.google.com' not in url:
                return f"[Source: {label} ({url})]"
            else:
                return f"[Source: {label}]"
        return f"[Source: {citation_id}]"  # Fallback with original ID
    
    # Convert Vertex AI citations with full source information
    content = re.sub(r'\[vertexaisearch\.cloud\.google\.com/id/([^\]]+)\]', 
                     replace_citation, content)
    
    # Convert other citation formats while preserving source identification
    content = re.sub(r'\[([a-z0-9\-]+)\]', replace_citation, content)
    
    # Clean up any remaining malformed citations
    content = clean_malformed_citations(content)
    
    return content

def clean_malformed_citations(content):
    """Clean up malformed citation formats in content"""
    import re
    
    # Fix mixed citation formats like [Source: domain](https://vertexaisearch...)
    content = re.sub(r'\[Source: ([^\]]+)\]\(https://vertexaisearch\.cloud\.google\.com[^)]*\)', 
                     r'[Source: \1]', content)
    
    # Remove any remaining vertexaisearch URLs that shouldn't be there
    content = re.sub(r'\(https://vertexaisearch\.cloud\.google\.com[^)]*\)', '', content)
    
    # Fix double closing brackets
    content = re.sub(r'\]\]', ']', content)
    
    return content

def clean_generated_content(content):
    """Remove meta-text from the start of generated content."""
    if not content:
        return content
    
    # Meta-text prefixes that commonly appear before the actual report
    meta_prefixes = (
        "here is", "this is", "based on", "according to", "好的", "根据",
        "以下是", "here's", "below is", "following is",
    )
    
    lines = content.split('\n')
    
    # Drop only leading blank or meta-text lines. The body keeps its blank lines
    # and indentation, so Markdown paragraphs and lists survive, and ordinary
    # sentences that happen to start with "Based on" are not deleted.
    start = 0
    while start < len(lines):
        stripped = lines[start].strip().lower()
        if stripped and not stripped.startswith(meta_prefixes):
            break
        start += 1
    
    return '\n'.join(lines[start:])

def remove_prompt_remnants(content):
    """Remove prompt remnants from the content."""
    import re
    
    # Remove instruction-like text
    content = re.sub(r'INSTRUCTIONS?:.*?(?=\n\n|\n[A-Z]|\Z)', '', content, flags=re.DOTALL | re.IGNORECASE)
    content = re.sub(r'REQUIREMENTS?:.*?(?=\n\n|\n[A-Z]|\Z)', '', content, flags=re.DOTALL | re.IGNORECASE)
    content = re.sub(r'IMPORTANT:.*?(?=\n\n|\n[A-Z]|\Z)', '', content, flags=re.DOTALL | re.IGNORECASE)
    
    # Remove standalone bullets or dashes
    content = re.sub(r'^\s*[-•]\s*$', '', content, flags=re.MULTILINE)
    
    # Remove multiple consecutive line breaks
    content = re.sub(r'\n{3,}', '\n\n', content)
    
    return content.strip()

def final_quality_check(content):
    """Final quality check and cleanup while preserving citation URLs and source information"""
    import re
    
    # Remove standalone URLs that are NOT part of citations
    # Use a different approach to preserve citation URLs
    lines = content.split('\n')
    cleaned_lines = []
    
    for line in lines:
        # Check if the line contains a citation with URL
        if '[Source:' in line and 'http' in line:
            # Keep lines with citations intact
            cleaned_lines.append(line)
        else:
            # Remove standalone URLs from lines without citations
            cleaned_line = re.sub(r'\bhttps?://[^\s\[\]]+', '', line)
            cleaned_lines.append(cleaned_line)
    
    content = '\n'.join(cleaned_lines)
    
    # Collapse runs of spaces and tabs
    content = re.sub(r'[ \t]+', ' ', content)
    
    # Remove standalone punctuation lines
    content = re.sub(r'^\s*[-.•]+\s*$', '', content, flags=re.MULTILINE)
    
    # Ensure proper spacing around headers
    content = re.sub(r'\n(#+[^\n]+)\n', r'\n\n\1\n\n', content)
    
    # Collapse blank-line runs last, since the two steps above can create new ones
    content = re.sub(r'\n{3,}', '\n\n', content)
    
    # Clean up extra spaces around citations
    content = re.sub(r'\s+(\[Source:[^\]]+\])', r' \1', content)
    
    # Final citation cleanup
    content = clean_malformed_citations(content)
    
    return content.strip()


def planner_node(state: OverallState, config: RunnableConfig) -> dict:
    """LangGraph node that generates a multi-step research plan based on the user's question."""
    configurable = Configuration.from_runnable_config(config)
    llm = ChatGoogleGenerativeAI(
        model=configurable.query_generator_model,
        temperature=0.7,
        max_retries=2,
        api_key=os.getenv("GEMINI_API_KEY"),
    )
    structured_llm = llm.with_structured_output(ResearchPlan)

    # Get user query, prioritize from user_query, fallback to messages
    user_query = state.get("user_query") or get_research_topic(state["messages"])
    
    # Use centrally managed planning prompt
    formatted_prompt = planning_instructions.format(user_query=user_query)
    
    try:
        result = structured_llm.invoke(formatted_prompt)
        # Convert ResearchPlan to expected format
        plan = [{"id": task.id, "description": task.description, "info_needed": True, "source_hint": task.description, "status": "pending"} for task in result.tasks]
        
        return {
            "user_query": user_query,
            "plan": plan,
            "current_task_pointer": 0
        }
    except Exception as e:
        print(f"Planning failed: {e}")
        # Provide default single-task plan as fallback
        return {
            "user_query": user_query,
            "plan": [{"id": "task-1", "description": f"Research and answer: {user_query}", "info_needed": True, "source_hint": user_query, "status": "pending"}],
            "current_task_pointer": 0
        }


def record_task_completion_node(state: OverallState, config: RunnableConfig) -> dict:
    """Record the findings for the current task and prepare for the next task."""
    try:
        # Get current task info
        plan = state.get("plan", [])
        current_pointer = state.get("current_task_pointer", 0)
        
        if not plan or current_pointer >= len(plan):
            return {
                "messages": [AIMessage(content="Error: Invalid task pointer or empty plan")],
                "next_node_decision": "end"
            }
            
        current_task = plan[current_pointer]
        current_task_id = current_task.get("id")
        
        # Get detailed findings for current task
        detailed_findings = state.get("current_task_detailed_findings", [])
        task_specific_findings = [
            finding["content"] for finding in detailed_findings 
            if finding.get("task_id") == current_task_id
        ]
        
        # If no task-specific findings found, try to get recent web results as fallback
        if not task_specific_findings:
            print(f"Warning: No task-specific findings found for task {current_task_id}, using recent web results as fallback")
            web_results = state.get("web_research_result", [])
            # Take the most recent results (assume they belong to current task)
            task_specific_findings = web_results[-3:] if len(web_results) > 3 else web_results
        
        # Generate task summary
        task_summary = _summarize_task_findings(
            current_task["description"],
            task_specific_findings,
            config
        )
        
        # Create citations from detailed findings
        citations_for_snippets = []
        for finding in detailed_findings:
            if finding.get("task_id") == current_task_id and finding.get("source"):
                citations_for_snippets.append({
                    "snippet": finding["content"],
                    "source": str(finding["source"])
                })
        
        # Create ledger entry with detailed findings
        ledger_entry = {
            "task_id": current_task_id,
            "description": current_task["description"],
            "findings_summary": task_summary,
            "detailed_snippets": task_specific_findings,
            "citations_for_snippets": citations_for_snippets
        }
        
        # Update plan status
        plan[current_pointer]["status"] = "completed"
        
        # Clear current task findings to prepare for next task
        return {
            "ledger": [ledger_entry],
            "global_summary_memory": [task_summary],
            "plan": plan,
            "current_task_pointer": current_pointer + 1,
            "current_task_detailed_findings": [],  # Clear for next task
            "next_node_decision": "continue" if current_pointer + 1 < len(plan) else "end"
        }
    except Exception as e:
        error_message = f"Error in record_task_completion_node: {str(e)}"
        print(error_message)
        return {
            "messages": [AIMessage(content=error_message)],
            "next_node_decision": "end"
        }


def _summarize_task_findings(task_description: str, web_results: List[str], config: RunnableConfig) -> str:
    """Helper function to summarize web research results for a specific task."""
    if not web_results:
        return f"No specific findings available for task: {task_description}"
    
    # Use recent results (last 3 entries) to avoid overwhelming context
    recent_results = web_results[-3:] if len(web_results) > 3 else web_results
    context_to_summarize = "\n---\n".join(recent_results)
    
    configurable = Configuration.from_runnable_config(config)
    llm = ChatGoogleGenerativeAI(
        model=configurable.reflection_model,
        temperature=0.3,
        max_retries=2,
        api_key=os.getenv("GEMINI_API_KEY"),
    )
    
    prompt = f"""Given the research task: "{task_description}"

And the following research findings:
{context_to_summarize}

Please provide a concise summary (1-2 sentences) of the key findings that directly address this specific task.

Task Summary:"""
    
    try:
        response = llm.invoke(prompt)
        return response.content if hasattr(response, 'content') else str(response)
    except Exception as e:
        print(f"Task summarization failed: {e}")
        return f"Completed research for: {task_description}"


def decide_next_step_in_plan(state: OverallState) -> str:
    """Conditional edge function that determines whether to continue with next task or finalize."""
    current_pointer = state.get("current_task_pointer", 0)
    plan = state.get("plan", [])
    
    if current_pointer < len(plan):
        print(f"--- Moving to next task (pointer: {current_pointer}) ---")
        return "generate_query"
    else:
        print("--- All tasks completed. Finalizing answer ---")
        return "finalize_answer"


# Create our Agent Graph
builder = StateGraph(OverallState, config_schema=Configuration)

# Define the nodes we will cycle between
builder.add_node("planner", planner_node)
builder.add_node("generate_query", generate_query)
builder.add_node("web_research", web_research)
builder.add_node("reflection", reflection)
builder.add_node("content_enhancement", content_enhancement_analysis)  # Enhanced content analysis node
builder.add_node("evaluate_research_enhanced", evaluate_research_enhanced)  # Enhanced research evaluation node
builder.add_node("record_task_completion", record_task_completion_node)  # Task completion recording node
builder.add_node("finalize_answer", finalize_answer)

# Set the entrypoint as `planner`
builder.add_edge(START, "planner")
builder.add_edge("planner", "generate_query")

# Add conditional edge to continue with search queries in a parallel branch
builder.add_conditional_edges(
    "generate_query", continue_to_web_research, ["web_research"]
)

# Reflect on the web research
builder.add_edge("web_research", "reflection")

# Modified routing logic after reflection - added intelligent content enhancement decision
builder.add_conditional_edges(
    "reflection", 
    should_enhance_content, 
    {
        "analyze_enhancement_need": "content_enhancement",
        "continue_without_enhancement": "evaluate_research_enhanced"
    }
)

# Enter evaluation phase after content enhancement completion
builder.add_edge("content_enhancement", "evaluate_research_enhanced")

# Decide next step after evaluation completion - continue research or complete task
builder.add_conditional_edges(
    "evaluate_research_enhanced", 
    decide_next_research_step, 
    ["web_research", "record_task_completion"]  # Can route to these two targets
)

# When decide_next_research_step returns "continue_research", follow-up queries are used.
# New web_research tasks are then generated via the continue_research_with_followup function.

# After recording task completion, decide next step in plan (multi-task loop)
builder.add_conditional_edges(
    "record_task_completion", 
    decide_next_step_in_plan, 
    ["generate_query", "finalize_answer"]
)

# Finalize the answer
builder.add_edge("finalize_answer", END)

graph = builder.compile(name="pro-search-agent")

def split_by_tokens(texts, max_tokens=150000, encoding_name="cl100k_base"):
    """Split texts into token-bounded batches, preserving important context and information integrity."""
    try:
        encoding = tiktoken.get_encoding(encoding_name)
    except Exception:
        # Any failure here (e.g. unknown encoding name, tiktoken unavailable) falls back
        # to simple character-based estimation
        return simple_split_by_chars(texts, max_tokens * 4)  # Rough estimation: 4 chars per token
    
    batches = []
    current_batch = []
    current_tokens = 0
    
    for text in texts:
        if not text:
            continue
            
        text_tokens = len(encoding.encode(str(text)))
        
        # If single text is too large, intelligently extract key sections
        if text_tokens > max_tokens * 0.8:
            text = extract_key_sections(text, max_tokens * 0.7, encoding)
            text_tokens = len(encoding.encode(str(text)))
        
        # Check if adding this text would exceed the limit
        if current_tokens + text_tokens > max_tokens and current_batch:
            # Finalize current batch
            batches.append(current_batch)
            current_batch = [text]
            current_tokens = text_tokens
        else:
            current_batch.append(text)
            current_tokens += text_tokens
    
    # Add the last batch if it has content
    if current_batch:
        batches.append(current_batch)
    
    return batches

def extract_key_sections(content, max_tokens, encoding):
    """Extract the key parts of long content, prioritizing important information."""
    if not content:
        return content
    
    # Split content into sections
    sections = content.split('\n\n')
    key_sections = []
    tokens_used = 0
    priority_sections = []
    regular_sections = []
    
    # Categorize sections by importance
    for section in sections:
        if is_factual_section(section):
            priority_sections.append(section)
        else:
            regular_sections.append(section)
    
    # Add priority sections first
    for section in priority_sections:
        section_tokens = len(encoding.encode(section))
        if tokens_used + section_tokens <= max_tokens:
            key_sections.append(section)
            tokens_used += section_tokens
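        elif is_critical_section(section):
            # For critical sections, truncate but include
            truncated = truncate_section(section, max_tokens - tokens_used, encoding)
            if truncated:
                key_sections.append(truncated)
                # Count the truncated text against the budget so regular sections cannot overflow it
                tokens_used += len(encoding.encode(truncated))
            break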
    
    # Add regular sections if space allows
    for section in regular_sections:
        section_tokens = len(encoding.encode(section))
        if tokens_used + section_tokens <= max_tokens:
            key_sections.append(section)
            tokens_used += section_tokens
        else:
            break
    
    return '\n\n'.join(key_sections)

def is_factual_section(section):
    """Return True if the section contains important factual information."""
    factual_indicators = [
        r'\d{4}',  # Years
        r'\$[\d,]+',  # Money amounts
        r'\d+%',  # Percentages
        r'\d+\.?\d*\s*(million|billion|thousand)',  # Large numbers
        r'(acquired|purchased|bought|sold)',  # Business actions
        r'(announced|launched|released)',  # Event verbs
        r'[A-Z][a-z]+\s+(Inc|Corp|Ltd|Company)',  # Company names
    ]
    
    import re
    for pattern in factual_indicators:
        if re.search(pattern, section, re.IGNORECASE):
            return True
    return False

def is_critical_section(section):
    """Return True if the section is critical (kept even when it is too long)."""
    critical_keywords = [
        'acquisition', 'merger', 'financial', 'revenue', 'profit',
        'strategy', 'impact', 'result', 'conclusion', 'summary'
    ]
    
    section_lower = section.lower()
    return any(keyword in section_lower for keyword in critical_keywords)

def truncate_section(section, max_tokens, encoding):
    """Truncate a section intelligently, keeping the most important part."""
    if not section:
        return ""
    
    sentences = section.split('. ')
    truncated_sentences = []
    tokens_used = 0
    
    for sentence in sentences:
        sentence_tokens = len(encoding.encode(sentence))
        if tokens_used + sentence_tokens <= max_tokens:
            truncated_sentences.append(sentence)
            tokens_used += sentence_tokens
        else:
            break
    
    result = '. '.join(truncated_sentences)
    if result and not result.endswith('.'):
        result += '.'
    
    return result

def simple_split_by_chars(texts, max_chars):
    """Simple character-level batching (fallback)."""
    batches = []
    current_batch = []
    current_chars = 0
    
    for text in texts:
        text_chars = len(str(text))
        if current_chars + text_chars > max_chars and current_batch:
            batches.append(current_batch)
            current_batch = [text]
            current_chars = text_chars
        else:
            current_batch.append(text)
            current_chars += text_chars
    
    if current_batch:
        batches.append(current_batch)
    
    return batches


================================================
FILE: backend/src/agent/prompts.py
================================================
from datetime import datetime


# Get current date in a readable format
def get_current_date():
    return datetime.now().strftime("%B %d, %Y")


query_writer_instructions = """You are a **QueryGenerationAgent** responsible for creating comprehensive, targeted search queries.

=== TASK ===
Generate {number_queries} diverse, specific search queries that will gather detailed, comprehensive information about the research topic.

=== RESEARCH STRATEGY ===
1. **Specificity**: Create queries targeting specific aspects, data points, case studies, and technical details
2. **Multi-angle approach**: Cover different perspectives, time periods, and geographical regions
3. **Technical depth**: Include queries for technical specifications, implementation details, and performance metrics
4. **Data-focused**: Target queries likely to return statistical data, reports, and detailed analysis
5. **Source diversity**: Ensure queries will hit different types of sources (academic, industry, news, government)

=== QUERY QUALITY CRITERIA ===
Each query should:
- Target specific, actionable information rather than general overviews
- Include relevant technical terms and industry keywords
- Specify timeframes, locations, or scale when relevant
- Aim for sources likely to contain detailed data and analysis
- Be distinct enough to avoid duplicate information

=== EXAMPLES OF GOOD vs POOR QUERIES ===
Research Topic: "Smart city transportation trends 2024"

POOR (too general):
- "smart city transportation"
- "smart city trends 2024"

GOOD (specific and detailed):
- "smart city autonomous vehicle deployment statistics 2024"
- "IoT traffic management systems case studies major cities 2024"
- "smart city public transport electrification data Europe Asia 2024"
- "AI-powered traffic optimization ROI metrics smart cities 2024"

=== CURRENT RESEARCH CONTEXT ===
Current Date: {current_date}
Research Topic: {research_topic}

=== OUTPUT REQUIREMENTS ===
Generate exactly {number_queries} search queries that will maximize the collection of detailed, specific information.
Focus on queries that will return comprehensive data, technical details, case studies, and implementation specifics.

IMPORTANT: Return only the search queries in the specified JSON format."""


web_searcher_instructions = """You are a **WebResearcher** agent responsible for gathering and extracting detailed information from web searches.

=== TASK ===
Conduct targeted Google Searches to gather comprehensive, credible information about the research topic.

=== INFORMATION EXTRACTION STRATEGY ===
1. **Preserve original details**: Include specific data points, statistics, dates, and technical specifications
2. **Extract key facts**: Pull out concrete information, case studies, and implementation details
3. **Maintain source context**: Keep important quotes and specific findings from sources
4. **Include diverse perspectives**: Gather information from multiple source types and viewpoints
5. **Technical depth**: Extract implementation details, performance metrics, and technical specifications

=== CONTENT REQUIREMENTS ===
Your output should prioritize:
1. **Specific data points**: Numbers, percentages, dates, costs, performance metrics
2. **Concrete examples**: Real projects, case studies, implementation examples
3. **Technical details**: How technologies work, system architectures, integration approaches
4. **Current information**: Recent developments, 2024 trends, latest implementations
5. **Authoritative sources**: Government reports, research papers, industry analyses

=== OUTPUT FORMAT ===
Structure your findings as:
1. **Key Statistics and Data**: Present specific numbers, metrics, and quantitative findings
2. **Technology Implementations**: Describe specific systems, architectures, and technical approaches
3. **Case Studies and Examples**: Detail real-world implementations with concrete details
4. **Current Trends and Developments**: Latest innovations and market movements
5. **Challenges and Solutions**: Specific problems and technical solutions being implemented

=== QUALITY STANDARDS ===
- Include specific citations for each major point
- Preserve technical terminology and specifications
- Extract detailed implementation approaches
- Include performance benchmarks and comparative data
- Maintain chronological context (emphasize 2024 developments)

=== CURRENT RESEARCH CONTEXT ===
Current Date: {current_date}
Research Topic: {research_topic}

IMPORTANT: Focus on extracting and preserving detailed, specific information from search results rather than creating high-level summaries. The goal is to gather comprehensive raw information that can be used for detailed analysis."""

reflection_instructions = """You are a **ResearchAnalyst** agent responsible for evaluating research comprehensiveness and depth.

=== TASK ===
Analyze the provided research summaries to determine if they contain sufficient detail and breadth to answer the research question comprehensively.

=== EVALUATION FRAMEWORK ===

**SUFFICIENT RESEARCH** should include:
1. **Quantitative data**: Specific statistics, percentages, dollar amounts, dates
2. **Multiple perspectives**: Different geographical regions, market segments, or approaches  
3. **Technical specifics**: Implementation details, technical specifications, performance metrics
4. **Current examples**: Recent case studies, pilot projects, deployed solutions
5. **Comprehensive coverage**: Multiple aspects of the research topic addressed

**EVALUATION CRITERIA**:
- **Comprehensive (sufficient=true)**: Rich with specific data, multiple examples, technical details, current information
- **Surface-level (sufficient=false)**: Lacks specific data, few concrete examples, missing technical depth

=== QUALITY THRESHOLDS ===
Mark as **sufficient=true** if the research includes:
- At least 5-8 specific data points or statistics
- Multiple concrete examples or case studies  
- Technical implementation details
- Geographic or market diversity in examples
- Recent (2024) information and trends

Mark as **sufficient=false** only if research is clearly:
- Too high-level or conceptual
- Missing key technical aspects
- Lacking concrete examples or data
- Insufficient depth for comprehensive analysis

=== FOLLOW-UP QUERY STRATEGY ===
If research is insufficient, generate 3-5 targeted queries to fill specific gaps:
- Target missing data types (quantitative, technical, geographic)
- Focus on specific implementation details or metrics
- Address underrepresented aspects of the topic

=== OUTPUT FORMAT ===
Return a JSON object with these exact keys:
{{
  "is_sufficient": true/false,
  "knowledge_gap": "Specific description of what information is missing or insufficient",
  "follow_up_queries": ["specific query 1", "specific query 2", ...]
}}

=== CURRENT RESEARCH CONTEXT ===
Current Date: {current_date}
Research Topic: {research_topic}

Research Summaries to Analyze:
{summaries}

IMPORTANT: Focus on whether the research provides sufficient detail and specificity for a comprehensive analysis, not whether it's "perfect"."""

answer_instructions = """You are a **Senior Research Analyst** at a leading global research consultancy firm. You are responsible for producing executive-level research reports for Fortune 500 clients.

=== PROFESSIONAL CONTEXT ===
Your audience consists of:
- C-suite executives and board members
- Strategic planners and business development teams  
- Investment committees and venture capital firms
- Government policy makers and regulatory bodies

=== REPORT QUALITY STANDARDS ===
As a premium research consultancy, your reports must demonstrate:
- **Strategic insight**: Beyond data presentation to actionable intelligence
- **Market expertise**: Deep understanding of industry dynamics and competitive landscape
- **Executive focus**: Clear implications for business strategy and decision-making
- **Professional credibility**: Authoritative tone with rigorous methodology

=== REPORT STRUCTURE REQUIREMENTS ===
Your comprehensive report must include:

1. **Executive Summary** (2-3 paragraphs)
   - Key findings and strategic implications
   - Critical market trends and drivers
   - Primary recommendations for stakeholders

2. **Methodology & Scope**
   - Research approach and data sources
   - Analysis framework and validation methods
   - Limitations and scope of study

3. **Core Analysis Sections** (organized by research objectives)
   - Market landscape and competitive dynamics
   - Technology trends and innovation drivers
   - Implementation case studies and best practices
   - Challenges, barriers, and risk factors

4. **Strategic Implications & Recommendations**
   - Business impact analysis
   - Investment and policy recommendations  
   - Future outlook and emerging opportunities

5. **Conclusion & Next Steps**
   - Summary of critical findings
   - Strategic priorities for stakeholders
   - Areas for continued monitoring

=== WRITING STYLE GUIDELINES ===
- **Authoritative but accessible**: Professional language without unnecessary jargon
- **Data-driven narratives**: Every claim supported by evidence and context
- **Strategic perspective**: Focus on "what this means" rather than just "what is"
- **Executive brevity**: Concise yet comprehensive coverage
- **Human insight**: Provide interpretation and judgment, not just data aggregation

=== CITATION & SOURCE STANDARDS ===
- Integrate sources naturally within the narrative flow
- Use professional attribution: "According to McKinsey research..." rather than [Source: mckinsey]
- Prioritize authoritative sources: industry reports, academic research, government data
- Provide context for data points: trends, comparisons, significance

=== OUTPUT FORMAT ===
Structure as a professional consulting report:
- Clear section headers with strategic focus
- Executive summary highlighting key insights
- Logical flow from analysis to implications
- Professional formatting with bullet points and subheadings
- Integrated citations that enhance credibility

=== CURRENT ASSIGNMENT ===
Research Topic: {research_topic}
Report Date: {current_date}

Research Findings:
{summaries}

IMPORTANT: Transform these research findings into a polished, executive-level report that demonstrates the analytical rigor and strategic insight expected from a top-tier consulting firm. Focus on delivering actionable intelligence rather than raw information compilation."""

planning_instructions = """You are **PlannerAgent**. Your job is to analyze the user research query and break it down into multiple specific, executable research tasks.

=== TASK ANALYSIS PRINCIPLES ===
1. **Decompose complex queries**: Break broad topics into specific, manageable subtasks
2. **Identify key dimensions**: Extract different aspects, categories, or domains
3. **Create parallel tasks**: Generate 2-5 focused tasks that can be researched independently
4. **Ensure comprehensive coverage**: All important aspects should be covered

=== TASK BREAKDOWN STRATEGY ===
For research queries, consider these dimensions:
- **Domain separation**: Split different fields/industries (e.g., transportation vs energy)
- **Geographic scope**: Different regions or global vs local
- **Temporal focus**: Current trends vs future projections vs historical analysis
- **Technical depth**: Overview vs implementation details vs case studies
- **Stakeholder perspective**: Government, industry, technology, user impact

=== OUTPUT FORMAT ===
Return a single JSON array inside ```PLAN``` fences.  
Each element must contain the following fields **in this order**:

{{
  "id":             "<kebab-case unique slug>",
  "description":    "<one specific, focused research task>",
  "info_needed":    true | false,
  "source_hint":    "<specific search keywords for this task>",
  "status":         "pending"
}}

=== PLANNING EXAMPLES ===

**Example 1**: User Query: "Research AI impact on healthcare"
```PLAN
[
  {{
    "id": "ai-diagnostics",
    "description": "Research AI applications in medical diagnostics and imaging",
    "info_needed": true,
    "source_hint": "AI medical diagnostics imaging radiology machine learning healthcare 2024",
    "status": "pending"
  }},
  {{
    "id": "ai-treatment",
    "description": "Research AI-driven treatment recommendations and drug discovery",
    "info_needed": true,
    "source_hint": "AI treatment recommendations drug discovery personalized medicine",
    "status": "pending"
  }},
  {{
    "id": "ai-healthcare-challenges",
    "description": "Analyze challenges and ethical considerations of AI in healthcare",
    "info_needed": true,
    "source_hint": "AI healthcare ethics privacy challenges regulatory issues",
    "status": "pending"
  }}
]
```

**Example 2**: User Query: "Smart city transportation and energy trends 2024"
```PLAN
[
  {{
    "id": "smart-transportation-2024",
    "description": "Research 2024 smart city transportation technologies and trends",
    "info_needed": true,
    "source_hint": "smart city transportation 2024 IoT traffic management autonomous vehicles",
    "status": "pending"
  }},
  {{
    "id": "smart-energy-2024", 
    "description": "Research 2024 smart city energy systems and sustainability trends",
    "info_needed": true,
    "source_hint": "smart city energy 2024 renewable smart grid energy management",
    "status": "pending"
  }},
  {{
    "id": "transport-energy-integration",
    "description": "Analyze integration between smart transportation and energy systems",
    "info_needed": true,
    "source_hint": "smart city transport energy integration electric vehicles charging infrastructure",
    "status": "pending"
  }}
]
```

=== REQUIREMENTS ===
1. **Always create 2-5 tasks** (never just 1 unless the query is extremely specific)
2. **Each task should be focused and specific**
3. **Tasks should be complementary but independent**
4. **Use descriptive, actionable task descriptions**
5. **Provide targeted source hints for each task**
6. **Total top-level steps ≤ 5**

=== CURRENT RESEARCH QUERY ===
User Query: {user_query}

=== INSTRUCTIONS ===
Analyze the user query and break it down into specific research tasks. Focus on creating multiple focused tasks rather than one broad task. Output **only** the JSON array inside ```PLAN``` fences."""

integrated_report_instructions = """You are a **Senior Research Director** at a premier global consulting firm, responsible for synthesizing complex multi-faceted research into cohesive strategic intelligence reports.

=== LANGUAGE ADAPTATION ===
**CRITICAL**: Respond in the SAME LANGUAGE as the original user query.
- If user query is in Chinese (中文), write the entire report in Chinese
- If user query is in English, write the entire report in English
- Maintain professional terminology and industry-specific language in the appropriate language
- Use native language conventions for citations, formatting, and professional writing

=== SYNTHESIS MISSION ===
Transform the provided research findings from multiple investigation streams into a unified, comprehensive professional analysis that reads as a single coherent narrative, not a collection of separate studies or high-level summaries.

=== INPUT CONTEXT ===
- Original Research Query: {user_query}
- Research Plan: {research_plan_summary}
- Complete Research Dataset: Multiple investigation streams with varying focus areas
- Target Audience: Industry professionals, researchers, business analysts requiring detailed insights

=== REPORT ARCHITECTURE PRINCIPLES ===

**1. DETAILED PROFESSIONAL ANALYSIS (Not Executive Summary)**
- Provide comprehensive, detailed analysis with specific data, metrics, and examples
- Include technical details, implementation specifics, and concrete case studies
- Present thorough research findings with supporting evidence and quantitative data
- Maintain depth and specificity throughout - this is NOT a summary document

**2. THEMATIC INTEGRATION WITH DEPTH**
- Organize by analytical themes while maintaining detailed coverage of each area
- Identify cross-cutting insights supported by specific evidence and data points
- Build narrative bridges between different aspects with concrete connections
- Present unified analysis while preserving the richness of detailed findings

**3. PROFESSIONAL RESEARCH STANDARDS**
- Comprehensive analysis with detailed methodology and findings
- Extensive use of specific data, statistics, and concrete examples
- Thorough coverage of technical aspects and implementation details
- Rich sourcing with complete, accurate, and verifiable citations

=== REQUIRED REPORT STRUCTURE ===

**COMPREHENSIVE OVERVIEW**
- Detailed introduction to the research scope and methodology
- Complete context setting with market sizing, key players, and current landscape
- Specific quantitative and qualitative indicators
- Thorough background establishing the foundation for detailed analysis

**DETAILED FINDINGS & ANALYSIS**
- In-depth analysis of each major research area with supporting data
- Specific technical details, implementation approaches, and case studies
- Comprehensive coverage of trends, technologies, and market dynamics
- Detailed examination of challenges, opportunities, and solution approaches
- Rich integration of cross-domain insights with specific supporting evidence

**TECHNICAL IMPLEMENTATIONS & CASE STUDIES**
- Detailed implementation examples with specific technical specifications
- Comprehensive case study analysis with concrete outcomes and metrics
- Thorough coverage of best practices and lessons learned
- Specific technology deployments, performance data, and success metrics

**MARKET DYNAMICS & COMPETITIVE LANDSCAPE**
- Detailed competitive analysis with specific market share data and positioning
- Comprehensive regulatory environment analysis with specific policy impacts
- Thorough investment landscape with specific funding amounts and trends
- Detailed stakeholder analysis with specific roles and influence patterns

**FUTURE PROJECTIONS & STRATEGIC IMPLICATIONS**
- Detailed forecasting with specific timelines and quantitative projections
- Comprehensive risk analysis with specific mitigation strategies
- Thorough opportunity assessment with concrete implementation pathways
- Detailed strategic recommendations with specific action items and resource requirements

=== SYNTHESIS GUIDELINES ===

**INTEGRATION WITH DEPTH:**
- Weave together detailed findings from different research streams naturally
- Maintain the richness and specificity of original research while showing connections
- Preserve technical details, specific data points, and concrete examples
- Build comprehensive understanding through detailed cross-domain analysis

**ENHANCED CITATION STANDARDS:**
- Preserve and integrate complete citation information throughout the narrative
- Use format: "According to [Specific Source/Study Name] (URL if available)..."
- Include specific study details, publication dates, and author/organization information
- Maintain all quantitative claims with specific source attribution
- Provide verifiable references that readers can follow up on

**PROFESSIONAL DEPTH:**
- Focus on comprehensive analysis rather than high-level strategic summaries
- Include technical specifications, implementation details, and operational insights
- Provide specific metrics, performance data, and concrete examples throughout
- Maintain the detailed, professional tone expected in industry research reports

**COMPREHENSIVE COVERAGE:**
- Ensure thorough coverage of all research areas with appropriate depth
- Include specific technical details, market data, and implementation examples
- Provide comprehensive context and background for all major topics
- Maintain professional research report standards with extensive detail and analysis

=== RESEARCH DATA TO SYNTHESIZE ===

{comprehensive_research_data}

=== CRITICAL INSTRUCTIONS ===

1. **LANGUAGE**: Write the ENTIRE report in the same language as the user query
2. **DEPTH**: This is a detailed professional research report, NOT an executive summary
3. **SPECIFICITY**: Include concrete data, metrics, examples, and technical details throughout
4. **INTEGRATION**: Unify findings while preserving the richness and depth of source material
5. **CITATIONS**: Maintain complete, accurate citations that readers can verify and follow
6. **COMPREHENSIVENESS**: Provide thorough coverage that satisfies professional research standards

OUTPUT: A comprehensive, detailed professional research report that integrates findings across all research areas while maintaining the depth, specificity, and professional rigor expected in industry research documentation."""


================================================
FILE: backend/src/agent/report_level_enhancement.py
================================================
"""
Report-Level Content Enhancement Module

During the final report generation phase, the LLM may discover it needs more in-depth specific 
information to support its analysis. This module provides the capability to perform targeted 
content enhancement during the report generation process.
"""

import os
from typing import Dict, List, Any, Optional, Tuple
from dataclasses import dataclass
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.runnables import RunnableConfig
from firecrawl import FirecrawlApp


@dataclass
class ReportEnhancementRequest:
    """Report enhancement request"""
    enhancement_type: str  # "specific_data", "case_study", "technical_details", "market_data", "regulatory_info"
    target_information: str  # Specific description of needed information
    suggested_sources: List[str]  # Suggested source URLs
    priority: int  # Priority level 1-5
    reasoning: str  # LLM's reasoning process


@dataclass
class ReportEnhancementResult:
    """Report enhancement result"""
    success: bool
    enhanced_content: str
    sources_used: List[Dict[str, Any]]
    enhancement_quality: str  # "excellent", "good", "fair", "poor", or "failed"


class ReportLevelEnhancer:
    """Report-level content enhancer"""
    
    def __init__(self):
        self.firecrawl_app = None
        if os.getenv("FIRECRAWL_API_KEY"):
            self.firecrawl_app = FirecrawlApp(api_key=os.getenv("FIRECRAWL_API_KEY"))
    
    def analyze_report_enhancement_needs(
        self, 
        user_query: str,
        research_plan: List[Dict],
        aggregated_research_data: str,
        config: RunnableConfig
    ) -> List[ReportEnhancementRequest]:
        """
        Analyze additional information needed during report writing process
        
        This is a pre-analysis step that allows the LLM to identify information gaps 
        before formal writing begins
        """
        
        enhancement_analysis_prompt = f"""You are a professional research report writing expert. Before writing the final report, please analyze whether the current research data is sufficiently complete and identify additional in-depth information that may be needed.

User Query: {user_query}

Research Plan:
{chr(10).join([f"• {task.get('description', '')}" for task in research_plan])}

Current Research Data Overview:
{aggregated_research_data[:2000]}...

Please analyze the information adequacy in the following dimensions and identify areas that need deep enhancement:

1. **Specific Data & Statistics** - Is there sufficient quantitative data to support the analysis?
2. **Implementation Cases & Technical Details** - Are there specific implementation examples?
3. **Market Data & Competitive Analysis** - Is there up-to-date market sizing and competitive landscape data?
4. **Policies, Regulations & Standards** - Is the relevant regulatory framework covered?

For each area that needs enhancement, please output in the following format:

**ENHANCEMENT_REQUEST_START**
Type: [specific_data|case_study|technical_details|market_data|regulatory_info]
Target: [What specific information is needed]
Priority: [1-5 number]
Reasoning: [Why this information is needed and how it will improve report quality]
Suggested_Sources: [Suggested website types or specific URLs if known that might have this information]
**ENHANCEMENT_REQUEST_END**

If current information is already sufficient, output: **NO_ENHANCEMENT_NEEDED**

Please identify only the most critical 1-3 enhancement needs to avoid over-complication.
"""
        
        from agent.configuration import Configuration
        configurable = Configuration.from_runnable_config(config)
        
        llm = ChatGoogleGenerativeAI(
            model=configurable.reflection_model,
            temperature=0.3,
            max_retries=2,
            api_key=os.getenv("GEMINI_API_KEY"),
        )
        
        response = llm.invoke(enhancement_analysis_prompt)
        analysis_text = response.content if hasattr(response, 'content') else str(response)
        
        return self._parse_enhancement_requests(analysis_text)
    
    def _parse_enhancement_requests(self, analysis_text: str) -> List[ReportEnhancementRequest]:
        """Parse LLM's enhancement requests"""
        requests = []
        
        if "NO_ENHANCEMENT_NEEDED" in analysis_text:
            return requests
        
        import re
        
        # Extract all enhancement request blocks
        pattern = r'\*\*ENHANCEMENT_REQUEST_START\*\*(.*?)\*\*ENHANCEMENT_REQUEST_END\*\*'
        matches = re.findall(pattern, analysis_text, re.DOTALL)
        
        for match in matches:
            try:
                request_data = self._parse_single_request(match)
                if request_data:
                    requests.append(request_data)
            except Exception as e:
                print(f"⚠️ Failed to parse enhancement request: {e}")
                continue
        
        return requests[:3]  # Maximum 3 requests
    
    def _parse_single_request(self, request_text: str) -> Optional[ReportEnhancementRequest]:
        """Parse a single enhancement request"""
        lines = request_text.strip().split('\n')
        
        enhancement_type = ""
        target_information = ""
        priority = 3
        reasoning = ""
        suggested_sources = []
        
        for line in lines:
            line = line.strip()
            if line.startswith('Type:'):
                enhancement_type = line.replace('Type:', '').strip()
            elif line.startswith('Target:'):
                target_information = line.replace('Target:', '').strip()
            elif line.startswith('Priority:'):
                try:
                    priority = int(line.replace('Priority:', '').strip())
                except ValueError:
                    priority = 3
            elif line.startswith('Reasoning:'):
                reasoning = line.replace('Reasoning:', '').strip()
            elif line.startswith('Suggested_Sources:'):
                sources_text = line.replace('Suggested_Sources:', '').strip()
                # Simple split, could be more complex in practice
                suggested_sources = [s.strip() for s in sources_text.split(',') if s.strip()]
        
        if enhancement_type and target_information:
            return ReportEnhancementRequest(
                enhancement_type=enhancement_type,
                target_information=target_information,
                suggested_sources=suggested_sources,
                priority=priority,
                reasoning=reasoning
            )
        
        return None
    
    def execute_targeted_enhancement(
        self, 
        enhancement_requests: List[ReportEnhancementRequest],
        available_sources: List[Dict[str, Any]]
    ) -> List[ReportEnhancementResult]:
        """Execute targeted content enhancement"""
        
        if not self.firecrawl_app:
            print("⚠️ Firecrawl not configured, skipping report-level enhancement")
            return []
        
        results = []
        
        for request in enhancement_requests:
            print(f"🎯 Executing report-level enhancement: {request.enhancement_type}")
            print(f"   Target information: {request.target_information}")
            
            # Find matching URLs
            target_urls = self._find_matching_urls(request, available_sources)
            
            if not target_urls:
                print("   ❌ No matching information sources found")
                continue
            
            # Attempt enhancement
            enhanced_content = ""
            sources_used = []
            
            for url_info in target_urls[:2]:  # Try at most 2 URLs
                try:
                    url = url_info.get('url', '')
                    if not url:
                        continue
                    
                    print(f"   🔥 Scraping: {url_info.get('title', 'Unknown')}")
                    
                    result = self.firecrawl_app.scrape_url(url, params={
                        'formats': ['markdown'],
                        'onlyMainContent': True,
                        'timeout': 30000
                    })
                    
                    if result and result.success:
                        content = result.markdown or ''
                        if len(content) > 500:  # Valid content
                            enhanced_content += f"\n\n### Source: {url_info.get('title', 'Unknown')}\n{content[:2000]}..."
                            sources_used.append({
                                'url': url,
                                'title': url_info.get('title', ''),
                                'content_length': len(content)
                            })
                            print(f"     ✅ Success: {len(content)} characters")
                        else:
                            print(f"     ⚠️ Content too short: {len(content)} characters")
                    else:
                        print("     ❌ Scraping failed")
                        
                except Exception as e:
                    print(f"     ❌ Scraping exception: {str(e)}")
                    continue
            
            if enhanced_content and sources_used:
                quality = self._assess_enhancement_quality(enhanced_content, request)
                results.append(ReportEnhancementResult(
                    success=True,
                    enhanced_content=enhanced_content,
                    sources_used=sources_used,
                    enhancement_quality=quality
                ))
                print(f"   ✅ Enhancement completed, quality: {quality}")
            else:
                results.append(ReportEnhancementResult(
                    success=False,
                    enhanced_content="",
                    sources_used=[],
                    enhancement_quality="failed"
                ))
                print("   ❌ Enhancement failed")
        
        return results
    
    def _find_matching_urls(
        self, 
        request: ReportEnhancementRequest, 
        available_sources: List[Dict[str, Any]]
    ) -> List[Dict[str, Any]]:
        """Find URLs matching the enhancement request"""
        
        target_keywords = request.target_information.lower().split()
        enhancement_type = request.enhancement_type
        
        scored_sources = []
        
        for source in available_sources:
            title = source.get('title', '').lower()
            url = source.get('url', '').lower()
            
            score = 0
            
            # Keyword matching
            for keyword in target_keywords:
                if keyword in title:
                    score += 2
                if keyword in url:
                    score += 1
            
            # Type matching
            type_scoring = {
                'specific_data': ['data', 'statistics', 'report', 'research', 'study'],
                'case_study': ['case', 'example', 'implementation', 'deployment', 'success'],
                'technical_details': ['technical', 'specification', 'documentation', 'guide', 'manual'],
                'market_data': ['market', 'industry', 'competition', 'analysis', 'forecast'],
                'regulatory_info': ['regulation', 'policy', 'standard', 'compliance', 'legal']
            }
            
            type_keywords = type_scoring.get(enhancement_type, [])
            for keyword in type_keywords:
                if keyword in title or keyword in url:
                    score += 1
            
            # Authority bonus
            if any(domain in url for domain in ['.gov', '.edu', '.org']):
                score += 3
            
            if score > 0:
                scored_sources.append((source, score))
        
        # Sort by score
        scored_sources.sort(key=lambda x: x[1], reverse=True)
        
        return [source for source, score in scored_sources if score >= 2]
    
    def _assess_enhancement_quality(
        self, 
        content: str, 
        request: ReportEnhancementRequest
    ) -> str:
        """Assess enhancement content quality"""
        
        if not content:
            return "poor"
        
        length = len(content)
        target_keywords = request.target_information.lower().split()
        
        # Keyword matching rate
        keyword_matches = sum(1 for keyword in target_keywords if keyword in content.lower())
        keyword_ratio = keyword_matches / len(target_keywords) if target_keywords else 0
        
        # Length assessment
        if length > 2000 and keyword_ratio > 0.6:
            return "excellent"
        elif length > 1000 and keyword_ratio > 0.4:
            return "good" 
        elif length > 500 and keyword_ratio > 0.2:
            return "fair"
        else:
            return "poor"
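
The scoring heuristic in `_find_matching_urls` can be tried out in isolation. The sketch below is a simplified, standalone restatement, not the module's own code: the names `score_source` and `rank_sources` and the sample sources are made up for illustration, the per-type keyword bonus is left out, and each source is assumed to be a dict with `title` and `url` keys.

```python
from typing import Any, Dict, List


def score_source(target_information: str, source: Dict[str, Any]) -> int:
    """+2 per target keyword found in the title, +1 per keyword in the URL,
    +3 if the URL contains .gov, .edu, or .org (substring match)."""
    title = source.get("title", "").lower()
    url = source.get("url", "").lower()
    score = 0
    for keyword in target_information.lower().split():
        if keyword in title:
            score += 2
        if keyword in url:
            score += 1
    if any(domain in url for domain in (".gov", ".edu", ".org")):
        score += 3
    return score


def rank_sources(target_information: str, sources: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Sort sources by score (highest first) and keep those scoring at least 2."""
    scored = [(s, score_source(target_information, s)) for s in sources]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [s for s, score in scored if score >= 2]


# Hypothetical sample data for illustration only
sample_sources = [
    {"title": "Solar market forecast 2024", "url": "https://example.org/solar-report"},
    {"title": "Cooking tips", "url": "https://example.com/recipes"},
]
ranked = rank_sources("solar market size", sample_sources)
```

Because every check is a plain substring test, short keywords and the authority suffixes can match in unintended places (for example `.org` inside a path segment), which is worth keeping in mind when tuning the threshold.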


def integrate_report_enhancement_into_finalize(
    user_query: str,
    research_plan: List[Dict],
    aggregated_research_data: str,
    available_sources: List[Dict[str, Any]],
    config: RunnableConfig
) -> Tuple[str, List[ReportEnhancementResult]]:
    """
    Integrate report-level enhancement into finalize_answer process
    
    Returns: (enhanced_research_data, enhancement_results)
    """
    
    enhancer = ReportLevelEnhancer()
    
    # 1. Analyze enhancement needs
    enhancement_requests = enhancer.analyze_report_enhancement_needs(
        user_query, research_plan, aggregated_research_data, config
    )
    
    if not enhancement_requests:
        print("✅ Report-level analysis: Current information is sufficient, no additional enhancement needed")
        return aggregated_research_data, []
    
    print(f"🎯 Identified {len(enhancement_requests)} report-level enhancement needs")
    for i, req in enumerate(enhancement_requests, 1):
        print(f"   {i}. {req.enhancement_type}: {req.target_information}")
    
    # 2. Execute enhancement
    enhancement_results = enhancer.execute_targeted_enhancement(
        enhancement_requests, available_sources
    )
    
    # 3. Merge enhanced content
    enhanced_data = aggregated_research_data
    
    successful_enhancements = [r for r in enhancement_results if r.success]
    if successful_enhancements:
        enhanced_sections = []
        for result in successful_enhancements:
            enhanced_sections.append(f"\n\n## Report-Level Deep Enhancement\n{result.enhanced_content}")
        
        enhanced_data += "\n" + "\n".join(enhanced_sections)
        print(f"✅ Report-level enhancement completed: {len(successful_enhancements)} successful")
    else:
        print("⚠️ Report-level enhancement did not yield effective content")
    
    return enhanced_data, enhancement_results 
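
The `ENHANCEMENT_REQUEST_START` / `ENHANCEMENT_REQUEST_END` block format requested in the analysis prompt can be exercised without calling an LLM. The following is a minimal sketch that reuses the same regular expression as `_parse_enhancement_requests`; the helper `parse_blocks`, its use of `str.partition`, and the sample text are assumptions made for illustration, not part of the module.

```python
import re
from typing import Dict, List

# Same pattern as in _parse_enhancement_requests
BLOCK_PATTERN = r"\*\*ENHANCEMENT_REQUEST_START\*\*(.*?)\*\*ENHANCEMENT_REQUEST_END\*\*"


def parse_blocks(analysis_text: str) -> List[Dict[str, str]]:
    """Return one dict of 'Key: value' fields per request block."""
    parsed = []
    for block in re.findall(BLOCK_PATTERN, analysis_text, re.DOTALL):
        fields: Dict[str, str] = {}
        for line in block.strip().splitlines():
            # Split on the first colon only, so values may contain colons
            key, sep, value = line.strip().partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        parsed.append(fields)
    return parsed


# Hypothetical LLM output for illustration only
sample = """Some preamble.
**ENHANCEMENT_REQUEST_START**
Type: market_data
Target: 2024 market size figures
Priority: 4
Reasoning: The report lacks quantitative sizing.
Suggested_Sources: industry reports, government statistics
**ENHANCEMENT_REQUEST_END**
"""
blocks = parse_blocks(sample)
```

Note that every value, including `Priority`, comes back as a string here; the module itself converts the priority with `int()` and falls back to 3 when conversion fails.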

================================================
FILE: backend/src/agent/state.py
================================================
from __future__ import annotations

import operator
from dataclasses import dataclass, field
from typing import TypedDict, List, Optional, Dict, Any

from langgraph.graph import add_messages
from typing_extensions import Annotated


class LedgerEntry(TypedDict):
    task_id: str
    description: str
    findings_summary: str  # The concise (1-2 sentence) LLM-generated summary for this task
    detailed_snippets: Optional[List[str]]  # List of relevant web_research_result strings
    citations_for_snippets: Optional[List[Dict[str, str]]]  # Maps snippets to sources


class OverallState(TypedDict):
    messages: Annotated[list, add_messages]
    user_query: str  # Store original user question
    plan: list  # Store task plan generated by planner_node
    current_task_pointer: int  # Point to current task in plan
    executed_search_queries: Annotated[list, operator.add]  # Renamed from search_query
    web_research_result: Annotated[list, operator.add]
    sources_gathered: Annotated[list, operator.add]
    initial_search_query_count: int
    max_research_loops: int
    research_loop_count: int
    reasoning_model: str
    
    # --- Day 2 additions for multi-task iteration ---
    ledger: Annotated[List[LedgerEntry], operator.add]  # Records of completed task findings
    global_summary_memory: Annotated[List[str], operator.add]  # Cross-task memory accumulation
    
    # --- Day 3 additions for richer synthesis ---
    current_task_detailed_findings: Annotated[List[Dict[str, Any]], operator.add]  # Temporary storage for current task's detailed findings
    task_specific_results: Annotated[List[Dict[str, Any]], operator.add]  # Task-specific research results with task_id
    final_report_markdown: Optional[str]  # The final synthesized report
    
    # --- Reflection result fields ---
    reflection_is_sufficient: Optional[bool]  # Whether reflection judged the information sufficient
    reflection_knowledge_gap: Optional[str]  # Knowledge gap identified by reflection
    reflection_follow_up_queries: Optional[List[str]]  # Follow-up queries suggested by reflection
    number_of_ran_queries: Optional[int]  # Number of queries executed so far
    
    # --- Enhanced evaluation result fields ---
    evaluation_is_sufficient: Optional[bool]  # Whether the final evaluation judged the information sufficient
    evaluation_should_continue: Optional[bool]  # Whether research should continue
    evaluation_follow_up_queries: Optional[List[str]]  # Follow-up queries suggested by the evaluation
    evaluation_research_complete: Optional[bool]  # Whether research is complete
    evaluation_enhancement_boost: Optional[int]  # Improvement contributed by content enhancement
    
    # --- Intelligent content enhancement fields ---
    enhancement_decision: Optional[Dict[str, Any]]  # Enhancement decision result
    enhancement_status: Optional[str]  # "skipped", "completed", "failed", "error", "analyzing"
    enhanced_content_results: Optional[List[Dict[str, Any]]]  # Firecrawl-enhanced content results
    enhanced_sources_count: Optional[int]  # Number of sources successfully enhanced
    enhancement_error: Optional[str]  # Error message from the enhancement process


class ReflectionState(TypedDict):
    is_sufficient: bool
    knowledge_gap: str
    follow_up_queries: Annotated[list, operator.add]
    research_loop_count: int
    number_of_ran_queries: int
    plan: list
    current_task_pointer: int


class Query(TypedDict):
    query: str
    rationale: str


class QueryGenerationState(TypedDict):
    query_list: list[Query]
    plan: list
    current_task_pointer: int


class WebSearchState(TypedDict):
    search_query: str
    id: str
    current_task_id: str


@dataclass(kw_only=True)
class SearchStateOutput:
    running_summary: Optional[str] = field(default=None)  # Final report
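
Several fields above are declared as `Annotated[list, operator.add]`, which tells the graph to combine a node's partial update with the existing value instead of overwriting it. The sketch below only emulates that merge rule in plain Python to show the effect; `DemoState` and `apply_update` are hypothetical names, and this is not how LangGraph implements reducers internally.

```python
import operator
from typing import Annotated, Any, Dict, TypedDict, get_args, get_origin, get_type_hints


class DemoState(TypedDict):
    user_query: str  # plain field: an update overwrites it
    sources_gathered: Annotated[list, operator.add]  # reducer field: an update is appended


def apply_update(state: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Merge a partial update into state, honoring Annotated reducers."""
    hints = get_type_hints(DemoState, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        hint = hints.get(key)
        if get_origin(hint) is Annotated and key in merged:
            reducer = get_args(hint)[1]  # second Annotated argument is the reducer
            merged[key] = reducer(merged[key], value)
        else:
            merged[key] = value
    return merged


state = {"user_query": "q1", "sources_gathered": [{"url": "a"}]}
new_state = apply_update(state, {"user_query": "q2", "sources_gathered": [{"url": "b"}]})
```

With `operator.add` as the reducer, list fields only ever grow, so a node that wants to contribute results returns just its new items rather than the full accumulated list.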


================================================
FILE: backend/src/agent/tools_and_schemas.py
================================================
from typing import List
from pydantic import BaseModel, Field


class SearchQueryList(BaseModel):
    query: List[str] = Field(
        description="A list of search queries to be used for web research."
    )
    rationale: str = Field(
        description="A brief explanation of why these queries are relevant to the research topic."
    )


class Reflection(BaseModel):
    is_sufficient: bool = Field(
        description="Whether the provided summaries are sufficient to answer the user's question."
    )
    knowledge_gap: str = Field(
        description="A description of what information is missing or needs clarification."
    )
    follow_up_queries: List[str] = Field(
        description="A list of follow-up queries to address the knowledge gap."
    )


class ResearchTask(BaseModel):
    id: str = Field(description="Unique identifier for the task.")
    description: str = Field(description="A concise description of what this research task aims to achieve.")


class ResearchPlan(BaseModel):
    tasks: List[ResearchTask] = Field(description="A list of research tasks to be executed.")


class LedgerEntry(BaseModel):
    """Record of completed task findings for the ledger."""
    task_id: str = Field(description="Unique identifier of the completed task")
    description: str = Field(description="Original task description")
    findings_summary: str = Field(description="Concise summary of key findings for this task")


================================================
FILE: backend/src/agent/utils.py
================================================
from typing import Any, Dict, List
from langchain_core.messages import AnyMessage, AIMessage, HumanMessage


def get_research_topic(messages: List[AnyMessage]) -> str:
    """
    Get the research topic from the messages.
    """
    # check if request has a history and combine the messages into a single string
    if len(messages) == 1:
        research_topic = messages[-1].content
    else:
        research_topic = ""
        for message in messages:
            if isinstance(message, HumanMessage):
                research_topic += f"User: {message.content}\n"
            elif isinstance(message, AIMessage):
                research_topic += f"Assistant: {message.content}\n"
    return research_topic


def resolve_urls(urls_to_resolve: List[Any], id: int) -> Dict[str, str]:
    """
    Create a map that preserves the original URLs instead of replacing them with fake internal IDs.
    This ensures citations point to real, accessible web sources.
    The `id` argument is not used by this implementation.
    """
    # Extract real URLs from the search results
    urls = [site.web.uri for site in urls_to_resolve]

    # Create a dictionary that maps each unique URL to itself (preserve original URLs)
    # We only need to deduplicate, not create fake internal URLs
    resolved_map = {}
    for url in urls:
        if url not in resolved_map:
            # Keep the original URL instead of creating a fake vertexaisearch URL
            resolved_map[url] = url

    return resolved_map


def insert_citation_markers(text, citations_list):
    """
    Inserts citation markers into a text string at each citation's end index.

    Args:
        text (str): The original text string.
        citations_list (list): A list of dictionaries, where each dictionary
                               contains 'start_index', 'end_index', and
                               'segments' (a list of dicts whose 'label' and
                               'short_url' are used to build the marker).
                               Indices are assumed to be for the original text.

    Returns:
        str: The text with citation markers inserted.
    """
    # Sort citations by end_index in descending order.
    # If end_index is the same, secondary sort by start_index descending.
    # This ensures that insertions at the end of the string don't affect
    # the indices of earlier parts of the string that still need to be processed.
    sorted_citations = sorted(
        citations_list, key=lambda c: (c["end_index"], c["start_index"]), reverse=True
    )

    modified_text = text
    for citation_info in sorted_citations:
        # These indices refer to positions in the *original* text,
        # but since we iterate from the end, they remain valid for insertion
        # relative to the parts of the string already processed.
        end_idx = citation_info["end_index"]
        marker_to_insert = ""
        for segment in citation_info["segments"]:
            marker_to_insert += f" [{segment['label']}]({segment['short_url']})"
        # Insert the citation marker at the original end_idx position
        modified_text = (
            modified_text[:end_idx] + marker_to_insert + modified_text[end_idx:]
        )

    return modified_text


def get_citations(response, resolved_urls_map):
    """
    Extracts and formats citation information from a Gemini model's response.

    This function processes the grounding metadata provided in the response to
    construct a list of citation objects. Each citation object includes the
    start and end indices of the text segment it refers to, and the label and
    URL of each supporting web chunk.

    Args:
        response: The response object from the Gemini model, expected to have
                  a structure including `candidates[0].grounding_metadata`.
        resolved_urls_map: A dict mapping each chunk URI to the URL that
                           should appear in the citation link.

    Returns:
        list: A list of dictionaries, where each dictionary represents a citation
              and has the following keys:
              - "start_index" (int): The starting character index of the cited
                                     segment in the original text. Defaults to 0
                                     if not specified.
              - "end_index" (int): The end index of the cited segment, taken
                                   directly from `support.segment.end_index`.
              - "segments" (list[dict]): One dict per grounding chunk, with the
                                         keys "label", "short_url", and "value".
              Returns an empty list if no valid candidates or grounding supports
              are found, or if essential data is missing.
    """
    citations = []

    # Ensure response and necessary nested structures are present
    if not response or not response.candidates:
        return citations

    candidate = response.candidates[0]
    if (
        not hasattr(candidate, "grounding_metadata")
        or not candidate.grounding_metadata
        or not hasattr(candidate.grounding_metadata, "grounding_supports")
    ):
        return citations

    for support in candidate.grounding_metadata.grounding_supports:
        citation = {}

        # Ensure segment information is present
        if not hasattr(support, "segment") or support.segment is None:
            continue  # Skip this support if segment info is missing

        start_index = (
            support.segment.start_index
            if support.segment.start_index is not None
            else 0
        )

        # Ensure end_index is present to form a valid segment
        if support.segment.end_index is None:
            continue  # Skip if end_index is missing, as it's crucial

        # Use the API-provided end_index as-is; insert_citation_markers
        # inserts the citation marker at this position.
        citation["start_index"] = start_index
        citation["end_index"] = support.segment.end_index

        citation["segments"] = []
        if (
            hasattr(support, "grounding_chunk_indices")
            and support.grounding_chunk_indices
        ):
            for ind in support.grounding_chunk_indices:
                try:
                    chunk = candidate.grounding_metadata.grounding_chunks[ind]
                    resolved_url = resolved_urls_map.get(chunk.web.uri, None)
                    citation["segments"].append(
                        {
                            "label": chunk.web.title.split(".")[:-1][0],
                            "short_url": resolved_url,
                            "value": chunk.web.uri,
                        }
                    )
                except (IndexError, AttributeError, NameError):
                    # Handle cases where chunk, web, uri, or resolved_map might be problematic
                    # For simplicity, we'll just skip adding this particular segment link
                    # In a production system, you might want to log this.
                    pass
        citations.append(citation)
    return citations
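
The reverse-order insertion used by `insert_citation_markers` is easiest to see on a tiny input. The sketch below is a compact standalone restatement of the same idea (the name `insert_markers`, the sample text, and the example URLs are made up for illustration): citations are processed from the highest `end_index` down, so earlier insertion points are never shifted by later insertions.

```python
from typing import Any, Dict, List


def insert_markers(text: str, citations: List[Dict[str, Any]]) -> str:
    """Insert ' [label](url)' markers at each citation's end_index."""
    # Highest end_index first, so pending indices stay valid
    ordered = sorted(
        citations, key=lambda c: (c["end_index"], c["start_index"]), reverse=True
    )
    for citation in ordered:
        marker = "".join(
            f" [{seg['label']}]({seg['short_url']})" for seg in citation["segments"]
        )
        end = citation["end_index"]
        text = text[:end] + marker + text[end:]
    return text


# Hypothetical sample data for illustration only
sample_text = "Spain won. England lost."
sample_citations = [
    {"start_index": 0, "end_index": 9,
     "segments": [{"label": "uefa", "short_url": "https://example.com/a"}]},
    {"start_index": 11, "end_index": 23,
     "segments": [{"label": "bbc", "short_url": "https://example.com/b"}]},
]
annotated = insert_markers(sample_text, sample_citations)
```

Processing the citations in ascending order instead would require adding the length of every previously inserted marker to each later index.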


================================================
FILE: backend/test-agent.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from agent import graph\n",
    "\n",
    "state = graph.invoke({\"messages\": [{\"role\": \"user\", \"content\": \"Who won the euro 2024\"}], \"max_research_loops\": 3, \"initial_search_query_count\": 3})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'messages': [HumanMessage(content='Who won the euro 2024', additional_kwargs={}, response_metadata={}, id='4b0ccc12-2e74-4a55-a85e-c512e7867c26'),\n",
       "  AIMessage(content=\"Spain won the UEFA Euro 2024 tournament [youtube](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXGFcidniPKtBR-_QjSR1P1Oathq_0T9FTwfpCAWZxbXsroItHQU8zRcyOPDgMcvsWoD2fEnwYFKwanV18ep2_cyS5BlHF6-OFNsijWb-peAgsgLAVRiubekRnzMugsYtiWrhZyO3Q==) [aljazeera](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXEk7ApC7Y41UOrTWJ40wP2rsT0VDxqhqF-WJEI-FNKW7SNpR7LoA22sRQecS8hZNeZ_-62Vh7X75RmcmZUtnAOuQunrLAsETkkSx5l75dt9ESgTRkIURwtu4Pew7hn8yFz_LY_FJXUpmRfoWP7MWrDfPHcKrOpfmKqONj6mJcASNvAfCZ0p6qK3K4PvKWye6NyBMyYxWCuJig==) [foxsports](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXHh_4hBL0Giyuw_cyfT8m7tUSnMqBqH4Lis1CtJICPJNGGLhT6PADTIoUtrj3Rl5qcKNE9T6rzOmedAER_gxJOBDrCF8pnr9lUvhYvmDJxYCJzELkE5rTap4dx6FzOIKZKm1QBp5aHXzd_LCkSTV9ag7Q1A6_t8Vjdbskch6ZG3BoIfjYDQSPgRKDNFAAwt5J07cVFV5pDQzggmM7pxwsUz4drz) [wikipedia](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXGd9ZQky3X7RQLbTs6mY1i4Pg7ppcI5H_vtxpvQPiEyD8Qw0f7hjvn3QeoOeAVcCG_pEt5Aeu8ofWCgjwQy4_u6qU-NOOJsYPWOW94XcvtkmKiv46vbNkJF-Mb4OpvBztrDa28BfIdCGHdfF9o=) [youtube](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXGZc-qDhRx_v3mPelXEfAVmWCpNTa_rzUKundc0pRc7PlTgppymao-_wO7O1oPaAhJYLcZkazIg8T5jA6t9OGgOxUd_Vl88BjouHsot0OK8TlM5hmPf4ECMWGeJthqVwndE3h4wdQ==) [uefa](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXG1Lj9FnmuckfU0k1NC_ThQBZVxFCppp4tPl4FCcM3JZGF9aPvn9ZNFUo0fLfqw4Adt63Cdv8thcFSbsBRcf3rj1sz4LALJvrGfh6OayGo0KJ-UEKmKoOz8cxj5nIILCzKjFh2_0ZgTwrf1pkhhYbnWqj2E8hrVN4S5_sxvlCpLXPxjTsE4R0gYKXH_utqqm1NBkpl3p-C9v6kz-zm6V-JJoePAppIXFICF0DMYjOIBA9Mj0z4yO9Y9Tdgx2oaP) [aljazeera](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXFY5CRvcfjdkBz3h8Md_PscguyZ7LtYrxeHHP3eagcmIOnjaMyZbOHFqUAsa2cgkwvb26FZTvGiRgLKNLfiAsH1oP-5kGwnL6Ejhm4ZXhWGg0R3yE_8zkIKde4RgjIXlBvQW4kZ-LI5yhag-ESoh771z6hob8AigAVXT7WeWABMlQNfcbyG_UZIkqAs18U5e6to44ruNbSyDIyd5gobsVpEmdU256oVxa9d7co=) 
[coachesvoice](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXHxgpkZWF64tZ8-iypkI2fiFi2cpsj4AFjZXkcYUzf5hSOWYb5etIbCoZd_L6zDJi6mWWisxAO6T5V4T8H7XiRow6dmVqXpSEIKhPSdG0HAQbQK74lwxeV_uXx9fSPllIKPOs2tFNRqTuHdJBNcwpcJp6MJbVLEskyhYnWlyOd9ouQv) [aljazeera](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXEV-g6Hxxcan5Xre1yYGM3BtP3fo9uF2zHQ9sVeK_4poD-aBN5CRvhz471beYCC26wdrjhtbiCvDT9dAnPI-ruyqJZhwB3vbKS5HCFb9tPn7Dkj99LpjLXqYyuzbFGsHCbr5SCHoMEhNg--dMU7xB5TiH8HeqKH8B4lk_h00dqhEVQFb05w5TuLtbX1UdXN6NDzHlFN_xyXzOU=) [wikipedia](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXFNtaBQTFVnSbEW5Bbo8LUIs0h5cv4Pc4aS6Q8qG7jIMCsJPKy5_o6R8x7Z_xQ7AuDEAFlj2JY_AVV1YpwLqtXZxiAyvpfboH_VuMpo6MVbQAu2ZASSSD2slWaIqsUGkTEaPa2z2809z7UhEWUL).\\n\\nIn the final match held in Berlin, Germany, Spain defeated England 2-1 [olympics](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXFARil0pwjYQuFrDObawlDzu-eVtUPC4_nINjcXT-mlTL3MDgVPI83UB8gWS1rzGZkaMEmAUIeAzo2ihpMXUsWibzVzeAdQ7nUyqAOq0En87kpfuISduBuWI3__7yJw-vmdApD56-_G2ZhhZC4d_ll2iyNBaZHxxdNqXbb76mUiq99xV0hdoPEkp9RLk7T-uYYfTYXa8oYCXy2ysa9SZDa9hffEHrVe) [aljazeera](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXEk7ApC7Y41UOrTWJ40wP2rsT0VDxqhqF-WJEI-FNKW7SNpR7LoA22sRQecS8hZNeZ_-62Vh7X75RmcmZUtnAOuQunrLAsETkkSx5l75dt9ESgTRkIURwtu4Pew7hn8yFz_LY_FJXUpmRfoWP7MWrDfPHcKrOpfmKqONj6mJcASNvAfCZ0p6qK3K4PvKWye6NyBMyYxWCuJig==) [foxsports](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXHh_4hBL0Giyuw_cyfT8m7tUSnMqBqH4Lis1CtJICPJNGGLhT6PADTIoUtrj3Rl5qcKNE9T6rzOmedAER_gxJOBDrCF8pnr9lUvhYvmDJxYCJzELkE5rTap4dx6FzOIKZKm1QBp5aHXzd_LCkSTV9ag7Q1A6_t8Vjdbskch6ZG3BoIfjYDQSPgRKDNFAAwt5J07cVFV5pDQzggmM7pxwsUz4drz) 
[aljazeera](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXFY5CRvcfjdkBz3h8Md_PscguyZ7LtYrxeHHP3eagcmIOnjaMyZbOHFqUAsa2cgkwvb26FZTvGiRgLKNLfiAsH1oP-5kGwnL6Ejhm4ZXhWGg0R3yE_8zkIKde4RgjIXlBvQW4kZ-LI5yhag-ESoh771z6hob8AigAVXT7WeWABMlQNfcbyG_UZIkqAs18U5e6to44ruNbSyDIyd5gobsVpEmdU256oVxa9d7co=) [coachesvoice](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXHxgpkZWF64tZ8-iypkI2fiFi2cpsj4AFjZXkcYUzf5hSOWYb5etIbCoZd_L6zDJi6mWWisxAO6T5V4T8H7XiRow6dmVqXpSEIKhPSdG0HAQbQK74lwxeV_uXx9fSPllIKPOs2tFNRqTuHdJBNcwpcJp6MJbVLEskyhYnWlyOd9ouQv) [aljazeera](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXEV-g6Hxxcan5Xre1yYGM3BtP3fo9uF2zHQ9sVeK_4poD-aBN5CRvhz471beYCC26wdrjhtbiCvDT9dAnPI-ruyqJZhwB3vbKS5HCFb9tPn7Dkj99LpjLXqYyuzbFGsHCbr5SCHoMEhNg--dMU7xB5TiH8HeqKH8B4lk_h00dqhEVQFb05w5TuLtbX1UdXN6NDzHlFN_xyXzOU=) [wikipedia](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXFNtaBQTFVnSbEW5Bbo8LUIs0h5cv4Pc4aS6Q8qG7jIMCsJPKy5_o6R8x7Z_xQ7AuDEAFlj2JY_AVV1YpwLqtXZxiAyvpfboH_VuMpo6MVbQAu2ZASSSD2slWaIqsUGkTEaPa2z2809z7UhEWUL). Nico Williams scored the opening goal for Spain, and Mikel Oyarzabal scored the winning goal [youtube](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXGFcidniPKtBR-_QjSR1P1Oathq_0T9FTwfpCAWZxbXsroItHQU8zRcyOPDgMcvsWoD2fEnwYFKwanV18ep2_cyS5BlHF6-OFNsijWb-peAgsgLAVRiubekRnzMugsYtiWrhZyO3Q==) [aljazeera](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXEk7ApC7Y41UOrTWJ40wP2rsT0VDxqhqF-WJEI-FNKW7SNpR7LoA22sRQecS8hZNeZ_-62Vh7X75RmcmZUtnAOuQunrLAsETkkSx5l75dt9ESgTRkIURwtu4Pew7hn8yFz_LY_FJXUpmRfoWP7MWrDfPHcKrOpfmKqONj6mJcASNvAfCZ0p6qK3K4PvKWye6NyBMyYxWCuJig==) [foxsports](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXHh_4hBL0Giyuw_cyfT8m7tUSnMqBqH4Lis1CtJICPJNGGLhT6PADTIoUtrj3Rl5qcKNE9T6rzOmedAER_gxJOBDrCF8pnr9lUvhYvmDJxYCJzELkE5rTap4dx6FzOIKZKm1QBp5aHXzd_LCkSTV9ag7Q1A6_t8Vjdbskch6ZG3BoIfjYDQSPgRKDNFAAwt5J07cVFV5pDQzggmM7pxwsUz4drz). 
Cole Palmer scored England's only goal [olympics](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXFARil0pwjYQuFrDObawlDzu-eVtUPC4_nINjcXT-mlTL3MDgVPI83UB8gWS1rzGZkaMEmAUIeAzo2ihpMXUsWibzVzeAdQ7nUyqAOq0En87kpfuISduBuWI3__7yJw-vmdApD56-_G2ZhhZC4d_ll2iyNBaZHxxdNqXbb76mUiq99xV0hdoPEkp9RLk7T-uYYfTYXa8oYCXy2ysa9SZDa9hffEHrVe) [aljazeera](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXEk7ApC7Y41UOrTWJ40wP2rsT0VDxqhqF-WJEI-FNKW7SNpR7LoA22sRQecS8hZNeZ_-62Vh7X75RmcmZUtnAOuQunrLAsETkkSx5l75dt9ESgTRkIURwtu4Pew7hn8yFz_LY_FJXUpmRfoWP7MWrDfPHcKrOpfmKqONj6mJcASNvAfCZ0p6qK3K4PvKWye6NyBMyYxWCuJig==) [foxsports](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXHh_4hBL0Giyuw_cyfT8m7tUSnMqBqH4Lis1CtJICPJNGGLhT6PADTIoUtrj3Rl5qcKNE9T6rzOmedAER_gxJOBDrCF8pnr9lUvhYvmDJxYCJzELkE5rTap4dx6FzOIKZKm1QBp5aHXzd_LCkSTV9ag7Q1A6_t8Vjdbskch6ZG3BoIfjYDQSPgRKDNFAAwt5J07cVFV5pDQzggmM7pxwsUz4drz).\\n\\nThis victory marked Spain's record fourth European Championship title [youtube](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXGFcidniPKtBR-_QjSR1P1Oathq_0T9FTwfpCAWZxbXsroItHQU8zRcyOPDgMcvsWoD2fEnwYFKwanV18ep2_cyS5BlHF6-OFNsijWb-peAgsgLAVRiubekRnzMugsYtiWrhZyO3Q==) [aljazeera](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXEk7ApC7Y41UOrTWJ40wP2rsT0VDxqhqF-WJEI-FNKW7SNpR7LoA22sRQecS8hZNeZ_-62Vh7X75RmcmZUtnAOuQunrLAsETkkSx5l75dt9ESgTRkIURwtu4Pew7hn8yFz_LY_FJXUpmRfoWP7MWrDfPHcKrOpfmKqONj6mJcASNvAfCZ0p6qK3K4PvKWye6NyBMyYxWCuJig==) [foxsports](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXHh_4hBL0Giyuw_cyfT8m7tUSnMqBqH4Lis1CtJICPJNGGLhT6PADTIoUtrj3Rl5qcKNE9T6rzOmedAER_gxJOBDrCF8pnr9lUvhYvmDJxYCJzELkE5rTap4dx6FzOIKZKm1QBp5aHXzd_LCkSTV9ag7Q1A6_t8Vjdbskch6ZG3BoIfjYDQSPgRKDNFAAwt5J07cVFV5pDQzggmM7pxwsUz4drz) 
[wikipedia](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXGd9ZQky3X7RQLbTs6mY1i4Pg7ppcI5H_vtxpvQPiEyD8Qw0f7hjvn3QeoOeAVcCG_pEt5Aeu8ofWCgjwQy4_u6qU-NOOJsYPWOW94XcvtkmKiv46vbNkJF-Mb4OpvBztrDa28BfIdCGHdfF9o=) [youtube](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXGZc-qDhRx_v3mPelXEfAVmWCpNTa_rzUKundc0pRc7PlTgppymao-_wO7O1oPaAhJYLcZkazIg8T5jA6t9OGgOxUd_Vl88BjouHsot0OK8TlM5hmPf4ECMWGeJthqVwndE3h4wdQ==) [uefa](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXG1Lj9FnmuckfU0k1NC_ThQBZVxFCppp4tPl4FCcM3JZGF9aPvn9ZNFUo0fLfqw4Adt63Cdv8thcFSbsBRcf3rj1sz4LALJvrGfh6OayGo0KJ-UEKmKoOz8cxj5nIILCzKjFh2_0ZgTwrf1pkhhYbnWqj2E8hrVN4S5_sxvlCpLXPxjTsE4R0gYKXH_utqqm1NBkpl3p-C9v6kz-zm6V-JJoePAppIXFICF0DMYjOIBA9Mj0z4yO9Y9Tdgx2oaP) [aljazeera](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXFY5CRvcfjdkBz3h8Md_PscguyZ7LtYrxeHHP3eagcmIOnjaMyZbOHFqUAsa2cgkwvb26FZTvGiRgLKNLfiAsH1oP-5kGwnL6Ejhm4ZXhWGg0R3yE_8zkIKde4RgjIXlBvQW4kZ-LI5yhag-ESoh771z6hob8AigAVXT7WeWABMlQNfcbyG_UZIkqAs18U5e6to44ruNbSyDIyd5gobsVpEmdU256oVxa9d7co=) [coachesvoice](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXHxgpkZWF64tZ8-iypkI2fiFi2cpsj4AFjZXkcYUzf5hSOWYb5etIbCoZd_L6zDJi6mWWisxAO6T5V4T8H7XiRow6dmVqXpSEIKhPSdG0HAQbQK74lwxeV_uXx9fSPllIKPOs2tFNRqTuHdJBNcwpcJp6MJbVLEskyhYnWlyOd9ouQv) [aljazeera](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXEV-g6Hxxcan5Xre1yYGM3BtP3fo9uF2zHQ9sVeK_4poD-aBN5CRvhz471beYCC26wdrjhtbiCvDT9dAnPI-ruyqJZhwB3vbKS5HCFb9tPn7Dkj99LpjLXqYyuzbFGsHCbr5SCHoMEhNg--dMU7xB5TiH8HeqKH8B4lk_h00dqhEVQFb05w5TuLtbX1UdXN6NDzHlFN_xyXzOU=) [wikipedia](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXFNtaBQTFVnSbEW5Bbo8LUIs0h5cv4Pc4aS6Q8qG7jIMCsJPKy5_o6R8x7Z_xQ7AuDEAFlj2JY_AVV1YpwLqtXZxiAyvpfboH_VuMpo6MVbQAu2ZASSSD2slWaIqsUGkTEaPa2z2809z7UhEWUL). 
Spain achieved this by winning all seven of their matches throughout the tournament [youtube](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXFgwKo5lPes5M_GObnkYEzn3QYn1kpTQpx42ANaNqvNMgRsB1Xp2TIXI82SYTSYuLd9ysgKfmlJJy3lcLxrmNBg1R_Z37PCO9vbqIBIbw6DKqMif7pHdtDTS7FUq69c29hkYb_b5w==) [wikipedia](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXGd9ZQky3X7RQLbTs6mY1i4Pg7ppcI5H_vtxpvQPiEyD8Qw0f7hjvn3QeoOeAVcCG_pEt5Aeu8ofWCgjwQy4_u6qU-NOOJsYPWOW94XcvtkmKiv46vbNkJF-Mb4OpvBztrDa28BfIdCGHdfF9o=) [youtube](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXGZc-qDhRx_v3mPelXEfAVmWCpNTa_rzUKundc0pRc7PlTgppymao-_wO7O1oPaAhJYLcZkazIg8T5jA6t9OGgOxUd_Vl88BjouHsot0OK8TlM5hmPf4ECMWGeJthqVwndE3h4wdQ==) [uefa](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXG1Lj9FnmuckfU0k1NC_ThQBZVxFCppp4tPl4FCcM3JZGF9aPvn9ZNFUo0fLfqw4Adt63Cdv8thcFSbsBRcf3rj1sz4LALJvrGfh6OayGo0KJ-UEKmKoOz8cxj5nIILCzKjFh2_0ZgTwrf1pkhhYbnWqj2E8hrVN4S5_sxvlCpLXPxjTsE4R0gYKXH_utqqm1NBkpl3p-C9v6kz-zm6V-JJoePAppIXFICF0DMYjOIBA9Mj0z4yO9Y9Tdgx2oaP) [wikipedia](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXFNtaBQTFVnSbEW5Bbo8LUIs0h5cv4Pc4aS6Q8qG7jIMCsJPKy5_o6R8x7Z_xQ7AuDEAFlj2JY_AVV1YpwLqtXZxiAyvpfboH_VuMpo6MVbQAu2ZASSSD2slWaIqsUGkTEaPa2z2809z7UhEWUL) [newsbytesapp](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXFIl5Xc3f44I1nYw_YrJqkByrRl20SiAopZqjfJIK6U62o27CrxLvxaJ4v1M7L5eOfTMMlBCHHYCUooPoG0aObaeRG3YxrcoFT7Xtd4KIrvCS6AWWRpOZasCW-sGtFA56DEDf-qbJ8lsXEJ4GQ386iGTdRkyK9EtJWw1mRpDu7dfPQ6Qy1hNIqTgTdo-3yq1WNmWEl8Xtnag0s=).\\n\\nKey individual awards for the tournament went to Spain's players: Rodri was named the Best Player, and Lamine Yamal was named the Best Young Player 
[wikipedia](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXEk7ApC7Y41UOrTWJ40wP2rsT0VDxqhqF-WJEI-FNKW7SNpR7LoA22sRQecS8hZNeZ_-62Vh7X75RmcmZUtnAOuQunrLAsETkkSx5l75dt9ESgTRkIURwtu4Pew7hn8yFz_LY_FJXUpmRfoWP7MWrDfPHcKrOpfmKqONj6mJcASNvAfCZ0p6qK3K4PvKWye6NyBMyYxWCuJig==0) [bet9ja](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXFgj0MP_IEmC842xTfmMPnbybBGYTUb_wEpwJ58keX5x_qPfUmC7Zz0o6IQeQ8TEqoRpv-Uq6oOqfbazu_aP0fMhP7UrSln6rB4SRvCRC327tM1LNaXpiXN-h6xlg0TN_-AWQORV4PSH7G5u2qD_NaNEWkz_oaEHxj22-qOam52fwRvqISOdoFDNTptlM6t0BbhcA==) [uefa](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXGKGygrv0aVjWa7JUdwqtuttcPxVIiVFb2_Mxv32q-4AyOVwd8oMKLXq6sl2kw4A37lHLmUUQYqVfDMkX3DLXr4or1Xpx1lnOpIUanPjOtrr2Hk6tPPc0308hdE0xJ5CClC220Tz30xD6538_DOvrVWqfA7pV7x651519Zz37wgqYhN00Ah3LX4QZnW981_-SM8tjVSLDXutPphZBXXmMehNgUynvNd2IiGB9UtkLyGeWINIqR2F7lejStuXJ8U2Q==) [beinsports](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXExRli0zGmQZlemPPItRH3qShabB-QVHrgUAECeXIs3GUKgd2oIHd45-ULY--TosnkRkiM-XHqZlPxeQlOV6Ktgxb-L5r9Hhf8M-nQS_T0N7NK0BeynreRZtFivuKzwwOByq6uALzoVtombjsREMmsPG7s07CMlMrQjyJCVX8McNdnGC7-mdlHEjdfXN4sgi-YGxdxCdAxaHUaMQxPL0GUUmqDzMMpzVC_lRnrYfuk17UhXI9QhsEi3TMeuUgHu3kl16g1mHA==) [thehindu](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXEAlCtejIOwzHPUOAXi7oLu469wYzGUJN86oxtrB6YCAHKAocfkxog6XZeXOUjAl9MTY2_jU5igYEOpyy5RZV2jhxGHtahvQGi8Bq0XkJmaFvludGqwpuBn-vFf-MR3As1CXu9GZNh0TW5f3eLPgvDjB6N3IoYaGhGT8BUiqSyZS6k41T-vL9h6fEFMoOFUYhG2S0AfuVZDuyF2nJHJP1WVWZS42csWXEJUDxqhYjyzmx33HaCxKk0Rbe3_Ovc_Kgdagw==).\", additional_kwargs={}, response_metadata={}, id='4c4aa673-391d-48b2-954a-9fcb7053c634')],\n",
       " 'search_query': ['Euro 2024 winner',\n",
       "  \"What were Spain's key team performance statistics throughout Euro 2024?\",\n",
       "  'What specific stats or performances led to Rodri being named Euro 2024 Best Player?',\n",
       "  'What specific stats or performances led to Lamine Yamal being named Euro 2024 Best Young Player?'],\n",
       " 'web_research_result': [\"Spain won the UEFA Euro 2024, securing their record fourth title [youtube](https://vertexaisearch.cloud.google.com/id/0-0) [aljazeera](https://vertexaisearch.cloud.google.com/id/0-1) [foxsports](https://vertexaisearch.cloud.google.com/id/0-2) [wikipedia](https://vertexaisearch.cloud.google.com/id/0-3) [youtube](https://vertexaisearch.cloud.google.com/id/0-4) [uefa](https://vertexaisearch.cloud.google.com/id/0-5). The final match was held in Berlin, Germany, where Spain defeated England 2-1 [olympics](https://vertexaisearch.cloud.google.com/id/0-6) [aljazeera](https://vertexaisearch.cloud.google.com/id/0-1) [foxsports](https://vertexaisearch.cloud.google.com/id/0-2). Spain's Nico Williams scored the opening goal, and Mikel Oyarzabal scored the winning goal [youtube](https://vertexaisearch.cloud.google.com/id/0-0) [aljazeera](https://vertexaisearch.cloud.google.com/id/0-1) [foxsports](https://vertexaisearch.cloud.google.com/id/0-2). England's Cole Palmer scored their lone goal [olympics](https://vertexaisearch.cloud.google.com/id/0-6) [aljazeera](https://vertexaisearch.cloud.google.com/id/0-1) [foxsports](https://vertexaisearch.cloud.google.com/id/0-2).\\n\\nSpain won all seven of their matches in the tournament [youtube](https://vertexaisearch.cloud.google.com/id/0-7) [wikipedia](https://vertexaisearch.cloud.google.com/id/0-3) [youtube](https://vertexaisearch.cloud.google.com/id/0-4) [uefa](https://vertexaisearch.cloud.google.com/id/0-5). In the quarter-finals, Spain defeated Germany 2-1 after extra time [olympics](https://vertexaisearch.cloud.google.com/id/0-6) [wikipedia](https://vertexaisearch.cloud.google.com/id/0-3). In the semi-finals, Spain beat France 2-1 [olympics](https://vertexaisearch.cloud.google.com/id/0-6) [aljazeera](https://vertexaisearch.cloud.google.com/id/0-8) [wikipedia](https://vertexaisearch.cloud.google.com/id/0-3). 
Lamine Yamal became the youngest player to score in a UEFA European Championship [ndtv](https://vertexaisearch.cloud.google.com/id/0-9) [uefa](https://vertexaisearch.cloud.google.com/id/0-5).\\n\\nThe top scorers of the tournament were Harry Kane, Georges Mikautadze, Jamal Musiala, Cody Gakpo, Ivan Schranz and Dani Olmo, each with 3 goals [wikipedia](https://vertexaisearch.cloud.google.com/id/0-10). Rodri was named best player and Lamine Yamal best young player of the tournament [wikipedia](https://vertexaisearch.cloud.google.com/id/0-10). Luis de la Fuente was the coach who led Spain to victory [transfermarkt](https://vertexaisearch.cloud.google.com/id/0-11).\\n\",\n",
       "  \"Spain won Euro 2024, defeating England 2-1 in the final to secure their record fourth European Championship [aljazeera](https://vertexaisearch.cloud.google.com/id/1-0) [coachesvoice](https://vertexaisearch.cloud.google.com/id/1-1) [aljazeera](https://vertexaisearch.cloud.google.com/id/1-2) [wikipedia](https://vertexaisearch.cloud.google.com/id/1-3). They won all seven of their matches in the competition [wikipedia](https://vertexaisearch.cloud.google.com/id/1-3).\\n\\nHere's a summary of Spain's key team performance statistics throughout Euro 2024:\\n\\n**General Stats:**\\n\\n*   **Goals Scored:** Spain scored 15 goals throughout the tournament, setting a new record for most goals in a single European Championship [wikipedia](https://vertexaisearch.cloud.google.com/id/1-3). They scored 13 goals before the final [thehindu](https://vertexaisearch.cloud.google.com/id/1-4) [newsbytesapp](https://vertexaisearch.cloud.google.com/id/1-5).\\n*   **Goals Conceded:** Spain conceded only three goals in the tournament [thehindu](https://vertexaisearch.cloud.google.com/id/1-4) [newsbytesapp](https://vertexaisearch.cloud.google.com/id/1-5).\\n*   **Wins:** Spain had a 100% win record in Euro 2024 [newsbytesapp](https://vertexaisearch.cloud.google.com/id/1-5). They won all six of their matches leading up to the final [aljazeera](https://vertexaisearch.cloud.google.com/id/1-2) [sportsmole](https://vertexaisearch.cloud.google.com/id/1-6).\\n*   **Clean Sheets:** Spain had three clean sheets in Euro 2024 [thehindu](https://vertexaisearch.cloud.google.com/id/1-4) [thehindu](https://vertexaisearch.cloud.google.com/id/1-7).\\n*   **Possession:** Spain averaged 57.3% possession during the tournament [thehindu](https://vertexaisearch.cloud.google.com/id/1-4). 
They often maintained possession for over 65% of their matches [spanishprofootball](https://vertexaisearch.cloud.google.com/id/1-8).\\n*   **Passing Accuracy:** Spain had a passing accuracy of 90% [thehindu](https://vertexaisearch.cloud.google.com/id/1-4).\\n*   **Ball Recoveries:** Spain led the tournament in ball recoveries with 255 [thehindu](https://vertexaisearch.cloud.google.com/id/1-4).\\n*   **Shots:** Spain had 80 shots (excluding blocks), with 38 on target [newsbytesapp](https://vertexaisearch.cloud.google.com/id/1-5). They had the most attempts in Euro 2024, with 108, 37 of which were on target [thehindu](https://vertexaisearch.cloud.google.com/id/1-4).\\n*   **Chances Created:** Spain created 85 chances [newsbytesapp](https://vertexaisearch.cloud.google.com/id/1-5).\\n*   **Tackles:** Spain made 92 tackles [newsbytesapp](https://vertexaisearch.cloud.google.com/id/1-5).\\n\\n**Team Composition and Tactics:**\\n\\n*   The squad featured a blend of experienced players and young talents [totalfootballanalysis](https://vertexaisearch.cloud.google.com/id/1-9).\\n*   Luis de la Fuente employed multifaceted tactics, adapting to different opponents [totalfootballanalysis](https://vertexaisearch.cloud.google.com/id/1-10).\\n*   Spain dominated possession and controlled the tempo of matches [spanishprofootball](https://vertexaisearch.cloud.google.com/id/1-8).\\n*   They utilized a high pressing strategy and quick recovery [spanishprofootball](https://vertexaisearch.cloud.google.com/id/1-8).\\n*   Fluid midfield dynamics were powered by players like Pedri, Rodri, and Gavi [spanishprofootball](https://vertexaisearch.cloud.google.com/id/1-8).\\n\\n**Individual Player Stats:**\\n\\n*   **Dani Olmo:** Joint leading goal scorer with three goals [thehindu](https://vertexaisearch.cloud.google.com/id/1-4) [newsbytesapp](https://vertexaisearch.cloud.google.com/id/1-5). 
He also provided two assists [newsbytesapp](https://vertexaisearch.cloud.google.com/id/1-5).\\n*   **Lamine Yamal:** Joint assist leader with three assists [thehindu](https://vertexaisearch.cloud.google.com/id/1-4) [thehindu](https://vertexaisearch.cloud.google.com/id/1-7). He also became the youngest-ever Euros scorer [sportsmole](https://vertexaisearch.cloud.google.com/id/1-6) [wikipedia](https://vertexaisearch.cloud.google.com/id/1-3).\\n*   **Rodri:** Completed the most passes for Spain [thehindu](https://vertexaisearch.cloud.google.com/id/1-4) [thehindu](https://vertexaisearch.cloud.google.com/id/1-7).\\n*   **Aymeric Laporte:** Recovered the ball the most number of times for Spain defensively [thehindu](https://vertexaisearch.cloud.google.com/id/1-4).\\n*   **Unai Simon:** Conceded three goals and made 12 saves in five matches [thehindu](https://vertexaisearch.cloud.google.com/id/1-4).\\n*   **Nico Williams:** Named Man of the Match in the final [wikipedia](https://vertexaisearch.cloud.google.com/id/1-3).\\n\\nSpain's coach, Luis de la Fuente, emphasized versatility, pace on the wings, control in the middle, and a solid defense as key to their balance [coachesvoice](https://vertexaisearch.cloud.google.com/id/1-1).\\n\",\n",
       "  'Rodri was named Euro 2024 Best Player due to his consistent and brilliant performances throughout the tournament [bet9ja](https://vertexaisearch.cloud.google.com/id/2-0). He was the centerpiece of Spain\\'s midfield, playing a crucial role in nearly every game [europeanchampionship2024](https://vertexaisearch.cloud.google.com/id/2-1). Here\\'s a breakdown of the specific stats and performances that led to the award:\\n\\n*   **Key Role in Spain\\'s Victories:** Rodri played a crucial role in Spain\\'s victories over Germany and France [bet9ja](https://vertexaisearch.cloud.google.com/id/2-0).\\n*   **Midfield Dominance:** Rodri\\'s consistent presence in midfield was pivotal for Spain [europeanchampionship2024](https://vertexaisearch.cloud.google.com/id/2-1).\\n*   **Only Goal:** He scored a goal in Spain\\'s 4-1 win over Georgia in the Last 16 [indiatimes](https://vertexaisearch.cloud.google.com/id/2-2) [bet9ja](https://vertexaisearch.cloud.google.com/id/2-0).\\n*   **Passing Accuracy:** Rodri had a remarkable passing accuracy of 92.84% [uefa](https://vertexaisearch.cloud.google.com/id/2-3) [mancity](https://vertexaisearch.cloud.google.com/id/2-4) [uefa](https://vertexaisearch.cloud.google.com/id/2-5). Only Aymeric Laporte completed more passes for Spain with 411 passes [mancity](https://vertexaisearch.cloud.google.com/id/2-4).\\n*   **Ball Recoveries:** Rodri was also pivotal when out of possession, with just one other midfielder registering more ball recoveries than the Spaniard\\'s 33 [mancity](https://vertexaisearch.cloud.google.com/id/2-4).\\n*   **Leadership:** He led his team with distinction [europeanchampionship2024](https://vertexaisearch.cloud.google.com/id/2-1). Rodri\\'s leadership on the field helped integrate young talents [bet9ja](https://vertexaisearch.cloud.google.com/id/2-0).\\n*   **Strategic Rest:** He started in six of Spain\\'s seven matches, only sitting out the final group stage game against Slovakia, which Spain won 1-0. 
This strategic rest allowed Rodri to stay fresh for the knockout stages [upthrust](https://vertexaisearch.cloud.google.com/id/2-6).\\n*   **Calmness Under Pressure:** Rodri\\'s calmness under pressure was a recurring theme throughout the tournament [upthrust](https://vertexaisearch.cloud.google.com/id/2-6).\\n*   **Dictating Tempo:** His ability to dictate the tempo of the game, coupled with his defensive prowess, made Rodri indispensable [upthrust](https://vertexaisearch.cloud.google.com/id/2-6).\\n*   **Orchestration:** Rodri\\'s orchestration was crucial in maintaining possession and preventing Germany from gaining momentum in the quarter-final [upthrust](https://vertexaisearch.cloud.google.com/id/2-6).\\n*   **Midfield Control:** His performance against France in the semi-finals was another masterclass in midfield control [upthrust](https://vertexaisearch.cloud.google.com/id/2-6).\\n*   **Composure and Strategic Thinking:** Rodri\\'s composure and strategic thinking brought a sense of reliability to Spain\\'s gameplay [upthrust](https://vertexaisearch.cloud.google.com/id/2-6).\\n*   **Impact in the Final:** Despite his early exit due to a hamstring injury in the final against England, Rodri\\'s presence in the first half helped Spain establish control and set the tone for the rest of the match [upthrust](https://vertexaisearch.cloud.google.com/id/2-6).\\n\\nLuis de la Fuente, the coach of the Spanish team, described Rodri as a \"perfect computer\" due to his precise passing and exceptional understanding of the game [indiatimes](https://vertexaisearch.cloud.google.com/id/2-2) [bet9ja](https://vertexaisearch.cloud.google.com/id/2-0). UEFA\\'s team of technical observers at EURO 2024 also recognized Rodri\\'s influence in central midfield [uefa](https://vertexaisearch.cloud.google.com/id/2-7).\\n',\n",
       "  \"Lamine Yamal was named Euro 2024 Young Player of the Tournament due to several outstanding achievements [uefa](https://vertexaisearch.cloud.google.com/id/3-0) [beinsports](https://vertexaisearch.cloud.google.com/id/3-1) [thehindu](https://vertexaisearch.cloud.google.com/id/3-2). He played in all seven of Spain's Euro 2024 matches, starting in six of them [uefa](https://vertexaisearch.cloud.google.com/id/3-0). He became the youngest player ever to play in the tournament when he started against Croatia at 16 years, 338 days old [uefa](https://vertexaisearch.cloud.google.com/id/3-0) [uefa](https://vertexaisearch.cloud.google.com/id/3-3). In the semi-final against France, he scored a remarkable goal, making him the youngest goalscorer in Euros history at 16 years, 362 days [wikipedia](https://vertexaisearch.cloud.google.com/id/3-4) [uefa](https://vertexaisearch.cloud.google.com/id/3-0) [uefa](https://vertexaisearch.cloud.google.com/id/3-3) [beinsports](https://vertexaisearch.cloud.google.com/id/3-1) [thehindu](https://vertexaisearch.cloud.google.com/id/3-2). Furthermore, he provided four assists during the tournament [wikipedia](https://vertexaisearch.cloud.google.com/id/3-4) [thehindu](https://vertexaisearch.cloud.google.com/id/3-5) [beinsports](https://vertexaisearch.cloud.google.com/id/3-1). In the final, he set up the opening goal against England [uefa](https://vertexaisearch.cloud.google.com/id/3-0).\\n\\nKey statistics from the tournament include [uefa](https://vertexaisearch.cloud.google.com/id/3-6) [uefa](https://vertexaisearch.cloud.google.com/id/3-7):\\n*   7 Matches played\\n*   507 Minutes played\\n*   1 Goal\\n*   4 Assists\\n\\nThese performances led to Yamal receiving the Euro 2024 Young Player of the Tournament award [uefa](https://vertexaisearch.cloud.google.com/id/3-0) [beinsports](https://vertexaisearch.cloud.google.com/id/3-1) [thehindu](https://vertexaisearch.cloud.google.com/id/3-2).\\n\"],\n",
       " 'sources_gathered': [{'label': 'youtube',\n",
       "   'short_url': 'https://vertexaisearch.cloud.google.com/id/0-0',\n",
       "   'value': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXGFcidniPKtBR-_QjSR1P1Oathq_0T9FTwfpCAWZxbXsroItHQU8zRcyOPDgMcvsWoD2fEnwYFKwanV18ep2_cyS5BlHF6-OFNsijWb-peAgsgLAVRiubekRnzMugsYtiWrhZyO3Q=='},\n",
       "  {'label': 'aljazeera',\n",
       "   'short_url': 'https://vertexaisearch.cloud.google.com/id/0-1',\n",
       "   'value': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXEk7ApC7Y41UOrTWJ40wP2rsT0VDxqhqF-WJEI-FNKW7SNpR7LoA22sRQecS8hZNeZ_-62Vh7X75RmcmZUtnAOuQunrLAsETkkSx5l75dt9ESgTRkIURwtu4Pew7hn8yFz_LY_FJXUpmRfoWP7MWrDfPHcKrOpfmKqONj6mJcASNvAfCZ0p6qK3K4PvKWye6NyBMyYxWCuJig=='},\n",
       "  {'label': 'foxsports',\n",
       "   'short_url': 'https://vertexaisearch.cloud.google.com/id/0-2',\n",
       "   'value': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXHh_4hBL0Giyuw_cyfT8m7tUSnMqBqH4Lis1CtJICPJNGGLhT6PADTIoUtrj3Rl5qcKNE9T6rzOmedAER_gxJOBDrCF8pnr9lUvhYvmDJxYCJzELkE5rTap4dx6FzOIKZKm1QBp5aHXzd_LCkSTV9ag7Q1A6_t8Vjdbskch6ZG3BoIfjYDQSPgRKDNFAAwt5J07cVFV5pDQzggmM7pxwsUz4drz'},\n",
       "  {'label': 'wikipedia',\n",
       "   'short_url': 'https://vertexaisearch.cloud.google.com/id/0-3',\n",
       "   'value': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXGd9ZQky3X7RQLbTs6mY1i4Pg7ppcI5H_vtxpvQPiEyD8Qw0f7hjvn3QeoOeAVcCG_pEt5Aeu8ofWCgjwQy4_u6qU-NOOJsYPWOW94XcvtkmKiv46vbNkJF-Mb4OpvBztrDa28BfIdCGHdfF9o='},\n",
       "  {'label': 'youtube',\n",
       "   'short_url': 'https://vertexaisearch.cloud.google.com/id/0-4',\n",
       "   'value': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXGZc-qDhRx_v3mPelXEfAVmWCpNTa_rzUKundc0pRc7PlTgppymao-_wO7O1oPaAhJYLcZkazIg8T5jA6t9OGgOxUd_Vl88BjouHsot0OK8TlM5hmPf4ECMWGeJthqVwndE3h4wdQ=='},\n",
       "  {'label': 'uefa',\n",
       "   'short_url': 'https://vertexaisearch.cloud.google.com/id/0-5',\n",
       "   'value': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXG1Lj9FnmuckfU0k1NC_ThQBZVxFCppp4tPl4FCcM3JZGF9aPvn9ZNFUo0fLfqw4Adt63Cdv8thcFSbsBRcf3rj1sz4LALJvrGfh6OayGo0KJ-UEKmKoOz8cxj5nIILCzKjFh2_0ZgTwrf1pkhhYbnWqj2E8hrVN4S5_sxvlCpLXPxjTsE4R0gYKXH_utqqm1NBkpl3p-C9v6kz-zm6V-JJoePAppIXFICF0DMYjOIBA9Mj0z4yO9Y9Tdgx2oaP'},\n",
       "  {'label': 'olympics',\n",
       "   'short_url': 'https://vertexaisearch.cloud.google.com/id/0-6',\n",
       "   'value': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXFARil0pwjYQuFrDObawlDzu-eVtUPC4_nINjcXT-mlTL3MDgVPI83UB8gWS1rzGZkaMEmAUIeAzo2ihpMXUsWibzVzeAdQ7nUyqAOq0En87kpfuISduBuWI3__7yJw-vmdApD56-_G2ZhhZC4d_ll2iyNBaZHxxdNqXbb76mUiq99xV0hdoPEkp9RLk7T-uYYfTYXa8oYCXy2ysa9SZDa9hffEHrVe'},\n",
       "  {'label': 'aljazeera',\n",
       "   'short_url': 'https://vertexaisearch.cloud.google.com/id/0-1',\n",
       "   'value': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXEk7ApC7Y41UOrTWJ40wP2rsT0VDxqhqF-WJEI-FNKW7SNpR7LoA22sRQecS8hZNeZ_-62Vh7X75RmcmZUtnAOuQunrLAsETkkSx5l75dt9ESgTRkIURwtu4Pew7hn8yFz_LY_FJXUpmRfoWP7MWrDfPHcKrOpfmKqONj6mJcASNvAfCZ0p6qK3K4PvKWye6NyBMyYxWCuJig=='},\n",
       "  {'label': 'foxsports',\n",
       "   'short_url': 'https://vertexaisearch.cloud.google.com/id/0-2',\n",
       "   'value': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXHh_4hBL0Giyuw_cyfT8m7tUSnMqBqH4Lis1CtJICPJNGGLhT6PADTIoUtrj3Rl5qcKNE9T6rzOmedAER_gxJOBDrCF8pnr9lUvhYvmDJxYCJzELkE5rTap4dx6FzOIKZKm1QBp5aHXzd_LCkSTV9ag7Q1A6_t8Vjdbskch6ZG3BoIfjYDQSPgRKDNFAAwt5J07cVFV5pDQzggmM7pxwsUz4drz'},\n",
       "  {'label': 'youtube',\n",
       "   'short_url': 'https://vertexaisearch.cloud.google.com/id/0-0',\n",
       "   'value': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXGFcidniPKtBR-_QjSR1P1Oathq_0T9FTwfpCAWZxbXsroItHQU8zRcyOPDgMcvsWoD2fEnwYFKwanV18ep2_cyS5BlHF6-OFNsijWb-peAgsgLAVRiubekRnzMugsYtiWrhZyO3Q=='},\n",
       "  {'label': 'aljazeera',\n",
       "   'short_url': 'https://vertexaisearch.cloud.google.com/id/0-1',\n",
       "   'value': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXEk7ApC7Y41UOrTWJ40wP2rsT0VDxqhqF-WJEI-FNKW7SNpR7LoA22sRQecS8hZNeZ_-62Vh7X75RmcmZUtnAOuQunrLAsETkkSx5l75dt9ESgTRkIURwtu4Pew7hn8yFz_LY_FJXUpmRfoWP7MWrDfPHcKrOpfmKqONj6mJcASNvAfCZ0p6qK3K4PvKWye6NyBMyYxWCuJig=='},\n",
       "  {'label': 'foxsports',\n",
       "   'short_url': 'https://vertexaisearch.cloud.google.com/id/0-2',\n",
       "   'value': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXHh_4hBL0Giyuw_cyfT8m7tUSnMqBqH4Lis1CtJICPJNGGLhT6PADTIoUtrj3Rl5qcKNE9T6rzOmedAER_gxJOBDrCF8pnr9lUvhYvmDJxYCJzELkE5rTap4dx6FzOIKZKm1QBp5aHXzd_LCkSTV9ag7Q1A6_t8Vjdbskch6ZG3BoIfjYDQSPgRKDNFAAwt5J07cVFV5pDQzggmM7pxwsUz4drz'},\n",
       "  {'label': 'olympics',\n",
       "   'short_url': 'https://vertexaisearch.cloud.google.com/id/0-6',\n",
       "   'value': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXFARil0pwjYQuFrDObawlDzu-eVtUPC4_nINjcXT-mlTL3MDgVPI83UB8gWS1rzGZkaMEmAUIeAzo2ihpMXUsWibzVzeAdQ7nUyqAOq0En87kpfuISduBuWI3__7yJw-vmdApD56-_G2ZhhZC4d_ll2iyNBaZHxxdNqXbb76mUiq99xV0hdoPEkp9RLk7T-uYYfTYXa8oYCXy2ysa9SZDa9hffEHrVe'},\n",
       "  {'label': 'aljazeera',\n",
       "   'short_url': 'https://vertexaisearch.cloud.google.com/id/0-1',\n",
       "   'value': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXEk7ApC7Y41UOrTWJ40wP2rsT0VDxqhqF-WJEI-FNKW7SNpR7LoA22sRQecS8hZNeZ_-62Vh7X75RmcmZUtnAOuQunrLAsETkkSx5l75dt9ESgTRkIURwtu4Pew7hn8yFz_LY_FJXUpmRfoWP7MWrDfPHcKrOpfmKqONj6mJcASNvAfCZ0p6qK3K4PvKWye6NyBMyYxWCuJig=='},\n",
       "  {'label': 'foxsports',\n",
       "   'short_url': 'https://vertexaisearch.cloud.google.com/id/0-2',\n",
       "   'value': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXHh_4hBL0Giyuw_cyfT8m7tUSnMqBqH4Lis1CtJICPJNGGLhT6PADTIoUtrj3Rl5qcKNE9T6rzOmedAER_gxJOBDrCF8pnr9lUvhYvmDJxYCJzELkE5rTap4dx6FzOIKZKm1QBp5aHXzd_LCkSTV9ag7Q1A6_t8Vjdbskch6ZG3BoIfjYDQSPgRKDNFAAwt5J07cVFV5pDQzggmM7pxwsUz4drz'},\n",
       "  {'label': 'youtube',\n",
       "   'short_url': 'https://vertexaisearch.cloud.google.com/id/0-7',\n",
       "   'value': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXFgwKo5lPes5M_GObnkYEzn3QYn1kpTQpx42ANaNqvNMgRsB1Xp2TIXI82SYTSYuLd9ysgKfmlJJy3lcLxrmNBg1R_Z37PCO9vbqIBIbw6DKqMif7pHdtDTS7FUq69c29hkYb_b5w=='},\n",
       "  {'label': 'wikipedia',\n",
       "   'short_url': 'https://vertexaisearch.cloud.google.com/id/0-3',\n",
       "   'value': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXGd9ZQky3X7RQLbTs6mY1i4Pg7ppcI5H_vtxpvQPiEyD8Qw0f7hjvn3QeoOeAVcCG_pEt5Aeu8ofWCgjwQy4_u6qU-NOOJsYPWOW94XcvtkmKiv46vbNkJF-Mb4OpvBztrDa28BfIdCGHdfF9o='},\n",
       "  {'label': 'youtube',\n",
       "   'short_url': 'https://vertexaisearch.cloud.google.com/id/0-4',\n",
       "   'value': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXGZc-qDhRx_v3mPelXEfAVmWCpNTa_rzUKundc0pRc7PlTgppymao-_wO7O1oPaAhJYLcZkazIg8T5jA6t9OGgOxUd_Vl88BjouHsot0OK8TlM5hmPf4ECMWGeJthqVwndE3h4wdQ=='},\n",
       "  {'label': 'uefa',\n",
       "   'short_url': 'https://vertexaisearch.cloud.google.com/id/0-5',\n",
       "   'value': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXG1Lj9FnmuckfU0k1NC_ThQBZVxFCppp4tPl4FCcM3JZGF9aPvn9ZNFUo0fLfqw4Adt63Cdv8thcFSbsBRcf3rj1sz4LALJvrGfh6OayGo0KJ-UEKmKoOz8cxj5nIILCzKjFh2_0ZgTwrf1pkhhYbnWqj2E8hrVN4S5_sxvlCpLXPxjTsE4R0gYKXH_utqqm1NBkpl3p-C9v6kz-zm6V-JJoePAppIXFICF0DMYjOIBA9Mj0z4yO9Y9Tdgx2oaP'},\n",
       "  {'label': 'olympics',\n",
       "   'short_url': 'https://vertexaisearch.cloud.google.com/id/0-6',\n",
       "   'value': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXFARil0pwjYQuFrDObawlDzu-eVtUPC4_nINjcXT-mlTL3MDgVPI83UB8gWS1rzGZkaMEmAUIeAzo2ihpMXUsWibzVzeAdQ7nUyqAOq0En87kpfuISduBuWI3__7yJw-vmdApD56-_G2ZhhZC4d_ll2iyNBaZHxxdNqXbb76mUiq99xV0hdoPEkp9RLk7T-uYYfTYXa8oYCXy2ysa9SZDa9hffEHrVe'},\n",
       "  {'label': 'wikipedia',\n",
       "   'short_url': 'https://vertexaisearch.cloud.google.com/id/0-3',\n",
       "   'value': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXGd9ZQky3X7RQLbTs6mY1i4Pg7ppcI5H_vtxpvQPiEyD8Qw0f7hjvn3QeoOeAVcCG_pEt5Aeu8ofWCgjwQy4_u6qU-NOOJsYPWOW94XcvtkmKiv46vbNkJF-Mb4OpvBztrDa28BfIdCGHdfF9o='},\n",
       "  {'label': 'olympics',\n",
       "   'short_url': 'https://vertexaisearch.cloud.google.com/id/0-6',\n",
       "   'value': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXFARil0pwjYQuFrDObawlDzu-eVtUPC4_nINjcXT-mlTL3MDgVPI83UB8gWS1rzGZkaMEmAUIeAzo2ihpMXUsWibzVzeAdQ7nUyqAOq0En87kpfuISduBuWI3__7yJw-vmdApD56-_G2ZhhZC4d_ll2iyNBaZHxxdNqXbb76mUiq99xV0hdoPEkp9RLk7T-uYYfTYXa8oYCXy2ysa9SZDa9hffEHrVe'},\n",
       "  {'label': 'aljazeera',\n",
       "   'short_url': 'https://vertexaisearch.cloud.google.com/id/0-8',\n",
       "   'value': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXFdu_dxqteuc9vM3oGH5WgEnFuOA6vlmbqof-iVRg2OviD2jzkp1jlCRsWkLfb64cK8TJ_g5jKKfZgmaMCk4LA-E2zjYGBfmsWiHdwfSg5Zv3VDMngM3HxT-VLjWYdBdpvpcBTj9VNRkqSCAjGVL9ar0VAOF0uRF6Z96LFz7G9KCSL50llqG7XLpbXmQTFIV4FUsffI8aQG9KKmIaZ1eGqeWQl2xaaRu6-Pwzqxizg8'},\n",
       "  {'label': 'wikipedia',\n",
       "   'short_url': 'https://vertexaisearch.cloud.google.com/id/0-3',\n",
       "   'value': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXGd9ZQky3X7RQLbTs6mY1i4Pg7ppcI5H_vtxpvQPiEyD8Qw0f7hjvn3QeoOeAVcCG_pEt5Aeu8ofWCgjwQy4_u6qU-NOOJsYPWOW94XcvtkmKiv46vbNkJF-Mb4OpvBztrDa28BfIdCGHdfF9o='},\n",
       "  {'label': 'ndtv',\n",
       "   'short_url': 'https://vertexaisearch.cloud.google.com/id/0-9',\n",
       "   'value': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXFRRH83ij2MgKWwrGFVWMaFDAT0_GKCFdwVIjaYn7DOoBlxXCGR-Y2RTw9AdKH8dYuhXxSxUTaZNXOBac2nknNZpdmwJiGIj51H6lRWREPUPOiKQkfVPJ0f4ubRSJBLm7_QcAkz4BwzJr3OM06jh-41TbNFZ9t6D7WrbzxmSs7x1O5DCnrPM2OeI6Nc0OhVT0AbeC6f_dTaBR9APlQFDrzIsvDIAn-W5eWuEohDs8w6np0eW65RuhQWrofdY8vFz-bsHgK0J3ew'},\n",
       "  {'label': 'uefa',\n",
       "   'short_url': 'https://vertexaisearch.cloud.google.com/id/0-5',\n",
       "   'value': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXG1Lj9FnmuckfU0k1NC_ThQBZVxFCppp4tPl4FCcM3JZGF9aPvn9ZNFUo0fLfqw4Adt63Cdv8thcFSbsBRcf3rj1sz4LALJvrGfh6OayGo0KJ-UEKmKoOz8cxj5nIILCzKjFh2_0ZgTwrf1pkhhYbnWqj2E8hrVN4S5_sxvlCpLXPxjTsE4R0gYKXH_utqqm1NBkpl3p-C9v6kz-zm6V-JJoePAppIXFICF0DMYjOIBA9Mj0z4yO9Y9Tdgx2oaP'},\n",
       "  {'label': 'wikipedia',\n",
       "   'short_url': 'https://vertexaisearch.cloud.google.com/id/0-10',\n",
       "   'value': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXGbyTk6AGj4XMhW66noNoKqe8eCt9-HZUMs6FXsKVyXcMuoG1WLLhBHa9dITcU3zQFJqCzcxPmnu6rj3ZHmJp-n2xdffBtWYFl2pqxmLrEiZONNYLwleA-T8cnaL7gXWfFlJ2jnvB0='},\n",
       "  {'label': 'wikipedia',\n",
       "   'short_url': 'https://vertexaisearch.cloud.google.com/id/0-10',\n",
       "   'value': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXGbyTk6AGj4XMhW66noNoKqe8eCt9-HZUMs6FXsKVyXcMuoG1WLLhBHa9dITcU3zQFJqCzcxPmnu6rj3ZHmJp-n2xdffBtWYFl2pqxmLrEiZONNYLwleA-T8cnaL7gXWfFlJ2jnvB0='},\n",
       "  {'label': 'transfermarkt',\n",
       "   'short_url': 'https://vertexaisearch.cloud.google.com/id/0-11',\n",
       "   'value': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXFMeGs_GRmx0zI6E_xQZfylxykYcTT9MnZlM3ICoa41Pogn4H-1tLirtdPBOrumyI8s_C9i9cBukjUKHxlPfPP49aqTep7xFPgfe2uQFyG37Acsn9RtVv5VenCS5kfPLDQB7sGR-Tyj6wGyiptaTP1uhRnGgYg0u92BW5OH-MY='},\n",
       "  {'label': 'aljazeera',\n",
       "   'short_url': 'https://vertexaisearch.cloud.google.com/id/1-0',\n",
       "   'value': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXFY5CRvcfjdkBz3h8Md_PscguyZ7LtYrxeHHP3eagcmIOnjaMyZbOHFqUAsa2cgkwvb26FZTvGiRgLKNLfiAsH1oP-5kGwnL6Ejhm4ZXhWGg0R3yE_8zkIKde4RgjIXlBvQW4kZ-LI5yhag-ESoh771z6hob8AigAVXT7WeWABMlQNfcbyG_UZIkqAs18U5e6to44ruNbSyDIyd5gobsVpEmdU256oVxa9d7co='},\n",
       "  {'label': 'coachesvoice',\n",
       "   'short_url': 'https://vertexaisearch.cloud.google.com/id/1-1',\n",
       "   'value': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXHxgpkZWF64tZ8-iypkI2fiFi2cpsj4AFjZXkcYUzf5hSOWYb5etIbCoZd_L6zDJi6mWWisxAO6T5V4T8H7XiRow6dmVqXpSEIKhPSdG0HAQbQK74lwxeV_uXx9fSPllIKPOs2tFNRqTuHdJBNcwpcJp6MJbVLEskyhYnWlyOd9ouQv'},\n",
       "  {'label': 'aljazeera',\n",
       "   'short_url': 'https://vertexaisearch.cloud.google.com/id/1-2',\n",
       "   'value': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXEV-g6Hxxcan5Xre1yYGM3BtP3fo9uF2zHQ9sVeK_4poD-aBN5CRvhz471beYCC26wdrjhtbiCvDT9dAnPI-ruyqJZhwB3vbKS5HCFb9tPn7Dkj99LpjLXqYyuzbFGsHCbr5SCHoMEhNg--dMU7xB5TiH8HeqKH8B4lk_h00dqhEVQFb05w5TuLtbX1UdXN6NDzHlFN_xyXzOU='},\n",
       "  {'label': 'wikipedia',\n",
       "   'short_url': 'https://vertexaisearch.cloud.google.com/id/1-3',\n",
       "   'value': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXFNtaBQTFVnSbEW5Bbo8LUIs0h5cv4Pc4aS6Q8qG7jIMCsJPKy5_o6R8x7Z_xQ7AuDEAFlj2JY_AVV1YpwLqtXZxiAyvpfboH_VuMpo6MVbQAu2ZASSSD2slWaIqsUGkTEaPa2z2809z7UhEWUL'},\n",
       "  {'label': 'wikipedia',\n",
       "   'short_url': 'https://vertexaisearch.cloud.google.com/id/1-3',\n",
       "   'value': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXFNtaBQTFVnSbEW5Bbo8LUIs0h5cv4Pc4aS6Q8qG7jIMCsJPKy5_o6R8x7Z_xQ7AuDEAFlj2JY_AVV1YpwLqtXZxiAyvpfboH_VuMpo6MVbQAu2ZASSSD2slWaIqsUGkTEaPa2z2809z7UhEWUL'},\n",
       "  {'label': 'wikipedia',\n",
       "   'short_url': 'https://vertexaisearch.cloud.google.com/id/1-3',\n",
       "   'value': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXFNtaBQTFVnSbEW5Bbo8LUIs0h5cv4Pc4aS6Q8qG7jIMCsJPKy5_o6R8x7Z_xQ7AuDEAFlj2JY_AVV1YpwLqtXZxiAyvpfboH_VuMpo6MVbQAu2ZASSSD2slWaIqsUGkTEaPa2z2809z7UhEWUL'},\n",
       "  {'label': 'thehindu',\n",
       "   'short_url': 'https://vertexaisearch.cloud.google.com/id/1-4',\n",
       "   'value': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXHtzvfIxJ0Lv3W7kqwlmY7CzFQxcbvXZqh4rRp3xBgV1vY01z4BRWA-GFu4INE8yFv9DE-eCib4cYnC-iv_PVgR8yPkBv8uRhI93Yf29MdbDoi_LGu46heOoxRLdMV58jlLI5nr-1sxKdfPutXE_rjuKehCswPGD-9RlbPI8NjyUQ69XAAOjDDhAN-MBxcIt_r3raV86AQfoo1UtYpUoUjhTGVcYBisvHRxv8-XjDjkr65nPm9vdaO7j28yCcokCCeGWv074_AGWeewDQWwczQM'},\n",
       "  {'label': 'newsbytesapp',\n",
       "   'short_url': 'https://vertexaisearch.cloud.google.com/id/1-5',\n",
       "   'value': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXFIl5Xc3f44I1nYw_YrJqkByrRl20SiAopZqjfJIK6U62o27CrxLvxaJ4v1M7L5eOfTMMlBCHHYCUooPoG0aObaeRG3YxrcoFT7Xtd4KIrvCS6AWWRpOZasCW-sGtFA56DEDf-qbJ8lsXEJ4GQ386iGTdRkyK9EtJWw1mRpDu7dfPQ6Qy1hNIqTgTdo-3yq1WNmWEl8Xtnag0s='},\n",
       "  {'label': 'thehindu',\n",
       "   'short_url': 'https://vertexaisearch.cloud.google.com/id/1-4',\n",
       "   'value': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXHtzvfIxJ0Lv3W7kqwlmY7CzFQxcbvXZqh4rRp3xBgV1vY01z4BRWA-GFu4INE8yFv9DE-eCib4cYnC-iv_PVgR8yPkBv8uRhI93Yf29MdbDoi_LGu46heOoxRLdMV58jlLI5nr-1sxKdfPutXE_rjuKehCswPGD-9RlbPI8NjyUQ69XAAOjDDhAN-MBxcIt_r3raV86AQfoo1UtYpUoUjhTGVcYBisvHRxv8-XjDjkr65nPm9vdaO7j28yCcokCCeGWv074_AGWeewDQWwczQM'},\n",
       "  {'label': 'newsbytesapp',\n",
       "   'short_url': 'https://vertexaisearch.cloud.google.com/id/1-5',\n",
       "   'value': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXFIl5Xc3f44I1nYw_YrJqkByrRl20SiAopZqjfJIK6U62o27CrxLvxaJ4v1M7L5eOfTMMlBCHHYCUooPoG0aObaeRG3YxrcoFT7Xtd4KIrvCS6AWWRpOZasCW-sGtFA56DEDf-qbJ8lsXEJ4GQ386iGTdRkyK9EtJWw1mRpDu7dfPQ6Qy1hNIqTgTdo-3yq1WNmWEl8Xtnag0s='},\n",
       "  {'label': 'newsbytesapp',\n",
       "   'short_url': 'https://vertexaisearch.cloud.google.com/id/1-5',\n",
       "   'value': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXFIl5Xc3f44I1nYw_YrJqkByrRl20SiAopZqjfJIK6U62o27CrxLvxaJ4v1M7L5eOfTMMlBCHHYCUooPoG0aObaeRG3YxrcoFT7Xtd4KIrvCS6AWWRpOZasCW-sGtFA56DEDf-qbJ8lsXEJ4GQ386iGTdRkyK9EtJWw1mRpDu7dfPQ6Qy1hNIqTgTdo-3yq1WNmWEl8Xtnag0s='},\n",
       "  {'label': 'aljazeera',\n",
       "   'short_url': 'https://vertexaisearch.cloud.google.com/id/1-2',\n",
       "   'value': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXEV-g6Hxxcan5Xre1yYGM3BtP3fo9uF2zHQ9sVeK_4poD-aBN5CRvhz471beYCC26wdrjhtbiCvDT9dAnPI-ruyqJZhwB3vbKS5HCFb9tPn7Dkj99LpjLXqYyuzbFGsHCbr5SCHoMEhNg--dMU7xB5TiH8HeqKH8B4lk_h00dqhEVQFb05w5TuLtbX1UdXN6NDzHlFN_xyXzOU='},\n",
       "  {'label': 'sportsmole',\n",
       "   'short_url': 'https://vertexaisearch.cloud.google.com/id/1-6',\n",
       "   'value': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXEVHkRwlOhx_8CZHVDe9XPE_nCs4XYVbx6aIl19aXGNLZxDpcsK5-hcYvMX_et8vasZtMNzmJNTtVd3Vne666vIkkRFUNJxVSBH9bMoGEFcPMcPoxFMUY5LV1YGZjm3n6xbDrkskawWb9MBS-zIIXiXZk7n6TluCji9k3ur3i5-ZhJcgPtAYU-KyfWRTdN0JY4bJt4tAl87Ba9ZInk9YuRlLlAFJ6flaKI-a4cZSXYDQeERhB742z_heWOhDchdvlPfoJaAuYSKKaABrbZQeZw='},\n",
       "  {'label': 'thehindu',\n",
       "   'short_url': 'https://vertexaisearch.cloud.google.com/id/1-4',\n",
       "   'value': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXHtzvfIxJ0Lv3W7kqwlmY7CzFQxcbvXZqh4rRp3xBgV1vY01z4BRWA-GFu4INE8yFv9DE-eCib4cYnC-iv_PVgR8yPkBv8uRhI93Yf29MdbDoi_LGu46heOoxRLdMV58jlLI5nr-1sxKdfPutXE_rjuKehCswPGD-9RlbPI8NjyUQ69XAAOjDDhAN-MBxcIt_r3raV86AQfoo1UtYpUoUjhTGVcYBisvHRxv8-XjDjkr65nPm9vdaO7j28yCcokCCeGWv074_AGWeewDQWwczQM'},\n",
       "  {'label': 'thehindu',\n",
       "   'short_url': 'https://vertexaisearch.cloud.google.com/id/1-7',\n",
       "   'value': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXG7_kutwvl9NHZQl-k0Vpvj_1I7o8MCX8jNlw6rYXEOGSC9QcRvzaH9ycR3JQUjJLvUhUSeaR7hmJ-qPTgMSfw9US7uXQzTF3CJ-tXnIVI1UC8VRyJoW6fH2r-MRFd5EI-PS494grt4Xey1x7WsaZ_Q7tRcQgVX_EM0JxQK12s8yYAY3TIUpa1L5fZOmsi6ZKq-jrXYOmIV5OTu2AaleBeQE_Z-B10oU2qin2Q3T8w6LP2ispUlVEh54d5fWLcHlEtskrRHC8psjrarTgqn'},\n",
       "  {'label': 'thehindu',\n",
       "   'short_url': 'https://vertexaisearch.cloud.google.com/id/1-4',\n",
       "   'value': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXHtzvfIxJ0Lv3W7kqwlmY7CzFQxcbvXZqh4rRp3xBgV1vY01z4BRWA-GFu4INE8yFv9DE-eCib4cYnC-iv_PVgR8yPkBv8uRhI93Yf29MdbDoi_LGu46heOoxRLdMV58jlLI5nr-1sxKdfPutXE_rjuKehCswPGD-9RlbPI8NjyUQ69XAAOjDDhAN-MBxcIt_r3raV86AQfoo1UtYpUoUjhTGVcYBisvHRxv8-XjDjkr65nPm9vdaO7j28yCcokCCeGWv074_AGWeewDQWwczQM'},\n",
       "  {'label': 'spanishprofootball',\n",
       "   'short_url': 'https://vertexaisearch.cloud.google.com/id/1-8',\n",
       "   'value': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXFG8gCwweIne3MmZpbUnDq24EeYu1w6OpSNeS2U5DtRYUbqRVtIjCnFAOjlXy8XjD8MvbmoNIsRD9rdadJ7tWoyG3T5fj2QvlMdWjCXwpMs7W3D_49AT_d1vWRuu8i_-nAK0WHpo6Wo5abiRpwUyjtFX1rYGXujmwsodi5hUV9Q4Qd1ltJe2cuLhq2cPRU='},\n",
       "  {'label': 'thehindu',\n",
       "   'short_url': 'https://vertexaisearch.cloud.google.com/id/1-4',\n",
       "   'value': 'https://vertexaisearch.cloud.google.com/grounding-api-redirect/AbF9wXHtzvfIxJ0Lv3W7kqwlmY7CzFQxcbvXZqh4rRp3xBgV1vY01z4BRWA-GFu4INE8yFv9DE-eCib4cYnC-iv_PVgR8yPkBv8uRhI93Yf29MdbDoi_LGu46heOoxRLdMV58jlLI5nr-1sxKdfPutXE_rjuKehCswPGD-9RlbPI8NjyUQ69XAAOjDDhAN-MBxcIt_r3raV86AQfoo1UtYpUoUjhTGVcYBisvHRxv8-XjDjkr

SYMBOL INDEX (130 symbols across 26 files)

FILE: backend/src/agent/app.py
  function create_frontend_router (line 11) | def create_frontend_router(build_dir="../frontend/dist"):

FILE: backend/src/agent/configuration.py
  class Configuration (line 8) | class Configuration(BaseModel):
    method from_runnable_config (line 43) | def from_runnable_config(

FILE: backend/src/agent/content_enhancement_decision.py
  class EnhancementDecision (line 13) | class EnhancementDecision:
  class ContentEnhancementDecisionMaker (line 22) | class ContentEnhancementDecisionMaker:
    method __init__ (line 25) | def __init__(self):
    method analyze_enhancement_need (line 30) | def analyze_enhancement_need(
    method _build_analysis_prompt (line 65) | def _build_analysis_prompt(
    method _parse_llm_decision (line 124) | def _parse_llm_decision(
    method _calculate_url_priority (line 187) | def _calculate_url_priority(self, source: Dict[str, Any]) -> float:
    method enhance_content_with_firecrawl (line 215) | async def enhance_content_with_firecrawl(
    method _assess_enhancement_quality (line 259) | def _assess_enhancement_quality(self, content: str) -> str:
  function get_content_enhancement_decision_maker (line 279) | def get_content_enhancement_decision_maker():

FILE: backend/src/agent/enhanced_graph_nodes.py
  function content_enhancement_analysis (line 20) | def content_enhancement_analysis(state: OverallState, config: RunnableCo...
  function should_enhance_content (line 176) | def should_enhance_content(state: OverallState) -> str:
  function enhanced_reflection (line 212) | def enhanced_reflection(state: OverallState, config: RunnableConfig) -> ...
  function format_enhancement_decision_log (line 252) | def format_enhancement_decision_log(decision: EnhancementDecision) -> str:

FILE: backend/src/agent/graph.py
  function generate_query (line 55) | def generate_query(state: OverallState, config: RunnableConfig) -> Query...
  function continue_to_web_research (line 96) | def continue_to_web_research(state: QueryGenerationState):
  function web_research (line 119) | def web_research(state: WebSearchState, config: RunnableConfig) -> Overa...
  function reflection (line 241) | def reflection(state: OverallState, config: RunnableConfig) -> OverallSt...
  function evaluate_research_enhanced (line 387) | def evaluate_research_enhanced(state: OverallState, config: RunnableConf...
  function decide_next_research_step (line 443) | def decide_next_research_step(state: OverallState):
  function finalize_answer (line 489) | def finalize_answer(state: OverallState, config: RunnableConfig) -> dict:
  function build_source_mapping (line 663) | def build_source_mapping(sources_gathered):
  function extract_domain (line 698) | def extract_domain(url):
  function convert_citations_to_readable (line 719) | def convert_citations_to_readable(content, source_mapping):
  function clean_malformed_citations (line 751) | def clean_malformed_citations(content):
  function clean_generated_content (line 767) | def clean_generated_content(content):
  function remove_prompt_remnants (line 792) | def remove_prompt_remnants(content):
  function final_quality_check (line 809) | def final_quality_check(content):
  function planner_node (line 849) | def planner_node(state: OverallState, config: RunnableConfig) -> dict:
  function record_task_completion_node (line 886) | def record_task_completion_node(state: OverallState, config: RunnableCon...
  function _summarize_task_findings (line 962) | def _summarize_task_findings(task_description: str, web_results: List[st...
  function decide_next_step_in_plan (line 996) | def decide_next_step_in_plan(state: OverallState) -> str:
  function split_by_tokens (line 1069) | def split_by_tokens(texts, max_tokens=150000, encoding_name="cl100k_base"):
  function extract_key_sections (line 1108) | def extract_key_sections(content, max_tokens, encoding):
  function is_factual_section (line 1151) | def is_factual_section(section):
  function is_critical_section (line 1169) | def is_critical_section(section):
  function truncate_section (line 1179) | def truncate_section(section, max_tokens, encoding):
  function simple_split_by_chars (line 1202) | def simple_split_by_chars(texts, max_chars):

FILE: backend/src/agent/prompts.py
  function get_current_date (line 5) | def get_current_date():

FILE: backend/src/agent/report_level_enhancement.py
  class ReportEnhancementRequest (line 18) | class ReportEnhancementRequest:
  class ReportEnhancementResult (line 28) | class ReportEnhancementResult:
  class ReportLevelEnhancer (line 36) | class ReportLevelEnhancer:
    method __init__ (line 39) | def __init__(self):
    method analyze_report_enhancement_needs (line 44) | def analyze_report_enhancement_needs(
    method _parse_enhancement_requests (line 105) | def _parse_enhancement_requests(self, analysis_text: str) -> List[Repo...
    method _parse_single_request (line 129) | def _parse_single_request(self, request_text: str) -> Optional[ReportE...
    method execute_targeted_enhancement (line 168) | def execute_targeted_enhancement(
    method _find_matching_urls (line 249) | def _find_matching_urls(
    method _assess_enhancement_quality (line 300) | def _assess_enhancement_quality(
  function integrate_report_enhancement_into_finalize (line 328) | def integrate_report_enhancement_into_finalize(

FILE: backend/src/agent/state.py
  class LedgerEntry (line 15) | class LedgerEntry(TypedDict):
  class OverallState (line 23) | class OverallState(TypedDict):
  class ReflectionState (line 66) | class ReflectionState(TypedDict):
  class Query (line 76) | class Query(TypedDict):
  class QueryGenerationState (line 81) | class QueryGenerationState(TypedDict):
  class WebSearchState (line 87) | class WebSearchState(TypedDict):
  class SearchStateOutput (line 94) | class SearchStateOutput:

FILE: backend/src/agent/tools_and_schemas.py
  class SearchQueryList (line 5) | class SearchQueryList(BaseModel):
  class Reflection (line 14) | class Reflection(BaseModel):
  class ResearchTask (line 26) | class ResearchTask(BaseModel):
  class ResearchPlan (line 31) | class ResearchPlan(BaseModel):
  class LedgerEntry (line 35) | class LedgerEntry(BaseModel):

FILE: backend/src/agent/utils.py
  function get_research_topic (line 5) | def get_research_topic(messages: List[AnyMessage]) -> str:
  function resolve_urls (line 22) | def resolve_urls(urls_to_resolve: List[Any], id: int) -> Dict[str, str]:
  function insert_citation_markers (line 41) | def insert_citation_markers(text, citations_list):
  function get_citations (line 80) | def get_citations(response, resolved_urls_map):

FILE: frontend/src/App.tsx
  type StreamEvent (line 10) | interface StreamEvent {
  type SourceData (line 14) | interface SourceData {
  function App (line 21) | function App() {

FILE: frontend/src/components/ActivityTimeline.tsx
  type ProcessedEvent (line 21) | interface ProcessedEvent {
  type ActivityTimelineProps (line 26) | interface ActivityTimelineProps {
  function ActivityTimeline (line 31) | function ActivityTimeline({

FILE: frontend/src/components/ChatMessagesView.tsx
  type MdComponentProps (line 19) | type MdComponentProps = {
  type HumanMessageBubbleProps (line 140) | interface HumanMessageBubbleProps {
  type AiMessageBubbleProps (line 164) | interface AiMessageBubbleProps {
  type ChatMessagesViewProps (line 251) | interface ChatMessagesViewProps {
  function ChatMessagesView (line 261) | function ChatMessagesView({

FILE: frontend/src/components/InputForm.tsx
  type InputFormProps (line 14) | interface InputFormProps {

FILE: frontend/src/components/ResearchThinkPanel.tsx
  type ResearchThinkPanelProps (line 5) | interface ResearchThinkPanelProps {
  type TaskCardProps (line 230) | interface TaskCardProps {
  type StepCardProps (line 309) | interface StepCardProps {

FILE: frontend/src/components/WelcomeScreen.tsx
  type WelcomeScreenProps (line 3) | interface WelcomeScreenProps {

FILE: frontend/src/components/ui/badge.tsx
  function Badge (line 28) | function Badge({

FILE: frontend/src/components/ui/button.tsx
  function Button (line 38) | function Button({

FILE: frontend/src/components/ui/card.tsx
  function Card (line 5) | function Card({ className, ...props }: React.ComponentProps<"div">) {
  function CardHeader (line 18) | function CardHeader({ className, ...props }: React.ComponentProps<"div">) {
  function CardTitle (line 31) | function CardTitle({ className, ...props }: React.ComponentProps<"div">) {
  function CardDescription (line 41) | function CardDescription({ className, ...props }: React.ComponentProps<"...
  function CardAction (line 51) | function CardAction({ className, ...props }: React.ComponentProps<"div">) {
  function CardContent (line 64) | function CardContent({ className, ...props }: React.ComponentProps<"div"...
  function CardFooter (line 74) | function CardFooter({ className, ...props }: React.ComponentProps<"div">) {

FILE: frontend/src/components/ui/input.tsx
  function Input (line 5) | function Input({ className, type, ...props }: React.ComponentProps<"inpu...

FILE: frontend/src/components/ui/scroll-area.tsx
  function ScrollArea (line 6) | function ScrollArea({
  function ScrollBar (line 29) | function ScrollBar({

FILE: frontend/src/components/ui/select.tsx
  function Select (line 7) | function Select({
  function SelectGroup (line 13) | function SelectGroup({
  function SelectValue (line 19) | function SelectValue({
  function SelectTrigger (line 25) | function SelectTrigger({
  function SelectContent (line 51) | function SelectContent({
  function SelectLabel (line 86) | function SelectLabel({
  function SelectItem (line 99) | function SelectItem({
  function SelectSeparator (line 123) | function SelectSeparator({
  function SelectScrollUpButton (line 136) | function SelectScrollUpButton({
  function SelectScrollDownButton (line 154) | function SelectScrollDownButton({

FILE: frontend/src/components/ui/tabs.tsx
  function Tabs (line 6) | function Tabs({
  function TabsList (line 19) | function TabsList({
  function TabsTrigger (line 35) | function TabsTrigger({
  function TabsContent (line 51) | function TabsContent({

FILE: frontend/src/components/ui/textarea.tsx
  function Textarea (line 5) | function Textarea({ className, ...props }: React.ComponentProps<"textare...

FILE: frontend/src/lib/utils.ts
  function cn (line 4) | function cn(...inputs: ClassValue[]) {

FILE: frontend/src/utils/dataTransformer.ts
  type EventData (line 6) | interface EventData {
  type SourceData (line 10) | interface SourceData {
  type TaskData (line 17) | interface TaskData {
  type StateData (line 23) | interface StateData {
  type TaskDetail (line 30) | interface TaskDetail {
  type TaskStep (line 37) | interface TaskStep {
  type StepDetail (line 46) | interface StepDetail {
  type PlanningInfo (line 61) | interface PlanningInfo {
  type ProcessedResearchData (line 71) | interface ProcessedResearchData {
  function transformEventsToHierarchy (line 81) | function transformEventsToHierarchy(
  function extractPlanningInfo (line 143) | function extractPlanningInfo(events: EventData[], state: StateData): Pla...
  function buildTaskDetails (line 170) | function buildTaskDetails(events: EventData[], state: StateData): TaskDe...
  function buildTaskSteps (line 206) | function buildTaskSteps(
  function getCurrentTaskId (line 474) | function getCurrentTaskId(events: EventData[], state: StateData): string...
  function determineOverallStatus (line 488) | function determineOverallStatus(events: EventData[]): 'planning' | 'rese...
  function getEnhancementStatusMessage (line 507) | function getEnhancementStatusMessage(status: string): string {
  function debugTransformResult (line 523) | function debugTransformResult(data: ProcessedResearchData): void {

Condensed preview: 52 files, each showing path, character count, and a content snippet. The full structured content is 504K chars.

[
  {
    "path": ".gitignore",
    "chars": 3587,
    "preview": "# Node / Frontend\nnode_modules/\nfrontend/dist/\nfrontend/.vite/\nfrontend/coverage/\n.DS_Store\n*.local\n\n# Logs\nlogs\n*.log\nn"
  },
  {
    "path": "Dockerfile",
    "chars": 2835,
    "preview": "# Stage 1: Build React Frontend\nFROM node:20-alpine AS frontend-builder\n\n# Set working directory for frontend\nWORKDIR /a"
  },
  {
    "path": "LICENSE",
    "chars": 11357,
    "preview": "                                 Apache License\n                           Version 2.0, January 2004\n                   "
  },
  {
    "path": "Makefile",
    "chars": 677,
    "preview": ".PHONY: help dev-frontend dev-backend dev\n\nhelp:\n\t@echo \"Available commands:\"\n\t@echo \"  make dev-frontend    - Starts th"
  },
  {
    "path": "README.md",
    "chars": 7645,
    "preview": "# 🚀 Enhanced Version\n\n> Based on the original project, I have optimized the Agent workflow and frontend display effects."
  },
  {
    "path": "backend/.gitignore",
    "chars": 3155,
    "preview": "# Byte-compiled / optimized / DLL files\n__pycache__/\n*.py[cod]\n*$py.class\nuv.lock\n\n# C extensions\n*.so\n\n# Distribution /"
  },
  {
    "path": "backend/LICENSE",
    "chars": 1071,
    "preview": "MIT License\n\nCopyright (c) 2025 Philipp Schmid\n\nPermission is hereby granted, free of charge, to any person obtaining a "
  },
  {
    "path": "backend/Makefile",
    "chars": 2000,
    "preview": ".PHONY: all format lint test tests test_watch integration_tests docker_tests help extended_tests\n\n# Default target execu"
  },
  {
    "path": "backend/langgraph.json",
    "chars": 159,
    "preview": "{\n  \"dependencies\": [\".\"],\n  \"graphs\": {\n    \"agent\": \"./src/agent/graph.py:graph\"\n  },\n  \"http\": {\n    \"app\": \"./src/ag"
  },
  {
    "path": "backend/pyproject.toml",
    "chars": 1337,
    "preview": "[project]\nname = \"agent\"\nversion = \"0.0.1\"\ndescription = \"Backend for the LangGraph agent\"\nauthors = [\n    { name = \"Phi"
  },
  {
    "path": "backend/src/agent/__init__.py",
    "chars": 51,
    "preview": "from agent.graph import graph\n\n__all__ = [\"graph\"]\n"
  },
  {
    "path": "backend/src/agent/app.py",
    "chars": 1907,
    "preview": "# mypy: disable - error - code = \"no-untyped-def,misc\"\nimport pathlib\nfrom fastapi import FastAPI, Request, Response\nfro"
  },
  {
    "path": "backend/src/agent/configuration.py",
    "chars": 1841,
    "preview": "import os\nfrom pydantic import BaseModel, Field\nfrom typing import Any, Optional\n\nfrom langchain_core.runnables import R"
  },
  {
    "path": "backend/src/agent/content_enhancement_decision.py",
    "chars": 8795,
    "preview": "\"\"\"\n智能内容增强决策模块 - 决定何时使用Firecrawl进行深度内容抓取\n\"\"\"\n\nimport os\nfrom typing import Dict, List, Any, Optional\nfrom dataclasses im"
  },
  {
    "path": "backend/src/agent/enhanced_graph_nodes.py",
    "chars": 9120,
    "preview": "\"\"\"\n增强的Graph节点 - 集成智能Firecrawl内容增强功能\n\"\"\"\n\nimport os\nimport json\nfrom typing import List, Dict, Any\nfrom datetime import "
  },
  {
    "path": "backend/src/agent/graph.py",
    "chars": 47304,
    "preview": "import os\nimport json\nfrom typing import List\nfrom datetime import datetime\n\nfrom agent.tools_and_schemas import SearchQ"
  },
  {
    "path": "backend/src/agent/prompts.py",
    "chars": 20661,
    "preview": "from datetime import datetime\n\n\n# Get current date in a readable format\ndef get_current_date():\n    return datetime.now("
  },
  {
    "path": "backend/src/agent/report_level_enhancement.py",
    "chars": 14794,
    "preview": "\"\"\"\nReport-Level Content Enhancement Module\n\nDuring the final report generation phase, the LLM may discover it needs mor"
  },
  {
    "path": "backend/src/agent/state.py",
    "chars": 3494,
    "preview": "from __future__ import annotations\n\nfrom dataclasses import dataclass, field\nfrom typing import TypedDict, List, Optiona"
  },
  {
    "path": "backend/src/agent/tools_and_schemas.py",
    "chars": 1443,
    "preview": "from typing import List\nfrom pydantic import BaseModel, Field\n\n\nclass SearchQueryList(BaseModel):\n    query: List[str] ="
  },
  {
    "path": "backend/src/agent/utils.py",
    "chars": 7208,
    "preview": "from typing import Any, Dict, List\nfrom langchain_core.messages import AnyMessage, AIMessage, HumanMessage\n\n\ndef get_res"
  },
  {
    "path": "backend/test-agent.ipynb",
    "chars": 123580,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 1,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n "
  },
  {
    "path": "docker-compose.yml",
    "chars": 1055,
    "preview": "volumes:\n  langgraph-data:\n    driver: local\nservices:\n  langgraph-redis:\n    image: docker.io/redis:6\n    healthcheck:\n"
  },
  {
    "path": "docs/document-generation-flow-ZH.md",
    "chars": 18468,
    "preview": "# 文档生成流程：从查询到综合研究报告\n\n## 增强的Agent工作流\n\n![增强的Agent工作流](../agent_new.png)\n\n*增强的agent工作流包含智能内容增强和双层评估系统，确保研究质量的全面性。*\n\n## 目录\n\n"
  },
  {
    "path": "docs/document-generation-flow.md",
    "chars": 67848,
    "preview": "# Document Generation Flow: From Query to Comprehensive Research Report\n\n## Enhanced Agent Workflow\n\n![Enhanced Agent Wo"
  },
  {
    "path": "frontend/.gitignore",
    "chars": 253,
    "preview": "# Logs\nlogs\n*.log\nnpm-debug.log*\nyarn-debug.log*\nyarn-error.log*\npnpm-debug.log*\nlerna-debug.log*\n\nnode_modules\ndist\ndis"
  },
  {
    "path": "frontend/components.json",
    "chars": 423,
    "preview": "{\n  \"$schema\": \"https://ui.shadcn.com/schema.json\",\n  \"style\": \"new-york\",\n  \"rsc\": false,\n  \"tsx\": true,\n  \"tailwind\": "
  },
  {
    "path": "frontend/eslint.config.js",
    "chars": 734,
    "preview": "import js from '@eslint/js'\nimport globals from 'globals'\nimport reactHooks from 'eslint-plugin-react-hooks'\nimport reac"
  },
  {
    "path": "frontend/index.html",
    "chars": 366,
    "preview": "<!doctype html>\n<html lang=\"en\">\n  <head>\n    <meta charset=\"UTF-8\" />\n    <link rel=\"icon\" type=\"image/svg+xml\" href=\"/"
  },
  {
    "path": "frontend/package.json",
    "chars": 1284,
    "preview": "{\n  \"name\": \"frontend\",\n  \"private\": true,\n  \"version\": \"0.0.0\",\n  \"type\": \"module\",\n  \"scripts\": {\n    \"dev\": \"vite\",\n "
  },
  {
    "path": "frontend/src/App.tsx",
    "chars": 14308,
    "preview": "import { useStream } from \"@langchain/langgraph-sdk/react\";\nimport type { Message } from \"@langchain/langgraph-sdk\";\nimp"
  },
  {
    "path": "frontend/src/components/ActivityTimeline.tsx",
    "chars": 5716,
    "preview": "import {\n  Card,\n  CardContent,\n  CardDescription,\n  CardHeader,\n} from \"@/components/ui/card\";\nimport { ScrollArea } fr"
  },
  {
    "path": "frontend/src/components/ChatMessagesView.tsx",
    "chars": 14814,
    "preview": "import type React from \"react\";\nimport type { Message } from \"@langchain/langgraph-sdk\";\nimport { ScrollArea } from \"@/c"
  },
  {
    "path": "frontend/src/components/InputForm.tsx",
    "chars": 6893,
    "preview": "import { useState } from \"react\";\nimport { Button } from \"@/components/ui/button\";\nimport { SquarePen, Brain, Send, Stop"
  },
  {
    "path": "frontend/src/components/ResearchThinkPanel.tsx",
    "chars": 17025,
    "preview": "import React from 'react';\nimport { ProcessedResearchData, TaskDetail, TaskStep } from '@/utils/dataTransformer';\nimport"
  },
  {
    "path": "frontend/src/components/WelcomeScreen.tsx",
    "chars": 996,
    "preview": "import { InputForm } from \"./InputForm\";\n\ninterface WelcomeScreenProps {\n  handleSubmit: (\n    submittedInputValue: stri"
  },
  {
    "path": "frontend/src/components/ui/badge.tsx",
    "chars": 1631,
    "preview": "import * as React from \"react\"\nimport { Slot } from \"@radix-ui/react-slot\"\nimport { cva, type VariantProps } from \"class"
  },
  {
    "path": "frontend/src/components/ui/button.tsx",
    "chars": 2123,
    "preview": "import * as React from \"react\"\nimport { Slot } from \"@radix-ui/react-slot\"\nimport { cva, type VariantProps } from \"class"
  },
  {
    "path": "frontend/src/components/ui/card.tsx",
    "chars": 1989,
    "preview": "import * as React from \"react\"\n\nimport { cn } from \"@/lib/utils\"\n\nfunction Card({ className, ...props }: React.Component"
  },
  {
    "path": "frontend/src/components/ui/input.tsx",
    "chars": 967,
    "preview": "import * as React from \"react\"\n\nimport { cn } from \"@/lib/utils\"\n\nfunction Input({ className, type, ...props }: React.Co"
  },
  {
    "path": "frontend/src/components/ui/scroll-area.tsx",
    "chars": 1631,
    "preview": "import * as React from \"react\"\nimport * as ScrollAreaPrimitive from \"@radix-ui/react-scroll-area\"\n\nimport { cn } from \"@"
  },
  {
    "path": "frontend/src/components/ui/select.tsx",
    "chars": 6239,
    "preview": "import * as React from \"react\"\nimport * as SelectPrimitive from \"@radix-ui/react-select\"\nimport { CheckIcon, ChevronDown"
  },
  {
    "path": "frontend/src/components/ui/tabs.tsx",
    "chars": 1955,
    "preview": "import * as React from \"react\"\nimport * as TabsPrimitive from \"@radix-ui/react-tabs\"\n\nimport { cn } from \"@/lib/utils\"\n\n"
  },
  {
    "path": "frontend/src/components/ui/textarea.tsx",
    "chars": 759,
    "preview": "import * as React from \"react\"\n\nimport { cn } from \"@/lib/utils\"\n\nfunction Textarea({ className, ...props }: React.Compo"
  },
  {
    "path": "frontend/src/global.css",
    "chars": 5037,
    "preview": "@import \"tailwindcss\";\n@import \"tw-animate-css\";\n\n@custom-variant dark (&:is(.dark *));\n\n@theme inline {\n  --radius-sm: "
  },
  {
    "path": "frontend/src/lib/utils.ts",
    "chars": 169,
    "preview": "import { type ClassValue, clsx } from \"clsx\";\nimport { twMerge } from \"tailwind-merge\";\n\nexport function cn(...inputs: C"
  },
  {
    "path": "frontend/src/main.tsx",
    "chars": 328,
    "preview": "import { StrictMode } from \"react\";\nimport { createRoot } from \"react-dom/client\";\nimport { BrowserRouter } from \"react-"
  },
  {
    "path": "frontend/src/utils/dataTransformer.ts",
    "chars": 16274,
    "preview": "/**\n * 数据转换器：将平铺的事件流转换为层次化的任务结构\n */\n\n// 添加类型定义\nexport interface EventData {\n  [key: string]: unknown;\n}\n\nexport interfac"
  },
  {
    "path": "frontend/src/vite-env.d.ts",
    "chars": 38,
    "preview": "/// <reference types=\"vite/client\" />\n"
  },
  {
    "path": "frontend/tsconfig.json",
    "chars": 753,
    "preview": "{\n  \"compilerOptions\": {\n    \"tsBuildInfoFile\": \"./node_modules/.tmp/tsconfig.app.tsbuildinfo\",\n    \"target\": \"ES2020\",\n"
  },
  {
    "path": "frontend/tsconfig.node.json",
    "chars": 593,
    "preview": "{\n  \"compilerOptions\": {\n    \"tsBuildInfoFile\": \"./node_modules/.tmp/tsconfig.node.tsbuildinfo\",\n    \"target\": \"ES2022\","
  },
  {
    "path": "frontend/vite.config.ts",
    "chars": 804,
    "preview": "import path from \"node:path\";\nimport { fileURLToPath } from \"node:url\";\nimport { defineConfig } from \"vite\";\nimport reac"
  }
]
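
The preview entries above share one shape: `path`, `chars`, and `preview`. As a minimal sketch, assuming the array is available as JSON text, the following Python groups the character counts by top-level directory and flags entries whose snippet looks cut off. The helper names `chars_by_top_level` and `is_truncated` are hypothetical and are not part of the repository or of the extraction tool. The rule that a snippet shorter than `chars` means truncation is likewise an assumption.

```python
import json
from collections import defaultdict

# Two entries copied from the condensed preview; a real run would load the
# full JSON text instead of this inline sample.
SAMPLE = r"""
[
  {"path": "backend/src/agent/__init__.py", "chars": 51,
   "preview": "from agent.graph import graph\n\n__all__ = [\"graph\"]\n"},
  {"path": "frontend/src/vite-env.d.ts", "chars": 38,
   "preview": "/// <reference types=\"vite/client\" />\n"}
]
"""


def chars_by_top_level(entries):
    """Sum the `chars` field per top-level path component (hypothetical helper)."""
    totals = defaultdict(int)
    for entry in entries:
        # Root-level files such as "Dockerfile" become their own group.
        top = entry["path"].split("/", 1)[0]
        totals[top] += entry["chars"]
    return dict(totals)


def is_truncated(entry):
    """Assumption: a snippet shorter than `chars` means the preview was cut off."""
    return len(entry["preview"]) < entry["chars"]


entries = json.loads(SAMPLE)
totals = chars_by_top_level(entries)
truncated = [e["path"] for e in entries if is_truncated(e)]
print(totals)
print(truncated)
```

Run against the full array, the same grouping would show how the character count splits between `backend/`, `frontend/`, `docs/`, and the root-level files.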

About this extraction

This page contains the full source code of the foreveryh/langgraph-deep-research GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 52 files (454.6 KB), approximately 142.7k tokens, and a symbol index with 130 extracted functions, classes, methods, constants, and types.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.
