Repository: The-Pocket/PocketFlow-Tutorial-Codebase-Knowledge
Branch: main
Commit: c8a8ca17180c
Files: 203
Total size: 2.8 MB
Directory structure:
gitextract_ufv5rlmg/
├── .clinerules
├── .cursorrules
├── .dockerignore
├── .gitignore
├── .windsurfrules
├── Dockerfile
├── LICENSE
├── README.md
├── docs/
│ ├── AutoGen Core/
│ │ ├── 01_agent.md
│ │ ├── 02_messaging_system__topic___subscription_.md
│ │ ├── 03_agentruntime.md
│ │ ├── 04_tool.md
│ │ ├── 05_chatcompletionclient.md
│ │ ├── 06_chatcompletioncontext.md
│ │ ├── 07_memory.md
│ │ ├── 08_component.md
│ │ └── index.md
│ ├── Browser Use/
│ │ ├── 01_agent.md
│ │ ├── 02_system_prompt.md
│ │ ├── 03_browsercontext.md
│ │ ├── 04_dom_representation.md
│ │ ├── 05_action_controller___registry.md
│ │ ├── 06_message_manager.md
│ │ ├── 07_data_structures__views_.md
│ │ ├── 08_telemetry_service.md
│ │ └── index.md
│ ├── Celery/
│ │ ├── 01_celery_app.md
│ │ ├── 02_configuration.md
│ │ ├── 03_task.md
│ │ ├── 04_broker_connection__amqp_.md
│ │ ├── 05_worker.md
│ │ ├── 06_result_backend.md
│ │ ├── 07_beat__scheduler_.md
│ │ ├── 08_canvas__signatures___primitives_.md
│ │ ├── 09_events.md
│ │ ├── 10_bootsteps.md
│ │ └── index.md
│ ├── Click/
│ │ ├── 01_command___group.md
│ │ ├── 02_decorators.md
│ │ ├── 03_parameter__option___argument_.md
│ │ ├── 04_paramtype.md
│ │ ├── 05_context.md
│ │ ├── 06_term_ui__terminal_user_interface_.md
│ │ ├── 07_click_exceptions.md
│ │ └── index.md
│ ├── Codex/
│ │ ├── 01_terminal_ui__ink_components_.md
│ │ ├── 02_input_handling__textbuffer_editor_.md
│ │ ├── 03_agent_loop.md
│ │ ├── 04_approval_policy___security.md
│ │ ├── 05_response___tool_call_handling.md
│ │ ├── 06_command_execution___sandboxing.md
│ │ ├── 07_configuration_management.md
│ │ ├── 08_single_pass_mode.md
│ │ └── index.md
│ ├── Crawl4AI/
│ │ ├── 01_asynccrawlerstrategy.md
│ │ ├── 02_asyncwebcrawler.md
│ │ ├── 03_crawlerrunconfig.md
│ │ ├── 04_contentscrapingstrategy.md
│ │ ├── 05_relevantcontentfilter.md
│ │ ├── 06_extractionstrategy.md
│ │ ├── 07_crawlresult.md
│ │ ├── 08_deepcrawlstrategy.md
│ │ ├── 09_cachecontext___cachemode.md
│ │ ├── 10_basedispatcher.md
│ │ └── index.md
│ ├── CrewAI/
│ │ ├── 01_crew.md
│ │ ├── 02_agent.md
│ │ ├── 03_task.md
│ │ ├── 04_tool.md
│ │ ├── 05_process.md
│ │ ├── 06_llm.md
│ │ ├── 07_memory.md
│ │ ├── 08_knowledge.md
│ │ └── index.md
│ ├── DSPy/
│ │ ├── 01_module___program.md
│ │ ├── 02_signature.md
│ │ ├── 03_example.md
│ │ ├── 04_predict.md
│ │ ├── 05_lm__language_model_client_.md
│ │ ├── 06_rm__retrieval_model_client_.md
│ │ ├── 07_evaluate.md
│ │ ├── 08_teleprompter___optimizer.md
│ │ ├── 09_adapter.md
│ │ ├── 10_settings.md
│ │ └── index.md
│ ├── FastAPI/
│ │ ├── 01_fastapi_application___routing.md
│ │ ├── 02_path_operations___parameter_declaration.md
│ │ ├── 03_data_validation___serialization__pydantic_.md
│ │ ├── 04_openapi___automatic_docs.md
│ │ ├── 05_dependency_injection.md
│ │ ├── 06_error_handling.md
│ │ ├── 07_security_utilities.md
│ │ ├── 08_background_tasks.md
│ │ └── index.md
│ ├── Flask/
│ │ ├── 01_application_object___flask__.md
│ │ ├── 02_routing_system.md
│ │ ├── 03_request_and_response_objects.md
│ │ ├── 04_templating__jinja2_integration_.md
│ │ ├── 05_context_globals___current_app____request____session____g__.md
│ │ ├── 06_configuration___config__.md
│ │ ├── 07_application_and_request_contexts.md
│ │ ├── 08_blueprints.md
│ │ └── index.md
│ ├── Google A2A/
│ │ ├── 01_agent_card.md
│ │ ├── 02_task.md
│ │ ├── 03_a2a_protocol___core_types.md
│ │ ├── 04_a2a_server_implementation.md
│ │ ├── 05_a2a_client_implementation.md
│ │ ├── 06_task_handling_logic__server_side_.md
│ │ ├── 07_streaming_communication__sse_.md
│ │ ├── 08_multi_agent_orchestration__host_agent_.md
│ │ ├── 09_demo_ui_application___service.md
│ │ └── index.md
│ ├── LangGraph/
│ │ ├── 01_graph___stategraph.md
│ │ ├── 02_nodes___pregelnode__.md
│ │ ├── 03_channels.md
│ │ ├── 04_control_flow_primitives___branch____send____interrupt__.md
│ │ ├── 05_pregel_execution_engine.md
│ │ ├── 06_checkpointer___basecheckpointsaver__.md
│ │ └── index.md
│ ├── LevelDB/
│ │ ├── 01_table___sstable___tablecache.md
│ │ ├── 02_memtable.md
│ │ ├── 03_write_ahead_log__wal____logwriter_logreader.md
│ │ ├── 04_dbimpl.md
│ │ ├── 05_writebatch.md
│ │ ├── 06_version___versionset.md
│ │ ├── 07_iterator.md
│ │ ├── 08_compaction.md
│ │ ├── 09_internalkey___dbformat.md
│ │ └── index.md
│ ├── MCP Python SDK/
│ │ ├── 01_cli___mcp__command_.md
│ │ ├── 02_fastmcp_server___fastmcp__.md
│ │ ├── 03_fastmcp_resources___resource____resourcemanager__.md
│ │ ├── 04_fastmcp_tools___tool____toolmanager__.md
│ │ ├── 05_fastmcp_prompts___prompt____promptmanager__.md
│ │ ├── 06_fastmcp_context___context__.md
│ │ ├── 07_mcp_protocol_types.md
│ │ ├── 08_client_server_sessions___clientsession____serversession__.md
│ │ ├── 09_communication_transports__stdio__sse__websocket__memory_.md
│ │ └── index.md
│ ├── NumPy Core/
│ │ ├── 01_ndarray__n_dimensional_array_.md
│ │ ├── 02_dtype__data_type_object_.md
│ │ ├── 03_ufunc__universal_function_.md
│ │ ├── 04_numeric_types___numerictypes__.md
│ │ ├── 05_array_printing___arrayprint__.md
│ │ ├── 06_multiarray_module.md
│ │ ├── 07_umath_module.md
│ │ ├── 08___array_function___protocol___overrides___overrides__.md
│ │ └── index.md
│ ├── OpenManus/
│ │ ├── 01_llm.md
│ │ ├── 02_message___memory.md
│ │ ├── 03_baseagent.md
│ │ ├── 04_tool___toolcollection.md
│ │ ├── 05_baseflow.md
│ │ ├── 06_schema.md
│ │ ├── 07_configuration__config_.md
│ │ ├── 08_dockersandbox.md
│ │ ├── 09_mcp__model_context_protocol_.md
│ │ └── index.md
│ ├── PocketFlow/
│ │ ├── 01_shared_state___shared__dictionary__.md
│ │ ├── 02_node___basenode____node____asyncnode___.md
│ │ ├── 03_actions___transitions_.md
│ │ ├── 04_flow___flow____asyncflow___.md
│ │ ├── 05_asynchronous_processing___asyncnode____asyncflow___.md
│ │ ├── 06_batch_processing___batchnode____batchflow____asyncparallelbatchnode___.md
│ │ ├── 07_a2a__agent_to_agent__communication_framework_.md
│ │ └── index.md
│ ├── Pydantic Core/
│ │ ├── 01_basemodel.md
│ │ ├── 02_fields__fieldinfo___field_function_.md
│ │ ├── 03_configuration__configdict___configwrapper_.md
│ │ ├── 04_custom_logic__decorators___annotated_helpers_.md
│ │ ├── 05_core_schema___validation_serialization.md
│ │ ├── 06_typeadapter.md
│ │ └── index.md
│ ├── Requests/
│ │ ├── 01_functional_api.md
│ │ ├── 02_request___response_models.md
│ │ ├── 03_session.md
│ │ ├── 04_cookie_jar.md
│ │ ├── 05_authentication_handlers.md
│ │ ├── 06_exception_hierarchy.md
│ │ ├── 07_transport_adapters.md
│ │ ├── 08_hook_system.md
│ │ └── index.md
│ ├── SmolaAgents/
│ │ ├── 01_multistepagent.md
│ │ ├── 02_model_interface.md
│ │ ├── 03_tool.md
│ │ ├── 04_agentmemory.md
│ │ ├── 05_prompttemplates.md
│ │ ├── 06_pythonexecutor.md
│ │ ├── 07_agenttype.md
│ │ ├── 08_agentlogger___monitor.md
│ │ └── index.md
│ ├── _config.yml
│ ├── design.md
│ └── index.md
├── flow.py
├── main.py
├── nodes.py
├── requirements.txt
└── utils/
├── __init__.py
├── call_llm.py
├── crawl_github_files.py
└── crawl_local_files.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .clinerules
================================================
---
layout: default
title: "Agentic Coding"
---
# Agentic Coding: Humans Design, Agents code!
> If you are an AI agents involved in building LLM Systems, read this guide **VERY, VERY** carefully! This is the most important chapter in the entire document. Throughout development, you should always (1) start with a small and simple solution, (2) design at a high level (`docs/design.md`) before implementation, and (3) frequently ask humans for feedback and clarification.
{: .warning }
## Agentic Coding Steps
Agentic Coding should be a collaboration between Human System Design and Agent Implementation:
| Steps | Human | AI | Comment |
|:-----------------------|:----------:|:---------:|:------------------------------------------------------------------------|
| 1. Requirements | ★★★ High | ★☆☆ Low | Humans understand the requirements and context. |
| 2. Flow | ★★☆ Medium | ★★☆ Medium | Humans specify the high-level design, and the AI fills in the details. |
| 3. Utilities | ★★☆ Medium | ★★☆ Medium | Humans provide available external APIs and integrations, and the AI helps with implementation. |
| 4. Node | ★☆☆ Low | ★★★ High | The AI helps design the node types and data handling based on the flow. |
| 5. Implementation | ★☆☆ Low | ★★★ High | The AI implements the flow based on the design. |
| 6. Optimization | ★★☆ Medium | ★★☆ Medium | Humans evaluate the results, and the AI helps optimize. |
| 7. Reliability | ★☆☆ Low | ★★★ High | The AI writes test cases and addresses corner cases. |
1. **Requirements**: Clarify the requirements for your project, and evaluate whether an AI system is a good fit.
- Understand AI systems' strengths and limitations:
- **Good for**: Routine tasks requiring common sense (filling forms, replying to emails)
- **Good for**: Creative tasks with well-defined inputs (building slides, writing SQL)
- **Not good for**: Ambiguous problems requiring complex decision-making (business strategy, startup planning)
- **Keep It User-Centric:** Explain the "problem" from the user's perspective rather than just listing features.
- **Balance complexity vs. impact**: Aim to deliver the highest value features with minimal complexity early.
2. **Flow Design**: Outline at a high level, describe how your AI system orchestrates nodes.
- Identify applicable design patterns (e.g., [Map Reduce](./design_pattern/mapreduce.md), [Agent](./design_pattern/agent.md), [RAG](./design_pattern/rag.md)).
- For each node in the flow, start with a high-level one-line description of what it does.
- If using **Map Reduce**, specify how to map (what to split) and how to reduce (how to combine).
- If using **Agent**, specify what are the inputs (context) and what are the possible actions.
- If using **RAG**, specify what to embed, noting that there's usually both offline (indexing) and online (retrieval) workflows.
- Outline the flow and draw it in a mermaid diagram. For example:
```mermaid
flowchart LR
start[Start] --> batch[Batch]
batch --> check[Check]
check -->|OK| process
check -->|Error| fix[Fix]
fix --> check
subgraph process[Process]
step1[Step 1] --> step2[Step 2]
end
process --> endNode[End]
```
- > **If Humans can't specify the flow, AI Agents can't automate it!** Before building an LLM system, thoroughly understand the problem and potential solution by manually solving example inputs to develop intuition.
{: .best-practice }
3. **Utilities**: Based on the Flow Design, identify and implement necessary utility functions.
- Think of your AI system as the brain. It needs a body—these *external utility functions*—to interact with the real world:
- Reading inputs (e.g., retrieving Slack messages, reading emails)
- Writing outputs (e.g., generating reports, sending emails)
- Using external tools (e.g., calling LLMs, searching the web)
- **NOTE**: *LLM-based tasks* (e.g., summarizing text, analyzing sentiment) are **NOT** utility functions; rather, they are *core functions* internal in the AI system.
- For each utility function, implement it and write a simple test.
- Document their input/output, as well as why they are necessary. For example:
- `name`: `get_embedding` (`utils/get_embedding.py`)
- `input`: `str`
- `output`: a vector of 3072 floats
- `necessity`: Used by the second node to embed text
- Example utility implementation:
```python
# utils/call_llm.py
from openai import OpenAI
def call_llm(prompt):
client = OpenAI(api_key="YOUR_API_KEY_HERE")
r = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": prompt}]
)
return r.choices[0].message.content
if __name__ == "__main__":
prompt = "What is the meaning of life?"
print(call_llm(prompt))
```
- > **Sometimes, design Utilies before Flow:** For example, for an LLM project to automate a legacy system, the bottleneck will likely be the available interface to that system. Start by designing the hardest utilities for interfacing, and then build the flow around them.
{: .best-practice }
4. **Node Design**: Plan how each node will read and write data, and use utility functions.
- One core design principle for PocketFlow is to use a [shared store](./core_abstraction/communication.md), so start with a shared store design:
- For simple systems, use an in-memory dictionary.
- For more complex systems or when persistence is required, use a database.
- **Don't Repeat Yourself**: Use in-memory references or foreign keys.
- Example shared store design:
```python
shared = {
"user": {
"id": "user123",
"context": { # Another nested dict
"weather": {"temp": 72, "condition": "sunny"},
"location": "San Francisco"
}
},
"results": {} # Empty dict to store outputs
}
```
- For each [Node](./core_abstraction/node.md), describe its type, how it reads and writes data, and which utility function it uses. Keep it specific but high-level without codes. For example:
- `type`: Regular (or Batch, or Async)
- `prep`: Read "text" from the shared store
- `exec`: Call the embedding utility function
- `post`: Write "embedding" to the shared store
5. **Implementation**: Implement the initial nodes and flows based on the design.
- 🎉 If you've reached this step, humans have finished the design. Now *Agentic Coding* begins!
- **"Keep it simple, stupid!"** Avoid complex features and full-scale type checking.
- **FAIL FAST**! Avoid `try` logic so you can quickly identify any weak points in the system.
- Add logging throughout the code to facilitate debugging.
7. **Optimization**:
- **Use Intuition**: For a quick initial evaluation, human intuition is often a good start.
- **Redesign Flow (Back to Step 3)**: Consider breaking down tasks further, introducing agentic decisions, or better managing input contexts.
- If your flow design is already solid, move on to micro-optimizations:
- **Prompt Engineering**: Use clear, specific instructions with examples to reduce ambiguity.
- **In-Context Learning**: Provide robust examples for tasks that are difficult to specify with instructions alone.
- > **You'll likely iterate a lot!** Expect to repeat Steps 3–6 hundreds of times.
>
>
{: .best-practice }
8. **Reliability**
- **Node Retries**: Add checks in the node `exec` to ensure outputs meet requirements, and consider increasing `max_retries` and `wait` times.
- **Logging and Visualization**: Maintain logs of all attempts and visualize node results for easier debugging.
- **Self-Evaluation**: Add a separate node (powered by an LLM) to review outputs when results are uncertain.
## Example LLM Project File Structure
```
my_project/
├── main.py
├── nodes.py
├── flow.py
├── utils/
│ ├── __init__.py
│ ├── call_llm.py
│ └── search_web.py
├── requirements.txt
└── docs/
└── design.md
```
- **`docs/design.md`**: Contains project documentation for each step above. This should be *high-level* and *no-code*.
- **`utils/`**: Contains all utility functions.
- It's recommended to dedicate one Python file to each API call, for example `call_llm.py` or `search_web.py`.
- Each file should also include a `main()` function to try that API call
- **`nodes.py`**: Contains all the node definitions.
```python
# nodes.py
from pocketflow import Node
from utils.call_llm import call_llm
class GetQuestionNode(Node):
def exec(self, _):
# Get question directly from user input
user_question = input("Enter your question: ")
return user_question
def post(self, shared, prep_res, exec_res):
# Store the user's question
shared["question"] = exec_res
return "default" # Go to the next node
class AnswerNode(Node):
def prep(self, shared):
# Read question from shared
return shared["question"]
def exec(self, question):
# Call LLM to get the answer
return call_llm(question)
def post(self, shared, prep_res, exec_res):
# Store the answer in shared
shared["answer"] = exec_res
```
- **`flow.py`**: Implements functions that create flows by importing node definitions and connecting them.
```python
# flow.py
from pocketflow import Flow
from nodes import GetQuestionNode, AnswerNode
def create_qa_flow():
"""Create and return a question-answering flow."""
# Create nodes
get_question_node = GetQuestionNode()
answer_node = AnswerNode()
# Connect nodes in sequence
get_question_node >> answer_node
# Create flow starting with input node
return Flow(start=get_question_node)
```
- **`main.py`**: Serves as the project's entry point.
```python
# main.py
from flow import create_qa_flow
# Example main function
# Please replace this with your own main function
def main():
shared = {
"question": None, # Will be populated by GetQuestionNode from user input
"answer": None # Will be populated by AnswerNode
}
# Create the flow and run it
qa_flow = create_qa_flow()
qa_flow.run(shared)
print(f"Question: {shared['question']}")
print(f"Answer: {shared['answer']}")
if __name__ == "__main__":
main()
```
================================================
File: docs/index.md
================================================
---
layout: default
title: "Home"
nav_order: 1
---
# Pocket Flow
A [100-line](https://github.com/the-pocket/PocketFlow/blob/main/pocketflow/__init__.py) minimalist LLM framework for *Agents, Task Decomposition, RAG, etc*.
- **Lightweight**: Just the core graph abstraction in 100 lines. ZERO dependencies, and vendor lock-in.
- **Expressive**: Everything you love from larger frameworks—([Multi-](./design_pattern/multi_agent.html))[Agents](./design_pattern/agent.html), [Workflow](./design_pattern/workflow.html), [RAG](./design_pattern/rag.html), and more.
- **Agentic-Coding**: Intuitive enough for AI agents to help humans build complex LLM applications.
## Core Abstraction
We model the LLM workflow as a **Graph + Shared Store**:
- [Node](./core_abstraction/node.md) handles simple (LLM) tasks.
- [Flow](./core_abstraction/flow.md) connects nodes through **Actions** (labeled edges).
- [Shared Store](./core_abstraction/communication.md) enables communication between nodes within flows.
- [Batch](./core_abstraction/batch.md) nodes/flows allow for data-intensive tasks.
- [Async](./core_abstraction/async.md) nodes/flows allow waiting for asynchronous tasks.
- [(Advanced) Parallel](./core_abstraction/parallel.md) nodes/flows handle I/O-bound tasks.
## Design Pattern
From there, it’s easy to implement popular design patterns:
- [Agent](./design_pattern/agent.md) autonomously makes decisions.
- [Workflow](./design_pattern/workflow.md) chains multiple tasks into pipelines.
- [RAG](./design_pattern/rag.md) integrates data retrieval with generation.
- [Map Reduce](./design_pattern/mapreduce.md) splits data tasks into Map and Reduce steps.
- [Structured Output](./design_pattern/structure.md) formats outputs consistently.
- [(Advanced) Multi-Agents](./design_pattern/multi_agent.md) coordinate multiple agents.
## Utility Function
We **do not** provide built-in utilities. Instead, we offer *examples*—please *implement your own*:
- [LLM Wrapper](./utility_function/llm.md)
- [Viz and Debug](./utility_function/viz.md)
- [Web Search](./utility_function/websearch.md)
- [Chunking](./utility_function/chunking.md)
- [Embedding](./utility_function/embedding.md)
- [Vector Databases](./utility_function/vector.md)
- [Text-to-Speech](./utility_function/text_to_speech.md)
**Why not built-in?**: I believe it's a *bad practice* for vendor-specific APIs in a general framework:
- *API Volatility*: Frequent changes lead to heavy maintenance for hardcoded APIs.
- *Flexibility*: You may want to switch vendors, use fine-tuned models, or run them locally.
- *Optimizations*: Prompt caching, batching, and streaming are easier without vendor lock-in.
## Ready to build your Apps?
Check out [Agentic Coding Guidance](./guide.md), the fastest way to develop LLM projects with Pocket Flow!
================================================
File: docs/core_abstraction/async.md
================================================
---
layout: default
title: "(Advanced) Async"
parent: "Core Abstraction"
nav_order: 5
---
# (Advanced) Async
**Async** Nodes implement `prep_async()`, `exec_async()`, `exec_fallback_async()`, and/or `post_async()`. This is useful for:
1. **prep_async()**: For *fetching/reading data (files, APIs, DB)* in an I/O-friendly way.
2. **exec_async()**: Typically used for async LLM calls.
3. **post_async()**: For *awaiting user feedback*, *coordinating across multi-agents* or any additional async steps after `exec_async()`.
**Note**: `AsyncNode` must be wrapped in `AsyncFlow`. `AsyncFlow` can also include regular (sync) nodes.
### Example
```python
class SummarizeThenVerify(AsyncNode):
async def prep_async(self, shared):
# Example: read a file asynchronously
doc_text = await read_file_async(shared["doc_path"])
return doc_text
async def exec_async(self, prep_res):
# Example: async LLM call
summary = await call_llm_async(f"Summarize: {prep_res}")
return summary
async def post_async(self, shared, prep_res, exec_res):
# Example: wait for user feedback
decision = await gather_user_feedback(exec_res)
if decision == "approve":
shared["summary"] = exec_res
return "approve"
return "deny"
summarize_node = SummarizeThenVerify()
final_node = Finalize()
# Define transitions
summarize_node - "approve" >> final_node
summarize_node - "deny" >> summarize_node # retry
flow = AsyncFlow(start=summarize_node)
async def main():
shared = {"doc_path": "document.txt"}
await flow.run_async(shared)
print("Final Summary:", shared.get("summary"))
asyncio.run(main())
```
================================================
File: docs/core_abstraction/batch.md
================================================
---
layout: default
title: "Batch"
parent: "Core Abstraction"
nav_order: 4
---
# Batch
**Batch** makes it easier to handle large inputs in one Node or **rerun** a Flow multiple times. Example use cases:
- **Chunk-based** processing (e.g., splitting large texts).
- **Iterative** processing over lists of input items (e.g., user queries, files, URLs).
## 1. BatchNode
A **BatchNode** extends `Node` but changes `prep()` and `exec()`:
- **`prep(shared)`**: returns an **iterable** (e.g., list, generator).
- **`exec(item)`**: called **once** per item in that iterable.
- **`post(shared, prep_res, exec_res_list)`**: after all items are processed, receives a **list** of results (`exec_res_list`) and returns an **Action**.
### Example: Summarize a Large File
```python
class MapSummaries(BatchNode):
def prep(self, shared):
# Suppose we have a big file; chunk it
content = shared["data"]
chunk_size = 10000
chunks = [content[i:i+chunk_size] for i in range(0, len(content), chunk_size)]
return chunks
def exec(self, chunk):
prompt = f"Summarize this chunk in 10 words: {chunk}"
summary = call_llm(prompt)
return summary
def post(self, shared, prep_res, exec_res_list):
combined = "\n".join(exec_res_list)
shared["summary"] = combined
return "default"
map_summaries = MapSummaries()
flow = Flow(start=map_summaries)
flow.run(shared)
```
---
## 2. BatchFlow
A **BatchFlow** runs a **Flow** multiple times, each time with different `params`. Think of it as a loop that replays the Flow for each parameter set.
### Example: Summarize Many Files
```python
class SummarizeAllFiles(BatchFlow):
def prep(self, shared):
# Return a list of param dicts (one per file)
filenames = list(shared["data"].keys()) # e.g., ["file1.txt", "file2.txt", ...]
return [{"filename": fn} for fn in filenames]
# Suppose we have a per-file Flow (e.g., load_file >> summarize >> reduce):
summarize_file = SummarizeFile(start=load_file)
# Wrap that flow into a BatchFlow:
summarize_all_files = SummarizeAllFiles(start=summarize_file)
summarize_all_files.run(shared)
```
### Under the Hood
1. `prep(shared)` returns a list of param dicts—e.g., `[{filename: "file1.txt"}, {filename: "file2.txt"}, ...]`.
2. The **BatchFlow** loops through each dict. For each one:
- It merges the dict with the BatchFlow’s own `params`.
- It calls `flow.run(shared)` using the merged result.
3. This means the sub-Flow is run **repeatedly**, once for every param dict.
---
## 3. Nested or Multi-Level Batches
You can nest a **BatchFlow** in another **BatchFlow**. For instance:
- **Outer** batch: returns a list of diretory param dicts (e.g., `{"directory": "/pathA"}`, `{"directory": "/pathB"}`, ...).
- **Inner** batch: returning a list of per-file param dicts.
At each level, **BatchFlow** merges its own param dict with the parent’s. By the time you reach the **innermost** node, the final `params` is the merged result of **all** parents in the chain. This way, a nested structure can keep track of the entire context (e.g., directory + file name) at once.
```python
class FileBatchFlow(BatchFlow):
def prep(self, shared):
directory = self.params["directory"]
# e.g., files = ["file1.txt", "file2.txt", ...]
files = [f for f in os.listdir(directory) if f.endswith(".txt")]
return [{"filename": f} for f in files]
class DirectoryBatchFlow(BatchFlow):
def prep(self, shared):
directories = [ "/path/to/dirA", "/path/to/dirB"]
return [{"directory": d} for d in directories]
# MapSummaries have params like {"directory": "/path/to/dirA", "filename": "file1.txt"}
inner_flow = FileBatchFlow(start=MapSummaries())
outer_flow = DirectoryBatchFlow(start=inner_flow)
```
================================================
File: docs/core_abstraction/communication.md
================================================
---
layout: default
title: "Communication"
parent: "Core Abstraction"
nav_order: 3
---
# Communication
Nodes and Flows **communicate** in 2 ways:
1. **Shared Store (for almost all the cases)**
- A global data structure (often an in-mem dict) that all nodes can read ( `prep()`) and write (`post()`).
- Great for data results, large content, or anything multiple nodes need.
- You shall design the data structure and populate it ahead.
- > **Separation of Concerns:** Use `Shared Store` for almost all cases to separate *Data Schema* from *Compute Logic*! This approach is both flexible and easy to manage, resulting in more maintainable code. `Params` is more a syntax sugar for [Batch](./batch.md).
{: .best-practice }
2. **Params (only for [Batch](./batch.md))**
- Each node has a local, ephemeral `params` dict passed in by the **parent Flow**, used as an identifier for tasks. Parameter keys and values shall be **immutable**.
- Good for identifiers like filenames or numeric IDs, in Batch mode.
If you know memory management, think of the **Shared Store** like a **heap** (shared by all function calls), and **Params** like a **stack** (assigned by the caller).
---
## 1. Shared Store
### Overview
A shared store is typically an in-mem dictionary, like:
```python
shared = {"data": {}, "summary": {}, "config": {...}, ...}
```
It can also contain local file handlers, DB connections, or a combination for persistence. We recommend deciding the data structure or DB schema first based on your app requirements.
### Example
```python
class LoadData(Node):
def post(self, shared, prep_res, exec_res):
# We write data to shared store
shared["data"] = "Some text content"
return None
class Summarize(Node):
def prep(self, shared):
# We read data from shared store
return shared["data"]
def exec(self, prep_res):
# Call LLM to summarize
prompt = f"Summarize: {prep_res}"
summary = call_llm(prompt)
return summary
def post(self, shared, prep_res, exec_res):
# We write summary to shared store
shared["summary"] = exec_res
return "default"
load_data = LoadData()
summarize = Summarize()
load_data >> summarize
flow = Flow(start=load_data)
shared = {}
flow.run(shared)
```
Here:
- `LoadData` writes to `shared["data"]`.
- `Summarize` reads from `shared["data"]`, summarizes, and writes to `shared["summary"]`.
---
## 2. Params
**Params** let you store *per-Node* or *per-Flow* config that doesn't need to live in the shared store. They are:
- **Immutable** during a Node's run cycle (i.e., they don't change mid-`prep->exec->post`).
- **Set** via `set_params()`.
- **Cleared** and updated each time a parent Flow calls it.
> Only set the uppermost Flow params because others will be overwritten by the parent Flow.
>
> If you need to set child node params, see [Batch](./batch.md).
{: .warning }
Typically, **Params** are identifiers (e.g., file name, page number). Use them to fetch the task you assigned or write to a specific part of the shared store.
### Example
```python
# 1) Create a Node that uses params
class SummarizeFile(Node):
def prep(self, shared):
# Access the node's param
filename = self.params["filename"]
return shared["data"].get(filename, "")
def exec(self, prep_res):
prompt = f"Summarize: {prep_res}"
return call_llm(prompt)
def post(self, shared, prep_res, exec_res):
filename = self.params["filename"]
shared["summary"][filename] = exec_res
return "default"
# 2) Set params
node = SummarizeFile()
# 3) Set Node params directly (for testing)
node.set_params({"filename": "doc1.txt"})
node.run(shared)
# 4) Create Flow
flow = Flow(start=node)
# 5) Set Flow params (overwrites node params)
flow.set_params({"filename": "doc2.txt"})
flow.run(shared) # The node summarizes doc2, not doc1
```
================================================
File: docs/core_abstraction/flow.md
================================================
---
layout: default
title: "Flow"
parent: "Core Abstraction"
nav_order: 2
---
# Flow
A **Flow** orchestrates a graph of Nodes. You can chain Nodes in a sequence or create branching depending on the **Actions** returned from each Node's `post()`.
## 1. Action-based Transitions
Each Node's `post()` returns an **Action** string. By default, if `post()` doesn't return anything, we treat that as `"default"`.
You define transitions with the syntax:
1. **Basic default transition**: `node_a >> node_b`
This means if `node_a.post()` returns `"default"`, go to `node_b`.
(Equivalent to `node_a - "default" >> node_b`)
2. **Named action transition**: `node_a - "action_name" >> node_b`
This means if `node_a.post()` returns `"action_name"`, go to `node_b`.
It's possible to create loops, branching, or multi-step flows.
## 2. Creating a Flow
A **Flow** begins with a **start** node. You call `Flow(start=some_node)` to specify the entry point. When you call `flow.run(shared)`, it executes the start node, looks at its returned Action from `post()`, follows the transition, and continues until there's no next node.
### Example: Simple Sequence
Here's a minimal flow of two nodes in a chain:
```python
node_a >> node_b
flow = Flow(start=node_a)
flow.run(shared)
```
- When you run the flow, it executes `node_a`.
- Suppose `node_a.post()` returns `"default"`.
- The flow then sees `"default"` Action is linked to `node_b` and runs `node_b`.
- `node_b.post()` returns `"default"` but we didn't define `node_b >> something_else`. So the flow ends there.
### Example: Branching & Looping
Here's a simple expense approval flow that demonstrates branching and looping. The `ReviewExpense` node can return three possible Actions:
- `"approved"`: expense is approved, move to payment processing
- `"needs_revision"`: expense needs changes, send back for revision
- `"rejected"`: expense is denied, finish the process
We can wire them like this:
```python
# Define the flow connections
review - "approved" >> payment # If approved, process payment
review - "needs_revision" >> revise # If needs changes, go to revision
review - "rejected" >> finish # If rejected, finish the process
revise >> review # After revision, go back for another review
payment >> finish # After payment, finish the process
flow = Flow(start=review)
```
Let's see how it flows:
1. If `review.post()` returns `"approved"`, the expense moves to the `payment` node
2. If `review.post()` returns `"needs_revision"`, it goes to the `revise` node, which then loops back to `review`
3. If `review.post()` returns `"rejected"`, it moves to the `finish` node and stops
```mermaid
flowchart TD
review[Review Expense] -->|approved| payment[Process Payment]
review -->|needs_revision| revise[Revise Report]
review -->|rejected| finish[Finish Process]
revise --> review
payment --> finish
```
### Running Individual Nodes vs. Running a Flow
- `node.run(shared)`: Just runs that node alone (calls `prep->exec->post()`), returns an Action.
- `flow.run(shared)`: Executes from the start node, follows Actions to the next node, and so on until the flow can't continue.
> `node.run(shared)` **does not** proceed to the successor.
> This is mainly for debugging or testing a single node.
>
> Always use `flow.run(...)` in production to ensure the full pipeline runs correctly.
{: .warning }
## 3. Nested Flows
A **Flow** can act like a Node, which enables powerful composition patterns. This means you can:
1. Use a Flow as a Node within another Flow's transitions.
2. Combine multiple smaller Flows into a larger Flow for reuse.
3. Node `params` will be a merging of **all** parents' `params`.
### Flow's Node Methods
A **Flow** is also a **Node**, so it will run `prep()` and `post()`. However:
- It **won't** run `exec()`, as its main logic is to orchestrate its nodes.
- `post()` always receives `None` for `exec_res` and should instead get the flow execution results from the shared store.
### Basic Flow Nesting
Here's how to connect a flow to another node:
```python
# Create a sub-flow
node_a >> node_b
subflow = Flow(start=node_a)
# Connect it to another node
subflow >> node_c
# Create the parent flow
parent_flow = Flow(start=subflow)
```
When `parent_flow.run()` executes:
1. It starts `subflow`
2. `subflow` runs through its nodes (`node_a->node_b`)
3. After `subflow` completes, execution continues to `node_c`
### Example: Order Processing Pipeline
Here's a practical example that breaks down order processing into nested flows:
```python
# Payment processing sub-flow
validate_payment >> process_payment >> payment_confirmation
payment_flow = Flow(start=validate_payment)
# Inventory sub-flow
check_stock >> reserve_items >> update_inventory
inventory_flow = Flow(start=check_stock)
# Shipping sub-flow
create_label >> assign_carrier >> schedule_pickup
shipping_flow = Flow(start=create_label)
# Connect the flows into a main order pipeline
payment_flow >> inventory_flow >> shipping_flow
# Create the master flow
order_pipeline = Flow(start=payment_flow)
# Run the entire pipeline
order_pipeline.run(shared_data)
```
This creates a clean separation of concerns while maintaining a clear execution path:
```mermaid
flowchart LR
subgraph order_pipeline[Order Pipeline]
subgraph paymentFlow["Payment Flow"]
A[Validate Payment] --> B[Process Payment] --> C[Payment Confirmation]
end
subgraph inventoryFlow["Inventory Flow"]
D[Check Stock] --> E[Reserve Items] --> F[Update Inventory]
end
subgraph shippingFlow["Shipping Flow"]
G[Create Label] --> H[Assign Carrier] --> I[Schedule Pickup]
end
paymentFlow --> inventoryFlow
inventoryFlow --> shippingFlow
end
```
================================================
File: docs/core_abstraction/node.md
================================================
---
layout: default
title: "Node"
parent: "Core Abstraction"
nav_order: 1
---
# Node
A **Node** is the smallest building block. Each Node has 3 steps `prep->exec->post`:
1. `prep(shared)`
- **Read and preprocess data** from `shared` store.
- Examples: *query DB, read files, or serialize data into a string*.
- Return `prep_res`, which is used by `exec()` and `post()`.
2. `exec(prep_res)`
- **Execute compute logic**, with optional retries and error handling (below).
- Examples: *(mostly) LLM calls, remote APIs, tool use*.
- ⚠️ This shall be only for compute and **NOT** access `shared`.
- ⚠️ If retries enabled, ensure idempotent implementation.
- Return `exec_res`, which is passed to `post()`.
3. `post(shared, prep_res, exec_res)`
- **Postprocess and write data** back to `shared`.
- Examples: *update DB, change states, log results*.
- **Decide the next action** by returning a *string* (`action = "default"` if *None*).
> **Why 3 steps?** To enforce the principle of *separation of concerns*. The data storage and data processing are operated separately.
>
> All steps are *optional*. E.g., you can only implement `prep` and `post` if you just need to process data.
{: .note }
### Fault Tolerance & Retries
You can **retry** `exec()` if it raises an exception via two parameters when define the Node:
- `max_retries` (int): Max times to run `exec()`. The default is `1` (**no** retry).
- `wait` (int): The time to wait (in **seconds**) before next retry. By default, `wait=0` (no waiting).
`wait` is helpful when you encounter rate-limits or quota errors from your LLM provider and need to back off.
```python
my_node = SummarizeFile(max_retries=3, wait=10)
```
When an exception occurs in `exec()`, the Node automatically retries until:
- It either succeeds, or
- The Node has retried `max_retries - 1` times already and fails on the last attempt.
You can get the current retry times (0-based) from `self.cur_retry`.
```python
class RetryNode(Node):
def exec(self, prep_res):
print(f"Retry {self.cur_retry} times")
raise Exception("Failed")
```
### Graceful Fallback
To **gracefully handle** the exception (after all retries) rather than raising it, override:
```python
def exec_fallback(self, prep_res, exc):
raise exc
```
By default, it just re-raises exception. But you can return a fallback result instead, which becomes the `exec_res` passed to `post()`.
### Example: Summarize file
```python
class SummarizeFile(Node):
def prep(self, shared):
return shared["data"]
def exec(self, prep_res):
if not prep_res:
return "Empty file content"
prompt = f"Summarize this text in 10 words: {prep_res}"
summary = call_llm(prompt) # might fail
return summary
def exec_fallback(self, prep_res, exc):
# Provide a simple fallback instead of crashing
return "There was an error processing your request."
def post(self, shared, prep_res, exec_res):
shared["summary"] = exec_res
# Return "default" by not returning
summarize_node = SummarizeFile(max_retries=3)
# node.run() calls prep->exec->post
# If exec() fails, it retries up to 3 times before calling exec_fallback()
action_result = summarize_node.run(shared)
print("Action returned:", action_result) # "default"
print("Summary stored:", shared["summary"])
```
================================================
File: docs/core_abstraction/parallel.md
================================================
---
layout: default
title: "(Advanced) Parallel"
parent: "Core Abstraction"
nav_order: 6
---
# (Advanced) Parallel
**Parallel** Nodes and Flows let you run multiple **Async** Nodes and Flows **concurrently**—for example, summarizing multiple texts at once. This can improve performance by overlapping I/O and compute.
> Because of Python’s GIL, parallel nodes and flows can’t truly parallelize CPU-bound tasks (e.g., heavy numerical computations). However, they excel at overlapping I/O-bound work—like LLM calls, database queries, API requests, or file I/O.
{: .warning }
> - **Ensure Tasks Are Independent**: If each item depends on the output of a previous item, **do not** parallelize.
>
> - **Beware of Rate Limits**: Parallel calls can **quickly** trigger rate limits on LLM services. You may need a **throttling** mechanism (e.g., semaphores or sleep intervals).
>
> - **Consider Single-Node Batch APIs**: Some LLMs offer a **batch inference** API where you can send multiple prompts in a single call. This is more complex to implement but can be more efficient than launching many parallel requests and mitigates rate limits.
{: .best-practice }
## AsyncParallelBatchNode
Like **AsyncBatchNode**, but run `exec_async()` in **parallel**:
```python
class ParallelSummaries(AsyncParallelBatchNode):
async def prep_async(self, shared):
# e.g., multiple texts
return shared["texts"]
async def exec_async(self, text):
prompt = f"Summarize: {text}"
return await call_llm_async(prompt)
async def post_async(self, shared, prep_res, exec_res_list):
shared["summary"] = "\n\n".join(exec_res_list)
return "default"
node = ParallelSummaries()
flow = AsyncFlow(start=node)
```
## AsyncParallelBatchFlow
Parallel version of **BatchFlow**. Each iteration of the sub-flow runs **concurrently** using different parameters:
```python
class SummarizeMultipleFiles(AsyncParallelBatchFlow):
async def prep_async(self, shared):
return [{"filename": f} for f in shared["files"]]
sub_flow = AsyncFlow(start=LoadAndSummarizeFile())
parallel_flow = SummarizeMultipleFiles(start=sub_flow)
await parallel_flow.run_async(shared)
```
================================================
File: docs/design_pattern/agent.md
================================================
---
layout: default
title: "Agent"
parent: "Design Pattern"
nav_order: 1
---
# Agent
Agent is a powerful design pattern in which nodes can take dynamic actions based on the context.
## Implement Agent with Graph
1. **Context and Action:** Implement nodes that supply context and perform actions.
2. **Branching:** Use branching to connect each action node to an agent node. Use action to allow the agent to direct the [flow](../core_abstraction/flow.md) between nodes—and potentially loop back for multi-step.
3. **Agent Node:** Provide a prompt to decide action—for example:
```python
f"""
### CONTEXT
Task: {task_description}
Previous Actions: {previous_actions}
Current State: {current_state}
### ACTION SPACE
[1] search
Description: Use web search to get results
Parameters:
- query (str): What to search for
[2] answer
Description: Conclude based on the results
Parameters:
- result (str): Final answer to provide
### NEXT ACTION
Decide the next action based on the current context and available action space.
Return your response in the following format:
```yaml
thinking: |
action:
parameters:
:
```"""
```
The core of building **high-performance** and **reliable** agents boils down to:
1. **Context Management:** Provide *relevant, minimal context.* For example, rather than including an entire chat history, retrieve the most relevant via [RAG](./rag.md). Even with larger context windows, LLMs still fall victim to ["lost in the middle"](https://arxiv.org/abs/2307.03172), overlooking mid-prompt content.
2. **Action Space:** Provide *a well-structured and unambiguous* set of actions—avoiding overlap like separate `read_databases` or `read_csvs`. Instead, import CSVs into the database.
## Example Good Action Design
- **Incremental:** Feed content in manageable chunks (500 lines or 1 page) instead of all at once.
- **Overview-zoom-in:** First provide high-level structure (table of contents, summary), then allow drilling into details (raw texts).
- **Parameterized/Programmable:** Instead of fixed actions, enable parameterized (columns to select) or programmable (SQL queries) actions, for example, to read CSV files.
- **Backtracking:** Let the agent undo the last step instead of restarting entirely, preserving progress when encountering errors or dead ends.
## Example: Search Agent
This agent:
1. Decides whether to search or answer
2. If searches, loops back to decide if more search needed
3. Answers when enough context gathered
```python
class DecideAction(Node):
def prep(self, shared):
context = shared.get("context", "No previous search")
query = shared["query"]
return query, context
def exec(self, inputs):
query, context = inputs
prompt = f"""
Given input: {query}
Previous search results: {context}
Should I: 1) Search web for more info 2) Answer with current knowledge
Output in yaml:
```yaml
action: search/answer
reason: why this action
search_term: search phrase if action is search
```"""
resp = call_llm(prompt)
yaml_str = resp.split("```yaml")[1].split("```")[0].strip()
result = yaml.safe_load(yaml_str)
assert isinstance(result, dict)
assert "action" in result
assert "reason" in result
assert result["action"] in ["search", "answer"]
if result["action"] == "search":
assert "search_term" in result
return result
def post(self, shared, prep_res, exec_res):
if exec_res["action"] == "search":
shared["search_term"] = exec_res["search_term"]
return exec_res["action"]
class SearchWeb(Node):
def prep(self, shared):
return shared["search_term"]
def exec(self, search_term):
return search_web(search_term)
def post(self, shared, prep_res, exec_res):
prev_searches = shared.get("context", [])
shared["context"] = prev_searches + [
{"term": shared["search_term"], "result": exec_res}
]
return "decide"
class DirectAnswer(Node):
def prep(self, shared):
return shared["query"], shared.get("context", "")
def exec(self, inputs):
query, context = inputs
return call_llm(f"Context: {context}\nAnswer: {query}")
def post(self, shared, prep_res, exec_res):
print(f"Answer: {exec_res}")
shared["answer"] = exec_res
# Connect nodes
decide = DecideAction()
search = SearchWeb()
answer = DirectAnswer()
decide - "search" >> search
decide - "answer" >> answer
search - "decide" >> decide # Loop back
flow = Flow(start=decide)
flow.run({"query": "Who won the Nobel Prize in Physics 2024?"})
```
================================================
File: docs/design_pattern/mapreduce.md
================================================
---
layout: default
title: "Map Reduce"
parent: "Design Pattern"
nav_order: 4
---
# Map Reduce
MapReduce is a design pattern suitable when you have either:
- Large input data (e.g., multiple files to process), or
- Large output data (e.g., multiple forms to fill)
and there is a logical way to break the task into smaller, ideally independent parts.
You first break down the task using [BatchNode](../core_abstraction/batch.md) in the map phase, followed by aggregation in the reduce phase.
### Example: Document Summarization
```python
class SummarizeAllFiles(BatchNode):
def prep(self, shared):
files_dict = shared["files"] # e.g. 10 files
return list(files_dict.items()) # [("file1.txt", "aaa..."), ("file2.txt", "bbb..."), ...]
def exec(self, one_file):
filename, file_content = one_file
summary_text = call_llm(f"Summarize the following file:\n{file_content}")
return (filename, summary_text)
def post(self, shared, prep_res, exec_res_list):
shared["file_summaries"] = dict(exec_res_list)
class CombineSummaries(Node):
def prep(self, shared):
return shared["file_summaries"]
def exec(self, file_summaries):
# format as: "File1: summary\nFile2: summary...\n"
text_list = []
for fname, summ in file_summaries.items():
text_list.append(f"{fname} summary:\n{summ}\n")
big_text = "\n---\n".join(text_list)
return call_llm(f"Combine these file summaries into one final summary:\n{big_text}")
def post(self, shared, prep_res, final_summary):
shared["all_files_summary"] = final_summary
batch_node = SummarizeAllFiles()
combine_node = CombineSummaries()
batch_node >> combine_node
flow = Flow(start=batch_node)
shared = {
"files": {
"file1.txt": "Alice was beginning to get very tired of sitting by her sister...",
"file2.txt": "Some other interesting text ...",
# ...
}
}
flow.run(shared)
print("Individual Summaries:", shared["file_summaries"])
print("\nFinal Summary:\n", shared["all_files_summary"])
```
================================================
File: docs/design_pattern/rag.md
================================================
---
layout: default
title: "RAG"
parent: "Design Pattern"
nav_order: 3
---
# RAG (Retrieval Augmented Generation)
For certain LLM tasks like answering questions, providing relevant context is essential. One common architecture is a **two-stage** RAG pipeline:
1. **Offline stage**: Preprocess and index documents ("building the index").
2. **Online stage**: Given a question, generate answers by retrieving the most relevant context.
---
## Stage 1: Offline Indexing
We create three Nodes:
1. `ChunkDocs` – [chunks](../utility_function/chunking.md) raw text.
2. `EmbedDocs` – [embeds](../utility_function/embedding.md) each chunk.
3. `StoreIndex` – stores embeddings into a [vector database](../utility_function/vector.md).
```python
class ChunkDocs(BatchNode):
def prep(self, shared):
# A list of file paths in shared["files"]. We process each file.
return shared["files"]
def exec(self, filepath):
# read file content. In real usage, do error handling.
with open(filepath, "r", encoding="utf-8") as f:
text = f.read()
# chunk by 100 chars each
chunks = []
size = 100
for i in range(0, len(text), size):
chunks.append(text[i : i + size])
return chunks
def post(self, shared, prep_res, exec_res_list):
# exec_res_list is a list of chunk-lists, one per file.
# flatten them all into a single list of chunks.
all_chunks = []
for chunk_list in exec_res_list:
all_chunks.extend(chunk_list)
shared["all_chunks"] = all_chunks
class EmbedDocs(BatchNode):
def prep(self, shared):
return shared["all_chunks"]
def exec(self, chunk):
return get_embedding(chunk)
def post(self, shared, prep_res, exec_res_list):
# Store the list of embeddings.
shared["all_embeds"] = exec_res_list
print(f"Total embeddings: {len(exec_res_list)}")
class StoreIndex(Node):
def prep(self, shared):
# We'll read all embeds from shared.
return shared["all_embeds"]
def exec(self, all_embeds):
# Create a vector index (faiss or other DB in real usage).
index = create_index(all_embeds)
return index
def post(self, shared, prep_res, index):
shared["index"] = index
# Wire them in sequence
chunk_node = ChunkDocs()
embed_node = EmbedDocs()
store_node = StoreIndex()
chunk_node >> embed_node >> store_node
OfflineFlow = Flow(start=chunk_node)
```
Usage example:
```python
shared = {
"files": ["doc1.txt", "doc2.txt"], # any text files
}
OfflineFlow.run(shared)
```
---
## Stage 2: Online Query & Answer
We have 3 nodes:
1. `EmbedQuery` – embeds the user’s question.
2. `RetrieveDocs` – retrieves top chunk from the index.
3. `GenerateAnswer` – calls the LLM with the question + chunk to produce the final answer.
```python
class EmbedQuery(Node):
def prep(self, shared):
return shared["question"]
def exec(self, question):
return get_embedding(question)
def post(self, shared, prep_res, q_emb):
shared["q_emb"] = q_emb
class RetrieveDocs(Node):
def prep(self, shared):
# We'll need the query embedding, plus the offline index/chunks
return shared["q_emb"], shared["index"], shared["all_chunks"]
def exec(self, inputs):
q_emb, index, chunks = inputs
I, D = search_index(index, q_emb, top_k=1)
best_id = I[0][0]
relevant_chunk = chunks[best_id]
return relevant_chunk
def post(self, shared, prep_res, relevant_chunk):
shared["retrieved_chunk"] = relevant_chunk
print("Retrieved chunk:", relevant_chunk[:60], "...")
class GenerateAnswer(Node):
def prep(self, shared):
return shared["question"], shared["retrieved_chunk"]
def exec(self, inputs):
question, chunk = inputs
prompt = f"Question: {question}\nContext: {chunk}\nAnswer:"
return call_llm(prompt)
def post(self, shared, prep_res, answer):
shared["answer"] = answer
print("Answer:", answer)
embed_qnode = EmbedQuery()
retrieve_node = RetrieveDocs()
generate_node = GenerateAnswer()
embed_qnode >> retrieve_node >> generate_node
OnlineFlow = Flow(start=embed_qnode)
```
Usage example:
```python
# Suppose we already ran OfflineFlow and have:
# shared["all_chunks"], shared["index"], etc.
shared["question"] = "Why do people like cats?"
OnlineFlow.run(shared)
# final answer in shared["answer"]
```
================================================
File: docs/design_pattern/structure.md
================================================
---
layout: default
title: "Structured Output"
parent: "Design Pattern"
nav_order: 5
---
# Structured Output
In many use cases, you may want the LLM to output a specific structure, such as a list or a dictionary with predefined keys.
There are several approaches to achieve a structured output:
- **Prompting** the LLM to strictly return a defined structure.
- Using LLMs that natively support **schema enforcement**.
- **Post-processing** the LLM's response to extract structured content.
In practice, **Prompting** is simple and reliable for modern LLMs.
### Example Use Cases
- Extracting Key Information
```yaml
product:
name: Widget Pro
price: 199.99
description: |
A high-quality widget designed for professionals.
Recommended for advanced users.
```
- Summarizing Documents into Bullet Points
```yaml
summary:
- This product is easy to use.
- It is cost-effective.
- Suitable for all skill levels.
```
- Generating Configuration Files
```yaml
server:
host: 127.0.0.1
port: 8080
ssl: true
```
## Prompt Engineering
When prompting the LLM to produce **structured** output:
1. **Wrap** the structure in code fences (e.g., `yaml`).
2. **Validate** that all required fields exist (and let `Node` handles retry).
### Example Text Summarization
```python
class SummarizeNode(Node):
def exec(self, prep_res):
# Suppose `prep_res` is the text to summarize.
prompt = f"""
Please summarize the following text as YAML, with exactly 3 bullet points
{prep_res}
Now, output:
```yaml
summary:
- bullet 1
- bullet 2
- bullet 3
```"""
response = call_llm(prompt)
yaml_str = response.split("```yaml")[1].split("```")[0].strip()
import yaml
structured_result = yaml.safe_load(yaml_str)
assert "summary" in structured_result
assert isinstance(structured_result["summary"], list)
return structured_result
```
> Besides using `assert` statements, another popular way to validate schemas is [Pydantic](https://github.com/pydantic/pydantic)
{: .note }
### Why YAML instead of JSON?
Current LLMs struggle with escaping. YAML is easier with strings since they don't always need quotes.
**In JSON**
```json
{
"dialogue": "Alice said: \"Hello Bob.\\nHow are you?\\nI am good.\""
}
```
- Every double quote inside the string must be escaped with `\"`.
- Each newline in the dialogue must be represented as `\n`.
**In YAML**
```yaml
dialogue: |
Alice said: "Hello Bob.
How are you?
I am good."
```
- No need to escape interior quotes—just place the entire text under a block literal (`|`).
- Newlines are naturally preserved without needing `\n`.
================================================
File: docs/design_pattern/workflow.md
================================================
---
layout: default
title: "Workflow"
parent: "Design Pattern"
nav_order: 2
---
# Workflow
Many real-world tasks are too complex for one LLM call. The solution is to **Task Decomposition**: decompose them into a [chain](../core_abstraction/flow.md) of multiple Nodes.
> - You don't want to make each task **too coarse**, because it may be *too complex for one LLM call*.
> - You don't want to make each task **too granular**, because then *the LLM call doesn't have enough context* and results are *not consistent across nodes*.
>
> You usually need multiple *iterations* to find the *sweet spot*. If the task has too many *edge cases*, consider using [Agents](./agent.md).
{: .best-practice }
### Example: Article Writing
```python
class GenerateOutline(Node):
def prep(self, shared): return shared["topic"]
def exec(self, topic): return call_llm(f"Create a detailed outline for an article about {topic}")
def post(self, shared, prep_res, exec_res): shared["outline"] = exec_res
class WriteSection(Node):
def prep(self, shared): return shared["outline"]
def exec(self, outline): return call_llm(f"Write content based on this outline: {outline}")
def post(self, shared, prep_res, exec_res): shared["draft"] = exec_res
class ReviewAndRefine(Node):
def prep(self, shared): return shared["draft"]
def exec(self, draft): return call_llm(f"Review and improve this draft: {draft}")
def post(self, shared, prep_res, exec_res): shared["final_article"] = exec_res
# Connect nodes
outline = GenerateOutline()
write = WriteSection()
review = ReviewAndRefine()
outline >> write >> review
# Create and run flow
writing_flow = Flow(start=outline)
shared = {"topic": "AI Safety"}
writing_flow.run(shared)
```
For *dynamic cases*, consider using [Agents](./agent.md).
================================================
File: docs/utility_function/llm.md
================================================
---
layout: default
title: "LLM Wrapper"
parent: "Utility Function"
nav_order: 1
---
# LLM Wrappers
Check out libraries like [litellm](https://github.com/BerriAI/litellm).
Here, we provide some minimal example implementations:
1. OpenAI
```python
def call_llm(prompt):
from openai import OpenAI
client = OpenAI(api_key="YOUR_API_KEY_HERE")
r = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": prompt}]
)
return r.choices[0].message.content
# Example usage
call_llm("How are you?")
```
> Store the API key in an environment variable like OPENAI_API_KEY for security.
{: .best-practice }
2. Claude (Anthropic)
```python
def call_llm(prompt):
from anthropic import Anthropic
client = Anthropic(api_key="YOUR_API_KEY_HERE")
response = client.messages.create(
model="claude-2",
messages=[{"role": "user", "content": prompt}],
max_tokens=100
)
return response.content
```
3. Google (Generative AI Studio / PaLM API)
```python
def call_llm(prompt):
import google.generativeai as genai
genai.configure(api_key="YOUR_API_KEY_HERE")
response = genai.generate_text(
model="models/text-bison-001",
prompt=prompt
)
return response.result
```
4. Azure (Azure OpenAI)
```python
def call_llm(prompt):
from openai import AzureOpenAI
client = AzureOpenAI(
azure_endpoint="https://.openai.azure.com/",
api_key="YOUR_API_KEY_HERE",
api_version="2023-05-15"
)
r = client.chat.completions.create(
model="",
messages=[{"role": "user", "content": prompt}]
)
return r.choices[0].message.content
```
5. Ollama (Local LLM)
```python
def call_llm(prompt):
from ollama import chat
response = chat(
model="llama2",
messages=[{"role": "user", "content": prompt}]
)
return response.message.content
```
## Improvements
Feel free to enhance your `call_llm` function as needed. Here are examples:
- Handle chat history:
```python
def call_llm(messages):
from openai import OpenAI
client = OpenAI(api_key="YOUR_API_KEY_HERE")
r = client.chat.completions.create(
model="gpt-4o",
messages=messages
)
return r.choices[0].message.content
```
- Add in-memory caching
```python
from functools import lru_cache
@lru_cache(maxsize=1000)
def call_llm(prompt):
# Your implementation here
pass
```
> ⚠️ Caching conflicts with Node retries, as retries yield the same result.
>
> To address this, you could use cached results only if not retried.
{: .warning }
```python
from functools import lru_cache
@lru_cache(maxsize=1000)
def cached_call(prompt):
pass
def call_llm(prompt, use_cache):
if use_cache:
return cached_call(prompt)
# Call the underlying function directly
return cached_call.__wrapped__(prompt)
class SummarizeNode(Node):
def exec(self, text):
return call_llm(f"Summarize: {text}", self.cur_retry==0)
```
- Enable logging:
```python
def call_llm(prompt):
import logging
logging.info(f"Prompt: {prompt}")
response = ... # Your implementation here
logging.info(f"Response: {response}")
return response
```
================================================
FILE: .cursorrules
================================================
---
layout: default
title: "Agentic Coding"
---
# Agentic Coding: Humans Design, Agents code!
> If you are an AI agents involved in building LLM Systems, read this guide **VERY, VERY** carefully! This is the most important chapter in the entire document. Throughout development, you should always (1) start with a small and simple solution, (2) design at a high level (`docs/design.md`) before implementation, and (3) frequently ask humans for feedback and clarification.
{: .warning }
## Agentic Coding Steps
Agentic Coding should be a collaboration between Human System Design and Agent Implementation:
| Steps | Human | AI | Comment |
|:-----------------------|:----------:|:---------:|:------------------------------------------------------------------------|
| 1. Requirements | ★★★ High | ★☆☆ Low | Humans understand the requirements and context. |
| 2. Flow | ★★☆ Medium | ★★☆ Medium | Humans specify the high-level design, and the AI fills in the details. |
| 3. Utilities | ★★☆ Medium | ★★☆ Medium | Humans provide available external APIs and integrations, and the AI helps with implementation. |
| 4. Node | ★☆☆ Low | ★★★ High | The AI helps design the node types and data handling based on the flow. |
| 5. Implementation | ★☆☆ Low | ★★★ High | The AI implements the flow based on the design. |
| 6. Optimization | ★★☆ Medium | ★★☆ Medium | Humans evaluate the results, and the AI helps optimize. |
| 7. Reliability | ★☆☆ Low | ★★★ High | The AI writes test cases and addresses corner cases. |
1. **Requirements**: Clarify the requirements for your project, and evaluate whether an AI system is a good fit.
- Understand AI systems' strengths and limitations:
- **Good for**: Routine tasks requiring common sense (filling forms, replying to emails)
- **Good for**: Creative tasks with well-defined inputs (building slides, writing SQL)
- **Not good for**: Ambiguous problems requiring complex decision-making (business strategy, startup planning)
- **Keep It User-Centric:** Explain the "problem" from the user's perspective rather than just listing features.
- **Balance complexity vs. impact**: Aim to deliver the highest value features with minimal complexity early.
2. **Flow Design**: Outline at a high level, describe how your AI system orchestrates nodes.
- Identify applicable design patterns (e.g., [Map Reduce](./design_pattern/mapreduce.md), [Agent](./design_pattern/agent.md), [RAG](./design_pattern/rag.md)).
- For each node in the flow, start with a high-level one-line description of what it does.
- If using **Map Reduce**, specify how to map (what to split) and how to reduce (how to combine).
- If using **Agent**, specify what are the inputs (context) and what are the possible actions.
- If using **RAG**, specify what to embed, noting that there's usually both offline (indexing) and online (retrieval) workflows.
- Outline the flow and draw it in a mermaid diagram. For example:
```mermaid
flowchart LR
start[Start] --> batch[Batch]
batch --> check[Check]
check -->|OK| process
check -->|Error| fix[Fix]
fix --> check
subgraph process[Process]
step1[Step 1] --> step2[Step 2]
end
process --> endNode[End]
```
- > **If Humans can't specify the flow, AI Agents can't automate it!** Before building an LLM system, thoroughly understand the problem and potential solution by manually solving example inputs to develop intuition.
{: .best-practice }
3. **Utilities**: Based on the Flow Design, identify and implement necessary utility functions.
- Think of your AI system as the brain. It needs a body—these *external utility functions*—to interact with the real world:
- Reading inputs (e.g., retrieving Slack messages, reading emails)
- Writing outputs (e.g., generating reports, sending emails)
- Using external tools (e.g., calling LLMs, searching the web)
- **NOTE**: *LLM-based tasks* (e.g., summarizing text, analyzing sentiment) are **NOT** utility functions; rather, they are *core functions* internal in the AI system.
- For each utility function, implement it and write a simple test.
- Document their input/output, as well as why they are necessary. For example:
- `name`: `get_embedding` (`utils/get_embedding.py`)
- `input`: `str`
- `output`: a vector of 3072 floats
- `necessity`: Used by the second node to embed text
- Example utility implementation:
```python
# utils/call_llm.py
from openai import OpenAI
def call_llm(prompt):
client = OpenAI(api_key="YOUR_API_KEY_HERE")
r = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": prompt}]
)
return r.choices[0].message.content
if __name__ == "__main__":
prompt = "What is the meaning of life?"
print(call_llm(prompt))
```
- > **Sometimes, design Utilies before Flow:** For example, for an LLM project to automate a legacy system, the bottleneck will likely be the available interface to that system. Start by designing the hardest utilities for interfacing, and then build the flow around them.
{: .best-practice }
4. **Node Design**: Plan how each node will read and write data, and use utility functions.
- One core design principle for PocketFlow is to use a [shared store](./core_abstraction/communication.md), so start with a shared store design:
- For simple systems, use an in-memory dictionary.
- For more complex systems or when persistence is required, use a database.
- **Don't Repeat Yourself**: Use in-memory references or foreign keys.
- Example shared store design:
```python
shared = {
"user": {
"id": "user123",
"context": { # Another nested dict
"weather": {"temp": 72, "condition": "sunny"},
"location": "San Francisco"
}
},
"results": {} # Empty dict to store outputs
}
```
- For each [Node](./core_abstraction/node.md), describe its type, how it reads and writes data, and which utility function it uses. Keep it specific but high-level without codes. For example:
- `type`: Regular (or Batch, or Async)
- `prep`: Read "text" from the shared store
- `exec`: Call the embedding utility function
- `post`: Write "embedding" to the shared store
5. **Implementation**: Implement the initial nodes and flows based on the design.
- 🎉 If you've reached this step, humans have finished the design. Now *Agentic Coding* begins!
- **"Keep it simple, stupid!"** Avoid complex features and full-scale type checking.
- **FAIL FAST**! Avoid `try` logic so you can quickly identify any weak points in the system.
- Add logging throughout the code to facilitate debugging.
7. **Optimization**:
- **Use Intuition**: For a quick initial evaluation, human intuition is often a good start.
- **Redesign Flow (Back to Step 3)**: Consider breaking down tasks further, introducing agentic decisions, or better managing input contexts.
- If your flow design is already solid, move on to micro-optimizations:
- **Prompt Engineering**: Use clear, specific instructions with examples to reduce ambiguity.
- **In-Context Learning**: Provide robust examples for tasks that are difficult to specify with instructions alone.
- > **You'll likely iterate a lot!** Expect to repeat Steps 3–6 hundreds of times.
>
>
{: .best-practice }
8. **Reliability**
- **Node Retries**: Add checks in the node `exec` to ensure outputs meet requirements, and consider increasing `max_retries` and `wait` times.
- **Logging and Visualization**: Maintain logs of all attempts and visualize node results for easier debugging.
- **Self-Evaluation**: Add a separate node (powered by an LLM) to review outputs when results are uncertain.
## Example LLM Project File Structure
```
my_project/
├── main.py
├── nodes.py
├── flow.py
├── utils/
│ ├── __init__.py
│ ├── call_llm.py
│ └── search_web.py
├── requirements.txt
└── docs/
└── design.md
```
- **`docs/design.md`**: Contains project documentation for each step above. This should be *high-level* and *no-code*.
- **`utils/`**: Contains all utility functions.
- It's recommended to dedicate one Python file to each API call, for example `call_llm.py` or `search_web.py`.
- Each file should also include a `main()` function to try that API call
- **`nodes.py`**: Contains all the node definitions.
```python
# nodes.py
from pocketflow import Node
from utils.call_llm import call_llm
class GetQuestionNode(Node):
def exec(self, _):
# Get question directly from user input
user_question = input("Enter your question: ")
return user_question
def post(self, shared, prep_res, exec_res):
# Store the user's question
shared["question"] = exec_res
return "default" # Go to the next node
class AnswerNode(Node):
def prep(self, shared):
# Read question from shared
return shared["question"]
def exec(self, question):
# Call LLM to get the answer
return call_llm(question)
def post(self, shared, prep_res, exec_res):
# Store the answer in shared
shared["answer"] = exec_res
```
- **`flow.py`**: Implements functions that create flows by importing node definitions and connecting them.
```python
# flow.py
from pocketflow import Flow
from nodes import GetQuestionNode, AnswerNode
def create_qa_flow():
"""Create and return a question-answering flow."""
# Create nodes
get_question_node = GetQuestionNode()
answer_node = AnswerNode()
# Connect nodes in sequence
get_question_node >> answer_node
# Create flow starting with input node
return Flow(start=get_question_node)
```
- **`main.py`**: Serves as the project's entry point.
```python
# main.py
from flow import create_qa_flow
# Example main function
# Please replace this with your own main function
def main():
shared = {
"question": None, # Will be populated by GetQuestionNode from user input
"answer": None # Will be populated by AnswerNode
}
# Create the flow and run it
qa_flow = create_qa_flow()
qa_flow.run(shared)
print(f"Question: {shared['question']}")
print(f"Answer: {shared['answer']}")
if __name__ == "__main__":
main()
```
================================================
File: docs/index.md
================================================
---
layout: default
title: "Home"
nav_order: 1
---
# Pocket Flow
A [100-line](https://github.com/the-pocket/PocketFlow/blob/main/pocketflow/__init__.py) minimalist LLM framework for *Agents, Task Decomposition, RAG, etc*.
- **Lightweight**: Just the core graph abstraction in 100 lines. ZERO dependencies, and vendor lock-in.
- **Expressive**: Everything you love from larger frameworks—([Multi-](./design_pattern/multi_agent.html))[Agents](./design_pattern/agent.html), [Workflow](./design_pattern/workflow.html), [RAG](./design_pattern/rag.html), and more.
- **Agentic-Coding**: Intuitive enough for AI agents to help humans build complex LLM applications.
## Core Abstraction
We model the LLM workflow as a **Graph + Shared Store**:
- [Node](./core_abstraction/node.md) handles simple (LLM) tasks.
- [Flow](./core_abstraction/flow.md) connects nodes through **Actions** (labeled edges).
- [Shared Store](./core_abstraction/communication.md) enables communication between nodes within flows.
- [Batch](./core_abstraction/batch.md) nodes/flows allow for data-intensive tasks.
- [Async](./core_abstraction/async.md) nodes/flows allow waiting for asynchronous tasks.
- [(Advanced) Parallel](./core_abstraction/parallel.md) nodes/flows handle I/O-bound tasks.
## Design Pattern
From there, it’s easy to implement popular design patterns:
- [Agent](./design_pattern/agent.md) autonomously makes decisions.
- [Workflow](./design_pattern/workflow.md) chains multiple tasks into pipelines.
- [RAG](./design_pattern/rag.md) integrates data retrieval with generation.
- [Map Reduce](./design_pattern/mapreduce.md) splits data tasks into Map and Reduce steps.
- [Structured Output](./design_pattern/structure.md) formats outputs consistently.
- [(Advanced) Multi-Agents](./design_pattern/multi_agent.md) coordinate multiple agents.
## Utility Function
We **do not** provide built-in utilities. Instead, we offer *examples*—please *implement your own*:
- [LLM Wrapper](./utility_function/llm.md)
- [Viz and Debug](./utility_function/viz.md)
- [Web Search](./utility_function/websearch.md)
- [Chunking](./utility_function/chunking.md)
- [Embedding](./utility_function/embedding.md)
- [Vector Databases](./utility_function/vector.md)
- [Text-to-Speech](./utility_function/text_to_speech.md)
**Why not built-in?**: I believe it's a *bad practice* for vendor-specific APIs in a general framework:
- *API Volatility*: Frequent changes lead to heavy maintenance for hardcoded APIs.
- *Flexibility*: You may want to switch vendors, use fine-tuned models, or run them locally.
- *Optimizations*: Prompt caching, batching, and streaming are easier without vendor lock-in.
## Ready to build your Apps?
Check out [Agentic Coding Guidance](./guide.md), the fastest way to develop LLM projects with Pocket Flow!
================================================
File: docs/core_abstraction/async.md
================================================
---
layout: default
title: "(Advanced) Async"
parent: "Core Abstraction"
nav_order: 5
---
# (Advanced) Async
**Async** Nodes implement `prep_async()`, `exec_async()`, `exec_fallback_async()`, and/or `post_async()`. This is useful for:
1. **prep_async()**: For *fetching/reading data (files, APIs, DB)* in an I/O-friendly way.
2. **exec_async()**: Typically used for async LLM calls.
3. **post_async()**: For *awaiting user feedback*, *coordinating across multi-agents* or any additional async steps after `exec_async()`.
**Note**: `AsyncNode` must be wrapped in `AsyncFlow`. `AsyncFlow` can also include regular (sync) nodes.
### Example
```python
class SummarizeThenVerify(AsyncNode):
async def prep_async(self, shared):
# Example: read a file asynchronously
doc_text = await read_file_async(shared["doc_path"])
return doc_text
async def exec_async(self, prep_res):
# Example: async LLM call
summary = await call_llm_async(f"Summarize: {prep_res}")
return summary
async def post_async(self, shared, prep_res, exec_res):
# Example: wait for user feedback
decision = await gather_user_feedback(exec_res)
if decision == "approve":
shared["summary"] = exec_res
return "approve"
return "deny"
summarize_node = SummarizeThenVerify()
final_node = Finalize()
# Define transitions
summarize_node - "approve" >> final_node
summarize_node - "deny" >> summarize_node # retry
flow = AsyncFlow(start=summarize_node)
async def main():
shared = {"doc_path": "document.txt"}
await flow.run_async(shared)
print("Final Summary:", shared.get("summary"))
asyncio.run(main())
```
================================================
File: docs/core_abstraction/batch.md
================================================
---
layout: default
title: "Batch"
parent: "Core Abstraction"
nav_order: 4
---
# Batch
**Batch** makes it easier to handle large inputs in one Node or **rerun** a Flow multiple times. Example use cases:
- **Chunk-based** processing (e.g., splitting large texts).
- **Iterative** processing over lists of input items (e.g., user queries, files, URLs).
## 1. BatchNode
A **BatchNode** extends `Node` but changes `prep()` and `exec()`:
- **`prep(shared)`**: returns an **iterable** (e.g., list, generator).
- **`exec(item)`**: called **once** per item in that iterable.
- **`post(shared, prep_res, exec_res_list)`**: after all items are processed, receives a **list** of results (`exec_res_list`) and returns an **Action**.
### Example: Summarize a Large File
```python
class MapSummaries(BatchNode):
def prep(self, shared):
# Suppose we have a big file; chunk it
content = shared["data"]
chunk_size = 10000
chunks = [content[i:i+chunk_size] for i in range(0, len(content), chunk_size)]
return chunks
def exec(self, chunk):
prompt = f"Summarize this chunk in 10 words: {chunk}"
summary = call_llm(prompt)
return summary
def post(self, shared, prep_res, exec_res_list):
combined = "\n".join(exec_res_list)
shared["summary"] = combined
return "default"
map_summaries = MapSummaries()
flow = Flow(start=map_summaries)
flow.run(shared)
```
---
## 2. BatchFlow
A **BatchFlow** runs a **Flow** multiple times, each time with different `params`. Think of it as a loop that replays the Flow for each parameter set.
### Example: Summarize Many Files
```python
class SummarizeAllFiles(BatchFlow):
def prep(self, shared):
# Return a list of param dicts (one per file)
filenames = list(shared["data"].keys()) # e.g., ["file1.txt", "file2.txt", ...]
return [{"filename": fn} for fn in filenames]
# Suppose we have a per-file Flow (e.g., load_file >> summarize >> reduce):
summarize_file = SummarizeFile(start=load_file)
# Wrap that flow into a BatchFlow:
summarize_all_files = SummarizeAllFiles(start=summarize_file)
summarize_all_files.run(shared)
```
### Under the Hood
1. `prep(shared)` returns a list of param dicts—e.g., `[{filename: "file1.txt"}, {filename: "file2.txt"}, ...]`.
2. The **BatchFlow** loops through each dict. For each one:
- It merges the dict with the BatchFlow’s own `params`.
- It calls `flow.run(shared)` using the merged result.
3. This means the sub-Flow is run **repeatedly**, once for every param dict.
---
## 3. Nested or Multi-Level Batches
You can nest a **BatchFlow** in another **BatchFlow**. For instance:
- **Outer** batch: returns a list of diretory param dicts (e.g., `{"directory": "/pathA"}`, `{"directory": "/pathB"}`, ...).
- **Inner** batch: returning a list of per-file param dicts.
At each level, **BatchFlow** merges its own param dict with the parent’s. By the time you reach the **innermost** node, the final `params` is the merged result of **all** parents in the chain. This way, a nested structure can keep track of the entire context (e.g., directory + file name) at once.
```python
class FileBatchFlow(BatchFlow):
def prep(self, shared):
directory = self.params["directory"]
# e.g., files = ["file1.txt", "file2.txt", ...]
files = [f for f in os.listdir(directory) if f.endswith(".txt")]
return [{"filename": f} for f in files]
class DirectoryBatchFlow(BatchFlow):
def prep(self, shared):
directories = [ "/path/to/dirA", "/path/to/dirB"]
return [{"directory": d} for d in directories]
# MapSummaries have params like {"directory": "/path/to/dirA", "filename": "file1.txt"}
inner_flow = FileBatchFlow(start=MapSummaries())
outer_flow = DirectoryBatchFlow(start=inner_flow)
```
================================================
File: docs/core_abstraction/communication.md
================================================
---
layout: default
title: "Communication"
parent: "Core Abstraction"
nav_order: 3
---
# Communication
Nodes and Flows **communicate** in 2 ways:
1. **Shared Store (for almost all the cases)**
- A global data structure (often an in-mem dict) that all nodes can read ( `prep()`) and write (`post()`).
- Great for data results, large content, or anything multiple nodes need.
- You shall design the data structure and populate it ahead.
- > **Separation of Concerns:** Use `Shared Store` for almost all cases to separate *Data Schema* from *Compute Logic*! This approach is both flexible and easy to manage, resulting in more maintainable code. `Params` is more a syntax sugar for [Batch](./batch.md).
{: .best-practice }
2. **Params (only for [Batch](./batch.md))**
- Each node has a local, ephemeral `params` dict passed in by the **parent Flow**, used as an identifier for tasks. Parameter keys and values shall be **immutable**.
- Good for identifiers like filenames or numeric IDs, in Batch mode.
If you know memory management, think of the **Shared Store** like a **heap** (shared by all function calls), and **Params** like a **stack** (assigned by the caller).
---
## 1. Shared Store
### Overview
A shared store is typically an in-mem dictionary, like:
```python
shared = {"data": {}, "summary": {}, "config": {...}, ...}
```
It can also contain local file handlers, DB connections, or a combination for persistence. We recommend deciding the data structure or DB schema first based on your app requirements.
### Example
```python
class LoadData(Node):
def post(self, shared, prep_res, exec_res):
# We write data to shared store
shared["data"] = "Some text content"
return None
class Summarize(Node):
def prep(self, shared):
# We read data from shared store
return shared["data"]
def exec(self, prep_res):
# Call LLM to summarize
prompt = f"Summarize: {prep_res}"
summary = call_llm(prompt)
return summary
def post(self, shared, prep_res, exec_res):
# We write summary to shared store
shared["summary"] = exec_res
return "default"
load_data = LoadData()
summarize = Summarize()
load_data >> summarize
flow = Flow(start=load_data)
shared = {}
flow.run(shared)
```
Here:
- `LoadData` writes to `shared["data"]`.
- `Summarize` reads from `shared["data"]`, summarizes, and writes to `shared["summary"]`.
---
## 2. Params
**Params** let you store *per-Node* or *per-Flow* config that doesn't need to live in the shared store. They are:
- **Immutable** during a Node's run cycle (i.e., they don't change mid-`prep->exec->post`).
- **Set** via `set_params()`.
- **Cleared** and updated each time a parent Flow calls it.
> Only set the uppermost Flow params because others will be overwritten by the parent Flow.
>
> If you need to set child node params, see [Batch](./batch.md).
{: .warning }
Typically, **Params** are identifiers (e.g., file name, page number). Use them to fetch the task you assigned or write to a specific part of the shared store.
### Example
```python
# 1) Create a Node that uses params
class SummarizeFile(Node):
def prep(self, shared):
# Access the node's param
filename = self.params["filename"]
return shared["data"].get(filename, "")
def exec(self, prep_res):
prompt = f"Summarize: {prep_res}"
return call_llm(prompt)
def post(self, shared, prep_res, exec_res):
filename = self.params["filename"]
shared["summary"][filename] = exec_res
return "default"
# 2) Set params
node = SummarizeFile()
# 3) Set Node params directly (for testing)
node.set_params({"filename": "doc1.txt"})
node.run(shared)
# 4) Create Flow
flow = Flow(start=node)
# 5) Set Flow params (overwrites node params)
flow.set_params({"filename": "doc2.txt"})
flow.run(shared) # The node summarizes doc2, not doc1
```
================================================
File: docs/core_abstraction/flow.md
================================================
---
layout: default
title: "Flow"
parent: "Core Abstraction"
nav_order: 2
---
# Flow
A **Flow** orchestrates a graph of Nodes. You can chain Nodes in a sequence or create branching depending on the **Actions** returned from each Node's `post()`.
## 1. Action-based Transitions
Each Node's `post()` returns an **Action** string. By default, if `post()` doesn't return anything, we treat that as `"default"`.
You define transitions with the syntax:
1. **Basic default transition**: `node_a >> node_b`
This means if `node_a.post()` returns `"default"`, go to `node_b`.
(Equivalent to `node_a - "default" >> node_b`)
2. **Named action transition**: `node_a - "action_name" >> node_b`
This means if `node_a.post()` returns `"action_name"`, go to `node_b`.
It's possible to create loops, branching, or multi-step flows.
## 2. Creating a Flow
A **Flow** begins with a **start** node. You call `Flow(start=some_node)` to specify the entry point. When you call `flow.run(shared)`, it executes the start node, looks at its returned Action from `post()`, follows the transition, and continues until there's no next node.
### Example: Simple Sequence
Here's a minimal flow of two nodes in a chain:
```python
node_a >> node_b
flow = Flow(start=node_a)
flow.run(shared)
```
- When you run the flow, it executes `node_a`.
- Suppose `node_a.post()` returns `"default"`.
- The flow then sees `"default"` Action is linked to `node_b` and runs `node_b`.
- `node_b.post()` returns `"default"` but we didn't define `node_b >> something_else`. So the flow ends there.
### Example: Branching & Looping
Here's a simple expense approval flow that demonstrates branching and looping. The `ReviewExpense` node can return three possible Actions:
- `"approved"`: expense is approved, move to payment processing
- `"needs_revision"`: expense needs changes, send back for revision
- `"rejected"`: expense is denied, finish the process
We can wire them like this:
```python
# Define the flow connections
review - "approved" >> payment # If approved, process payment
review - "needs_revision" >> revise # If needs changes, go to revision
review - "rejected" >> finish # If rejected, finish the process
revise >> review # After revision, go back for another review
payment >> finish # After payment, finish the process
flow = Flow(start=review)
```
Let's see how it flows:
1. If `review.post()` returns `"approved"`, the expense moves to the `payment` node
2. If `review.post()` returns `"needs_revision"`, it goes to the `revise` node, which then loops back to `review`
3. If `review.post()` returns `"rejected"`, it moves to the `finish` node and stops
```mermaid
flowchart TD
review[Review Expense] -->|approved| payment[Process Payment]
review -->|needs_revision| revise[Revise Report]
review -->|rejected| finish[Finish Process]
revise --> review
payment --> finish
```
### Running Individual Nodes vs. Running a Flow
- `node.run(shared)`: Just runs that node alone (calls `prep->exec->post()`), returns an Action.
- `flow.run(shared)`: Executes from the start node, follows Actions to the next node, and so on until the flow can't continue.
> `node.run(shared)` **does not** proceed to the successor.
> This is mainly for debugging or testing a single node.
>
> Always use `flow.run(...)` in production to ensure the full pipeline runs correctly.
{: .warning }
## 3. Nested Flows
A **Flow** can act like a Node, which enables powerful composition patterns. This means you can:
1. Use a Flow as a Node within another Flow's transitions.
2. Combine multiple smaller Flows into a larger Flow for reuse.
3. Node `params` will be a merging of **all** parents' `params`.
### Flow's Node Methods
A **Flow** is also a **Node**, so it will run `prep()` and `post()`. However:
- It **won't** run `exec()`, as its main logic is to orchestrate its nodes.
- `post()` always receives `None` for `exec_res` and should instead get the flow execution results from the shared store.
### Basic Flow Nesting
Here's how to connect a flow to another node:
```python
# Create a sub-flow
node_a >> node_b
subflow = Flow(start=node_a)
# Connect it to another node
subflow >> node_c
# Create the parent flow
parent_flow = Flow(start=subflow)
```
When `parent_flow.run()` executes:
1. It starts `subflow`
2. `subflow` runs through its nodes (`node_a->node_b`)
3. After `subflow` completes, execution continues to `node_c`
### Example: Order Processing Pipeline
Here's a practical example that breaks down order processing into nested flows:
```python
# Payment processing sub-flow
validate_payment >> process_payment >> payment_confirmation
payment_flow = Flow(start=validate_payment)
# Inventory sub-flow
check_stock >> reserve_items >> update_inventory
inventory_flow = Flow(start=check_stock)
# Shipping sub-flow
create_label >> assign_carrier >> schedule_pickup
shipping_flow = Flow(start=create_label)
# Connect the flows into a main order pipeline
payment_flow >> inventory_flow >> shipping_flow
# Create the master flow
order_pipeline = Flow(start=payment_flow)
# Run the entire pipeline
order_pipeline.run(shared_data)
```
This creates a clean separation of concerns while maintaining a clear execution path:
```mermaid
flowchart LR
subgraph order_pipeline[Order Pipeline]
subgraph paymentFlow["Payment Flow"]
A[Validate Payment] --> B[Process Payment] --> C[Payment Confirmation]
end
subgraph inventoryFlow["Inventory Flow"]
D[Check Stock] --> E[Reserve Items] --> F[Update Inventory]
end
subgraph shippingFlow["Shipping Flow"]
G[Create Label] --> H[Assign Carrier] --> I[Schedule Pickup]
end
paymentFlow --> inventoryFlow
inventoryFlow --> shippingFlow
end
```
================================================
File: docs/core_abstraction/node.md
================================================
---
layout: default
title: "Node"
parent: "Core Abstraction"
nav_order: 1
---
# Node
A **Node** is the smallest building block. Each Node has 3 steps `prep->exec->post`:
1. `prep(shared)`
- **Read and preprocess data** from `shared` store.
- Examples: *query DB, read files, or serialize data into a string*.
- Return `prep_res`, which is used by `exec()` and `post()`.
2. `exec(prep_res)`
- **Execute compute logic**, with optional retries and error handling (below).
- Examples: *(mostly) LLM calls, remote APIs, tool use*.
- ⚠️ This shall be only for compute and **NOT** access `shared`.
- ⚠️ If retries enabled, ensure idempotent implementation.
- Return `exec_res`, which is passed to `post()`.
3. `post(shared, prep_res, exec_res)`
- **Postprocess and write data** back to `shared`.
- Examples: *update DB, change states, log results*.
- **Decide the next action** by returning a *string* (`action = "default"` if *None*).
> **Why 3 steps?** To enforce the principle of *separation of concerns*. The data storage and data processing are operated separately.
>
> All steps are *optional*. E.g., you can only implement `prep` and `post` if you just need to process data.
{: .note }
### Fault Tolerance & Retries
You can **retry** `exec()` if it raises an exception via two parameters when define the Node:
- `max_retries` (int): Max times to run `exec()`. The default is `1` (**no** retry).
- `wait` (int): The time to wait (in **seconds**) before next retry. By default, `wait=0` (no waiting).
`wait` is helpful when you encounter rate-limits or quota errors from your LLM provider and need to back off.
```python
my_node = SummarizeFile(max_retries=3, wait=10)
```
When an exception occurs in `exec()`, the Node automatically retries until:
- It either succeeds, or
- The Node has retried `max_retries - 1` times already and fails on the last attempt.
You can get the current retry times (0-based) from `self.cur_retry`.
```python
class RetryNode(Node):
def exec(self, prep_res):
print(f"Retry {self.cur_retry} times")
raise Exception("Failed")
```
### Graceful Fallback
To **gracefully handle** the exception (after all retries) rather than raising it, override:
```python
def exec_fallback(self, prep_res, exc):
raise exc
```
By default, it just re-raises exception. But you can return a fallback result instead, which becomes the `exec_res` passed to `post()`.
### Example: Summarize file
```python
class SummarizeFile(Node):
def prep(self, shared):
return shared["data"]
def exec(self, prep_res):
if not prep_res:
return "Empty file content"
prompt = f"Summarize this text in 10 words: {prep_res}"
summary = call_llm(prompt) # might fail
return summary
def exec_fallback(self, prep_res, exc):
# Provide a simple fallback instead of crashing
return "There was an error processing your request."
def post(self, shared, prep_res, exec_res):
shared["summary"] = exec_res
# Return "default" by not returning
summarize_node = SummarizeFile(max_retries=3)
# node.run() calls prep->exec->post
# If exec() fails, it retries up to 3 times before calling exec_fallback()
action_result = summarize_node.run(shared)
print("Action returned:", action_result) # "default"
print("Summary stored:", shared["summary"])
```
================================================
File: docs/core_abstraction/parallel.md
================================================
---
layout: default
title: "(Advanced) Parallel"
parent: "Core Abstraction"
nav_order: 6
---
# (Advanced) Parallel
**Parallel** Nodes and Flows let you run multiple **Async** Nodes and Flows **concurrently**—for example, summarizing multiple texts at once. This can improve performance by overlapping I/O and compute.
> Because of Python’s GIL, parallel nodes and flows can’t truly parallelize CPU-bound tasks (e.g., heavy numerical computations). However, they excel at overlapping I/O-bound work—like LLM calls, database queries, API requests, or file I/O.
{: .warning }
> - **Ensure Tasks Are Independent**: If each item depends on the output of a previous item, **do not** parallelize.
>
> - **Beware of Rate Limits**: Parallel calls can **quickly** trigger rate limits on LLM services. You may need a **throttling** mechanism (e.g., semaphores or sleep intervals).
>
> - **Consider Single-Node Batch APIs**: Some LLMs offer a **batch inference** API where you can send multiple prompts in a single call. This is more complex to implement but can be more efficient than launching many parallel requests and mitigates rate limits.
{: .best-practice }
## AsyncParallelBatchNode
Like **AsyncBatchNode**, but run `exec_async()` in **parallel**:
```python
class ParallelSummaries(AsyncParallelBatchNode):
async def prep_async(self, shared):
# e.g., multiple texts
return shared["texts"]
async def exec_async(self, text):
prompt = f"Summarize: {text}"
return await call_llm_async(prompt)
async def post_async(self, shared, prep_res, exec_res_list):
shared["summary"] = "\n\n".join(exec_res_list)
return "default"
node = ParallelSummaries()
flow = AsyncFlow(start=node)
```
## AsyncParallelBatchFlow
Parallel version of **BatchFlow**. Each iteration of the sub-flow runs **concurrently** using different parameters:
```python
class SummarizeMultipleFiles(AsyncParallelBatchFlow):
async def prep_async(self, shared):
return [{"filename": f} for f in shared["files"]]
sub_flow = AsyncFlow(start=LoadAndSummarizeFile())
parallel_flow = SummarizeMultipleFiles(start=sub_flow)
await parallel_flow.run_async(shared)
```
================================================
File: docs/design_pattern/agent.md
================================================
---
layout: default
title: "Agent"
parent: "Design Pattern"
nav_order: 1
---
# Agent
Agent is a powerful design pattern in which nodes can take dynamic actions based on the context.
## Implement Agent with Graph
1. **Context and Action:** Implement nodes that supply context and perform actions.
2. **Branching:** Use branching to connect each action node to an agent node. Use action to allow the agent to direct the [flow](../core_abstraction/flow.md) between nodes—and potentially loop back for multi-step.
3. **Agent Node:** Provide a prompt to decide action—for example:
```python
f"""
### CONTEXT
Task: {task_description}
Previous Actions: {previous_actions}
Current State: {current_state}
### ACTION SPACE
[1] search
Description: Use web search to get results
Parameters:
- query (str): What to search for
[2] answer
Description: Conclude based on the results
Parameters:
- result (str): Final answer to provide
### NEXT ACTION
Decide the next action based on the current context and available action space.
Return your response in the following format:
```yaml
thinking: |
action:
parameters:
:
```"""
```
The core of building **high-performance** and **reliable** agents boils down to:
1. **Context Management:** Provide *relevant, minimal context.* For example, rather than including an entire chat history, retrieve the most relevant via [RAG](./rag.md). Even with larger context windows, LLMs still fall victim to ["lost in the middle"](https://arxiv.org/abs/2307.03172), overlooking mid-prompt content.
2. **Action Space:** Provide *a well-structured and unambiguous* set of actions—avoiding overlap like separate `read_databases` or `read_csvs`. Instead, import CSVs into the database.
## Example Good Action Design
- **Incremental:** Feed content in manageable chunks (500 lines or 1 page) instead of all at once.
- **Overview-zoom-in:** First provide high-level structure (table of contents, summary), then allow drilling into details (raw texts).
- **Parameterized/Programmable:** Instead of fixed actions, enable parameterized (columns to select) or programmable (SQL queries) actions, for example, to read CSV files.
- **Backtracking:** Let the agent undo the last step instead of restarting entirely, preserving progress when encountering errors or dead ends.
## Example: Search Agent
This agent:
1. Decides whether to search or answer
2. If searches, loops back to decide if more search needed
3. Answers when enough context gathered
```python
class DecideAction(Node):
def prep(self, shared):
context = shared.get("context", "No previous search")
query = shared["query"]
return query, context
def exec(self, inputs):
query, context = inputs
prompt = f"""
Given input: {query}
Previous search results: {context}
Should I: 1) Search web for more info 2) Answer with current knowledge
Output in yaml:
```yaml
action: search/answer
reason: why this action
search_term: search phrase if action is search
```"""
resp = call_llm(prompt)
yaml_str = resp.split("```yaml")[1].split("```")[0].strip()
result = yaml.safe_load(yaml_str)
assert isinstance(result, dict)
assert "action" in result
assert "reason" in result
assert result["action"] in ["search", "answer"]
if result["action"] == "search":
assert "search_term" in result
return result
def post(self, shared, prep_res, exec_res):
if exec_res["action"] == "search":
shared["search_term"] = exec_res["search_term"]
return exec_res["action"]
class SearchWeb(Node):
def prep(self, shared):
return shared["search_term"]
def exec(self, search_term):
return search_web(search_term)
def post(self, shared, prep_res, exec_res):
prev_searches = shared.get("context", [])
shared["context"] = prev_searches + [
{"term": shared["search_term"], "result": exec_res}
]
return "decide"
class DirectAnswer(Node):
def prep(self, shared):
return shared["query"], shared.get("context", "")
def exec(self, inputs):
query, context = inputs
return call_llm(f"Context: {context}\nAnswer: {query}")
def post(self, shared, prep_res, exec_res):
print(f"Answer: {exec_res}")
shared["answer"] = exec_res
# Connect nodes
decide = DecideAction()
search = SearchWeb()
answer = DirectAnswer()
decide - "search" >> search
decide - "answer" >> answer
search - "decide" >> decide # Loop back
flow = Flow(start=decide)
flow.run({"query": "Who won the Nobel Prize in Physics 2024?"})
```
================================================
File: docs/design_pattern/mapreduce.md
================================================
---
layout: default
title: "Map Reduce"
parent: "Design Pattern"
nav_order: 4
---
# Map Reduce
MapReduce is a design pattern suitable when you have either:
- Large input data (e.g., multiple files to process), or
- Large output data (e.g., multiple forms to fill)
and there is a logical way to break the task into smaller, ideally independent parts.
You first break down the task using [BatchNode](../core_abstraction/batch.md) in the map phase, followed by aggregation in the reduce phase.
### Example: Document Summarization
```python
class SummarizeAllFiles(BatchNode):
def prep(self, shared):
files_dict = shared["files"] # e.g. 10 files
return list(files_dict.items()) # [("file1.txt", "aaa..."), ("file2.txt", "bbb..."), ...]
def exec(self, one_file):
filename, file_content = one_file
summary_text = call_llm(f"Summarize the following file:\n{file_content}")
return (filename, summary_text)
def post(self, shared, prep_res, exec_res_list):
shared["file_summaries"] = dict(exec_res_list)
class CombineSummaries(Node):
def prep(self, shared):
return shared["file_summaries"]
def exec(self, file_summaries):
# format as: "File1: summary\nFile2: summary...\n"
text_list = []
for fname, summ in file_summaries.items():
text_list.append(f"{fname} summary:\n{summ}\n")
big_text = "\n---\n".join(text_list)
return call_llm(f"Combine these file summaries into one final summary:\n{big_text}")
def post(self, shared, prep_res, final_summary):
shared["all_files_summary"] = final_summary
batch_node = SummarizeAllFiles()
combine_node = CombineSummaries()
batch_node >> combine_node
flow = Flow(start=batch_node)
shared = {
"files": {
"file1.txt": "Alice was beginning to get very tired of sitting by her sister...",
"file2.txt": "Some other interesting text ...",
# ...
}
}
flow.run(shared)
print("Individual Summaries:", shared["file_summaries"])
print("\nFinal Summary:\n", shared["all_files_summary"])
```
================================================
File: docs/design_pattern/rag.md
================================================
---
layout: default
title: "RAG"
parent: "Design Pattern"
nav_order: 3
---
# RAG (Retrieval Augmented Generation)
For certain LLM tasks like answering questions, providing relevant context is essential. One common architecture is a **two-stage** RAG pipeline:
1. **Offline stage**: Preprocess and index documents ("building the index").
2. **Online stage**: Given a question, generate answers by retrieving the most relevant context.
---
## Stage 1: Offline Indexing
We create three Nodes:
1. `ChunkDocs` – [chunks](../utility_function/chunking.md) raw text.
2. `EmbedDocs` – [embeds](../utility_function/embedding.md) each chunk.
3. `StoreIndex` – stores embeddings into a [vector database](../utility_function/vector.md).
```python
class ChunkDocs(BatchNode):
def prep(self, shared):
# A list of file paths in shared["files"]. We process each file.
return shared["files"]
def exec(self, filepath):
# read file content. In real usage, do error handling.
with open(filepath, "r", encoding="utf-8") as f:
text = f.read()
# chunk by 100 chars each
chunks = []
size = 100
for i in range(0, len(text), size):
chunks.append(text[i : i + size])
return chunks
def post(self, shared, prep_res, exec_res_list):
# exec_res_list is a list of chunk-lists, one per file.
# flatten them all into a single list of chunks.
all_chunks = []
for chunk_list in exec_res_list:
all_chunks.extend(chunk_list)
shared["all_chunks"] = all_chunks
class EmbedDocs(BatchNode):
def prep(self, shared):
return shared["all_chunks"]
def exec(self, chunk):
return get_embedding(chunk)
def post(self, shared, prep_res, exec_res_list):
# Store the list of embeddings.
shared["all_embeds"] = exec_res_list
print(f"Total embeddings: {len(exec_res_list)}")
class StoreIndex(Node):
def prep(self, shared):
# We'll read all embeds from shared.
return shared["all_embeds"]
def exec(self, all_embeds):
# Create a vector index (faiss or other DB in real usage).
index = create_index(all_embeds)
return index
def post(self, shared, prep_res, index):
shared["index"] = index
# Wire them in sequence
chunk_node = ChunkDocs()
embed_node = EmbedDocs()
store_node = StoreIndex()
chunk_node >> embed_node >> store_node
OfflineFlow = Flow(start=chunk_node)
```
Usage example:
```python
shared = {
"files": ["doc1.txt", "doc2.txt"], # any text files
}
OfflineFlow.run(shared)
```
---
## Stage 2: Online Query & Answer
We have 3 nodes:
1. `EmbedQuery` – embeds the user’s question.
2. `RetrieveDocs` – retrieves top chunk from the index.
3. `GenerateAnswer` – calls the LLM with the question + chunk to produce the final answer.
```python
class EmbedQuery(Node):
def prep(self, shared):
return shared["question"]
def exec(self, question):
return get_embedding(question)
def post(self, shared, prep_res, q_emb):
shared["q_emb"] = q_emb
class RetrieveDocs(Node):
def prep(self, shared):
# We'll need the query embedding, plus the offline index/chunks
return shared["q_emb"], shared["index"], shared["all_chunks"]
def exec(self, inputs):
q_emb, index, chunks = inputs
I, D = search_index(index, q_emb, top_k=1)
best_id = I[0][0]
relevant_chunk = chunks[best_id]
return relevant_chunk
def post(self, shared, prep_res, relevant_chunk):
shared["retrieved_chunk"] = relevant_chunk
print("Retrieved chunk:", relevant_chunk[:60], "...")
class GenerateAnswer(Node):
def prep(self, shared):
return shared["question"], shared["retrieved_chunk"]
def exec(self, inputs):
question, chunk = inputs
prompt = f"Question: {question}\nContext: {chunk}\nAnswer:"
return call_llm(prompt)
def post(self, shared, prep_res, answer):
shared["answer"] = answer
print("Answer:", answer)
embed_qnode = EmbedQuery()
retrieve_node = RetrieveDocs()
generate_node = GenerateAnswer()
embed_qnode >> retrieve_node >> generate_node
OnlineFlow = Flow(start=embed_qnode)
```
Usage example:
```python
# Suppose we already ran OfflineFlow and have:
# shared["all_chunks"], shared["index"], etc.
shared["question"] = "Why do people like cats?"
OnlineFlow.run(shared)
# final answer in shared["answer"]
```
================================================
File: docs/design_pattern/structure.md
================================================
---
layout: default
title: "Structured Output"
parent: "Design Pattern"
nav_order: 5
---
# Structured Output
In many use cases, you may want the LLM to output a specific structure, such as a list or a dictionary with predefined keys.
There are several approaches to achieve a structured output:
- **Prompting** the LLM to strictly return a defined structure.
- Using LLMs that natively support **schema enforcement**.
- **Post-processing** the LLM's response to extract structured content.
In practice, **Prompting** is simple and reliable for modern LLMs.
### Example Use Cases
- Extracting Key Information
```yaml
product:
name: Widget Pro
price: 199.99
description: |
A high-quality widget designed for professionals.
Recommended for advanced users.
```
- Summarizing Documents into Bullet Points
```yaml
summary:
- This product is easy to use.
- It is cost-effective.
- Suitable for all skill levels.
```
- Generating Configuration Files
```yaml
server:
host: 127.0.0.1
port: 8080
ssl: true
```
## Prompt Engineering
When prompting the LLM to produce **structured** output:
1. **Wrap** the structure in code fences (e.g., `yaml`).
2. **Validate** that all required fields exist (and let `Node` handles retry).
### Example Text Summarization
```python
class SummarizeNode(Node):
def exec(self, prep_res):
# Suppose `prep_res` is the text to summarize.
prompt = f"""
Please summarize the following text as YAML, with exactly 3 bullet points
{prep_res}
Now, output:
```yaml
summary:
- bullet 1
- bullet 2
- bullet 3
```"""
response = call_llm(prompt)
yaml_str = response.split("```yaml")[1].split("```")[0].strip()
import yaml
structured_result = yaml.safe_load(yaml_str)
assert "summary" in structured_result
assert isinstance(structured_result["summary"], list)
return structured_result
```
> Besides using `assert` statements, another popular way to validate schemas is [Pydantic](https://github.com/pydantic/pydantic)
{: .note }
### Why YAML instead of JSON?
Current LLMs struggle with escaping. YAML is easier with strings since they don't always need quotes.
**In JSON**
```json
{
"dialogue": "Alice said: \"Hello Bob.\\nHow are you?\\nI am good.\""
}
```
- Every double quote inside the string must be escaped with `\"`.
- Each newline in the dialogue must be represented as `\n`.
**In YAML**
```yaml
dialogue: |
Alice said: "Hello Bob.
How are you?
I am good."
```
- No need to escape interior quotes—just place the entire text under a block literal (`|`).
- Newlines are naturally preserved without needing `\n`.
================================================
File: docs/design_pattern/workflow.md
================================================
---
layout: default
title: "Workflow"
parent: "Design Pattern"
nav_order: 2
---
# Workflow
Many real-world tasks are too complex for one LLM call. The solution is to **Task Decomposition**: decompose them into a [chain](../core_abstraction/flow.md) of multiple Nodes.
> - You don't want to make each task **too coarse**, because it may be *too complex for one LLM call*.
> - You don't want to make each task **too granular**, because then *the LLM call doesn't have enough context* and results are *not consistent across nodes*.
>
> You usually need multiple *iterations* to find the *sweet spot*. If the task has too many *edge cases*, consider using [Agents](./agent.md).
{: .best-practice }
### Example: Article Writing
```python
class GenerateOutline(Node):
def prep(self, shared): return shared["topic"]
def exec(self, topic): return call_llm(f"Create a detailed outline for an article about {topic}")
def post(self, shared, prep_res, exec_res): shared["outline"] = exec_res
class WriteSection(Node):
def prep(self, shared): return shared["outline"]
def exec(self, outline): return call_llm(f"Write content based on this outline: {outline}")
def post(self, shared, prep_res, exec_res): shared["draft"] = exec_res
class ReviewAndRefine(Node):
def prep(self, shared): return shared["draft"]
def exec(self, draft): return call_llm(f"Review and improve this draft: {draft}")
def post(self, shared, prep_res, exec_res): shared["final_article"] = exec_res
# Connect nodes
outline = GenerateOutline()
write = WriteSection()
review = ReviewAndRefine()
outline >> write >> review
# Create and run flow
writing_flow = Flow(start=outline)
shared = {"topic": "AI Safety"}
writing_flow.run(shared)
```
For *dynamic cases*, consider using [Agents](./agent.md).
================================================
File: docs/utility_function/llm.md
================================================
---
layout: default
title: "LLM Wrapper"
parent: "Utility Function"
nav_order: 1
---
# LLM Wrappers
Check out libraries like [litellm](https://github.com/BerriAI/litellm).
Here, we provide some minimal example implementations:
1. OpenAI
```python
def call_llm(prompt):
from openai import OpenAI
client = OpenAI(api_key="YOUR_API_KEY_HERE")
r = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": prompt}]
)
return r.choices[0].message.content
# Example usage
call_llm("How are you?")
```
> Store the API key in an environment variable like OPENAI_API_KEY for security.
{: .best-practice }
2. Claude (Anthropic)
```python
def call_llm(prompt):
from anthropic import Anthropic
client = Anthropic(api_key="YOUR_API_KEY_HERE")
response = client.messages.create(
model="claude-2",
messages=[{"role": "user", "content": prompt}],
max_tokens=100
)
return response.content
```
3. Google (Generative AI Studio / PaLM API)
```python
def call_llm(prompt):
import google.generativeai as genai
genai.configure(api_key="YOUR_API_KEY_HERE")
response = genai.generate_text(
model="models/text-bison-001",
prompt=prompt
)
return response.result
```
4. Azure (Azure OpenAI)
```python
def call_llm(prompt):
from openai import AzureOpenAI
client = AzureOpenAI(
azure_endpoint="https://.openai.azure.com/",
api_key="YOUR_API_KEY_HERE",
api_version="2023-05-15"
)
r = client.chat.completions.create(
model="",
messages=[{"role": "user", "content": prompt}]
)
return r.choices[0].message.content
```
5. Ollama (Local LLM)
```python
def call_llm(prompt):
from ollama import chat
response = chat(
model="llama2",
messages=[{"role": "user", "content": prompt}]
)
return response.message.content
```
## Improvements
Feel free to enhance your `call_llm` function as needed. Here are examples:
- Handle chat history:
```python
def call_llm(messages):
from openai import OpenAI
client = OpenAI(api_key="YOUR_API_KEY_HERE")
r = client.chat.completions.create(
model="gpt-4o",
messages=messages
)
return r.choices[0].message.content
```
- Add in-memory caching
```python
from functools import lru_cache
@lru_cache(maxsize=1000)
def call_llm(prompt):
# Your implementation here
pass
```
> ⚠️ Caching conflicts with Node retries, as retries yield the same result.
>
> To address this, you could use cached results only if not retried.
{: .warning }
```python
from functools import lru_cache
@lru_cache(maxsize=1000)
def cached_call(prompt):
pass
def call_llm(prompt, use_cache):
if use_cache:
return cached_call(prompt)
# Call the underlying function directly
return cached_call.__wrapped__(prompt)
class SummarizeNode(Node):
def exec(self, text):
return call_llm(f"Summarize: {text}", self.cur_retry==0)
```
- Enable logging:
```python
def call_llm(prompt):
import logging
logging.info(f"Prompt: {prompt}")
response = ... # Your implementation here
logging.info(f"Response: {response}")
return response
```
================================================
FILE: .dockerignore
================================================
# Byte-compiled / cache files
__pycache__/
*.py[cod]
*.pyo
*.pyd
# Virtual environments
venv/
env/
.venv/
.env/
# Distribution / packaging
*.egg-info/
build/
dist/
# Git and other VCS
.git/
.gitignore
# Editor files
*.swp
*.swo
*.bak
*.tmp
.DS_Store
.idea/
.vscode/
# Secrets (if you’re using .env for API keys etc.)
.env
================================================
FILE: .gitignore
================================================
# Dependencies
node_modules/
vendor/
.pnp/
.pnp.js
# Build outputs
dist/
build/
out/
*.pyc
__pycache__/
# Environment files
.env
.env.local
.env.*.local
.env.development
.env.test
.env.production
# Python virtual environments
.venv/
venv/
# IDE - VSCode
.vscode/*
!.vscode/settings.json
!.vscode/tasks.json
!.vscode/launch.json
!.vscode/extensions.json
# IDE - JetBrains
.idea/
*.iml
*.iws
*.ipr
# IDE - Eclipse
.project
.classpath
.settings/
# Logs
logs/
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*
# Operating System
.DS_Store
Thumbs.db
*.swp
*.swo
# Testing
coverage/
.nyc_output/
# Temporary files
*.tmp
*.temp
.cache/
# Compiled files
*.com
*.class
*.dll
*.exe
*.o
*.so
# Package files
*.7z
*.dmg
*.gz
*.iso
*.jar
*.rar
*.tar
*.zip
# Database
*.sqlite
*.sqlite3
*.db
# Optional npm cache directory
.npm
# Optional eslint cache
.eslintcache
# Optional REPL history
.node_repl_history
# LLM cache
llm_cache.json
# Output files
output/
# uv manage
pyproject.toml
uv.lock
docs/*.pdf
docs/design-cn.md
================================================
FILE: .windsurfrules
================================================
---
layout: default
title: "Agentic Coding"
---
# Agentic Coding: Humans Design, Agents code!
> If you are an AI agents involved in building LLM Systems, read this guide **VERY, VERY** carefully! This is the most important chapter in the entire document. Throughout development, you should always (1) start with a small and simple solution, (2) design at a high level (`docs/design.md`) before implementation, and (3) frequently ask humans for feedback and clarification.
{: .warning }
## Agentic Coding Steps
Agentic Coding should be a collaboration between Human System Design and Agent Implementation:
| Steps | Human | AI | Comment |
|:-----------------------|:----------:|:---------:|:------------------------------------------------------------------------|
| 1. Requirements | ★★★ High | ★☆☆ Low | Humans understand the requirements and context. |
| 2. Flow | ★★☆ Medium | ★★☆ Medium | Humans specify the high-level design, and the AI fills in the details. |
| 3. Utilities | ★★☆ Medium | ★★☆ Medium | Humans provide available external APIs and integrations, and the AI helps with implementation. |
| 4. Node | ★☆☆ Low | ★★★ High | The AI helps design the node types and data handling based on the flow. |
| 5. Implementation | ★☆☆ Low | ★★★ High | The AI implements the flow based on the design. |
| 6. Optimization | ★★☆ Medium | ★★☆ Medium | Humans evaluate the results, and the AI helps optimize. |
| 7. Reliability | ★☆☆ Low | ★★★ High | The AI writes test cases and addresses corner cases. |
1. **Requirements**: Clarify the requirements for your project, and evaluate whether an AI system is a good fit.
- Understand AI systems' strengths and limitations:
- **Good for**: Routine tasks requiring common sense (filling forms, replying to emails)
- **Good for**: Creative tasks with well-defined inputs (building slides, writing SQL)
- **Not good for**: Ambiguous problems requiring complex decision-making (business strategy, startup planning)
- **Keep It User-Centric:** Explain the "problem" from the user's perspective rather than just listing features.
- **Balance complexity vs. impact**: Aim to deliver the highest value features with minimal complexity early.
2. **Flow Design**: Outline at a high level, describe how your AI system orchestrates nodes.
- Identify applicable design patterns (e.g., [Map Reduce](./design_pattern/mapreduce.md), [Agent](./design_pattern/agent.md), [RAG](./design_pattern/rag.md)).
- For each node in the flow, start with a high-level one-line description of what it does.
- If using **Map Reduce**, specify how to map (what to split) and how to reduce (how to combine).
- If using **Agent**, specify what are the inputs (context) and what are the possible actions.
- If using **RAG**, specify what to embed, noting that there's usually both offline (indexing) and online (retrieval) workflows.
- Outline the flow and draw it in a mermaid diagram. For example:
```mermaid
flowchart LR
start[Start] --> batch[Batch]
batch --> check[Check]
check -->|OK| process
check -->|Error| fix[Fix]
fix --> check
subgraph process[Process]
step1[Step 1] --> step2[Step 2]
end
process --> endNode[End]
```
- > **If Humans can't specify the flow, AI Agents can't automate it!** Before building an LLM system, thoroughly understand the problem and potential solution by manually solving example inputs to develop intuition.
{: .best-practice }
3. **Utilities**: Based on the Flow Design, identify and implement necessary utility functions.
- Think of your AI system as the brain. It needs a body—these *external utility functions*—to interact with the real world:
- Reading inputs (e.g., retrieving Slack messages, reading emails)
- Writing outputs (e.g., generating reports, sending emails)
- Using external tools (e.g., calling LLMs, searching the web)
- **NOTE**: *LLM-based tasks* (e.g., summarizing text, analyzing sentiment) are **NOT** utility functions; rather, they are *core functions* internal in the AI system.
- For each utility function, implement it and write a simple test.
- Document their input/output, as well as why they are necessary. For example:
- `name`: `get_embedding` (`utils/get_embedding.py`)
- `input`: `str`
- `output`: a vector of 3072 floats
- `necessity`: Used by the second node to embed text
- Example utility implementation:
```python
# utils/call_llm.py
from openai import OpenAI
def call_llm(prompt):
client = OpenAI(api_key="YOUR_API_KEY_HERE")
r = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": prompt}]
)
return r.choices[0].message.content
if __name__ == "__main__":
prompt = "What is the meaning of life?"
print(call_llm(prompt))
```
- > **Sometimes, design Utilies before Flow:** For example, for an LLM project to automate a legacy system, the bottleneck will likely be the available interface to that system. Start by designing the hardest utilities for interfacing, and then build the flow around them.
{: .best-practice }
4. **Node Design**: Plan how each node will read and write data, and use utility functions.
- One core design principle for PocketFlow is to use a [shared store](./core_abstraction/communication.md), so start with a shared store design:
- For simple systems, use an in-memory dictionary.
- For more complex systems or when persistence is required, use a database.
- **Don't Repeat Yourself**: Use in-memory references or foreign keys.
- Example shared store design:
```python
shared = {
"user": {
"id": "user123",
"context": { # Another nested dict
"weather": {"temp": 72, "condition": "sunny"},
"location": "San Francisco"
}
},
"results": {} # Empty dict to store outputs
}
```
- For each [Node](./core_abstraction/node.md), describe its type, how it reads and writes data, and which utility function it uses. Keep it specific but high-level without codes. For example:
- `type`: Regular (or Batch, or Async)
- `prep`: Read "text" from the shared store
- `exec`: Call the embedding utility function
- `post`: Write "embedding" to the shared store
5. **Implementation**: Implement the initial nodes and flows based on the design.
- 🎉 If you've reached this step, humans have finished the design. Now *Agentic Coding* begins!
- **"Keep it simple, stupid!"** Avoid complex features and full-scale type checking.
- **FAIL FAST**! Avoid `try` logic so you can quickly identify any weak points in the system.
- Add logging throughout the code to facilitate debugging.
7. **Optimization**:
- **Use Intuition**: For a quick initial evaluation, human intuition is often a good start.
- **Redesign Flow (Back to Step 3)**: Consider breaking down tasks further, introducing agentic decisions, or better managing input contexts.
- If your flow design is already solid, move on to micro-optimizations:
- **Prompt Engineering**: Use clear, specific instructions with examples to reduce ambiguity.
- **In-Context Learning**: Provide robust examples for tasks that are difficult to specify with instructions alone.
- > **You'll likely iterate a lot!** Expect to repeat Steps 3–6 hundreds of times.
>
>
{: .best-practice }
8. **Reliability**
- **Node Retries**: Add checks in the node `exec` to ensure outputs meet requirements, and consider increasing `max_retries` and `wait` times.
- **Logging and Visualization**: Maintain logs of all attempts and visualize node results for easier debugging.
- **Self-Evaluation**: Add a separate node (powered by an LLM) to review outputs when results are uncertain.
## Example LLM Project File Structure
```
my_project/
├── main.py
├── nodes.py
├── flow.py
├── utils/
│ ├── __init__.py
│ ├── call_llm.py
│ └── search_web.py
├── requirements.txt
└── docs/
└── design.md
```
- **`docs/design.md`**: Contains project documentation for each step above. This should be *high-level* and *no-code*.
- **`utils/`**: Contains all utility functions.
- It's recommended to dedicate one Python file to each API call, for example `call_llm.py` or `search_web.py`.
- Each file should also include a `main()` function to try that API call
- **`nodes.py`**: Contains all the node definitions.
```python
# nodes.py
from pocketflow import Node
from utils.call_llm import call_llm
class GetQuestionNode(Node):
def exec(self, _):
# Get question directly from user input
user_question = input("Enter your question: ")
return user_question
def post(self, shared, prep_res, exec_res):
# Store the user's question
shared["question"] = exec_res
return "default" # Go to the next node
class AnswerNode(Node):
def prep(self, shared):
# Read question from shared
return shared["question"]
def exec(self, question):
# Call LLM to get the answer
return call_llm(question)
def post(self, shared, prep_res, exec_res):
# Store the answer in shared
shared["answer"] = exec_res
```
- **`flow.py`**: Implements functions that create flows by importing node definitions and connecting them.
```python
# flow.py
from pocketflow import Flow
from nodes import GetQuestionNode, AnswerNode
def create_qa_flow():
"""Create and return a question-answering flow."""
# Create nodes
get_question_node = GetQuestionNode()
answer_node = AnswerNode()
# Connect nodes in sequence
get_question_node >> answer_node
# Create flow starting with input node
return Flow(start=get_question_node)
```
- **`main.py`**: Serves as the project's entry point.
```python
# main.py
from flow import create_qa_flow
# Example main function
# Please replace this with your own main function
def main():
shared = {
"question": None, # Will be populated by GetQuestionNode from user input
"answer": None # Will be populated by AnswerNode
}
# Create the flow and run it
qa_flow = create_qa_flow()
qa_flow.run(shared)
print(f"Question: {shared['question']}")
print(f"Answer: {shared['answer']}")
if __name__ == "__main__":
main()
```
================================================
File: docs/index.md
================================================
---
layout: default
title: "Home"
nav_order: 1
---
# Pocket Flow
A [100-line](https://github.com/the-pocket/PocketFlow/blob/main/pocketflow/__init__.py) minimalist LLM framework for *Agents, Task Decomposition, RAG, etc*.
- **Lightweight**: Just the core graph abstraction in 100 lines. ZERO dependencies, and vendor lock-in.
- **Expressive**: Everything you love from larger frameworks—([Multi-](./design_pattern/multi_agent.html))[Agents](./design_pattern/agent.html), [Workflow](./design_pattern/workflow.html), [RAG](./design_pattern/rag.html), and more.
- **Agentic-Coding**: Intuitive enough for AI agents to help humans build complex LLM applications.
## Core Abstraction
We model the LLM workflow as a **Graph + Shared Store**:
- [Node](./core_abstraction/node.md) handles simple (LLM) tasks.
- [Flow](./core_abstraction/flow.md) connects nodes through **Actions** (labeled edges).
- [Shared Store](./core_abstraction/communication.md) enables communication between nodes within flows.
- [Batch](./core_abstraction/batch.md) nodes/flows allow for data-intensive tasks.
- [Async](./core_abstraction/async.md) nodes/flows allow waiting for asynchronous tasks.
- [(Advanced) Parallel](./core_abstraction/parallel.md) nodes/flows handle I/O-bound tasks.
## Design Pattern
From there, it’s easy to implement popular design patterns:
- [Agent](./design_pattern/agent.md) autonomously makes decisions.
- [Workflow](./design_pattern/workflow.md) chains multiple tasks into pipelines.
- [RAG](./design_pattern/rag.md) integrates data retrieval with generation.
- [Map Reduce](./design_pattern/mapreduce.md) splits data tasks into Map and Reduce steps.
- [Structured Output](./design_pattern/structure.md) formats outputs consistently.
- [(Advanced) Multi-Agents](./design_pattern/multi_agent.md) coordinate multiple agents.
## Utility Function
We **do not** provide built-in utilities. Instead, we offer *examples*—please *implement your own*:
- [LLM Wrapper](./utility_function/llm.md)
- [Viz and Debug](./utility_function/viz.md)
- [Web Search](./utility_function/websearch.md)
- [Chunking](./utility_function/chunking.md)
- [Embedding](./utility_function/embedding.md)
- [Vector Databases](./utility_function/vector.md)
- [Text-to-Speech](./utility_function/text_to_speech.md)
**Why not built-in?**: I believe it's a *bad practice* for vendor-specific APIs in a general framework:
- *API Volatility*: Frequent changes lead to heavy maintenance for hardcoded APIs.
- *Flexibility*: You may want to switch vendors, use fine-tuned models, or run them locally.
- *Optimizations*: Prompt caching, batching, and streaming are easier without vendor lock-in.
## Ready to build your Apps?
Check out [Agentic Coding Guidance](./guide.md), the fastest way to develop LLM projects with Pocket Flow!
================================================
File: docs/core_abstraction/async.md
================================================
---
layout: default
title: "(Advanced) Async"
parent: "Core Abstraction"
nav_order: 5
---
# (Advanced) Async
**Async** Nodes implement `prep_async()`, `exec_async()`, `exec_fallback_async()`, and/or `post_async()`. This is useful for:
1. **prep_async()**: For *fetching/reading data (files, APIs, DB)* in an I/O-friendly way.
2. **exec_async()**: Typically used for async LLM calls.
3. **post_async()**: For *awaiting user feedback*, *coordinating across multi-agents* or any additional async steps after `exec_async()`.
**Note**: `AsyncNode` must be wrapped in `AsyncFlow`. `AsyncFlow` can also include regular (sync) nodes.
### Example
```python
class SummarizeThenVerify(AsyncNode):
async def prep_async(self, shared):
# Example: read a file asynchronously
doc_text = await read_file_async(shared["doc_path"])
return doc_text
async def exec_async(self, prep_res):
# Example: async LLM call
summary = await call_llm_async(f"Summarize: {prep_res}")
return summary
async def post_async(self, shared, prep_res, exec_res):
# Example: wait for user feedback
decision = await gather_user_feedback(exec_res)
if decision == "approve":
shared["summary"] = exec_res
return "approve"
return "deny"
summarize_node = SummarizeThenVerify()
final_node = Finalize()
# Define transitions
summarize_node - "approve" >> final_node
summarize_node - "deny" >> summarize_node # retry
flow = AsyncFlow(start=summarize_node)
async def main():
shared = {"doc_path": "document.txt"}
await flow.run_async(shared)
print("Final Summary:", shared.get("summary"))
asyncio.run(main())
```
================================================
File: docs/core_abstraction/batch.md
================================================
---
layout: default
title: "Batch"
parent: "Core Abstraction"
nav_order: 4
---
# Batch
**Batch** makes it easier to handle large inputs in one Node or **rerun** a Flow multiple times. Example use cases:
- **Chunk-based** processing (e.g., splitting large texts).
- **Iterative** processing over lists of input items (e.g., user queries, files, URLs).
## 1. BatchNode
A **BatchNode** extends `Node` but changes `prep()` and `exec()`:
- **`prep(shared)`**: returns an **iterable** (e.g., list, generator).
- **`exec(item)`**: called **once** per item in that iterable.
- **`post(shared, prep_res, exec_res_list)`**: after all items are processed, receives a **list** of results (`exec_res_list`) and returns an **Action**.
### Example: Summarize a Large File
```python
class MapSummaries(BatchNode):
def prep(self, shared):
# Suppose we have a big file; chunk it
content = shared["data"]
chunk_size = 10000
chunks = [content[i:i+chunk_size] for i in range(0, len(content), chunk_size)]
return chunks
def exec(self, chunk):
prompt = f"Summarize this chunk in 10 words: {chunk}"
summary = call_llm(prompt)
return summary
def post(self, shared, prep_res, exec_res_list):
combined = "\n".join(exec_res_list)
shared["summary"] = combined
return "default"
map_summaries = MapSummaries()
flow = Flow(start=map_summaries)
flow.run(shared)
```
---
## 2. BatchFlow
A **BatchFlow** runs a **Flow** multiple times, each time with different `params`. Think of it as a loop that replays the Flow for each parameter set.
### Example: Summarize Many Files
```python
class SummarizeAllFiles(BatchFlow):
def prep(self, shared):
# Return a list of param dicts (one per file)
filenames = list(shared["data"].keys()) # e.g., ["file1.txt", "file2.txt", ...]
return [{"filename": fn} for fn in filenames]
# Suppose we have a per-file Flow (e.g., load_file >> summarize >> reduce):
summarize_file = SummarizeFile(start=load_file)
# Wrap that flow into a BatchFlow:
summarize_all_files = SummarizeAllFiles(start=summarize_file)
summarize_all_files.run(shared)
```
### Under the Hood
1. `prep(shared)` returns a list of param dicts—e.g., `[{filename: "file1.txt"}, {filename: "file2.txt"}, ...]`.
2. The **BatchFlow** loops through each dict. For each one:
- It merges the dict with the BatchFlow’s own `params`.
- It calls `flow.run(shared)` using the merged result.
3. This means the sub-Flow is run **repeatedly**, once for every param dict.
---
## 3. Nested or Multi-Level Batches
You can nest a **BatchFlow** in another **BatchFlow**. For instance:
- **Outer** batch: returns a list of diretory param dicts (e.g., `{"directory": "/pathA"}`, `{"directory": "/pathB"}`, ...).
- **Inner** batch: returning a list of per-file param dicts.
At each level, **BatchFlow** merges its own param dict with the parent’s. By the time you reach the **innermost** node, the final `params` is the merged result of **all** parents in the chain. This way, a nested structure can keep track of the entire context (e.g., directory + file name) at once.
```python
class FileBatchFlow(BatchFlow):
def prep(self, shared):
directory = self.params["directory"]
# e.g., files = ["file1.txt", "file2.txt", ...]
files = [f for f in os.listdir(directory) if f.endswith(".txt")]
return [{"filename": f} for f in files]
class DirectoryBatchFlow(BatchFlow):
def prep(self, shared):
directories = [ "/path/to/dirA", "/path/to/dirB"]
return [{"directory": d} for d in directories]
# MapSummaries have params like {"directory": "/path/to/dirA", "filename": "file1.txt"}
inner_flow = FileBatchFlow(start=MapSummaries())
outer_flow = DirectoryBatchFlow(start=inner_flow)
```
================================================
File: docs/core_abstraction/communication.md
================================================
---
layout: default
title: "Communication"
parent: "Core Abstraction"
nav_order: 3
---
# Communication
Nodes and Flows **communicate** in 2 ways:
1. **Shared Store (for almost all the cases)**
- A global data structure (often an in-mem dict) that all nodes can read ( `prep()`) and write (`post()`).
- Great for data results, large content, or anything multiple nodes need.
- You shall design the data structure and populate it ahead.
- > **Separation of Concerns:** Use `Shared Store` for almost all cases to separate *Data Schema* from *Compute Logic*! This approach is both flexible and easy to manage, resulting in more maintainable code. `Params` is more a syntax sugar for [Batch](./batch.md).
{: .best-practice }
2. **Params (only for [Batch](./batch.md))**
- Each node has a local, ephemeral `params` dict passed in by the **parent Flow**, used as an identifier for tasks. Parameter keys and values shall be **immutable**.
- Good for identifiers like filenames or numeric IDs, in Batch mode.
If you know memory management, think of the **Shared Store** like a **heap** (shared by all function calls), and **Params** like a **stack** (assigned by the caller).
---
## 1. Shared Store
### Overview
A shared store is typically an in-mem dictionary, like:
```python
shared = {"data": {}, "summary": {}, "config": {...}, ...}
```
It can also contain local file handlers, DB connections, or a combination for persistence. We recommend deciding the data structure or DB schema first based on your app requirements.
### Example
```python
class LoadData(Node):
def post(self, shared, prep_res, exec_res):
# We write data to shared store
shared["data"] = "Some text content"
return None
class Summarize(Node):
def prep(self, shared):
# We read data from shared store
return shared["data"]
def exec(self, prep_res):
# Call LLM to summarize
prompt = f"Summarize: {prep_res}"
summary = call_llm(prompt)
return summary
def post(self, shared, prep_res, exec_res):
# We write summary to shared store
shared["summary"] = exec_res
return "default"
load_data = LoadData()
summarize = Summarize()
load_data >> summarize
flow = Flow(start=load_data)
shared = {}
flow.run(shared)
```
Here:
- `LoadData` writes to `shared["data"]`.
- `Summarize` reads from `shared["data"]`, summarizes, and writes to `shared["summary"]`.
---
## 2. Params
**Params** let you store *per-Node* or *per-Flow* config that doesn't need to live in the shared store. They are:
- **Immutable** during a Node's run cycle (i.e., they don't change mid-`prep->exec->post`).
- **Set** via `set_params()`.
- **Cleared** and updated each time a parent Flow calls it.
> Only set the uppermost Flow params because others will be overwritten by the parent Flow.
>
> If you need to set child node params, see [Batch](./batch.md).
{: .warning }
Typically, **Params** are identifiers (e.g., file name, page number). Use them to fetch the task you assigned or write to a specific part of the shared store.
### Example
```python
# 1) Create a Node that uses params
class SummarizeFile(Node):
def prep(self, shared):
# Access the node's param
filename = self.params["filename"]
return shared["data"].get(filename, "")
def exec(self, prep_res):
prompt = f"Summarize: {prep_res}"
return call_llm(prompt)
def post(self, shared, prep_res, exec_res):
filename = self.params["filename"]
shared["summary"][filename] = exec_res
return "default"
# 2) Set params
node = SummarizeFile()
# 3) Set Node params directly (for testing)
node.set_params({"filename": "doc1.txt"})
node.run(shared)
# 4) Create Flow
flow = Flow(start=node)
# 5) Set Flow params (overwrites node params)
flow.set_params({"filename": "doc2.txt"})
flow.run(shared) # The node summarizes doc2, not doc1
```
================================================
File: docs/core_abstraction/flow.md
================================================
---
layout: default
title: "Flow"
parent: "Core Abstraction"
nav_order: 2
---
# Flow
A **Flow** orchestrates a graph of Nodes. You can chain Nodes in a sequence or create branching depending on the **Actions** returned from each Node's `post()`.
## 1. Action-based Transitions
Each Node's `post()` returns an **Action** string. By default, if `post()` doesn't return anything, we treat that as `"default"`.
You define transitions with the syntax:
1. **Basic default transition**: `node_a >> node_b`
This means if `node_a.post()` returns `"default"`, go to `node_b`.
(Equivalent to `node_a - "default" >> node_b`)
2. **Named action transition**: `node_a - "action_name" >> node_b`
This means if `node_a.post()` returns `"action_name"`, go to `node_b`.
It's possible to create loops, branching, or multi-step flows.
## 2. Creating a Flow
A **Flow** begins with a **start** node. You call `Flow(start=some_node)` to specify the entry point. When you call `flow.run(shared)`, it executes the start node, looks at its returned Action from `post()`, follows the transition, and continues until there's no next node.
### Example: Simple Sequence
Here's a minimal flow of two nodes in a chain:
```python
node_a >> node_b
flow = Flow(start=node_a)
flow.run(shared)
```
- When you run the flow, it executes `node_a`.
- Suppose `node_a.post()` returns `"default"`.
- The flow then sees `"default"` Action is linked to `node_b` and runs `node_b`.
- `node_b.post()` returns `"default"` but we didn't define `node_b >> something_else`. So the flow ends there.
### Example: Branching & Looping
Here's a simple expense approval flow that demonstrates branching and looping. The `ReviewExpense` node can return three possible Actions:
- `"approved"`: expense is approved, move to payment processing
- `"needs_revision"`: expense needs changes, send back for revision
- `"rejected"`: expense is denied, finish the process
We can wire them like this:
```python
# Define the flow connections
review - "approved" >> payment # If approved, process payment
review - "needs_revision" >> revise # If needs changes, go to revision
review - "rejected" >> finish # If rejected, finish the process
revise >> review # After revision, go back for another review
payment >> finish # After payment, finish the process
flow = Flow(start=review)
```
Let's see how it flows:
1. If `review.post()` returns `"approved"`, the expense moves to the `payment` node
2. If `review.post()` returns `"needs_revision"`, it goes to the `revise` node, which then loops back to `review`
3. If `review.post()` returns `"rejected"`, it moves to the `finish` node and stops
```mermaid
flowchart TD
review[Review Expense] -->|approved| payment[Process Payment]
review -->|needs_revision| revise[Revise Report]
review -->|rejected| finish[Finish Process]
revise --> review
payment --> finish
```
### Running Individual Nodes vs. Running a Flow
- `node.run(shared)`: Just runs that node alone (calls `prep->exec->post()`), returns an Action.
- `flow.run(shared)`: Executes from the start node, follows Actions to the next node, and so on until the flow can't continue.
> `node.run(shared)` **does not** proceed to the successor.
> This is mainly for debugging or testing a single node.
>
> Always use `flow.run(...)` in production to ensure the full pipeline runs correctly.
{: .warning }
## 3. Nested Flows
A **Flow** can act like a Node, which enables powerful composition patterns. This means you can:
1. Use a Flow as a Node within another Flow's transitions.
2. Combine multiple smaller Flows into a larger Flow for reuse.
3. Node `params` will be a merging of **all** parents' `params`.
### Flow's Node Methods
A **Flow** is also a **Node**, so it will run `prep()` and `post()`. However:
- It **won't** run `exec()`, as its main logic is to orchestrate its nodes.
- `post()` always receives `None` for `exec_res` and should instead get the flow execution results from the shared store.
### Basic Flow Nesting
Here's how to connect a flow to another node:
```python
# Create a sub-flow
node_a >> node_b
subflow = Flow(start=node_a)
# Connect it to another node
subflow >> node_c
# Create the parent flow
parent_flow = Flow(start=subflow)
```
When `parent_flow.run()` executes:
1. It starts `subflow`
2. `subflow` runs through its nodes (`node_a->node_b`)
3. After `subflow` completes, execution continues to `node_c`
### Example: Order Processing Pipeline
Here's a practical example that breaks down order processing into nested flows:
```python
# Payment processing sub-flow
validate_payment >> process_payment >> payment_confirmation
payment_flow = Flow(start=validate_payment)
# Inventory sub-flow
check_stock >> reserve_items >> update_inventory
inventory_flow = Flow(start=check_stock)
# Shipping sub-flow
create_label >> assign_carrier >> schedule_pickup
shipping_flow = Flow(start=create_label)
# Connect the flows into a main order pipeline
payment_flow >> inventory_flow >> shipping_flow
# Create the master flow
order_pipeline = Flow(start=payment_flow)
# Run the entire pipeline
order_pipeline.run(shared_data)
```
This creates a clean separation of concerns while maintaining a clear execution path:
```mermaid
flowchart LR
subgraph order_pipeline[Order Pipeline]
subgraph paymentFlow["Payment Flow"]
A[Validate Payment] --> B[Process Payment] --> C[Payment Confirmation]
end
subgraph inventoryFlow["Inventory Flow"]
D[Check Stock] --> E[Reserve Items] --> F[Update Inventory]
end
subgraph shippingFlow["Shipping Flow"]
G[Create Label] --> H[Assign Carrier] --> I[Schedule Pickup]
end
paymentFlow --> inventoryFlow
inventoryFlow --> shippingFlow
end
```
================================================
File: docs/core_abstraction/node.md
================================================
---
layout: default
title: "Node"
parent: "Core Abstraction"
nav_order: 1
---
# Node
A **Node** is the smallest building block. Each Node has 3 steps `prep->exec->post`:
1. `prep(shared)`
- **Read and preprocess data** from `shared` store.
- Examples: *query DB, read files, or serialize data into a string*.
- Return `prep_res`, which is used by `exec()` and `post()`.
2. `exec(prep_res)`
- **Execute compute logic**, with optional retries and error handling (below).
- Examples: *(mostly) LLM calls, remote APIs, tool use*.
- ⚠️ This shall be only for compute and **NOT** access `shared`.
- ⚠️ If retries enabled, ensure idempotent implementation.
- Return `exec_res`, which is passed to `post()`.
3. `post(shared, prep_res, exec_res)`
- **Postprocess and write data** back to `shared`.
- Examples: *update DB, change states, log results*.
- **Decide the next action** by returning a *string* (`action = "default"` if *None*).
> **Why 3 steps?** To enforce the principle of *separation of concerns*. The data storage and data processing are operated separately.
>
> All steps are *optional*. E.g., you can only implement `prep` and `post` if you just need to process data.
{: .note }
### Fault Tolerance & Retries
You can **retry** `exec()` if it raises an exception via two parameters when define the Node:
- `max_retries` (int): Max times to run `exec()`. The default is `1` (**no** retry).
- `wait` (int): The time to wait (in **seconds**) before next retry. By default, `wait=0` (no waiting).
`wait` is helpful when you encounter rate-limits or quota errors from your LLM provider and need to back off.
```python
my_node = SummarizeFile(max_retries=3, wait=10)
```
When an exception occurs in `exec()`, the Node automatically retries until:
- It either succeeds, or
- The Node has retried `max_retries - 1` times already and fails on the last attempt.
You can get the current retry times (0-based) from `self.cur_retry`.
```python
class RetryNode(Node):
def exec(self, prep_res):
print(f"Retry {self.cur_retry} times")
raise Exception("Failed")
```
### Graceful Fallback
To **gracefully handle** the exception (after all retries) rather than raising it, override:
```python
def exec_fallback(self, prep_res, exc):
raise exc
```
By default, it just re-raises exception. But you can return a fallback result instead, which becomes the `exec_res` passed to `post()`.
### Example: Summarize file
```python
class SummarizeFile(Node):
def prep(self, shared):
return shared["data"]
def exec(self, prep_res):
if not prep_res:
return "Empty file content"
prompt = f"Summarize this text in 10 words: {prep_res}"
summary = call_llm(prompt) # might fail
return summary
def exec_fallback(self, prep_res, exc):
# Provide a simple fallback instead of crashing
return "There was an error processing your request."
def post(self, shared, prep_res, exec_res):
shared["summary"] = exec_res
# Return "default" by not returning
summarize_node = SummarizeFile(max_retries=3)
# node.run() calls prep->exec->post
# If exec() fails, it retries up to 3 times before calling exec_fallback()
action_result = summarize_node.run(shared)
print("Action returned:", action_result) # "default"
print("Summary stored:", shared["summary"])
```
================================================
File: docs/core_abstraction/parallel.md
================================================
---
layout: default
title: "(Advanced) Parallel"
parent: "Core Abstraction"
nav_order: 6
---
# (Advanced) Parallel
**Parallel** Nodes and Flows let you run multiple **Async** Nodes and Flows **concurrently**—for example, summarizing multiple texts at once. This can improve performance by overlapping I/O and compute.
> Because of Python’s GIL, parallel nodes and flows can’t truly parallelize CPU-bound tasks (e.g., heavy numerical computations). However, they excel at overlapping I/O-bound work—like LLM calls, database queries, API requests, or file I/O.
{: .warning }
> - **Ensure Tasks Are Independent**: If each item depends on the output of a previous item, **do not** parallelize.
>
> - **Beware of Rate Limits**: Parallel calls can **quickly** trigger rate limits on LLM services. You may need a **throttling** mechanism (e.g., semaphores or sleep intervals).
>
> - **Consider Single-Node Batch APIs**: Some LLMs offer a **batch inference** API where you can send multiple prompts in a single call. This is more complex to implement but can be more efficient than launching many parallel requests and mitigates rate limits.
{: .best-practice }
## AsyncParallelBatchNode
Like **AsyncBatchNode**, but run `exec_async()` in **parallel**:
```python
class ParallelSummaries(AsyncParallelBatchNode):
async def prep_async(self, shared):
# e.g., multiple texts
return shared["texts"]
async def exec_async(self, text):
prompt = f"Summarize: {text}"
return await call_llm_async(prompt)
async def post_async(self, shared, prep_res, exec_res_list):
shared["summary"] = "\n\n".join(exec_res_list)
return "default"
node = ParallelSummaries()
flow = AsyncFlow(start=node)
```
## AsyncParallelBatchFlow
Parallel version of **BatchFlow**. Each iteration of the sub-flow runs **concurrently** using different parameters:
```python
class SummarizeMultipleFiles(AsyncParallelBatchFlow):
async def prep_async(self, shared):
return [{"filename": f} for f in shared["files"]]
sub_flow = AsyncFlow(start=LoadAndSummarizeFile())
parallel_flow = SummarizeMultipleFiles(start=sub_flow)
await parallel_flow.run_async(shared)
```
================================================
File: docs/design_pattern/agent.md
================================================
---
layout: default
title: "Agent"
parent: "Design Pattern"
nav_order: 1
---
# Agent
Agent is a powerful design pattern in which nodes can take dynamic actions based on the context.
## Implement Agent with Graph
1. **Context and Action:** Implement nodes that supply context and perform actions.
2. **Branching:** Use branching to connect each action node to an agent node. Use action to allow the agent to direct the [flow](../core_abstraction/flow.md) between nodes—and potentially loop back for multi-step.
3. **Agent Node:** Provide a prompt to decide action—for example:
```python
f"""
### CONTEXT
Task: {task_description}
Previous Actions: {previous_actions}
Current State: {current_state}
### ACTION SPACE
[1] search
Description: Use web search to get results
Parameters:
- query (str): What to search for
[2] answer
Description: Conclude based on the results
Parameters:
- result (str): Final answer to provide
### NEXT ACTION
Decide the next action based on the current context and available action space.
Return your response in the following format:
```yaml
thinking: |
action:
parameters:
:
```"""
```
The core of building **high-performance** and **reliable** agents boils down to:
1. **Context Management:** Provide *relevant, minimal context.* For example, rather than including an entire chat history, retrieve the most relevant via [RAG](./rag.md). Even with larger context windows, LLMs still fall victim to ["lost in the middle"](https://arxiv.org/abs/2307.03172), overlooking mid-prompt content.
2. **Action Space:** Provide *a well-structured and unambiguous* set of actions—avoiding overlap like separate `read_databases` or `read_csvs`. Instead, import CSVs into the database.
## Example Good Action Design
- **Incremental:** Feed content in manageable chunks (500 lines or 1 page) instead of all at once.
- **Overview-zoom-in:** First provide high-level structure (table of contents, summary), then allow drilling into details (raw texts).
- **Parameterized/Programmable:** Instead of fixed actions, enable parameterized (columns to select) or programmable (SQL queries) actions, for example, to read CSV files.
- **Backtracking:** Let the agent undo the last step instead of restarting entirely, preserving progress when encountering errors or dead ends.
## Example: Search Agent
This agent:
1. Decides whether to search or answer
2. If searches, loops back to decide if more search needed
3. Answers when enough context gathered
```python
class DecideAction(Node):
def prep(self, shared):
context = shared.get("context", "No previous search")
query = shared["query"]
return query, context
def exec(self, inputs):
query, context = inputs
prompt = f"""
Given input: {query}
Previous search results: {context}
Should I: 1) Search web for more info 2) Answer with current knowledge
Output in yaml:
```yaml
action: search/answer
reason: why this action
search_term: search phrase if action is search
```"""
resp = call_llm(prompt)
yaml_str = resp.split("```yaml")[1].split("```")[0].strip()
result = yaml.safe_load(yaml_str)
assert isinstance(result, dict)
assert "action" in result
assert "reason" in result
assert result["action"] in ["search", "answer"]
if result["action"] == "search":
assert "search_term" in result
return result
def post(self, shared, prep_res, exec_res):
if exec_res["action"] == "search":
shared["search_term"] = exec_res["search_term"]
return exec_res["action"]
class SearchWeb(Node):
def prep(self, shared):
return shared["search_term"]
def exec(self, search_term):
return search_web(search_term)
def post(self, shared, prep_res, exec_res):
prev_searches = shared.get("context", [])
shared["context"] = prev_searches + [
{"term": shared["search_term"], "result": exec_res}
]
return "decide"
class DirectAnswer(Node):
def prep(self, shared):
return shared["query"], shared.get("context", "")
def exec(self, inputs):
query, context = inputs
return call_llm(f"Context: {context}\nAnswer: {query}")
def post(self, shared, prep_res, exec_res):
print(f"Answer: {exec_res}")
shared["answer"] = exec_res
# Connect nodes
decide = DecideAction()
search = SearchWeb()
answer = DirectAnswer()
decide - "search" >> search
decide - "answer" >> answer
search - "decide" >> decide # Loop back
flow = Flow(start=decide)
flow.run({"query": "Who won the Nobel Prize in Physics 2024?"})
```
================================================
File: docs/design_pattern/mapreduce.md
================================================
---
layout: default
title: "Map Reduce"
parent: "Design Pattern"
nav_order: 4
---
# Map Reduce
MapReduce is a design pattern suitable when you have either:
- Large input data (e.g., multiple files to process), or
- Large output data (e.g., multiple forms to fill)
and there is a logical way to break the task into smaller, ideally independent parts.
You first break down the task using [BatchNode](../core_abstraction/batch.md) in the map phase, followed by aggregation in the reduce phase.
### Example: Document Summarization
```python
class SummarizeAllFiles(BatchNode):
def prep(self, shared):
files_dict = shared["files"] # e.g. 10 files
return list(files_dict.items()) # [("file1.txt", "aaa..."), ("file2.txt", "bbb..."), ...]
def exec(self, one_file):
filename, file_content = one_file
summary_text = call_llm(f"Summarize the following file:\n{file_content}")
return (filename, summary_text)
def post(self, shared, prep_res, exec_res_list):
shared["file_summaries"] = dict(exec_res_list)
class CombineSummaries(Node):
def prep(self, shared):
return shared["file_summaries"]
def exec(self, file_summaries):
# format as: "File1: summary\nFile2: summary...\n"
text_list = []
for fname, summ in file_summaries.items():
text_list.append(f"{fname} summary:\n{summ}\n")
big_text = "\n---\n".join(text_list)
return call_llm(f"Combine these file summaries into one final summary:\n{big_text}")
def post(self, shared, prep_res, final_summary):
shared["all_files_summary"] = final_summary
batch_node = SummarizeAllFiles()
combine_node = CombineSummaries()
batch_node >> combine_node
flow = Flow(start=batch_node)
shared = {
"files": {
"file1.txt": "Alice was beginning to get very tired of sitting by her sister...",
"file2.txt": "Some other interesting text ...",
# ...
}
}
flow.run(shared)
print("Individual Summaries:", shared["file_summaries"])
print("\nFinal Summary:\n", shared["all_files_summary"])
```
================================================
File: docs/design_pattern/rag.md
================================================
---
layout: default
title: "RAG"
parent: "Design Pattern"
nav_order: 3
---
# RAG (Retrieval Augmented Generation)
For certain LLM tasks like answering questions, providing relevant context is essential. One common architecture is a **two-stage** RAG pipeline:
1. **Offline stage**: Preprocess and index documents ("building the index").
2. **Online stage**: Given a question, generate answers by retrieving the most relevant context.
---
## Stage 1: Offline Indexing
We create three Nodes:
1. `ChunkDocs` – [chunks](../utility_function/chunking.md) raw text.
2. `EmbedDocs` – [embeds](../utility_function/embedding.md) each chunk.
3. `StoreIndex` – stores embeddings into a [vector database](../utility_function/vector.md).
```python
class ChunkDocs(BatchNode):
def prep(self, shared):
# A list of file paths in shared["files"]. We process each file.
return shared["files"]
def exec(self, filepath):
# read file content. In real usage, do error handling.
with open(filepath, "r", encoding="utf-8") as f:
text = f.read()
# chunk by 100 chars each
chunks = []
size = 100
for i in range(0, len(text), size):
chunks.append(text[i : i + size])
return chunks
def post(self, shared, prep_res, exec_res_list):
# exec_res_list is a list of chunk-lists, one per file.
# flatten them all into a single list of chunks.
all_chunks = []
for chunk_list in exec_res_list:
all_chunks.extend(chunk_list)
shared["all_chunks"] = all_chunks
class EmbedDocs(BatchNode):
def prep(self, shared):
return shared["all_chunks"]
def exec(self, chunk):
return get_embedding(chunk)
def post(self, shared, prep_res, exec_res_list):
# Store the list of embeddings.
shared["all_embeds"] = exec_res_list
print(f"Total embeddings: {len(exec_res_list)}")
class StoreIndex(Node):
def prep(self, shared):
# We'll read all embeds from shared.
return shared["all_embeds"]
def exec(self, all_embeds):
# Create a vector index (faiss or other DB in real usage).
index = create_index(all_embeds)
return index
def post(self, shared, prep_res, index):
shared["index"] = index
# Wire them in sequence
chunk_node = ChunkDocs()
embed_node = EmbedDocs()
store_node = StoreIndex()
chunk_node >> embed_node >> store_node
OfflineFlow = Flow(start=chunk_node)
```
Usage example:
```python
shared = {
"files": ["doc1.txt", "doc2.txt"], # any text files
}
OfflineFlow.run(shared)
```
---
## Stage 2: Online Query & Answer
We have 3 nodes:
1. `EmbedQuery` – embeds the user’s question.
2. `RetrieveDocs` – retrieves top chunk from the index.
3. `GenerateAnswer` – calls the LLM with the question + chunk to produce the final answer.
```python
class EmbedQuery(Node):
def prep(self, shared):
return shared["question"]
def exec(self, question):
return get_embedding(question)
def post(self, shared, prep_res, q_emb):
shared["q_emb"] = q_emb
class RetrieveDocs(Node):
def prep(self, shared):
# We'll need the query embedding, plus the offline index/chunks
return shared["q_emb"], shared["index"], shared["all_chunks"]
def exec(self, inputs):
q_emb, index, chunks = inputs
I, D = search_index(index, q_emb, top_k=1)
best_id = I[0][0]
relevant_chunk = chunks[best_id]
return relevant_chunk
def post(self, shared, prep_res, relevant_chunk):
shared["retrieved_chunk"] = relevant_chunk
print("Retrieved chunk:", relevant_chunk[:60], "...")
class GenerateAnswer(Node):
def prep(self, shared):
return shared["question"], shared["retrieved_chunk"]
def exec(self, inputs):
question, chunk = inputs
prompt = f"Question: {question}\nContext: {chunk}\nAnswer:"
return call_llm(prompt)
def post(self, shared, prep_res, answer):
shared["answer"] = answer
print("Answer:", answer)
embed_qnode = EmbedQuery()
retrieve_node = RetrieveDocs()
generate_node = GenerateAnswer()
embed_qnode >> retrieve_node >> generate_node
OnlineFlow = Flow(start=embed_qnode)
```
Usage example:
```python
# Suppose we already ran OfflineFlow and have:
# shared["all_chunks"], shared["index"], etc.
shared["question"] = "Why do people like cats?"
OnlineFlow.run(shared)
# final answer in shared["answer"]
```
================================================
File: docs/design_pattern/structure.md
================================================
---
layout: default
title: "Structured Output"
parent: "Design Pattern"
nav_order: 5
---
# Structured Output
In many use cases, you may want the LLM to output a specific structure, such as a list or a dictionary with predefined keys.
There are several approaches to achieve a structured output:
- **Prompting** the LLM to strictly return a defined structure.
- Using LLMs that natively support **schema enforcement**.
- **Post-processing** the LLM's response to extract structured content.
In practice, **Prompting** is simple and reliable for modern LLMs.
### Example Use Cases
- Extracting Key Information
```yaml
product:
name: Widget Pro
price: 199.99
description: |
A high-quality widget designed for professionals.
Recommended for advanced users.
```
- Summarizing Documents into Bullet Points
```yaml
summary:
- This product is easy to use.
- It is cost-effective.
- Suitable for all skill levels.
```
- Generating Configuration Files
```yaml
server:
host: 127.0.0.1
port: 8080
ssl: true
```
## Prompt Engineering
When prompting the LLM to produce **structured** output:
1. **Wrap** the structure in code fences (e.g., `yaml`).
2. **Validate** that all required fields exist (and let `Node` handles retry).
### Example Text Summarization
```python
class SummarizeNode(Node):
def exec(self, prep_res):
# Suppose `prep_res` is the text to summarize.
prompt = f"""
Please summarize the following text as YAML, with exactly 3 bullet points
{prep_res}
Now, output:
```yaml
summary:
- bullet 1
- bullet 2
- bullet 3
```"""
response = call_llm(prompt)
yaml_str = response.split("```yaml")[1].split("```")[0].strip()
import yaml
structured_result = yaml.safe_load(yaml_str)
assert "summary" in structured_result
assert isinstance(structured_result["summary"], list)
return structured_result
```
> Besides using `assert` statements, another popular way to validate schemas is [Pydantic](https://github.com/pydantic/pydantic)
{: .note }
### Why YAML instead of JSON?
Current LLMs struggle with escaping. YAML is easier with strings since they don't always need quotes.
**In JSON**
```json
{
"dialogue": "Alice said: \"Hello Bob.\\nHow are you?\\nI am good.\""
}
```
- Every double quote inside the string must be escaped with `\"`.
- Each newline in the dialogue must be represented as `\n`.
**In YAML**
```yaml
dialogue: |
Alice said: "Hello Bob.
How are you?
I am good."
```
- No need to escape interior quotes—just place the entire text under a block literal (`|`).
- Newlines are naturally preserved without needing `\n`.
================================================
File: docs/design_pattern/workflow.md
================================================
---
layout: default
title: "Workflow"
parent: "Design Pattern"
nav_order: 2
---
# Workflow
Many real-world tasks are too complex for one LLM call. The solution is to **Task Decomposition**: decompose them into a [chain](../core_abstraction/flow.md) of multiple Nodes.
> - You don't want to make each task **too coarse**, because it may be *too complex for one LLM call*.
> - You don't want to make each task **too granular**, because then *the LLM call doesn't have enough context* and results are *not consistent across nodes*.
>
> You usually need multiple *iterations* to find the *sweet spot*. If the task has too many *edge cases*, consider using [Agents](./agent.md).
{: .best-practice }
### Example: Article Writing
```python
class GenerateOutline(Node):
def prep(self, shared): return shared["topic"]
def exec(self, topic): return call_llm(f"Create a detailed outline for an article about {topic}")
def post(self, shared, prep_res, exec_res): shared["outline"] = exec_res
class WriteSection(Node):
def prep(self, shared): return shared["outline"]
def exec(self, outline): return call_llm(f"Write content based on this outline: {outline}")
def post(self, shared, prep_res, exec_res): shared["draft"] = exec_res
class ReviewAndRefine(Node):
def prep(self, shared): return shared["draft"]
def exec(self, draft): return call_llm(f"Review and improve this draft: {draft}")
def post(self, shared, prep_res, exec_res): shared["final_article"] = exec_res
# Connect nodes
outline = GenerateOutline()
write = WriteSection()
review = ReviewAndRefine()
outline >> write >> review
# Create and run flow
writing_flow = Flow(start=outline)
shared = {"topic": "AI Safety"}
writing_flow.run(shared)
```
For *dynamic cases*, consider using [Agents](./agent.md).
================================================
File: docs/utility_function/llm.md
================================================
---
layout: default
title: "LLM Wrapper"
parent: "Utility Function"
nav_order: 1
---
# LLM Wrappers
Check out libraries like [litellm](https://github.com/BerriAI/litellm).
Here, we provide some minimal example implementations:
1. OpenAI
```python
def call_llm(prompt):
from openai import OpenAI
client = OpenAI(api_key="YOUR_API_KEY_HERE")
r = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": prompt}]
)
return r.choices[0].message.content
# Example usage
call_llm("How are you?")
```
> Store the API key in an environment variable like OPENAI_API_KEY for security.
{: .best-practice }
2. Claude (Anthropic)
```python
def call_llm(prompt):
from anthropic import Anthropic
client = Anthropic(api_key="YOUR_API_KEY_HERE")
response = client.messages.create(
model="claude-2",
messages=[{"role": "user", "content": prompt}],
max_tokens=100
)
return response.content
```
3. Google (Generative AI Studio / PaLM API)
```python
def call_llm(prompt):
import google.generativeai as genai
genai.configure(api_key="YOUR_API_KEY_HERE")
response = genai.generate_text(
model="models/text-bison-001",
prompt=prompt
)
return response.result
```
4. Azure (Azure OpenAI)
```python
def call_llm(prompt):
from openai import AzureOpenAI
client = AzureOpenAI(
azure_endpoint="https://.openai.azure.com/",
api_key="YOUR_API_KEY_HERE",
api_version="2023-05-15"
)
r = client.chat.completions.create(
model="",
messages=[{"role": "user", "content": prompt}]
)
return r.choices[0].message.content
```
5. Ollama (Local LLM)
```python
def call_llm(prompt):
from ollama import chat
response = chat(
model="llama2",
messages=[{"role": "user", "content": prompt}]
)
return response.message.content
```
## Improvements
Feel free to enhance your `call_llm` function as needed. Here are examples:
- Handle chat history:
```python
def call_llm(messages):
from openai import OpenAI
client = OpenAI(api_key="YOUR_API_KEY_HERE")
r = client.chat.completions.create(
model="gpt-4o",
messages=messages
)
return r.choices[0].message.content
```
- Add in-memory caching
```python
from functools import lru_cache
@lru_cache(maxsize=1000)
def call_llm(prompt):
# Your implementation here
pass
```
> ⚠️ Caching conflicts with Node retries, as retries yield the same result.
>
> To address this, you could use cached results only if not retried.
{: .warning }
```python
from functools import lru_cache
@lru_cache(maxsize=1000)
def cached_call(prompt):
pass
def call_llm(prompt, use_cache):
if use_cache:
return cached_call(prompt)
# Call the underlying function directly
return cached_call.__wrapped__(prompt)
class SummarizeNode(Node):
def exec(self, text):
return call_llm(f"Summarize: {text}", self.cur_retry==0)
```
- Enable logging:
```python
def call_llm(prompt):
import logging
logging.info(f"Prompt: {prompt}")
response = ... # Your implementation here
logging.info(f"Response: {response}")
return response
```
================================================
FILE: Dockerfile
================================================
FROM python:3.10-slim
# update packages, install git and remove cache
RUN apt-get update && apt-get install -y git && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
ENTRYPOINT ["python", "main.py"]
================================================
FILE: LICENSE
================================================
MIT License
Copyright (c) 2025 Zachary Huang
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
================================================
FILE: README.md
================================================
Turns Codebase into Easy Tutorial with AI

> *Ever stared at a new codebase written by others feeling completely lost? This tutorial shows you how to build an AI agent that analyzes GitHub repositories and creates beginner-friendly tutorials explaining exactly how the code works.*
This is a tutorial project of [Pocket Flow](https://github.com/The-Pocket/PocketFlow), a 100-line LLM framework. It crawls GitHub repositories and builds a knowledge base from the code. It analyzes entire codebases to identify core abstractions and how they interact, and transforms complex code into beginner-friendly tutorials with clear visualizations.
- Check out the [YouTube Development Tutorial](https://youtu.be/AFY67zOpbSo) for more!
- Check out the [Substack Post Tutorial](https://zacharyhuang.substack.com/p/ai-codebase-knowledge-builder-full) for more!
**🔸 🎉 Reached Hacker News Front Page** (April 2025) with >900 up‑votes: [Discussion »](https://news.ycombinator.com/item?id=43739456)
**🔸 🎊 Online Service Now Live!** (May 2025) Try our new online version at [https://code2tutorial.com/](https://code2tutorial.com/) – just paste a GitHub link, no installation needed!
## ⭐ Example Results for Popular GitHub Repositories!
🤯 All these tutorials are generated **entirely by AI** by crawling the GitHub repo!
- [AutoGen Core](https://the-pocket.github.io/PocketFlow-Tutorial-Codebase-Knowledge/AutoGen%20Core) - Build AI teams that talk, think, and solve problems together like coworkers!
- [Browser Use](https://the-pocket.github.io/PocketFlow-Tutorial-Codebase-Knowledge/Browser%20Use) - Let AI surf the web for you, clicking buttons and filling forms like a digital assistant!
- [Celery](https://the-pocket.github.io/PocketFlow-Tutorial-Codebase-Knowledge/Celery) - Supercharge your app with background tasks that run while you sleep!
- [Click](https://the-pocket.github.io/PocketFlow-Tutorial-Codebase-Knowledge/Click) - Turn Python functions into slick command-line tools with just a decorator!
- [Codex](https://the-pocket.github.io/PocketFlow-Tutorial-Codebase-Knowledge/Codex) - Turn plain English into working code with this AI terminal wizard!
- [Crawl4AI](https://the-pocket.github.io/PocketFlow-Tutorial-Codebase-Knowledge/Crawl4AI) - Train your AI to extract exactly what matters from any website!
- [CrewAI](https://the-pocket.github.io/PocketFlow-Tutorial-Codebase-Knowledge/CrewAI) - Assemble a dream team of AI specialists to tackle impossible problems!
- [DSPy](https://the-pocket.github.io/PocketFlow-Tutorial-Codebase-Knowledge/DSPy) - Build LLM apps like Lego blocks that optimize themselves!
- [FastAPI](https://the-pocket.github.io/PocketFlow-Tutorial-Codebase-Knowledge/FastAPI) - Create APIs at lightning speed with automatic docs that clients will love!
- [Flask](https://the-pocket.github.io/PocketFlow-Tutorial-Codebase-Knowledge/Flask) - Craft web apps with minimal code that scales from prototype to production!
- [Google A2A](https://the-pocket.github.io/PocketFlow-Tutorial-Codebase-Knowledge/Google%20A2A) - The universal language that lets AI agents collaborate across borders!
- [LangGraph](https://the-pocket.github.io/PocketFlow-Tutorial-Codebase-Knowledge/LangGraph) - Design AI agents as flowcharts where each step remembers what happened before!
- [LevelDB](https://the-pocket.github.io/PocketFlow-Tutorial-Codebase-Knowledge/LevelDB) - Store data at warp speed with Google's engine that powers blockchains!
- [MCP Python SDK](https://the-pocket.github.io/PocketFlow-Tutorial-Codebase-Knowledge/MCP%20Python%20SDK) - Build powerful apps that communicate through an elegant protocol without sweating the details!
- [NumPy Core](https://the-pocket.github.io/PocketFlow-Tutorial-Codebase-Knowledge/NumPy%20Core) - Master the engine behind data science that makes Python as fast as C!
- [OpenManus](https://the-pocket.github.io/PocketFlow-Tutorial-Codebase-Knowledge/OpenManus) - Build AI agents with digital brains that think, learn, and use tools just like humans do!
- [PocketFlow](https://the-pocket.github.io/PocketFlow-Tutorial-Codebase-Knowledge/PocketFlow) - 100-line LLM framework. Let Agents build Agents!
- [Pydantic Core](https://the-pocket.github.io/PocketFlow-Tutorial-Codebase-Knowledge/Pydantic%20Core) - Validate data at rocket speed with just Python type hints!
- [Requests](https://the-pocket.github.io/PocketFlow-Tutorial-Codebase-Knowledge/Requests) - Talk to the internet in Python with code so simple it feels like cheating!
- [SmolaAgents](https://the-pocket.github.io/PocketFlow-Tutorial-Codebase-Knowledge/SmolaAgents) - Build tiny AI agents that punch way above their weight class!
- Showcase Your AI-Generated Tutorials in [Discussions](https://github.com/The-Pocket/PocketFlow-Tutorial-Codebase-Knowledge/discussions)!
## 🚀 Getting Started
1. Clone this repository
```bash
git clone https://github.com/The-Pocket/PocketFlow-Tutorial-Codebase-Knowledge
```
3. Install dependencies:
```bash
pip install -r requirements.txt
```
4. Set up LLM in [`utils/call_llm.py`](./utils/call_llm.py) by providing credentials. To do so, you can put the values in a `.env` file. By default, you can use the AI Studio key with this client for Gemini Pro 2.5 by setting the `GEMINI_API_KEY` environment variable. If you want to use another LLM, you can set the `LLM_PROVIDER` environment variable (e.g. `XAI`), and then set the model, url, and API key (e.g. `XAI_MODEL`, `XAI_URL`,`XAI_API_KEY`). If using Ollama, the url is `http://localhost:11434/` and the API key can be omitted.
You can use your own models. We highly recommend the latest models with thinking capabilities (Claude 3.7 with thinking, O1). You can verify that it is correctly set up by running:
```bash
python utils/call_llm.py
```
5. Generate a complete codebase tutorial by running the main script:
```bash
# Analyze a GitHub repository
python main.py --repo https://github.com/username/repo --include "*.py" "*.js" --exclude "tests/*" --max-size 50000
# Or, analyze a local directory
python main.py --dir /path/to/your/codebase --include "*.py" --exclude "*test*"
# Or, generate a tutorial in Chinese
python main.py --repo https://github.com/username/repo --language "Chinese"
```
- `--repo` or `--dir` - Specify either a GitHub repo URL or a local directory path (required, mutually exclusive)
- `-n, --name` - Project name (optional, derived from URL/directory if omitted)
- `-t, --token` - GitHub token (or set GITHUB_TOKEN environment variable)
- `-o, --output` - Output directory (default: ./output)
- `-i, --include` - Files to include (e.g., "`*.py`" "`*.js`")
- `-e, --exclude` - Files to exclude (e.g., "`tests/*`" "`docs/*`")
- `-s, --max-size` - Maximum file size in bytes (default: 100KB)
- `--language` - Language for the generated tutorial (default: "english")
- `--max-abstractions` - Maximum number of abstractions to identify (default: 10)
- `--no-cache` - Disable LLM response caching (default: caching enabled)
The application will crawl the repository, analyze the codebase structure, generate tutorial content in the specified language, and save the output in the specified directory (default: ./output).
🐳 Running with Docker
To run this project in a Docker container, you'll need to pass your API keys as environment variables.
1. Build the Docker image
```bash
docker build -t pocketflow-app .
```
2. Run the container
You'll need to provide your `GEMINI_API_KEY` for the LLM to function. If you're analyzing private GitHub repositories or want to avoid rate limits, also provide your `GITHUB_TOKEN`.
Mount a local directory to `/app/output` inside the container to access the generated tutorials on your host machine.
**Example for analyzing a public GitHub repository:**
```bash
docker run -it --rm \
-e GEMINI_API_KEY="YOUR_GEMINI_API_KEY_HERE" \
-v "$(pwd)/output_tutorials":/app/output \
pocketflow-app --repo https://github.com/username/repo
```
**Example for analyzing a local directory:**
```bash
docker run -it --rm \
-e GEMINI_API_KEY="YOUR_GEMINI_API_KEY_HERE" \
-v "/path/to/your/local_codebase":/app/code_to_analyze \
-v "$(pwd)/output_tutorials":/app/output \
pocketflow-app --dir /app/code_to_analyze
```
## 💡 Development Tutorial
- I built using [**Agentic Coding**](https://zacharyhuang.substack.com/p/agentic-coding-the-most-fun-way-to), the fastest development paradigm, where humans simply [design](docs/design.md) and agents [code](flow.py).
- The secret weapon is [Pocket Flow](https://github.com/The-Pocket/PocketFlow), a 100-line LLM framework that lets Agents (e.g., Cursor AI) build for you
- Check out the Step-by-step YouTube development tutorial:
================================================
FILE: docs/AutoGen Core/01_agent.md
================================================
---
layout: default
title: "Agent"
parent: "AutoGen Core"
nav_order: 1
---
# Chapter 1: Agent - The Workers of AutoGen
Welcome to the AutoGen Core tutorial! We're excited to guide you through building powerful applications with autonomous agents.
## Motivation: Why Do We Need Agents?
Imagine you want to build an automated system to write blog posts. You might need one part of the system to research a topic and another part to write the actual post based on the research. How do you represent these different "workers" and make them talk to each other?
This is where the concept of an **Agent** comes in. In AutoGen Core, an `Agent` is the fundamental building block representing an actor or worker in your system. Think of it like an employee in an office.
## Key Concepts: Understanding Agents
Let's break down what makes an Agent:
1. **It's a Worker:** An Agent is designed to *do* things. This could be running calculations, calling a Large Language Model (LLM) like ChatGPT, using a tool (like a search engine), or managing a piece of data.
2. **It Has an Identity (`AgentId`):** Just like every employee has a name and a job title, every Agent needs a unique identity. This identity, called `AgentId`, has two parts:
* `type`: What kind of role does the agent have? (e.g., "researcher", "writer", "coder"). This helps organize agents.
* `key`: A unique name for this specific agent instance (e.g., "researcher-01", "amy-the-writer").
```python
# From: _agent_id.py
class AgentId:
def __init__(self, type: str, key: str) -> None:
# ... (validation checks omitted for brevity)
self._type = type
self._key = key
@property
def type(self) -> str:
return self._type
@property
def key(self) -> str:
return self._key
def __str__(self) -> str:
# Creates an id like "researcher/amy-the-writer"
return f"{self._type}/{self._key}"
```
This `AgentId` acts like the agent's address, allowing other agents (or the system) to send messages specifically to it.
3. **It Has Metadata (`AgentMetadata`):** Besides its core identity, an agent often has descriptive information.
* `type`: Same as in `AgentId`.
* `key`: Same as in `AgentId`.
* `description`: A human-readable explanation of what the agent does (e.g., "Researches topics using web search").
```python
# From: _agent_metadata.py
from typing import TypedDict
class AgentMetadata(TypedDict):
type: str
key: str
description: str
```
This metadata helps understand the agent's purpose within the system.
4. **It Communicates via Messages:** Agents don't work in isolation. They collaborate by sending and receiving messages. The primary way an agent receives work is through its `on_message` method. Think of this like the agent's inbox.
```python
# From: _agent.py (Simplified Agent Protocol)
from typing import Any, Mapping, Protocol
# ... other imports
class Agent(Protocol):
@property
def id(self) -> AgentId: ... # The agent's unique ID
async def on_message(self, message: Any, ctx: MessageContext) -> Any:
"""Handles an incoming message."""
# Agent's logic to process the message goes here
...
```
When an agent receives a message, `on_message` is called. The `message` contains the data or task, and `ctx` (MessageContext) provides extra information about the message (like who sent it). We'll cover `MessageContext` more later.
5. **It Can Remember Things (State):** Sometimes, an agent needs to remember information between tasks, like keeping notes on research progress. Agents can optionally implement `save_state` and `load_state` methods to store and retrieve their internal memory.
```python
# From: _agent.py (Simplified Agent Protocol)
class Agent(Protocol):
# ... other methods
async def save_state(self) -> Mapping[str, Any]:
"""Save the agent's internal memory."""
# Return a dictionary representing the state
...
async def load_state(self, state: Mapping[str, Any]) -> None:
"""Load the agent's internal memory."""
# Restore state from the dictionary
...
```
We'll explore state and memory in more detail in [Chapter 7: Memory](07_memory.md).
6. **Different Agent Types:** AutoGen Core provides base classes to make creating agents easier:
* `BaseAgent`: The fundamental class most agents inherit from. It provides common setup.
* `ClosureAgent`: A very quick way to create simple agents using just a function (like hiring a temp worker for a specific task defined on the spot).
* `RoutedAgent`: An agent that can automatically direct different types of messages to different internal handler methods (like a smart receptionist).
## Use Case Example: Researcher and Writer
Let's revisit our blog post example. We want a `Researcher` agent and a `Writer` agent.
**Goal:**
1. Tell the `Researcher` a topic (e.g., "AutoGen Agents").
2. The `Researcher` finds some facts (we'll keep it simple and just make them up for now).
3. The `Researcher` sends these facts to the `Writer`.
4. The `Writer` receives the facts and drafts a short post.
**Simplified Implementation Idea (using `ClosureAgent` for brevity):**
First, let's define the messages they might exchange:
```python
from dataclasses import dataclass
@dataclass
class ResearchTopic:
topic: str
@dataclass
class ResearchFacts:
topic: str
facts: list[str]
@dataclass
class DraftPost:
topic: str
draft: str
```
These are simple Python classes to hold the data being passed around.
Now, let's imagine defining the `Researcher` using a `ClosureAgent`. This agent will listen for `ResearchTopic` messages.
```python
# Simplified concept - requires AgentRuntime (Chapter 3) to actually run
async def researcher_logic(agent_context, message: ResearchTopic, msg_context):
print(f"Researcher received topic: {message.topic}")
# In a real scenario, this would involve searching, calling an LLM, etc.
# For now, we just make up facts.
facts = [f"Fact 1 about {message.topic}", f"Fact 2 about {message.topic}"]
print(f"Researcher found facts: {facts}")
# Find the Writer agent's ID (we assume we know it)
writer_id = AgentId(type="writer", key="blog_writer_1")
# Send the facts to the Writer
await agent_context.send_message(
message=ResearchFacts(topic=message.topic, facts=facts),
recipient=writer_id,
)
print("Researcher sent facts to Writer.")
# This agent doesn't return a direct reply
return None
```
This `researcher_logic` function defines *what* the researcher does when it gets a `ResearchTopic` message. It processes the topic, creates `ResearchFacts`, and uses `agent_context.send_message` to send them to the `writer` agent.
Similarly, the `Writer` agent would have its own logic:
```python
# Simplified concept - requires AgentRuntime (Chapter 3) to actually run
async def writer_logic(agent_context, message: ResearchFacts, msg_context):
print(f"Writer received facts for topic: {message.topic}")
# In a real scenario, this would involve LLM prompting
draft = f"Blog Post about {message.topic}:\n"
for fact in message.facts:
draft += f"- {fact}\n"
print(f"Writer drafted post:\n{draft}")
# Perhaps save the draft or send it somewhere else
# For now, we just print it. We don't send another message.
return None # Or maybe return a confirmation/result
```
This `writer_logic` function defines how the writer reacts to receiving `ResearchFacts`.
**Important:** To actually *run* these agents and make them communicate, we need the `AgentRuntime` (covered in [Chapter 3: AgentRuntime](03_agentruntime.md)) and the `Messaging System` (covered in [Chapter 2: Messaging System](02_messaging_system__topic___subscription_.md)). For now, focus on the *idea* that Agents are distinct workers defined by their logic (`on_message`) and identified by their `AgentId`.
## Under the Hood: How an Agent Gets a Message
While the full message delivery involves the `Messaging System` and `AgentRuntime`, let's look at the agent's role when it receives a message.
**Conceptual Flow:**
```mermaid
sequenceDiagram
participant Sender as Sender Agent
participant Runtime as AgentRuntime
participant Recipient as Recipient Agent
Sender->>+Runtime: send_message(message, recipient_id)
Runtime->>+Recipient: Locate agent by recipient_id
Runtime->>+Recipient: on_message(message, context)
Recipient->>Recipient: Process message using internal logic
alt Response Needed
Recipient->>-Runtime: Return response value
Runtime->>-Sender: Deliver response value
else No Response
Recipient->>-Runtime: Return None (or no return)
end
```
1. Some other agent (Sender) or the system decides to send a message to our agent (Recipient).
2. It tells the `AgentRuntime` (the manager): "Deliver this `message` to the agent with `recipient_id`".
3. The `AgentRuntime` finds the correct `Recipient` agent instance.
4. The `AgentRuntime` calls the `Recipient.on_message(message, context)` method.
5. The agent's internal logic inside `on_message` (or methods called by it, like in `RoutedAgent`) runs to process the message.
6. If the message requires a direct response (like an RPC call), the agent returns a value from `on_message`. If not (like a general notification or event), it might return `None`.
**Code Glimpse:**
The core definition is the `Agent` Protocol (`_agent.py`). It's like an interface or a contract – any class wanting to be an Agent *must* provide these methods.
```python
# From: _agent.py - The Agent blueprint (Protocol)
@runtime_checkable
class Agent(Protocol):
@property
def metadata(self) -> AgentMetadata: ...
@property
def id(self) -> AgentId: ...
async def on_message(self, message: Any, ctx: MessageContext) -> Any: ...
async def save_state(self) -> Mapping[str, Any]: ...
async def load_state(self, state: Mapping[str, Any]) -> None: ...
async def close(self) -> None: ...
```
Most agents you create will inherit from `BaseAgent` (`_base_agent.py`). It provides some standard setup:
```python
# From: _base_agent.py (Simplified)
class BaseAgent(ABC, Agent):
def __init__(self, description: str) -> None:
# Gets runtime & id from a special context when created by the runtime
# Raises error if you try to create it directly!
self._runtime: AgentRuntime = AgentInstantiationContext.current_runtime()
self._id: AgentId = AgentInstantiationContext.current_agent_id()
self._description = description
# ...
# This is the final version called by the runtime
@final
async def on_message(self, message: Any, ctx: MessageContext) -> Any:
# It calls the implementation method you need to write
return await self.on_message_impl(message, ctx)
# You MUST implement this in your subclass
@abstractmethod
async def on_message_impl(self, message: Any, ctx: MessageContext) -> Any: ...
# Helper to send messages easily
async def send_message(self, message: Any, recipient: AgentId, ...) -> Any:
# It just asks the runtime to do the actual sending
return await self._runtime.send_message(
message, sender=self.id, recipient=recipient, ...
)
# ... other methods like publish_message, save_state, load_state
```
Notice how `BaseAgent` handles getting its `id` and `runtime` during creation and provides a convenient `send_message` method that uses the runtime. When inheriting from `BaseAgent`, you primarily focus on implementing the `on_message_impl` method to define your agent's unique behavior.
## Next Steps
You now understand the core concept of an `Agent` in AutoGen Core! It's the fundamental worker unit with an identity, the ability to process messages, and optionally maintain state.
In the next chapters, we'll explore:
* [Chapter 2: Messaging System](02_messaging_system__topic___subscription_.md): How messages actually travel between agents.
* [Chapter 3: AgentRuntime](03_agentruntime.md): The manager responsible for creating, running, and connecting agents.
Let's continue building your understanding!
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)
================================================
FILE: docs/AutoGen Core/02_messaging_system__topic___subscription_.md
================================================
---
layout: default
title: "Messaging System"
parent: "AutoGen Core"
nav_order: 2
---
# Chapter 2: Messaging System (Topic & Subscription)
In [Chapter 1: Agent](01_agent.md), we learned about Agents as individual workers. But how do they coordinate when one agent doesn't know exactly *who* needs the information it produces? Imagine our Researcher finds some facts. Maybe the Writer needs them, but maybe a Fact-Checker agent or a Summary agent also needs them later. How can the Researcher just announce "Here are the facts!" without needing a specific mailing list?
This is where the **Messaging System**, specifically **Topics** and **Subscriptions**, comes in. It allows agents to broadcast messages to anyone interested, like posting on a company announcement board.
## Motivation: Broadcasting Information
Let's refine our blog post example:
1. The `Researcher` agent finds facts about "AutoGen Agents".
2. Instead of sending *directly* to the `Writer`, the `Researcher` **publishes** these facts to a general "research-results" **Topic**.
3. The `Writer` agent has previously told the system it's **subscribed** to the "research-results" Topic.
4. The system sees the new message on the Topic and delivers it to the `Writer` (and any other subscribers).
This way, the `Researcher` doesn't need to know who the `Writer` is, or even if a `Writer` exists! It just broadcasts the results. If we later add a `FactChecker` agent that also needs the results, it simply subscribes to the same Topic.
## Key Concepts: Topics and Subscriptions
Let's break down the components of this broadcasting system:
1. **Topic (`TopicId`): The Announcement Board**
* A `TopicId` represents a specific channel or category for messages. Think of it like the name of an announcement board (e.g., "Project Updates", "General Announcements").
* It has two main parts:
* `type`: What *kind* of event or information is this? (e.g., "research.completed", "user.request"). This helps categorize messages.
* `source`: *Where* or *why* did this event originate? Often, this relates to the specific task or context (e.g., the specific blog post being researched like "autogen-agents-blog-post", or the team generating the event like "research-team").
```python
# From: _topic.py (Simplified)
from dataclasses import dataclass
@dataclass(frozen=True) # Immutable: can't change after creation
class TopicId:
type: str
source: str
def __str__(self) -> str:
# Creates an id like "research.completed/autogen-agents-blog-post"
return f"{self.type}/{self.source}"
```
This structure allows for flexible filtering. Agents might subscribe to all topics of a certain `type`, regardless of the `source`, or only to topics with a specific `source`.
2. **Publishing: Posting the Announcement**
* When an agent has information to share broadly, it *publishes* a message to a specific `TopicId`.
* This is like pinning a note to the designated announcement board. The agent doesn't need to know who will read it.
3. **Subscription (`Subscription`): Signing Up for Updates**
* A `Subscription` is how an agent declares its interest in certain `TopicId`s.
* It acts like a rule: "If a message is published to a Topic that matches *this pattern*, please deliver it to *this kind of agent*".
* The `Subscription` links a `TopicId` pattern (e.g., "all topics with type `research.completed`") to an `AgentId` (or a way to determine the `AgentId`).
4. **Routing: Delivering the Mail**
* The `AgentRuntime` (the system manager we'll meet in [Chapter 3: AgentRuntime](03_agentruntime.md)) keeps track of all active `Subscription`s.
* When a message is published to a `TopicId`, the `AgentRuntime` checks which `Subscription`s match that `TopicId`.
* For each match, it uses the `Subscription`'s rule to figure out which specific `AgentId` should receive the message and delivers it.
## Use Case Example: Researcher Publishes, Writer Subscribes
Let's see how our Researcher and Writer can use this system.
**Goal:** Researcher publishes facts to a topic, Writer receives them via subscription.
**1. Define the Topic:**
We need a `TopicId` for research results. Let's say the `type` is "research.facts.available" and the `source` identifies the specific research task (e.g., "blog-post-autogen").
```python
# From: _topic.py
from autogen_core import TopicId
# Define the topic for this specific research task
research_topic_id = TopicId(type="research.facts.available", source="blog-post-autogen")
print(f"Topic ID: {research_topic_id}")
# Output: Topic ID: research.facts.available/blog-post-autogen
```
This defines the "announcement board" we'll use.
**2. Researcher Publishes:**
The `Researcher` agent, after finding facts, will use its `agent_context` (provided by the runtime) to publish the `ResearchFacts` message to this topic.
```python
# Simplified concept - Researcher agent logic
# Assume 'agent_context' and 'message' (ResearchTopic) are provided
# Define the facts message (from Chapter 1)
@dataclass
class ResearchFacts:
topic: str
facts: list[str]
async def researcher_publish_logic(agent_context, message: ResearchTopic, msg_context):
print(f"Researcher working on: {message.topic}")
facts_data = ResearchFacts(
topic=message.topic,
facts=[f"Fact A about {message.topic}", f"Fact B about {message.topic}"]
)
# Define the specific topic for this task's results
results_topic = TopicId(type="research.facts.available", source=message.topic) # Use message topic as source
# Publish the facts to the topic
await agent_context.publish_message(message=facts_data, topic_id=results_topic)
print(f"Researcher published facts to topic: {results_topic}")
# No direct reply needed
return None
```
Notice the `agent_context.publish_message` call. The Researcher doesn't specify a recipient, only the topic.
**3. Writer Subscribes:**
The `Writer` agent needs to tell the system it's interested in messages on topics like "research.facts.available". We can use a predefined `Subscription` type called `TypeSubscription`. This subscription typically means: "I am interested in all topics with this *exact type*. When a message arrives, create/use an agent of *my type* whose `key` matches the topic's `source`."
```python
# From: _type_subscription.py (Simplified Concept)
from autogen_core import TypeSubscription, BaseAgent
class WriterAgent(BaseAgent):
# ... agent implementation ...
async def on_message_impl(self, message: ResearchFacts, ctx):
# This method gets called when a subscribed message arrives
print(f"Writer ({self.id}) received facts via subscription: {message.facts}")
# ... process facts and write draft ...
# How the Writer subscribes (usually done during runtime setup - Chapter 3)
# This tells the runtime: "Messages on topics with type 'research.facts.available'
# should go to a 'writer' agent whose key matches the topic source."
writer_subscription = TypeSubscription(
topic_type="research.facts.available",
agent_type="writer" # The type of agent that should handle this
)
print(f"Writer subscription created for topic type: {writer_subscription.topic_type}")
# Output: Writer subscription created for topic type: research.facts.available
```
When the `Researcher` publishes to `TopicId(type="research.facts.available", source="blog-post-autogen")`, the `AgentRuntime` will see that `writer_subscription` matches the `topic_type`. It will then use the rule: "Find (or create) an agent with `AgentId(type='writer', key='blog-post-autogen')` and deliver the message."
**Benefit:** Decoupling! The Researcher just broadcasts. The Writer just listens for relevant broadcasts. We can add more listeners (like a `FactChecker` subscribing to the same `topic_type`) without changing the `Researcher` at all.
## Under the Hood: How Publishing Works
Let's trace the journey of a published message.
**Conceptual Flow:**
```mermaid
sequenceDiagram
participant Publisher as Publisher Agent
participant Runtime as AgentRuntime
participant SubRegistry as Subscription Registry
participant Subscriber as Subscriber Agent
Publisher->>+Runtime: publish_message(message, topic_id)
Runtime->>+SubRegistry: Find subscriptions matching topic_id
SubRegistry-->>-Runtime: Return list of matching Subscriptions
loop For each matching Subscription
Runtime->>Subscription: map_to_agent(topic_id)
Subscription-->>Runtime: Return target AgentId
Runtime->>+Subscriber: Locate/Create Agent instance by AgentId
Runtime->>Subscriber: on_message(message, context)
Subscriber-->>-Runtime: Process message (optional return)
end
Runtime-->>-Publisher: Return (usually None for publish)
```
1. **Publish:** An agent calls `agent_context.publish_message(message, topic_id)`. This internally calls the `AgentRuntime`'s publish method.
2. **Lookup:** The `AgentRuntime` takes the `topic_id` and consults its internal `Subscription Registry`.
3. **Match:** The Registry checks all registered `Subscription` objects. Each `Subscription` has an `is_match(topic_id)` method. The registry finds all subscriptions where `is_match` returns `True`.
4. **Map:** For each matching `Subscription`, the Runtime calls its `map_to_agent(topic_id)` method. This method returns the specific `AgentId` that should handle this message based on the subscription rule and the topic details.
5. **Deliver:** The `AgentRuntime` finds the agent instance corresponding to the returned `AgentId` (potentially creating it if it doesn't exist yet, especially with `TypeSubscription`). It then calls that agent's `on_message` method, delivering the original published `message`.
**Code Glimpse:**
* **`TopicId` (`_topic.py`):** As shown before, a simple dataclass holding `type` and `source`. It includes validation to ensure the `type` follows certain naming conventions.
```python
# From: _topic.py
@dataclass(eq=True, frozen=True)
class TopicId:
type: str
source: str
# ... validation and __str__ ...
@classmethod
def from_str(cls, topic_id: str) -> Self:
# Helper to parse "type/source" string
# ... implementation ...
```
* **`Subscription` Protocol (`_subscription.py`):** This defines the *contract* for any subscription rule.
```python
# From: _subscription.py (Simplified Protocol)
from typing import Protocol
# ... other imports
class Subscription(Protocol):
@property
def id(self) -> str: ... # Unique ID for this subscription instance
def is_match(self, topic_id: TopicId) -> bool:
"""Check if a topic matches this subscription's rule."""
...
def map_to_agent(self, topic_id: TopicId) -> AgentId:
"""Determine the target AgentId if is_match was True."""
...
```
Any class implementing these methods can act as a subscription rule.
* **`TypeSubscription` (`_type_subscription.py`):** A common implementation of the `Subscription` protocol.
```python
# From: _type_subscription.py (Simplified)
class TypeSubscription(Subscription):
def __init__(self, topic_type: str, agent_type: str, ...):
self._topic_type = topic_type
self._agent_type = agent_type
# ... generates a unique self._id ...
def is_match(self, topic_id: TopicId) -> bool:
# Matches if the topic's type is exactly the one we want
return topic_id.type == self._topic_type
def map_to_agent(self, topic_id: TopicId) -> AgentId:
# Maps to an agent of the specified type, using the
# topic's source as the agent's unique key.
if not self.is_match(topic_id):
raise CantHandleException(...) # Should not happen if used correctly
return AgentId(type=self._agent_type, key=topic_id.source)
# ... id property ...
```
This implementation provides the "one agent instance per source" behavior for a specific topic type.
* **`DefaultSubscription` (`_default_subscription.py`):** This is often used via a decorator (`@default_subscription`) and provides a convenient way to create a `TypeSubscription` where the `agent_type` is automatically inferred from the agent class being defined, and the `topic_type` defaults to "default" (but can be overridden). It simplifies common use cases.
```python
# From: _default_subscription.py (Conceptual Usage)
from autogen_core import BaseAgent, default_subscription, ResearchFacts
@default_subscription # Uses 'default' topic type, infers agent type 'writer'
class WriterAgent(BaseAgent):
# Agent logic here...
async def on_message_impl(self, message: ResearchFacts, ctx): ...
# Or specify the topic type
@default_subscription(topic_type="research.facts.available")
class SpecificWriterAgent(BaseAgent):
# Agent logic here...
async def on_message_impl(self, message: ResearchFacts, ctx): ...
```
The actual sending (`publish_message`) and routing logic reside within the `AgentRuntime`, which we'll explore next.
## Next Steps
You've learned how AutoGen Core uses a publish/subscribe system (`TopicId`, `Subscription`) to allow agents to communicate without direct coupling. This is crucial for building flexible and scalable multi-agent applications.
* **Topic (`TopicId`):** Named channels (`type`/`source`) for broadcasting messages.
* **Publish:** Sending a message to a Topic.
* **Subscription:** An agent's declared interest in messages on certain Topics, defining a routing rule.
Now, let's dive into the orchestrator that manages agents and makes this messaging system work:
* [Chapter 3: AgentRuntime](03_agentruntime.md): The manager responsible for creating, running, and connecting agents, including handling message publishing and subscription routing.
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)
================================================
FILE: docs/AutoGen Core/03_agentruntime.md
================================================
---
layout: default
title: "AgentRuntime"
parent: "AutoGen Core"
nav_order: 3
---
# Chapter 3: AgentRuntime - The Office Manager
In [Chapter 1: Agent](01_agent.md), we met the workers (`Agent`) of our system. In [Chapter 2: Messaging System](02_messaging_system__topic___subscription_.md), we saw how they can communicate broadly using topics and subscriptions. But who hires these agents? Who actually delivers the messages, whether direct or published? And who keeps the whole system running smoothly?
This is where the **`AgentRuntime`** comes in. It's the central nervous system, the operating system, or perhaps the most fitting analogy: **the office manager** for all your agents.
## Motivation: Why Do We Need an Office Manager?
Imagine an office full of employees (Agents). You have researchers, writers, maybe coders.
* How does a new employee get hired and set up?
* When one employee wants to send a memo directly to another, who makes sure it gets to the right desk?
* When someone posts an announcement on the company bulletin board (publishes to a topic), who ensures everyone who signed up for that type of announcement sees it?
* Who starts the workday and ensures everything keeps running?
Without an office manager, it would be chaos! The `AgentRuntime` serves this crucial role in AutoGen Core. It handles:
1. **Agent Creation:** "Onboarding" new agents when they are needed.
2. **Message Routing:** Delivering direct messages (`send_message`) and published messages (`publish_message`).
3. **Lifecycle Management:** Starting, running, and stopping the whole system.
4. **State Management:** Keeping track of the overall system state (optional).
## Key Concepts: Understanding the Manager's Job
Let's break down the main responsibilities of the `AgentRuntime`:
1. **Agent Instantiation (Hiring):**
* You don't usually create agent objects directly (like `my_agent = ResearcherAgent()`). Why? Because the agent needs to know *about* the runtime (the office it works in) to send messages, publish announcements, etc.
* Instead, you tell the `AgentRuntime`: "I need an agent of type 'researcher'. Here's a recipe (a **factory function**) for how to create one." This is done using `runtime.register_factory(...)`.
* When a message needs to go to a 'researcher' agent with a specific key (e.g., 'researcher-01'), the runtime checks if it already exists. If not, it uses the registered factory function to create (instantiate) the agent.
* **Crucially**, while creating the agent, the runtime provides special context (`AgentInstantiationContext`) so the new agent automatically gets its unique `AgentId` and a reference to the `AgentRuntime` itself. This is like giving a new employee their ID badge and telling them who the office manager is.
```python
# Simplified Concept - How a BaseAgent gets its ID and runtime access
# From: _agent_instantiation.py and _base_agent.py
# Inside the agent's __init__ method (when inheriting from BaseAgent):
class MyAgent(BaseAgent):
def __init__(self, description: str):
# This magic happens *because* the AgentRuntime is creating the agent
# inside a special context.
self._runtime = AgentInstantiationContext.current_runtime() # Gets the manager
self._id = AgentInstantiationContext.current_agent_id() # Gets its own ID
self._description = description
# ... rest of initialization ...
```
This ensures agents are properly integrated into the system from the moment they are created.
2. **Message Delivery (Mail Room):**
* **Direct Send (`send_message`):** When an agent calls `await agent_context.send_message(message, recipient_id)`, it's actually telling the `AgentRuntime`, "Please deliver this `message` directly to the agent identified by `recipient_id`." The runtime finds the recipient agent (creating it if necessary) and calls its `on_message` method. It's like putting a specific name on an envelope and handing it to the mail room.
* **Publish (`publish_message`):** When an agent calls `await agent_context.publish_message(message, topic_id)`, it tells the runtime, "Post this `message` to the announcement board named `topic_id`." The runtime then checks its list of **subscriptions** (who signed up for which boards). For every matching subscription, it figures out the correct recipient agent(s) (based on the subscription rule) and delivers the message to their `on_message` method.
3. **Lifecycle Management (Opening/Closing the Office):**
* The runtime needs to be started to begin processing messages. Typically, you call `runtime.start()`. This usually kicks off a background process or loop that watches for incoming messages.
* When work is done, you need to stop the runtime gracefully. `runtime.stop_when_idle()` is common – it waits until all messages currently in the queue have been processed, then stops. `runtime.stop()` stops more abruptly.
4. **State Management (Office Records):**
* The runtime can save the state of *all* the agents it manages (`runtime.save_state()`) and load it back later (`runtime.load_state()`). This is useful for pausing and resuming complex multi-agent interactions. It can also save/load state for individual agents (`runtime.agent_save_state()` / `runtime.agent_load_state()`). We'll touch more on state in [Chapter 7: Memory](07_memory.md).
## Use Case Example: Running Our Researcher and Writer
Let's finally run the Researcher/Writer scenario from Chapters 1 and 2. We need the `AgentRuntime` to make it happen.
**Goal:**
1. Create a runtime.
2. Register factories for a 'researcher' and a 'writer' agent.
3. Tell the runtime that 'writer' agents are interested in "research.facts.available" topics (add subscription).
4. Start the runtime.
5. Send an initial `ResearchTopic` message to a 'researcher' agent.
6. Let the system run (Researcher publishes facts, Runtime delivers to Writer via subscription, Writer processes).
7. Stop the runtime when idle.
**Code Snippets (Simplified):**
```python
# 0. Imports and Message Definitions (from previous chapters)
import asyncio
from dataclasses import dataclass
from autogen_core import (
AgentId, BaseAgent, SingleThreadedAgentRuntime, TopicId,
MessageContext, TypeSubscription, AgentInstantiationContext
)
@dataclass
class ResearchTopic: topic: str
@dataclass
class ResearchFacts: topic: str; facts: list[str]
```
These are the messages our agents will exchange.
```python
# 1. Define Agent Logic (using BaseAgent)
class ResearcherAgent(BaseAgent):
async def on_message_impl(self, message: ResearchTopic, ctx: MessageContext):
print(f"Researcher ({self.id}) got topic: {message.topic}")
facts = [f"Fact 1 about {message.topic}", f"Fact 2"]
results_topic = TopicId("research.facts.available", message.topic)
# Use the runtime (via self.publish_message helper) to publish
await self.publish_message(
ResearchFacts(topic=message.topic, facts=facts), results_topic
)
print(f"Researcher ({self.id}) published facts to {results_topic}")
class WriterAgent(BaseAgent):
async def on_message_impl(self, message: ResearchFacts, ctx: MessageContext):
print(f"Writer ({self.id}) received facts via topic '{ctx.topic_id}': {message.facts}")
draft = f"Draft for {message.topic}: {'; '.join(message.facts)}"
print(f"Writer ({self.id}) created draft: '{draft}'")
# This agent doesn't send further messages in this example
```
Here we define the behavior of our two agent types, inheriting from `BaseAgent` which gives us `self.id`, `self.publish_message`, etc.
```python
# 2. Define Agent Factories
def researcher_factory():
# Gets runtime/id via AgentInstantiationContext inside BaseAgent.__init__
print("Runtime is creating a ResearcherAgent...")
return ResearcherAgent(description="I research topics.")
def writer_factory():
print("Runtime is creating a WriterAgent...")
return WriterAgent(description="I write drafts from facts.")
```
These simple functions tell the runtime *how* to create instances of our agents when needed.
```python
# 3. Setup and Run the Runtime
async def main():
# Create the runtime (the office manager)
runtime = SingleThreadedAgentRuntime()
# Register the factories (tell the manager how to hire)
await runtime.register_factory("researcher", researcher_factory)
await runtime.register_factory("writer", writer_factory)
print("Registered agent factories.")
# Add the subscription (tell manager who listens to which announcements)
# Rule: Messages to topics of type "research.facts.available"
# should go to a "writer" agent whose key matches the topic source.
writer_sub = TypeSubscription(topic_type="research.facts.available", agent_type="writer")
await runtime.add_subscription(writer_sub)
print(f"Added subscription: {writer_sub.id}")
# Start the runtime (open the office)
runtime.start()
print("Runtime started.")
# Send the initial message to kick things off
research_task_topic = "AutoGen Agents"
researcher_instance_id = AgentId(type="researcher", key=research_task_topic)
print(f"Sending initial topic '{research_task_topic}' to {researcher_instance_id}")
await runtime.send_message(
message=ResearchTopic(topic=research_task_topic),
recipient=researcher_instance_id,
)
# Wait until all messages are processed (wait for work day to end)
print("Waiting for runtime to become idle...")
await runtime.stop_when_idle()
print("Runtime stopped.")
# Run the main function
asyncio.run(main())
```
This script sets up the `SingleThreadedAgentRuntime`, registers the blueprints (factories) and communication rules (subscription), starts the process, and then shuts down cleanly.
**Expected Output (Conceptual Order):**
```
Registered agent factories.
Added subscription: type=research.facts.available=>agent=writer
Runtime started.
Sending initial topic 'AutoGen Agents' to researcher/AutoGen Agents
Waiting for runtime to become idle...
Runtime is creating a ResearcherAgent... # First time researcher/AutoGen Agents is needed
Researcher (researcher/AutoGen Agents) got topic: AutoGen Agents
Researcher (researcher/AutoGen Agents) published facts to research.facts.available/AutoGen Agents
Runtime is creating a WriterAgent... # First time writer/AutoGen Agents is needed (due to subscription)
Writer (writer/AutoGen Agents) received facts via topic 'research.facts.available/AutoGen Agents': ['Fact 1 about AutoGen Agents', 'Fact 2']
Writer (writer/AutoGen Agents) created draft: 'Draft for AutoGen Agents: Fact 1 about AutoGen Agents; Fact 2'
Runtime stopped.
```
You can see the runtime orchestrating the creation of agents and the flow of messages based on the initial request and the subscription rule.
## Under the Hood: How the Manager Works
Let's peek inside the `SingleThreadedAgentRuntime` (a common implementation provided by AutoGen Core) to understand the flow.
**Core Idea:** It uses an internal queue (`_message_queue`) to hold incoming requests (`send_message`, `publish_message`). A background task continuously takes items from the queue and processes them one by one (though the *handling* of a message might involve `await` and allow other tasks to run).
**1. Agent Creation (`_get_agent`, `_invoke_agent_factory`)**
When the runtime needs an agent instance (e.g., to deliver a message) that hasn't been created yet:
```mermaid
sequenceDiagram
participant Runtime as AgentRuntime
participant Factory as Agent Factory Func
participant AgentCtx as AgentInstantiationContext
participant Agent as New Agent Instance
Runtime->>Runtime: Check if agent instance exists (e.g., in `_instantiated_agents` dict)
alt Agent Not Found
Runtime->>Runtime: Find registered factory for agent type
Runtime->>AgentCtx: Set current runtime & agent_id
activate AgentCtx
Runtime->>Factory: Call factory function()
activate Factory
Factory->>AgentCtx: (Inside Agent.__init__) Get current runtime
AgentCtx-->>Factory: Return runtime
Factory->>AgentCtx: (Inside Agent.__init__) Get current agent_id
AgentCtx-->>Factory: Return agent_id
Factory-->>Runtime: Return new Agent instance
deactivate Factory
Runtime->>AgentCtx: Clear context
deactivate AgentCtx
Runtime->>Runtime: Store new agent instance
end
Runtime->>Runtime: Return agent instance
```
* The runtime looks up the factory function registered for the required `AgentId.type`.
* It uses `AgentInstantiationContext.populate_context` to temporarily store its own reference and the target `AgentId`.
* It calls the factory function.
* Inside the agent's `__init__` (usually via `BaseAgent`), `AgentInstantiationContext.current_runtime()` and `AgentInstantiationContext.current_agent_id()` are called to retrieve the context set by the runtime.
* The factory returns the fully initialized agent instance.
* The runtime stores this instance for future use.
```python
# From: _agent_instantiation.py (Simplified)
class AgentInstantiationContext:
_CONTEXT_VAR = ContextVar("agent_context") # Stores (runtime, agent_id)
@classmethod
@contextmanager
def populate_context(cls, ctx: tuple[AgentRuntime, AgentId]):
token = cls._CONTEXT_VAR.set(ctx) # Store context for this block
try:
yield # Code inside the 'with' block runs here
finally:
cls._CONTEXT_VAR.reset(token) # Clean up context
@classmethod
def current_runtime(cls) -> AgentRuntime:
return cls._CONTEXT_VAR.get()[0] # Retrieve runtime from context
@classmethod
def current_agent_id(cls) -> AgentId:
return cls._CONTEXT_VAR.get()[1] # Retrieve agent_id from context
```
This context manager pattern ensures the correct runtime and ID are available *only* during the agent's creation by the runtime.
**2. Direct Messaging (`send_message` -> `_process_send`)**
```mermaid
sequenceDiagram
participant Sender as Sending Agent/Code
participant Runtime as AgentRuntime
participant Queue as Internal Queue
participant Recipient as Recipient Agent
Sender->>+Runtime: send_message(msg, recipient_id, ...)
Runtime->>Runtime: Create Future (for response)
Runtime->>+Queue: Put SendMessageEnvelope(msg, recipient_id, future)
Runtime-->>-Sender: Return awaitable Future
Note over Queue, Runtime: Background task picks up envelope
Runtime->>Runtime: _process_send(envelope)
Runtime->>+Recipient: _get_agent(recipient_id) (creates if needed)
Recipient-->>-Runtime: Return Agent instance
Runtime->>+Recipient: on_message(msg, context)
Recipient->>Recipient: Process message...
Recipient-->>-Runtime: Return response value
Runtime->>Runtime: Set Future result with response value
```
* `send_message` creates a `Future` object (a placeholder for the eventual result) and wraps the message details in a `SendMessageEnvelope`.
* This envelope is put onto the internal `_message_queue`.
* The background task picks up the envelope.
* `_process_send` gets the recipient agent instance (using `_get_agent`).
* It calls the recipient's `on_message` method.
* When `on_message` returns a result, `_process_send` sets the result on the `Future` object, which makes the original `await runtime.send_message(...)` call return the value.
**3. Publish/Subscribe (`publish_message` -> `_process_publish`)**
```mermaid
sequenceDiagram
participant Publisher as Publishing Agent/Code
participant Runtime as AgentRuntime
participant Queue as Internal Queue
participant SubManager as SubscriptionManager
participant Subscriber as Subscribed Agent
Publisher->>+Runtime: publish_message(msg, topic_id, ...)
Runtime->>+Queue: Put PublishMessageEnvelope(msg, topic_id)
Runtime-->>-Publisher: Return (None for publish)
Note over Queue, Runtime: Background task picks up envelope
Runtime->>Runtime: _process_publish(envelope)
Runtime->>+SubManager: get_subscribed_recipients(topic_id)
SubManager->>SubManager: Find matching subscriptions
SubManager->>SubManager: Map subscriptions to AgentIds
SubManager-->>-Runtime: Return list of recipient AgentIds
loop For each recipient AgentId
Runtime->>+Subscriber: _get_agent(recipient_id) (creates if needed)
Subscriber-->>-Runtime: Return Agent instance
Runtime->>+Subscriber: on_message(msg, context with topic_id)
Subscriber->>Subscriber: Process message...
Subscriber-->>-Runtime: Return (usually None for publish)
end
```
* `publish_message` wraps the message in a `PublishMessageEnvelope` and puts it on the queue.
* The background task picks it up.
* `_process_publish` asks the `SubscriptionManager` (`_subscription_manager`) for all `AgentId`s that are subscribed to the given `topic_id`.
* The `SubscriptionManager` checks its registered `Subscription` objects (`_subscriptions` list, added via `add_subscription`). For each `Subscription` where `is_match(topic_id)` is true, it calls `map_to_agent(topic_id)` to get the target `AgentId`.
* For each resulting `AgentId`, the runtime gets the agent instance and calls its `on_message` method, providing the `topic_id` in the `MessageContext`.
```python
# From: _runtime_impl_helpers.py (SubscriptionManager simplified)
class SubscriptionManager:
def __init__(self):
self._subscriptions: List[Subscription] = []
# Optimization cache can be added here
async def add_subscription(self, subscription: Subscription):
self._subscriptions.append(subscription)
# Clear cache if any
async def get_subscribed_recipients(self, topic: TopicId) -> List[AgentId]:
recipients = []
for sub in self._subscriptions:
if sub.is_match(topic):
recipients.append(sub.map_to_agent(topic))
return recipients
```
The `SubscriptionManager` simply iterates through registered subscriptions to find matches when a message is published.
## Next Steps
You now understand the `AgentRuntime` - the essential coordinator that brings Agents to life, manages their communication, and runs the entire show. It handles agent creation via factories, routes direct and published messages, and manages the system's lifecycle.
With the core concepts of `Agent`, `Messaging`, and `AgentRuntime` covered, we can start looking at more specialized building blocks. Next, we'll explore how agents can use external capabilities:
* [Chapter 4: Tool](04_tool.md): How to give agents tools (like functions or APIs) to perform specific actions beyond just processing messages.
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)
================================================
FILE: docs/AutoGen Core/04_tool.md
================================================
---
layout: default
title: "Tool"
parent: "AutoGen Core"
nav_order: 4
---
# Chapter 4: Tool - Giving Agents Specific Capabilities
In the previous chapters, we learned about Agents as workers ([Chapter 1](01_agent.md)), how they can communicate directly or using announcements ([Chapter 2](02_messaging_system__topic___subscription_.md)), and the `AgentRuntime` that manages them ([Chapter 3](03_agentruntime.md)).
Agents can process messages and coordinate, but what if an agent needs to perform a very specific action, like looking up information online, running a piece of code, accessing a database, or even just finding out the current date? They need specialized *capabilities*.
This is where the concept of a **Tool** comes in.
## Motivation: Agents Need Skills!
Imagine our `Writer` agent from before. It receives facts and writes a draft. Now, let's say we want the `Writer` (or perhaps a smarter `Assistant` agent helping it) to always include the current date in the blog post title.
How does the agent get the current date? It doesn't inherently know it. It needs a specific *skill* or *tool* for that.
A `Tool` in AutoGen Core represents exactly this: a specific, well-defined capability that an Agent can use. Think of it like giving an employee (Agent) a specialized piece of equipment (Tool), like a calculator, a web browser, or a calendar lookup program.
## Key Concepts: Understanding Tools
Let's break down what defines a Tool:
1. **It's a Specific Capability:** A Tool performs one well-defined task. Examples:
* `search_web(query: str)`
* `run_python_code(code: str)`
* `get_stock_price(ticker: str)`
* `get_current_date()`
2. **It Has a Schema (The Manual):** This is crucial! For an Agent (especially one powered by a Large Language Model - LLM) to know *when* and *how* to use a tool, the tool needs a clear description or "manual". This is called the `ToolSchema`. It typically includes:
* **`name`**: A unique identifier for the tool (e.g., `get_current_date`).
* **`description`**: A clear explanation of what the tool does, which helps the LLM decide if this tool is appropriate for the current task (e.g., "Fetches the current date in YYYY-MM-DD format").
* **`parameters`**: Defines what inputs the tool needs. This is itself a schema (`ParametersSchema`) describing the input fields, their types, and which ones are required. For our `get_current_date` example, it might need no parameters. For `get_stock_price`, it would need a `ticker` parameter of type string.
```python
# From: tools/_base.py (Simplified Concept)
from typing import TypedDict, Dict, Any, Sequence, NotRequired
class ParametersSchema(TypedDict):
type: str # Usually "object"
properties: Dict[str, Any] # Defines input fields and their types
required: NotRequired[Sequence[str]] # List of required field names
class ToolSchema(TypedDict):
name: str
description: NotRequired[str]
parameters: NotRequired[ParametersSchema]
# 'strict' flag also possible (Chapter 5 related)
```
This schema allows an LLM to understand: "Ah, there's a tool called `get_current_date` that takes no inputs and gives me the current date. I should use that now!"
3. **It Can Be Executed:** Once an agent decides to use a tool (often based on the schema), there needs to be a mechanism to actually *run* the tool's underlying function and get the result.
## Use Case Example: Adding a `get_current_date` Tool
Let's equip an agent with the ability to find the current date.
**Goal:** Define a tool that gets the current date and show how it could be executed by a specialized agent.
**Step 1: Define the Python Function**
First, we need the actual Python code that performs the action.
```python
# File: get_date_function.py
import datetime
def get_current_date() -> str:
"""Fetches the current date as a string."""
today = datetime.date.today()
return today.isoformat() # Returns date like "2023-10-27"
# Test the function
print(f"Function output: {get_current_date()}")
```
This is a standard Python function. It takes no arguments and returns the date as a string.
**Step 2: Wrap it as a `FunctionTool`**
AutoGen Core provides a convenient way to turn a Python function like this into a `Tool` object using `FunctionTool`. It automatically inspects the function's signature (arguments and return type) and docstring to help build the `ToolSchema`.
```python
# File: create_date_tool.py
from autogen_core.tools import FunctionTool
from get_date_function import get_current_date # Import our function
# Create the Tool instance
# We provide the function and a clear description for the LLM
date_tool = FunctionTool(
func=get_current_date,
description="Use this tool to get the current date in YYYY-MM-DD format."
# Name defaults to function name 'get_current_date'
)
# Let's see what FunctionTool generated
print(f"Tool Name: {date_tool.name}")
print(f"Tool Description: {date_tool.description}")
# The schema defines inputs (none in this case)
# print(f"Tool Schema Parameters: {date_tool.schema['parameters']}")
# Output (simplified): {'type': 'object', 'properties': {}, 'required': []}
```
`FunctionTool` wraps our `get_current_date` function. It uses the function name as the tool name and the description we provided. It also correctly determines from the function signature that there are no input parameters (`properties: {}`).
**Step 3: How an Agent Might Request Tool Use**
Now we have a `date_tool`. How is it used? Typically, an LLM-powered agent (which we'll see more of in [Chapter 5: ChatCompletionClient](05_chatcompletionclient.md)) analyzes a request and decides a tool is needed. It then generates a request to *call* that tool, often using a specific message type like `FunctionCall`.
```python
# File: tool_call_request.py
from autogen_core import FunctionCall # Represents a request to call a tool
# Imagine an LLM agent decided to use the date tool.
# It constructs this message, providing the tool name and arguments (as JSON string).
date_call_request = FunctionCall(
id="call_date_001", # A unique ID for this specific call attempt
name="get_current_date", # Matches the Tool's name
arguments="{}" # An empty JSON object because no arguments are needed
)
print("FunctionCall message:", date_call_request)
# Output: FunctionCall(id='call_date_001', name='get_current_date', arguments='{}')
```
This `FunctionCall` message is like a work order: "Please execute the tool named `get_current_date` with these arguments."
**Step 4: The `ToolAgent` Executes the Tool**
Who receives this `FunctionCall` message? Usually, a specialized agent called `ToolAgent`. You create a `ToolAgent` and give it the list of tools it knows how to execute. When it receives a `FunctionCall`, it finds the matching tool and runs it.
```python
# File: tool_agent_example.py
import asyncio
from autogen_core.tool_agent import ToolAgent
from autogen_core.models import FunctionExecutionResult
from create_date_tool import date_tool # Import the tool we created
from tool_call_request import date_call_request # Import the request message
# Create an agent specifically designed to execute tools
tool_executor = ToolAgent(
description="I can execute tools like getting the date.",
tools=[date_tool] # Give it the list of tools it manages
)
# --- Simulation of Runtime delivering the message ---
# In a real app, the AgentRuntime (Chapter 3) would route the
# date_call_request message to this tool_executor agent.
# We simulate the call to its message handler here:
async def simulate_execution():
# Fake context (normally provided by runtime)
class MockContext: cancellation_token = None
ctx = MockContext()
print(f"ToolAgent received request: {date_call_request.name}")
result: FunctionExecutionResult = await tool_executor.handle_function_call(
message=date_call_request,
ctx=ctx
)
print(f"ToolAgent produced result: {result}")
asyncio.run(simulate_execution())
```
**Expected Output:**
```
ToolAgent received request: get_current_date
ToolAgent produced result: FunctionExecutionResult(content='2023-10-27', call_id='call_date_001', is_error=False, name='get_current_date') # Date will be current date
```
The `ToolAgent` received the `FunctionCall`, found the `date_tool` in its list, executed the underlying `get_current_date` function, and packaged the result (the date string) into a `FunctionExecutionResult` message. This result message can then be sent back to the agent that originally requested the tool use.
## Under the Hood: How Tool Execution Works
Let's visualize the typical flow when an LLM agent decides to use a tool managed by a `ToolAgent`.
**Conceptual Flow:**
```mermaid
sequenceDiagram
participant LLMA as LLM Agent (Decides)
participant Caller as Caller Agent (Orchestrates)
participant ToolA as ToolAgent (Executes)
participant ToolFunc as Tool Function (e.g., get_current_date)
Note over LLMA: Analyzes conversation, decides tool needed.
LLMA->>Caller: Sends AssistantMessage containing FunctionCall(name='get_current_date', args='{}')
Note over Caller: Receives LLM response, sees FunctionCall.
Caller->>+ToolA: Uses runtime.send_message(message=FunctionCall, recipient=ToolAgent_ID)
Note over ToolA: Receives FunctionCall via on_message.
ToolA->>ToolA: Looks up 'get_current_date' in its internal list of Tools.
ToolA->>+ToolFunc: Calls tool.run_json(args={}) -> triggers get_current_date()
ToolFunc-->>-ToolA: Returns the result (e.g., "2023-10-27")
ToolA->>ToolA: Creates FunctionExecutionResult message with the content.
ToolA-->>-Caller: Returns FunctionExecutionResult via runtime messaging.
Note over Caller: Receives the tool result.
Caller->>LLMA: Sends FunctionExecutionResultMessage to LLM for next step.
Note over LLMA: Now knows the current date.
```
1. **Decision:** An LLM-powered agent decides a tool is needed based on the conversation and the available tools' descriptions. It generates a `FunctionCall`.
2. **Request:** A "Caller" agent (often the same LLM agent or a managing agent) sends this `FunctionCall` message to the dedicated `ToolAgent` using the `AgentRuntime`.
3. **Lookup:** The `ToolAgent` receives the message, extracts the tool `name` (`get_current_date`), and finds the corresponding `Tool` object (our `date_tool`) in the list it was configured with.
4. **Execution:** The `ToolAgent` calls the `run_json` method on the `Tool` object, passing the arguments from the `FunctionCall`. For a `FunctionTool`, `run_json` validates the arguments against the generated schema and then executes the original Python function (`get_current_date`).
5. **Result:** The Python function returns its result (the date string).
6. **Response:** The `ToolAgent` wraps this result string in a `FunctionExecutionResult` message, including the original `call_id`, and sends it back to the Caller agent.
7. **Continuation:** The Caller agent typically sends this result back to the LLM agent, allowing the conversation or task to continue with the new information.
**Code Glimpse:**
* **`Tool` Protocol (`tools/_base.py`):** Defines the basic contract any tool must fulfill. Key methods are `schema` (property returning the `ToolSchema`) and `run_json` (method to execute the tool with JSON-like arguments).
* **`BaseTool` (`tools/_base.py`):** An abstract class that helps implement the `Tool` protocol, especially using Pydantic models for defining arguments (`args_type`) and return values (`return_type`). It automatically generates the `parameters` part of the schema from the `args_type` model.
* **`FunctionTool` (`tools/_function_tool.py`):** Inherits from `BaseTool`. Its magic lies in automatically creating the `args_type` Pydantic model by inspecting the wrapped Python function's signature (`args_base_model_from_signature`). Its `run` method handles calling the original sync or async Python function.
```python
# Inside FunctionTool (Simplified Concept)
class FunctionTool(BaseTool[BaseModel, BaseModel]):
def __init__(self, func, description, ...):
self._func = func
self._signature = get_typed_signature(func)
# Automatically create Pydantic model for arguments
args_model = args_base_model_from_signature(...)
# Get return type from signature
return_type = self._signature.return_annotation
super().__init__(args_model, return_type, ...)
async def run(self, args: BaseModel, ...):
# Extract arguments from the 'args' model
kwargs = args.model_dump()
# Call the original Python function (sync or async)
result = await self._call_underlying_func(**kwargs)
return result # Must match the expected return_type
```
* **`ToolAgent` (`tool_agent/_tool_agent.py`):** A specialized `RoutedAgent`. It registers a handler specifically for `FunctionCall` messages.
```python
# Inside ToolAgent (Simplified Concept)
class ToolAgent(RoutedAgent):
def __init__(self, ..., tools: List[Tool]):
super().__init__(...)
self._tools = {tool.name: tool for tool in tools} # Store tools by name
@message_handler # Registers this for FunctionCall messages
async def handle_function_call(self, message: FunctionCall, ctx: MessageContext):
# Find the tool by name
tool = self._tools.get(message.name)
if tool is None:
# Handle error: Tool not found
raise ToolNotFoundException(...)
try:
# Parse arguments string into a dictionary
arguments = json.loads(message.arguments)
# Execute the tool's run_json method
result_obj = await tool.run_json(args=arguments, ...)
# Convert result object back to string if needed
result_str = tool.return_value_as_string(result_obj)
# Create the success result message
return FunctionExecutionResult(content=result_str, ...)
except Exception as e:
# Handle execution errors
return FunctionExecutionResult(content=f"Error: {e}", is_error=True, ...)
```
Its core logic is: find tool -> parse args -> run tool -> return result/error.
## Next Steps
You've learned how **Tools** provide specific capabilities to Agents, defined by a **Schema** that LLMs can understand. We saw how `FunctionTool` makes it easy to wrap existing Python functions and how `ToolAgent` acts as the executor for these tools.
This ability for agents to use tools is fundamental to building powerful and versatile AI systems that can interact with the real world or perform complex calculations.
Now that agents can use tools, we need to understand more about the agents that *decide* which tools to use, which often involves interacting with Large Language Models:
* [Chapter 5: ChatCompletionClient](05_chatcompletionclient.md): How agents interact with LLMs like GPT to generate responses or decide on actions (like calling a tool).
* [Chapter 6: ChatCompletionContext](06_chatcompletioncontext.md): How the history of the conversation, including tool calls and results, is managed when talking to an LLM.
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)
================================================
FILE: docs/AutoGen Core/05_chatcompletionclient.md
================================================
---
layout: default
title: "ChatCompletionClient"
parent: "AutoGen Core"
nav_order: 5
---
# Chapter 5: ChatCompletionClient - Talking to the Brains
So far, we've learned about:
* [Agents](01_agent.md): The workers in our system.
* [Messaging](02_messaging_system__topic___subscription_.md): How agents communicate broadly.
* [AgentRuntime](03_agentruntime.md): The manager that runs the show.
* [Tools](04_tool.md): How agents get specific skills.
But how does an agent actually *think* or *generate text*? Many powerful agents rely on Large Language Models (LLMs) – think of models like GPT-4, Claude, or Gemini – as their "brains". How does an agent in AutoGen Core communicate with these external LLM services?
This is where the **`ChatCompletionClient`** comes in. It's the dedicated component for talking to LLMs.
## Motivation: Bridging the Gap to LLMs
Imagine you want to build an agent that can summarize long articles.
1. You give the agent an article (as a message).
2. The agent needs to send this article to an LLM (like GPT-4).
3. It also needs to tell the LLM: "Please summarize this."
4. The LLM processes the request and generates a summary.
5. The agent needs to receive this summary back from the LLM.
How does the agent handle the technical details of connecting to the LLM's specific API, formatting the request correctly, sending it over the internet, and understanding the response?
The `ChatCompletionClient` solves this! Think of it as the **standard phone line and translator** connecting your agent to the LLM service. You tell the client *what* to say (the conversation history and instructions), and it handles *how* to say it to the specific LLM and translates the LLM's reply back into a standard format.
## Key Concepts: Understanding the LLM Communicator
Let's break down the `ChatCompletionClient`:
1. **LLM Communication Bridge:** It's the primary way AutoGen agents interact with external LLM APIs (like OpenAI, Anthropic, Google Gemini, etc.). It hides the complexity of specific API calls.
2. **Standard Interface (`create` method):** It defines a common way to send requests and receive responses, regardless of the underlying LLM. The core method is `create`. You give it:
* `messages`: A list of messages representing the conversation history so far.
* Optional `tools`: A list of tools ([Chapter 4](04_tool.md)) the LLM might be able to use.
* Other parameters (like `json_output` hints, `cancellation_token`).
3. **Messages (`LLMMessage`):** The conversation history is passed as a sequence of specific message types defined in `autogen_core.models`:
* `SystemMessage`: Instructions for the LLM (e.g., "You are a helpful assistant.").
* `UserMessage`: Input from the user or another agent (e.g., the article text).
* `AssistantMessage`: Previous responses from the LLM (can include text or requests to call functions/tools).
* `FunctionExecutionResultMessage`: The results of executing a tool/function call.
4. **Tools (`ToolSchema`):** You can provide the schemas of available tools ([Chapter 4](04_tool.md)). The LLM might then respond not with text, but with a request to call one of these tools (`FunctionCall` inside an `AssistantMessage`).
5. **Response (`CreateResult`):** The `create` method returns a standard `CreateResult` object containing:
* `content`: The LLM's generated text or a list of `FunctionCall` requests.
* `finish_reason`: Why the LLM stopped generating (e.g., "stop", "length", "function_calls").
* `usage`: How many input (`prompt_tokens`) and output (`completion_tokens`) tokens were used.
* `cached`: Whether the response came from a cache.
6. **Token Tracking:** The client automatically tracks token usage (`prompt_tokens`, `completion_tokens`) for each call. You can query the total usage via methods like `total_usage()`. This is vital for monitoring costs, as most LLM APIs charge based on tokens.
## Use Case Example: Summarizing Text with an LLM
Let's build a simplified scenario where we use a `ChatCompletionClient` to ask an LLM to summarize text.
**Goal:** Send text to an LLM via a client and get a summary back.
**Step 1: Prepare the Input Messages**
We need to structure our request as a list of `LLMMessage` objects.
```python
# File: prepare_messages.py
from autogen_core.models import SystemMessage, UserMessage
# Instructions for the LLM
system_prompt = SystemMessage(
content="You are a helpful assistant designed to summarize text concisely."
)
# The text we want to summarize
article_text = """
AutoGen is a framework that enables the development of LLM applications using multiple agents
that can converse with each other to solve tasks. AutoGen agents are customizable,
conversable, and can seamlessly allow human participation. They can operate in various modes
that employ combinations of LLMs, human inputs, and tools.
"""
user_request = UserMessage(
content=f"Please summarize the following text in one sentence:\n\n{article_text}",
source="User" # Indicate who provided this input
)
# Combine into a list for the client
messages_to_send = [system_prompt, user_request]
print("Messages prepared:")
for msg in messages_to_send:
print(f"- {msg.type}: {msg.content[:50]}...") # Print first 50 chars
```
This code defines the instructions (`SystemMessage`) and the user's request (`UserMessage`) and puts them in a list, ready to be sent.
**Step 2: Use the ChatCompletionClient (Conceptual)**
Now, we need an instance of a `ChatCompletionClient`. In a real application, you'd configure a specific client (like `OpenAIChatCompletionClient` with your API key). For this example, let's imagine we have a pre-configured client called `llm_client`.
```python
# File: call_llm_client.py
import asyncio
from autogen_core.models import CreateResult, RequestUsage
# Assume 'messages_to_send' is from the previous step
# Assume 'llm_client' is a pre-configured ChatCompletionClient instance
# (e.g., llm_client = OpenAIChatCompletionClient(config=...))
async def get_summary(client, messages):
print("\nSending messages to LLM via ChatCompletionClient...")
try:
# The core call: send messages, get structured result
response: CreateResult = await client.create(
messages=messages,
# We aren't providing tools in this simple example
tools=[]
)
print("Received response:")
print(f"- Finish Reason: {response.finish_reason}")
print(f"- Content: {response.content}") # This should be the summary
print(f"- Usage (Tokens): Prompt={response.usage.prompt_tokens}, Completion={response.usage.completion_tokens}")
print(f"- Cached: {response.cached}")
# Also, check total usage tracked by the client
total_usage = client.total_usage()
print(f"\nClient Total Usage: Prompt={total_usage.prompt_tokens}, Completion={total_usage.completion_tokens}")
except Exception as e:
print(f"An error occurred: {e}")
# --- Placeholder for actual client ---
class MockChatCompletionClient: # Simulate a real client
_total_usage = RequestUsage(prompt_tokens=0, completion_tokens=0)
async def create(self, messages, tools=[], **kwargs) -> CreateResult:
# Simulate API call and response
prompt_len = sum(len(str(m.content)) for m in messages) // 4 # Rough token estimate
summary = "AutoGen is a multi-agent framework for developing LLM applications."
completion_len = len(summary) // 4 # Rough token estimate
usage = RequestUsage(prompt_tokens=prompt_len, completion_tokens=completion_len)
self._total_usage.prompt_tokens += usage.prompt_tokens
self._total_usage.completion_tokens += usage.completion_tokens
return CreateResult(
finish_reason="stop", content=summary, usage=usage, cached=False
)
def total_usage(self) -> RequestUsage: return self._total_usage
# Other required methods (count_tokens, model_info etc.) omitted for brevity
async def main():
from prepare_messages import messages_to_send # Get messages from previous step
mock_client = MockChatCompletionClient()
await get_summary(mock_client, messages_to_send)
# asyncio.run(main()) # If you run this, it uses the mock client
```
This code shows the essential `client.create(...)` call. We pass our `messages_to_send` and receive a `CreateResult`. We then print the summary (`response.content`) and the token usage reported for that specific call (`response.usage`) and the total tracked by the client (`client.total_usage()`).
**How an Agent Uses It:**
Typically, an agent's logic (e.g., inside its `on_message` handler) would:
1. Receive an incoming message (like the article to summarize).
2. Prepare the list of `LLMMessage` objects (including system prompts, history, and the new request).
3. Access a `ChatCompletionClient` instance (often provided during agent setup or accessed via its context).
4. Call `await client.create(...)`.
5. Process the `CreateResult` (e.g., extract the summary text, check for function calls if tools were provided).
6. Potentially send the result as a new message to another agent or return it.
## Under the Hood: How the Client Talks to the LLM
What happens when you call `await client.create(...)`?
**Conceptual Flow:**
```mermaid
sequenceDiagram
participant Agent as Agent Logic
participant Client as ChatCompletionClient
participant Formatter as API Formatter
participant HTTP as HTTP Client
participant LLM_API as External LLM API
Agent->>+Client: create(messages, tools)
Client->>+Formatter: Format messages & tools for specific API (e.g., OpenAI JSON format)
Formatter-->>-Client: Return formatted request body
Client->>+HTTP: Send POST request to LLM API endpoint with formatted body & API Key
HTTP->>+LLM_API: Transmit request over network
LLM_API->>LLM_API: Process request, generate completion/function call
LLM_API-->>-HTTP: Return API response (e.g., JSON)
HTTP-->>-Client: Receive HTTP response
Client->>+Formatter: Parse API response (extract content, usage, finish_reason)
Formatter-->>-Client: Return parsed data
Client->>Client: Create standard CreateResult object
Client-->>-Agent: Return CreateResult
```
1. **Prepare:** The `ChatCompletionClient` takes the standard `LLMMessage` list and `ToolSchema` list.
2. **Format:** It translates these into the specific format required by the target LLM's API (e.g., the JSON structure expected by OpenAI's `/chat/completions` endpoint). This might involve renaming roles (like `SystemMessage` to `system`), formatting tool descriptions, etc.
3. **Request:** It uses an underlying HTTP client to send a network request (usually a POST request) to the LLM service's API endpoint, including the formatted data and authentication (like an API key).
4. **Wait & Receive:** It waits for the LLM service to process the request and send back a response over the network.
5. **Parse:** It receives the raw HTTP response (usually JSON) from the API.
6. **Standardize:** It parses this specific API response, extracting the generated text or function calls, token usage figures, finish reason, etc.
7. **Return:** It packages all this information into a standard `CreateResult` object and returns it to the calling agent code.
**Code Glimpse:**
* **`ChatCompletionClient` Protocol (`models/_model_client.py`):** This is the abstract base class (or protocol) defining the *contract* that all specific clients must follow.
```python
# From: models/_model_client.py (Simplified ABC)
from abc import ABC, abstractmethod
from typing import Sequence, Optional, Mapping, Any, AsyncGenerator, Union
from ._types import LLMMessage, CreateResult, RequestUsage
from ..tools import Tool, ToolSchema
from .. import CancellationToken
class ChatCompletionClient(ABC):
@abstractmethod
async def create(
self, messages: Sequence[LLMMessage], *,
tools: Sequence[Tool | ToolSchema] = [],
json_output: Optional[bool] = None, # Hint for JSON mode
extra_create_args: Mapping[str, Any] = {}, # API-specific args
cancellation_token: Optional[CancellationToken] = None,
) -> CreateResult: ... # The core method
@abstractmethod
def create_stream(
self, # Similar to create, but yields results incrementally
# ... parameters ...
) -> AsyncGenerator[Union[str, CreateResult], None]: ...
@abstractmethod
def total_usage(self) -> RequestUsage: ... # Get total tracked usage
@abstractmethod
def count_tokens(self, messages: Sequence[LLMMessage], *, tools: Sequence[Tool | ToolSchema] = []) -> int: ... # Estimate token count
# Other methods like close(), actual_usage(), remaining_tokens(), model_info...
```
Concrete classes like `OpenAIChatCompletionClient`, `AnthropicChatCompletionClient` etc., implement these methods using the specific libraries and API calls for each service.
* **`LLMMessage` Types (`models/_types.py`):** These define the structure of messages passed *to* the client.
```python
# From: models/_types.py (Simplified)
from pydantic import BaseModel
from typing import List, Union, Literal
from .. import FunctionCall # From Chapter 4 context
class SystemMessage(BaseModel):
content: str
type: Literal["SystemMessage"] = "SystemMessage"
class UserMessage(BaseModel):
content: Union[str, List[Union[str, Image]]] # Can include images!
source: str
type: Literal["UserMessage"] = "UserMessage"
class AssistantMessage(BaseModel):
content: Union[str, List[FunctionCall]] # Can be text or function calls
source: str
type: Literal["AssistantMessage"] = "AssistantMessage"
# FunctionExecutionResultMessage also exists here...
```
* **`CreateResult` (`models/_types.py`):** This defines the structure of the response *from* the client.
```python
# From: models/_types.py (Simplified)
from pydantic import BaseModel
from dataclasses import dataclass
from typing import Union, List, Optional
from .. import FunctionCall
@dataclass
class RequestUsage:
prompt_tokens: int
completion_tokens: int
FinishReasons = Literal["stop", "length", "function_calls", "content_filter", "unknown"]
class CreateResult(BaseModel):
finish_reason: FinishReasons
content: Union[str, List[FunctionCall]] # LLM output
usage: RequestUsage # Token usage for this call
cached: bool
# Optional fields like logprobs, thought...
```
Using these standard types ensures that agent logic can work consistently, even if you switch the underlying LLM service by using a different `ChatCompletionClient` implementation.
## Next Steps
You now understand the role of `ChatCompletionClient` as the crucial link between AutoGen agents and the powerful capabilities of Large Language Models. It provides a standard way to send conversational history and tool definitions, receive generated text or function call requests, and track token usage.
Managing the conversation history (`messages`) sent to the client is very important. How do you ensure the LLM has the right context, especially after tool calls have happened?
* [Chapter 6: ChatCompletionContext](06_chatcompletioncontext.md): Learn how AutoGen helps manage the conversation history, including adding tool call requests and their results, before sending it to the `ChatCompletionClient`.
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)
================================================
FILE: docs/AutoGen Core/06_chatcompletioncontext.md
================================================
---
layout: default
title: "ChatCompletionContext"
parent: "AutoGen Core"
nav_order: 6
---
# Chapter 6: ChatCompletionContext - Remembering the Conversation
In [Chapter 5: ChatCompletionClient](05_chatcompletionclient.md), we learned how agents talk to Large Language Models (LLMs) using a `ChatCompletionClient`. We saw that we need to send a list of `messages` (the conversation history) to the LLM so it knows the context.
But conversations can get very long! Imagine talking on the phone for an hour. Can you remember *every single word* that was said? Probably not. You remember the main points, the beginning, and what was said most recently. LLMs have a similar limitation – they can only pay attention to a certain amount of text at once (called the "context window").
If we send the *entire* history of a very long chat, it might be too much for the LLM, lead to errors, be slow, or cost more money (since many LLMs charge based on the amount of text).
So, how do we smartly choose *which* parts of the conversation history to send? This is the problem that **`ChatCompletionContext`** solves.
## Motivation: Keeping LLM Conversations Focused
Let's say we have a helpful assistant agent chatting with a user:
1. **User:** "Hi! Can you tell me about AutoGen?"
2. **Assistant:** "Sure! AutoGen is a framework..." (provides details)
3. **User:** "Thanks! Now, can you draft an email to my team about our upcoming meeting?"
4. **Assistant:** "Okay, what's the meeting about?"
5. **User:** "It's about the project planning for Q3."
6. **Assistant:** (Needs to draft the email)
When the Assistant needs to draft the email (step 6), does it need the *exact* text from step 2 about what AutoGen is? Probably not. It definitely needs the instructions from step 3 and the topic from step 5. Maybe the initial greeting isn't super important either.
`ChatCompletionContext` acts like a **smart transcript editor**. Before sending the history to the LLM via the `ChatCompletionClient`, it reviews the full conversation log and prepares a shorter, focused version containing only the messages it thinks are most relevant for the LLM's next response.
## Key Concepts: Managing the Chat History
1. **The Full Transcript Holder:** A `ChatCompletionContext` object holds the *complete* list of messages (`LLMMessage` objects like `SystemMessage`, `UserMessage`, `AssistantMessage` from Chapter 5) that have occurred in a specific conversation thread. You add new messages using its `add_message` method.
2. **The Smart View Generator (`get_messages`):** The core job of `ChatCompletionContext` is done by its `get_messages` method. When called, it looks at the *full* transcript it holds, but returns only a *subset* of those messages based on its specific strategy. This subset is what you'll actually send to the `ChatCompletionClient`.
3. **Different Strategies for Remembering:** Because different situations require different focus, AutoGen Core provides several `ChatCompletionContext` implementations (strategies):
* **`UnboundedChatCompletionContext`:** The simplest (and sometimes riskiest!). It doesn't edit anything; `get_messages` just returns the *entire* history. Good for short chats, but can break with long ones.
* **`BufferedChatCompletionContext`:** Like remembering only the last few things someone said. It keeps the most recent `N` messages (where `N` is the `buffer_size` you set). Good for focusing on recent interactions.
* **`HeadAndTailChatCompletionContext`:** Tries to get the best of both worlds. It keeps the first few messages (the "head", maybe containing initial instructions) and the last few messages (the "tail", the recent context). It skips the messages in the middle.
## Use Case Example: Chatting with Different Memory Strategies
Let's simulate adding messages to different context managers and see what `get_messages` returns.
**Step 1: Define some messages**
```python
# File: define_chat_messages.py
from autogen_core.models import (
SystemMessage, UserMessage, AssistantMessage, LLMMessage
)
from typing import List
# The initial instruction for the assistant
system_msg = SystemMessage(content="You are a helpful assistant.")
# A sequence of user/assistant turns
chat_sequence: List[LLMMessage] = [
UserMessage(content="What is AutoGen?", source="User"),
AssistantMessage(content="AutoGen is a multi-agent framework...", source="Agent"),
UserMessage(content="What can it do?", source="User"),
AssistantMessage(content="It can build complex LLM apps.", source="Agent"),
UserMessage(content="Thanks!", source="User")
]
# Combine system message and the chat sequence
full_history: List[LLMMessage] = [system_msg] + chat_sequence
print(f"Total messages in full history: {len(full_history)}")
# Output: Total messages in full history: 6
```
We have a full history of 6 messages (1 system + 5 chat turns).
**Step 2: Use `UnboundedChatCompletionContext`**
This context keeps everything.
```python
# File: use_unbounded_context.py
import asyncio
from define_chat_messages import full_history
from autogen_core.model_context import UnboundedChatCompletionContext
async def main():
# Create context and add all messages
context = UnboundedChatCompletionContext()
for msg in full_history:
await context.add_message(msg)
# Get the messages to send to the LLM
messages_for_llm = await context.get_messages()
print(f"--- Unbounded Context ({len(messages_for_llm)} messages) ---")
for i, msg in enumerate(messages_for_llm):
print(f"{i+1}. [{msg.type}]: {msg.content[:30]}...")
# asyncio.run(main()) # If run
```
**Expected Output (Unbounded):**
```
--- Unbounded Context (6 messages) ---
1. [SystemMessage]: You are a helpful assistant....
2. [UserMessage]: What is AutoGen?...
3. [AssistantMessage]: AutoGen is a multi-agent fram...
4. [UserMessage]: What can it do?...
5. [AssistantMessage]: It can build complex LLM apps...
6. [UserMessage]: Thanks!...
```
It returns all 6 messages, exactly as added.
**Step 3: Use `BufferedChatCompletionContext`**
Let's keep only the last 3 messages.
```python
# File: use_buffered_context.py
import asyncio
from define_chat_messages import full_history
from autogen_core.model_context import BufferedChatCompletionContext
async def main():
# Keep only the last 3 messages
context = BufferedChatCompletionContext(buffer_size=3)
for msg in full_history:
await context.add_message(msg)
messages_for_llm = await context.get_messages()
print(f"--- Buffered Context (buffer=3, {len(messages_for_llm)} messages) ---")
for i, msg in enumerate(messages_for_llm):
print(f"{i+1}. [{msg.type}]: {msg.content[:30]}...")
# asyncio.run(main()) # If run
```
**Expected Output (Buffered):**
```
--- Buffered Context (buffer=3, 3 messages) ---
1. [UserMessage]: What can it do?...
2. [AssistantMessage]: It can build complex LLM apps...
3. [UserMessage]: Thanks!...
```
It only returns the last 3 messages from the full history. The system message and the first chat turn are omitted.
**Step 4: Use `HeadAndTailChatCompletionContext`**
Let's keep the first message (head=1) and the last two messages (tail=2).
```python
# File: use_head_tail_context.py
import asyncio
from define_chat_messages import full_history
from autogen_core.model_context import HeadAndTailChatCompletionContext
async def main():
# Keep first 1 and last 2 messages
context = HeadAndTailChatCompletionContext(head_size=1, tail_size=2)
for msg in full_history:
await context.add_message(msg)
messages_for_llm = await context.get_messages()
print(f"--- Head & Tail Context (h=1, t=2, {len(messages_for_llm)} messages) ---")
for i, msg in enumerate(messages_for_llm):
print(f"{i+1}. [{msg.type}]: {msg.content[:30]}...")
# asyncio.run(main()) # If run
```
**Expected Output (Head & Tail):**
```
--- Head & Tail Context (h=1, t=2, 4 messages) ---
1. [SystemMessage]: You are a helpful assistant....
2. [UserMessage]: Skipped 3 messages....
3. [AssistantMessage]: It can build complex LLM apps...
4. [UserMessage]: Thanks!...
```
It keeps the very first message (`SystemMessage`), then inserts a placeholder telling the LLM that some messages were skipped, and finally includes the last two messages. This preserves the initial instruction and the most recent context.
**Which one to choose?** It depends on your agent's task!
* Simple Q&A? `Buffered` might be fine.
* Following complex initial instructions? `HeadAndTail` or even `Unbounded` (if short) might be better.
## Under the Hood: How Context is Managed
The core idea is defined by the `ChatCompletionContext` abstract base class.
**Conceptual Flow:**
```mermaid
sequenceDiagram
participant Agent as Agent Logic
participant Context as ChatCompletionContext
participant FullHistory as Internal Message List
Agent->>+Context: add_message(newMessage)
Context->>+FullHistory: Append newMessage to list
FullHistory-->>-Context: List updated
Context-->>-Agent: Done
Agent->>+Context: get_messages()
Context->>+FullHistory: Read the full list
FullHistory-->>-Context: Return full list
Context->>Context: Apply Strategy (e.g., slice list for Buffered/HeadTail)
Context-->>-Agent: Return selected list of messages
```
1. **Adding:** When `add_message(message)` is called, the context simply appends the `message` to its internal list (`self._messages`).
2. **Getting:** When `get_messages()` is called:
* The context accesses its internal `self._messages` list.
* The specific implementation (`Unbounded`, `Buffered`, `HeadAndTail`) applies its logic to select which messages to return.
* It returns the selected list.
**Code Glimpse:**
* **Base Class (`_chat_completion_context.py`):** Defines the structure and common methods.
```python
# From: model_context/_chat_completion_context.py (Simplified)
from abc import ABC, abstractmethod
from typing import List
from ..models import LLMMessage
class ChatCompletionContext(ABC):
component_type = "chat_completion_context" # Identifies this as a component type
def __init__(self, initial_messages: List[LLMMessage] | None = None) -> None:
# Holds the COMPLETE history
self._messages: List[LLMMessage] = initial_messages or []
async def add_message(self, message: LLMMessage) -> None:
"""Add a message to the full context."""
self._messages.append(message)
@abstractmethod
async def get_messages(self) -> List[LLMMessage]:
"""Get the subset of messages based on the strategy."""
# Each subclass MUST implement this logic
...
# Other methods like clear(), save_state(), load_state() exist too
```
The base class handles storing messages; subclasses define *how* to retrieve them.
* **Unbounded (`_unbounded_chat_completion_context.py`):** The simplest implementation.
```python
# From: model_context/_unbounded_chat_completion_context.py (Simplified)
from typing import List
from ._chat_completion_context import ChatCompletionContext
from ..models import LLMMessage
class UnboundedChatCompletionContext(ChatCompletionContext):
async def get_messages(self) -> List[LLMMessage]:
"""Returns all messages."""
return self._messages # Just return the whole internal list
```
* **Buffered (`_buffered_chat_completion_context.py`):** Uses slicing to get the end of the list.
```python
# From: model_context/_buffered_chat_completion_context.py (Simplified)
from typing import List
from ._chat_completion_context import ChatCompletionContext
from ..models import LLMMessage, FunctionExecutionResultMessage
class BufferedChatCompletionContext(ChatCompletionContext):
def __init__(self, buffer_size: int, ...):
super().__init__(...)
self._buffer_size = buffer_size
async def get_messages(self) -> List[LLMMessage]:
"""Get at most `buffer_size` recent messages."""
# Slice the list to get the last 'buffer_size' items
messages = self._messages[-self._buffer_size :]
# Special case: Avoid starting with a function result message
if messages and isinstance(messages[0], FunctionExecutionResultMessage):
messages = messages[1:]
return messages
```
* **Head and Tail (`_head_and_tail_chat_completion_context.py`):** Combines slices from the beginning and end.
```python
# From: model_context/_head_and_tail_chat_completion_context.py (Simplified)
from typing import List
from ._chat_completion_context import ChatCompletionContext
from ..models import LLMMessage, UserMessage
class HeadAndTailChatCompletionContext(ChatCompletionContext):
def __init__(self, head_size: int, tail_size: int, ...):
super().__init__(...)
self._head_size = head_size
self._tail_size = tail_size
async def get_messages(self) -> List[LLMMessage]:
head = self._messages[: self._head_size] # First 'head_size' items
tail = self._messages[-self._tail_size :] # Last 'tail_size' items
num_skipped = len(self._messages) - len(head) - len(tail)
if num_skipped <= 0: # If no overlap or gap
return self._messages
else: # If messages were skipped
placeholder = [UserMessage(content=f"Skipped {num_skipped} messages.", source="System")]
# Combine head + placeholder + tail
return head + placeholder + tail
```
These implementations provide different ways to manage the context window effectively.
## Putting it Together with ChatCompletionClient
How does an agent use `ChatCompletionContext` with the `ChatCompletionClient` from Chapter 5?
1. An agent has an instance of a `ChatCompletionContext` (e.g., `BufferedChatCompletionContext`) to store its conversation history.
2. When the agent receives a new message (e.g., a `UserMessage`), it calls `await context.add_message(new_user_message)`.
3. To prepare for calling the LLM, the agent calls `messages_to_send = await context.get_messages()`. This gets the strategically selected subset of the history.
4. The agent then passes this list to the `ChatCompletionClient`: `response = await llm_client.create(messages=messages_to_send, ...)`.
5. When the LLM replies (e.g., with an `AssistantMessage`), the agent adds it back to the context: `await context.add_message(llm_response_message)`.
This loop ensures that the history is continuously updated and intelligently trimmed before each call to the LLM.
## Next Steps
You've learned how `ChatCompletionContext` helps manage the conversation history sent to LLMs, preventing context window overflows and keeping the interaction focused using different strategies (`Unbounded`, `Buffered`, `HeadAndTail`).
This context management is a specific form of **memory**. Agents might need to remember things beyond just the chat history. How do they store general information, state, or knowledge over time?
* [Chapter 7: Memory](07_memory.md): Explore the broader concept of Memory in AutoGen Core, which provides more general ways for agents to store and retrieve information.
* [Chapter 8: Component](08_component.md): Understand how `ChatCompletionContext` fits into the general `Component` model, allowing configuration and integration within the AutoGen system.
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)
================================================
FILE: docs/AutoGen Core/07_memory.md
================================================
---
layout: default
title: "Memory"
parent: "AutoGen Core"
nav_order: 7
---
# Chapter 7: Memory - The Agent's Notebook
In [Chapter 6: ChatCompletionContext](06_chatcompletioncontext.md), we saw how agents manage the *short-term* history of a single conversation before talking to an LLM. It's like remembering what was just said in the last few minutes.
But what if an agent needs to remember things for much longer, across *multiple* conversations or tasks? For example, imagine an assistant agent that learns your preferences:
* You tell it: "Please always write emails in a formal style for me."
* Weeks later, you ask it to draft a new email.
How does it remember that preference? The short-term `ChatCompletionContext` might have forgotten the earlier instruction, especially if using a strategy like `BufferedChatCompletionContext`. The agent needs a **long-term memory**.
This is where the **`Memory`** abstraction comes in. Think of it as the agent's **long-term notebook or database**. While `ChatCompletionContext` is the scratchpad for the current chat, `Memory` holds persistent information the agent can add to or look up later.
## Motivation: Remembering Across Conversations
Our goal is to give an agent the ability to store a piece of information (like a user preference) and retrieve it later to influence its behavior, even in a completely new conversation. `Memory` provides the mechanism for this long-term storage and retrieval.
## Key Concepts: How the Notebook Works
1. **What it Stores (`MemoryContent`):** Agents can store various types of information in their memory. This could be:
* Plain text notes (`text/plain`)
* Structured data like JSON (`application/json`)
* Even images (`image/*`)
Each piece of information is wrapped in a `MemoryContent` object, which includes the data itself, its type (`mime_type`), and optional descriptive `metadata`.
```python
# From: memory/_base_memory.py (Simplified Concept)
from pydantic import BaseModel
from typing import Any, Dict, Union
# Represents one entry in the memory notebook
class MemoryContent(BaseModel):
content: Union[str, bytes, Dict[str, Any]] # The actual data
mime_type: str # What kind of data (e.g., "text/plain")
metadata: Dict[str, Any] | None = None # Extra info (optional)
```
This standard format helps manage different kinds of memories.
2. **Adding to Memory (`add`):** When an agent learns something important it wants to remember long-term (like the user's preferred style), it uses the `memory.add(content)` method. This is like writing a new entry in the notebook.
3. **Querying Memory (`query`):** When an agent needs to recall information, it can use `memory.query(query_text)`. This is like searching the notebook for relevant entries. How the search works depends on the specific memory implementation (it could be a simple text match, or a sophisticated vector search in more advanced memories).
4. **Updating Chat Context (`update_context`):** This is a crucial link! Before an agent talks to the LLM (using the `ChatCompletionClient` from [Chapter 5](05_chatcompletionclient.md)), it can use `memory.update_context(chat_context)` method. This method:
* Looks at the current conversation (`chat_context`).
* Queries the long-term memory (`Memory`) for relevant information.
* Injects the retrieved memories *into* the `chat_context`, often as a `SystemMessage`.
This way, the LLM gets the benefit of the long-term memory *in addition* to the short-term conversation history, right before generating its response.
5. **Different Memory Implementations:** Just like there are different `ChatCompletionContext` strategies, there can be different `Memory` implementations:
* `ListMemory`: A very simple memory that stores everything in a Python list (like a simple chronological notebook).
* *Future Possibilities*: More advanced implementations could use databases or vector stores for more efficient storage and retrieval of vast amounts of information.
## Use Case Example: Remembering User Preferences with `ListMemory`
Let's implement our user preference use case using the simple `ListMemory`.
**Goal:**
1. Create a `ListMemory`.
2. Add a user preference ("formal style") to it.
3. Start a *new* chat context.
4. Use `update_context` to inject the preference into the new chat context.
5. Show how the chat context looks *before* being sent to the LLM.
**Step 1: Create the Memory**
We'll use `ListMemory`, the simplest implementation provided by AutoGen Core.
```python
# File: create_list_memory.py
from autogen_core.memory import ListMemory
# Create a simple list-based memory instance
user_prefs_memory = ListMemory(name="user_preferences")
print(f"Created memory: {user_prefs_memory.name}")
print(f"Initial content: {user_prefs_memory.content}")
# Output:
# Created memory: user_preferences
# Initial content: []
```
We have an empty memory notebook named "user_preferences".
**Step 2: Add the Preference**
Let's add the user's preference as a piece of text memory.
```python
# File: add_preference.py
import asyncio
from autogen_core.memory import MemoryContent
# Assume user_prefs_memory exists from the previous step
# Define the preference as MemoryContent
preference = MemoryContent(
content="User prefers all communication to be written in a formal style.",
mime_type="text/plain", # It's just text
metadata={"source": "user_instruction_conversation_1"} # Optional info
)
async def add_to_memory():
# Add the content to our memory instance
await user_prefs_memory.add(preference)
print(f"Memory content after adding: {user_prefs_memory.content}")
asyncio.run(add_to_memory())
# Output (will show the MemoryContent object):
# Memory content after adding: [MemoryContent(content='User prefers...', mime_type='text/plain', metadata={'source': '...'})]
```
We've successfully written the preference into our `ListMemory` notebook.
**Step 3: Start a New Chat Context**
Imagine time passes, and the user starts a new conversation asking for an email draft. We create a fresh `ChatCompletionContext`.
```python
# File: start_new_chat.py
from autogen_core.model_context import UnboundedChatCompletionContext
from autogen_core.models import UserMessage
# Start a new, empty chat context for a new task
new_chat_context = UnboundedChatCompletionContext()
# Add the user's new request
new_request = UserMessage(content="Draft an email to the team about the Q3 results.", source="User")
# await new_chat_context.add_message(new_request) # In a real app, add the request
print("Created a new, empty chat context.")
# Output: Created a new, empty chat context.
```
This context currently *doesn't* know about the "formal style" preference stored in our long-term memory.
**Step 4: Inject Memory into Chat Context**
Before sending the `new_chat_context` to the LLM, we use `update_context` to bring in relevant long-term memories.
```python
# File: update_chat_with_memory.py
import asyncio
# Assume user_prefs_memory exists (with the preference added)
# Assume new_chat_context exists (empty or with just the new request)
# Assume new_request exists
async def main():
# --- This is where Memory connects to Chat Context ---
print("Updating chat context with memory...")
update_result = await user_prefs_memory.update_context(new_chat_context)
print(f"Memories injected: {len(update_result.memories.results)}")
# Now let's add the actual user request for this task
await new_chat_context.add_message(new_request)
# See what messages are now in the context
messages_for_llm = await new_chat_context.get_messages()
print("\nMessages to be sent to LLM:")
for msg in messages_for_llm:
print(f"- [{msg.type}]: {msg.content}")
asyncio.run(main())
```
**Expected Output:**
```
Updating chat context with memory...
Memories injected: 1
Messages to be sent to LLM:
- [SystemMessage]:
Relevant memory content (in chronological order):
1. User prefers all communication to be written in a formal style.
- [UserMessage]: Draft an email to the team about the Q3 results.
```
Look! The `ListMemory.update_context` method automatically queried the memory (in this simple case, it just takes *all* entries) and added a `SystemMessage` to the `new_chat_context`. This message explicitly tells the LLM about the stored preference *before* it sees the user's request to draft the email.
**Step 5: (Conceptual) Sending to LLM**
Now, if we were to send `messages_for_llm` to the `ChatCompletionClient` (Chapter 5):
```python
# Conceptual code - Requires a configured client
# response = await llm_client.create(messages=messages_for_llm)
```
The LLM would receive both the instruction about the formal style preference (from Memory) and the request to draft the email. It's much more likely to follow the preference now!
**Step 6: Direct Query (Optional)**
We can also directly query the memory if needed, without involving a chat context.
```python
# File: query_memory.py
import asyncio
# Assume user_prefs_memory exists
async def main():
# Query the memory (ListMemory returns all items regardless of query text)
query_result = await user_prefs_memory.query("style preference")
print("\nDirect query result:")
for item in query_result.results:
print(f"- Content: {item.content}, Type: {item.mime_type}")
asyncio.run(main())
# Output:
# Direct query result:
# - Content: User prefers all communication to be written in a formal style., Type: text/plain
```
This shows how an agent could specifically look things up in its notebook.
## Under the Hood: How `ListMemory` Injects Context
Let's trace the `update_context` call for `ListMemory`.
**Conceptual Flow:**
```mermaid
sequenceDiagram
participant AgentLogic as Agent Logic
participant ListMem as ListMemory
participant InternalList as Memory's Internal List
participant ChatCtx as ChatCompletionContext
AgentLogic->>+ListMem: update_context(chat_context)
ListMem->>+InternalList: Get all stored MemoryContent items
InternalList-->>-ListMem: Return list of [pref_content]
alt Memory list is NOT empty
ListMem->>ListMem: Format memories into a single string (e.g., "1. pref_content")
ListMem->>ListMem: Create SystemMessage with formatted string
ListMem->>+ChatCtx: add_message(SystemMessage)
ChatCtx-->>-ListMem: Context updated
end
ListMem->>ListMem: Create UpdateContextResult(memories=[pref_content])
ListMem-->>-AgentLogic: Return UpdateContextResult
```
1. The agent calls `user_prefs_memory.update_context(new_chat_context)`.
2. The `ListMemory` instance accesses its internal `_contents` list.
3. It checks if the list is empty. If not:
4. It iterates through the `MemoryContent` items in the list.
5. It formats them into a numbered string (like "Relevant memory content...\n1. Item 1\n2. Item 2...").
6. It creates a single `SystemMessage` containing this formatted string.
7. It calls `new_chat_context.add_message()` to add this `SystemMessage` to the chat history that will be sent to the LLM.
8. It returns an `UpdateContextResult` containing the list of memories it just processed.
**Code Glimpse:**
* **`Memory` Protocol (`memory/_base_memory.py`):** Defines the required methods for any memory implementation.
```python
# From: memory/_base_memory.py (Simplified ABC)
from abc import ABC, abstractmethod
# ... other imports: MemoryContent, MemoryQueryResult, UpdateContextResult, ChatCompletionContext
class Memory(ABC):
component_type = "memory"
@abstractmethod
async def update_context(self, model_context: ChatCompletionContext) -> UpdateContextResult: ...
@abstractmethod
async def query(self, query: str | MemoryContent, ...) -> MemoryQueryResult: ...
@abstractmethod
async def add(self, content: MemoryContent, ...) -> None: ...
@abstractmethod
async def clear(self) -> None: ...
@abstractmethod
async def close(self) -> None: ...
```
Any class wanting to act as Memory must provide these methods.
* **`ListMemory` Implementation (`memory/_list_memory.py`):**
```python
# From: memory/_list_memory.py (Simplified)
from typing import List
# ... other imports: Memory, MemoryContent, ..., SystemMessage, ChatCompletionContext
class ListMemory(Memory):
def __init__(self, ..., memory_contents: List[MemoryContent] | None = None):
# Stores memory items in a simple list
self._contents: List[MemoryContent] = memory_contents or []
async def add(self, content: MemoryContent, ...) -> None:
"""Add new content to the internal list."""
self._contents.append(content)
async def query(self, query: str | MemoryContent = "", ...) -> MemoryQueryResult:
"""Return all memories, ignoring the query."""
# Simple implementation: just return everything
return MemoryQueryResult(results=self._contents)
async def update_context(self, model_context: ChatCompletionContext) -> UpdateContextResult:
"""Add all memories as a SystemMessage to the chat context."""
if not self._contents: # Do nothing if memory is empty
return UpdateContextResult(memories=MemoryQueryResult(results=[]))
# Format all memories into a numbered list string
memory_strings = [f"{i}. {str(mem.content)}" for i, mem in enumerate(self._contents, 1)]
memory_context_str = "Relevant memory content...\n" + "\n".join(memory_strings) + "\n"
# Add this string as a SystemMessage to the provided chat context
await model_context.add_message(SystemMessage(content=memory_context_str))
# Return info about which memories were added
return UpdateContextResult(memories=MemoryQueryResult(results=self._contents))
# ... clear(), close(), config methods ...
```
This shows the straightforward logic of `ListMemory`: store in a list, retrieve the whole list, and inject the whole list as a single system message into the chat context. More complex memories might use smarter retrieval (e.g., based on the `query` in `query()` or the last message in `update_context`) and inject memories differently.
## Next Steps
You've learned about `Memory`, AutoGen Core's mechanism for giving agents long-term recall beyond the immediate conversation (`ChatCompletionContext`). We saw how `MemoryContent` holds information, `add` stores it, `query` retrieves it, and `update_context` injects relevant memories into the LLM's working context. We explored the simple `ListMemory` as a basic example.
Memory systems are crucial for agents that learn, adapt, or need to maintain state across interactions.
This concludes our deep dive into the core abstractions of AutoGen Core! We've covered Agents, Messaging, Runtime, Tools, LLM Clients, Chat Context, and now Memory. There's one final concept that ties many of these together from a configuration perspective:
* [Chapter 8: Component](08_component.md): Understand the general `Component` model in AutoGen Core, how it allows pieces like `Memory`, `ChatCompletionContext`, and `ChatCompletionClient` to be configured and managed consistently.
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)
================================================
FILE: docs/AutoGen Core/08_component.md
================================================
---
layout: default
title: "Component"
parent: "AutoGen Core"
nav_order: 8
---
# Chapter 8: Component - The Standardized Building Blocks
Welcome to Chapter 8! In our journey so far, we've met several key players in AutoGen Core:
* [Agents](01_agent.md): The workers.
* [Messaging System](02_messaging_system__topic___subscription_.md): How they communicate.
* [AgentRuntime](03_agentruntime.md): The manager.
* [Tools](04_tool.md): Their special skills.
* [ChatCompletionClient](05_chatcompletionclient.md): How they talk to LLMs.
* [ChatCompletionContext](06_chatcompletioncontext.md): How they remember recent chat history.
* [Memory](07_memory.md): How they remember things long-term.
Now, imagine you've built a fantastic agent system using these parts. You've configured a specific `ChatCompletionClient` to use OpenAI's `gpt-4o` model, and you've set up a `ListMemory` (from Chapter 7) to store user preferences. How do you save this exact setup so you can easily recreate it later, or share it with a friend? And what if you later want to swap out the `gpt-4o` client for a different one, like Anthropic's Claude, without rewriting your agent's core logic?
This is where the **`Component`** concept comes in. It provides a standard way to define, configure, save, and load these reusable building blocks.
## Motivation: Making Setups Portable and Swappable
Think of the parts we've used so far – `ChatCompletionClient`, `Memory`, `Tool` – like specialized **Lego bricks**. Each brick has a specific function (connecting to an LLM, remembering things, performing an action).
Wouldn't it be great if:
1. Each Lego brick had a standard way to describe its properties (like "Red 2x4 Brick")?
2. You could easily save the description of all the bricks used in your creation (your agent system)?
3. Someone else could take that description and automatically rebuild your exact creation?
4. You could easily swap a "Red 2x4 Brick" for a "Blue 2x4 Brick" without having to rebuild everything around it?
The `Component` abstraction in AutoGen Core provides exactly this! It makes your building blocks **configurable**, **savable**, **loadable**, and **swappable**.
## Key Concepts: Understanding Components
Let's break down what makes the Component system work:
1. **Component:** A class (like `ListMemory` or `OpenAIChatCompletionClient`) that is designed to be a standard, reusable building block. It performs a specific role within the AutoGen ecosystem. Many core classes inherit from `Component` or related base classes.
2. **Configuration (`Config`):** Every Component has specific settings. For example, an `OpenAIChatCompletionClient` needs an API key and a model name. A `ListMemory` might have a name. These settings are defined in a standard way, usually using a Pydantic `BaseModel` specific to that component type. This `Config` acts like the "specification sheet" for the component instance.
3. **Saving Settings (`_to_config` method):** A Component instance knows how to generate its *current* configuration. It has an internal method, `_to_config()`, that returns a `Config` object representing its settings. This is like asking a configured Lego brick, "What color and size are you?"
4. **Loading Settings (`_from_config` class method):** A Component *class* knows how to create a *new* instance of itself from a given configuration. It has a class method, `_from_config(config)`, that takes a `Config` object and builds a new, configured component instance. This is like having instructions: "Build a brick with this color and size."
5. **`ComponentModel` (The Box):** This is the standard package format used to save and load components. It's like the label and instructions on the Lego box. A `ComponentModel` contains:
* `provider`: A string telling AutoGen *which* Python class to use (e.g., `"autogen_core.memory.ListMemory"`).
* `config`: A dictionary holding the specific settings for this instance (the output of `_to_config()`).
* `component_type`: The general role of the component (e.g., `"memory"`, `"model"`, `"tool"`).
* Other metadata like `version`, `description`, `label`.
```python
# From: _component_config.py (Conceptual Structure)
from pydantic import BaseModel
from typing import Dict, Any
class ComponentModel(BaseModel):
provider: str # Path to the class (e.g., "autogen_core.memory.ListMemory")
config: Dict[str, Any] # The specific settings for this instance
component_type: str | None = None # Role (e.g., "memory")
# ... other fields like version, description, label ...
```
This `ComponentModel` is what you typically save to a file (often as JSON or YAML).
## Use Case Example: Saving and Loading `ListMemory`
Let's see how this works with the `ListMemory` we used in [Chapter 7: Memory](07_memory.md).
**Goal:**
1. Create a `ListMemory` instance.
2. Save its configuration using the Component system (`dump_component`).
3. Load that configuration to create a *new*, identical `ListMemory` instance (`load_component`).
**Step 1: Create and Configure a `ListMemory`**
First, let's make a memory component. `ListMemory` is already designed as a Component.
```python
# File: create_memory_component.py
import asyncio
from autogen_core.memory import ListMemory, MemoryContent
# Create an instance of ListMemory
my_memory = ListMemory(name="user_prefs_v1")
# Add some content (from Chapter 7 example)
async def add_content():
pref = MemoryContent(content="Use formal style", mime_type="text/plain")
await my_memory.add(pref)
print(f"Created memory '{my_memory.name}' with content: {my_memory.content}")
asyncio.run(add_content())
# Output: Created memory 'user_prefs_v1' with content: [MemoryContent(content='Use formal style', mime_type='text/plain', metadata=None)]
```
We have our configured `my_memory` instance.
**Step 2: Save the Configuration (`dump_component`)**
Now, let's ask this component instance to describe itself by creating a `ComponentModel`.
```python
# File: save_memory_config.py
# Assume 'my_memory' exists from the previous step
# Dump the component's configuration into a ComponentModel
memory_model = my_memory.dump_component()
# Let's print it (converting to dict for readability)
print("Saved ComponentModel:")
print(memory_model.model_dump_json(indent=2))
```
**Expected Output:**
```json
Saved ComponentModel:
{
"provider": "autogen_core.memory.ListMemory",
"component_type": "memory",
"version": 1,
"component_version": 1,
"description": "ListMemory stores memory content in a simple list.",
"label": "ListMemory",
"config": {
"name": "user_prefs_v1",
"memory_contents": [
{
"content": "Use formal style",
"mime_type": "text/plain",
"metadata": null
}
]
}
}
```
Look at the output! `dump_component` created a `ComponentModel` that contains:
* `provider`: Exactly which class to use (`autogen_core.memory.ListMemory`).
* `config`: The specific settings, including the `name` and even the `memory_contents` we added!
* `component_type`: Its role is `"memory"`.
* Other useful info like description and version.
You could save this JSON structure to a file (`my_memory_config.json`).
**Step 3: Load the Configuration (`load_component`)**
Now, imagine you're starting a new script or sharing the config file. You can load this `ComponentModel` to recreate the memory instance.
```python
# File: load_memory_config.py
from autogen_core import ComponentModel
from autogen_core.memory import ListMemory # Need the class for type hint/loading
# Assume 'memory_model' is the ComponentModel we just created
# (or loaded from a file)
print(f"Loading component from ComponentModel (Provider: {memory_model.provider})...")
# Use the ComponentLoader mechanism (available on Component classes)
# to load the model. We specify the expected type (ListMemory).
loaded_memory: ListMemory = ListMemory.load_component(memory_model)
print(f"Successfully loaded memory!")
print(f"- Name: {loaded_memory.name}")
print(f"- Content: {loaded_memory.content}")
```
**Expected Output:**
```
Loading component from ComponentModel (Provider: autogen_core.memory.ListMemory)...
Successfully loaded memory!
- Name: user_prefs_v1
- Content: [MemoryContent(content='Use formal style', mime_type='text/plain', metadata=None)]
```
Success! `load_component` read the `ComponentModel`, found the right class (`ListMemory`), used its `_from_config` method with the saved `config` data, and created a brand new `loaded_memory` instance that is identical to our original `my_memory`.
**Benefits Shown:**
* **Reproducibility:** We saved the exact state (including content!) and loaded it perfectly.
* **Configuration:** We could easily save this to a JSON/YAML file and manage it outside our Python code.
* **Modularity (Conceptual):** If `ListMemory` and `VectorDBMemory` were both Components of type "memory", we could potentially load either one from a configuration file just by changing the `provider` and `config` in the file, without altering the agent code that *uses* the memory component (assuming the agent interacts via the standard `Memory` interface from Chapter 7).
## Under the Hood: How Saving and Loading Work
Let's peek behind the curtain.
**Saving (`dump_component`) Flow:**
```mermaid
sequenceDiagram
participant User
participant MyMemory as my_memory (ListMemory instance)
participant ListMemConfig as ListMemoryConfig (Pydantic Model)
participant CompModel as ComponentModel
User->>+MyMemory: dump_component()
MyMemory->>MyMemory: Calls internal self._to_config()
MyMemory->>+ListMemConfig: Creates Config object (name="...", contents=[...])
ListMemConfig-->>-MyMemory: Returns Config object
MyMemory->>MyMemory: Gets provider string ("autogen_core.memory.ListMemory")
MyMemory->>MyMemory: Gets component_type ("memory"), version, etc.
MyMemory->>+CompModel: Creates ComponentModel(provider=..., config=config_dict, ...)
CompModel-->>-MyMemory: Returns ComponentModel instance
MyMemory-->>-User: Returns ComponentModel instance
```
1. You call `my_memory.dump_component()`.
2. It calls its own `_to_config()` method. For `ListMemory`, this gathers the `name` and current `_contents`.
3. `_to_config()` returns a `ListMemoryConfig` object (a Pydantic model) holding these values.
4. `dump_component()` takes this `ListMemoryConfig` object, converts its data into a dictionary (`config` field).
5. It figures out its own class path (`provider`) and other metadata (`component_type`, `version`, etc.).
6. It packages all this into a `ComponentModel` object and returns it.
**Loading (`load_component`) Flow:**
```mermaid
sequenceDiagram
participant User
participant Loader as ComponentLoader (e.g., ListMemory.load_component)
participant Importer as Python Import System
participant ListMemClass as ListMemory (Class definition)
participant ListMemConfig as ListMemoryConfig (Pydantic Model)
participant NewMemory as New ListMemory Instance
User->>+Loader: load_component(component_model)
Loader->>Loader: Reads provider ("autogen_core.memory.ListMemory") from model
Loader->>+Importer: Imports the class `autogen_core.memory.ListMemory`
Importer-->>-Loader: Returns ListMemory class object
Loader->>+ListMemClass: Checks if it's a valid Component class
Loader->>ListMemClass: Gets expected config schema (ListMemoryConfig)
Loader->>+ListMemConfig: Validates `config` dict from model against schema
ListMemConfig-->>-Loader: Returns validated ListMemoryConfig object
Loader->>+ListMemClass: Calls _from_config(validated_config)
ListMemClass->>+NewMemory: Creates new ListMemory instance using config
NewMemory-->>-ListMemClass: Returns new instance
ListMemClass-->>-Loader: Returns new instance
Loader-->>-User: Returns the new ListMemory instance
```
1. You call `ListMemory.load_component(memory_model)`.
2. The loader reads the `provider` string from `memory_model`.
3. It dynamically imports the class specified by `provider`.
4. It verifies this class is a proper `Component` subclass.
5. It finds the configuration schema defined by the class (e.g., `ListMemoryConfig`).
6. It validates the `config` dictionary from `memory_model` using this schema.
7. It calls the class's `_from_config()` method, passing the validated configuration object.
8. `_from_config()` uses the configuration data to initialize and return a new instance of the class (e.g., a new `ListMemory` with the loaded name and content).
9. The loader returns this newly created instance.
**Code Glimpse:**
The core logic lives in `_component_config.py`.
* **`Component` Base Class:** Classes like `ListMemory` inherit from `Component`. This requires them to define `component_type`, `component_config_schema`, and implement `_to_config()` and `_from_config()`.
```python
# From: _component_config.py (Simplified Concept)
from pydantic import BaseModel
from typing import Type, TypeVar, Generic, ClassVar
# ... other imports
ConfigT = TypeVar("ConfigT", bound=BaseModel)
class Component(Generic[ConfigT]): # Generic over its config type
# Required Class Variables for Concrete Components
component_type: ClassVar[str]
component_config_schema: Type[ConfigT]
# Required Instance Method for Saving
def _to_config(self) -> ConfigT:
raise NotImplementedError
# Required Class Method for Loading
@classmethod
def _from_config(cls, config: ConfigT) -> Self:
raise NotImplementedError
# dump_component and load_component are also part of the system
# (often inherited from base classes like ComponentBase)
def dump_component(self) -> ComponentModel: ...
@classmethod
def load_component(cls, model: ComponentModel | Dict[str, Any]) -> Self: ...
```
* **`ComponentModel`:** As shown before, a Pydantic model to hold the `provider`, `config`, `type`, etc.
* **`dump_component` Implementation (Conceptual):**
```python
# Inside ComponentBase or similar
def dump_component(self) -> ComponentModel:
# 1. Get the specific config from the instance
obj_config: BaseModel = self._to_config()
config_dict = obj_config.model_dump() # Convert to dictionary
# 2. Determine the provider string (class path)
provider_str = _type_to_provider_str(self.__class__)
# (Handle overrides like self.component_provider_override)
# 3. Get other metadata
comp_type = self.component_type
comp_version = self.component_version
# ... description, label ...
# 4. Create and return the ComponentModel
model = ComponentModel(
provider=provider_str,
config=config_dict,
component_type=comp_type,
version=comp_version,
# ... other metadata ...
)
return model
```
* **`load_component` Implementation (Conceptual):**
```python
# Inside ComponentLoader or similar
@classmethod
def load_component(cls, model: ComponentModel | Dict[str, Any]) -> Self:
# 1. Ensure we have a ComponentModel object
if isinstance(model, dict):
loaded_model = ComponentModel(**model)
else:
loaded_model = model
# 2. Import the class based on the provider string
provider_str = loaded_model.provider
# ... (handle WELL_KNOWN_PROVIDERS mapping) ...
module_path, class_name = provider_str.rsplit(".", 1)
module = importlib.import_module(module_path)
component_class = getattr(module, class_name)
# 3. Validate the class and config
if not is_component_class(component_class): # Check it's a valid Component
raise TypeError(...)
schema = component_class.component_config_schema
validated_config = schema.model_validate(loaded_model.config)
# 4. Call the class's factory method to create instance
instance = component_class._from_config(validated_config)
# 5. Return the instance (after type checks)
return instance
```
This system provides a powerful and consistent way to manage the building blocks of your AutoGen applications.
## Wrapping Up
Congratulations! You've reached the end of our core concepts tour. You now understand the `Component` model – AutoGen Core's standard way to define configurable, savable, and loadable building blocks like `Memory`, `ChatCompletionClient`, `Tool`, and even aspects of `Agents` themselves.
* **Components** are like standardized Lego bricks.
* They use **`_to_config`** to describe their settings.
* They use **`_from_config`** to be built from settings.
* **`ComponentModel`** is the standard "box" storing the provider and config, enabling saving/loading (often via JSON/YAML).
This promotes:
* **Modularity:** Easily swap implementations (e.g., different LLM clients).
* **Reproducibility:** Save and load exact agent system configurations.
* **Configuration:** Manage settings in external files.
With these eight core concepts (`Agent`, `Messaging`, `AgentRuntime`, `Tool`, `ChatCompletionClient`, `ChatCompletionContext`, `Memory`, and `Component`), you have a solid foundation for understanding and building powerful multi-agent applications with AutoGen Core!
Happy building!
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)
================================================
FILE: docs/AutoGen Core/index.md
================================================
---
layout: default
title: "AutoGen Core"
nav_order: 3
has_children: true
---
# Tutorial: AutoGen Core
> This tutorial is AI-generated! To learn more, check out [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)
AutoGen Core[View Repo](https://github.com/microsoft/autogen/tree/e45a15766746d95f8cfaaa705b0371267bec812e/python/packages/autogen-core/src/autogen_core) helps you build applications with multiple **_Agents_** that can work together.
Think of it like creating a team of specialized workers (*Agents*) who can communicate and use tools to solve problems.
The **_AgentRuntime_** acts as the manager, handling messages and agent lifecycles.
Agents communicate using a **_Messaging System_** (Topics and Subscriptions), can use **_Tools_** for specific tasks, interact with language models via a **_ChatCompletionClient_** while managing conversation history with **_ChatCompletionContext_**, and remember information using **_Memory_**.
**_Components_** provide a standard way to define and configure these building blocks.
```mermaid
flowchart TD
A0["0: Agent"]
A1["1: AgentRuntime"]
A2["2: Messaging System (Topic & Subscription)"]
A3["3: Component"]
A4["4: Tool"]
A5["5: ChatCompletionClient"]
A6["6: ChatCompletionContext"]
A7["7: Memory"]
A1 -- "Manages lifecycle" --> A0
A1 -- "Uses for message routing" --> A2
A0 -- "Uses LLM client" --> A5
A0 -- "Executes tools" --> A4
A0 -- "Accesses memory" --> A7
A5 -- "Gets history from" --> A6
A5 -- "Uses tool schema" --> A4
A7 -- "Updates LLM context" --> A6
A4 -- "Implemented as" --> A3
```
================================================
FILE: docs/Browser Use/01_agent.md
================================================
---
layout: default
title: "Agent"
parent: "Browser Use"
nav_order: 1
---
# Chapter 1: The Agent - Your Browser Assistant's Brain
Welcome to the `Browser Use` tutorial! We're excited to help you learn how to automate web tasks using the power of Large Language Models (LLMs).
Imagine you want to perform a simple task, like searching Google for "cute cat pictures" and clicking on the very first image result. For a human, this is easy! You open your browser, type in the search, look at the results, and click.
But how do you tell a computer program to do this? It needs to understand the goal, look at the webpage like a human does, decide what to click or type next, and then actually perform those actions. This is where the **Agent** comes in.
## What Problem Does the Agent Solve?
The Agent is the core orchestrator, the "brain" or "project manager" of your browser automation task. It connects all the different pieces needed to achieve your goal. Without the Agent, you'd have a bunch of tools (like a browser controller and an LLM) but no central coordinator telling them what to do and when.
The Agent solves the problem of turning a high-level goal (like "find cat pictures") into concrete actions on a webpage, using intelligence to adapt to what it "sees" in the browser.
## Meet the Agent: Your Project Manager
Think of the `Agent` like a project manager overseeing a complex task. It doesn't do *all* the work itself, but it coordinates specialists:
1. **Receives the Task:** You give the Agent the overall goal (e.g., "Search Google for 'cute cat pictures' and click the first image result.").
2. **Consults the Planner (LLM):** The Agent shows the current state of the webpage (using the [BrowserContext](03_browsercontext.md)) to a Large Language Model (LLM). It asks, "Here's the goal, and here's what the webpage looks like right now. What should be the very next step?" The LLM acts as a smart planner, suggesting actions like "type 'cute cat pictures' into the search bar" or "click the element with index 5". We'll learn more about how we instruct the LLM in the [System Prompt](02_system_prompt.md) chapter.
3. **Manages History:** The Agent keeps track of everything that has happened so far – the actions taken, the results, and the state of the browser at each step. This "memory" is managed by the [Message Manager](06_message_manager.md) and helps the LLM make better decisions.
4. **Instructs the Doer (Controller):** Once the LLM suggests an action (like "click element 5"), the Agent tells the [Action Controller & Registry](05_action_controller___registry.md) to actually perform that specific action within the browser.
5. **Observes the Results (BrowserContext):** After the Controller acts, the Agent uses the [BrowserContext](03_browsercontext.md) again to see the new state of the webpage (e.g., the Google search results page).
6. **Repeats:** The Agent repeats steps 2-5, continuously consulting the LLM, instructing the Controller, and observing the results, until the original task is complete or it reaches a stopping point.
## Using the Agent: A Simple Example
Let's see how you might use the Agent in Python code. Don't worry about understanding every detail yet; focus on the main idea. We're setting up the Agent with our task and the necessary components.
```python
# --- Simplified Example ---
# We need to import the necessary parts from the browser_use library
from browser_use import Agent, Browser, Controller, BrowserConfig, BrowserContextConfig
# Assume 'my_llm' is your configured Large Language Model (e.g., from OpenAI, Anthropic)
from my_llm_setup import my_llm # Placeholder for your specific LLM setup
# 1. Define the task for the Agent
my_task = "Go to google.com, search for 'cute cat pictures', and click the first image result."
# 2. Basic browser configuration (we'll learn more later)
browser_config = BrowserConfig() # Default settings
context_config = BrowserContextConfig() # Default settings
# 3. Initialize the components the Agent needs
# The Browser manages the underlying browser application
browser = Browser(config=browser_config)
# The Controller knows *how* to perform actions like 'click' or 'type'
controller = Controller()
async def main():
# The BrowserContext represents a single browser tab/window environment
# It uses the Browser and its configuration
async with BrowserContext(browser=browser, config=context_config) as browser_context:
# 4. Create the Agent instance!
agent = Agent(
task=my_task,
llm=my_llm, # The "brain" - the Language Model
browser_context=browser_context, # The "eyes" - interacts with the browser tab
controller=controller # The "hands" - executes actions
# Many other settings can be configured here!
)
print(f"Agent created. Starting task: {my_task}")
# 5. Run the Agent! This starts the loop.
# It will keep taking steps until the task is done or it hits the limit.
history = await agent.run(max_steps=15) # Limit steps for safety
# 6. Check the result
if history.is_done() and history.is_successful():
print("✅ Agent finished the task successfully!")
print(f"Final message from agent: {history.final_result()}")
else:
print("⚠️ Agent stopped. Maybe max_steps reached or task wasn't completed successfully.")
# The 'async with' block automatically cleans up the browser_context
await browser.close() # Close the browser application
# Run the asynchronous function
import asyncio
asyncio.run(main())
```
**What happens when you run this?**
1. An `Agent` object is created with your task, the LLM, the browser context, and the controller.
2. Calling `agent.run(max_steps=15)` starts the main loop.
3. The Agent gets the initial state of the browser (likely a blank page).
4. It asks the LLM what to do. The LLM might say "Go to google.com".
5. The Agent tells the Controller to execute the "go to URL" action.
6. The browser navigates to Google.
7. The Agent gets the new state (Google's homepage).
8. It asks the LLM again. The LLM says "Type 'cute cat pictures' into the search bar".
9. The Agent tells the Controller to type the text.
10. This continues step-by-step: pressing Enter, seeing results, asking the LLM, clicking the image.
11. Eventually, the LLM will hopefully tell the Agent the task is "done".
12. `agent.run()` finishes and returns the `history` object containing details of what happened.
## How it Works Under the Hood: The Agent Loop
Let's visualize the process with a simple diagram:
```mermaid
sequenceDiagram
participant User
participant Agent
participant LLM
participant Controller
participant BC as BrowserContext
User->>Agent: Start task("Search Google for cats...")
Note over Agent: Agent Loop Starts
Agent->>BC: Get current state (e.g., blank page)
BC-->>Agent: Current Page State
Agent->>LLM: What's next? (Task + State + History)
LLM-->>Agent: Plan: [Action: Type 'cute cat pictures', Action: Press Enter]
Agent->>Controller: Execute: type_text(...)
Controller->>BC: Perform type action
Agent->>Controller: Execute: press_keys('Enter')
Controller->>BC: Perform press action
Agent->>BC: Get new state (search results page)
BC-->>Agent: New Page State
Agent->>LLM: What's next? (Task + New State + History)
LLM-->>Agent: Plan: [Action: click_element(index=5)]
Agent->>Controller: Execute: click_element(index=5)
Controller->>BC: Perform click action
Note over Agent: Loop continues until done...
LLM-->>Agent: Plan: [Action: done(success=True, text='Found cat picture!')]
Agent->>Controller: Execute: done(...)
Controller-->>Agent: ActionResult (is_done=True)
Note over Agent: Agent Loop Ends
Agent->>User: Return History (Task Complete)
```
The core of the `Agent` lives in the `agent/service.py` file. The `Agent` class manages the overall process.
1. **Initialization (`__init__`)**: When you create an `Agent`, it sets up its internal state, stores the task, the LLM, the controller, and prepares the [Message Manager](06_message_manager.md) to keep track of the conversation history. It also figures out the best way to talk to the specific LLM you provided.
```python
# --- File: agent/service.py (Simplified __init__) ---
class Agent:
def __init__(
self,
task: str,
llm: BaseChatModel,
browser_context: BrowserContext,
controller: Controller,
# ... other settings like use_vision, max_failures, etc.
**kwargs
):
self.task = task
self.llm = llm
self.browser_context = browser_context
self.controller = controller
self.settings = AgentSettings(**kwargs) # Store various settings
self.state = AgentState() # Internal state (step count, failures, etc.)
# Setup message manager for history, using the task and system prompt
self._message_manager = MessageManager(
task=self.task,
system_message=self.settings.system_prompt_class(...).get_system_message(),
settings=MessageManagerSettings(...)
# ... more setup ...
)
# ... other initializations ...
logger.info("Agent initialized.")
```
2. **Running the Task (`run`)**: The `run` method orchestrates the main loop. It calls the `step` method repeatedly until the task is marked as done, an error occurs, or `max_steps` is reached.
```python
# --- File: agent/service.py (Simplified run method) ---
class Agent:
# ... (init) ...
async def run(self, max_steps: int = 100) -> AgentHistoryList:
self._log_agent_run() # Log start event
try:
for step_num in range(max_steps):
if self.state.stopped or self.state.consecutive_failures >= self.settings.max_failures:
break # Stop conditions
# Wait if paused
while self.state.paused: await asyncio.sleep(0.2)
step_info = AgentStepInfo(step_number=step_num, max_steps=max_steps)
await self.step(step_info) # <<< Execute one step of the loop
if self.state.history.is_done():
await self.log_completion() # Log success/failure
break # Exit loop if agent signaled 'done'
else:
logger.info("Max steps reached.") # Ran out of steps
finally:
# ... (cleanup, telemetry, potentially save history/gif) ...
pass
return self.state.history # Return the recorded history
```
3. **Taking a Step (`step`)**: This is the heart of the loop. In each step, the Agent:
* Gets the current browser state (`browser_context.get_state()`).
* Adds this state to the history via the `_message_manager`.
* Asks the LLM for the next action (`get_next_action()`).
* Tells the `Controller` to execute the action(s) (`multi_act()`).
* Records the outcome in the history.
* Handles any errors that might occur.
```python
# --- File: agent/service.py (Simplified step method) ---
class Agent:
# ... (init, run) ...
async def step(self, step_info: Optional[AgentStepInfo] = None) -> None:
logger.info(f"📍 Step {self.state.n_steps}")
state = None
model_output = None
result: list[ActionResult] = []
try:
# 1. Get current state from the browser
state = await self.browser_context.get_state() # Uses BrowserContext
# 2. Add state (+ previous result) to message history for LLM context
self._message_manager.add_state_message(state, self.state.last_result, ...)
# 3. Get LLM's decision on the next action(s)
input_messages = self._message_manager.get_messages()
model_output = await self.get_next_action(input_messages) # Calls the LLM
self.state.n_steps += 1 # Increment step counter
# 4. Execute the action(s) using the Controller
result = await self.multi_act(model_output.action) # Uses Controller
self.state.last_result = result # Store result for next step's context
# 5. Record step details (actions, results, state snapshot)
self._make_history_item(model_output, state, result, ...)
self.state.consecutive_failures = 0 # Reset failure count on success
except Exception as e:
# Handle errors, increment failure count, maybe retry later
result = await self._handle_step_error(e)
self.state.last_result = result
# ... (finally block for logging/telemetry) ...
```
## Conclusion
You've now met the `Agent`, the central coordinator in `Browser Use`. You learned that it acts like a project manager, taking your high-level task, consulting an LLM for step-by-step planning, managing the history, and instructing a `Controller` to perform actions within a `BrowserContext`.
The Agent's effectiveness heavily relies on how well we instruct the LLM planner. In the next chapter, we'll dive into exactly that: crafting the **System Prompt** to guide the LLM's behavior.
[Next Chapter: System Prompt](02_system_prompt.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)
================================================
FILE: docs/Browser Use/02_system_prompt.md
================================================
---
layout: default
title: "System Prompt"
parent: "Browser Use"
nav_order: 2
---
# Chapter 2: The System Prompt - Setting the Rules for Your AI Assistant
In [Chapter 1: The Agent](01_agent.md), we met the `Agent`, our project manager for automating browser tasks. We saw it consults a Large Language Model (LLM) – the "planner" – to decide the next steps based on the current state of the webpage. But how does the Agent tell the LLM *how* it should think, behave, and respond? Just giving it the task isn't enough!
Imagine hiring a new assistant. You wouldn't just say, "Organize my files!" You'd give them specific instructions: "Please sort the files alphabetically by client name, put them in the blue folders, and give me a summary list when you're done." Without these rules, the assistant might do something completely different!
The **System Prompt** solves this exact problem for our LLM. It's the set of core instructions and rules we give the LLM at the very beginning, telling it exactly how to act as a browser automation assistant and, crucially, how to format its responses so the `Agent` can understand them.
## What is the System Prompt? The AI's Rulebook
Think of the System Prompt like the AI assistant's fundamental operating manual, its "Prime Directive," or the rules of a board game. It defines:
1. **Persona:** "You are an AI agent designed to automate browser tasks."
2. **Goal:** "Your goal is to accomplish the ultimate task..."
3. **Input:** How to understand the information it receives about the webpage ([DOM Representation](04_dom_representation.md)).
4. **Capabilities:** What actions it can take ([Action Controller & Registry](05_action_controller___registry.md)).
5. **Limitations:** What it *shouldn't* do (e.g., hallucinate actions).
6. **Response Format:** The *exact* structure (JSON format) its thoughts and planned actions must follow.
Without this rulebook, the LLM might just chat casually, give vague suggestions, or produce output in a format the `Agent` code can't parse. The System Prompt ensures the LLM behaves like the specialized tool we need.
## Why is the Response Format So Important?
This is a critical point. The `Agent` code isn't a human reading the LLM's response. It's a program expecting data in a very specific structure. The System Prompt tells the LLM to *always* respond in a JSON format that looks something like this (simplified):
```json
{
"current_state": {
"evaluation_previous_goal": "Success - Found the search bar.",
"memory": "On google.com main page. Need to search for cats.",
"next_goal": "Type 'cute cat pictures' into the search bar."
},
"action": [
{
"input_text": {
"index": 5, // The index of the search bar element
"text": "cute cat pictures"
}
},
{
"press_keys": {
"keys": "Enter" // Press the Enter key
}
}
]
}
```
The `Agent` can easily read this JSON:
* It understands the LLM's thoughts (`current_state`).
* It sees the exact `action` list the LLM wants to perform.
* It passes these actions (like `input_text` or `press_keys`) to the [Action Controller & Registry](05_action_controller___registry.md) to execute them in the browser.
If the LLM responded with just "Okay, I'll type 'cute cat pictures' into the search bar and press Enter," the `Agent` wouldn't know *which* element index corresponds to the search bar or exactly which actions to call. The strict JSON format is essential for automation.
## A Peek Inside the Rulebook (`system_prompt.md`)
The actual instructions live in a text file within the `Browser Use` library: `browser_use/agent/system_prompt.md`. It's quite detailed, but here's a tiny snippet focusing on the response format rule:
```markdown
# Response Rules
1. RESPONSE FORMAT: You must ALWAYS respond with valid JSON in this exact format:
{{"current_state": {{"evaluation_previous_goal": "...",
"memory": "...",
"next_goal": "..."}},
"action":[{{"one_action_name": {{...}}}}, ...]}}
2. ACTIONS: You can specify multiple actions in the list... Use maximum {{max_actions}} actions...
```
*(This is heavily simplified! The real file has many more rules about element interaction, error handling, task completion, etc.)*
This file clearly defines the JSON structure (`current_state` and `action`) and other crucial behaviors required from the LLM.
## How the Agent Uses the System Prompt
The `Agent` uses a helper class called `SystemPrompt` (found in `agent/prompts.py`) to manage these rules. Here's the flow:
1. **Loading:** When you create an `Agent`, it internally creates a `SystemPrompt` object. This object reads the rules from the `system_prompt.md` file.
2. **Formatting:** The `SystemPrompt` object formats these rules into a special `SystemMessage` object that LLMs understand as foundational instructions.
3. **Conversation Start:** This `SystemMessage` is given to the [Message Manager](06_message_manager.md), which keeps track of the conversation history with the LLM. The `SystemMessage` becomes the *very first message*, setting the context for all future interactions in that session.
Think of it like starting a meeting: the first thing you do is state the agenda and rules (System Prompt), and then the discussion (LLM interaction) follows based on that foundation.
Let's look at a simplified view of the `SystemPrompt` class loading the rules:
```python
# --- File: agent/prompts.py (Simplified) ---
import importlib.resources # Helps find files within the installed library
from langchain_core.messages import SystemMessage # Special message type for LLMs
class SystemPrompt:
def __init__(self, action_description: str, max_actions_per_step: int = 10):
# We ignore these details for now
self.default_action_description = action_description
self.max_actions_per_step = max_actions_per_step
self._load_prompt_template() # <--- Loads the rules file
def _load_prompt_template(self) -> None:
"""Load the prompt rules from the system_prompt.md file."""
try:
# Finds the 'system_prompt.md' file inside the browser_use package
filepath = importlib.resources.files('browser_use.agent').joinpath('system_prompt.md')
with filepath.open('r') as f:
self.prompt_template = f.read() # Read the text content
print("System Prompt template loaded successfully!")
except Exception as e:
print(f"Error loading system prompt: {e}")
self.prompt_template = "Error: Could not load prompt." # Fallback
def get_system_message(self) -> SystemMessage:
"""Format the loaded rules into a message for the LLM."""
# Replace placeholders like {{max_actions}} with actual values
prompt = self.prompt_template.format(max_actions=self.max_actions_per_step)
# Wrap the final rules text in a SystemMessage object
return SystemMessage(content=prompt)
# --- How it plugs into Agent creation (Conceptual) ---
# from browser_use import Agent, SystemPrompt
# from my_llm_setup import my_llm # Your LLM
# ... other setup ...
# When you create an Agent:
# agent = Agent(
# task="Find cat pictures",
# llm=my_llm,
# browser_context=...,
# controller=...,
# # The Agent's __init__ method does something like this internally:
# # system_prompt_obj = SystemPrompt(action_description="...", max_actions_per_step=10)
# # system_message_for_llm = system_prompt_obj.get_system_message()
# # This system_message_for_llm is then passed to the Message Manager.
# )
```
This code shows how the `SystemPrompt` class finds and reads the `system_prompt.md` file and prepares the instructions as a `SystemMessage` ready for the LLM conversation.
## Under the Hood: Initialization and Conversation Flow
Let's visualize how the System Prompt fits into the Agent's setup and interaction loop:
```mermaid
sequenceDiagram
participant User
participant Agent_Init as Agent Initialization
participant SP as SystemPrompt Class
participant MM as Message Manager
participant Agent_Run as Agent Run Loop
participant LLM
User->>Agent_Init: Create Agent(task, llm, ...)
Note over Agent_Init: Agent needs the rules!
Agent_Init->>SP: Create SystemPrompt(...)
SP->>SP: _load_prompt_template() reads system_prompt.md
SP-->>Agent_Init: SystemPrompt instance
Agent_Init->>SP: get_system_message()
SP-->>Agent_Init: system_message (The Formatted Rules)
Note over Agent_Init: Pass rules to conversation manager
Agent_Init->>MM: Initialize MessageManager(task, system_message)
MM->>MM: Store system_message as message #1
MM-->>Agent_Init: MessageManager instance ready
Agent_Init-->>User: Agent created and ready
User->>Agent_Run: agent.run() starts the task
Note over Agent_Run: Agent needs context for LLM
Agent_Run->>MM: get_messages()
MM-->>Agent_Run: [system_message, user_message(state), ...]
Note over Agent_Run: Send rules + current state to LLM
Agent_Run->>LLM: Ask for next action (Input includes rules)
LLM-->>Agent_Run: JSON response (LLM followed rules!)
Agent_Run->>MM: add_model_output(...)
Note over Agent_Run: Loop continues...
```
Internally, the `Agent`'s initialization code (`__init__` in `agent/service.py`) explicitly creates the `SystemPrompt` and passes its output to the `MessageManager`:
```python
# --- File: agent/service.py (Simplified Agent __init__) ---
# ... other imports ...
from browser_use.agent.prompts import SystemPrompt # Import the class
from browser_use.agent.message_manager.service import MessageManager, MessageManagerSettings
class Agent:
def __init__(
self,
task: str,
llm: BaseChatModel,
browser_context: BrowserContext,
controller: Controller,
system_prompt_class: Type[SystemPrompt] = SystemPrompt, # Allows customizing the prompt class
max_actions_per_step: int = 10,
# ... other parameters ...
**kwargs
):
self.task = task
self.llm = llm
# ... store other components ...
# Get the list of available actions from the controller
self.available_actions = controller.registry.get_prompt_description()
# 1. Create the SystemPrompt instance using the provided class
system_prompt_instance = system_prompt_class(
action_description=self.available_actions,
max_actions_per_step=max_actions_per_step,
)
# 2. Get the formatted SystemMessage (the rules)
system_message = system_prompt_instance.get_system_message()
# 3. Initialize the Message Manager with the task and the rules
self._message_manager = MessageManager(
task=self.task,
system_message=system_message, # <--- Pass the rules here!
settings=MessageManagerSettings(...)
# ... other message manager setup ...
)
# ... rest of initialization ...
logger.info("Agent initialized with System Prompt.")
```
When the `Agent` runs its loop (`agent.run()` calls `agent.step()`), it asks the `MessageManager` for the current conversation history (`self._message_manager.get_messages()`). The `MessageManager` always ensures that the `SystemMessage` (containing the rules) is the very first item in that history list sent to the LLM.
## Conclusion
The System Prompt is the essential rulebook that governs the LLM's behavior within the `Browser Use` framework. It tells the LLM how to interpret the browser state, what actions it can take, and most importantly, dictates the exact JSON format for its responses. This structured communication is key to enabling the `Agent` to reliably understand the LLM's plan and execute browser automation tasks.
Without a clear System Prompt, the LLM would be like an untrained assistant – potentially intelligent, but unable to follow the specific procedures needed for the job.
Now that we understand how the `Agent` gets its fundamental instructions, how does it actually perceive the webpage it's supposed to interact with? In the next chapter, we'll explore the component responsible for representing the browser's state: the [BrowserContext](03_browsercontext.md).
[Next Chapter: BrowserContext](03_browsercontext.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)
================================================
FILE: docs/Browser Use/03_browsercontext.md
================================================
---
layout: default
title: "BrowserContext"
parent: "Browser Use"
nav_order: 3
---
# Chapter 3: BrowserContext - The Agent's Isolated Workspace
In the [previous chapter](02_system_prompt.md), we learned how the `System Prompt` acts as the rulebook for the AI assistant (LLM) that guides our `Agent`. We know the Agent uses the LLM to decide *what* to do next based on the current situation in the browser.
But *where* does the Agent actually "see" the webpage and perform its actions? How does it keep track of the current website address (URL), the page content, and things like cookies, all while staying focused on its specific task without getting mixed up with your other browsing?
This is where the **BrowserContext** comes in.
## What Problem Does BrowserContext Solve?
Imagine you ask your `Agent` to log into a specific online shopping website and check your order status. You might already be logged into that same website in your regular browser window with your personal account.
If the Agent just used your main browser window, it might:
1. Get confused by your existing login.
2. Accidentally use your personal cookies or saved passwords.
3. Interfere with other tabs you have open.
We need a way to give the Agent its *own*, clean, separate browsing environment for each task. It needs an isolated "workspace" where it can open websites, log in, click buttons, and manage its own cookies without affecting anything else.
The `BrowserContext` solves this by representing a single, isolated browser session.
## Meet the BrowserContext: Your Agent's Private Browser Window
Think of a `BrowserContext` like opening a brand new **Incognito Window** or creating a **separate User Profile** in your web browser (like Chrome or Firefox).
* **It's Isolated:** What happens in one `BrowserContext` doesn't affect others or your main browser session. It has its own cookies, its own history (for that session), and its own set of tabs.
* **It Manages State:** It keeps track of everything important about the current web session the Agent is working on:
* The current URL.
* Which tabs are open within its "window".
* Cookies specific to that session.
* The structure and content of the current webpage (the DOM - Document Object Model, which we'll explore in the [next chapter](04_dom_representation.md)).
* **It's the Agent's Viewport:** The `Agent` looks through the `BrowserContext` to "see" the current state of the webpage. When the Agent decides to perform an action (like clicking a button), it tells the [Action Controller](05_action_controller___registry.md) to perform it *within* that specific `BrowserContext`.
Essentially, the `BrowserContext` is like a dedicated, clean desk or workspace given to the Agent for its specific job.
## Using the BrowserContext
Before we can have an isolated session (`BrowserContext`), we first need the main browser application itself. This is handled by the `Browser` class. Think of `Browser` as the entire Chrome or Firefox application installed on your computer, while `BrowserContext` is just one window or profile within that application.
Here's a simplified example of how you might set up a `Browser` and then create a `BrowserContext` to navigate to a page:
```python
import asyncio
# Import necessary classes
from browser_use import Browser, BrowserConfig, BrowserContext, BrowserContextConfig
async def main():
# 1. Configure the main browser application (optional, defaults are usually fine)
browser_config = BrowserConfig(headless=False) # Show the browser window
# 2. Create the main Browser instance
# This might launch a browser application in the background (or connect to one)
browser = Browser(config=browser_config)
print("Browser application instance created.")
# 3. Configure the specific session/window (optional)
context_config = BrowserContextConfig(
user_agent="MyCoolAgent/1.0", # Example: Set a custom user agent
cookies_file="my_session_cookies.json" # Example: Save/load cookies
)
# 4. Create the isolated BrowserContext (like opening an incognito window)
# We use 'async with' to ensure it cleans up automatically afterwards
async with browser.new_context(config=context_config) as browser_context:
print(f"BrowserContext created (ID: {browser_context.context_id}).")
# 5. Use the context to interact with the browser session
start_url = "https://example.com"
print(f"Navigating to: {start_url}")
await browser_context.navigate_to(start_url)
# 6. Get information *from* the context
current_state = await browser_context.get_state() # Get current page info
print(f"Current page title: {current_state.title}")
print(f"Current page URL: {current_state.url}")
# The Agent would use this 'browser_context' object to see the page
# and tell the Controller to perform actions within it.
print("BrowserContext closed automatically.")
# 7. Close the main browser application when done
await browser.close()
print("Browser application closed.")
# Run the asynchronous code
asyncio.run(main())
```
**What happens here?**
1. We set up a `BrowserConfig` (telling it *not* to run headless so we can see the window).
2. We create a `Browser` instance, which represents the overall browser program.
3. We create a `BrowserContextConfig` to specify settings for our isolated session (like a custom name or where to save cookies).
4. Crucially, `browser.new_context(...)` creates our isolated session. The `async with` block ensures this session is properly closed later.
5. We use methods *on the `browser_context` object* like `navigate_to()` to control *this specific session*.
6. We use `browser_context.get_state()` to get information about the current page within *this session*. The `Agent` heavily relies on this method.
7. After the `async with` block finishes, the `browser_context` is closed (like closing the incognito window), and finally, we close the main `browser` application.
## How it Works Under the Hood
When the `Agent` needs to understand the current situation to decide the next step, it asks the `BrowserContext` for the latest state using the `get_state()` method. What happens then?
1. **Wait for Stability:** The `BrowserContext` first waits for the webpage to finish loading and for network activity to settle down (`_wait_for_page_and_frames_load`). This prevents the Agent from acting on an incomplete page.
2. **Analyze the Page:** It then uses the [DOM Representation](04_dom_representation.md) service (`DomService`) to analyze the current HTML structure of the page. This service figures out which elements are visible, interactive (buttons, links, input fields), and where they are.
3. **Capture Visuals:** It often takes a screenshot of the current view (`take_screenshot`). This can be helpful for advanced agents or debugging.
4. **Gather Metadata:** It gets the current URL, page title, and information about any other tabs open *within this context*.
5. **Package the State:** All this information (DOM structure, URL, title, screenshot, etc.) is bundled into a `BrowserState` object.
6. **Return to Agent:** The `BrowserContext` returns this `BrowserState` object to the `Agent`. The Agent then uses this information (often sending it to the LLM) to plan its next action.
Here's a simplified diagram of the `get_state()` process:
```mermaid
sequenceDiagram
participant Agent
participant BC as BrowserContext
participant PlaywrightPage as Underlying Browser Page
participant DomService as DOM Service
Agent->>BC: get_state()
Note over BC: Wait for page to be ready...
BC->>PlaywrightPage: Ensure page/network is stable
PlaywrightPage-->>BC: Page is ready
Note over BC: Analyze the page content...
BC->>DomService: Get simplified DOM structure + interactive elements
DomService-->>BC: DOMState (element tree, etc.)
Note over BC: Get visuals and metadata...
BC->>PlaywrightPage: Take screenshot()
PlaywrightPage-->>BC: Screenshot data
BC->>PlaywrightPage: Get URL, Title
PlaywrightPage-->>BC: URL, Title data
Note over BC: Combine everything...
BC->>BC: Create BrowserState object
BC-->>Agent: Return BrowserState
```
Let's look at some simplified code snippets from the library.
The `BrowserContext` is initialized (`__init__` in `browser/context.py`) with its configuration and a reference to the main `Browser` instance that created it.
```python
# --- File: browser/context.py (Simplified __init__) ---
import uuid
# ... other imports ...
if TYPE_CHECKING:
from browser_use.browser.browser import Browser # Link to the Browser class
@dataclass
class BrowserContextConfig: # Configuration settings
# ... various settings like user_agent, cookies_file, window_size ...
pass
@dataclass
class BrowserSession: # Holds the actual Playwright context
context: PlaywrightBrowserContext # The underlying Playwright object
cached_state: Optional[BrowserState] = None # Stores the last known state
class BrowserContext:
def __init__(
self,
browser: 'Browser', # Reference to the main Browser instance
config: BrowserContextConfig = BrowserContextConfig(),
# ... other optional state ...
):
self.context_id = str(uuid.uuid4()) # Unique ID for this session
self.config = config # Store the configuration
self.browser = browser # Store the reference to the parent Browser
# The actual Playwright session is created later, when needed
self.session: BrowserSession | None = None
logger.debug(f"BrowserContext object created (ID: {self.context_id}). Session not yet initialized.")
# The 'async with' statement calls __aenter__ which initializes the session
async def __aenter__(self):
await self._initialize_session() # Creates the actual browser window/tab
return self
async def _initialize_session(self):
# ... (complex setup code happens here) ...
# Gets the main Playwright browser from self.browser
playwright_browser = await self.browser.get_playwright_browser()
# Creates the isolated Playwright context (like the incognito window)
context = await self._create_context(playwright_browser)
# Creates the BrowserSession to hold the context and state
self.session = BrowserSession(context=context, cached_state=None)
logger.debug(f"BrowserContext session initialized (ID: {self.context_id}).")
# ... (sets up the initial page) ...
return self.session
# ... other methods like navigate_to, close, etc. ...
```
The `get_state` method orchestrates fetching the current information from the browser session.
```python
# --- File: browser/context.py (Simplified get_state and helpers) ---
# ... other imports ...
from browser_use.dom.service import DomService # Imports the DOM analyzer
from browser_use.browser.views import BrowserState # Imports the state structure
class BrowserContext:
# ... (init, aenter, etc.) ...
async def get_state(self) -> BrowserState:
"""Get the current state of the browser session."""
logger.debug(f"Getting state for context {self.context_id}...")
# 1. Make sure the page is loaded and stable
await self._wait_for_page_and_frames_load()
# 2. Get the actual Playwright session object
session = await self.get_session()
# 3. Update the state (this does the heavy lifting)
session.cached_state = await self._update_state()
logger.debug(f"State update complete for {self.context_id}.")
# 4. Optionally save cookies if configured
if self.config.cookies_file:
asyncio.create_task(self.save_cookies())
return session.cached_state
async def _wait_for_page_and_frames_load(self, timeout_overwrite: float | None = None):
"""Ensures page is fully loaded before continuing."""
# ... (complex logic to wait for network idle, minimum times) ...
page = await self.get_current_page()
await page.wait_for_load_state('load', timeout=5000) # Simplified wait
logger.debug("Page load/network stability checks passed.")
await asyncio.sleep(self.config.minimum_wait_page_load_time) # Ensure minimum wait
async def _update_state(self) -> BrowserState:
"""Fetches all info and builds the BrowserState."""
session = await self.get_session()
page = await self.get_current_page() # Get the active Playwright page object
try:
# Use DomService to analyze the page content
dom_service = DomService(page)
# Get the simplified DOM tree and interactive elements map
content_info = await dom_service.get_clickable_elements(
highlight_elements=self.config.highlight_elements,
# ... other DOM options ...
)
# Take a screenshot
screenshot_b64 = await self.take_screenshot()
# Get URL, Title, Tabs, Scroll info etc.
url = page.url
title = await page.title()
tabs = await self.get_tabs_info()
pixels_above, pixels_below = await self.get_scroll_info(page)
# Create the BrowserState object
browser_state = BrowserState(
element_tree=content_info.element_tree,
selector_map=content_info.selector_map,
url=url,
title=title,
tabs=tabs,
screenshot=screenshot_b64,
pixels_above=pixels_above,
pixels_below=pixels_below,
)
return browser_state
except Exception as e:
logger.error(f'Failed to update state: {str(e)}')
# Maybe return old state or raise error
raise BrowserError("Failed to get browser state") from e
async def take_screenshot(self, full_page: bool = False) -> str:
"""Takes a screenshot and returns base64 encoded string."""
page = await self.get_current_page()
screenshot_bytes = await page.screenshot(full_page=full_page, animations='disabled')
return base64.b64encode(screenshot_bytes).decode('utf-8')
# ... many other helper methods (_get_current_page, get_tabs_info, etc.) ...
```
This shows how `BrowserContext` acts as a manager for a specific browser session, using underlying tools (like Playwright and `DomService`) to gather the necessary information (`BrowserState`) that the `Agent` needs to operate.
## Conclusion
The `BrowserContext` is a fundamental concept in `Browser Use`. It provides the necessary **isolated environment** for the `Agent` to perform its tasks, much like an incognito window or a separate browser profile. It manages the session's state (URL, cookies, tabs, page content) and provides the `Agent` with a snapshot of the current situation via the `get_state()` method.
Understanding the `BrowserContext` helps clarify *where* the Agent works. Now, how does the Agent actually understand the *content* of the webpage within that context? How is the complex structure of a webpage represented in a way the Agent (and the LLM) can understand?
In the next chapter, we'll dive into exactly that: the [DOM Representation](04_dom_representation.md).
[Next Chapter: DOM Representation](04_dom_representation.md)
---
Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)
================================================
FILE: docs/Browser Use/04_dom_representation.md
================================================
---
layout: default
title: "DOM Representation"
parent: "Browser Use"
nav_order: 4
---
# Chapter 4: DOM Representation - Mapping the Webpage
In the [previous chapter](03_browsercontext.md), we learned about the `BrowserContext`, the Agent's private workspace for browsing. We saw that the Agent uses `browser_context.get_state()` to get a snapshot of the current webpage. But how does the Agent actually *understand* the content of that snapshot?
Imagine you're looking at the Google homepage. You instantly recognize the logo, the search bar, and the buttons. But a computer program just sees a wall of code (HTML). How can our `Agent` figure out: "This rectangular box is the search bar I need to type into," or "This specific image link is the first result I should click"?
This is the problem solved by **DOM Representation**.
## What Problem Does DOM Representation Solve?
Webpages are built using HTML (HyperText Markup Language), which describes the structure and content. Your browser reads this HTML and creates an internal, structured representation called the **Document Object Model (DOM)**. It's like the browser builds a detailed blueprint or an outline from the HTML instructions.
However, this raw DOM blueprint is incredibly complex and contains lots of information irrelevant to our Agent's task. The Agent doesn't need to know about every single tiny visual detail; it needs a *simplified map* focused on what's important for interaction:
1. **What elements are on the page?** (buttons, links, input fields, text)
2. **Are they visible to a user?** (Hidden elements shouldn't be interacted with)
3. **Are they interactive?** (Can you click it? Can you type in it?)
4. **How can the Agent refer to them?** (We need a simple way to say "click *this* button")
DOM Representation solves the problem of translating the complex, raw DOM blueprint into a simplified, structured map that highlights the interactive "landmarks" and pathways the Agent can use.
## Meet `DomService`: The Map Maker
The component responsible for creating this map is the `DomService`. Think of it as a cartographer specializing in webpages.
When the `Agent` (via the `BrowserContext`) asks for the current state of the page, the `BrowserContext` employs the `DomService` to analyze the page's live DOM.
Here's what the `DomService` does:
1. **Examines the Live Page:** It looks at the current structure rendered in the browser tab, not just the initial HTML source code (because JavaScript can change the page after it loads).
2. **Identifies Elements:** It finds all the meaningful elements like buttons, links, input fields, and text blocks.
3. **Checks Properties:** For each element, it determines crucial properties:
* **Visibility:** Is it actually displayed on the screen?
* **Interactivity:** Is it something a user can click, type into, or otherwise interact with?
* **Position:** Where is it located (roughly)?
4. **Assigns Interaction Indices:** This is key! For elements deemed interactive and visible, `DomService` assigns a unique number, called a `highlight_index` (like `[5]`, `[12]`, etc.). This gives the Agent and the LLM a simple, unambiguous way to refer to specific elements.
5. **Builds a Structured Tree:** It organizes this information into a simplified tree structure (`element_tree`) that reflects the page layout but is much easier to process than the full DOM.
6. **Creates an Index Map:** It generates a `selector_map`, which is like an index in a book, mapping each `highlight_index` directly to its corresponding element node in the tree.
The final output is a `DOMState` object containing the simplified `element_tree` and the handy `selector_map`. This `DOMState` is then included in the `BrowserState` that `BrowserContext.get_state()` returns to the Agent.
## The Output: `DOMState` - The Agent's Map
The `DOMState` object produced by `DomService` has two main parts:
1. **`element_tree`:** This is the root of our simplified map, represented as a `DOMElementNode` object (defined in `dom/views.py`). Each node in the tree can be either an element (`DOMElementNode`) or a piece of text (`DOMTextNode`). `DOMElementNode`s contain information like the tag name (`