Message Request:
```json
{
"tools": [
{
"name": "code_execution",
"type": "code_execution_20250522"
}
]
}
```
Tool Call Response:
```json
{
"role": "assistant",
"container": {
"id": "{string}",
"expires_at": "{timestamp}"
},
"content": [
{
"type": "server_tool_use",
"id": "{string}",
"name": "code_execution",
"input": {
"code": "{string}"
}
},
{
"type": "code_execution_tool_result",
"tool_use_id": "{string}",
"content": {
"type": "code_execution_result",
"stdout": "{string}",
"stderr": "{string}",
"return_code": "{integer}"
}
}
]
}
```
#### Commonalities
- **Tool Type Specification**: Most providers define a `code_interpreter` tool type within the `tools` array (Anthropic uses the versioned `code_execution_20250522` type instead), indicating support for code execution capabilities.
- **Input and Output Handling**: Tool calls carry the code to execute (e.g., in `input` or `code` fields), and responses return execution outputs, such as logs or files, in a structured format.
- **File Resource Support**: Most providers allow associating files with the code interpreter (e.g., via `file_ids` or `files`), enabling data input/output for code execution.
- **Execution Metadata**: Responses often include metadata about the execution process (e.g., `status`, `logs`, or `executionError`), which can be abstracted for standardized error handling and result processing.
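The following is a minimal sketch of how execution metadata could be abstracted. This Python helper reads the Anthropic response shape shown above and collects each `code_execution_tool_result` into a provider-neutral record. `CodeExecutionResult` and `extract_code_results` are hypothetical names and are not part of any SDK. Only the `code_execution_result` payload type from the example is handled.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class CodeExecutionResult:
    """Hypothetical provider-neutral record of one code execution."""

    tool_use_id: str
    stdout: str
    stderr: str
    return_code: int

    @property
    def succeeded(self) -> bool:
        return self.return_code == 0


def extract_code_results(message: dict[str, Any]) -> list[CodeExecutionResult]:
    """Collect code execution results from an Anthropic-style assistant message."""
    results: list[CodeExecutionResult] = []
    for block in message.get("content", []):
        if block.get("type") != "code_execution_tool_result":
            continue  # skip text, server_tool_use, and other block types
        payload = block.get("content", {})
        if payload.get("type") != "code_execution_result":
            continue  # only the success payload shown above is handled
        results.append(
            CodeExecutionResult(
                tool_use_id=block["tool_use_id"],
                stdout=payload.get("stdout", ""),
                stderr=payload.get("stderr", ""),
                return_code=int(payload["return_code"]),
            )
        )
    return results


if __name__ == "__main__":
    sample = {
        "role": "assistant",
        "content": [
            {"type": "server_tool_use", "id": "tu_1", "name": "code_execution",
             "input": {"code": "print(6 * 7)"}},
            {"type": "code_execution_tool_result", "tool_use_id": "tu_1",
             "content": {"type": "code_execution_result", "stdout": "42\n",
                         "stderr": "", "return_code": 0}},
        ],
    }
    for result in extract_code_results(sample):
        print(result.tool_use_id, result.succeeded, result.stdout.strip())
```

Adapters for other providers would produce the same record from their own result shapes, so downstream error handling only has to inspect `return_code` and `stderr`.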
#### Search and Retrieval
Azure AI Foundry Agent Service
Source: https://learn.microsoft.com/en-us/azure/ai-foundry/agents/how-to/tools/file-search-upload-files?pivots=rest
File Search Request:
```json
{
"tools": [
{
"type": "file_search"
}
],
"tool_resources": {
"file_search": {
"vector_store_ids": ["{string}"],
"vector_stores": [
{
"name": "{string}",
"configuration": {
"data_sources": [
{
"type": "{id_asset | uri_asset}",
"uri": "{string}"
}
]
}
}
]
}
}
}
```
File Search Tool Call Response:
```json
{
"tool_calls": [
{
"id": "{string}",
"type": "file_search",
"file_search": {
"ranking_options": {
"ranker": "{string}",
"score_threshold": "{float}"
},
"results": [
{
"file_id": "{string}",
"file_name": "{string}",
"score": "{float}",
"content": [
{
"text": "{string}",
"type": "{string}"
}
]
}
]
}
}
]
}
```
Azure AI Search Request:
```json
{
"tools": [
{
"type": "azure_ai_search"
}
],
"tool_resources": {
"azure_ai_search": {
"indexes": [
{
"index_connection_id": "{string}",
"index_name": "{string}",
"query_type": "{string}"
}
]
}
}
}
```
Azure AI Search Tool Call Response:
```jsonc
{
"tool_calls": [
{
"id": "{string}",
"type": "azure_ai_search",
"azure_ai_search": {} // From documentation: Reserved for future use
}
]
}
```
OpenAI Assistant API
Source: https://platform.openai.com/docs/assistants/tools/file-search
Message Request:
```json
{
"tools": [
{
"type": "file_search"
}
],
"tool_resources": {
"file_search": {
"vector_store_ids": ["{string}"]
}
}
}
```
Tool Call Response:
```json
{
"tool_calls": [
{
"id": "{string}",
"type": "file_search",
"file_search": {
"ranking_options": {
"ranker": "{string}",
"score_threshold": "{float}"
},
"results": [
{
"file_id": "{string}",
"file_name": "{string}",
"score": "{float}",
"content": [
{
"text": "{string}",
"type": "{string}"
}
]
}
]
}
}
]
}
```
OpenAI Responses API
Source: https://platform.openai.com/docs/api-reference/responses/create
Message Request:
```json
{
"tools": [
{
"type": "file_search",
"vector_store_ids": ["{string}"]
}
]
}
```
Tool Call Response:
```json
{
"output": [
{
"id": "{string}",
"queries": ["{string}"],
"status": "{in_progress | searching | incomplete | failed | completed}",
"type": "file_search_call",
"results": [
{
"attributes": {},
"file_id": "{string}",
"filename": "{string}",
"score": "{float}",
"text": "{string}"
}
]
}
]
}
```
Amazon Bedrock Agents
Source: https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent-runtime_InvokeAgent.html
Message Request:
```json
{
"sessionState": {
"knowledgeBaseConfigurations": [
{
"knowledgeBaseId": "{string}",
"retrievalConfiguration": {
"vectorSearchConfiguration": {
"filter": {},
"implicitFilterConfiguration": {
"metadataAttributes": [
{
"description": "{string}",
"key": "{string}",
"type": "{string}"
}
],
"modelArn": "{string}"
},
"numberOfResults": "{number}",
"overrideSearchType": "{string}",
"rerankingConfiguration": {
"bedrockRerankingConfiguration": {
"metadataConfiguration": {
"selectionMode": "{string}",
"selectiveModeConfiguration": {}
},
"modelConfiguration": {
"additionalModelRequestFields": {
"{string}": "{JSON value}"
},
"modelArn": "{string}"
},
"numberOfRerankedResults": "{number}"
},
"type": "{string}"
}
}
}
}
]
}
}
```
Tool Call Response:
```json
{
"trace": {
"orchestrationTrace": {
"invocationInput": {
"invocationType": "KNOWLEDGE_BASE",
"knowledgeBaseLookupInput": {
"knowledgeBaseId": "{string}",
"text": "{string}"
}
},
"observation": {
"type": "KNOWLEDGE_BASE",
"knowledgeBaseLookupOutput": {
"retrievedReferences": [
{
"metadata": {},
"content": {
"byteContent": "{string}",
"row": [
{
"columnName": "{string}",
"columnValue": "{string}",
"type": "{BLOB | BOOLEAN | DOUBLE | NULL | LONG | STRING}"
}
],
"text": "{string}",
"type": "{TEXT | IMAGE | ROW}"
}
}
],
"metadata": {
"clientRequestId": "{string}",
"endTime": "{timestamp}",
"operationTotalTimeMs": "{long}",
"startTime": "{timestamp}",
"totalTimeMs": "{long}",
"usage": {
"inputTokens": "{integer}",
"outputTokens": "{integer}"
}
}
}
}
}
}
}
```
Google
Source: https://cloud.google.com/vertex-ai/generative-ai/docs/grounding/grounding-with-vertex-ai-search
Message Request:
```json
{
"contents": [
{
"role": "user",
"parts": [
{
"text": "{string}"
}
]
}
],
"tools": [
{
"retrieval": {
"vertexAiSearch": {
"datastore": "{string}"
}
}
}
]
}
```
Tool Call Response:
```json
{
"content": {
"role": "model",
"parts": [
{
"text": "{string}"
}
]
},
"groundingMetadata": {
"retrievalQueries": [
"{string}"
],
"groundingChunks": [
{
"retrievedContext": {
"uri": "{string}",
"title": "{string}"
}
}
],
"groundingSupport": [
{
"segment": {
"startIndex": "{number}",
"endIndex": "{number}"
},
"segment_text": "{string}",
"supportChunkIndices": ["{number}"],
"confidenceScore": ["{number}"]
}
]
}
}
```
#### Commonalities
- **Vector Store Integration**: Providers like Azure and OpenAI use `vector_store_ids` or similar constructs to reference vector stores for file search, suggesting a common approach to retrieval-augmented generation.
- **Search Configuration**: Requests include configurations for search (e.g., `vectorSearchConfiguration`, `ranking_options`), allowing customization of retrieval parameters like result count or ranking.
- **Result Structure**: Responses contain a list of search results with fields like `file_id`, `score`, and `content` or `text`, enabling consistent processing of retrieved data.
- **Metadata Inclusion**: Search responses often include metadata (e.g., `score`, `startTime`/`endTime`, `usage`), which can be abstracted for unified analytics and performance tracking.
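A small normalizer can illustrate the shared result structure. This is a sketch that assumes the payload shapes shown above for three APIs: the OpenAI Assistants API and Azure AI Foundry (`tool_calls[].file_search.results`), the OpenAI Responses API (`output[]` items of type `file_search_call`), and Amazon Bedrock Agents (`retrievedReferences`). `SearchHit` and the `hits_from_*` functions are hypothetical. The Bedrock trace shown above carries no score or file name, so those fields are left as `None`.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SearchHit:
    """Hypothetical provider-neutral search result."""

    text: str
    score: Optional[float]
    source: Optional[str]


def hits_from_assistants(step: dict[str, Any]) -> list[SearchHit]:
    """OpenAI Assistants / Azure AI Foundry: tool_calls[].file_search.results[]."""
    hits = []
    for call in step.get("tool_calls", []):
        if call.get("type") != "file_search":
            continue
        for result in call["file_search"].get("results", []):
            # Each result carries a list of content parts; join their text.
            text = "".join(part.get("text", "") for part in result.get("content", []))
            hits.append(SearchHit(text, float(result["score"]), result.get("file_name")))
    return hits


def hits_from_responses(response: dict[str, Any]) -> list[SearchHit]:
    """OpenAI Responses API: output[] items of type file_search_call."""
    hits = []
    for item in response.get("output", []):
        if item.get("type") != "file_search_call":
            continue
        for result in item.get("results") or []:  # results may be null
            hits.append(SearchHit(result["text"], float(result["score"]), result.get("filename")))
    return hits


def hits_from_bedrock(payload: dict[str, Any]) -> list[SearchHit]:
    """Amazon Bedrock Agents: trace.orchestrationTrace.observation...retrievedReferences[]."""
    observation = payload.get("trace", {}).get("orchestrationTrace", {}).get("observation", {})
    refs = observation.get("knowledgeBaseLookupOutput", {}).get("retrievedReferences", [])
    # The trace shape shown above carries no relevance score or file name.
    return [SearchHit(ref.get("content", {}).get("text", ""), None, None) for ref in refs]
```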
#### Web Search
Azure AI Foundry Agent Service
Source: https://learn.microsoft.com/en-us/azure/ai-foundry/agents/how-to/tools/bing-code-samples?pivots=rest
Bing Search Message Request:
```json
{
"tools": [
{
"type": "bing_grounding",
"bing_grounding": {
"search_configurations": [
{
"connection_id": "{string}",
"count": "{number}",
"market": "{string}",
"set_lang": "{string}",
"freshness": "{string}"
}
]
}
}
]
}
```
Bing Search Tool Call Response:
```jsonc
{
"tool_calls": [
{
"id": "{string}",
"type": "bing_grounding",
"bing_grounding": {} // From documentation: Reserved for future use
}
]
}
```
OpenAI ChatCompletion API
Source: https://platform.openai.com/docs/guides/tools-web-search?api-mode=chat
Message Request:
```json
{
"web_search_options": {},
"messages": [
{
"role": "user",
"content": "{string}"
}
]
}
```
Tool Call Response:
```json
[
{
"index": 0,
"message": {
"role": "assistant",
"content": "{string}",
"annotations": [
{
"type": "url_citation",
"url_citation": {
"end_index": "{number}",
"start_index": "{number}",
"title": "{string}",
"url": "{string}"
}
}
]
}
}
]
```
OpenAI Responses API
Source: https://platform.openai.com/docs/guides/tools-web-search?api-mode=responses
Message Request:
```json
{
"tools": [
{
"type": "web_search_preview"
}
],
"input": "{string}"
}
```
Tool Call Response:
```json
{
"output": [
{
"type": "web_search_call",
"id": "{string}",
"status": "{string}"
},
{
"id": "{string}",
"type": "message",
"status": "{string}",
"role": "assistant",
"content": [
{
"type": "output_text",
"text": "{string}",
"annotations": [
{
"type": "url_citation",
"start_index": "{number}",
"end_index": "{number}",
"url": "{string}",
"title": "{string}"
}
]
}
]
}
]
}
```
Google
Source: https://cloud.google.com/vertex-ai/generative-ai/docs/grounding/grounding-with-google-search
Message Request:
```json
{
"contents": [
{
"role": "user",
"parts": [
{
"text": "{string}"
}
]
}
],
"tools": [
{
"googleSearch": {}
}
]
}
```
Tool Call Response:
```json
{
"content": {
"role": "model",
"parts": [
{
"text": "{string}"
}
]
},
"groundingMetadata": {
"webSearchQueries": [
"{string}"
],
"searchEntryPoint": {
"renderedContent": "{string}"
},
"groundingChunks": [
{
"web": {
"uri": "{string}",
"title": "{string}",
"domain": "{string}"
}
}
],
"groundingSupports": [
{
"segment": {
"startIndex": "{number}",
"endIndex": "{number}",
"text": "{string}"
},
"groundingChunkIndices": [
"{number}"
],
"confidenceScores": [
"{number}"
]
}
],
"retrievalMetadata": {}
}
}
```
Anthropic
Source: https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/web-search-tool
Message Request:
```json
{
"tools": [
{
"name": "web_search",
"type": "web_search_20250305",
"max_uses": "{number}",
"allowed_domains": ["{string}"],
"blocked_domains": ["{string}"],
"user_location": {
"type": "approximate",
"city": "{string}",
"region": "{string}",
"country": "{string}",
"timezone": "{string}"
}
}
]
}
```
Tool Call Response:
```json
{
"role": "assistant",
"content": [
{
"type": "server_tool_use",
"id": "{string}",
"name": "web_search",
"input": {
"query": "{string}"
}
},
{
"type": "web_search_tool_result",
"tool_use_id": "{string}",
"content": [
{
"type": "web_search_result",
"url": "{string}",
"title": "{string}",
"encrypted_content": "{string}",
"page_age": "{string}"
}
]
},
{
"text": "{string}",
"type": "text",
"citations": [
{
"type": "web_search_result_location",
"url": "{string}",
"title": "{string}",
"encrypted_index": "{string}",
"cited_text": "{string}"
}
]
}
]
}
```
#### Commonalities
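- **Tool-Based Activation**: Providers define web search as a tool (e.g., `web_search`, `bing_grounding`, `googleSearch`), typically within a `tools` array. The OpenAI Chat Completions API is the exception: it enables search through a top-level `web_search_options` object instead.
- **Query Input**: Requests carry the user prompt (e.g., via `input`, `content`, or `contents`), and the model derives the actual search queries from it. Some responses expose those queries (e.g., Anthropic's `query`, Google's `webSearchQueries`), which still allows a unified interface for initiating searches.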
- **Result Annotations**: Responses include search results with metadata like `url`, `title`, and sometimes `confidenceScores` or `citations`, which can be abstracted for consistent result presentation.
- **Grounding Metadata**: Most providers include grounding metadata (e.g., `groundingMetadata`, `annotations`), facilitating traceability and validation of search results.
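The result annotations can be abstracted into a single citation list. The Python sketch below assumes three response shapes shown above: OpenAI Responses `url_citation` annotations, Anthropic `web_search_result_location` citations, and Google `groundingChunks[].web`. It merges them into `(url, title)` pairs. `Citation`, the `citations_*` functions, and `dedupe` are hypothetical names.

```python
from typing import Any, NamedTuple


class Citation(NamedTuple):
    """Hypothetical provider-neutral web citation."""

    url: str
    title: str


def citations_openai_responses(response: dict[str, Any]) -> list[Citation]:
    """OpenAI Responses API: url_citation annotations on output_text parts."""
    found = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue
        for part in item.get("content", []):
            for ann in part.get("annotations", []):
                if ann.get("type") == "url_citation":
                    found.append(Citation(ann["url"], ann.get("title", "")))
    return found


def citations_anthropic(message: dict[str, Any]) -> list[Citation]:
    """Anthropic: web_search_result_location citations on text blocks."""
    found = []
    for block in message.get("content", []):
        for cite in block.get("citations") or []:
            if cite.get("type") == "web_search_result_location":
                found.append(Citation(cite["url"], cite.get("title", "")))
    return found


def citations_google(candidate: dict[str, Any]) -> list[Citation]:
    """Google: groundingMetadata.groundingChunks[].web."""
    chunks = candidate.get("groundingMetadata", {}).get("groundingChunks", [])
    return [Citation(c["web"]["uri"], c["web"].get("title", "")) for c in chunks if "web" in c]


def dedupe(citations: list[Citation]) -> list[Citation]:
    """Drop repeated URLs while keeping first-seen order."""
    seen: set[str] = set()
    unique = []
    for c in citations:
        if c.url not in seen:
            seen.add(c.url)
            unique.append(c)
    return unique
```

The text offsets (`start_index`/`end_index`, `segment`) are dropped here. A fuller abstraction would keep them to anchor citations in the generated text.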
#### Remote MCP Servers
OpenAI Responses API
Source: https://platform.openai.com/docs/guides/tools-remote-mcp
Message Request:
```json
{
"tools": [
{
"type": "mcp",
"server_label": "{string}",
"server_url": "{string}",
"require_approval": "{string}"
}
]
}
```
Tool Call Response:
```json
{
"output": [
{
"id": "{string}",
"type": "mcp_list_tools",
"server_label": "{string}",
"tools": [
{
"name": "{string}",
"input_schema": "{JSON Schema object}"
}
]
},
{
"id": "{string}",
"type": "mcp_call",
"approval_request_id": "{string}",
"arguments": "{JSON string}",
"error": "{string}",
"name": "{string}",
"output": "{string}",
"server_label": "{string}"
}
]
}
```
Google
Source: https://google.github.io/adk-docs/tools/mcp-tools/#using-mcp-tools-in-your-own-agent-out-of-adk-web
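```python
# Import paths as used in the ADK docs cited above; newer ADK releases may rename these.
from google.adk.agents import LlmAgent
from google.adk.tools.mcp_tool.mcp_toolset import MCPToolset, SseServerParams


async def get_agent_async():
    toolset = MCPToolset(
        tool_filter=['read_file', 'list_directory'],  # Optional: filter specific tools
        connection_params=SseServerParams(
            url="http://remote-server:port/path",  # placeholder URL
            headers={...},  # placeholder: supply a dict of HTTP headers
        ),
    )
    # Use in an agent
    root_agent = LlmAgent(
        model='model',  # Adjust model name if needed based on availability
        name='agent_name',
        instruction='agent_instructions',
        tools=[toolset],  # Provide the MCP tools to the ADK agent
    )
    return root_agent, toolset
```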
Anthropic
Source: https://docs.anthropic.com/en/docs/agents-and-tools/mcp-connector
Message Request:
```json
{
"messages": [
{
"role": "user",
"content": "{string}"
}
],
"mcp_servers": [
{
"type": "url",
"url": "{string}",
"name": "{string}",
"tool_configuration": {
"enabled": true,
"allowed_tools": ["{string}"]
},
"authorization_token": "{string}"
}
]
}
```
Tool Use Response:
```json
{
"type": "mcp_tool_use",
"id": "{string}",
"name": "{string}",
"server_name": "{string}",
"input": { "param1": "{object}", "param2": "{object}" }
}
```
Tool Result Response:
```json
{
"type": "mcp_tool_result",
"tool_use_id": "{string}",
"is_error": "{boolean}",
"content": [
{
"type": "text",
"text": "{string}"
}
]
}
```
#### Commonalities
- **Server Configuration**: Providers specify remote servers via URL and metadata (e.g., `server_url`, `url`, `name`), enabling a standardized way to connect to external MCP services.
- **Tool Integration**: MCP tools are integrated into the `tools` or `mcp_servers` array, allowing agents to interact with remote tools in a consistent manner.
- **Input/Output Structure**: Requests and responses include structured input (e.g., `input`, `arguments`) and output (e.g., `output`, `content`), supporting abstraction for tool execution workflows.
- **Authorization Support**: Most providers include mechanisms for authentication (e.g., `authorization_token`, `headers`), which can be abstracted for secure communication with remote servers.
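As a sketch of a standardized server configuration, the hypothetical `RemoteMcpServer` record below renders into two entries using the field names from the request payloads above: an OpenAI Responses `mcp` tool entry and an Anthropic `mcp_servers` entry. The OpenAI `headers` field is not shown in the payload above and is included as an assumption. The default `require_approval` value of `"always"` is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class RemoteMcpServer:
    """Hypothetical provider-neutral description of a remote MCP server."""

    name: str
    url: str
    allowed_tools: list[str] = field(default_factory=list)
    token: Optional[str] = None
    require_approval: str = "always"  # assumed default for the OpenAI mapping


def to_openai_tool(server: RemoteMcpServer) -> dict[str, Any]:
    """Entry for the Responses API `tools` array (type `mcp`)."""
    tool: dict[str, Any] = {
        "type": "mcp",
        "server_label": server.name,
        "server_url": server.url,
        "require_approval": server.require_approval,
    }
    if server.token:
        # Assumption: a `headers` object carries the bearer token.
        tool["headers"] = {"Authorization": f"Bearer {server.token}"}
    return tool


def to_anthropic_server(server: RemoteMcpServer) -> dict[str, Any]:
    """Entry for the Anthropic Messages API `mcp_servers` array."""
    entry: dict[str, Any] = {"type": "url", "url": server.url, "name": server.name}
    if server.allowed_tools:
        entry["tool_configuration"] = {"enabled": True, "allowed_tools": list(server.allowed_tools)}
    if server.token:
        entry["authorization_token"] = server.token
    return entry
```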
#### Computer Use
OpenAI Responses API
Source: https://platform.openai.com/docs/guides/tools-computer-use
Message Request:
```json
{
"tools": [
{
"type": "computer_use_preview",
"display_width": "{number}",
"display_height": "{number}",
"environment": "{browser | mac | windows | ubuntu}"
}
]
}
```
Tool Call Response:
```jsonc
{
"output": [
{
"type": "reasoning",
"id": "{string}",
"summary": [
{
"type": "summary_text",
"text": "{string}"
}
]
},
{
"type": "computer_call",
"id": "{string}",
"call_id": "{string}",
"action": {
"type": "{click | double_click | drag | keypress | move | screenshot | scroll | type | wait}",
// Other properties are associated with specific action type.
},
"pending_safety_checks": [],
"status": "{in_progress | completed | incomplete}"
}
]
}
```
Amazon Bedrock Agents
Source: https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateAgentActionGroup.html#API_agent_CreateAgentActionGroup_RequestSyntax
Source: https://docs.aws.amazon.com/bedrock/latest/userguide/agent-computer-use-handle-tools.html
CreateAgentActionGroup Request:
```json
{
"actionGroupName": "{string}",
"parentActionGroupSignature": "ANTHROPIC.Computer",
"actionGroupState": "ENABLED"
}
```
Tool Call Response:
```json
{
"returnControl": {
"invocationId": "{string}",
"invocationInputs": [
{
"functionInvocationInput": {
"actionGroup": "{string}",
"actionInvocationType": "RESULT",
"agentId": "{string}",
"function": "{string}",
"parameters": [
{
"name": "{string}",
"type": "string",
"value": "{string}"
}
]
}
}
]
}
}
```
Anthropic
Source: https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/computer-use-tool
Message Request:
```json
{
"tools": [
{
"type": "computer_20250124",
"name": "computer",
"display_width_px": "{number}",
"display_height_px": "{number}",
"display_number": "{number}"
}
]
}
```
Tool Call Response:
```json
{
"role": "assistant",
"content": [
{
"type": "tool_use",
"id": "{string}",
"name": "{string}",
"input": "{object}"
}
]
}
```
#### Commonalities
- **Tool Type Definition**: Providers define a computer use tool (e.g., `computer_use_preview`, `computer_20250124`) within the `tools` array. Amazon Bedrock instead registers it as an action group with `parentActionGroupSignature` set to `ANTHROPIC.Computer`.
- **Action Specification**: Responses include actions (e.g., `click`, `keypress`, `type`) with associated parameters, enabling standardized interaction with computer environments.
- **Environment Configuration**: Requests describe the target environment and display (e.g., OpenAI's `environment` values such as `browser` or `windows`, and the display size via `display_width` or `display_width_px`), which can be abstracted for cross-platform compatibility.
- **Status Tracking**: Some responses include status indicators (e.g., OpenAI's `status` and `pending_safety_checks`), facilitating consistent monitoring of computer use tasks.
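The environment configuration can be captured once and rendered per provider. The following is a minimal sketch built from the request payloads above. `Display` and the three builder functions are hypothetical. `display_number` is passed only to Anthropic, which documents it for X11 displays. Bedrock's `CreateAgentActionGroup` fields shown above carry no display size.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Environment values listed for the OpenAI computer_use_preview tool above.
OPENAI_ENVIRONMENTS = {"browser", "mac", "windows", "ubuntu"}


@dataclass(frozen=True)
class Display:
    """Hypothetical provider-neutral description of the controlled screen."""

    width: int
    height: int
    environment: str = "browser"
    display_number: Optional[int] = None  # Anthropic only


def openai_computer_tool(d: Display) -> dict[str, Any]:
    if d.environment not in OPENAI_ENVIRONMENTS:
        raise ValueError(f"unsupported environment: {d.environment}")
    return {
        "type": "computer_use_preview",
        "display_width": d.width,
        "display_height": d.height,
        "environment": d.environment,
    }


def anthropic_computer_tool(d: Display) -> dict[str, Any]:
    tool: dict[str, Any] = {
        "type": "computer_20250124",
        "name": "computer",
        "display_width_px": d.width,
        "display_height_px": d.height,
    }
    if d.display_number is not None:
        tool["display_number"] = d.display_number
    return tool


def bedrock_computer_action_group(name: str) -> dict[str, Any]:
    # Bedrock exposes computer use as an action group, not a tools-array entry.
    return {
        "actionGroupName": name,
        "parentActionGroupSignature": "ANTHROPIC.Computer",
        "actionGroupState": "ENABLED",
    }
```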
#### OpenAPI Spec Tool
Azure AI Foundry Agent Service
Source: https://learn.microsoft.com/en-us/azure/ai-foundry/agents/how-to/tools/openapi-spec-samples?pivots=rest-api
Source: https://learn.microsoft.com/en-us/rest/api/aifoundry/aiagents/run-steps/get-run-step?view=rest-aifoundry-aiagents-v1&tabs=HTTP#runstepopenapitoolcall
Message Request:
```json
{
"tools": [
{
"type": "openapi",
"openapi": {
"description": "{string}",
"name": "{string}",
"auth": {
"type": "{string}"
},
"spec": "{OpenAPI specification object}"
}
}
]
}
```
Tool Call Response:
```jsonc
{
"tool_calls": [
{
"id": "{string}",
"type": "openapi",
"openapi": {} // From documentation: Reserved for future use
}
]
}
```
Amazon Bedrock Agents
Source: https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateAgentActionGroup.html#API_agent_CreateAgentActionGroup_RequestSyntax
CreateAgentActionGroup Request:
```json
{
"apiSchema": {
"payload": "{JSON or YAML OpenAPI specification string}"
}
}
```
Tool Call Response:
```json
{
"invocationInputs": [
{
"apiInvocationInput": {
"actionGroup": "{string}",
"apiPath": "{string}",
"httpMethod": "{string}",
"parameters": [
{
"name": "{string}",
"type": "{string}",
"value": "{string}"
}
]
}
}
]
}
```
#### Commonalities
- **OpenAPI Specification**: Both providers support defining tools using OpenAPI specifications, either as a JSON/YAML payload or a structured `spec` object, enabling standardized API integration.
- **Tool Type Identification**: The tool is identified as `openapi` or via an `apiSchema`, providing a clear entry point for OpenAPI-based tool usage.
- **Parameter Handling**: Responses include parameters (e.g., `parameters`, `apiPath`, `httpMethod`) for API invocation, which can be abstracted for unified API call execution.
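The Bedrock `apiInvocationInput` shown above has to be turned into an actual HTTP call by the client. The sketch below does this with a hypothetical `to_http_request` helper. The parameter list does not say where each value belongs, so the sketch assumes a simple rule: a parameter whose name appears as `{name}` in `apiPath` is a path parameter, and everything else goes into the query string.

```python
from typing import Any
from urllib.parse import quote, urlencode


def to_http_request(invocation: dict[str, Any], base_url: str) -> tuple[str, str]:
    """Turn a Bedrock-style apiInvocationInput into (HTTP method, URL).

    Assumption: a parameter named in apiPath as `{name}` is a path parameter;
    every other parameter is sent as a query-string argument.
    """
    api = invocation["apiInvocationInput"]
    path = api["apiPath"]
    query = {}
    for param in api.get("parameters", []):
        placeholder = "{" + param["name"] + "}"
        if placeholder in path:
            # Percent-encode path values, including any "/" they contain.
            path = path.replace(placeholder, quote(str(param["value"]), safe=""))
        else:
            query[param["name"]] = param["value"]
    url = base_url.rstrip("/") + path
    if query:
        url += "?" + urlencode(query)
    return api["httpMethod"].upper(), url
```

A production adapter would read parameter locations from the OpenAPI specification itself rather than inferring them from the path template.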
#### Stateful Functions
Azure AI Foundry Agent Service
Source: https://learn.microsoft.com/en-us/azure/ai-foundry/agents/how-to/tools/azure-functions-samples?pivots=rest
Message Request:
```json
{
"tools": [
{
"type": "azure_function",
"azure_function": {
"function": {
"name": "{string}",
"description": "{string}",
"parameters": "{JSON Schema object}"
},
"input_binding": {
"type": "storage_queue",
"storage_queue": {
"queue_service_endpoint": "{string}",
"queue_name": "{string}"
}
},
"output_binding": {
"type": "storage_queue",
"storage_queue": {
"queue_service_endpoint": "{string}",
"queue_name": "{string}"
}
}
}
}
]
}
```
Tool Call Response: Not specified in the documentation.
Amazon Bedrock Agents
Source: https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateAgentActionGroup.html#API_agent_CreateAgentActionGroup_RequestSyntax
CreateAgentActionGroup Request:
```json
{
"apiSchema": {
"payload": "{JSON or YAML OpenAPI specification string}"
}
}
```
Tool Call Response:
```json
{
"invocationInputs": [
{
"apiInvocationInput": {
"actionGroup": "{string}",
"apiPath": "{string}",
"httpMethod": "{string}",
"parameters": [
{
"name": "{string}",
"type": "{string}",
"value": "{string}"
}
]
}
}
]
}
```
#### Commonalities
- **API-Driven Interaction**: Both providers use API-based structures (e.g., `apiSchema`, `azure_function`) to define stateful functions, enabling integration with external services.
- **Parameter Specification**: Requests define inputs either as a JSON Schema (`parameters` in `azure_function`) or through an OpenAPI specification (`apiSchema`), supporting standardized input handling.
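One way to share parameter definitions is to derive the JSON Schema from a typed function and wrap it in the Azure `azure_function` tool entry shown above. This is a minimal sketch. `schema_from_signature` and `azure_function_tool` are hypothetical helpers, only `str`, `int`, `float`, and `bool` annotations are mapped, and anything else falls back to `"string"`.

```python
import inspect
from typing import Any, Callable

# Simple Python-to-JSON-Schema type mapping; unmapped annotations become "string".
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def schema_from_signature(func: Callable[..., Any]) -> dict[str, Any]:
    """Derive a JSON Schema object from simple type-annotated parameters."""
    props: dict[str, Any] = {}
    required: list[str] = []
    for name, param in inspect.signature(func).parameters.items():
        props[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # parameters without defaults are required
    return {"type": "object", "properties": props, "required": required}


def azure_function_tool(
    func: Callable[..., Any], endpoint: str, input_queue: str, output_queue: str
) -> dict[str, Any]:
    """Build an `azure_function` tool entry with storage queue bindings."""

    def binding(queue: str) -> dict[str, Any]:
        return {
            "type": "storage_queue",
            "storage_queue": {"queue_service_endpoint": endpoint, "queue_name": queue},
        }

    return {
        "type": "azure_function",
        "azure_function": {
            "function": {
                "name": func.__name__,
                "description": inspect.getdoc(func) or "",
                "parameters": schema_from_signature(func),
            },
            "input_binding": binding(input_queue),
            "output_binding": binding(output_queue),
        },
    }
```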
#### Text Editor
Anthropic
Source: https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/text-editor-tool
Message Request:
```json
{
"tools": [
{
"type": "text_editor_20250429",
"name": "str_replace_based_edit_tool"
}
]
}
```
Tool Call Response:
```json
{
"role": "assistant",
"content": [
{
"type": "tool_use",
"id": "{string}",
"name": "str_replace_based_edit_tool",
"input": {
"command": "{string}",
"path": "{string}"
}
}
]
}
```
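The text editor tool is executed by the client. The model returns a `tool_use` block with a `command` and `path`, and the application applies the edit and replies with a `tool_result` block. The sketch below handles three commands against an in-memory file map:

- The `view`, `create`, and `str_replace` command names follow Anthropic's text editor documentation.
- The `file_text`, `old_str`, and `new_str` arguments also follow that documentation but are not shown in the payload above.
- `run_text_editor` is a hypothetical helper.

A real handler would also need the `insert` command and path validation.

```python
from typing import Any


def run_text_editor(files: dict[str, str], tool_use: dict[str, Any]) -> dict[str, Any]:
    """Execute one str_replace_based_edit_tool call against an in-memory file map."""
    args = tool_use["input"]
    command, path = args["command"], args["path"]
    try:
        if command == "view":
            content = files[path]
        elif command == "create":
            files[path] = args["file_text"]
            content = f"Created {path}"
        elif command == "str_replace":
            text = files[path]
            count = text.count(args["old_str"])
            if count != 1:
                # The target must match exactly once so the edit is unambiguous.
                raise ValueError(f"old_str matched {count} times")
            files[path] = text.replace(args["old_str"], args.get("new_str", ""), 1)
            content = f"Edited {path}"
        else:
            raise ValueError(f"unsupported command: {command}")
        return {"type": "tool_result", "tool_use_id": tool_use["id"], "content": content}
    except (KeyError, ValueError) as exc:
        # Missing files, missing arguments, and ambiguous edits are reported to the model.
        return {
            "type": "tool_result",
            "tool_use_id": tool_use["id"],
            "content": f"Error: {exc}",
            "is_error": True,
        }
```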
#### Microsoft Fabric
Azure AI Foundry Agent Service
Source: https://learn.microsoft.com/en-us/azure/ai-foundry/agents/how-to/tools/fabric?pivots=rest
Message Request:
```json
{
"tools": [
{
"type": "fabric_dataagent",
"fabric_dataagent": {
"connections": [
{
"connection_id": "{string}"
}
]
}
}
]
}
```
Tool Call Response: Not specified in the documentation.
#### Image Generation
OpenAI Responses API
Source: https://platform.openai.com/docs/guides/tools-image-generation
Message Request:
```json
{
"tools": [
{
"type": "image_generation"
}
]
}
```
Tool Call Response:
```json
{
"output": [
{
"type": "image_generation_call",
"id": "{string}",
"result": "{Base64 string}",
"status": "{string}"
}
]
}
```
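The `result` field of an `image_generation_call` item holds the image as a Base64 string. Below is a minimal sketch for writing those images to disk. It uses a hypothetical `save_generated_images` helper. The `.png` suffix is an assumption, since the payload above does not state the image format.

```python
import base64
from pathlib import Path
from typing import Any


def save_generated_images(
    response: dict[str, Any], out_dir: Path, suffix: str = ".png"
) -> list[Path]:
    """Decode every image_generation_call result in a Responses API payload to a file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for item in response.get("output", []):
        if item.get("type") != "image_generation_call" or not item.get("result"):
            continue  # skip other output items and calls without an image
        path = out_dir / f"{item['id']}{suffix}"
        path.write_bytes(base64.b64decode(item["result"]))
        saved.append(path)
    return saved
```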
## Decision Outcome
TBD.
================================================
FILE: docs/decisions/0003-agent-opentelemetry-instrumentation.md
================================================
---
status: proposed
contact: rogerbarreto
date: 2025-07-14
deciders: stephentoub, markwallace-microsoft, rogerbarreto, westey-m
informed: {}
---
# Agent OpenTelemetry Instrumentation
## Context and Problem Statement
Currently, the Agent Framework lacks comprehensive observability and telemetry capabilities, making it difficult for developers to monitor agent performance, track usage patterns, debug issues, and gain insights into agent behavior in production environments. While the underlying ChatClient implementations may have their own telemetry, there is no standardized way to capture agent-specific metrics and traces that provide visibility into agent operations, token usage, response times, and error patterns at the agent abstraction level.
## Decision Drivers
- **Compliance**: The implementation should adhere to established OpenTelemetry semantic conventions for agents, ensuring consistency and interoperability with existing telemetry systems.
- **Observability Requirements**: Developers need comprehensive telemetry to monitor agent performance, track usage patterns, and debug issues in production environments.
- **Standardization**: The solution must follow established OpenTelemetry semantic conventions and integrate seamlessly with existing .NET telemetry infrastructure.
- **Microsoft.Extensions.AI Alignment**: The implementation should follow the exact patterns and conventions established by Microsoft.Extensions.AI's OpenTelemetry instrumentation.
- **Non-Intrusive Design**: Telemetry should be optional and not impact the core agent functionality or performance when disabled.
- **Agent-Level Insights**: The telemetry should capture agent-specific operations without duplicating underlying ChatClient telemetry.
- **Extensibility**: The solution should support future enhancements and additional telemetry scenarios.
## Considered Options
### Option 1: Direct Integration into Core Agent Classes
Embed OpenTelemetry instrumentation directly into the base `Agent` class and `ChatClientAgent` implementations.
#### Pros
- Automatic telemetry for all agent implementations
- No additional wrapper classes needed
- Consistent telemetry across all agents
#### Cons
- Violates single responsibility principle
- Increases complexity of core agent classes
- Makes telemetry mandatory rather than optional
- Harder to test and maintain
- Couples telemetry concerns with business logic
### Option 2: Aspect-Oriented Programming (AOP) Approach
Use interceptors or AOP frameworks to inject telemetry behavior into agent methods.
#### Pros
- Clean separation of concerns
- Non-intrusive to existing code
- Can be applied selectively
#### Cons
- Adds complexity with AOP framework dependencies
- Runtime overhead for interception
- Harder to debug and understand
- Not consistent with Microsoft.Extensions.AI patterns
### Option 3: OpenTelemetryAgent Wrapper Pattern
Create a delegating `OpenTelemetryAgent` wrapper class that derives from the `Agent` base class and wraps any existing agent with telemetry instrumentation, following the exact pattern of Microsoft.Extensions.AI's `OpenTelemetryChatClient`.
#### Pros
- Follows established Microsoft.Extensions.AI patterns exactly
- Clean separation of concerns
- Optional and non-intrusive
- Easy to test and maintain
- Consistent with .NET telemetry conventions
- Supports any agent implementation
- Provides agent-level telemetry without duplicating ChatClient telemetry
#### Cons
- Requires explicit wrapping of agents
- Additional object allocation for wrapper
## Decision Outcome
Chosen option: "OpenTelemetryAgent Wrapper Pattern", because it follows the established Microsoft.Extensions.AI patterns exactly, provides clean separation of concerns, maintains optional telemetry, and offers the best balance of functionality, maintainability, and consistency with existing .NET telemetry infrastructure.
### Implementation Details
The implementation includes:
1. **OpenTelemetryAgent Wrapper Class**: A delegating agent that wraps any `Agent` implementation with telemetry instrumentation
2. **AgentOpenTelemetryConsts**: Comprehensive constants for telemetry attribute names and metric definitions
3. **Extension Methods**: `.WithOpenTelemetry()` extension method for easy agent wrapping
4. **Comprehensive Test Suite**: Full test coverage following Microsoft.Extensions.AI testing patterns
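The delegating wrapper pattern itself is language-neutral. The following is a minimal Python sketch of the shape, not the C# implementation:

- `Agent`, `EchoAgent`, `OpenTelemetryAgent`, and `with_open_telemetry` are hypothetical stand-ins for the framework types.
- The in-memory `spans` list stands in for a real OpenTelemetry tracer and exporter.
- The attribute names are taken from the list below. `error.type` follows the general OpenTelemetry convention.

```python
import time
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any


@dataclass
class AgentResponse:
    text: str
    input_tokens: int = 0
    output_tokens: int = 0


class Agent(ABC):
    name: str = "agent"

    @abstractmethod
    def run(self, messages: list[str]) -> AgentResponse: ...


class EchoAgent(Agent):
    """Toy agent that returns the last message."""

    name = "echo"

    def run(self, messages: list[str]) -> AgentResponse:
        return AgentResponse(messages[-1], input_tokens=len(messages), output_tokens=1)


class OpenTelemetryAgent(Agent):
    """Delegating wrapper: records one span per call and forwards to the inner agent."""

    def __init__(self, inner: Agent) -> None:
        self.inner = inner
        self.name = inner.name
        self.spans: list[dict[str, Any]] = []

    def run(self, messages: list[str]) -> AgentResponse:
        span: dict[str, Any] = {
            "agent.operation.name": "agent.run",
            "agent.request.name": self.name,
            "agent.request.message_count": len(messages),
        }
        start = time.perf_counter()
        try:
            response = self.inner.run(messages)
        except Exception as exc:
            span["error.type"] = type(exc).__name__
            raise  # telemetry never swallows the inner agent's errors
        else:
            span["agent.usage.input_tokens"] = response.input_tokens
            span["agent.usage.output_tokens"] = response.output_tokens
            return response
        finally:
            span["duration_s"] = time.perf_counter() - start
            self.spans.append(span)


def with_open_telemetry(agent: Agent) -> OpenTelemetryAgent:
    """Opt-in wrapping, mirroring the `.WithOpenTelemetry()` extension method."""
    return OpenTelemetryAgent(agent)
```

Because the wrapper only delegates, an unwrapped agent pays no telemetry cost. This matches the non-intrusive design driver.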
### Telemetry Data Captured
**Activities/Spans:**
- `agent.operation.name` (agent.run, agent.run_streaming)
- `agent.request.id`, `agent.request.name`, `agent.request.instructions`
- `agent.request.message_count`, `agent.request.thread_id`
- `agent.response.id`, `agent.response.message_count`, `agent.response.finish_reason`
- `agent.usage.input_tokens`, `agent.usage.output_tokens`
- Error information and activity status codes
**Metrics:**
- Operation duration histogram with explicit bucket boundaries
- Token usage histogram (input/output tokens)
- Request count counter
- All metrics tagged with operation type and agent name
### Consequences
- **Good**: Provides comprehensive agent-level observability following established patterns
- **Good**: Non-intrusive and optional implementation that doesn't affect core functionality
- **Good**: Consistent with Microsoft.Extensions.AI telemetry conventions
- **Good**: Easy to integrate with existing OpenTelemetry infrastructure
- **Good**: Supports debugging, monitoring, and performance analysis
- **Neutral**: Requires explicit wrapping of agents with `.WithOpenTelemetry()`
- **Neutral**: Additional object allocation for telemetry wrapper
## Validation
The implementation is validated through:
1. **Comprehensive Unit Tests**: 16 test methods covering all scenarios including success, error, streaming, and edge cases
2. **Integration Testing**: Step05 telemetry sample demonstrating real-world usage
3. **Pattern Compliance**: Exact adherence to Microsoft.Extensions.AI OpenTelemetry patterns
4. **Semantic Convention Compliance**: Follows OpenTelemetry semantic conventions for telemetry data
## More Information
### Usage Example
```csharp
using OpenTelemetry;
using OpenTelemetry.Trace;

// Assumes chatClient, options, and messages are defined elsewhere.

// Create TracerProvider
using var tracerProvider = Sdk.CreateTracerProviderBuilder()
    .AddSource(AgentOpenTelemetryConsts.DefaultSourceName)
    .AddConsoleExporter() // requires the OpenTelemetry.Exporter.Console package
    .Build();

// Create and wrap agent with telemetry
var baseAgent = new ChatClientAgent(chatClient, options);
using var telemetryAgent = baseAgent.WithOpenTelemetry();

// Use agent normally - telemetry is captured automatically
var response = await telemetryAgent.RunAsync(messages);
```
### Relationship to Microsoft.Extensions.AI
This implementation follows the exact patterns established by Microsoft.Extensions.AI's OpenTelemetry instrumentation, ensuring consistency across the AI ecosystem and leveraging proven patterns for telemetry integration.
================================================
FILE: docs/decisions/0004-foundry-sdk-extensions.md
================================================
---
status: proposed
contact: markwallace-microsoft
date: 2025-08-06
deciders: markwallace-microsoft, westey-m, quibitron, trrwilson
consulted:
informed:
---
# `Azure.AI.Agents.Persistent` package Extension Methods for Agent Framework
## Context and Problem Statement
To align the `Azure.AI.Agents.Persistent` package with Agent Framework, a set of extension methods has been created that allows a developer to create or retrieve an `AIAgent` using the `PersistentAgentsClient`.
The purpose of this ADR is to decide where these extension methods should live.
## Decision Drivers
- Provide the optimum experience for developers.
- Avoid adding dependencies to the `Azure.AI.Agents.Persistent` package, now or in the future.
## Considered Options
- Add the extension methods to the `Azure.AI.Agents.Persistent` package and change its dependencies
- Add the extension methods to the `Azure.AI.Agents.Persistent` package without changing its dependencies
- Add the extension methods to a `Microsoft.Extensions.AI.Azure` package
### Add the extension methods to the `Azure.AI.Agents.Persistent` package and change its dependencies
- `Azure.AI.Agents.Persistent` would depend on `Microsoft.Extensions.AI` instead of `Microsoft.Extensions.AI.Abstractions`
- Good, because the extension methods are in the `Azure.AI.Agents.Persistent` package and can easily be kept up to date
- Good, because developers don't need to explicitly depend on a new package to get Agent Framework functionality
- Bad, because it introduces additional dependencies, which would possibly grow over time
### Add the extension methods to the `Azure.AI.Agents.Persistent` package without changing its dependencies
- `Azure.AI.Agents.Persistent` would depend on `Microsoft.Extensions.AI.Abstractions` (as it currently does)
- `ChatClientAgent` and `FunctionInvokingChatClient` would move to `Microsoft.Extensions.AI.Abstractions`
- Good, because the extension methods are in the `Azure.AI.Agents.Persistent` package and can easily be kept up to date
- Good, because developers don't need to explicitly depend on a new package to get Agent Framework functionality
- Good, because it introduces minimal additional dependencies
- Bad, because it adds dependencies to `Microsoft.Extensions.AI.Abstractions`, and these become transitive dependencies of `Azure.AI.Agents.Persistent`
### Add the extension methods to a `Microsoft.Extensions.AI.Azure` package
- Introduce a new package called `Microsoft.Extensions.AI.Azure` where the extension methods would live
- `Azure.AI.Agents.Persistent` does not change
- Good, because it introduces no additional dependencies to the `Azure.AI.Agents.Persistent` package
- Bad, because the extension methods are not in the `Azure.AI.Agents.Persistent` package and cannot easily be kept up to date
- Bad, because developers need to explicitly depend on a new package to get Agent Framework functionality
## Decision Outcome
Chosen option: "Add the extension methods to a `Microsoft.Extensions.AI.Azure` package", because it introduces no additional dependencies to the `Azure.AI.Agents.Persistent` package.
================================================
FILE: docs/decisions/0005-python-naming-conventions.md
================================================
---
status: accepted
contact: eavanvalkenburg
date: 2025-09-04
deciders: markwallace-microsoft, dmytrostruk, peterychang, ekzhu, sphenry
consulted: taochenosu, alliscode, moonbox3, johanste
---
# Python naming conventions and renames (ADR)
## Context and Problem Statement
The project has a public .NET surface and a Python surface. During a cross-language alignment effort the community proposed renames to make the Python surface more idiomatic while preserving discoverability and mapping to the .NET names. This ADR captures the final naming decisions (or the proposed ones), the rationale, and the alternatives considered and rejected.
## Decision drivers
- Follow Python naming conventions (PEP 8) where appropriate (snake_case for functions and module-level variables, PascalCase for classes).
- Preserve conceptual parity with .NET names to make it easy for developers reading both surfaces to correlate types and behaviors.
- Avoid ambiguous or overloaded names in Python that could conflict with stdlib, common third-party packages, or existing package/module names.
- Prefer clarity and discoverability in the public API surface over strict symmetry with .NET when Python conventions conflict.
- Minimize churn and migration burden for existing Python users where backwards compatibility is feasible.
## Principles applied
- Map .NET PascalCase class names to PascalCase Python classes when they represent types.
- Map .NET method/field names that are camelCase to snake_case in Python where they will be used as functions or module-level attributes.
- When a .NET name is an acronym or initialism, use Python-friendly casing (e.g., `Http` -> `HTTP` in classes, but acronyms in function names should be lowercased per PEP 8 where sensible).
- Avoid names that shadow common stdlib modules (e.g., `logging`, `asyncio`) or widely used third-party modules.
- When multiple reasonable Python names exist, prefer the one that communicates intent most clearly to Python users, and record rejected alternatives in the table with justification.
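The mechanical part of the camelCase-to-snake_case rule, including lowercasing acronym runs in function names, can be sketched as a small helper. This is a hypothetical C# illustration (C# to match the other examples in these ADRs) and not part of either surface. It only handles casing: deliberate renames such as `run_streaming` to `run_stream` still come from the table below.

```csharp
using System.Text;

// Hypothetical helper: converts a camelCase or PascalCase .NET member name into a
// PEP 8 snake_case function name, lowercasing acronyms such as "HTTP" or "MCP".
public static class PythonNaming
{
    public static string ToSnakeCase(string name)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < name.Length; i++)
        {
            char c = name[i];
            if (char.IsUpper(c) && i > 0)
            {
                bool prevLowerOrDigit = char.IsLower(name[i - 1]) || char.IsDigit(name[i - 1]);
                bool nextLower = i + 1 < name.Length && char.IsLower(name[i + 1]);

                // Start a new word on a lower-to-upper transition ("runStream" -> "run_stream")
                // or at the last capital of an acronym run ("HTTPClient" -> "http_client").
                if (prevLowerOrDigit || (char.IsUpper(name[i - 1]) && nextLower))
                {
                    sb.Append('_');
                }
            }
            sb.Append(char.ToLowerInvariant(c));
        }
        return sb.ToString();
    }
}
```

For example, `getHTTPClient` becomes `get_http_client`, while a class name such as `MCPTool` would keep its PascalCase form in Python and is not passed through this helper.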
## Renaming table
The table below represents the majority of the naming changes discussed in issue #506. Each row has:
- Original and/or .NET name — the canonical name used in dotnet or earlier Python variants.
- New name — the chosen Python name.
- Status — accepted if the new name differs from the original, rejected if unchanged.
- Reasoning — short rationale why the new name was chosen.
- Rejected alternatives — other candidate new names that were considered and rejected; include the rejected 'new name' values and the reason each was rejected.
| Original and/or .NET name | New name (Python) | Status | Reasoning | Rejected alternatives (as "new name" + reason rejected) |
|---|---|---|---|---|
| AIAgent | AgentProtocol | accepted | The AI prefix is meaningless in the context of the Agent Framework, and the `Protocol` suffix makes it very clear that this is a protocol, and not a concrete agent implementation. | AgentLike, not seen in many other places, but was a frontrunner; Agent, as too generic; BaseAgent/AbstractAgent, it is not a base/ABC class and should not be treated as such. |
| ChatClientAgent | ChatAgent | accepted | Type name is shorter, while it is still clear that a ChatClient is used, also by virtue of the first parameter for initialization. | Agent, as too generic. |
| ChatClient/IChatClient (in dotnet) | ChatClientProtocol | accepted | Keeping this protocol in sync with the AgentProtocol naming. | Same reasoning as for AgentProtocol. |
| ChatClientBase | BaseChatClient | accepted | Follows the naming convention for base classes, since it serves as the base class for chat clients. | None |
| AITool | ToolProtocol | accepted | In line with other protocols. | Tool, too generic. |
| AIToolBase | BaseTool | accepted | More descriptive than just Tool, while still concise. | AbstractTool/BaseTool, it is not an abstract/base class and should not be treated as such. |
| ChatRole | Role | accepted | More concise while still clear in context. | None |
| ChatFinishReason | FinishReason | accepted | More concise while still clear in context. | None |
| AIContent | BaseContent | accepted | More accurate as it serves as the base class for all content types. | Content, too generic. |
| AIContents | Contents | accepted | This is the annotated typing object that is the union of all concrete content types, so plural makes sense and since this is used as a type hint, the generic nature of the name is acceptable. | None |
| AIAnnotations | Annotations | accepted | In sync with `Contents`. | None |
| AIAnnotation | BaseAnnotation | accepted | In sync with `BaseContent`. | None |
| `*Mcp*` & `*Http*` | `*MCP*` & `*HTTP*` | accepted | Acronyms should be uppercased in class names, according to PEP 8. | None |
| `agent.run_streaming` | `agent.run_stream` | accepted | Shorter and more closely aligns with AutoGen and Semantic Kernel names for the same methods. | None |
| `workflow.run_streaming` | `workflow.run_stream` | accepted | In sync with `agent.run_stream` and shorter and more closely aligns with AutoGen and Semantic Kernel names for the same methods. | None |
| AgentResponse & AgentResponseUpdate | AgentResponse & AgentResponseUpdate | rejected | Rejected, because it is the response to a run invocation and AgentResponse is too generic. | None |
| `*Content` | `*` | rejected | Rejected other content type renames (removing the `Content` suffix) because it would reduce clarity and discoverability. | `Item` was also considered, but rejected because it is very similar to `Content` and would be inconsistent with .NET. |
| ChatResponse & ChatResponseUpdate | Response & ResponseUpdate | rejected | Rejected, because Response is too generic. | None |
## Naming guidance
In general, Python tends to prefer shorter names, while .NET tends to prefer more descriptive names. The table above captures the specific renames agreed upon, but in general the following guidelines were applied:
- Use [PEP 8](https://peps.python.org/pep-0008/) for generic naming conventions (snake_case for functions and module-level variables, PascalCase for classes).
When mapping .NET names to Python:
- Remove `AI` prefix when appropriate, as it is often redundant in the context of an AI SDK.
- Remove `Chat` prefix when the context is clear (e.g., Role and FinishReason).
- Use `Protocol` suffix for interfaces/protocols to clarify their purpose.
- Use `Base` prefix for base classes that are not abstract but serve as a common ancestor for internal implementations.
- When readability improves while it is still easy to understand what it does and how it maps to the .NET name, prefer the shorter name.
================================================
FILE: docs/decisions/0006-userapproval.md
================================================
---
status: accepted
contact: westey-m
date: 2025-09-12
deciders: sergeymenshykh, markwallace-microsoft, rogerbarreto, dmytrostruk, westey-m, eavanvalkenburg, stephentoub, peterychang
consulted:
informed:
---
# Agent User Approval Content Types and Function Call Approval Design
## Context and Problem Statement
When agents are operating on behalf of a user, there may be cases where the agent requires user approval to continue an operation.
This is complicated by the fact that an agent may be remote and the user may not immediately be available to provide the approval.
Inference services are also increasingly supporting built-in tools or service-side MCP invocations, which may require user approval before the tool can be invoked.
This document aims to provide options and capture the decision on how to model this user approval interaction with the agent caller.
See various features that would need to be supported via this type of mechanism, plus how various other frameworks support this:
- Also see [dotnet issue 6492](https://github.com/dotnet/extensions/issues/6492), which discusses the need for a similar pattern in the context of MCP approvals.
- Also see [the openai human-in-the-loop guide](https://openai.github.io/openai-agents-js/guides/human-in-the-loop/#approval-requests).
- Also see [the openai MCP guide](https://openai.github.io/openai-agents-js/guides/mcp/#optional-approval-flow).
- Also see [MCP Approval Requests from OpenAI](https://platform.openai.com/docs/guides/tools-remote-mcp#approvals).
- Also see [Azure AI Foundry MCP Approvals](https://learn.microsoft.com/en-us/azure/ai-foundry/agents/how-to/tools/model-context-protocol-samples?pivots=rest#submit-your-approval).
- Also see [MCP Elicitation requests](https://modelcontextprotocol.io/specification/draft/client/elicitation).
## Decision Drivers
- Agents should encapsulate their internal logic and not leak it to the caller.
- We need to support approvals for local actions as well as remote actions.
- We need to support approvals for service-side tool use, such as remote MCP tool invocations.
- We should consider how other user input requests will be modeled, so that we can have a consistent approach for user input requests and approvals.
## Considered Options
### 1. Return a FunctionCallContent to the agent caller, which the caller executes
This introduces a manual function calling element to agents, where the caller of the agent is expected to invoke the function if the user approves it.
This approach is problematic for a number of reasons:
- This may not work for remote agents (e.g. via A2A), where the function that the agent wants to call does not reside on the caller's machine.
- The main value proposition of an agent is to encapsulate its internal logic, but this approach leaks that logic to the caller, requiring the caller to know how to invoke the agent's function calls.
- Inference services are introducing their own approval content types for server-side tool or function invocation, and these would not be addressed by this approach.
### 2. Introduce an ApprovalCallback in AgentRunOptions and ChatOptions
This approach allows a caller to provide a callback that the agent can invoke when it requires user approval.
This approach is easy to use when the user and agent are in the same application context, such as a desktop application, where the application can show the approval request to the user and get their response from the callback before continuing the agent run.
This approach does not work well for cases where the agent is hosted in a remote service, and where there is no user available to provide the approval in the same application context.
For cases like this, the agent needs to be suspended, and a network response must be sent to the client app. After the user provides their approval, the client app must call the service that hosts the agent again, with the user's decision, and the agent needs to be resumed. However, with a callback, the agent is deep in the call stack and cannot be suspended or resumed like this.
```csharp
class AgentRunOptions
{
    // Invoked by the agent when it needs user approval; returns true if the user approves.
    public Func<ApprovalRequestContent, Task<bool>>? ApprovalCallback { get; set; }
}

await agent.RunAsync("Please book me a flight for Friday to Paris.", thread, new AgentRunOptions
{
    ApprovalCallback = async (approvalRequest) =>
    {
        // Show the approval request to the user in the appropriate format.
        // The user can then approve or reject the request.
        // The optional FunctionCallContent can be used to show the user what function the agent wants to call with the parameter set:
        // approvalRequest.FunctionCall?.Arguments.
        // If the user approves:
        return true;
    }
});
```
### 3. Introduce new ApprovalRequestContent and ApprovalResponseContent types
The agent would return an `ApprovalRequestContent` to the caller, which would then be responsible for getting approval from the user in whatever way is appropriate for the application.
The caller would then invoke the agent again with an `ApprovalResponseContent` containing the user's decision.
When an agent returns an `ApprovalRequestContent`, the run is finished for the time being, and to continue, the agent must be invoked again with an `ApprovalResponseContent` on the same thread as the original request. This does not have to be the exact same thread object, but it must have contents equivalent to the original thread, since the agent would have stored the `ApprovalRequestContent` in its thread state.
The `ApprovalRequestContent` could contain an optional `FunctionCallContent` if the approval is for a function call, along with any additional information that the agent wants to provide to the user to help them make a decision.
It is up to the agent to decide when and if a user approval is required, and therefore when to return an `ApprovalRequestContent`.
`ApprovalRequestContent` and `ApprovalResponseContent` will not necessarily always map to a supported content type for the underlying service or agent thread storage.
Specifically, when we decide in the IChatClient stack to ask the user for approval of a function call, this does not mean that the underlying AI service or the service-side thread type (where applicable) supports the concept of a function call approval request.
While we can store the approval requests and responses in local threads, service-managed threads won't necessarily support this. For service-managed threads, there will therefore be no long-term record of the approval request in the chat history.
We should, however, log approvals so that there is a trace of them for debugging and auditing purposes.
Suggested Types:
```csharp
class ApprovalRequestContent : AIContent
{
    // An ID to uniquely identify the approval request/response pair.
    public string Id { get; set; }

    // An optional user targeted message to explain what needs to be approved.
    public string? Text { get; set; }

    // Optional: If the approval is for a function call, this will contain the function call content.
    public FunctionCallContent? FunctionCall { get; set; }

    public ApprovalResponseContent CreateApproval()
    {
        return new ApprovalResponseContent
        {
            Id = this.Id,
            Approved = true,
            FunctionCall = this.FunctionCall
        };
    }

    public ApprovalResponseContent CreateRejection()
    {
        return new ApprovalResponseContent
        {
            Id = this.Id,
            Approved = false,
            FunctionCall = this.FunctionCall
        };
    }
}

class ApprovalResponseContent : AIContent
{
    // An ID to uniquely identify the approval request/response pair.
    public string Id { get; set; }

    // Indicates whether the user approved the request.
    public bool Approved { get; set; }

    // Optional: If the approval is for a function call, this will contain the function call content.
    public FunctionCallContent? FunctionCall { get; set; }
}

var response = await agent.RunAsync("Please book me a flight for Friday to Paris.", thread);
while (response.ApprovalRequests.Any())
{
    List<ChatMessage> messages = new();
    foreach (var approvalRequest in response.ApprovalRequests)
    {
        // Show the approval request to the user in the appropriate format.
        // The user can then approve or reject the request.
        // The optional FunctionCallContent can be used to show the user what function the agent wants to call with the parameter set:
        // approvalRequest.FunctionCall?.Arguments.
        // The Text property of the ApprovalRequestContent can also be used to show the user any additional textual context about the request.
        // If the user approves:
        messages.Add(new ChatMessage(ChatRole.User, [approvalRequest.CreateApproval()]));
    }

    // Get the next response from the agent.
    response = await agent.RunAsync(messages, thread);
}

class AgentResponse
{
    // ...
    // A new property on AgentResponse to aggregate the ApprovalRequestContent items from
    // the response messages (similar to the Text property).
    public IEnumerable<ApprovalRequestContent> ApprovalRequests { get; set; }
    // ...
}
```
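The requirement that the response arrives on an equivalent thread can be illustrated with a toy model: the thread remembers outstanding requests by `Id`, and a response is only accepted if it matches one of them. This is a minimal sketch with hypothetical stand-in types (`ApprovalRequest`, `ApprovalResponse`, `ToyThread`); they are not the Microsoft.Extensions.AI or Agent Framework types, and the real agent may correlate requests differently.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified stand-ins for the suggested types above.
public sealed record ApprovalRequest(string Id, string? Text, string? FunctionName);
public sealed record ApprovalResponse(string Id, bool Approved, string? FunctionName);

// A toy thread that remembers which approval requests are still outstanding.
public sealed class ToyThread
{
    private readonly Dictionary<string, ApprovalRequest> _pending = new();

    public int PendingCount => _pending.Count;

    public void AddPending(ApprovalRequest request) => _pending[request.Id] = request;

    // Removes and returns the pending request matching the response Id. An unknown Id
    // means the agent was resumed on a thread that does not hold the original request.
    public ApprovalRequest TakePending(ApprovalResponse response)
    {
        if (!_pending.Remove(response.Id, out var request))
        {
            throw new InvalidOperationException($"No pending approval request with Id '{response.Id}'.");
        }
        return request;
    }
}

public static class ApprovalRoundTrip
{
    // Mirrors CreateApproval/CreateRejection: the response echoes the request Id
    // and function so that the agent can correlate the pair.
    public static ApprovalResponse Respond(ApprovalRequest request, bool approved) =>
        new(request.Id, approved, request.FunctionName);
}
```

Keying on `Id` rather than on object identity is what allows the thread to be a different object with equivalent contents, as described above.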
### 4. Introduce new Container UserInputRequestContent and UserInputResponseContent types
This approach is similar to the `ApprovalRequestContent` and `ApprovalResponseContent` types, but is more generic and can be used for any type of user input request, not just approvals.
There is some ambiguity with this approach. When using an LLM-based agent, the LLM may return a text response about missing user input.
For example, the LLM may need to invoke a function, but the user did not supply all the information needed to fill out its arguments.
Typically an LLM would just respond with a text message asking the user for the missing information.
In this case, the message is not distinguishable from any other result message, and therefore cannot be returned to the caller as a `UserInputRequestContent`, even though it is conceptually a type of unstructured user input request. Ultimately, our types are modeled to make it easy for callers to decide on the right way to represent this to users, for example whether it is just a regular message to show to users, or whether it needs a special UX.
Suggested Types:
```csharp
class UserInputRequestContent : AIContent
{
    // An ID to uniquely identify the approval request/response pair.
    public string ApprovalId { get; set; }

    // DecisionTarget could contain:
    // FunctionCallContent: The function call that the agent wants to invoke.
    // TextContent: Text that describes the question that the user should answer.
    public object? DecisionTarget { get; set; } // Anything else the user may need to make a decision about.

    // Possible InputFormat subclasses:
    // SchemaInputFormat: Contains a schema for the user input.
    // ApprovalInputFormat: Indicates that the user needs to approve something.
    // FreeformTextInputFormat: Indicates that the user can provide freeform text input.
    // Other formats can be added as needed, e.g. cards when using activity protocol.
    public InputFormat InputFormat { get; set; } // How the user should provide input (e.g., form, options, etc.).
}

class UserInputResponseContent : AIContent
{
    // An ID to uniquely identify the approval request/response pair.
    public string ApprovalId { get; set; }

    // Possible UserInputResult subclasses:
    // SchemaInputResult: Contains the structured data provided by the user.
    // ApprovalResult: Contains a bool with approved / rejected.
    // FreeformTextResult: Contains the freeform text input provided by the user.
    public UserInputResult Result { get; set; } // The user input.

    public object? DecisionTarget { get; set; } // A copy of the DecisionTarget from the UserInputRequestContent, if applicable.
}

var response = await agent.RunAsync("Please book me a flight for Friday to Paris.", thread);
while (response.UserInputRequests.Any())
{
    List<ChatMessage> messages = new();
    foreach (var userInputRequest in response.UserInputRequests)
    {
        // Show the user input request to the user in the appropriate format.
        // The DecisionTarget can be used to show the user what function the agent wants to call with the parameter set.
        // The InputFormat property can be used to determine the type of UX when allowing users to provide input.
        if (userInputRequest.InputFormat is ApprovalInputFormat)
        {
            // Here we need to show the user an approval request.
            // We can use the DecisionTarget to show e.g. the function call that the agent wants to invoke.
            // The user can then approve or reject the request.
            // If the user approves:
            var approvalMessage = new ChatMessage(ChatRole.User,
            [
                new UserInputResponseContent
                {
                    ApprovalId = userInputRequest.ApprovalId,
                    Result = new ApprovalResult { Approved = true },
                    DecisionTarget = userInputRequest.DecisionTarget
                }
            ]);
            messages.Add(approvalMessage);
        }
        else
        {
            throw new NotSupportedException("Unsupported InputFormat type.");
        }
    }

    // Get the next response from the agent.
    response = await agent.RunAsync(messages, thread);
}

class AgentResponse
{
    // ...
    // A new property on AgentResponse to aggregate the UserInputRequestContent items from
    // the response messages (similar to the Text property).
    public IReadOnlyList<UserInputRequestContent> UserInputRequests { get; set; }
    // ...
}
```
### 5. Introduce new Base UserInputRequestContent and UserInputResponseContent types
This approach is similar to option 4, but the `UserInputRequestContent` and `UserInputResponseContent` types are base classes rather than generic container types.
Suggested Types:
```csharp
class UserInputRequestContent : AIContent
{
    // An ID to uniquely identify the request/response pair.
    public string Id { get; set; }
}

class UserInputResponseContent : AIContent
{
    // An ID to uniquely identify the request/response pair.
    public string Id { get; set; }
}

// -----------------------------------
// Used for approving a function call.
class FunctionApprovalRequestContent : UserInputRequestContent
{
    // Contains the function call that the agent wants to invoke.
    public FunctionCallContent FunctionCall { get; set; }

    public FunctionApprovalResponseContent CreateApproval()
    {
        return new FunctionApprovalResponseContent
        {
            Id = this.Id,
            Approved = true,
            FunctionCall = this.FunctionCall
        };
    }

    public FunctionApprovalResponseContent CreateRejection()
    {
        return new FunctionApprovalResponseContent
        {
            Id = this.Id,
            Approved = false,
            FunctionCall = this.FunctionCall
        };
    }
}

class FunctionApprovalResponseContent : UserInputResponseContent
{
    // Indicates whether the user approved the request.
    public bool Approved { get; set; }

    // Contains the function call that the agent wants to invoke.
    public FunctionCallContent FunctionCall { get; set; }
}

// --------------------------------------------------
// Used for approving a request described using text.
class TextApprovalRequestContent : UserInputRequestContent
{
    // A user targeted message to explain what needs to be approved.
    public string Text { get; set; }
}

class TextApprovalResponseContent : UserInputResponseContent
{
    // Indicates whether the user approved the request.
    public bool Approved { get; set; }
}

// ------------------------------------------------
// Used for providing input in a structured format.
class StructuredDataInputRequestContent : UserInputRequestContent
{
    // A user targeted message to explain what is being requested.
    public string? Text { get; set; }

    // Contains the schema for the user input.
    public JsonElement Schema { get; set; }
}

class StructuredDataInputResponseContent : UserInputResponseContent
{
    // Contains the structured data provided by the user.
    public JsonElement StructuredData { get; set; }
}

var response = await agent.RunAsync("Please book me a flight for Friday to Paris.", thread);
while (response.UserInputRequests.Any())
{
    List<ChatMessage> messages = new();
    foreach (var userInputRequest in response.UserInputRequests)
    {
        if (userInputRequest is FunctionApprovalRequestContent approvalRequest)
        {
            // Here we need to show the user an approval request.
            // We can use the FunctionCall property to show e.g. the function call that the agent wants to invoke.
            // If the user approves:
            messages.Add(new ChatMessage(ChatRole.User, [approvalRequest.CreateApproval()]));
        }
    }

    // Get the next response from the agent.
    response = await agent.RunAsync(messages, thread);
}

class AgentResponse
{
    // ...
    // A new property on AgentResponse to aggregate the UserInputRequestContent items from
    // the response messages (similar to the Text property).
    public IEnumerable<UserInputRequestContent> UserInputRequests { get; set; }
    // ...
}
```
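With option 5, a caller chooses the UX by matching on the concrete subclass rather than by inspecting an `InputFormat` property as in option 4. The sketch below uses hypothetical record stand-ins (`UserInputRequest`, `FunctionApprovalRequest`, `TextApprovalRequest`, `StructuredDataInputRequest`) that only mirror the shape of the suggested types; they are not the real Agent Framework classes, and the schema is reduced to a string for brevity.

```csharp
using System;

// Hypothetical, simplified stand-ins for the option 5 hierarchy.
public abstract record UserInputRequest(string Id);
public sealed record FunctionApprovalRequest(string Id, string FunctionName) : UserInputRequest(Id);
public sealed record TextApprovalRequest(string Id, string Text) : UserInputRequest(Id);
public sealed record StructuredDataInputRequest(string Id, string? Text, string SchemaJson) : UserInputRequest(Id);

public static class UserInputUx
{
    // A switch expression over type patterns picks a UX per request type.
    public static string Describe(UserInputRequest request) => request switch
    {
        FunctionApprovalRequest f => $"Approve call to {f.FunctionName}?",
        TextApprovalRequest t => $"Approve: {t.Text}",
        StructuredDataInputRequest s => $"Fill in form for: {s.Text ?? "(no description)"}",
        // New request types may be added later, so unknown ones are surfaced rather than ignored.
        _ => throw new NotSupportedException($"Unsupported request type {request.GetType().Name}."),
    };
}
```

One consequence of the base-class design is that new request kinds can be added without changing existing ones; callers that do not recognize a kind fall through to the default arm.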
## Decision Outcome
Chosen option: 5, "Introduce new Base UserInputRequestContent and UserInputResponseContent types".
## Appendices
### ChatClientAgent Approval Process Flow
1. User passes a User message to the agent with a request.
1. Agent calls IChatClient with any functions registered on the agent.
(The IChatClient pipeline includes a FunctionInvokingChatClient decorator.)
1. Model responds with FunctionCallContent indicating function calls required.
1. FunctionInvokingChatClient decorator identifies any function calls that require user approval and returns a FunctionApprovalRequestContent for each.
(If there are multiple parallel function calls, all function calls will be returned as FunctionApprovalRequestContent even if only some require approval.)
1. Agent updates the thread with the FunctionApprovalRequestContent (or this may have already been done by a service threaded agent).
1. Agent returns the FunctionApprovalRequestContent to the caller which shows it to the user in the appropriate format.
1. User (via caller) invokes the agent again with FunctionApprovalResponseContent.
1. Agent adds the FunctionApprovalResponseContent to the thread.
1. Agent calls IChatClient with the provided FunctionApprovalResponseContent, and the FunctionInvokingChatClient decorator identifies each response as an approval or rejection of the corresponding function call.
Any rejected function calls are converted to FunctionResultContent with a message indicating that the function invocation was denied.
Any approved function calls are executed by the FunctionInvokingChatClient decorator.
1. FunctionInvokingChatClient decorator passes the FunctionCallContent and FunctionResultContent for the approved and rejected function calls to the model.
1. Model responds with the result.
1. FunctionInvokingChatClient returns the FunctionCallContent, FunctionResultContent, and the result message to the agent.
1. Agent responds to caller with the same messages and updates the thread with these as well.
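The decorator's part of this flow can be modeled in a few lines: if any call in a parallel batch needs approval, the whole batch becomes approval requests; once the caller responds, approved calls are invoked and rejected calls get a "Function invocation denied" result. This is a simplified, hypothetical model (`FunctionCall`, `FunctionResult`, `ApprovalProcessing` are illustrative names), not the actual FunctionInvokingChatClient implementation or API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-ins; not the Microsoft.Extensions.AI content types.
public sealed record FunctionCall(string CallId, string Name);
public sealed record FunctionResult(string CallId, string Result);

public static class ApprovalProcessing
{
    public const string Denied = "Function invocation denied";

    // If any call in a batch of parallel calls requires approval, the whole batch
    // is returned to the caller as approval requests.
    public static bool NeedsApprovalRound(IEnumerable<FunctionCall> calls, Func<FunctionCall, bool> requiresApproval) =>
        calls.Any(requiresApproval);

    // Once the caller responds, approved calls are invoked and rejected calls get a
    // failed result; calls and results are then sent to the model together.
    public static List<FunctionResult> Resolve(
        IReadOnlyList<FunctionCall> calls,
        IReadOnlyDictionary<string, bool> approvals,
        Func<FunctionCall, string> invoke)
    {
        var results = new List<FunctionResult>();
        foreach (var call in calls)
        {
            // Assumption: a call without a recorded decision is treated as rejected.
            bool approved = approvals.TryGetValue(call.CallId, out bool decision) && decision;
            results.Add(new FunctionResult(call.CallId, approved ? invoke(call) : Denied));
        }
        return results;
    }
}
```

Treating a missing decision as a rejection is a conservative choice made for this sketch; the ADR does not specify that case.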
### CustomAgent Approval Process Flow
1. User passes a User message to the agent with a request.
1. Agent adds this message to the thread.
1. Agent executes various steps.
1. Agent encounters a step for which it requires user input to continue.
1. Agent responds with a UserInputRequestContent and also adds it to its thread.
1. User (via caller) invokes the agent again with UserInputResponseContent.
1. Agent adds the UserInputResponseContent to the thread.
1. Agent responds to caller with result message and thread is updated with the result message.
### Sequence Diagram: FunctionInvokingChatClient with built in Approval Generation
This ChatClient approval stack option has been proven to work via a proof-of-concept implementation.
```mermaid
---
title: Multiple Functions with partial approval
---
sequenceDiagram
note right of Developer: Developer asks question with two functions.
Developer->>+FunctionInvokingChatClient: What is the special soup today? [GetMenu, GetSpecials]
FunctionInvokingChatClient->>+ResponseChatClient: What is the special soup today? [GetMenu, GetSpecials]
ResponseChatClient-->>-FunctionInvokingChatClient: [FunctionCallContent(GetMenu)], [FunctionCallContent(GetSpecials)]
note right of FunctionInvokingChatClient: FICC turns FunctionCallContent into FunctionApprovalRequestContent
FunctionInvokingChatClient->>+Developer: [FunctionApprovalRequestContent(GetMenu)] [FunctionApprovalRequestContent(GetSpecials)]
note right of Developer:Developer asks user for approval
Developer->>+FunctionInvokingChatClient: [FunctionApprovalResponseContent(GetMenu, approved=false)] [FunctionApprovalResponseContent(GetSpecials, approved=true)]
note right of FunctionInvokingChatClient:FunctionInvokingChatClient executes the approved function and generates a failed FunctionResultContent for the rejected one, before invoking the model again.
FunctionInvokingChatClient->>+ResponseChatClient: What is the special soup today? [FunctionCallContent(GetMenu)], [FunctionCallContent(GetSpecials)], [FunctionResultContent(GetMenu, "Function invocation denied")] [FunctionResultContent(GetSpecials, "Special Soup: Clam Chowder...")]
ResponseChatClient-->>-FunctionInvokingChatClient: [TextContent("The special soup is...")]
FunctionInvokingChatClient->>+Developer: [FunctionCallContent(GetMenu)], [FunctionCallContent(GetSpecials)], [FunctionResultContent(GetMenu, "Function invocation denied")] [FunctionResultContent(GetSpecials, "Special Soup: Clam Chowder...")] [TextContent("The special soup is...")]
```
### Sequence Diagram: Post FunctionInvokingChatClient ApprovalGeneratingChatClient - Multiple function calls with partial approval
This is a discarded ChatClient Approval Stack option, but is included here for reference.
```mermaid
---
title: Multiple Functions with partial approval
---
sequenceDiagram
note right of Developer: Developer asks question with two functions.
Developer->>+FunctionInvokingChatClient: What is the special soup today? [GetMenu, GetSpecials]
FunctionInvokingChatClient->>+ApprovalGeneratingChatClient: What is the special soup today? [GetMenu, GetSpecials]
ApprovalGeneratingChatClient->>+ResponseChatClient: What is the special soup today? [GetMenu, GetSpecials]
ResponseChatClient-->>-ApprovalGeneratingChatClient: [FunctionCallContent(GetMenu)], [FunctionCallContent(GetSpecials)]
ApprovalGeneratingChatClient-->>-FunctionInvokingChatClient: [FunctionApprovalRequestContent(GetMenu)], [FunctionApprovalRequestContent(GetSpecials)]
FunctionInvokingChatClient-->>-Developer: [FunctionApprovalRequestContent(GetMenu)] [FunctionApprovalRequestContent(GetSpecials)]
note right of Developer: Developer approves one function call and rejects the other.
Developer->>+FunctionInvokingChatClient: [FunctionApprovalResponseContent(GetMenu, approved=true)] [FunctionApprovalResponseContent(GetSpecials, approved=false)]
FunctionInvokingChatClient->>+ApprovalGeneratingChatClient: [FunctionApprovalResponseContent(GetMenu, approved=true)] [FunctionApprovalResponseContent(GetSpecials, approved=false)]
note right of FunctionInvokingChatClient: ApprovalGeneratingChatClient only returns FunctionCallContent for approved FunctionApprovalResponseContent.
ApprovalGeneratingChatClient-->>-FunctionInvokingChatClient: [FunctionCallContent(GetMenu)]
note right of FunctionInvokingChatClient: FunctionInvokingChatClient has to also include all FunctionApprovalResponseContent in the new downstream request.
FunctionInvokingChatClient->>+ApprovalGeneratingChatClient: [FunctionResultContent(GetMenu, "mains.... desserts...")] [FunctionApprovalResponseContent(GetMenu, approved=true)] [FunctionApprovalResponseContent(GetSpecials, approved=false)]
note right of ApprovalGeneratingChatClient: ApprovalGeneratingChatClient now throws away approvals for executed functions, and creates failed FunctionResultContent for denied function calls.
ApprovalGeneratingChatClient->>+ResponseChatClient: [FunctionResultContent(GetMenu, "mains.... desserts...")] [FunctionResultContent(GetSpecials, "Function invocation denied")]
```
### Sequence Diagram: Pre FunctionInvokingChatClient ApprovalGeneratingChatClient - Multiple function calls with partial approval
This is a discarded ChatClient Approval Stack option, but is included here for reference.
It doesn't work for the scenario where we have multiple function calls for the same function in serial with different arguments.
Flow:
- AGCC turns AIFunctions into AIFunctionDefinitions (not invocable) and FICC ignores these.
- We get back a FunctionCall for one of these and it gets approved.
- We invoke the FICC again, this time with an AIFunction.
- We call the service with the FCC and FRC.
- We get back a new Function call for the same function again with different arguments.
- Since we were passed an AIFunction instead of an AIFunctionDefinition, we now incorrectly execute this FC without approval.
```mermaid
---
title: Multiple Functions with partial approval
---
sequenceDiagram
note right of Developer: Developer asks question with two functions.
Developer->>+ApprovalGeneratingChatClient: What is the special soup today? [GetMenu, GetSpecials]
note right of ApprovalGeneratingChatClient: AGCC marks functions as not-invocable
ApprovalGeneratingChatClient->>+FunctionInvokingChatClient: What is the special soup today? [GetMenu(invocable=false)] [GetSpecials(invocable=false)]
FunctionInvokingChatClient->>+ResponseChatClient: What is the special soup today? [GetMenu(invocable=false)] [GetSpecials(invocable=false)]
ResponseChatClient-->>-FunctionInvokingChatClient: [FunctionCallContent(GetMenu)], [FunctionCallContent(GetSpecials)]
note right of FunctionInvokingChatClient: FICC doesn't invoke functions since they are not invocable.
FunctionInvokingChatClient-->>-ApprovalGeneratingChatClient: [FunctionCallContent(GetMenu)], [FunctionCallContent(GetSpecials)]
note right of ApprovalGeneratingChatClient: AGCC turns functions into approval requests
ApprovalGeneratingChatClient-->>-Developer: [FunctionApprovalRequestContent(GetMenu)] [FunctionApprovalRequestContent(GetSpecials)]
note right of Developer: Developer approves one function call and rejects the other.
Developer->>+ApprovalGeneratingChatClient: [FunctionApprovalResponseContent(GetMenu, approved=true)] [FunctionApprovalResponseContent(GetSpecials, approved=false)]
note right of ApprovalGeneratingChatClient: AGCC turns approval responses into FCC or failed function calls
ApprovalGeneratingChatClient->>+FunctionInvokingChatClient: [FunctionCallContent(GetMenu)] [FunctionCallContent(GetSpecials)] [FunctionResultContent(GetSpecials, "Function invocation denied")]
note right of FunctionInvokingChatClient: FICC invokes GetMenu since it's the only remaining one.
FunctionInvokingChatClient->>+ResponseChatClient: [FunctionCallContent(GetMenu)] [FunctionResultContent(GetMenu, "mains.... desserts...")] [FunctionCallContent(GetSpecials)] [FunctionResultContent(GetSpecials, "Function invocation denied")]
ResponseChatClient-->>-FunctionInvokingChatClient: [FunctionCallContent(GetMenu)] [FunctionResultContent(GetMenu, "mains.... desserts...")] [FunctionCallContent(GetSpecials)] [FunctionResultContent(GetSpecials, "Function invocation denied")] [TextContent("The special soup is...")]
FunctionInvokingChatClient-->>-ApprovalGeneratingChatClient: [FunctionCallContent(GetMenu)] [FunctionResultContent(GetMenu, "mains.... desserts...")] [FunctionCallContent(GetSpecials)] [FunctionResultContent(GetSpecials, "Function invocation denied")] [TextContent("The special soup is...")]
ApprovalGeneratingChatClient-->>-Developer: [FunctionCallContent(GetMenu)] [FunctionResultContent(GetMenu, "mains.... desserts...")] [FunctionCallContent(GetSpecials)] [FunctionResultContent(GetSpecials, "Function invocation denied")] [TextContent("The special soup is...")]
```
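The approval round trip in the diagram above can be simulated with a minimal Python sketch. The dataclasses stand in for `FunctionCallContent`, `FunctionResultContent`, `FunctionApprovalRequestContent`, and `FunctionApprovalResponseContent`; they and the helper functions are hypothetical and are not the `Microsoft.Extensions.AI` API. The sketch only shows the two AGCC conversions and the FICC rule of invoking calls that do not already have a result.

```python
from dataclasses import dataclass
from typing import Callable, Union

# Hypothetical stand-ins for the MEAI content types in the diagram.
@dataclass(frozen=True)
class FunctionCall:
    name: str

@dataclass(frozen=True)
class FunctionResult:
    name: str
    result: str

@dataclass(frozen=True)
class ApprovalRequest:
    call: FunctionCall

@dataclass(frozen=True)
class ApprovalResponse:
    call: FunctionCall
    approved: bool

Content = Union[FunctionCall, FunctionResult]

def to_approval_requests(calls: list[FunctionCall]) -> list[ApprovalRequest]:
    # AGCC, first pass: calls returned for non-invocable functions become approval requests.
    return [ApprovalRequest(call) for call in calls]

def resolve_approvals(responses: list[ApprovalResponse]) -> list[Content]:
    # AGCC, second pass: every call is forwarded; rejected ones get a "denied" result attached.
    contents: list[Content] = []
    for response in responses:
        contents.append(response.call)
        if not response.approved:
            contents.append(FunctionResult(response.call.name, "Function invocation denied"))
    return contents

def invoke_pending(contents: list[Content], functions: dict[str, Callable[[], str]]) -> list[Content]:
    # FICC: invoke only the calls that do not already have a result.
    answered = {c.name for c in contents if isinstance(c, FunctionResult)}
    output: list[Content] = []
    for content in contents:
        output.append(content)
        if isinstance(content, FunctionCall) and content.name not in answered:
            output.append(FunctionResult(content.name, functions[content.name]()))
    return output

# The developer approves GetMenu and rejects GetSpecials.
requests = to_approval_requests([FunctionCall("GetMenu"), FunctionCall("GetSpecials")])
responses = [ApprovalResponse(requests[0].call, True), ApprovalResponse(requests[1].call, False)]
history = invoke_pending(resolve_approvals(responses), {"GetMenu": lambda: "mains... desserts..."})
```

Matching results to calls by function name is a simplification of this sketch; a real implementation would correlate them by call ID so that repeated calls to the same function stay distinct.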
================================================
FILE: docs/decisions/0007-agent-filtering-middleware.md
================================================
---
status: proposed
contact: rogerbarreto
date: 2025-09-15
deciders: markwallace-microsoft, rogerbarreto, westey-m, dmytrostruk, sergeymenshykh
informed: {}
---
# Agent Filtering Middleware Design
## Context and Problem Statement
The current Agent Framework lacks a standardized, extensible mechanism for intercepting and processing agent execution. Developers need the ability to add custom filters/middleware to intercept and modify agent behavior at various stages of the execution pipeline. While the framework has basic agent abstractions with `RunAsync` and `RunStreamingAsync` methods, and standard features such as approval workflows, it has no middleware that lets developers intercept and modify agent behavior in the different agent execution contexts.
The challenge is to design an architecture that supports:
- Multiple execution contexts (invocation, function calls, approval requests, error handling)
- Support for both streaming and non-streaming scenarios
- Dependency-injection-friendly setup
## Decision Drivers
- Developers should be able to intercept and modify agent behavior at various stages of the execution pipeline.
- The design should be simple and intuitive for developers to understand and use.
- The design should be extensible to support new execution contexts and scenarios.
- The design should support both manual and dependency injection configuration.
- The design should provide enough context information to enable flexible custom behaviors.
- The design should be exception-friendly and allow clear error-handling and recovery mechanisms.
## Other AI Agent Framework Analysis
This section analyzes how other major AI agent frameworks handle filtering, middleware, hooks, or similar interception capabilities. The goal is to identify ubiquitous language, design patterns, and approaches that could inform our Agent Middleware design, and to gain insight into achieving a more idiomatic design.
### Overview Comparison Table
| Provider | Language | Supports (Y/N) | Naming | TL;DR Observation |
|---------------------------|----------|----------------|---------------------------------|------------------------|
| LangChain (Python) | Python | Y (read) | Callbacks (BaseCallbackHandler) | Uses observer pattern with event methods for interception (e.g., on_chain_start); supports agent actions and errors; handlers can read inputs/outputs, and modification is limited to mutating the passed-in parameters or raising exceptions to influence flow. [Details](#langchain) |
| LangChain (JS) | JS | Y (read/write) | Callbacks (BaseCallbackHandler) | Similar observer pattern to Python, with event methods adapted for JS async handling; supports chain/agent interception; handlers can read inputs/outputs and modify metadata or raise exceptions to influence flow. [Details](#langchain) |
| LangChain | JS/Python/TS | Y (read/write) | Middleware | The middleware concept was recently introduced in the LangChain 1.0 alpha. [Details](https://blog.langchain.com/agent-middleware/) |
| LangGraph | Python | Y (read/write) | Hooks/Callbacks (inherited from LangChain) | Event-driven with runtime handlers; integrates callbacks for observability in graphs; inherits LangChain's ability to read/modify metadata or interrupt execution. [Details](#langgraph) |
| AutoGen (Python) | Python | Y (read/write) | Reply Functions (register_reply) | Reply functions intercept and process messages; middleware-like for agent replies; can directly modify messages or replies before continuing. [Details](#autogen) |
| AutoGen (C#) | C# | Y (read/write) | Middleware (MiddlewareAgent) | Decorator/wrapper with middleware delegates for message modification; delegates can read and alter message content or options. [Details](#autogen) |
| Semantic Kernel (C#) | C# | Y (read/write) | Filters (IFunctionInvocationFilter, etc.) | Interface-based middleware pattern for function/prompt interception; filters can read and modify context, arguments, or results. [Details](#semantic-kernel) |
| Semantic Kernel (Python) | Python | Y (read/write) | Filters (add_filter, @kernel.filter decorator) | Function and decorator-based for interception; no explicit interfaces like C#, focuses on async functions for filters; can read and modify context/arguments/results. [Details](#semantic-kernel) |
| CrewAI | Python | Y (read) | Events/Callbacks (BaseEventListener) | Event-driven orchestration with listeners for workflows; listeners can observe events (e.g., read source/event data) but are primarily for logging/reactions without direct modification of workflow state. [Details](#crewai) |
| LlamaIndex | Python | Y (read) | Callbacks (CallbackManager) | Observer pattern with event methods for queries and tools; handlers can observe events/payloads (e.g., read prompts/responses) but are designed for debugging/tracing without modifying execution context. [Details](#llamaindex) |
| Haystack | Python | N (Pipeline-based interception) | N/A (Pipeline Components/Routers) | Relies on modular pipelines for implicit interception but lacks explicit middleware/filters; custom components can read/write data flow via routing/transformations, but this is compositional rather than hook-based interception. [Details](#haystack) |
| OpenAI Swarm | Python | N | N/A | No explicit middleware/filters; interception requires custom wrappers or manual handling (e.g., function decorators, client subclassing), lacking native framework support for built-in components to accept such modifications. [Details](#openai-swarm) |
| Atomic Agents | Python | N | N/A (Composable Components) | No explicit middleware/filters; modularity allows composable units but no dedicated interception hooks or callbacks for custom reading/modification mid-execution. [Details](#atomic-agents) |
| Smolagents (Hugging Face)| Python | N | N/A | No explicit support; focuses on simple agent building without interception mechanisms or hooks for reading/modifying execution. [Details](#smolagents-hugging-face) |
| Phidata (Agno) | Python | N | N/A | No explicit middleware/filters; agents use tools/memory but no interception hooks for custom reading/modification of calls. [Details](#phidata-agno) |
| PromptFlow (Microsoft) | Python | N (Tracing only) | Tracing | Supports tracing for LLM interactions, acting as callbacks for debugging/iteration; tracing is read-only for observability/telemetry without options to modify context or intercept calls beyond logging. [Details](#promptflow-microsoft) |
| n8n | JS/TS | Y (read/write) | Callbacks (inherited from LangChain) | AI Agent node uses LangChain under the hood, inheriting callbacks for observability; supports reading/modifying metadata or interrupting flow as in LangChain. [Details](#n8n) |
## Considered Options
### Option 1: Semantic Kernel Approach
Similar to Semantic Kernel's kernel filters, this option exposes a separate interface and collection property for each specialized filter type.
```csharp
var services = new ServiceCollection();
services.AddSingleton<IAgentRunFilter, MyAgentRunFilter>();
services.AddSingleton<IAgentFunctionCallFilter, MyAgentFunctionCallFilter>();

// Using DI
var diAgent = new MyAgent(services.BuildServiceProvider());

// Manual
var manualAgent = new MyAgent();
manualAgent.RunFilters.Add(new MyAgentRunFilter());
manualAgent.FunctionCallFilters.Add(new MyAgentFunctionCallFilter());

public class MyAgentRunFilter : IAgentRunFilter
{
    public async Task OnRunAsync(AgentRunContext context, Func<AgentRunContext, Task> next, CancellationToken cancellationToken = default)
    {
        // Pre-run logic
        await next(context);
        // Post-run logic
    }
}

public interface IAgentRunFilter
{
    Task OnRunAsync(AgentRunContext context, Func<AgentRunContext, Task> next, CancellationToken cancellationToken = default);
}

public interface IAgentFunctionCallFilter
{
    Task OnFunctionCallAsync(AgentFunctionCallContext context, Func<AgentFunctionCallContext, Task> next, CancellationToken cancellationToken = default);
}

public abstract class AIAgent
{
    private readonly AgentFilterProcessor _filterProcessor;

    public AIAgent(AgentFilterProcessor? filterProcessor = null)
    {
        _filterProcessor = filterProcessor ?? new AgentFilterProcessor();
    }

    public AIAgent(IServiceProvider serviceProvider)
    {
        _filterProcessor = serviceProvider.GetService<AgentFilterProcessor>() ?? new AgentFilterProcessor();

        // Auto-register filters from DI
        foreach (var filter in serviceProvider.GetServices<IAgentRunFilter>())
        {
            _filterProcessor.AddFilter(filter);
        }
        foreach (var filter in serviceProvider.GetServices<IAgentFunctionCallFilter>())
        {
            _filterProcessor.AddFilter(filter);
        }
    }

    public async Task<AgentResponse> RunAsync(
        IReadOnlyCollection<ChatMessage> messages,
        AgentThread? thread = null,
        AgentRunOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        var context = new AgentRunContext(this, messages.ToList(), thread, options);

        // Process through the filter pipeline using the same pattern as Semantic Kernel
        await _filterProcessor.ProcessAsync(context, async ctx =>
        {
            // Core agent logic - implement actual agent execution here
            var response = await this.ExecuteCoreLogicAsync(ctx.Messages, ctx.Thread, ctx.Options, cancellationToken);
            ctx.Response = response;
        }, cancellationToken);

        // Extract the response from the context
        return context.Response ?? throw new InvalidOperationException("Agent execution did not produce a response");
    }

    protected abstract Task<AgentResponse> ExecuteCoreLogicAsync(
        IEnumerable<ChatMessage> messages,
        AgentThread? thread,
        AgentRunOptions? options,
        CancellationToken cancellationToken);
}
```
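The `_filterProcessor.ProcessAsync` call above assumes a recursive chain: each filter receives a `next` delegate that runs the remaining filters and, at the end, the core agent logic. A minimal Python sketch of that ordering follows; `AgentFilterProcessor` and `AgentRunContext` only borrow the C# names, and the body is an assumption for illustration, not Semantic Kernel's implementation.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Optional

@dataclass
class AgentRunContext:
    messages: list[str]
    response: Optional[str] = None
    trace: list[str] = field(default_factory=list)

Next = Callable[[AgentRunContext], Awaitable[None]]
RunFilter = Callable[[AgentRunContext, Next], Awaitable[None]]

class AgentFilterProcessor:
    def __init__(self) -> None:
        self._filters: list[RunFilter] = []

    def add_filter(self, run_filter: RunFilter) -> None:
        self._filters.append(run_filter)

    async def process(self, context: AgentRunContext, core_logic: Next) -> None:
        # Filters run in registration order; the innermost `next` calls the core logic.
        async def invoke(index: int, ctx: AgentRunContext) -> None:
            if index < len(self._filters):
                await self._filters[index](ctx, lambda c: invoke(index + 1, c))
            else:
                await core_logic(ctx)

        await invoke(0, context)

def make_filter(name: str) -> RunFilter:
    async def run_filter(ctx: AgentRunContext, next_: Next) -> None:
        ctx.trace.append(f"{name}:pre")   # pre-run logic
        await next_(ctx)
        ctx.trace.append(f"{name}:post")  # post-run logic
    return run_filter

async def core_logic(ctx: AgentRunContext) -> None:
    ctx.trace.append("core")
    ctx.response = f"answered {len(ctx.messages)} message(s)"

processor = AgentFilterProcessor()
processor.add_filter(make_filter("A"))
processor.add_filter(make_filter("B"))
context = AgentRunContext(messages=["hello"])
asyncio.run(processor.process(context, core_logic))
```

A filter that returns without awaiting `next` skips everything after it, including the core logic, which is the usual way such filters block a run.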
#### Pros
- Clean separation of concerns
- Follows established patterns in Semantic Kernel and easy migration path
- No resistance or complaints from the community when used in Semantic Kernel
- Composable and reusable filter components
#### Cons
- Adding more filters may require adding more properties to the agent class.
- Filters are not always used, so adding this responsibility at the `AIAgent` abstraction level may be overkill.
### Option 2: Agent Filter Decorator Pattern
Similar to `OpenTelemetryAgent` and to `DelegatingChatClient` in `Microsoft.Extensions.AI`, this option creates decorator agents that wrap the inner agent and intercept its method calls. The current POC implementation demonstrates two approaches (2a and 2b), and 2c applies the same pattern to function invocation:
#### 2a. Direct Decorator Implementation (GuardrailCallbackAgent)
```csharp
// Current POC implementation from samples
var agent = persistentAgentsClient.CreateAIAgent(model).AsBuilder()
    .Use((innerAgent) => new GuardrailCallbackAgent(innerAgent)) // Decoration-based agent run handling
    .Use(async (context, next) => // Context-based handling
    {
        // Guardrail: Filter input messages for PII
        context.Messages = context.Messages.Select(m => new ChatMessage(m.Role, FilterPii(m.Text))).ToList();
        Console.WriteLine($"Pii Middleware - Filtered messages: {new ChatResponse(context.Messages).Text}");

        await next(context);

        if (!context.IsStreaming)
        {
            // Guardrail: Filter output messages for PII
            context.RunResponse!.Messages = context.RunResponse.Messages.Select(m => new ChatMessage(m.Role, FilterPii(m.Text))).ToList();
        }
        else
        {
            context.SetRawResponse(StreamingPiiDetectionAsync(context.RunStreamingResponse!));
        }
    })
    .Build();

// Direct decorator implementation
internal sealed class GuardrailCallbackAgent : DelegatingAIAgent
{
    private readonly string[] _forbiddenKeywords = { "harmful", "illegal", "violence" };

    public GuardrailCallbackAgent(AIAgent innerAgent) : base(innerAgent) { }

    public override async Task<AgentResponse> RunAsync(IEnumerable<ChatMessage> messages, AgentThread? thread = null, AgentRunOptions? options = null, CancellationToken cancellationToken = default)
    {
        var filteredMessages = this.FilterMessages(messages);
        Console.WriteLine($"Guardrail Middleware - Filtered messages: {new ChatResponse(filteredMessages).Text}");

        var response = await this.InnerAgent.RunAsync(filteredMessages, thread, options, cancellationToken);
        response.Messages = response.Messages.Select(m => new ChatMessage(m.Role, this.FilterContent(m.Text))).ToList();
        return response;
    }

    public override async IAsyncEnumerable<AgentResponseUpdate> RunStreamingAsync(IEnumerable<ChatMessage> messages, AgentThread? thread = null, AgentRunOptions? options = null, [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        var filteredMessages = this.FilterMessages(messages);
        await foreach (var update in this.InnerAgent.RunStreamingAsync(filteredMessages, thread, options, cancellationToken))
        {
            if (update.Text != null)
            {
                yield return new AgentResponseUpdate(update.Role, this.FilterContent(update.Text));
            }
            else
            {
                yield return update;
            }
        }
    }

    private List<ChatMessage> FilterMessages(IEnumerable<ChatMessage> messages)
    {
        return messages.Select(m => new ChatMessage(m.Role, this.FilterContent(m.Text))).ToList();
    }

    private string FilterContent(string content)
    {
        foreach (var keyword in this._forbiddenKeywords)
        {
            if (content.Contains(keyword, StringComparison.OrdinalIgnoreCase))
            {
                return "[REDACTED: Forbidden content]";
            }
        }
        return content;
    }
}
```
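The decorator idea behind `GuardrailCallbackAgent` reduces to a minimal Python sketch: a delegating agent holds an inner agent, filters the input before forwarding, and filters the output afterwards. `Agent`, `EchoAgent`, `DelegatingAgent`, and `GuardrailAgent` are hypothetical names for illustration, and the keyword check mirrors `FilterContent` above.

```python
from typing import Protocol

class Agent(Protocol):
    def run(self, messages: list[str]) -> list[str]: ...

class EchoAgent:
    """Hypothetical inner agent that echoes each message."""
    def run(self, messages: list[str]) -> list[str]:
        return [f"echo: {m}" for m in messages]

class DelegatingAgent:
    """Forwards calls to the inner agent; subclasses override only what they intercept."""
    def __init__(self, inner: Agent) -> None:
        self.inner = inner

    def run(self, messages: list[str]) -> list[str]:
        return self.inner.run(messages)

class GuardrailAgent(DelegatingAgent):
    FORBIDDEN = ("harmful", "illegal", "violence")

    def _filter(self, text: str) -> str:
        # Case-insensitive keyword check, mirroring FilterContent above.
        lowered = text.lower()
        if any(keyword in lowered for keyword in self.FORBIDDEN):
            return "[REDACTED: Forbidden content]"
        return text

    def run(self, messages: list[str]) -> list[str]:
        filtered = [self._filter(m) for m in messages]  # input guardrail
        response = self.inner.run(filtered)
        return [self._filter(m) for m in response]  # output guardrail

agent = GuardrailAgent(EchoAgent())
```

Because `GuardrailAgent` is itself an agent, further decorators can wrap it, which is the idea behind chaining `.Use((innerAgent) => ...)` calls in the builder.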
#### 2b. Context-Based Middleware (RunningCallbackHandlerAgent)
The POC also includes a context-based approach using `RunningCallbackHandlerAgent` that wraps the agent and provides a context object for middleware processing:
```csharp
// Internal implementation that supports the .Use() pattern
internal sealed class RunningCallbackHandlerAgent : DelegatingAIAgent
{
    private readonly Func<AgentInvokeCallbackContext, Func<AgentInvokeCallbackContext, Task>, Task> _func;

    internal RunningCallbackHandlerAgent(AIAgent innerAgent, Func<AgentInvokeCallbackContext, Func<AgentInvokeCallbackContext, Task>, Task> func) : base(innerAgent)
    {
        this._func = func;
    }

    public override async Task<AgentResponse> RunAsync(IEnumerable<ChatMessage> messages, AgentThread? thread = null, AgentRunOptions? options = null, CancellationToken cancellationToken = default)
    {
        var context = new AgentInvokeCallbackContext(this, messages, thread, options, isStreaming: false, cancellationToken);

        async Task CoreLogicAsync(AgentInvokeCallbackContext ctx)
        {
            var response = await this.InnerAgent.RunAsync(ctx.Messages, ctx.Thread, ctx.Options, ctx.CancellationToken);
            ctx.SetRawResponse(response);
        }

        await this._func(context, CoreLogicAsync);
        return context.RunResponse!;
    }
}
```
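`RunningCallbackHandlerAgent` differs from a hand-written decorator in that the interception logic lives in a user-supplied function that receives a context object plus a `next` delegate wrapping the core logic. A minimal Python sketch of that shape follows; `InvokeContext`, `CallbackHandlerAgent`, `EchoAgent`, and the `block_secrets` guardrail are hypothetical, and raising when no response is produced is a choice of this sketch where the C# code uses `RunResponse!`.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional

class EchoAgent:
    """Hypothetical inner agent that echoes each message."""
    async def run(self, messages: list[str]) -> list[str]:
        return [f"echo: {m}" for m in messages]

@dataclass
class InvokeContext:
    messages: list[str]
    is_streaming: bool = False
    run_response: Optional[list[str]] = None

Next = Callable[[InvokeContext], Awaitable[None]]
Handler = Callable[[InvokeContext, Next], Awaitable[None]]

class CallbackHandlerAgent:
    """Wraps an inner agent and routes every run through a user-supplied handler."""
    def __init__(self, inner: EchoAgent, handler: Handler) -> None:
        self.inner = inner
        self.handler = handler

    async def run(self, messages: list[str]) -> list[str]:
        context = InvokeContext(messages=list(messages))

        async def core_logic(ctx: InvokeContext) -> None:
            # Counterpart of CoreLogicAsync: call the inner agent with the (possibly modified) context.
            ctx.run_response = await self.inner.run(ctx.messages)

        await self.handler(context, core_logic)
        if context.run_response is None:
            raise RuntimeError("the handler did not produce a response")
        return context.run_response

async def block_secrets(ctx: InvokeContext, next_: Next) -> None:
    # Hypothetical guardrail: answer directly instead of calling the inner agent.
    if any("password" in m for m in ctx.messages):
        ctx.run_response = ["request blocked"]
        return
    await next_(ctx)

agent = CallbackHandlerAgent(EchoAgent(), block_secrets)
```

Because the handler decides whether and when `next` runs, a guardrail can answer without ever reaching the inner agent.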
#### 2c. Function Invocation Filtering
The POC also demonstrates function invocation filtering using a similar decorator pattern:
```csharp
// Function invocation middleware using .Use() pattern
var agent = persistentAgentsClient.CreateAIAgent(model)
    .AsBuilder()
    .Use((functionInvocationContext, next, ct) =>
    {
        Console.WriteLine($"IsStreaming: {functionInvocationContext!.IsStreaming}");
        return next(functionInvocationContext.Arguments, ct);
    })
    .Use((functionInvocationContext, next, ct) =>
    {
        Console.WriteLine($"City Name: {(functionInvocationContext!.Arguments.TryGetValue("location", out var location) ? location : "not provided")}");
        return next(functionInvocationContext.Arguments, ct);
    })
    .Build();
```
This demonstrates that the current POC supports both agent-level and function-level filtering through consistent patterns.
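The chained `.Use((functionInvocationContext, next, ct) => ...)` calls form a pipeline around each function call in which every middleware can inspect or rewrite the arguments before calling `next`. The following minimal Python sketch models that with a hypothetical `FunctionPipeline` builder; it omits cancellation tokens, streaming, and the agent itself, and `get_weather` is an illustrative function, not part of the POC.

```python
from dataclasses import dataclass
from typing import Any, Callable

Args = dict[str, Any]
Next = Callable[[Args], Any]

@dataclass
class FunctionInvocationContext:
    function_name: str
    arguments: Args
    is_streaming: bool = False

Middleware = Callable[[FunctionInvocationContext, Next], Any]

class FunctionPipeline:
    """Hypothetical builder: middlewares run in the order they were added with use()."""

    def __init__(self, function: Callable[..., Any]) -> None:
        self._function = function
        self._middlewares: list[Middleware] = []

    def use(self, middleware: Middleware) -> "FunctionPipeline":
        self._middlewares.append(middleware)
        return self  # fluent, in the spirit of the builder's .Use()

    def invoke(self, arguments: Args, is_streaming: bool = False) -> Any:
        context = FunctionInvocationContext(self._function.__name__, dict(arguments), is_streaming)

        def call(index: int, args: Args) -> Any:
            if index == len(self._middlewares):
                return self._function(**args)  # the actual function call
            context.arguments = args  # each middleware sees the latest arguments
            return self._middlewares[index](context, lambda a: call(index + 1, a))

        return call(0, context.arguments)

log: list[str] = []

def get_weather(location: str) -> str:
    return f"Sunny in {location}"

def log_streaming(ctx: FunctionInvocationContext, next_: Next) -> Any:
    log.append(f"IsStreaming: {ctx.is_streaming}")
    return next_(ctx.arguments)

def log_city(ctx: FunctionInvocationContext, next_: Next) -> Any:
    log.append(f"City Name: {ctx.arguments.get('location', 'not provided')}")
    return next_(ctx.arguments)

def default_city(ctx: FunctionInvocationContext, next_: Next) -> Any:
    # Rewrites the arguments before the call continues.
    return next_({"location": "Paris", **ctx.arguments})

result = FunctionPipeline(get_weather).use(log_streaming).use(log_city).invoke({"location": "Seattle"})
fallback = FunctionPipeline(get_weather).use(default_city).invoke({})
```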
#### Pros
- Clean separation of concerns
- Follows established patterns in `Microsoft.Extensions.AI` (`DelegatingChatClient`) and the Agent Framework (`OpenTelemetryAgent`)
- Non-intrusive to existing agent implementations
- Supports both manual and DI configuration through builder pattern
- Context-specific processing middleware with `AgentInvokeCallbackContext`
- Composable and reusable filter components
- Flexible implementation allowing both direct decorators and context-based middleware
- Seamless integration with builder pattern using `.Use()` method
- Support for both streaming and non-streaming scenarios
- Rich context object providing access to messages, thread, options, and response handling
### Option 3: Dedicated Processor Component for Middleware
This approach involves creating a dedicated `CallbackMiddlewareProcessor` that manages collections of `ICallbackMiddleware` instances. The current POC implementation demonstrates this pattern with the `CallbackEnabledAgent` and processor architecture.
#### Current POC Implementation
```csharp
// Current POC usage from samples
var agent = persistentAgentsClient.CreateAIAgent(model)
    .AsBuilder()
    .UseCallbacks(config =>
    {
        config.AddCallback(new PiiDetectionMiddleware());
        config.AddCallback(new GuardrailCallbackMiddleware());
    }).Build();

// Middleware implementation
internal sealed class PiiDetectionMiddleware : CallbackMiddleware<AgentInvokeCallbackContext>
{
    public override async Task OnProcessAsync(AgentInvokeCallbackContext context, Func<AgentInvokeCallbackContext, Task> next, CancellationToken cancellationToken)
    {
        // Guardrail: Filter input messages for PII
        context.Messages = context.Messages.Select(m => new ChatMessage(m.Role, FilterPii(m.Text))).ToList();
        Console.WriteLine($"Pii Middleware - Filtered messages: {new ChatResponse(context.Messages).Text}");

        await next(context);

        if (!context.IsStreaming)
        {
            // Guardrail: Filter output messages for PII
            context.RunResponse!.Messages = context.RunResponse.Messages.Select(m => new ChatMessage(m.Role, FilterPii(m.Text))).ToList();
        }
        else
        {
            context.SetRawResponse(StreamingPiiDetectionAsync(context.RunStreamingResponse!));
        }
    }

    private static string FilterPii(string content)
    {
        // PII detection logic (elided); returns the content unchanged as a placeholder
        return content;
    }
}

internal sealed class GuardrailCallbackMiddleware : CallbackMiddleware<AgentInvokeCallbackContext>
{
    private readonly string[] _forbiddenKeywords = { "harmful", "illegal", "violence" };

    public override async Task OnProcessAsync(AgentInvokeCallbackContext context, Func<AgentInvokeCallbackContext, Task> next, CancellationToken cancellationToken)
    {
        // Guardrail: Filter input messages for forbidden content
        context.Messages = this.FilterMessages(context.Messages);
        Console.WriteLine($"Guardrail Middleware - Filtered messages: {new ChatResponse(context.Messages).Text}");

        await next(context);

        if (!context.IsStreaming)
        {
            // Guardrail: Filter output messages for forbidden content
            context.RunResponse!.Messages = this.FilterMessages(context.RunResponse.Messages);
        }
        else
        {
            context.SetRawResponse(StreamingGuardRailAsync(context.RunStreamingResponse!));
        }
    }

    // FilterMessages and StreamingGuardRailAsync are omitted; see GuardrailCallbackAgent in 2a for the keyword logic
}
```
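Both middleware classes above hand the streaming case to helpers (`StreamingPiiDetectionAsync`, `StreamingGuardRailAsync`) whose bodies are not shown. One plausible shape, sketched here in Python as an assumption, is an async generator that wraps the original update stream and redacts each update as it passes through. Updates are plain strings rather than `AgentResponseUpdate` objects, and the email regex is a hypothetical stand-in for real PII detection.

```python
import asyncio
import re
from typing import AsyncIterator

# Hypothetical PII rule for illustration: treat anything shaped like an email address as PII.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def filter_pii(text: str) -> str:
    return EMAIL.sub("[REDACTED: PII]", text)

async def streaming_pii_detection(updates: AsyncIterator[str]) -> AsyncIterator[str]:
    # Wraps the inner stream lazily: each update is redacted as it flows through.
    async for update in updates:
        yield filter_pii(update)

async def fake_stream() -> AsyncIterator[str]:
    # Stand-in for context.RunStreamingResponse.
    for chunk in ["Contact ", "me at jane@example.com", " today."]:
        yield chunk

async def collect(stream: AsyncIterator[str]) -> list[str]:
    return [item async for item in stream]

chunks = asyncio.run(collect(streaming_pii_detection(fake_stream())))
```

Filtering update by update keeps the stream lazy, but a value split across two updates (for example `jane@exa` followed by `mple.com`) slips through, so a production filter may need to buffer.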
#### Function Invocation Filtering
The POC also demonstrates function invocation filtering using the processor pattern:
```csharp
// Processor-based function invocation middleware
var agent = persistentAgentsClient.CreateAIAgent(model)
    .AsBuilder()
    .UseCallbacks(config =>
    {
        config.AddCallback(new UsedApiFunctionInvocationCallback());
        config.AddCallback(new CityInformationFunctionInvocationCallback());
    }).Build();

internal sealed class UsedApiFunctionInvocationCallback : CallbackMiddleware<AgentFunctionInvocationCallbackContext>
{
    public override async Task OnProcessAsync(AgentFunctionInvocationCallbackContext context, Func<AgentFunctionInvocationCallbackContext, Task> next, CancellationToken cancellationToken)
    {
        Console.WriteLine($"IsStreaming: {context!.IsStreaming}");
        await next(context);
    }
}

internal sealed class CityInformationFunctionInvocationCallback : CallbackMiddleware<AgentFunctionInvocationCallbackContext>
{
    public override async Task OnProcessAsync(AgentFunctionInvocationCallbackContext context, Func<AgentFunctionInvocationCallbackContext, Task> next, CancellationToken cancellationToken)
    {
        Console.WriteLine($"City Name: {(context!.Arguments.TryGetValue("location", out var location) ? location : "not provided")}");
        await next(context);
    }
}
```
As with Option 2, the POC supports both agent-level and function-level filtering through consistent patterns.
#### Processor Implementation
The `CallbackMiddlewareProcessor` manages the filter pipeline and chain execution:
```csharp
public sealed class CallbackMiddlewareProcessor
{
    // For thread-safety when used as a Singleton.
    // Note: ConcurrentBag<T> is unordered, so enumeration does not preserve registration order.
    private readonly ConcurrentBag<ICallbackMiddleware> _agentCallbacks = [];

    public CallbackMiddlewareProcessor(IEnumerable<ICallbackMiddleware>? callbacks = null)
    {
        if (callbacks is not null)
        {
            foreach (var callback in callbacks)
            {
                this.AddCallback(callback);
            }
        }
    }

    internal CallbackMiddlewareProcessor AddCallback(ICallbackMiddleware middleware)
    {
        switch (middleware)
        {
            case CallbackMiddleware:
                this._agentCallbacks.Add(middleware);
                break;
            default:
                throw new ArgumentException($"The middleware type '{middleware.GetType().FullName}' is not supported.", nameof(middleware));
        }
        return this;
    }

    public async Task ProcessAsync<TContext>(TContext context, Func<TContext, Task> coreLogic, CancellationToken cancellationToken = default)
        where TContext : CallbackContext
    {
        var applicableCallbacks = this.GetApplicableCallbacks<TContext>().ToList();
        await this.InvokeChainAsync(context, applicableCallbacks, 0, coreLogic, cancellationToken);
    }

    private IEnumerable<ICallbackMiddleware> GetApplicableCallbacks<TContext>()
        where TContext : CallbackContext
    {
        return this._agentCallbacks.Where(callback => callback.CanProcess<TContext>());
    }
}
```
#### CallbackEnabledAgent Implementation
```csharp
public sealed class CallbackEnabledAgent : DelegatingAIAgent
{
    private readonly CallbackMiddlewareProcessor _callbacksProcessor;

    public CallbackEnabledAgent(AIAgent agent, CallbackMiddlewareProcessor? callbackMiddlewareProcessor) : base(agent)
    {
        this._callbacksProcessor = callbackMiddlewareProcessor ?? new();
    }

    public override async Task<AgentResponse> RunAsync(
        IEnumerable<ChatMessage> messages,
        AgentThread? thread = null,
        AgentRunOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        // Middleware may pass a different context instance to `next`, so capture the
        // one that actually reaches the core logic and read the response from it.
        AgentInvokeCallbackContext roamingContext = null!;

        async Task CoreLogic(AgentInvokeCallbackContext ctx)
        {
            roamingContext ??= ctx;
            var result = await this.InnerAgent.RunAsync(ctx.Messages, ctx.Thread, ctx.Options, ctx.CancellationToken);
            ctx.SetRawResponse(result);
        }

        await this._callbacksProcessor.ProcessAsync(
            new AgentInvokeCallbackContext(
                agent: this,
                messages: messages,
                thread,
                options,
                isStreaming: false,
                cancellationToken),
            CoreLogic,
            cancellationToken);

        return roamingContext.RunResponse!;
    }
}
```
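The `roamingContext ??= ctx` capture matters when a middleware passes a different context object to `next` than the one the agent created: the response is then written to whichever context reaches the core logic. The minimal Python sketch below reproduces that case with a hypothetical middleware that builds a new context instead of mutating the old one; all names are illustrative.

```python
import asyncio
from dataclasses import dataclass, replace
from typing import Awaitable, Callable, Optional

@dataclass
class InvokeContext:
    messages: list[str]
    run_response: Optional[str] = None

Next = Callable[[InvokeContext], Awaitable[None]]
Middleware = Callable[[InvokeContext, Next], Awaitable[None]]

async def passthrough(ctx: InvokeContext, next_: Next) -> None:
    await next_(ctx)

async def replacing_middleware(ctx: InvokeContext, next_: Next) -> None:
    # Sends a *new* context object down the chain instead of mutating ctx.
    await next_(replace(ctx, messages=[m.strip() for m in ctx.messages]))

async def run_agent(messages: list[str], middleware: Middleware) -> tuple[Optional[str], Optional[str]]:
    original = InvokeContext(messages)
    roaming: Optional[InvokeContext] = None

    async def core_logic(ctx: InvokeContext) -> None:
        nonlocal roaming
        if roaming is None:
            roaming = ctx  # capture the context that actually reached the core logic
        ctx.run_response = f"echo: {' '.join(ctx.messages)}"

    await middleware(original, core_logic)
    captured = roaming.run_response if roaming is not None else None
    return original.run_response, captured

replaced = asyncio.run(run_agent(["  hi  "], replacing_middleware))
same = asyncio.run(run_agent(["hi"], passthrough))
```

After `replacing_middleware`, the original object holds no response while the captured context does; with a pass-through middleware both are the same object.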
#### Pros
- Flexibility: Use shared processor for multiple agents or create per-agent instances
- Clean fluent configuration API with `.UseCallbacks()` builder method
- Type-safe middleware registration with `CallbackMiddleware` base class
- Thread-safe processor implementation using `ConcurrentBag`
- Extensible context system with `AgentInvokeCallbackContext` providing rich execution context
- Seamless integration with existing agent builder pattern
- Support for both streaming and non-streaming scenarios in middleware
- Clear separation between middleware logic and agent core functionality
- Simplicity: Agents stay lean, middleware is externalized to processor
- Extensibility: Add new contexts/filters without changing agent implementation
#### Cons
- Additional complexity with processor class and context management
- Requires understanding of middleware lifecycle and context passing
- Type switching in processor for different middleware types
- Roaming context pattern needed to capture specialized contexts through middleware chain
## APPENDIX 1: Proposed Middleware Contexts
The following context classes would be needed to support the filtering architecture:
```csharp
public abstract class AgentContext
{
    // When the same filter is shared by multiple agents, it is useful to expose the invoking agent
    public AIAgent Agent { get; }

    public AgentRunOptions? Options { get; set; } // Filters are allowed to replace the options

    protected AgentContext(AIAgent agent, AgentRunOptions? options)
    {
        Agent = agent;
        Options = options;
    }
}

public class AgentRunContext : AgentContext
{
    public IList<ChatMessage> Messages { get; set; }
    public AgentResponse? Response { get; set; }
    public AgentThread? Thread { get; }

    public AgentRunContext(AIAgent agent, IList<ChatMessage> messages, AgentThread? thread, AgentRunOptions? options)
        : base(agent, options)
    {
        Messages = messages;
        Thread = thread;
    }
}

public class AgentFunctionInvocationContext : AgentToolContext
{
    // Similar to MEAI.FunctionInvocationContext
    public AIFunction Function { get; set; }
    public AIFunctionArguments Arguments { get; set; }
    public FunctionCallContent CallContent { get; set; }
    public IList<ChatMessage> Messages { get; set; }
    public ChatOptions? Options { get; set; }
    public int Iteration { get; set; }
    public int FunctionCallIndex { get; set; }
    public int FunctionCount { get; set; }
    public bool Terminate { get; set; }
    public bool IsStreaming { get; set; }
}
```
## APPENDIX 2: Setting Up Middleware Options
### 1. Semantic Kernel Setup
This approach has the benefit of a clear separation of concerns, but it requires developers to manage and maintain separate collections for each filter type, increasing code complexity and maintenance overhead.
```csharp
// Use Case
var agent = new MyAgent();
agent.RunFilters.Add(new MyAgentRunFilter());
agent.RunFilters.Add(new MyMultipleFilterImplementation());
agent.FunctionCallFilters.Add(new MyAgentFunctionCallFilter());
agent.FunctionCallFilters.Add(new MyMultipleFilterImplementation());
agent.AYZFilters.Add(new MyAgentAYZFilter());
agent.AYZFilters.Add(new MyMultipleFilterImplementation());

// Impl
interface IAgentRunFilter
{
    Task OnRunAsync(AgentRunContext context, Func<AgentRunContext, Task> next, CancellationToken cancellationToken = default);
}

interface IAgentFunctionCallFilter
{
    Task OnFunctionCallAsync(AgentFunctionCallContext context, Func<AgentFunctionCallContext, Task> next, CancellationToken cancellationToken = default);
}
```
#### Pros
- Clean separation of concerns
- Follows established patterns in Semantic Kernel and easy migration path
- No resistance or complaints from the community when used in Semantic Kernel
#### Cons
- Adding more filters may require adding more properties to the agent/processor class.
- Adding more filters requires bigger code changes downstream to callers.
### 2. Setup with Generic Method
Exposing a method instead of properties may be more appropriate, while still keeping the filters in separate buckets internally.
```csharp
// Use Case
var agent = new MyAgent();
agent.AddFilters([new MyAgentRunFilter(), new MyMultipleFilterImplementation()]);
agent.AddFilters([new MyAgentFunctionCallFilter(), new MyMultipleFilterImplementation()]);
agent.AddFilters([new MyAgentAYZFilter(), new MyMultipleFilterImplementation()]);
```
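This setup keeps a single `AddFilters` entry point while still storing filters in separate buckets internally. The minimal Python sketch below assumes that each filter declares its kind by subclassing hypothetical `RunFilter` / `FunctionCallFilter` base classes, and shows a registry routing each instance, including one that implements both kinds, into every bucket it belongs to.

```python
from abc import ABC, abstractmethod

class RunFilter(ABC):
    @abstractmethod
    def on_run(self, context: dict) -> None: ...

class FunctionCallFilter(ABC):
    @abstractmethod
    def on_function_call(self, context: dict) -> None: ...

class FilterRegistry:
    """Hypothetical: a single add_filters() entry point backed by one bucket per filter kind."""

    def __init__(self) -> None:
        self.run_filters: list[RunFilter] = []
        self.function_call_filters: list[FunctionCallFilter] = []

    def add_filters(self, filters: list[object]) -> None:
        for candidate in filters:
            matched = False
            # A filter implementing several kinds lands in every matching bucket.
            if isinstance(candidate, RunFilter):
                self.run_filters.append(candidate)
                matched = True
            if isinstance(candidate, FunctionCallFilter):
                self.function_call_filters.append(candidate)
                matched = True
            if not matched:
                raise TypeError(f"unsupported filter type: {type(candidate).__name__}")

class MyAgentRunFilter(RunFilter):
    def on_run(self, context: dict) -> None:
        context.setdefault("seen", []).append("run")

class MyMultipleFilterImplementation(RunFilter, FunctionCallFilter):
    def on_run(self, context: dict) -> None:
        context.setdefault("seen", []).append("multi:run")

    def on_function_call(self, context: dict) -> None:
        context.setdefault("seen", []).append("multi:function")

registry = FilterRegistry()
registry.add_filters([MyAgentRunFilter(), MyMultipleFilterImplementation()])

run_context: dict = {}
for run_filter in registry.run_filters:
    run_filter.on_run(run_context)
```

Supporting a third filter kind still means another bucket and another `isinstance` branch, which is the maintenance cost listed under Cons.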
#### Pros
- Clean separation of concerns
- Cleaner API for adding filters compared to option 1
- No resistance or complaints from the community when used in Semantic Kernel
#### Cons
- Adding more filters may require adding more properties to the agent/processor class.
- Adding more filters requires bigger code changes downstream to callers.
### 3. Setup with Filter Hierarchy, Fully Generic Setup
In a more generic approach, filters can be grouped in the same bucket and processed based on the context: one generic interface for all filters, with context-specific implementations.
This allows simple grouping of filters in a single list and lets new filter types be added with minimal code changes.
```csharp
// Use Case
var agent = new MyAgent();
agent.Filters.Add(new MyAgentRunFilter());
agent.Filters.Add(new MyAgentFunctionCallFilter());
agent.Filters.Add(new MyAgentAYZFilter());
agent.Filters.Add(new MyMultipleFilterImplementation());

// OR via constructor (also DI friendly)
var agentFromConstructor = new MyAgent(new List<IAgentFilter> {
    new MyAgentRunFilter(),
    new MyAgentFunctionCallFilter(),
    new MyAgentAYZFilter(),
    new MyMultipleFilterImplementation() });

// Impl
interface IAgentFilter
{
    bool CanProcess(AgentContext context);
    Task OnProcessAsync(AgentContext context, Func<AgentContext, Task> next, CancellationToken cancellationToken = default);
}

interface IAgentFilter<T> : IAgentFilter where T : AgentContext
{
    Task OnProcessAsync(T context, Func<T, Task> next, CancellationToken cancellationToken = default);
}

class MySingleFilterImplementation : IAgentFilter<AgentRunContext>
{
    public bool CanProcess(AgentContext context)
        => context is AgentRunContext;

    public async Task OnProcessAsync(AgentContext context, Func<AgentContext, Task> next, CancellationToken cancellationToken = default)
    {
        Func<AgentRunContext, Task> wrappedNext = async ctx => await next(ctx);
        await OnProcessAsync((AgentRunContext)context, wrappedNext, cancellationToken);
    }

    public async Task OnProcessAsync(AgentRunContext context, Func<AgentRunContext, Task> next, CancellationToken cancellationToken = default)
    {
        // Pre-run logic
        await next(context);
        // Post-run logic
    }
}

class MyMultipleFilterImplementation : IAgentFilter<AgentRunContext>, IAgentFilter<FunctionCallAgentContext>
{
    public bool CanProcess(AgentContext context)
        => context is AgentRunContext or FunctionCallAgentContext;

    public async Task OnProcessAsync(AgentContext context, Func<AgentContext, Task> next, CancellationToken cancellationToken = default)
    {
        if (context is AgentRunContext runContext)
        {
            Func<AgentRunContext, Task> wrappedNext = async ctx => await next(ctx);
            await OnProcessAsync(runContext, wrappedNext, cancellationToken);
            return;
        }

        if (context is FunctionCallAgentContext callContext)
        {
            Func<FunctionCallAgentContext, Task> wrappedNext = async ctx => await next(ctx);
            await OnProcessAsync(callContext, wrappedNext, cancellationToken);
            return;
        }

        await next(context);
    }

    public async Task OnProcessAsync(AgentRunContext context, Func<AgentRunContext, Task> next, CancellationToken cancellationToken = default)
    {
        // Pre-run logic
        await next(context);
        // Post-run logic
    }

    public async Task OnProcessAsync(FunctionCallAgentContext context, Func<FunctionCallAgentContext, Task> next, CancellationToken cancellationToken = default)
    {
        // Pre-function call logic
        await next(context);
        // Post-function call logic
    }
}
```
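This setup keeps every filter in one list and relies on `CanProcess` to decide which filters apply to a given context. The minimal Python sketch below assumes the pipeline simply skips filters whose `can_process` returns false; `AgentFilter`, `RunOnlyFilter`, and `MultipleFilter` are hypothetical counterparts of the C# types, and the runtime `isinstance` checks play the role of the casts in `MyMultipleFilterImplementation`.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable

@dataclass
class AgentContext:
    trace: list[str] = field(default_factory=list)

@dataclass
class AgentRunContext(AgentContext):
    pass

@dataclass
class FunctionCallAgentContext(AgentContext):
    function_name: str = ""

Next = Callable[[AgentContext], Awaitable[None]]

class AgentFilter:
    """Hypothetical counterpart of IAgentFilter: one interface for every context kind."""
    def can_process(self, context: AgentContext) -> bool:
        raise NotImplementedError

    async def on_process(self, context: AgentContext, next_: Next) -> None:
        raise NotImplementedError

class RunOnlyFilter(AgentFilter):
    def can_process(self, context: AgentContext) -> bool:
        return isinstance(context, AgentRunContext)

    async def on_process(self, context: AgentContext, next_: Next) -> None:
        context.trace.append("run-only")
        await next_(context)

class MultipleFilter(AgentFilter):
    def can_process(self, context: AgentContext) -> bool:
        return isinstance(context, (AgentRunContext, FunctionCallAgentContext))

    async def on_process(self, context: AgentContext, next_: Next) -> None:
        # Runtime type check replaces the explicit casts of the C# version.
        if isinstance(context, FunctionCallAgentContext):
            context.trace.append(f"multi:function:{context.function_name}")
        else:
            context.trace.append("multi:run")
        await next_(context)

async def run_pipeline(filters: list[AgentFilter], context: AgentContext, core: Next) -> None:
    # Assumed behavior: filters whose can_process() returns False are skipped.
    applicable = [f for f in filters if f.can_process(context)]

    async def invoke(index: int, ctx: AgentContext) -> None:
        if index == len(applicable):
            await core(ctx)
        else:
            await applicable[index].on_process(ctx, lambda c: invoke(index + 1, c))

    await invoke(0, context)

async def core(ctx: AgentContext) -> None:
    ctx.trace.append("core")

filters = [RunOnlyFilter(), MultipleFilter()]
run_ctx = AgentRunContext()
call_ctx = FunctionCallAgentContext(function_name="GetMenu")
asyncio.run(run_pipeline(filters, run_ctx, core))
asyncio.run(run_pipeline(filters, call_ctx, core))
```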
#### Pros
- Simple grouping of filters in the same list, which helps with DI registration and filter iteration
- Lower maintenance and learning curve when adding new filter types
- Can be combined with other patterns like the `AgentFilterProcessor`
#### Cons
- Less clear separation of concerns compared to dedicated filter types
- Requires extra runtime type checking and casting for context-specific processing
## Decision Outcome
- **Option 2 (Decorator Pattern)** is the preferred approach for the following reasons:
  - Adding a processor pattern seems like overkill, as we can achieve the same results without introducing new abstractions and complexity.
  - Direct decorators on agents and tools cover agent and function invocation middleware.
  - Context-based middleware is also supported, using patterns close to Semantic Kernel filters.
  - The Agent Builder pattern integrates with the `.Use()` method for fluent configuration.
**Key POC Insights**:
1. Both patterns actually work.
2. The decorator pattern offers more direct control and a simpler, more flexible implementation.
3. The processor seems like overkill compared to the decorator, as it adds extra abstractions and complexity.
4. Function invocation filtering is supported in both patterns.
5. Streaming scenarios are well supported in both approaches.
6. Function approval request filtering is supported in both patterns.
7. The builder pattern added as part of the POC is a must-have and makes both approaches developer-friendly.
## Appendix: Other AI Agent Framework Analysis Details
#### LangChain
LangChain uses callbacks for interception, which can be passed at runtime or during construction.
Naming (Python): Callbacks (BaseCallbackHandler)
Supports: Y (read/write)
Observation: Uses observer pattern with event methods for interception (e.g., on_chain_start); supports agent actions and errors; handlers can read inputs/outputs and modify metadata or raise exceptions to influence flow.
**Python Example:** For more details, see the official documentation: [Callbacks - Python LangChain](https://python.langchain.com/docs/concepts/callbacks/).
```python
from langchain_core.callbacks import BaseCallbackHandler

class MyHandler(BaseCallbackHandler):
    def on_chain_start(self, serialized, inputs, **kwargs):
        inputs['number'] += 1  # Modify inputs (write capability)
        print("Chain started!")

handler = MyHandler()
# Pass callback at runtime
chain.invoke({"number": 25}, {"callbacks": [handler]})
# Or at constructor time
chain = SomeChain(callbacks=[handler])
chain.invoke({"number": 25})
```
Naming (JS): Callbacks (BaseCallbackHandler)
Supports: Y (read/write)
Observation: Similar observer pattern to Python, with event methods adapted for JS async handling; supports chain/agent interception; handlers can read inputs/outputs and modify metadata or raise exceptions to influence flow.
**JS Example:** For more details, see the official documentation: [Callbacks - LangChain.js](https://js.langchain.com/docs/concepts/callbacks/). (Adapted for async handling in JS.)
```javascript
import { BaseCallbackHandler } from "@langchain/core/callbacks/base";
class MyHandler extends BaseCallbackHandler {
  name = "my_handler";
  async handleChainStart(chain, inputs) {
    inputs.number += 1; // Modify inputs (write capability)
    console.log("Chain started!");
  }
}
const handler = new MyHandler();
// Pass callback at runtime (`chain` is an existing runnable; `SomeChain` is a placeholder)
await chain.invoke({ number: 25 }, { callbacks: [handler] });
// Or at constructor time
const chainWithHandler = new SomeChain({ callbacks: [handler] });
await chainWithHandler.invoke({ number: 25 });
```
#### LangGraph
LangGraph inherits callbacks from LangChain and often uses them with handlers for observability (e.g., via Langfuse).
Naming (Python): Hooks/Callbacks (inherited from LangChain)
Supports: Y (read/write)
Observation: Event-driven with runtime handlers; integrates callbacks for observability in graphs; inherits LangChain's ability to read/modify metadata or interrupt execution.
For more details, see the official documentation (inherited from LangChain): [Callbacks - Python LangChain](https://python.langchain.com/docs/concepts/callbacks/). Here's an example of streaming with a callback handler (Python):
```python
from langfuse.langchain import CallbackHandler
from langchain_core.messages import HumanMessage
class MyLangfuseHandler(CallbackHandler):
    def on_chain_start(self, serialized, inputs, **kwargs):
        inputs['messages'][0].content += " modified"  # Modify input messages (write capability)
        super().on_chain_start(serialized, inputs, **kwargs)
langfuse_handler = MyLangfuseHandler()
# Stream with callback in config (`graph` is an already-compiled LangGraph graph)
for s in graph.stream(
    {"messages": [HumanMessage(content="What is Langfuse?")]},
    config={"callbacks": [langfuse_handler]}
):
    print(s)
```
#### AutoGen
AutoGen supports middleware-like behavior in both languages.
Naming (Python): Reply Functions (register_reply)
Supports: Y (read/write)
Observation: Reply functions intercept and process messages; middleware-like for agent replies; can directly modify messages or replies before continuing.
**Python Example:** For more details, see the official documentation: [agentchat.conversable_agent | AutoGen 0.2](https://microsoft.github.io/autogen/0.2/docs/reference/agentchat/conversable_agent). Uses `register_reply` to add reply functions that intercept and process messages.
```python
import autogen
# `user_proxy` and `assistant` are existing ConversableAgent instances
def print_messages(recipient, messages, sender, config):
    if "callback" in config and config["callback"] is not None:
        callback = config["callback"]
        callback(sender, recipient, messages[-1])
    messages[-1]["content"] += " modified"  # Modify last message content (write capability)
    print(f"Messages sent to: {recipient.name} | num messages: {len(messages)}")
    return False, None  # required to ensure the agent communication flow continues
user_proxy.register_reply(
    [autogen.Agent, None],
    reply_func=print_messages,
    config={"callback": None},
)
assistant.register_reply(
    [autogen.Agent, None],
    reply_func=print_messages,
    config={"callback": None},
)
```
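The `(False, None)` return value above is what keeps the conversation flowing: in AutoGen 0.2, reply functions registered later are checked first by default, and the first one that returns `(True, reply)` ends the search. The framework-free sketch below reproduces only that dispatch rule; it ignores triggers, async handling and `config`, and `ReplyDispatcher` with its methods is hypothetical rather than the AutoGen API.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# A reply function returns (is_final, reply), mirroring the tuple convention above.
ReplyFunc = Callable[[List[Dict[str, Any]]], Tuple[bool, Optional[Any]]]


class ReplyDispatcher:
    """Hypothetical stand-in for an agent's reply-function list (not the AutoGen API)."""

    def __init__(self) -> None:
        self._reply_funcs: List[ReplyFunc] = []

    def register_reply(self, func: ReplyFunc, position: int = 0) -> None:
        # Inserting at position 0 means the most recently registered function runs first.
        self._reply_funcs.insert(position, func)

    def generate_reply(self, messages: List[Dict[str, Any]]) -> Optional[Any]:
        for func in self._reply_funcs:
            is_final, reply = func(messages)
            if is_final:
                return reply  # a final reply stops the chain
        return None  # no function produced a final reply


def default_reply(messages: List[Dict[str, Any]]) -> Tuple[bool, Optional[Any]]:
    return True, f"reply to: {messages[-1]['content']}"


def annotate(messages: List[Dict[str, Any]]) -> Tuple[bool, Optional[Any]]:
    messages[-1]["content"] += " modified"  # write access to the message
    return False, None  # not final: hand over to the next reply function


dispatcher = ReplyDispatcher()
dispatcher.register_reply(default_reply)
dispatcher.register_reply(annotate)  # registered later, so it is tried first
print(dispatcher.generate_reply([{"content": "hi"}]))  # reply to: hi modified
```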
Naming (C#): Middleware (MiddlewareAgent)
Supports: Y (read/write)
Observation: Decorator/wrapper with middleware delegates for message modification; delegates can read and alter message content or options.
**C# Example:** For more details, see the official documentation: [Use middleware in an agent - AutoGen for .NET](https://microsoft.github.io/autogen-for-net/articles/Middleware-overview.html). Registers middleware to modify messages.
```csharp
// Register middleware to modify messages
var middlewareAgent = new MiddlewareAgent(innerAgent: agent);
middlewareAgent.Use(async (messages, options, agent, ct) =>
{
    if (messages.Last() is TextMessage lastMessage && lastMessage.Content.Contains("Hello World"))
    {
        lastMessage.Content = $"[middleware] {lastMessage.Content}"; // Modify message content (write capability)
        return lastMessage;
    }
    return await agent.GenerateReplyAsync(messages, options, ct);
});
```
#### Semantic Kernel
Semantic Kernel uses filters added to the kernel for interception during function invocation, prompt rendering, etc. Implementations differ by language: C# uses interfaces, while Python uses functions and decorators.
Naming (C#): Filters (IFunctionInvocationFilter, etc.)
Supports: Y (read/write)
Observation: Interface-based middleware for function/prompt interception; filters can read and modify context, arguments, or results.
**C# Example:** For more details, see the official documentation: [Semantic Kernel Filters | Microsoft Learn](https://learn.microsoft.com/en-us/semantic-kernel/concepts/enterprise-readiness/filters). Adding a function invocation filter using interfaces.
```csharp
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Logging;
using Microsoft.SemanticKernel;
IKernelBuilder builder = Kernel.CreateBuilder();
builder.Services.AddSingleton<IFunctionInvocationFilter, LoggingFilter>();
Kernel kernel = builder.Build();
// Alternatively, add directly (`logger` is an existing ILogger instance)
kernel.FunctionInvocationFilters.Add(new LoggingFilter(logger));
// Define the filter
public sealed class LoggingFilter(ILogger logger) : IFunctionInvocationFilter
{
    public async Task OnFunctionInvocationAsync(FunctionInvocationContext context, Func<FunctionInvocationContext, Task> next)
    {
        context.Arguments["new_arg"] = "modified_value"; // Modify arguments by adding a new key (write capability)
        logger.LogInformation("Invoking {FunctionName}", context.Function.Name);
        await next(context);
        logger.LogInformation("Invoked {FunctionName}", context.Function.Name);
    }
}
```
Naming (Python): Filters (add_filter, @kernel.filter decorator)
Supports: Y (read/write)
Observation: Function and decorator-based for interception; no explicit interfaces like C#, focuses on async functions for filters; can read and modify context/arguments/results.
**Python Example:** For more details, see the official documentation: [Semantic Kernel Filters | Microsoft Learn](https://learn.microsoft.com/en-us/semantic-kernel/concepts/enterprise-readiness/filters). Adding function invocation filters (one as a standalone function and one via decorator).
```python
import asyncio
import logging
from typing import Callable, Coroutine, Any
from semantic_kernel import Kernel
from semantic_kernel.filters import FilterTypes, FunctionInvocationContext
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion
from semantic_kernel.contents import ChatHistory
from semantic_kernel.exceptions import OperationCancelledException
logger = logging.getLogger(__name__)
async def input_output_filter(
    context: FunctionInvocationContext,
    next: Callable[[FunctionInvocationContext], Coroutine[Any, Any, None]],
) -> None:
    if context.function.plugin_name != "chat":
        await next(context)
        return
    try:
        user_input = input("User:> ")
    except (KeyboardInterrupt, EOFError) as exc:
        raise OperationCancelledException("User stopped the operation") from exc
    if user_input == "exit":
        raise OperationCancelledException("User stopped the operation")
    context.arguments["chat_history"].add_user_message(user_input)  # Modify arguments by adding message (write capability)
    await next(context)
    if context.result:
        logger.info(f"Usage: {context.result.metadata.get('usage')}")
        context.arguments["chat_history"].add_message(context.result.value[0])
        print(f"Mosscap:> {context.result!s}")
kernel = Kernel()
kernel.add_service(AzureChatCompletion(service_id="chat-gpt"))
# Add filter as a standalone function
kernel.add_filter("function_invocation", input_output_filter)
# Add filter via decorator
@kernel.filter(filter_type=FilterTypes.FUNCTION_INVOCATION)
async def exception_catch_filter(
    context: FunctionInvocationContext,
    next: Callable[[FunctionInvocationContext], Coroutine[Any, Any, None]],
) -> None:
    try:
        await next(context)
    except Exception as e:
        logger.info(e)
async def main() -> None:
    # Example invocation (assuming a "chat" plugin is added)
    history = ChatHistory()
    await kernel.invoke(
        function_name="chat",
        plugin_name="chat",
        chat_history=history,
    )
asyncio.run(main())
```
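Both the C# and Python variants follow the same shape: each filter receives a context plus a `next` delegate, and the filters nest around the actual function call. The sketch below models that chain without Semantic Kernel. `InvocationContext`, `build_pipeline` and the ordering rule used here (first-registered filter is outermost) are assumptions of this sketch, not a description of SK's internal implementation.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable, Dict, List, Optional


@dataclass
class InvocationContext:
    """Hypothetical context passed through the chain (not the SK class)."""

    function_name: str
    arguments: Dict[str, Any]
    result: Optional[Any] = None
    trace: List[str] = field(default_factory=list)


Next = Callable[[InvocationContext], Awaitable[None]]
Filter = Callable[[InvocationContext, Next], Awaitable[None]]


def _bind(filter_fn: Filter, next_step: Next) -> Next:
    # Bind each filter to its successor explicitly (avoids late-binding bugs in loops).
    async def step(ctx: InvocationContext) -> None:
        await filter_fn(ctx, next_step)

    return step


def build_pipeline(filters: List[Filter], function: Callable[[Dict[str, Any]], Any]) -> Next:
    async def invoke_function(ctx: InvocationContext) -> None:
        ctx.result = function(ctx.arguments)

    pipeline: Next = invoke_function
    # Wrap from the innermost filter outward, so filters[0] ends up outermost.
    for filter_fn in reversed(filters):
        pipeline = _bind(filter_fn, pipeline)
    return pipeline


async def logging_filter(ctx: InvocationContext, next: Next) -> None:
    ctx.trace.append(f"before {ctx.function_name}")
    await next(ctx)  # skipping this call would short-circuit the function
    ctx.trace.append(f"after {ctx.function_name}")


async def argument_filter(ctx: InvocationContext, next: Next) -> None:
    ctx.arguments["name"] = ctx.arguments["name"].title()  # write access to arguments
    await next(ctx)


ctx = InvocationContext("greet", {"name": "ada"})
pipeline = build_pipeline([logging_filter, argument_filter], lambda args: f"Hello, {args['name']}!")
asyncio.run(pipeline(ctx))
print(ctx.result, ctx.trace)  # Hello, Ada! ['before greet', 'after greet']
```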
#### CrewAI
CrewAI uses event listeners for callbacks.
Naming (Python): Events/Callbacks (BaseEventListener)
Supports: Y (read)
Observation: Event-driven orchestration with listeners for workflows; listeners can observe events (e.g., read source/event data) but are primarily for logging/reactions without direct modification of workflow state.
For more details, see the official documentation: [Event Listeners - CrewAI Documentation](https://docs.crewai.com/concepts/event-listener). Here's an example of setting up a custom listener (Python):
```python
from crewai import Crew
from crewai.utilities.events import (
    CrewKickoffStartedEvent,
    BaseEventListener,
    crewai_event_bus
)
class MyCustomListener(BaseEventListener):
    def setup_listeners(self, crewai_event_bus):
        @crewai_event_bus.on(CrewKickoffStartedEvent)
        def on_crew_started(source, event):
            print(f"Crew '{event.crew_name}' started!")
my_listener = MyCustomListener()  # Automatically registers on init
# Use in a crew
crew = Crew(agents=[...], tasks=[...])
```
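CrewAI listeners subscribe to specific event types and observe them. The framework-agnostic sketch below illustrates why such a model is read-only by construction: `emit` ignores handler return values, and the event is a frozen dataclass that listeners cannot mutate. `EventBus` and `CrewStarted` are illustrative names only, not CrewAI classes, and this is not CrewAI's implementation.

```python
from collections import defaultdict
from dataclasses import FrozenInstanceError, dataclass
from typing import Callable, DefaultDict, List

Handler = Callable[[object, object], None]


@dataclass(frozen=True)
class CrewStarted:
    """Hypothetical event type; frozen so listeners cannot mutate it."""

    crew_name: str


class EventBus:
    """Minimal publish/subscribe bus (an illustration, not CrewAI's implementation)."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[type, List[Handler]] = defaultdict(list)

    def on(self, event_type: type) -> Callable[[Handler], Handler]:
        # Decorator that subscribes a handler to one event type.
        def register(handler: Handler) -> Handler:
            self._handlers[event_type].append(handler)
            return handler

        return register

    def emit(self, source: object, event: object) -> None:
        for handler in self._handlers[type(event)]:
            handler(source, event)  # return values are ignored: listeners only observe


bus = EventBus()
seen: List[str] = []


@bus.on(CrewStarted)
def on_crew_started(source: object, event: CrewStarted) -> None:
    seen.append(f"{source}: {event.crew_name}")


event = CrewStarted("research")
bus.emit("crew-1", event)
try:
    event.crew_name = "changed"  # type: ignore[misc]
except FrozenInstanceError:
    print("events are read-only")
print(seen)  # ['crew-1: research']
```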
#### LlamaIndex
LlamaIndex uses callback managers with handlers.
Naming (Python): Callbacks (CallbackManager, BaseCallbackHandler)
Supports: Y (read)
Observation: Observer pattern with event methods for queries and tools; handlers can observe events/payloads (e.g., read prompts/responses) but are designed for debugging/tracing without modifying execution context.
For more details, see the official documentation: [Callbacks - LlamaIndex](https://docs.llamaindex.ai/en/stable/module_guides/observability/callbacks/). Here's an example setup (Python):
```python
from llama_index.core import VectorStoreIndex
from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler
debug_handler = LlamaDebugHandler()  # Concrete handler subclassing BaseCallbackHandler
callback_manager = CallbackManager([debug_handler])
# Assign to components, e.g., an index or query engine (`documents` is a previously loaded list of Documents)
index = VectorStoreIndex.from_documents(documents, callback_manager=callback_manager)
query_engine = index.as_query_engine()
response = query_engine.query("What is this about?")
```
#### Haystack
Haystack does not support explicit middleware or filters like the others. Instead, it uses a modular pipeline architecture for interception via components (e.g., ConditionalRouter for routing based on conditions like tool calls) and observability through logging/tracing integrations (e.g., Langfuse).
Naming (Python): N/A (Pipeline Components/Routers)
Supports: N (Pipeline-based interception)
Observation: Relies on modular pipelines for implicit interception but lacks explicit middleware/filters; custom components can read/write data flow via routing/transformations, but this is compositional rather than hook-based interception.
For more details, see the official documentation: [Pipelines - Haystack Documentation](https://docs.haystack.deepset.ai/docs/pipelines). Here's an example of pipeline-based interception with a custom collector component (Python):
```python
from haystack import Pipeline
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.components.routers import ConditionalRouter
from haystack.components.tools import ToolInvoker
from haystack.tools import ComponentTool
from haystack.components.websearch import SerperDevWebSearch
from haystack.dataclasses import ChatMessage
from typing import Any, Dict, List
from haystack import component
from haystack.core.component.types import Variadic
# Custom component to collect/observe messages (for interception/observation)
@component()
class MessageCollector:
    def __init__(self):
        self._messages = []
    @component.output_types(messages=List[ChatMessage])
    def run(self, messages: Variadic[List[ChatMessage]]) -> Dict[str, Any]:
        self._messages.extend([msg for inner in messages for msg in inner])
        return {"messages": self._messages}
    def clear(self):
        self._messages = []
# Define a tool
web_tool = ComponentTool(component=SerperDevWebSearch(top_k=3))
# Define routes for filtering (e.g., check for tool calls)
routes = [
    {
        "condition": "{{replies[0].tool_calls | length > 0}}",
        "output": "{{replies}}",
        "output_name": "there_are_tool_calls",
        "output_type": List[ChatMessage],
    },
    {
        "condition": "{{replies[0].tool_calls | length == 0}}",
        "output": "{{replies}}",
        "output_name": "final_replies",
        "output_type": List[ChatMessage],
    },
]
# Build the pipeline
pipeline = Pipeline()
pipeline.add_component("generator", OpenAIChatGenerator(model="gpt-4o-mini", tools=[web_tool]))  # the generator needs the tool definition to emit tool calls
pipeline.add_component("router", ConditionalRouter(routes=routes))
pipeline.add_component("tool_invoker", ToolInvoker(tools=[web_tool]))
pipeline.add_component("message_collector", MessageCollector())
# Connect components (interception via routing and collection)
pipeline.connect("generator.replies", "router.replies")
pipeline.connect("router.there_are_tool_calls", "tool_invoker.messages")
pipeline.connect("tool_invoker.tool_messages", "message_collector.messages")
pipeline.connect("router.final_replies", "message_collector.messages")
# Run the pipeline (observes via collector, filters via router)
result = pipeline.run({"generator": {"messages": [ChatMessage.from_user("What's the weather in Berlin?")]}})
print(result["message_collector"]["messages"])
```
#### OpenAI Swarm
OpenAI Swarm does not provide native support for middleware, filters, callbacks, or hooks. While interception can be achieved through custom implementations (e.g., function wrappers, client subclassing, or manual tool execution with `execute_tools=False`), this requires the caller to implement their own logic, which is not considered built-in framework support.
Naming (Python): N/A
Supports: N
Observation: No explicit middleware/filters; interception requires custom wrappers or manual handling (e.g., function decorators, client subclassing), lacking native framework support for built-in components to accept such modifications.
For more details, see the official GitHub repository: [OpenAI Swarm GitHub](https://github.com/openai/swarm). No native code examples available for interception; custom approaches are possible but not framework-native.
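The kind of custom interception mentioned above can be as small as a function wrapper. The sketch below is plain Python and does not use any Swarm API; `intercept`, `call_log` and `get_weather` are hypothetical. `functools.wraps` matters here because a framework that builds tool schemas from a function's name, docstring and signature will still see the original tool's metadata.

```python
import functools
import inspect
from typing import Any, Callable, Dict, List

call_log: List[Dict[str, Any]] = []


def intercept(func: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical wrapper: records each call and normalizes a `city` argument."""

    @functools.wraps(func)  # copies __name__/__doc__ and sets __wrapped__
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        call_log.append({"tool": func.__name__, "kwargs": dict(kwargs)})  # read access
        if "city" in kwargs:
            kwargs["city"] = kwargs["city"].strip().title()  # write access
        return func(*args, **kwargs)

    return wrapper


@intercept
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return f"Sunny in {city}"


print(get_weather(city="  berlin "))  # Sunny in Berlin
# inspect.signature follows __wrapped__, so the original parameters stay visible.
print(inspect.signature(get_weather))  # (city: str) -> str
```

As the section notes, this is caller-side code: every tool has to be wrapped explicitly, which is exactly what built-in middleware support would avoid.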
#### Atomic Agents
Atomic Agents does not support explicit middleware, callbacks, hooks, or filters. Its modularity allows composable components, but no dedicated interception mechanisms are documented.
Naming (Python): N/A (Composable Components)
Supports: N
Observation: No explicit middleware/filters; modularity allows composable units but no dedicated interception hooks or callbacks for custom reading/modification mid-execution.
For more details, see the official documentation: [Atomic Agents Docs](https://brainblend-ai.github.io/atomic-agents/). No specific code examples available for interception.
#### Smolagents (Hugging Face)
Smolagents does not support explicit middleware, callbacks, hooks, or filters; it focuses on simple agent building.
Naming (Python): N/A
Supports: N
Observation: No explicit support; focuses on simple agent building without interception mechanisms or hooks for reading/modifying execution.
For more details, see the official documentation: [Smolagents Docs](https://huggingface.co/docs/smolagents/en/index). No specific code examples available for interception.
#### Phidata (Agno)
Phidata (Agno) does not support explicit middleware, callbacks, hooks, or filters; agents rely on tools and memory.
Naming (Python): N/A
Supports: N
Observation: No explicit middleware/filters; agents use tools/memory but no interception hooks for custom reading/modification of calls.
For more details, see the official documentation: [Phidata Docs](https://docs.phidata.com/). No specific code examples available for interception.
#### PromptFlow (Microsoft)
PromptFlow supports tracing for LLM interactions, which acts like callbacks for debugging and iteration.
Naming (Python): Tracing
Supports: N (Tracing only)
Observation: Supports tracing for LLM interactions, acting as callbacks for debugging/iteration; tracing is read-only for observability/telemetry without options to modify context or intercept calls beyond logging.
For more details, see the official documentation: [Tracing in PromptFlow](https://microsoft.github.io/promptflow/how-to-guides/tracing/index.html). No code example is included here; tracing is integrated into flow debugging (Python).
#### n8n
n8n's AI Agent node inherits callbacks from LangChain for observability in workflows.
Naming (JS/TS): Callbacks (inherited from LangChain)
Supports: Y (read/write)
Observation: AI Agent node uses LangChain under the hood, inheriting callbacks for observability; supports reading/modifying metadata or interrupting flow as in LangChain.
For more details, see the official documentation: [AI Agent Node Docs](https://docs.n8n.io/integrations/builtin/cluster-nodes/root-nodes/n8n-nodes-langchain.agent/). The node inherits LangChain's callbacks and observer pattern, and no n8n-specific callback code is documented, so refer to the LangChain docs for callback examples. Here's the LangChain JS example, adapted for consistency:
```javascript
import { BaseCallbackHandler } from "@langchain/core/callbacks/base";
class MyHandler extends BaseCallbackHandler {
  name = "my_handler";
  async handleChainStart(chain, inputs) {
    inputs.number += 1; // Modify inputs (write capability)
    console.log("Chain started!");
  }
}
const handler = new MyHandler();
// Pass callback at runtime (`chain` is an existing runnable; `SomeChain` is a placeholder)
await chain.invoke({ number: 25 }, { callbacks: [handler] });
// Or at constructor time
const chainWithHandler = new SomeChain({ callbacks: [handler] });
await chainWithHandler.invoke({ number: 25 });
```
================================================
FILE: docs/decisions/0008-python-subpackages.md
================================================
---
status: accepted
contact: eavanvalkenburg
date: 2025-09-19
deciders: eavanvalkenburg, markwallace-microsoft, ekzhu, sphenry, alliscode
consulted: taochenosu, moonbox3, dmytrostruk, giles17
---
# Python Subpackages Design
## Context and Problem Statement
The goal is to design a subpackage structure for the Python agent framework that balances ease of use, maintainability, and scalability. How can we organize the codebase to facilitate the development and integration of connectors while minimizing complexity for users?
## Decision Drivers
- Ease of use for developers
- Maintainability of the codebase
- User experience for installing and using the integrations
- Clear lifecycle management for integrations
- Minimize non-GA dependencies in the main package
## Considered Options
1. One subpackage per vendor, so a `google` package that contains all Google related connectors, such as `GoogleChatClient`, `BigQueryCollection`, etc.
* Pros:
- fewer packages to manage, publish and maintain
- easier for users to find and install the right package.
- users that work primarily with one platform have a single package to install.
* Cons:
- larger packages with more dependencies
- larger installation sizes
- more difficult to version, since some parts may be GA while others are in preview.
2. One subpackage per connector, e.g. a `google_chat` package, a `google_bigquery` package, etc.
* Pros:
- smaller packages with fewer dependencies
- smaller installation sizes
- easy to version and do lifecycle management on
* Cons:
- more packages to manage, register, publish and maintain
- more extras, which makes it more difficult for users to find and install the right package.
3. Group connectors by vendor and maturity, so that you can graduate something from, e.g., the `google-preview` package to the `google` package when it becomes GA.
* Pros:
- fewer packages to manage, publish and maintain
- easier for users to find and install the right package.
- users that work primarily with one platform have a single package to install.
- clear what the status is based on extra name
* Cons:
- moving something from one to the other might be a breaking change
- still larger packages with more dependencies
This could be mitigated by still importing the `google-preview` code from `agent_framework.google`, so that the import path does not change when something graduates, while installing the preview package remains an explicit choice for users. We could then have three extras on that package, `google`, `google-preview` and `google-all`, to make it easy to install the right package or simply all of them.
4. Group connectors by vendor and type, so that you have a `google-chat` package, a `google-data` package, etc.
* Pros:
- smaller packages with fewer dependencies
- smaller installation sizes
* Cons:
- more packages to manage, register, publish and maintain
- more extras, which makes it more difficult for users to find and install the right package.
- lifecycle management remains difficult, since some parts may be GA while others are in preview.
5. Add `meta`-extras, that combine different subpackages as one extra, so we could have a `google` extra that includes `google-chat`, `google-bigquery`, etc.
* Pros:
- easier for users on a single platform
* Cons:
- more packages to manage, register, publish and maintain
- more extras, which makes it more difficult for users to find and install the right package.
- makes developer package management more complex, because that meta-extra will include both GA and non-GA packages, so during dev they could use that, but then during prod they have to figure out which one they actually need and make a change in their dependencies, leading to mismatches between dev and prod.
6. Make all imports happen from `agent_framework.connectors` (or from two or three groups `agent_framework.chat_clients`, `agent_framework.context_providers`, or something similar) while the underlying code comes from different packages.
* Pros:
- best developer experience, since all imports are from the same place, it is easy to find what you need, and we can raise a meaningful error that names the extra to install.
- easier for users to find and install the right package.
* Cons:
- larger overhead in maintaining the `__init__.py` files that do the lazy loading and error handling.
- larger overhead in package management, since we have to ensure that the main package.
7. Subpackage existence will be based on the status of dependencies and/or the possibility of an external support mechanism. This means that:
- Integrations that need non-GA dependencies will be subpackages, so that we can avoid having non-GA dependencies in the main package.
- Integrations where the AF-code is still experimental, preview or release candidate will be subpackages, so that we can avoid having non-GA code in the main package and we can version those packages properly.
- Integrations that are outside Microsoft and where we might not always be able to fast-follow breaking changes, will stay as subpackages, to provide some isolation and to be able to version them properly.
- Integrations that are mature and that have released (GA) dependencies and/or features on the service side will be moved into the main package; the dependencies of those packages will stay installable under the same `extra` name, so that users do not have to change anything, and we then remove the subpackage itself.
- All subpackage imports in the code should be from a stable place, mostly vendor-based, so that when something moves from a subpackage to the main package, the import path does not change, so `from agent_framework.google import GoogleChatClient` will always work, even if it moves from the `agent-framework-google` package to the main `agent-framework` package.
- The imports in those vendor namespaces (these won't be actual Python namespaces, just folders with an `__init__.py` file and any code) will do lazy loading and raise a meaningful error if the subpackage or its dependencies are not installed, so that users can easily tell which extra to install.
- On a case-by-case basis we can decide to create additional `extras` that combine multiple subpackages into one extra, so that users who work primarily with one platform can install everything they need with a single extra. For instance, you could install the `agent-framework[azure-purview]` extra, which only implements an Azure Purview middleware, or the `agent-framework[azure]` extra, which includes all Azure-related connectors, like `purview`, `content safety` and others (all examples, not actual packages (yet)). Regardless of where the code sits, these should always be importable from `agent_framework.azure`.
- Subpackage naming should follow the same principle, so in principle a package name is `<vendor>-<feature/brand>`, e.g. `google-gemini`, `azure-purview`, `microsoft-copilotstudio`, etc. For smaller vendors that are less likely to have many connectors, we can skip the feature/brand part, e.g. `mem0`, `redis`, etc.
## Decision Outcome
Option 7: This provides a good balance between developer experience, user experience, package management and maintenance, while also allowing us to evolve the package structure over time as dependencies and features mature. It also ensures that the main package, installed without extras, does not include non-GA dependencies or code; extras do not carry that guarantee, for either the code or the dependencies.
# Microsoft vs Azure packages
Another consideration is Microsoft itself: we have many Azure services, but also other Microsoft services, such as Microsoft Copilot Studio, and potentially more in the future, and Foundry may at some point be marketed separately from Azure. We could therefore have both a `microsoft` and an `azure` package, where the `microsoft` package contains all Microsoft services excluding Azure, while the `azure` package only contains Azure services. This is only applicable to the variants where we group by vendor, including those with meta packages.
## Decision Outcome
Azure and Microsoft will be the two vendor folders for Microsoft services, so Copilot Studio will be imported from `agent_framework.microsoft`, while Foundry, Azure OpenAI and other Azure services will be imported from `agent_framework.azure`.
================================================
FILE: docs/decisions/0009-support-long-running-operations.md
================================================
---
status: accepted
contact: sergeymenshykh
date: 2025-10-15
deciders: markwallace, rbarreto, westey-m, stephentoub
informed: {}
---
## Long-Running Operations Design
## Context and Problem Statement
The Agent Framework currently supports synchronous request-response patterns for AI agent interactions,
where agents process requests and return results immediately. Similarly, MEAI chat clients follow the same
synchronous pattern for AI interactions. However, many real-world AI scenarios involve complex tasks that
require significant processing time, such as:
- Code generation and analysis tasks
- Complex reasoning and research operations
- Image and content generation
- Large document processing and summarization
The current Agent Framework architecture needs native support for long-running operations, as it is
essential for handling these scenarios effectively. Additionally, since MEAI chat clients will also need to support
long-running operations to be used together with AF agents, the design should consider integration
patterns and consistency with the broader Microsoft.Extensions.AI ecosystem to provide a unified experience
across both agent and chat client scenarios.
## Decision Drivers
- Chat clients and agents should support long-running execution as well as quick prompts.
- The design should be simple and intuitive for developers to use.
- The design should be extensible to allow new long-running execution features to be added in the future.
- The design should be additive rather than disruptive to allow existing chat clients to iteratively add
support for long-running operations without breaking existing functionality.
## Comparison of Long-Running Operation Features
| Feature | OpenAI Responses | Foundry Agents | A2A |
|-----------------------------|---------------------------|-------------------------------------|----------------------|
| Initiated by | User (Background = true) | Long-running execution is always on | Agent |
| Modeled as | Response | Run | Task |
| Supported modes¹ | Sync, Async | Async | Sync, Async |
| Getting status support | ✅ | ✅ | ✅ |
| Getting result support | ✅ | ✅ | ✅ |
| Update support | ❌ | ❌ | ✅ |
| Cancellation support | ✅ | ✅ | ✅ |
| Delete support | ✅ | ❌ | ❌ |
| Non-streaming support | ✅ | ✅ | ✅ |
| Streaming support | ✅ | ✅ | ✅ |
| Execution statuses | InProgress, Completed, Queued, Cancelled, Failed, Incomplete | InProgress, Completed, Queued, Cancelled, Failed, Cancelling, RequiresAction, Expired | Working, Completed, Canceled, Failed, Rejected, AuthRequired, InputRequired, Submitted, Unknown |

¹ Sync is a regular message-based request/response communication pattern; Async is a pattern for long-running operations/tasks where the agent returns an ID for a run/task and allows polling for status and final results by the ID.
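The Async pattern described in the footnote (start, receive an ID, poll by that ID until a terminal status) can be sketched independently of any provider. `FakeLongRunningClient`, `RunStatus`, `RunState` and `run_to_completion` are hypothetical; the status set is a simplified subset of those in the table, and the fake service simply advances one status per poll.

```python
import itertools
import time
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterator, Optional


class RunStatus(Enum):
    QUEUED = "queued"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class RunState:
    run_id: str
    status: RunStatus
    result: Optional[str] = None


class FakeLongRunningClient:
    """In-memory stand-in for a service; each status check advances the run one step."""

    def __init__(self) -> None:
        self._progress: Dict[str, Iterator[RunStatus]] = {}
        self._prompts: Dict[str, str] = {}
        self._ids = itertools.count(1)

    def start(self, prompt: str) -> RunState:
        # Async mode: return an ID right away instead of the final answer.
        run_id = f"run_{next(self._ids)}"
        self._progress[run_id] = iter([RunStatus.IN_PROGRESS, RunStatus.COMPLETED])
        self._prompts[run_id] = prompt
        return RunState(run_id, RunStatus.QUEUED)

    def get_status(self, run_id: str) -> RunState:
        status = next(self._progress[run_id], RunStatus.COMPLETED)
        done = status is RunStatus.COMPLETED
        return RunState(run_id, status, f"done: {self._prompts[run_id]}" if done else None)


def run_to_completion(
    client: FakeLongRunningClient, prompt: str, poll_interval: float = 0.0, max_polls: int = 100
) -> RunState:
    state = client.start(prompt)
    polls = 0
    # Keep polling by ID while the run is in a non-terminal state.
    while state.status in (RunStatus.QUEUED, RunStatus.IN_PROGRESS):
        if polls >= max_polls:
            raise TimeoutError(f"{state.run_id} did not finish after {max_polls} polls")
        time.sleep(poll_interval)  # a real client would back off between polls
        state = client.get_status(state.run_id)
        polls += 1
    return state


final = run_to_completion(FakeLongRunningClient(), "summarize the report")
print(final.status.name, final.result)  # COMPLETED done: summarize the report
```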
**Note:** The names for new classes, interfaces, and their members used in the sections below are tentative and will be discussed in a dedicated section of this document.
## Long-Running Operations Support for Chat Clients
This section describes different options for various aspects required to add long-running operations support to chat clients.
### 1. Methods for Working with Long-Running Operations
Based on the analysis of existing APIs that support long-running operations (such as OpenAI Responses, Azure AI Foundry Agents, and A2A),
the following operations are used for working with long-running operations:
- Common operations:
  - **Start Long-Running Execution**: Initiates a long-running operation and returns its ID.
  - **Get Status of Long-Running Execution**: Retrieves the status of a long-running operation.
  - **Get Result of Long-Running Execution**: Retrieves the result of a long-running operation.
- Uncommon operations:
  - **Update Long-Running Execution**: Updates a long-running operation, for example by adding new messages or modifying existing ones.
  - **Cancel Long-Running Execution**: Cancels a long-running operation.
  - **Delete Long-Running Execution**: Deletes a long-running operation.
To support these operations by `IChatClient` implementations, the following options are available:
- **1.1 New IAsyncChatClient Interface for All Long-Running Execution Operations**
- **1.2 Get{Streaming}ResponseAsync for Common Operations & New IAsyncChatClient Interface for Uncommon Operations**
- **1.3 Get{Streaming}ResponseAsync for Common Operations & New IAsyncChatClient Interface for Uncommon Operations & Capability Check**
- **1.4 Get{Streaming}ResponseAsync for Common Operations & Individual Interface per Uncommon Operation**
#### 1.1 New IAsyncChatClient Interface for All Long-Running Execution Operations
This option suggests adding a new interface `IAsyncChatClient` that some implementations of `IChatClient` may implement to support long-running operations.
```csharp
public interface IAsyncChatClient
{
    Task<AsyncRunResult> StartAsyncRunAsync(IList<ChatMessage> chatMessages, RunOptions? options = null, CancellationToken ct = default);
    Task<AsyncRunResult> GetAsyncRunStatusAsync(string runId, CancellationToken ct = default);
    Task<AsyncRunResult> GetAsyncRunResultAsync(string runId, CancellationToken ct = default);
    Task UpdateAsyncRunAsync(string runId, IList<ChatMessage> chatMessages, CancellationToken ct = default);
    Task CancelAsyncRunAsync(string runId, CancellationToken ct = default);
    Task DeleteAsyncRunAsync(string runId, CancellationToken ct = default);
}
public class CustomChatClient : IChatClient, IAsyncChatClient
{
...
}
```
Consumer code example:
```csharp
IChatClient chatClient = new CustomChatClient();
string prompt = "...";
// Determine if the prompt should be run as a long-running execution
if (chatClient.GetService<IAsyncChatClient>() is { } asyncChatClient && ShouldRunPromptAsynchronously(prompt))
{
    AsyncRunResult result;
    try
    {
        // Start a long-running execution
        result = await asyncChatClient.StartAsyncRunAsync([new ChatMessage(ChatRole.User, prompt)]);
    }
    catch (NotSupportedException)
    {
        Console.WriteLine("This chat client does not support long-running operations.");
        throw;
    }
    AsyncRunContent? asyncRunContent = GetAsyncRunContent(result);
    // Poll for the status of the long-running execution
    while (asyncRunContent.Status is AsyncRunStatus.InProgress or AsyncRunStatus.Queued)
    {
        await Task.Delay(TimeSpan.FromSeconds(1)); // Avoid polling in a tight loop
        result = await asyncChatClient.GetAsyncRunStatusAsync(asyncRunContent.RunId);
        asyncRunContent = GetAsyncRunContent(result);
    }
    // Get the result of the long-running execution
    result = await asyncChatClient.GetAsyncRunResultAsync(asyncRunContent.RunId);
    Console.WriteLine(result);
}
else
{
    // Complete a quick prompt
    ChatResponse response = await chatClient.GetResponseAsync(prompt);
    Console.WriteLine(response);
}
```
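The status-polling loop in the consumer example can be factored into a small helper that pauses between checks and gives up after a bounded number of attempts. The sketch below is illustrative only. `PollUntilDoneAsync` is a hypothetical name, and statuses are modeled as plain strings (`"Queued"`, `"InProgress"`, ...) instead of `AsyncRunStatus` so that the snippet is self-contained. A fixed delay keeps it simple, but a real implementation might prefer exponential backoff.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of MEAI): polls until the status is no longer
// "Queued" or "InProgress", waiting between attempts and giving up after maxAttempts.
async Task<string> PollUntilDoneAsync(
    Func<CancellationToken, Task<string>> getStatusAsync,
    TimeSpan delay,
    int maxAttempts,
    CancellationToken ct = default)
{
    for (int attempt = 1; ; attempt++)
    {
        string status = await getStatusAsync(ct);
        if (status is not ("Queued" or "InProgress"))
        {
            // Completed, Failed, Cancelled, or a custom status: let the caller decide.
            return status;
        }

        if (attempt >= maxAttempts)
        {
            throw new TimeoutException($"Run is still '{status}' after {maxAttempts} attempts.");
        }

        // Pause before the next check instead of polling in a tight loop.
        await Task.Delay(delay, ct);
    }
}

// Usage with a fake status source that reports completion on the third call.
int calls = 0;
string final = await PollUntilDoneAsync(
    _ => Task.FromResult(++calls < 3 ? "InProgress" : "Completed"),
    TimeSpan.FromMilliseconds(10),
    maxAttempts: 5);
Console.WriteLine($"{final} after {calls} calls"); // Completed after 3 calls
```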
**Pros:**
- Not a breaking change: Existing chat clients are not affected.
- Callers can determine if a chat client supports long-running operations by calling its `GetService()` method.
**Cons:**
- Not extensible: Adding new methods to the `IAsyncChatClient` interface after its release will break existing implementations of the interface.
- Missing capability check: Callers cannot determine if chat clients support specific uncommon operations before attempting to use them.
- Insufficient information: Callers may not have enough information to decide whether a prompt should run as a long-running operation.
- An alternative solution for decorating the new methods will have to be put in place because the new method calls bypass existing decorators
such as logging, telemetry, etc.
#### 1.2 Get{Streaming}ResponseAsync for Common Operations & New IAsyncChatClient Interface for Uncommon Operations
This option suggests using the existing `GetResponseAsync` and `GetStreamingResponseAsync` methods of the `IChatClient` interface to support
common long-running operations, such as starting long-running operations, getting their status, their results, and potentially
updating them, in addition to their existing functionality of serving quick prompts. Methods for the uncommon operations, such as updating,
cancelling, and deleting long-running operations, will be added to a new `IAsyncChatClient` interface that will be implemented by chat clients
that support them.
This option presumes that Option 3.2 (Have one method for getting long-running execution status and result) is selected.
```csharp
public interface IAsyncChatClient
{
/// The update can be handled by GetResponseAsync method as well.
Task UpdateAsyncRunAsync(string runId, IList<ChatMessage> chatMessages, CancellationToken ct = default);
Task CancelAsyncRunAsync(string runId, CancellationToken ct = default);
Task DeleteAsyncRunAsync(string runId, CancellationToken ct = default);
}
public class ResponsesChatClient : IChatClient, IAsyncChatClient
{
public async Task<ChatResponse> GetResponseAsync(string prompt, ChatOptions? options = null, CancellationToken ct = default)
{
ClientResult? result = null;
// If long-running execution mode is enabled, we run the prompt as a long-running execution
if(enableLongRunningResponses)
{
// No RunId is provided, so we start a long-running execution
if(options?.RunId is null)
{
result = await this._openAIResponseClient.CreateResponseAsync(prompt, new ResponseCreationOptions
{
Background = true,
});
}
else // RunId is provided, so we get the status of a long-running execution
{
result = await this._openAIResponseClient.GetResponseAsync(options.RunId);
}
}
else
{
// Handle the case when the prompt should be run as a quick prompt
result = await this._openAIResponseClient.CreateResponseAsync(prompt, new ResponseCreationOptions
{
Background = false
});
}
...
}
public Task UpdateAsyncRunAsync(string runId, IList<ChatMessage> chatMessages, CancellationToken ct = default)
{
throw new NotSupportedException("This chat client does not support updating long-running operations.");
}
public Task CancelAsyncRunAsync(string runId, CancellationToken cancellationToken = default)
{
return this._openAIResponseClient.CancelResponseAsync(runId, cancellationToken);
}
public Task DeleteAsyncRunAsync(string runId, CancellationToken cancellationToken = default)
{
return this._openAIResponseClient.DeleteResponseAsync(runId, cancellationToken);
}
}
```
Consumer code example:
```csharp
IChatClient chatClient = new ResponsesChatClient();
ChatResponse response = await chatClient.GetResponseAsync("");
if (GetAsyncRunContent(response) is AsyncRunContent asyncRunContent)
{
// Get result of the long-running execution
response = await chatClient.GetResponseAsync([], new ChatOptions
{
RunId = asyncRunContent.RunId
});
// After some time
// If it's still running, cancel and delete the run
AsyncRunStatus? status = GetAsyncRunContent(response)?.Status;
if ((status == AsyncRunStatus.InProgress || status == AsyncRunStatus.Queued)
&& chatClient.GetService<IAsyncChatClient>() is { } asyncChatClient)
{
try
{
await asyncChatClient.CancelAsyncRunAsync(asyncRunContent.RunId);
}
catch (NotSupportedException)
{
Console.WriteLine("This chat client does not support cancelling long-running operations.");
}
try
{
await asyncChatClient.DeleteAsyncRunAsync(asyncRunContent.RunId);
}
catch (NotSupportedException)
{
Console.WriteLine("This chat client does not support deleting long-running operations.");
}
}
}
else
{
// Handle the case when the response is a quick prompt completion
Console.WriteLine(response);
}
```
This option addresses a problem with Option 1.1, where callers need to know whether a prompt should
be run as a long-running operation or a quick prompt. Here, callers simply call the existing `GetResponseAsync` method,
and the chat client decides whether to run the prompt as a long-running operation or a quick prompt. If control over
the execution mode is still needed and the underlying API supports it, callers will be able to set the mode per
invocation or in the chat client configuration. Section 2, on enabling long-running operations, covers this in more detail.
It also addresses another issue with Option 1.1: the `GetResponseAsync` method may return a long-running
execution response, and the `StartAsyncRunAsync` method may return a quick prompt response. Because one method handles both cases,
callers don't need to anticipate this behavior. They simply check the type of the response to determine whether it is a long-running operation
or a quick prompt completion.
With the `GetResponseAsync` method becoming responsible for starting, getting the status of, getting the results of, and updating long-running operations,
only a few operations are left in the `IAsyncChatClient` interface: cancel and delete. As a result, the name `IAsyncChatClient`
may not be the best fit, as it suggests that the interface is responsible for all long-running operations when it is not. Should
the interface be renamed to reflect the operations it supports? If so, what should the new name be? Option 1.4 considers an alternative
that might solve the naming issue.
**Pros:**
- Delegation and control: Callers delegate the decision of whether to run a prompt as a long-running operation or quick prompt to chat clients,
while still having the option to control the execution mode to determine how to handle prompts if needed.
- Not a breaking change: Existing chat clients are not affected.
**Cons:**
- Not extensible: Adding new methods to the `IAsyncChatClient` interface after its release will break existing implementations of the interface.
- Missing capability check: Callers cannot determine if chat clients support specific uncommon operations before attempting to use them.
- An alternative solution for decorating the new methods will have to be put in place because the new method calls bypass existing decorators
such as logging, telemetry, etc.
#### 1.3 Get{Streaming}ResponseAsync for Common Operations & New IAsyncChatClient Interface for Uncommon Operations & Capability Check
This option extends the previous option with a way for callers to determine if a chat client supports uncommon operations before attempting to use them.
```csharp
public interface IAsyncChatClient
{
bool CanUpdateAsyncRun { get; }
bool CanCancelAsyncRun { get; }
bool CanDeleteAsyncRun { get; }
Task UpdateAsyncRunAsync(string runId, IList<ChatMessage> chatMessages, CancellationToken ct = default);
Task CancelAsyncRunAsync(string runId, CancellationToken ct = default);
Task DeleteAsyncRunAsync(string runId, CancellationToken ct = default);
}
public class ResponsesChatClient : IChatClient, IAsyncChatClient
{
public async Task<ChatResponse> GetResponseAsync(string prompt, ChatOptions? options = null, CancellationToken ct = default)
{
...
}
public bool CanUpdateAsyncRun => false; // This chat client does not support updating long-running operations.
public bool CanCancelAsyncRun => true; // This chat client supports cancelling long-running operations.
public bool CanDeleteAsyncRun => true; // This chat client supports deleting long-running operations.
public Task UpdateAsyncRunAsync(string runId, IList<ChatMessage> chatMessages, CancellationToken ct = default)
{
throw new NotSupportedException("This chat client does not support updating long-running operations.");
}
public Task CancelAsyncRunAsync(string runId, CancellationToken cancellationToken = default)
{
return this._openAIResponseClient.CancelResponseAsync(runId, cancellationToken);
}
public Task DeleteAsyncRunAsync(string runId, CancellationToken cancellationToken = default)
{
return this._openAIResponseClient.DeleteResponseAsync(runId, cancellationToken);
}
}
```
Consumer code example:
```csharp
IChatClient chatClient = new ResponsesChatClient();
ChatResponse response = await chatClient.GetResponseAsync("");
if (GetAsyncRunContent(response) is AsyncRunContent asyncRunContent)
{
// Get result of the long-running execution
response = await chatClient.GetResponseAsync([], new ChatOptions
{
RunId = asyncRunContent.RunId
});
// After some time
IAsyncChatClient? asyncChatClient = chatClient.GetService<IAsyncChatClient>();
// If it's still running, cancel and delete the run
AsyncRunStatus? status = GetAsyncRunContent(response)?.Status;
if (status == AsyncRunStatus.InProgress || status == AsyncRunStatus.Queued)
{
if (asyncChatClient is { CanCancelAsyncRun: true })
{
await asyncChatClient.CancelAsyncRunAsync(asyncRunContent.RunId);
}
if (asyncChatClient is { CanDeleteAsyncRun: true })
{
await asyncChatClient.DeleteAsyncRunAsync(asyncRunContent.RunId);
}
}
}
else
{
// Handle the case when the response is a quick prompt completion
Console.WriteLine(response);
}
```
**Pros:**
- Delegation and control: Callers delegate the decision of whether to run a prompt as a long-running execution or quick prompt to chat clients,
while still having the option to control the execution mode to determine how to handle prompts if needed.
- Not a breaking change: Existing chat clients are not affected.
- Capability check: Callers can determine if the chat client supports an uncommon operation before attempting to use it.
**Cons:**
- Not extensible: Adding new members to the `IAsyncChatClient` interface after its release will break existing implementations of the interface.
- An alternative solution for decorating the new methods will have to be put in place because the new method calls bypass existing decorators
such as logging, telemetry, etc.
#### 1.4 Get{Streaming}ResponseAsync for Common Operations & Individual Interface per Uncommon Operation
This option suggests using the existing `Get{Streaming}ResponseAsync` methods of the `IChatClient` interface to support
common long-running operations, such as starting long-running operations, getting their status and results, and potentially
updating them, in addition to their existing functionality of serving quick prompts.
Uncommon operations that are not supported by all analyzed APIs, such as updating (which can also be handled by `Get{Streaming}ResponseAsync`), cancelling,
and deleting long-running operations, as well as future ones, will each get their own interface, implemented only by the chat clients
that support them.
This option presumes that Option 3.2 (Have one method for getting long-running execution status and result) is selected.
The interfaces can inherit from `IChatClient` to allow callers to use an instance of `ICancelableChatClient`, `IUpdatableChatClient`, or `IDeletableChatClient`
to call the `Get{Streaming}ResponseAsync` methods as well. However, those methods belong to a leaf chat client that, if obtained via the `GetService()`
method, won't be wrapped by existing decorators such as function invocation, logging, etc. As a result, an alternative solution (wrapping the instance of the leaf
chat client in a decorator in the `GetService` method call) will need to be applied not only to the new methods of these interfaces but also to the existing
`Get{Streaming}ResponseAsync` ones.
```csharp
public interface ICancelableChatClient
{
Task CancelAsyncRunAsync(string runId, CancellationToken cancellationToken = default);
}
public interface IUpdatableChatClient
{
Task UpdateAsyncRunAsync(string runId, IList<ChatMessage> chatMessages, CancellationToken cancellationToken = default);
}
public interface IDeletableChatClient
{
Task DeleteAsyncRunAsync(string runId, CancellationToken cancellationToken = default);
}
// Responses chat client that supports standard long-running operations + cancellation and deletion
public class ResponsesChatClient : IChatClient, ICancelableChatClient, IDeletableChatClient
{
public async Task<ChatResponse> GetResponseAsync(string prompt, ChatOptions? options = null, CancellationToken ct = default)
{
...
}
public Task CancelAsyncRunAsync(string runId, CancellationToken cancellationToken = default)
{
return this._openAIResponseClient.CancelResponseAsync(runId, cancellationToken);
}
public Task DeleteAsyncRunAsync(string runId, CancellationToken cancellationToken = default)
{
return this._openAIResponseClient.DeleteResponseAsync(runId, cancellationToken);
}
}
```
Example that starts a long-running operation, gets its status, and cancels and deletes it if it's not completed after some time:
```csharp
IChatClient chatClient = new ResponsesChatClient();
ChatResponse response = await chatClient.GetResponseAsync("", new ChatOptions { AllowLongRunningResponses = true });
if (GetAsyncRunContent(response) is AsyncRunContent asyncRunContent)
{
// Get result
response = await chatClient.GetResponseAsync([], new ChatOptions
{
RunId = asyncRunContent.RunId
});
// After some time
// If it's still running, cancel and delete the run
AsyncRunStatus? status = GetAsyncRunContent(response)?.Status;
if (status == AsyncRunStatus.InProgress || status == AsyncRunStatus.Queued)
{
if (chatClient.GetService<ICancelableChatClient>() is { } cancelableChatClient)
{
await cancelableChatClient.CancelAsyncRunAsync(asyncRunContent.RunId);
}
if (chatClient.GetService<IDeletableChatClient>() is { } deletableChatClient)
{
await deletableChatClient.DeleteAsyncRunAsync(asyncRunContent.RunId);
}
}
}
```
**Pros:**
- Extensible: New interfaces can be added and implemented to support new long-running operations without breaking
existing chat client implementations.
- Not a breaking change: Existing chat clients that implement the `IChatClient` interface are not affected.
- Delegation and control: Callers delegate the decision of whether to run a prompt as a long-running operation or quick prompt
to chat clients, while still having the option to control the execution mode to determine how to handle prompts if needed.
**Cons:**
- Breaking changes: Changing the signatures of the methods of the operation-specific interfaces or adding new members to them will
break existing implementations of those interfaces. The blast radius is much smaller, limited to the subset
of chat clients that implement the operation-specific interfaces, but it is still a breaking change.
### 2. Enabling Long-Running Operations
Based on the API analysis, some APIs must be explicitly configured to run in long-running operation mode,
while others don't need additional configuration because they either decide themselves whether a request
should run as a long-running operation, or they always operate in long-running operation mode or quick prompt mode:
| Feature | OpenAI Responses | Foundry Agents | A2A |
|-----------------------------|---------------------------|-------------------------------------|----------------------|
| Long-running execution | User (Background = true) | Long-running execution is always on | Agent |
The options below consider how to enable long-running operation mode for chat clients that support both quick prompts and long-running operations.
#### 2.1 Execution Mode per `Get{Streaming}ResponseAsync` Invocation
This option proposes adding a new nullable `AllowLongRunningResponses` property to the `ChatOptions` class.
The property is `true` if the caller requests a long-running operation; `false`, `null`, or leaving it unset means the caller does not request one.
Chat clients that work with APIs requiring explicit configuration per operation will use this property to determine whether to run the prompt as a long-running
operation or quick prompt. Chat clients that work with APIs that don't require explicit configuration will ignore this property and operate according
to their own logic/configuration.
```csharp
public class ChatOptions
{
// Existing properties...
public bool? AllowLongRunningResponses { get; set; }
}
// Consumer code example
IChatClient chatClient = ...; // Get an instance of IChatClient
// Start a long-running execution for the prompt if supported by the underlying API
ChatResponse response = await chatClient.GetResponseAsync("", new ChatOptions { AllowLongRunningResponses = true });
// Start a quick prompt
ChatResponse quickResponse = await chatClient.GetResponseAsync("", new ChatOptions { AllowLongRunningResponses = false });
```
**Pros:**
- Callers can switch between quick prompts and long-running operations per invocation of the `Get{Streaming}ResponseAsync` methods without
changing the client configuration.
- Enables explicit control over the execution mode by callers per invocation, meaning that no caller site is broken if the agent is injected via DI,
and the caller can turn on the long-running operation mode when it can handle it.
**Con:** This may not be valuable for all callers, as they may not have enough information to decide whether the prompt should run as a long-running operation or quick prompt.
#### 2.2 Execution Mode per `Get{Streaming}ResponseAsync` Invocation + Model Class
This option is similar to the previous one, but suggests using a model class, `LongRunningResponsesOptions`, to group properties related to long-running operations.
```csharp
public class LongRunningResponsesOptions
{
public bool? Allow { get; set; }
//public PollingSettings? PollingSettings { get; set; } // Can be added later if necessary
}
public class ChatOptions
{
public LongRunningResponsesOptions? LongRunningResponsesOptions { get; set; }
}
// Consumer code example
IChatClient chatClient = ...; // Get an instance of IChatClient
// Start a long-running execution for the prompt if supported by the underlying API
ChatResponse response = await chatClient.GetResponseAsync("", new ChatOptions { LongRunningResponsesOptions = new() { Allow = true } });
```
**Pros:**
- Enables explicit control over the execution mode by callers per invocation, meaning that no caller site is broken if the agent is injected via DI,
and the caller can turn on the long-running operation mode when it can handle it.
- No proliferation of long-running operation-related properties in the `ChatOptions` class.
**Con:** Slightly more complex initialization.
#### 2.3 Execution Mode per Chat Client Instance
This option proposes adding a new `enableLongRunningResponses` parameter to constructors of chat clients that support both quick prompts and long-running operations.
The parameter value will be `true` if the chat client should operate in long-running operation mode, `false` if it should operate in quick prompt mode.
Chat clients that work with APIs requiring explicit configuration will use this parameter to determine whether to run prompts as long-running operations or quick prompts.
Chat clients that work with APIs that don't require explicit configuration won't have this parameter in their constructors and will operate according to their own
logic/configuration.
```csharp
public class CustomChatClient : IChatClient
{
private readonly bool _enableLongRunningResponses;
public CustomChatClient(bool enableLongRunningResponses)
{
this._enableLongRunningResponses = enableLongRunningResponses;
}
// Existing methods...
}
// Consumer code example
IChatClient chatClient = new CustomChatClient(enableLongRunningResponses: true);
// Start a long-running execution for the prompt
ChatResponse response = await chatClient.GetResponseAsync("");
```
Chat clients can be configured to always operate in long-running operation mode or quick prompt mode based on their role in a specific scenario.
For example, a chat client responsible for generating ideas for images can be configured for quick prompt mode, while a chat client responsible for image
generation can be configured to always use long-running operation mode.
**Pro:** Can be beneficial for scenarios where chat clients need to be configured upfront in accordance with their role in a scenario.
**Con:** Less flexible than the previous option, as it requires configuring the chat client upfront at instantiation time. However, this flexibility might not be needed.
#### 2.4 Combined Approach
This option proposes a combined approach that allows configuration per chat client instance and per `Get{Streaming}ResponseAsync` method invocation.
The chat client will use whichever configuration is provided, whether set in the chat client constructor or in the options for the `Get{Streaming}ResponseAsync`
method invocation. If both are set, the one provided in the `Get{Streaming}ResponseAsync` method invocation takes precedence.
```csharp
public class CustomChatClient : IChatClient
{
private readonly bool _enableLongRunningResponses;
public CustomChatClient(bool enableLongRunningResponses)
{
this._enableLongRunningResponses = enableLongRunningResponses;
}
public async Task<ChatResponse> GetResponseAsync(string prompt, ChatOptions? options = null, CancellationToken ct = default)
{
bool enableLongRunningResponses = options?.AllowLongRunningResponses ?? this._enableLongRunningResponses;
// Logic to handle the prompt based on enableLongRunningResponses...
}
}
// Consumer code example
IChatClient chatClient = new CustomChatClient(enableLongRunningResponses: true);
// Start a long-running execution for the prompt
ChatResponse response = await chatClient.GetResponseAsync("");
// Start a quick prompt
ChatResponse quickResponse = await chatClient.GetResponseAsync("", new ChatOptions { AllowLongRunningResponses = false });
```
**Pros:** Flexible approach that combines the benefits of both previous options.
### 3. Getting Status and Result of Long-Running Execution
The explored APIs use different approaches for retrieving the status and results of long-running operations. Some use
one method to retrieve both the status and the result, while others use a separate method for each:
| Feature | OpenAI Responses | Foundry Agents | A2A |
|-------------------|-------------------------------|----------------------------------------------------|-----------------------|
| API to Get Status | GetResponseAsync(responseId) | Runs.GetRunAsync(thread.Id, threadRun.Id) | GetTaskAsync(task.Id) |
| API to Get Result | GetResponseAsync(responseId) | Messages.GetMessagesAsync(thread.Id, threadRun.Id) | GetTaskAsync(task.Id) |
Taking into account the differences, the following options propose a few ways to model the API for getting the status and result of
long-running operations for the `AIAgent` interface implementations.
#### 3.1 Two Separate Methods for Status and Result
This option suggests having two separate methods for getting the status and result of long-running operations:
```csharp
public interface IAsyncChatClient
{
Task GetAsyncRunStatusAsync(string runId, CancellationToken ct = default);
Task GetAsyncRunResultAsync(string runId, CancellationToken ct = default);
}
```
**Pros:** Could be more intuitive for developers, as it clearly separates the concerns of checking the status and retrieving the result of a long-running operation.
**Cons:** Creates inefficiency for chat clients that use APIs that return both status and result in a single call,
as callers might make redundant calls to get the result after checking the status that already contains the result.
#### 3.2 One Method to Get Status and Result
This option suggests having a single method for getting both the status and result of long-running operations:
```csharp
public interface IAsyncChatClient
{
Task GetAsyncRunResultAsync(string runId, AgentThread? thread = null, CancellationToken ct = default);
}
```
For APIs that retrieve both the status and the result with one method, this method simply forwards the call to it.
For APIs that use two separate methods, the method first gets the status. If the status indicates that the
operation is still running, it returns the status to the caller. If the status indicates that the operation has completed,
it then calls the method that gets the result of the long-running operation and returns the result together with the status.
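
The two-call case described above can be sketched as follows. This is a minimal illustration, not a proposed API. `GetStatusAndResultAsync` is hypothetical, the two delegates stand in for the underlying API calls (for example, getting a Foundry run and listing its messages), and `"Completed"` is assumed to be the only status after which a result is fetched.

```csharp
#nullable enable
using System;
using System.Threading.Tasks;

// Hypothetical adapter for an API that exposes status and result through two separate
// calls. The result is fetched only once the status reports completion, so no second
// round trip is made while the run is still in progress.
async Task<(string Status, string? Result)> GetStatusAndResultAsync(
    string runId,
    Func<string, Task<string>> getStatusAsync, // stands in for e.g. "get run"
    Func<string, Task<string>> getResultAsync) // stands in for e.g. "list run messages"
{
    string status = await getStatusAsync(runId);
    if (status != "Completed")
    {
        // Still running (or failed/cancelled): return the status without a result.
        return (status, null);
    }

    string result = await getResultAsync(runId);
    return (status, result);
}

// Fake backend: the run completes between the two calls below.
string currentStatus = "InProgress";
int resultCalls = 0;
Func<string, Task<string>> getStatus = _ => Task.FromResult(currentStatus);
Func<string, Task<string>> getResult = id =>
{
    resultCalls++;
    return Task.FromResult($"answer for {id}");
};

var first = await GetStatusAndResultAsync("run_1", getStatus, getResult);
currentStatus = "Completed";
var second = await GetStatusAndResultAsync("run_1", getStatus, getResult);
Console.WriteLine($"{first.Status}/{first.Result ?? "-"} then {second.Status}/{second.Result}");
// InProgress/- then Completed/answer for run_1
```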
**Pros:**
- Simplifies the API by providing a single, intuitive method for retrieving long-running operation information.
- More optimal for chat clients that use APIs that return both status and result in a single call, as it avoids unnecessary API calls.
### 4. Place For RunId, Status, and UpdateId of Long-Running Operations
This section considers different options for exposing the `RunId`, `Status`, and `UpdateId` properties of long-running operations.
#### 4.1. As AIContent
The `AsyncRunContent` class will represent a long-running operation initiated and managed by an agent/LLM.
Items of this content type will be returned in a chat message as part of the `AgentResponse` or `ChatResponse`
response to represent the long-running operation.
The `AsyncRunContent` class has two properties: `RunId` and `Status`. The `RunId` identifies the
long-running operation, and the `Status` represents the current status of the operation. The class
inherits from `AIContent`, which is a base class for all AI-related content in MEAI and AF.
The `AsyncRunStatus` class represents the status of a long-running operation. Initially, it will have
a set of predefined statuses that represent the possible statuses used by existing Agent/LLM APIs that support
long-running operations. It will be extended to support additional statuses as needed while also
allowing custom, not-yet-defined statuses to propagate as strings from the underlying API to the callers.
The content class type can be used by both agents and chat clients to represent long-running operations.
For chat clients to use it, it should be declared in one of the MEAI packages.
```csharp
public class AsyncRunContent : AIContent
{
public string RunId { get; }
public AsyncRunStatus? Status { get; }
}
public readonly struct AsyncRunStatus : IEquatable<AsyncRunStatus>
{
public static AsyncRunStatus Queued { get; } = new("Queued");
public static AsyncRunStatus InProgress { get; } = new("InProgress");
public static AsyncRunStatus Completed { get; } = new("Completed");
public static AsyncRunStatus Cancelled { get; } = new("Cancelled");
public static AsyncRunStatus Failed { get; } = new("Failed");
public static AsyncRunStatus RequiresAction { get; } = new("RequiresAction");
public static AsyncRunStatus Expired { get; } = new("Expired");
public static AsyncRunStatus Rejected { get; } = new("Rejected");
public static AsyncRunStatus AuthRequired { get; } = new("AuthRequired");
public static AsyncRunStatus InputRequired { get; } = new("InputRequired");
public static AsyncRunStatus Unknown { get; } = new("Unknown");
public string Label { get; }
public AsyncRunStatus(string label)
{
if (string.IsNullOrWhiteSpace(label))
{
throw new ArgumentException("Label cannot be null or whitespace.", nameof(label));
}
this.Label = label;
}
/// Other members
}
```
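The following sketch shows how custom, not-yet-defined statuses could propagate. Provider status strings are mapped onto the labels of the predefined `AsyncRunStatus` values, and anything unrecognized is passed through unchanged. The lowercase, snake_case input strings are assumed to follow the OpenAI Responses API convention. `incomplete` stands in for a status without a predefined counterpart, and `ToStatusLabel` is a hypothetical helper. A real chat client would presumably wrap the returned label in `new AsyncRunStatus(label)`.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Assumed provider status strings (OpenAI Responses style) mapped to the labels of the
// predefined AsyncRunStatus values. Matching ignores case.
var knownStatuses = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    ["queued"] = "Queued",
    ["in_progress"] = "InProgress",
    ["completed"] = "Completed",
    ["cancelled"] = "Cancelled",
    ["failed"] = "Failed",
};

// Hypothetical helper: known statuses get a predefined label; anything else is passed
// through unchanged so that new provider statuses still reach callers.
string ToStatusLabel(string providerStatus) =>
    knownStatuses.TryGetValue(providerStatus, out string? label) ? label : providerStatus;

foreach (string s in new[] { "in_progress", "completed", "incomplete" })
{
    Console.WriteLine($"{s} -> {ToStatusLabel(s)}");
}
// in_progress -> InProgress
// completed -> Completed
// incomplete -> incomplete
```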
The streaming API may return an `UpdateId` identifying a particular update within a streamed response.
This `UpdateId` should be exposed to callers together with the `RunId`, allowing them to resume the long-running operation identified
by the `RunId` from the last received update, identified by the `UpdateId`.
#### 4.2. As Properties Of ChatResponse{Update}
This option suggests adding properties related to long-running operations directly to the `ChatResponse` and `ChatResponseUpdate` classes rather
than using a separate content class for that. See section "6. Model To Support Long-Running Operations" for more details.
### 5. Streaming Support
All analyzed APIs that support long-running operations also support streaming.
Some of them natively support resuming streaming from a specific point in the stream, while for others, this is either implementation-dependent or needs to be emulated:
| API | Can Resume Streaming | Model |
|-------------------------|--------------------------------------|------------------------------------------------------------------------------------------------------------|
| OpenAI Responses | Yes | StreamingResponseUpdate.**SequenceNumber** + GetResponseStreamingAsync(responseId, **startingAfter**, ct) |
| Azure AI Foundry Agents | Emulated² | RunStep.**Id** + custom pseudo code: client.Runs.GetRunStepsAsync(...).AllStepsAfter(**stepId**) |
| A2A | Implementation dependent¹ | |

¹ The [A2A specification](https://github.com/a2aproject/A2A/blob/main/docs/topics/streaming-and-async.md#1-streaming-with-server-sent-events-sse)
allows an A2A agent implementation to decide how to handle streaming resumption: _If a client's SSE connection breaks prematurely while
a task is still active (and the server hasn't sent a final: true event for that phase), the client can attempt to reconnect to the stream using the tasks/resubscribe RPC method.
The server's behavior regarding missed events during the disconnection period (e.g., whether it backfills or only sends new updates) is implementation-dependent._
² The Azure AI Foundry Agents API has an API to start a streaming run but does not have an API to resume streaming from a specific point in the stream.
However, it has non-streaming APIs to access already started runs, which can be used to emulate streaming resumption by accessing a run and its steps and streaming all the steps after a specific step.
#### Required Changes
To support streaming resumption, the following model changes are required:
- The `ChatOptions` class needs to be extended with a new `StartAfter` property that identifies the update after which streaming should resume, so that only updates generated after it are returned.
- The `ChatResponseUpdate` class needs to be extended with a new `SequenceNumber` property that identifies the position of the update within the stream.

All chat clients that support streaming resumption will need to populate the `SequenceNumber` property of `ChatResponseUpdate` and
honor the `StartAfter` property of `ChatOptions`.
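
Some APIs have no native resumption, such as the emulated Azure AI Foundry Agents case in the table above. For those, honoring `StartAfter` can amount to re-reading what the run has already produced and skipping everything up to and including the given sequence number. The following is a minimal sketch. It assumes updates are available as hypothetical `(SequenceNumber, Text)` pairs rather than real `ChatResponseUpdate` instances, and `ResumeAfter` is an illustrative name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical emulation of StartAfter for an API without native stream resumption:
// replay the updates the run has already produced and skip every update whose
// sequence number is less than or equal to startAfter.
IEnumerable<(int SequenceNumber, string Text)> ResumeAfter(
    IEnumerable<(int SequenceNumber, string Text)> updates, int? startAfter)
{
    foreach (var update in updates)
    {
        if (startAfter is null || update.SequenceNumber > startAfter)
        {
            yield return update;
        }
    }
}

var produced = new[] { (0, "Hel"), (1, "lo, "), (2, "wor"), (3, "ld") };

// The caller received updates 0 and 1 before the connection dropped,
// so it resumes with StartAfter = 1.
string resumed = string.Concat(ResumeAfter(produced, startAfter: 1).Select(u => u.Text));
Console.WriteLine(resumed); // world
```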
#### Function Calling
Function calls over streaming are communicated to chat clients through a series of updates. Chat clients accumulate these updates in their internal state to build
the function call content once the last update has been received. The completed function call content is then returned to the function-calling chat client,
which eventually invokes it.
Since chat clients keep function call updates in their internal state, resuming streaming from a specific update can be impossible if the resumption request
is made using a chat client that does not have the previous updates stored. This situation can occur if a host suspends execution during an ongoing function call
stream and later resumes from that particular update. Because chat clients' internal state is not persisted, they will lack the prior updates needed to continue
the function call, leading to a failure in resumption.
To address this issue, chat clients return sequence numbers only for resumable updates. For updates that cannot be resumed from, chat clients can
return the sequence number of the most recent resumable update received before the non-resumable one. This allows callers to resume from that earlier update,
even if it means re-processing some updates that have already been handled.
Chat clients will continue returning the sequence number of the last resumable update until a new resumable update becomes available. For example, a chat client might
keep returning sequence number 2, corresponding to the last resumable update received before an update for the first function call. Once **all** function call updates
are received and processed, and the model returns a non-function call response, the chat client will then return a sequence number, say 10, which corresponds to the
first non-function call update.
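The rule above can be sketched as follows. The sketch assumes that each raw update is already tagged with whether the client can resume from it. The tuple shape and `ExposedSequenceNumbers` are hypothetical. The numbers mirror the example in the text: updates 3 through 9 belong to a function call.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical raw updates: sequence number and whether the client could resume from it.
// Updates 3..9 belong to a function call and are therefore not resumable.
var raw = new List<(int SequenceNumber, bool IsResumable)>();
for (int sn = 0; sn <= 10; sn++)
{
    raw.Add((sn, sn < 3 || sn == 10));
}

// Returns the sequence number the chat client would expose for each update:
// its own number if it is resumable, otherwise the last resumable one seen so far.
static List<int?> ExposedSequenceNumbers(IEnumerable<(int SequenceNumber, bool IsResumable)> updates)
{
    var result = new List<int?>();
    int? lastResumable = null;
    foreach (var (sequenceNumber, isResumable) in updates)
    {
        if (isResumable)
        {
            lastResumable = sequenceNumber;
        }
        result.Add(lastResumable);
    }
    return result;
}

var exposed = ExposedSequenceNumbers(raw);
Console.WriteLine(string.Join(", ", exposed)); // 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 10
```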
##### Status of Streaming Updates
Different APIs provide different statuses for streamed function call updates, as the following two sequences show.
Sequence of updates from OpenAI Responses API to answer the question "What time is it?" using a function call:
| Id | SN | Update.Kind | Response.Status | ChatResponseUpdate.Status | Description |
|--------|----|--------------------------|-----------------|---------------------------|---------------------------------------------------|
| resp_1 | 0 | resp.created | Queued | Queued | |
| resp_1 | 1 | resp.queued | Queued | Queued | |
| resp_1 | 2 | resp.in_progress | InProgress | InProgress | |
| resp_1 | 3 | resp.output_item.added | - | InProgress | |
| resp_1 | 4 | resp.func_call.args.delta| - | InProgress | |
| resp_1 | 5 | resp.func_call.args.done | - | InProgress | |
| resp_1 | 6 | resp.output_item.done | - | InProgress | |
| resp_1 | 7 | resp.completed | Completed | Completed | |
| resp_1 | - | - | - | null | FunctionInvokingChatClient yields function result |
| | | | | | OpenAI Responses created a new response to handle the function call result |
| resp_2 | 0 | resp.created | Queued | Queued | |
| resp_2 | 1 | resp.queued | Queued | Queued | |
| resp_2 | 2 | resp.in_progress | InProgress | InProgress | |
| resp_2 | 3 | resp.output_item.added | - | InProgress | |
| resp_2 | 4 | resp.cnt_part.added | - | InProgress | |
| resp_2 | 5 | resp.output_text.delta | - | InProgress | |
| resp_2 | 6 | resp.output_text.delta | - | InProgress | |
| resp_2 | 7 | resp.output_text.delta | - | InProgress | |
| resp_2 | 8 | resp.output_text.done | - | InProgress | |
| resp_2 | 9 | resp.cnt_part.done | - | InProgress | |
| resp_2 | 10 | resp.output_item.done | - | InProgress | |
| resp_2 | 11 | resp.completed | Completed | Completed | |
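The OpenAI Responses rows above suggest a simple mapping: an update that carries no `Response.Status` reports the last status seen in the stream. The following is a minimal sketch of that carry-forward rule. To keep the sketch short, it uses plain strings instead of a real status enum. `CarryForward` is hypothetical, and the rule is only an observation from this table; the Azure AI Foundry sequence below does not follow it exactly.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Response.Status values of the resp_2 updates (SN 0..11); null stands for "-".
string?[] responseStatuses =
{
    "Queued", "Queued", "InProgress",
    null, null, null, null, null, null, null, null,
    "Completed"
};

// Updates that do not carry a response status inherit the last status seen in the stream.
static List<string?> CarryForward(IEnumerable<string?> statuses)
{
    var result = new List<string?>();
    string? last = null;
    foreach (var status in statuses)
    {
        last = status ?? last;
        result.Add(last);
    }
    return result;
}

var updateStatuses = CarryForward(responseStatuses);
Console.WriteLine(updateStatuses[5]); // InProgress
```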
Sequence of updates from Azure AI Foundry Agents API to answer the question "What time is it?" using a function call:
| Id | SN | UpdateKind | Run.Status | Step.Status | Message.Status | ChatResponseUpdate.Status | Description |
|--------|---------|-------------------|----------------|-------------|-----------------|---------------------------|---------------------------------------------------|
| run_1 | - | RunCreated | Queued | - | - | Queued | |
| run_1 | step_1 | - | RequiredAction | InProgress | - | RequiredAction | |
| TBD | - | - | - | - | - | - | FunctionInvokingChatClient yields function result |
| run_1 | - | RunStepCompleted | Completed | - | - | InProgress | |
| run_1 | - | RunQueued | Queued | - | - | Queued | |
| run_1 | - | RunInProgress | InProgress | - | - | InProgress | |
| run_1 | step_2 | RunStepCreated | - | InProgress | - | InProgress | |
| run_1 | step_2 | RunStepInProgress | - | InProgress | - | InProgress | |
| run_1 | - | MessageCreated | - | - | InProgress | InProgress | |
| run_1 | - | MessageInProgress | - | - | InProgress | InProgress | |
| run_1 | - | MessageUpdated | - | - | - | InProgress | |
| run_1 | - | MessageUpdated | - | - | - | InProgress | |
| run_1 | - | MessageUpdated | - | - | - | InProgress | |
| run_1 | - | MessageCompleted | - | - | Completed | InProgress | |
| run_1 | step_2 | RunStepCompleted | Completed | - | - | InProgress | |
| run_1 | - | RunCompleted | Completed | - | - | Completed | |
### 6. Model To Support Long-Running Operations
To support long-running operations, the `GetResponseAsync` and `GetStreamingResponseAsync` methods need to return the following values:
- `ResponseId` - identifier of the long-running operation or of an entity representing it, such as a task.
- `ConversationId` - identifier of the conversation or thread the long-running operation is part of. Some APIs, like Azure AI Foundry Agents, use
this identifier together with the `ResponseId` to identify a run.
- `SequenceNumber` - identifier of an update within a stream of updates. Only the `GetStreamingResponseAsync` method requires it, to support streaming resumption.
- `Status` - status of the long-running operation: whether it is queued, running, failed, cancelled, completed, etc.
These values need to be supplied to subsequent calls of the `GetResponseAsync` and `GetStreamingResponseAsync` methods to get the status and result of long-running operations.
#### 6.1 ChatOptions
The following options explore different ways of extending the `ChatOptions` class with these properties to support long-running operations:
- `AllowLongRunningResponses` - a boolean property that indicates whether the caller allows the chat client to run in long-running operation mode if it's supported by the chat client.
- `ResponseId` - a string property that represents the identifier of the long-running operation or an entity representing it. A non-null value of this property would indicate to chat clients
that callers want to get the status and result of an existing long-running operation, identified by the property value, rather than starting a new one.
- `StartAfter` - a string property that represents the sequence number of an update within a stream of updates so that the chat client can resume streaming after the last received update.
##### 6.1.1 Direct Properties in ChatOptions
```csharp
public class ChatOptions
{
    // Existing properties...

    /// <summary>Gets or sets an optional identifier used to associate a request with an existing conversation.</summary>
    public string? ConversationId { get; set; }

    // ...

    // New properties...
    public bool? AllowLongRunningResponses { get; set; }
    public string? ResponseId { get; set; }
    public string? StartAfter { get; set; }
}

// Usage example
var response = await chatClient.GetResponseAsync("", new ChatOptions { AllowLongRunningResponses = true });

// If the response indicates a long-running operation, get its status and result
if (response.Status is { } status)
{
    response = await chatClient.GetResponseAsync([], new ChatOptions
    {
        AllowLongRunningResponses = true,
        ResponseId = response.ResponseId,
        ConversationId = response.ConversationId,
        // StartAfter = response.SequenceNumber // for GetStreamingResponseAsync only
    });
}
```
**Con:** Proliferation of long-running operation properties in the `ChatOptions` class.
##### 6.1.2 LongRunOptions Model Class
```csharp
public class ChatOptions
{
    // Existing properties...
    public string? ConversationId { get; set; }
    // ...

    // New properties...
    public bool? AllowLongRunningResponses { get; set; }
    public LongRunOptions? LongRunOptions { get; set; }
}

public class LongRunOptions
{
    public string? ResponseId { get; set; }
    public string? ConversationId { get; set; }
    public string? StartAfter { get; set; }

    // Alternatively, ChatResponse can have an extension method ToLongRunOptions.
    public static LongRunOptions FromChatResponse(ChatResponse response)
    {
        return new LongRunOptions
        {
            ResponseId = response.ResponseId,
            ConversationId = response.ConversationId,
        };
    }

    // Alternatively, ChatResponseUpdate can have an extension method ToLongRunOptions.
    public static LongRunOptions FromChatResponseUpdate(ChatResponseUpdate update)
    {
        return new LongRunOptions
        {
            ResponseId = update.ResponseId,
            ConversationId = update.ConversationId,
            StartAfter = update.SequenceNumber,
        };
    }
}

// Usage example
var response = await chatClient.GetResponseAsync("", new ChatOptions { AllowLongRunningResponses = true });

// If the response indicates a long-running operation, poll until it completes.
// The status is re-read from each new response so the loop can terminate.
while (response.Status is { } status && status != ResponseStatus.Completed)
{
    response = await chatClient.GetResponseAsync([], new ChatOptions
    {
        AllowLongRunningResponses = true,
        LongRunOptions = LongRunOptions.FromChatResponse(response),
        // or, with an extension method: LongRunOptions = response.ToLongRunOptions(),
        // or, with an implicit conversion: LongRunOptions = response,
    });
}
```
**Pro:** No proliferation of long-running operation properties in the `ChatOptions` class.
**Con:** Duplicated property `ConversationId`.
##### 6.1.3 Continuation Token of System.ClientModel.ContinuationToken Type
This option suggests using `System.ClientModel.ContinuationToken` to encapsulate all properties required for long-running operations.
The continuation token will be returned by chat clients as part of the `ChatResponse` and `ChatResponseUpdate` responses to indicate that
the response is part of a long-running execution. A null value of the property will indicate that the response is not part of a long-running execution.
Chat clients will accept a non-null value of the property to indicate that callers want to get the status and result of an existing long-running operation.
Each chat client will implement its own continuation token class that inherits from `ContinuationToken` to encapsulate properties required for long-running operations
that are specific to the underlying API the chat client works with. For example, for the OpenAI Responses API, the continuation token class will encapsulate
the `ResponseId` and `SequenceNumber` properties.
```csharp
public class ChatOptions
{
    // Existing properties...
    public string? ConversationId { get; set; }
    // ...

    // New properties...
    public bool? AllowLongRunningResponses { get; set; }
    public ContinuationToken? ContinuationToken { get; set; }
}

internal sealed class LongRunContinuationToken : ContinuationToken
{
    public LongRunContinuationToken(string responseId)
    {
        this.ResponseId = responseId;
    }

    public string ResponseId { get; set; }

    public int? SequenceNumber { get; set; }

    public static LongRunContinuationToken FromToken(ContinuationToken token)
    {
        if (token is LongRunContinuationToken longRunContinuationToken)
        {
            return longRunContinuationToken;
        }

        BinaryData data = token.ToBytes();

        Utf8JsonReader reader = new(data);

        string responseId = null!;
        int? startAfter = null;

        reader.Read();

        // Reading functionality

        return new(responseId)
        {
            SequenceNumber = startAfter
        };
    }
}

// Usage example
ChatOptions options = new() { AllowLongRunningResponses = true };

var response = await chatClient.GetResponseAsync("", options);

while (response.ContinuationToken is { } token)
{
    options.ContinuationToken = token;

    response = await chatClient.GetResponseAsync([], options);
}

Console.WriteLine(response.Text);
```
**Pro:** No proliferation of long-running operation properties in the `ChatOptions` class, including the `Status` property.
##### 6.1.4 Continuation Token of String Type
This option is similar to the previous one but uses a string type for the continuation token instead of the `System.ClientModel.ContinuationToken` type.
```csharp
internal sealed class LongRunContinuationToken
{
    public LongRunContinuationToken(string responseId)
    {
        this.ResponseId = responseId;
    }

    public string ResponseId { get; set; }

    public int? SequenceNumber { get; set; }

    public static LongRunContinuationToken Deserialize(string json)
    {
        Throw.IfNullOrEmpty(json);

        var token = JsonSerializer.Deserialize(json, OpenAIJsonContext2.Default.LongRunContinuationToken)
            ?? throw new InvalidOperationException("Failed to deserialize LongRunContinuationToken.");

        return token;
    }

    public string Serialize()
    {
        return JsonSerializer.Serialize(this, OpenAIJsonContext2.Default.LongRunContinuationToken);
    }
}

public class ChatOptions
{
    public string? ContinuationToken { get; set; }
}
```
**Pro:** No dependency on the `System.ClientModel` package.
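The snippet above depends on `Throw` and `OpenAIJsonContext2`, which are not defined in this document. The following self-contained sketch shows the same string-token round trip using only `System.Text.Json`. The JSON field names mirror the `LongRunContinuationToken` properties. `CreateToken` and `ReadToken` are hypothetical helpers, and callers would treat the resulting string as opaque.

```csharp
using System;
using System.Text.Json;

// Encodes the token fields as a JSON string that callers treat as opaque.
static string CreateToken(string responseId, int? sequenceNumber) =>
    JsonSerializer.Serialize(new { responseId, sequenceNumber });

// Decodes the token back into its fields, failing loudly on malformed input.
static (string ResponseId, int? SequenceNumber) ReadToken(string json)
{
    if (string.IsNullOrEmpty(json))
    {
        throw new ArgumentException("Token must not be null or empty.", nameof(json));
    }

    using JsonDocument document = JsonDocument.Parse(json);
    JsonElement root = document.RootElement;
    string responseId = root.GetProperty("responseId").GetString()
        ?? throw new InvalidOperationException("Token has no response id.");
    JsonElement sequence = root.GetProperty("sequenceNumber");
    int? sequenceNumber = sequence.ValueKind == JsonValueKind.Null ? (int?)null : sequence.GetInt32();
    return (responseId, sequenceNumber);
}

string encoded = CreateToken("resp_1", 5);
var (id, sn) = ReadToken(encoded);
Console.WriteLine($"{id} resumes after {sn}"); // resp_1 resumes after 5
```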
##### 6.1.5 Continuation Token of a Custom Type
This option is similar to the "6.1.3 Continuation Token of System.ClientModel.ContinuationToken Type" option but uses a
custom type for the continuation token instead of the `System.ClientModel.ContinuationToken` type.
**Pros**
- There is no dependency on the `System.ClientModel` package.
- There is no overload ambiguity with existing `IChatClient` extension methods, which would occur if a new extension method accepting a string continuation token as its first parameter were added.
#### 6.2 Overloads of GetResponseAsync and GetStreamingResponseAsync
This option proposes introducing overloads of the `GetResponseAsync` and `GetStreamingResponseAsync` methods that will accept long-running operation parameters directly:
```csharp
public interface ILongRunningChatClient
{
    Task<ChatResponse> GetResponseAsync(
        IEnumerable<ChatMessage> messages,
        string responseId,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default);

    IAsyncEnumerable<ChatResponseUpdate> GetStreamingResponseAsync(
        IEnumerable<ChatMessage> messages,
        string responseId,
        string? startAfter = null,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default);
}

public class CustomChatClient : IChatClient, ILongRunningChatClient
{
    // ...
}

// Usage example
IChatClient chatClient = ...; // Get an instance of IChatClient

ChatResponse response = await chatClient.GetResponseAsync("", new ChatOptions { AllowLongRunningResponses = true });

if (response.Status is { } && chatClient.GetService<ILongRunningChatClient>() is { } longRunningChatClient)
{
    // Re-read the status from each new response so the loop can terminate
    while (response.Status != AsyncRunStatus.Completed)
    {
        response = await longRunningChatClient.GetResponseAsync([], response.ResponseId, new ChatOptions { ConversationId = response.ConversationId });
    }
    // ...
}
```
**Pros:**
- No proliferation of long-running operation properties in the `ChatOptions` class, except for the new `AllowLongRunningResponses` property discussed in section 2.
**Cons:**
- Interface switching: Callers need to switch to the `ILongRunningChatClient` interface to get the status and result of long-running operations.
- Decoration: the new methods are not part of `IChatClient`, so existing decorators (for example, delegating chat clients) will not wrap them, and an alternative decoration mechanism will have to be put in place.
## Long-Running Operations Support for AF Agents
### 1. Methods for Working with Long-Running Operations
The design for supporting long-running operations by agents is very similar to that for chat clients because it is based on
the same analysis of existing APIs and anticipated consumption patterns.
#### 1.1 Run{Streaming}Async Methods for Common Operations and the Update Operation + New Method Per Uncommon Operation
This option suggests using the existing `Run{Streaming}Async` methods of `AIAgent` implementations to start long-running operations, get their results, and update them.
For cancellation and deletion of long-running operations, new methods will be added to the `AIAgent` class.
```csharp
public abstract class AIAgent
{
    // Existing methods...
    public Task<AgentResponse> RunAsync(string message, AgentThread? thread = null, AgentRunOptions? options = null, CancellationToken cancellationToken = default) { ... }

    public IAsyncEnumerable<AgentResponseUpdate> RunStreamingAsync(string message, AgentThread? thread = null, AgentRunOptions? options = null, CancellationToken cancellationToken = default) { ... }

    // New methods for uncommon operations
    public virtual Task<AgentResponse?> CancelRunAsync(string id, AgentCancelRunOptions? options = null, CancellationToken cancellationToken = default)
    {
        return Task.FromResult<AgentResponse?>(null);
    }

    public virtual Task<AgentResponse?> DeleteRunAsync(string id, AgentDeleteRunOptions? options = null, CancellationToken cancellationToken = default)
    {
        return Task.FromResult<AgentResponse?>(null);
    }
}

// Agent that supports cancellation but not deletion
public class CustomAgent : AIAgent
{
    public override async Task<AgentResponse?> CancelRunAsync(string id, AgentCancelRunOptions? options = null, CancellationToken cancellationToken = default)
    {
        var response = await this._client.CancelRunAsync(id, options?.Thread?.ConversationId);
        return ConvertToAgentResponse(response);
    }

    // No override for DeleteRunAsync as it's not supported by the underlying API
}

// Usage
AIAgent agent = new CustomAgent();
AgentThread thread = agent.GetNewThread();
AgentResponse response = await agent.RunAsync("What is the capital of France?", thread);
response = await agent.CancelRunAsync(response.ResponseId, new AgentCancelRunOptions { Thread = thread });
```
If an agent supports cancellation, deletion, or both, it will override the corresponding methods.
Otherwise, it won't override them, and the base implementations will return null.
Some agents, for example Azure AI Foundry Agents, require the thread identifier to cancel a run. To accommodate this requirement, the `CancelRunAsync` method
accepts an optional `AgentCancelRunOptions` parameter that allows callers to specify the thread associated with the run they want to cancel.
```csharp
public class AgentCancelRunOptions
{
public AgentThread? Thread { get; set; }
}
```
Similar design considerations can be applied to the `DeleteRunAsync` method and the `AgentDeleteRunOptions` class.
Including options parameters in the method signatures allows for future extensibility. Alternatively, the parameters could be omitted for now and added later through method overloads if needed.
**Pros:**
- Existing `Run{Streaming}Async` methods are reused for common operations.
- New methods for uncommon operations can be added in a non-breaking way.
### 2. Enabling Long-Running Operations
The options for enabling long-running operations are exactly the same as those discussed in section "2. Enabling Long-Running Operations" for chat clients:
- Execution Mode per `Run{Streaming}Async` Invocation
- Execution Mode per `Run{Streaming}Async` Invocation + Model Class
- Execution Mode per agent instance
- Combined Approach
Below are the details of the option selected for chat clients that is also selected for agents.
#### 2.1 Execution Mode per `Run{Streaming}Async` Invocation
This option proposes adding a new nullable `AllowLongRunningResponses` property of bool type to the `AgentRunOptions` class.
The property value will be `true` if the caller requests a long-running operation, `false`, `null` or omitted otherwise.
AI agents that work with APIs requiring explicit configuration per operation will use this property to determine whether to run the prompt as a long-running
operation or quick prompt. Agents that work with APIs that don't require explicit configuration will ignore this property and operate according
to their own logic/configuration.
```csharp
public class AgentRunOptions
{
    // Existing properties...
    public bool? AllowLongRunningResponses { get; set; }
}

// Consumer code example
AIAgent agent = ...; // Get an instance of an AIAgent

// Start a long-running execution for the prompt if supported by the underlying API
AgentResponse longRunningResponse = await agent.RunAsync("", options: new AgentRunOptions { AllowLongRunningResponses = true });

// Start a quick prompt
AgentResponse quickResponse = await agent.RunAsync("");
```
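The per-invocation rule described above can be sketched as a small decision function. The function name, its parameters, and the idea of a boolean that says whether an API requires explicit opt-in are all hypothetical. They only illustrate when `AllowLongRunningResponses` is honored and when it is ignored.

```csharp
using System;

// Decides whether a run should execute as a long-running (background) operation.
// apiRequiresExplicitOptIn: true for APIs that must be told per request (e.g. via a "background" flag).
// configuredMode: the agent's own configuration, used for APIs that decide on their own.
static bool ShouldRunLongRunning(bool? allowLongRunningResponses, bool apiRequiresExplicitOptIn, bool configuredMode) =>
    apiRequiresExplicitOptIn
        ? allowLongRunningResponses == true // false, null, or omitted all mean "quick prompt"
        : configuredMode;                   // the property is ignored

Console.WriteLine(ShouldRunLongRunning(true, apiRequiresExplicitOptIn: true, configuredMode: false)); // True
Console.WriteLine(ShouldRunLongRunning(null, apiRequiresExplicitOptIn: true, configuredMode: false)); // False
```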
**Pros:**
- Callers can switch between quick prompts and long-running operations per invocation of the `Run{Streaming}Async` methods without
changing agent configuration.
- Enables explicit control over the execution mode by callers per invocation, meaning that no call site is broken if the agent is injected via DI,
and each caller can turn on the long-running operation mode only when it can handle it.
**Con:** This may not be valuable for all callers, as they may not have enough information to decide whether the prompt should run as a long-running operation or quick prompt.
### 3. Model To Support Long-Running Operations
The options for modeling long-running operations are exactly the same as those for chat clients discussed in section "6. Model To Support Long-Running Operations" above:
- Direct Properties in ChatOptions
- LongRunOptions Model Class
- Continuation Token of System.ClientModel.ContinuationToken Type
- Continuation Token of String Type
- Continuation Token of a Custom Type
Below are the details of the option selected for chat clients that is also selected for agents.
#### 3.1 Continuation Token of a Custom Type
This option suggests using a custom `ResponseContinuationToken` type to encapsulate all properties representing a long-running operation. The continuation token will be returned by agents in the
`ContinuationToken` property of the `AgentResponse` and `AgentResponseUpdate` responses to indicate that the response is part of a long-running operation. A null value
of the property will indicate either that the response is not part of a long-running operation or that the long-running operation has completed. Callers will set the token in the
`ContinuationToken` property of the `AgentRunOptions` class in follow-up calls to the `Run{Streaming}Async` methods to indicate that they want to "continue" the long-running
operation identified by the token.
Each agent will implement its own continuation token class that inherits from `ResponseContinuationToken` to encapsulate properties required for long-running operations that are
specific to the underlying API the agent works with. For example, for the A2A agent, the continuation token class will encapsulate the `TaskId` property.
```csharp
internal sealed class A2AAgentContinuationToken : ResponseContinuationToken
{
    public A2AAgentContinuationToken(string taskId)
    {
        this.TaskId = taskId;
    }

    public string TaskId { get; set; }

    public static A2AAgentContinuationToken FromToken(ResponseContinuationToken token)
    {
        if (token is A2AAgentContinuationToken a2aAgentContinuationToken)
        {
            return a2aAgentContinuationToken;
        }

        // ... Deserialization logic
    }
}

public class AgentRunOptions
{
    // Existing properties, including AllowLongRunningResponses...
    public ResponseContinuationToken? ContinuationToken { get; set; }
}

public class AgentResponse
{
    public ResponseContinuationToken? ContinuationToken { get; }
}

public class AgentResponseUpdate
{
    public ResponseContinuationToken? ContinuationToken { get; }
}

// Usage example
AgentRunOptions options = new() { AllowLongRunningResponses = true };

AgentResponse response = await agent.RunAsync("What is the capital of France?", options: options);

while (response.ContinuationToken is { } token)
{
    options.ContinuationToken = token;
    response = await agent.RunAsync([], options: options);
}

Console.WriteLine(response.Text);
```
### 4. Continuation Token and Agent Thread
There are two types of agent threads: server-managed and client-managed. The server-managed threads live server-side and are identified by a conversation identifier, and
agents use the identifier to associate runs with the threads. The client-managed threads live client-side and are represented by a collection of chat messages that agents maintain
by adding user messages to them before sending the thread to the service and by adding the agent response back to the thread when received from the service.
When long-running operations are enabled and an agent is configured with tools, the initial run response may contain a tool call that the agent needs to invoke. If the agent runs
with a server-managed thread, the tool call is captured server-side as part of the conversation history, so follow-up runs have access to it and the agent can invoke the tool.
However, if no thread is provided for the initial run and a client-managed thread is provided only for follow-up runs, the tool call made during the initial run
is never added to the client-managed thread, and as a result the agent cannot invoke the tool.
#### 4.1 Require Thread for Long-Running Operations
This option suggests that AI agents require a thread to be provided when long-running operations are enabled. If no thread is provided, the agent will throw an exception.
**Pro:** Ensures agent responses are always captured by client-managed threads when long-running operations are enabled, providing a consistent experience for callers.
**Con:** May be inconvenient for callers to always provide a thread when long-running operations are enabled.
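The check that option 4.1 requires can be sketched as a guard that an agent runs before starting or continuing a run. The function name and exception message are hypothetical. The sketch uses `object` in place of `AgentThread` so that it stays self-contained.

```csharp
using System;

// Hypothetical guard an agent could run before starting or continuing a run.
// "thread" stands in for AgentThread; object is used to keep the sketch self-contained.
static void EnsureThreadForLongRunning(bool? allowLongRunningResponses, object? thread)
{
    if (allowLongRunningResponses == true && thread is null)
    {
        throw new InvalidOperationException(
            "A thread must be provided when long-running operations are enabled.");
    }
}

EnsureThreadForLongRunning(allowLongRunningResponses: null, thread: null);         // quick prompt: no thread needed
EnsureThreadForLongRunning(allowLongRunningResponses: true, thread: new object()); // long-running with a thread: OK

bool threw = false;
try
{
    EnsureThreadForLongRunning(allowLongRunningResponses: true, thread: null);
}
catch (InvalidOperationException)
{
    threw = true;
}
Console.WriteLine(threw); // True
```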
#### 4.2 Don't Require Thread for Long-Running Operations
This option suggests that AI agents don't require a thread to be provided when long-running operations are enabled. According to this option, it's up to the caller to ensure that
the same thread is provided consistently for all runs of a long-running operation.
**Pro:** Provides more flexibility to callers by not enforcing thread requirements.
**Con:** May lead to an inconsistent experience for callers if they forget to provide the thread for initial or follow-up runs.
## Decision Outcome
### Long-Running Execution Support for Chat Clients
- **Methods**: Option 1.4 - Use existing `Get{Streaming}ResponseAsync` for common operations; individual interfaces for uncommon operations (e.g., `ICancelableChatClient`)
- **Enabling**: Option 2.1 - Execution mode per invocation via `ChatOptions.AllowLongRunningResponses`
- **Status/Result**: Option 3.2 - Single method to get both status and result
- **RunId/UpdateId**: Option 4.2 - As properties of `ChatResponse{Update}`
- **Model**: Option 6.1.5 - Custom continuation token type
### Long-Running Operations Support for AF Agents
- **Methods**: Option 1.1 - Use existing `Run{Streaming}Async` for common operations; new methods for uncommon operations
- **Enabling**: Option 2.1 - Execution mode per invocation via `AgentRunOptions.AllowLongRunningResponses`
- **Model**: Option 3.1 - Custom continuation token type
- **Thread Requirement**: Option 4.1 - Require thread for long-running operations
## Addendum 1: APIs of Agents Supporting Long-Running Execution
OpenAI Responses
- Create a background response and wait for it to complete using polling:
```csharp
ClientResult<OpenAIResponse> result = await this._openAIResponseClient.CreateResponseAsync("What is SLM in AI?", new ResponseCreationOptions
{
    Background = true,
});

// InProgress, Completed, Cancelled, Queued, Incomplete, Failed
while (result.Value.Status is (ResponseStatus.Queued or ResponseStatus.InProgress))
{
    await Task.Delay(500); // Wait for 0.5 seconds before checking the status again, without blocking the thread
    result = await this._openAIResponseClient.GetResponseAsync(result.Value.Id);
}

Console.WriteLine($"Response Status: {result.Value.Status}"); // Completed
Console.WriteLine(result.Value.GetOutputText()); // SLM in the context of AI refers to ...
```
- Cancel a background response:
```csharp
...
ClientResult<OpenAIResponse> result = await this._openAIResponseClient.CreateResponseAsync("What is SLM in AI?", new ResponseCreationOptions
{
    Background = true,
});

result = await this._openAIResponseClient.CancelResponseAsync(result.Value.Id);

Console.WriteLine($"Response Status: {result.Value.Status}"); // Cancelled
```
- Delete a background response:
```csharp
ClientResult<OpenAIResponse> result = await this._openAIResponseClient.CreateResponseAsync("What is SLM in AI?", new ResponseCreationOptions
{
    Background = true,
});

var deleteResult = await this._openAIResponseClient.DeleteResponseAsync(result.Value.Id);

Console.WriteLine($"Response Deleted: {deleteResult.Value.Deleted}"); // True if the response was deleted successfully
```
- Stream a background response:
```csharp
await foreach (StreamingResponseUpdate update in this._openAIResponseClient.CreateResponseStreamingAsync("What is SLM in AI?", new ResponseCreationOptions { Background = true }))
{
Console.WriteLine($"Sequence Number: {update.SequenceNumber}"); // 0, 1, 2, etc.
switch (update)
{
case StreamingResponseCreatedUpdate createdUpdate:
Console.WriteLine($"Response Status: {createdUpdate.Response.Status}"); // Queued
break;
case StreamingResponseQueuedUpdate queuedUpdate:
Console.WriteLine($"Response Status: {queuedUpdate.Response.Status}"); // Queued
break;
case StreamingResponseInProgressUpdate inProgressUpdate:
Console.WriteLine($"Response Status: {inProgressUpdate.Response.Status}"); // InProgress
break;
case StreamingResponseOutputItemAddedUpdate outputItemAddedUpdate:
Console.WriteLine($"Output index: {outputItemAddedUpdate.OutputIndex}");
Console.WriteLine($"Item Id: {outputItemAddedUpdate.Item.Id}");
break;
case StreamingResponseContentPartAddedUpdate contentPartAddedUpdate:
Console.WriteLine($"Output Index: {contentPartAddedUpdate.OutputIndex}");
Console.WriteLine($"Item Id: {contentPartAddedUpdate.ItemId}");
Console.WriteLine($"Content Index: {contentPartAddedUpdate.ContentIndex}");
break;
case StreamingResponseOutputTextDeltaUpdate outputTextDeltaUpdate:
Console.WriteLine($"Output Index: {outputTextDeltaUpdate.OutputIndex}");
Console.WriteLine($"Item Id: {outputTextDeltaUpdate.ItemId}");
Console.WriteLine($"Content Index: {outputTextDeltaUpdate.ContentIndex}");
Console.WriteLine($"Delta: {outputTextDeltaUpdate.Delta}"); // SL>M> in> AI> typically>....
break;
case StreamingResponseOutputTextDoneUpdate outputTextDoneUpdate:
Console.WriteLine($"Output Index: {outputTextDoneUpdate.OutputIndex}");
Console.WriteLine($"Item Id: {outputTextDoneUpdate.ItemId}");
Console.WriteLine($"Content Index: {outputTextDoneUpdate.ContentIndex}");
Console.WriteLine($"Text: {outputTextDoneUpdate.Text}"); // SLM in the context of AI typically refers to ...
break;
case StreamingResponseContentPartDoneUpdate contentPartDoneUpdate:
Console.WriteLine($"Output Index: {contentPartDoneUpdate.OutputIndex}");
Console.WriteLine($"Item Id: {contentPartDoneUpdate.ItemId}");
Console.WriteLine($"Content Index: {contentPartDoneUpdate.ContentIndex}");
Console.WriteLine($"Text: {contentPartDoneUpdate.Part.Text}"); // SLM in the context of AI typically refers to ...
break;
case StreamingResponseOutputItemDoneUpdate outputItemDoneUpdate:
Console.WriteLine($"Output Index: {outputItemDoneUpdate.OutputIndex}");
Console.WriteLine($"Item Id: {outputItemDoneUpdate.Item.Id}");
break;
case StreamingResponseCompletedUpdate completedUpdate:
Console.WriteLine($"Response Status: {completedUpdate.Response.Status}"); // Completed
Console.WriteLine($"Output: {completedUpdate.Response.GetOutputText()}"); // SLM in the context of AI typically refers to ...
break;
default:
Console.WriteLine($"Unexpected update type: {update.GetType().Name}");
break;
}
}
```
Docs: [OpenAI background mode](https://platform.openai.com/docs/guides/background)
- Background Mode Disabled
- Non-streaming API - returns the final result
| Method Call | Status | Result | Notes |
|-------------------------------------|-----------|---------------------------------|-------------------------------------|
| CreateResponseAsync(msgs, opts, ct) | Completed | The capital of France is Paris. | |
| GetResponseAsync(responseId, ct) | Completed | The capital of France is Paris. | response is less than 5 minutes old |
| GetResponseAsync(responseId, ct) | Completed | The capital of France is Paris. | response is more than 5 minutes old |
| GetResponseAsync(responseId, ct) | Completed | The capital of France is Paris. | response is more than 12 hours old |
| Cancellation Method | Result |
|---------------------|--------------------------------------|
| CancelResponseAsync | Cannot cancel a synchronous response |
- Streaming API - returns streaming updates callers can iterate over to get the result
| Method Call | Status | Result |
|----------------------------------------------|------------|----------------------------------------------------------------------------------|
| CreateResponseStreamingAsync(msgs, opts, ct) | - | updates |
| Iterating over updates | InProgress | - |
| Iterating over updates | InProgress | - |
| Iterating over updates | InProgress | The |
| Iterating over updates | InProgress | capital |
| Iterating over updates | InProgress | ... |
| Iterating over updates | InProgress | Paris. |
| Iterating over updates | Completed | The capital of France is Paris. |
| GetStreamingResponseAsync(responseId, ct) | - | HTTP 400 - Response cannot be streamed, it was not created with background=true. |
| Cancellation Method | Result |
|---------------------|--------------------------------------|
| CancelResponseAsync | Cannot cancel a synchronous response |
- Background Mode Enabled
- Non-streaming API - returns a queued response immediately and allows polling for its status and result
| Method Call | Status | Result | Notes |
|-------------------------------------|-----------|---------------------------------|--------------------------------------------|
| CreateResponseAsync(msgs, opts, ct) | Queued | responseId | |
| GetResponseAsync(responseId, ct) | Queued | - | if called before the response is completed |
| GetResponseAsync(responseId, ct) | Queued | - | if called before the response is completed |
| GetResponseAsync(responseId, ct) | Completed | The capital of France is Paris. | response is less than 5 minutes old |
| GetResponseAsync(responseId, ct) | Completed | The capital of France is Paris. | response is more than 5 minutes old |
| GetResponseAsync(responseId, ct) | Completed | The capital of France is Paris. | response is more than 12 hours old |

A response started in background mode runs server-side until it completes, fails, or is cancelled. The client can poll for its status using the response Id. Polling before the response completes returns its latest status; polling after it completes returns the completed response with the result.

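
The polling flow described above can be sketched as follows. This is a hypothetical, in-memory Python illustration: `FakeBackgroundService`, `poll_until_done`, and the exponential back-off are assumptions, not part of the .NET API; a real client would call `GetResponseAsync(responseId, ct)` where the sketch calls `service.get`.

```python
from dataclasses import dataclass, field
from typing import Callable

# Statuses after which polling can stop.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


@dataclass
class FakeBackgroundService:
    """Hypothetical in-memory stand-in for a service running background responses."""

    # Status reported on each successive poll of a response.
    script: tuple = ("queued", "queued", "in_progress", "completed")
    polls: dict = field(default_factory=dict)

    def create(self, prompt: str) -> str:
        response_id = f"resp_{len(self.polls) + 1}"
        self.polls[response_id] = 0
        return response_id  # returned immediately, before any work is done

    def get(self, response_id: str) -> dict:
        count = self.polls[response_id]
        self.polls[response_id] = count + 1
        status = self.script[min(count, len(self.script) - 1)]
        text = "The capital of France is Paris." if status == "completed" else None
        return {"id": response_id, "status": status, "text": text}


def poll_until_done(service, response_id: str,
                    sleep: Callable[[float], None],
                    initial_delay: float = 0.1, max_delay: float = 2.0) -> dict:
    """Poll with exponential back-off until the response reaches a terminal status."""
    delay = initial_delay
    while True:
        response = service.get(response_id)
        if response["status"] in TERMINAL_STATUSES:
            return response
        sleep(delay)
        delay = min(delay * 2, max_delay)


if __name__ == "__main__":
    import time

    svc = FakeBackgroundService()
    rid = svc.create("What is the capital of France?")
    print(poll_until_done(svc, rid, sleep=time.sleep))
```

Injecting `sleep` keeps the loop testable; in production the back-off cap (`max_delay`) bounds how stale a status can be between polls.
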
| Cancellation Method | Result | Notes |
|---------------------|-----------|----------------------------------------|
| CancelResponseAsync | Cancelled | if cancelled before response completed |
| CancelResponseAsync | Completed | if cancelled after response completed |
| CancellationToken | No effect | it just cancels the client side call |
- Streaming API - returns streaming updates that callers can iterate over immediately, or after dropping the stream and resuming it later
| Method Call | Status | Result | Notes |
|----------------------------------------------|------------|--------------------------------------------------------------------------------|-------------------------------------------|
| CreateResponseStreamingAsync(msgs, opts, ct) | - | updates | |
| Iterating over updates | Queued | - | |
| Iterating over updates | Queued | - | |
| Iterating over updates | InProgress | - | |
| Iterating over updates | InProgress | - | |
| Iterating over updates | InProgress | The | |
| Iterating over updates | InProgress | capital | |
| Iterating over updates | InProgress | ... | |
| Iterating over updates | InProgress | Paris. | |
| Iterating over updates | Completed | The capital of France is Paris. | |
| GetStreamingResponseAsync(responseId, ct) | - | updates | response is less than 5 minutes old |
| Iterating over updates | Queued | - | |
| ... | ... | ... | |
| GetStreamingResponseAsync(responseId, ct) | - | HTTP 400 - Response can no longer be streamed, it is more than 5 minutes old. | response is more than 5 minutes old |
| GetResponseAsync(responseId, ct) | Completed | The capital of France is Paris. | accessing response that can't be streamed |

A streamed response that can no longer be streamed (because it is more than 5 minutes old) can still be retrieved with the non-streaming `GetResponseAsync` API.

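
The choice between resuming the stream and falling back to the non-streaming API can be summarized in a small helper. This is a hypothetical sketch based only on the behavior in the tables above (responses created without background mode can never be re-streamed; background responses can be re-streamed for 5 minutes); `retrieval_method` is not part of any SDK.

```python
from datetime import datetime, timedelta, timezone

# Window during which a background response can still be streamed (per the tables above).
STREAM_RESUME_WINDOW = timedelta(minutes=5)


def retrieval_method(created_at: datetime, now: datetime, background: bool) -> str:
    """Return which call to use for an existing response (hypothetical helper)."""
    if not background:
        # Responses created without background mode can never be re-streamed (HTTP 400).
        return "GetResponseAsync"
    if now - created_at < STREAM_RESUME_WINDOW:
        return "GetStreamingResponseAsync"
    # Too old to stream, but still retrievable as a completed response.
    return "GetResponseAsync"


if __name__ == "__main__":
    created = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(retrieval_method(created, created + timedelta(minutes=2), background=True))
```
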
| Cancellation Method | Result | Notes |
|---------------------|------------------------------------|----------------------------------------|
| CancelResponseAsync | Canceled¹                          | if cancelled before response completed |
| CancelResponseAsync | Cannot cancel a completed response | if cancelled after response completed |
| CancellationToken   | No effect                          | it just cancels the client side call |

¹ `CancelResponseAsync` returns a `Canceled` status, but a subsequent call to `GetStreamingResponseAsync` still returns
an enumerable that can be iterated over to get the rest of the response until it completes.
Azure AI Foundry Agents
- Create a thread, run the agent on it, and poll until the run completes:
```csharp
// Create a thread with a message.
ThreadMessageOptions options = new(MessageRole.User, "What is SLM in AI?");
thread = await this._persistentAgentsClient!.Threads.CreateThreadAsync([options]);
// Run the agent on the thread.
ThreadRun threadRun = await this._persistentAgentsClient.Runs.CreateRunAsync(thread.Id, agent.Id);
// Poll for the run status.
// Possible statuses: InProgress, Completed, Cancelling, Cancelled, Queued, Failed, RequiresAction, Expired
while (threadRun.Status == RunStatus.InProgress || threadRun.Status == RunStatus.Queued)
{
    // Wait between polls instead of spinning in a tight loop.
    await Task.Delay(TimeSpan.FromMilliseconds(500));
    threadRun = await this._persistentAgentsClient.Runs.GetRunAsync(thread.Id, threadRun.Id);
}
// Access the run result.
await foreach (PersistentThreadMessage msg in this._persistentAgentsClient.Messages.GetMessagesAsync(thread.Id, threadRun.Id))
{
    foreach (MessageContent content in msg.ContentItems)
    {
        switch (content)
        {
            case MessageTextContent textItem:
                Console.WriteLine($" Text: {textItem.Text}");
                //M1: In the context of Artificial Intelligence (AI), **SLM** often ...
                //M2: What is SLM in AI?
                break;
        }
    }
}
```
- Cancel an agent run:
```csharp
// Create a thread with a message.
ThreadMessageOptions options = new(MessageRole.User, "What is SLM in AI?");
thread = await this._persistentAgentsClient!.Threads.CreateThreadAsync([options]);
// Run the agent on the thread.
ThreadRun threadRun = await this._persistentAgentsClient.Runs.CreateRunAsync(thread.Id, agent.Id);
Response cancellationResponse = await this._persistentAgentsClient.Runs.CancelRunAsync(thread.Id, threadRun.Id);
```
- Other agent run operations: `GetRunStepAsync`
A2A Agents
- Send a message to the agent and handle the response:
```csharp
// Send message to the A2A agent.
A2AResponse response = await this.Client.SendMessageAsync(messageSendParams, cancellationToken).ConfigureAwait(false);
// Handle task responses.
if (response is AgentTask task)
{
    // Poll while the task is pending, waiting between polls instead of spinning.
    while (task.Status.State is TaskState.Submitted or TaskState.Working)
    {
        await Task.Delay(TimeSpan.FromMilliseconds(500), cancellationToken).ConfigureAwait(false);
        task = await this.Client.GetTaskAsync(task.Id, cancellationToken).ConfigureAwait(false);
    }
    if (task.Artifacts != null && task.Artifacts.Count > 0)
    {
        foreach (var artifact in task.Artifacts)
        {
            foreach (var part in artifact.Parts)
            {
                if (part is TextPart textPart)
                {
                    Console.WriteLine($"Result: {textPart.Text}");
                }
            }
        }
        Console.WriteLine();
    }
}
// Handle message responses.
else if (response is Message message)
{
    foreach (var part in message.Parts)
    {
        if (part is TextPart textPart)
        {
            Console.WriteLine($"Result: {textPart.Text}");
        }
    }
}
else
{
    throw new InvalidOperationException("Unexpected response type from A2A client.");
}
```
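
The same Task-versus-Message dispatch can be expressed at the JSON level. This Python sketch assumes the A2A wire format in which results and parts carry a `kind` discriminator (`task`, `message`, `text`) and a task's output lives in `artifacts[].parts`; `extract_text` is a hypothetical helper. It handles a result that is already final; a task still in the `submitted` or `working` state would first need polling (for example via the `tasks/get` method), as in the C# code above.

```python
from typing import Any, Mapping


def extract_text(result: Mapping[str, Any]) -> list[str]:
    """Collect text parts from an A2A send-message result (Task or Message)."""
    kind = result.get("kind")
    if kind == "task":
        # A task carries its output in artifacts, each with a list of parts.
        parts = [part
                 for artifact in result.get("artifacts") or []
                 for part in artifact.get("parts", [])]
    elif kind == "message":
        parts = list(result.get("parts", []))
    else:
        raise ValueError(f"Unexpected A2A result kind: {kind!r}")
    # Non-text parts (files, data) are skipped in this sketch.
    return [part["text"] for part in parts if part.get("kind") == "text"]
```
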
- Cancel a task:
```csharp
// Send message to the A2A agent.
A2AResponse response = await this.Client.SendMessageAsync(messageSendParams, cancellationToken).ConfigureAwait(false);
// Cancel the task
if (response is AgentTask task)
{
await this.Client.CancelTaskAsync(new TaskIdParams() { Id = task.Id }, cancellationToken).ConfigureAwait(false);
}
```
================================================
FILE: docs/decisions/0010-ag-ui-support.md
================================================
---
status: accepted
contact: javiercn
date: 2025-10-29
deciders: javiercn, DeagleGross, moonbox3, markwallace-microsoft
consulted: Agent Framework team
informed: .NET community
---
# AG-UI Protocol Support for .NET Agent Framework
## Context and Problem Statement
The .NET Agent Framework needed a standardized way to enable communication between AI agents and user-facing applications with support for streaming, real-time updates, and bidirectional communication. Without AG-UI protocol support, .NET agents could not interoperate with the growing ecosystem of AG-UI-compatible frontends and agent frameworks (LangGraph, CrewAI, Pydantic AI, etc.), limiting the framework's adoption and utility.
The AG-UI (Agent-User Interaction) protocol is an open, lightweight, event-based protocol that addresses key challenges in agentic applications including streaming support for long-running agents, event-driven architecture for nondeterministic behavior, and protocol interoperability that complements MCP (tool/context) and A2A (agent-to-agent) protocols.
## Decision Drivers
- Need for streaming communication between agents and client applications
- Requirement for protocol interoperability with other AI frameworks
- Support for long-running, multi-turn conversation sessions
- Real-time UI updates for nondeterministic agent behavior
- Standardized approach to agent-to-UI communication
- Framework abstraction to protect consumers from protocol changes
## Considered Options
1. **Implement AG-UI event types as public API surface** - Expose AG-UI event models directly to consumers
2. **Use custom AIContent types for lifecycle events** - Create new content types (RunStartedContent, RunFinishedContent, RunErrorContent)
3. **Current approach** - Internal event types with framework-native abstractions
## Decision Outcome
Chosen option: "Current approach with internal event types and framework-native abstractions", because it:
- Protects consumers from protocol changes by keeping AG-UI events internal
- Maintains framework abstractions through conversion at boundaries
- Uses existing framework types (AgentResponseUpdate, ChatMessage) for public API
- Focuses on core text streaming functionality
- Leverages existing properties (ConversationId, ResponseId, ErrorContent) instead of custom types
- Provides bidirectional client and server support
### Implementation Details
**In Scope:**
1. **Client-side AG-UI consumption** (`Microsoft.Agents.AI.AGUI` package)
   - `AGUIAgent` class for connecting to remote AG-UI servers
   - `AGUIAgentThread` for managing conversation threads
   - HTTP/SSE streaming support
   - Event-to-framework type conversion
2. **Server-side AG-UI hosting** (`Microsoft.Agents.AI.Hosting.AGUI.AspNetCore` package)
   - `MapAGUIAgent` extension method for ASP.NET Core
   - Server-Sent Events (SSE) response formatting
   - Framework-to-event type conversion
   - Agent factory pattern for per-request instantiation
3. **Text streaming events**
   - Lifecycle events: `RunStarted`, `RunFinished`, `RunError`
   - Text message events: `TextMessageStart`, `TextMessageContent`, `TextMessageEnd`
   - Thread and run ID management via `ConversationId` and `ResponseId`
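
The text streaming events listed above can be sketched on the wire as follows. This is a minimal Python illustration, not the framework's C# implementation: it assumes the AG-UI JSON event names (`RUN_STARTED`, `TEXT_MESSAGE_CONTENT`, and so on) and camelCase fields (`threadId`, `runId`, `messageId`, `delta`) from the AG-UI specification, and `encode_sse`/`text_run_events` are hypothetical helpers. On the .NET side, `threadId` and `runId` correspond to `ConversationId` and `ResponseId`.

```python
import json
import uuid
from typing import Iterable, Iterator


def encode_sse(event: dict) -> str:
    """Frame one event as a Server-Sent Events `data:` message."""
    return f"data: {json.dumps(event)}\n\n"


def text_run_events(thread_id: str, run_id: str, chunks: Iterable[str]) -> Iterator[dict]:
    """Yield the lifecycle and text-message events for one streamed reply."""
    message_id = str(uuid.uuid4())
    yield {"type": "RUN_STARTED", "threadId": thread_id, "runId": run_id}
    yield {"type": "TEXT_MESSAGE_START", "messageId": message_id, "role": "assistant"}
    for chunk in chunks:
        if chunk:  # skip empty deltas
            yield {"type": "TEXT_MESSAGE_CONTENT", "messageId": message_id, "delta": chunk}
    yield {"type": "TEXT_MESSAGE_END", "messageId": message_id}
    yield {"type": "RUN_FINISHED", "threadId": thread_id, "runId": run_id}


if __name__ == "__main__":
    for event in text_run_events("thread_1", "run_1", ["The capital ", "of France ", "is Paris."]):
        print(encode_sse(event), end="")
```
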
### Key Design Decisions
1. **Event Models as Internal Types** - AG-UI event types are internal, with conversion via extension methods; the public API uses the existing types from Microsoft.Extensions.AI because those are the abstractions developers already know
2. **No Custom Content Types** - Run lifecycle communicated through existing `ChatResponseUpdate` properties (`ConversationId`, `ResponseId`) and standard `ErrorContent` type
3. **Agent Factory Pattern** - `MapAGUIAgent` uses factory function `(messages) => AIAgent` to allow request-specific agent configuration supporting multi-tenancy
4. **Bidirectional Conversion Architecture** - Symmetric conversion logic lives in a shared namespace compiled into both packages: the server converts `AgentResponseUpdate` → AG-UI events, and the client converts AG-UI events → `AgentResponseUpdate`
5. **Thread Management** - `AGUIAgentThread` stores only `ThreadId` with thread ID communicated via `ConversationId`; applications manage persistence for parity with other implementations and to be compliant with the protocol. Future extensions will support having the server manage the conversation.
6. **Custom JSON Converter** - Uses custom polymorphic deserialization via `BaseEventJsonConverter` instead of built-in System.Text.Json support to handle AG-UI protocol's flexible discriminator positioning
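
The client half of the bidirectional conversion (AG-UI events to framework updates) can be sketched in Python as below. `Update` is a hypothetical stand-in for `AgentResponseUpdate`, and the mapping (`threadId` to conversation ID, `runId` to response ID, `RUN_ERROR` to an error value) follows decisions 2 and 4 above. Because `json.loads` produces a dictionary, the `type` discriminator is read by key and may appear anywhere in the object; in C#, that flexibility is what `BaseEventJsonConverter` provides. The `message` field on `RUN_ERROR` is an assumption about the AG-UI payload.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class Update:
    """Hypothetical stand-in for AgentResponseUpdate."""

    conversation_id: Optional[str] = None  # from threadId
    response_id: Optional[str] = None      # from runId
    text: str = ""
    error: Optional[str] = None            # stands in for ErrorContent


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Decode `data:` lines; blank separator lines are ignored."""
    for line in lines:
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):])


def to_updates(events: Iterable[dict]) -> Iterator[Update]:
    conversation_id: Optional[str] = None
    response_id: Optional[str] = None
    for event in events:
        kind = event["type"]  # looked up by key, so its position in the JSON does not matter
        if kind == "RUN_STARTED":
            conversation_id, response_id = event["threadId"], event["runId"]
            yield Update(conversation_id, response_id)
        elif kind == "TEXT_MESSAGE_CONTENT":
            yield Update(conversation_id, response_id, text=event["delta"])
        elif kind == "RUN_ERROR":
            yield Update(conversation_id, response_id, error=event.get("message"))
        # TEXT_MESSAGE_START/END and RUN_FINISHED produce no update in this sketch.
```
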
### Consequences
**Positive:**
- .NET developers can consume AG-UI servers from any framework
- .NET agents accessible from any AG-UI-compatible client
- Standardized streaming communication patterns
- Protected from protocol changes through internal implementation
- Symmetric conversion logic between client and server
- Framework-native public API surface
**Negative:**
- Custom JSON converter required (internal implementation detail)
- Shared code uses preprocessor directives (`#if ASPNETCORE`)
- Additional abstraction layer between protocol and public API
**Neutral:**
- Initial implementation focused on text streaming
- Applications responsible for thread persistence
================================================
FILE: docs/decisions/0011-create-get-agent-api.md
================================================
---
status: proposed
contact: dmytrostruk
date: 2025-12-12
deciders: dmytrostruk, markwallace-microsoft, eavanvalkenburg, giles17
---
# Create/Get Agent API
## Context and Problem Statement
There is a misalignment between the create/get agent API in the .NET and Python implementations.
In .NET, the `CreateAIAgent` method can create either a local instance of an agent or a remote instance if the backend provider supports it. For remote agents, once the agent is created, you can retrieve an existing remote agent by using the `GetAIAgent` method. If a backend provider doesn't support remote agents, `CreateAIAgent` just initializes a new local agent instance and `GetAIAgent` is not available. There is also a `BuildAIAgent` method, which is an extension for the `ChatClientBuilder` class from `Microsoft.Extensions.AI`. It builds pipelines of `IChatClient` instances with an `IServiceProvider`. This functionality does not exist in Python, so `BuildAIAgent` is out of scope.
In Python, there is only one `create_agent` method, which always creates a local instance of the agent. If the backend provider supports remote agents, the remote agent is created only on the first `agent.run()` invocation.
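
The lazy behavior of the current Python implementation can be sketched as follows: constructing the agent makes no service call, and the remote agent only comes into existence on the first `run()`. `FakeAgentService` and `LazyRemoteAgent` are hypothetical names for illustration, not Agent Framework types.

```python
import asyncio
from typing import Optional


class FakeAgentService:
    """Hypothetical remote agent service that records every create call."""

    def __init__(self) -> None:
        self.created: list = []

    async def create_agent(self, name: str) -> str:
        self.created.append(name)
        return f"agent_{len(self.created)}"


class LazyRemoteAgent:
    """Local agent object whose remote counterpart is created on the first run()."""

    def __init__(self, service: FakeAgentService, name: str) -> None:
        self._service = service
        self._name = name
        self.agent_id: Optional[str] = None  # no remote agent exists yet

    async def run(self, message: str) -> str:
        if self.agent_id is None:
            # First invocation: create the remote agent now.
            self.agent_id = await self._service.create_agent(self._name)
        # A real agent would send `message` to the service here.
        return f"[{self.agent_id}] received: {message}"


if __name__ == "__main__":
    service = FakeAgentService()
    agent = LazyRemoteAgent(service, "Helper")
    print(agent.agent_id)  # None: constructing the agent made no remote call
    print(asyncio.run(agent.run("hi")))
    print(service.created)
```

One consequence of this design is that errors from remote creation (for example, an invalid model name) only surface on the first `run()`, not when the agent is constructed.
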
Below is a short summary of different providers and their APIs in .NET:
| Package | Method | Behavior | Python support |
|---|---|---|---|
| Microsoft.Agents.AI | `CreateAIAgent` (based on `IChatClient`) | Creates a local instance of `ChatClientAgent`. | Yes (`create_agent` in `BaseChatClient`). |
| Microsoft.Agents.AI.Anthropic | `CreateAIAgent` (based on `IBetaService` and `IAnthropicClient`) | Creates a local instance of `ChatClientAgent`. | Yes (`AnthropicClient` inherits `BaseChatClient`, which exposes `create_agent`). |
| Microsoft.Agents.AI.AzureAI (V2) | `GetAIAgent` (based on `AIProjectClient` with `AgentReference`) | Creates a local instance of `ChatClientAgent`. | Partial (Python uses `create_agent` from `BaseChatClient`). |
| Microsoft.Agents.AI.AzureAI (V2) | `GetAIAgent`/`GetAIAgentAsync` (with `Name`/`ChatClientAgentOptions`) | Fetches `AgentRecord` via HTTP, then creates a local `ChatClientAgent` instance. | No |
| Microsoft.Agents.AI.AzureAI (V2) | `CreateAIAgent`/`CreateAIAgentAsync` (based on `AIProjectClient`) | Creates a remote agent first, then wraps it into a local `ChatClientAgent` instance. | No |
| Microsoft.Agents.AI.AzureAI.Persistent (V1) | `GetAIAgent` (based on `PersistentAgentsClient` with `PersistentAgent`) | Creates a local instance of `ChatClientAgent`. | Partial (Python uses `create_agent` from `BaseChatClient`). |
| Microsoft.Agents.AI.AzureAI.Persistent (V1) | `GetAIAgent`/`GetAIAgentAsync` (with `AgentId`) | Fetches `PersistentAgent` via HTTP, then creates a local `ChatClientAgent` instance. | No |
| Microsoft.Agents.AI.AzureAI.Persistent (V1) | `CreateAIAgent`/`CreateAIAgentAsync` | Creates a remote agent first, then wraps it into a local `ChatClientAgent` instance. | No |
| Microsoft.Agents.AI.OpenAI | `GetAIAgent` (based on `AssistantClient` with `Assistant`) | Creates a local instance of `ChatClientAgent`. | Partial (Python uses `create_agent` from `BaseChatClient`). |
| Microsoft.Agents.AI.OpenAI | `GetAIAgent`/`GetAIAgentAsync` (with `AgentId`) | Fetches `Assistant` via HTTP, then creates a local `ChatClientAgent` instance. | No |
| Microsoft.Agents.AI.OpenAI | `CreateAIAgent`/`CreateAIAgentAsync` (based on `AssistantClient`) | Creates a remote agent first, then wraps it into a local `ChatClientAgent` instance. | No |
| Microsoft.Agents.AI.OpenAI | `CreateAIAgent` (based on `ChatClient`) | Creates a local instance of `ChatClientAgent`. | Yes (`create_agent` in `BaseChatClient`). |
| Microsoft.Agents.AI.OpenAI | `CreateAIAgent` (based on `OpenAIResponseClient`) | Creates a local instance of `ChatClientAgent`. | Yes (`create_agent` in `BaseChatClient`). |
Another difference between the Python and .NET implementations is that in .NET, the `CreateAIAgent`/`GetAIAgent` methods are implemented as extension methods on the underlying SDK client, like `AIProjectClient` from Azure AI or `AssistantClient` from OpenAI:
```csharp
// Definition
public static ChatClientAgent CreateAIAgent(
    this AIProjectClient aiProjectClient,
    string name,
    string model,
    string instructions,
    string? description = null,
    IList<AITool>? tools = null,
    Func<IChatClient, IChatClient>? clientFactory = null,
    IServiceProvider? services = null,
    CancellationToken cancellationToken = default)
{ }
// Usage
AIProjectClient aiProjectClient = new(new Uri(endpoint), new AzureCliCredential()); // Initialization of underlying SDK client
ChatClientAgent newAgent = aiProjectClient.CreateAIAgent(name: AgentName, model: deploymentName, instructions: AgentInstructions, tools: [tool]); // ChatClientAgent creation from underlying SDK client
// Alternative usage (same as extension method, just explicit syntax)
ChatClientAgent explicitAgent = AzureAIProjectChatClientExtensions.CreateAIAgent(
    aiProjectClient,
    name: AgentName,
    model: deploymentName,
    instructions: AgentInstructions,
    tools: [tool]);
```
Python doesn't support extension methods. Currently, `create_agent` is defined on `BaseChatClient`, but it only creates a local `ChatAgent` instance and cannot create remote agents, even for providers that support them, for two reasons:
- It is synchronous, so it cannot await the service call needed to create a remote agent.
- The `BaseChatClient` implementation is stateful for providers like Azure AI or OpenAI Assistants. It stores agent/assistant metadata such as `AgentId` and `AgentName`, so it is currently not possible to create different `ChatAgent` instances from a single stateful `BaseChatClient`.
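
A minimal, hypothetical sketch of the second problem: when the client itself stores the agent metadata, every agent created from it shares one slot, so creating a second agent silently changes the identity of the first. `StatefulChatClient` and `AgentHandle` are illustrative names, not the actual `BaseChatClient` code.

```python
from typing import Optional


class StatefulChatClient:
    """Hypothetical client that keeps agent metadata on itself."""

    def __init__(self) -> None:
        self.agent_id: Optional[str] = None
        self.agent_name: Optional[str] = None

    def create_agent(self, name: str) -> "AgentHandle":
        # Every call overwrites the single agent slot shared by all handles.
        self.agent_name = name
        self.agent_id = f"id-{name.lower()}"
        return AgentHandle(self)


class AgentHandle:
    """Hypothetical agent that reads its identity from the shared client."""

    def __init__(self, client: StatefulChatClient) -> None:
        self._client = client

    @property
    def name(self) -> Optional[str]:
        return self._client.agent_name
```

After `researcher = client.create_agent("Researcher")` and `writer = client.create_agent("Writer")`, `researcher.name` returns `"Writer"`. Keeping per-agent state out of the client is what the options in the next section aim to achieve.
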
## Decision Drivers
- API should be aligned between .NET and Python.
- API should be intuitive and consistent between backend providers in .NET and Python.
## Considered Options
Add missing implementations on the Python side. This should include the following:
### agent-framework-azure-ai (both V1 and V2)
- Add a `get_agent` method that accepts an underlying SDK agent instance and creates a local instance of `ChatAgent`.
- Add a `get_agent` method that accepts an agent identifier, performs an additional HTTP request to fetch agent data, and then creates a local instance of `ChatAgent`.
- Override the `create_agent` method from `BaseChatClient` to create a remote agent instance and wrap it into a local `ChatAgent`.
.NET:
```csharp
var agent1 = new AIProjectClient(...).GetAIAgent(agentInstanceFromSdkType); // Creates a local ChatClientAgent instance from Azure.AI.Projects.OpenAI.AgentReference
var agent2 = new AIProjectClient(...).GetAIAgent(agentName); // Fetches agent data, creates a local ChatClientAgent instance
var agent3 = new AIProjectClient(...).CreateAIAgent(...); // Creates a remote agent, returns a local ChatClientAgent instance
```
### agent-framework-core (OpenAI Assistants)
- Add a `get_agent` method that accepts an underlying SDK agent instance and creates a local instance of `ChatAgent`.
- Add a `get_agent` method that accepts an assistant ID, performs an additional HTTP request to fetch the assistant data, and then creates a local instance of `ChatAgent`.
- Override the `create_agent` method from `BaseChatClient` to create a remote agent instance and wrap it into a local `ChatAgent`.
.NET:
```csharp
var agent1 = new AssistantClient(...).GetAIAgent(agentInstanceFromSdkType); // Creates a local ChatClientAgent instance from OpenAI.Assistants.Assistant
var agent2 = new AssistantClient(...).GetAIAgent(agentId); // Fetches agent data, creates a local ChatClientAgent instance
var agent3 = new AssistantClient(...).CreateAIAgent(...); // Creates a remote agent, returns a local ChatClientAgent instance
```
### Possible Python implementations
Methods like `create_agent` and `get_agent` should be implemented separately, or defined on a stateless component that allows creating multiple agents from the same instance.
Possible options:
#### Option 1: Module-level functions
Implement free functions in the provider package that accept the underlying SDK client as the first argument (similar to .NET extension methods, but expressed in Python).
Example:
```python
from agent_framework.azure import create_agent, get_agent
ai_project_client = AIProjectClient(...)
# Creates a remote agent first, then returns a local ChatAgent wrapper
created_agent = await create_agent(
ai_project_client,
name="",
instructions="",
tools=[tool],
)
# Gets an existing remote agent and returns a local ChatAgent wrapper
first_agent = await get_agent(ai_project_client, agent_id=agent_id)
# Wraps an SDK agent instance (no extra HTTP call)
second_agent = get_agent(ai_project_client, agent_reference)
```
Pros:
- Naturally supports async `create_agent` / `get_agent`.
- Supports multiple agents per SDK client.
- Closest conceptual match to .NET extension methods while staying Pythonic.
Cons:
- Discoverability is lower (users need to know where the functions live).
- Verbose when creating multiple agents (client must be passed every time):
```python
agent1 = await azure_agents.create_agent(client, name="Agent1", ...)
agent2 = await azure_agents.create_agent(client, name="Agent2", ...)
```
#### Option 2: Provider object
Introduce a dedicated provider type that is constructed from the underlying SDK client, and exposes async `create_agent` / `get_agent` methods.
Example:
```python
from agent_framework.azure import AzureAIAgentProvider
ai_project_client = AIProjectClient(...)
provider = AzureAIAgentProvider(ai_project_client)
agent = await provider.create_agent(
name="",
instructions="",
tools=[tool],
)
agent = await provider.get_agent(agent_id=agent_id)
agent = provider.get_agent(agent_reference=agent_reference)
```
Pros:
- High discoverability and clear grouping of related behavior.
- Keeps SDK clients unchanged and supports multiple agents per SDK client.
- Concise when creating multiple agents (client passed once):
```python
provider = AzureAIAgentProvider(ai_project_client)
agent1 = await provider.create_agent(name="Agent1", ...)
agent2 = await provider.create_agent(name="Agent2", ...)
```
Cons:
- Adds a new public concept/type for users to learn.
#### Option 3: Inheritance (SDK client subclass)
Create a subclass of the underlying SDK client and add `create_agent` / `get_agent` methods.
Example:
```python
class ExtendedAIProjectClient(AIProjectClient):
    async def create_agent(self, *, name: str, model: str, instructions: str, **kwargs) -> ChatAgent:
        ...

    async def get_agent(self, *, agent_id: str | None = None, sdk_agent=None, **kwargs) -> ChatAgent:
        ...

client = ExtendedAIProjectClient(...)
agent = await client.create_agent(name="", model="", instructions="")
```
Pros:
- Discoverable and ergonomic call sites.
- Mirrors the .NET “methods on the client” feeling.
Cons:
- Many SDK clients are not designed for inheritance; SDK upgrades can break subclasses.
- Users must opt into subclass everywhere.
- Typing/initialization can be tricky if the SDK client has non-trivial constructors.
#### Option 4: Monkey patching
Attach `create_agent` / `get_agent` methods to an SDK client class (or instance) at runtime.
Example:
```python
def _create_agent(self, *, name: str, model: str, instructions: str, **kwargs) -> ChatAgent:
    ...

AIProjectClient.create_agent = _create_agent  # monkey patch
```
Pros:
- Produces “extension method-like” call sites without wrappers or subclasses.
Cons:
- Fragile across SDK updates and difficult to type-check.
- Surprising behavior (global side effects), potential conflicts across packages.
- Harder to support/debug, especially in larger apps and test suites.
## Decision Outcome
Implement `create_agent`/`get_agent`/`as_agent` API via **Option 2: Provider object**.
### Rationale
| Aspect | Option 1 (Functions) | Option 2 (Provider) |
|--------|----------------------|---------------------|
| Multiple implementations | One package may contain V1, V2, and other agent types. Function names like `create_agent` become ambiguous - which agent type does it create? | Each provider class is explicit: `AzureAIAgentsProvider` vs `AzureAIProjectAgentProvider` |
| Discoverability | Users must know to import specific functions from the package | IDE autocomplete on provider instance shows all available methods |
| Client reuse | SDK client must be passed to every function call: `create_agent(client, ...)`, `get_agent(client, ...)` | SDK client passed once at construction: `provider = Provider(client)` |
**Option 1 example:**
```python
from agent_framework.azure import create_agent, get_agent
agent1 = await create_agent(client, name="Agent1", ...) # Which agent type, V1 or V2?
agent2 = await create_agent(client, name="Agent2", ...) # Repetitive client passing
```
**Option 2 example:**
```python
from agent_framework.azure import AzureAIProjectAgentProvider
provider = AzureAIProjectAgentProvider(client) # Clear which service, client passed once
agent1 = await provider.create_agent(name="Agent1", ...)
agent2 = await provider.create_agent(name="Agent2", ...)
```
### Method Naming
| Operation | Python | .NET | Async |
|-----------|--------|------|-------|
| Create on service | `create_agent()` | `CreateAIAgent()` | Yes |
| Get from service | `get_agent(id=...)` | `GetAIAgent(agentId)` | Yes |
| Wrap SDK object | `as_agent(reference)` | `AsAIAgent(agentInstance)` | No |
The method names (`create_agent`, `get_agent`) do not explicitly mention "service" or "remote" because:
- In Python, the provider class name explicitly identifies the service (`AzureAIAgentsProvider`, `OpenAIAssistantProvider`), making additional qualifiers in method names redundant.
- In .NET, these are extension methods on `AIProjectClient` or `AssistantClient`, which already imply service operations.
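
The naming rules above can be sketched with a hypothetical provider in Python. `SketchAgentProvider`, `FakeAsyncSdkClient`, `AgentRecord`, and `LocalAgent` are illustrative stand-ins, not the real `AzureAIProjectAgentProvider` or `ChatAgent`. The sketch shows why `create_agent`/`get_agent` are async (each performs a service call) while `as_agent` is synchronous (it only wraps an object already in hand), and that the provider holds no per-agent state, so one SDK client can back many agents.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class AgentRecord:
    """Hypothetical service-side agent definition returned by the SDK."""

    id: str
    name: str
    instructions: str


@dataclass
class LocalAgent:
    """Hypothetical stand-in for the local ChatAgent wrapper."""

    record: AgentRecord


class FakeAsyncSdkClient:
    """Hypothetical async SDK client; each call counts as one HTTP request."""

    def __init__(self) -> None:
        self._agents: dict = {}
        self.http_calls = 0

    async def create(self, name: str, instructions: str) -> AgentRecord:
        self.http_calls += 1
        record = AgentRecord(f"agent_{len(self._agents) + 1}", name, instructions)
        self._agents[record.id] = record
        return record

    async def get(self, agent_id: str) -> AgentRecord:
        self.http_calls += 1
        return self._agents[agent_id]


class SketchAgentProvider:
    """Provider built once from an SDK client; holds no per-agent state."""

    def __init__(self, client: FakeAsyncSdkClient) -> None:
        self._client = client

    async def create_agent(self, *, name: str, instructions: str) -> LocalAgent:
        # Creates the agent on the service, then wraps it locally.
        return self.as_agent(await self._client.create(name, instructions))

    async def get_agent(self, *, agent_id: str) -> LocalAgent:
        # Fetches an existing agent from the service, then wraps it locally.
        return self.as_agent(await self._client.get(agent_id))

    def as_agent(self, record: AgentRecord) -> LocalAgent:
        # Pure wrapping with no I/O, so it can stay synchronous.
        return LocalAgent(record)


async def main() -> None:
    provider = SketchAgentProvider(FakeAsyncSdkClient())
    agent1 = await provider.create_agent(name="Agent1", instructions="...")
    agent2 = await provider.create_agent(name="Agent2", instructions="...")
    print(agent1.record.id, agent2.record.id)


if __name__ == "__main__":
    asyncio.run(main())
```
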
### Provider Class Naming
| Package | Provider Class | SDK Client | Service |
|---------|---------------|------------|---------|
| `agent_framework.azure` | `AzureAIProjectAgentProvider` | `AIProjectClient` | Azure AI Agent Service, based on Responses API (V2) |
| `agent_framework.azure` | `AzureAIAgentsProvider` | `AgentsClient` | Azure AI Agent Service (V1) |
| `agent_framework.openai` | `OpenAIAssistantProvider` | `AsyncOpenAI` | OpenAI Assistants API |
> **Note:** Azure AI naming is temporary. Final naming will be updated according to Azure AI / Microsoft Foundry renaming decisions.
### Usage Examples
#### Azure AI Agent Service V2 (based on Responses API)
```python
from agent_framework.azure import AzureAIProjectAgentProvider
from azure.ai.projects.aio import AIProjectClient
client = AIProjectClient(endpoint, credential)
provider = AzureAIProjectAgentProvider(client)
# Create new agent on service
agent = await provider.create_agent(name="MyAgent", model="gpt-4", instructions="...")
# Get existing agent by name
agent = await provider.get_agent(agent_name="MyAgent")
# Wrap already-fetched SDK object (no HTTP calls)
agent_ref = await client.agents.get("MyAgent")
agent = provider.as_agent(agent_ref)
```
#### Azure AI Persistent Agents V1
```python
from agent_framework.azure import AzureAIAgentsProvider
from azure.ai.agents.aio import AgentsClient
client = AgentsClient(endpoint, credential)
provider = AzureAIAgentsProvider(client)
agent = await provider.create_agent(name="MyAgent", model="gpt-4", instructions="...")
agent = await provider.get_agent(agent_id="persistent-agent-456")
agent = provider.as_agent(persistent_agent)
```
#### OpenAI Assistants
```python
from agent_framework.openai import OpenAIAssistantProvider
from openai import AsyncOpenAI
client = AsyncOpenAI()
provider = OpenAIAssistantProvider(client)
agent = await provider.create_agent(name="MyAssistant", model="gpt-4", instructions="...")
agent = await provider.get_agent(assistant_id="asst_123")
agent = provider.as_agent(assistant)
```
#### Local-Only Agents (No Provider)
The current `create_agent` method (Python) / `CreateAIAgent` (.NET) can be renamed to `as_agent` (Python) / `AsAIAgent` (.NET) to emphasize that it converts an existing client rather than creating or initializing anything, and to avoid a collision with the `create_agent` method used for remote calls.
```python
from agent_framework import ChatAgent
from agent_framework.openai import OpenAIChatClient
# Convert chat client to ChatAgent (no remote service involved)
client = OpenAIChatClient(model="gpt-4")
agent = client.as_agent(name="LocalAgent", instructions="...") # instead of create_agent
```
### Adding New Agent Types
Python:
1. Create provider class in appropriate package.
2. Implement `create_agent`, `get_agent`, `as_agent` as applicable.
.NET:
1. Create static class for extension methods.
2. Implement `CreateAIAgentAsync`, `GetAIAgentAsync`, `AsAIAgent` as applicable.
================================================
FILE: docs/decisions/0012-python-typeddict-options.md
================================================
---
status: proposed
contact: eavanvalkenburg
date: 2026-01-08
deciders: eavanvalkenburg, markwallace-microsoft, sphenry, alliscode, johanst, brettcannon
consulted: taochenosu, moonbox3, dmytrostruk, giles17
---
# Leveraging TypedDict and Generic Options in Python Chat Clients
## Context and Problem Statement
The Agent Framework Python SDK provides multiple chat client implementations for different providers (OpenAI, Anthropic, Azure AI, Bedrock, Ollama, etc.). Each provider has unique configuration options beyond the common parameters defined in `ChatOptions`. Currently, developers using these clients lack type safety and IDE autocompletion for provider-specific options, leading to runtime errors and a poor developer experience.
How can we provide type-safe, discoverable options for each chat client while maintaining a consistent API across all implementations?
## Decision Drivers
- **Type Safety**: Developers should get compile-time/static analysis errors when using invalid options
- **IDE Support**: Full autocompletion and inline documentation for all available options
- **Extensibility**: Users should be able to define custom options that extend provider-specific options
- **Consistency**: All chat clients should follow the same pattern for options handling
- **Provider Flexibility**: Each provider can expose its unique options without affecting the common interface
## Considered Options
- **Option 1: Status Quo - Class `ChatOptions` with `**kwargs`**
- **Option 2: TypedDict with Generic Type Parameters**
### Option 1: Status Quo - Class `ChatOptions` with `**kwargs`
The current approach uses a base `ChatOptions` class with common parameters; provider-specific options are passed via `**kwargs` or loosely typed dictionaries.
```python
# Current usage - no type safety for provider-specific options
response = await client.get_response(
messages=messages,
temperature=0.7,
top_k=40,
random=42, # No validation
)
```
**Pros:**
- Simple implementation
- Maximum flexibility
**Cons:**
- No type checking for provider-specific options
- No IDE autocompletion for available options
- Runtime errors for typos or invalid options
- Documentation must be consulted for each provider
### Option 2: TypedDict with Generic Type Parameters (Chosen)
Each chat client is parameterized with a TypeVar bound to a provider-specific `TypedDict` that extends `ChatOptions`. This enables full type safety and IDE support.
```python
# Provider-specific TypedDict
class AnthropicChatOptions(ChatOptions, total=False):
    """Anthropic-specific chat options."""

    top_k: int
    thinking: ThinkingConfig
    # ... other Anthropic-specific options

# Generic chat client
class AnthropicChatClient(ChatClientBase[TAnthropicChatOptions]):
    ...

client = AnthropicChatClient(...)
# Usage with full type safety
response = await client.get_response(
    messages=messages,
    options={
        "temperature": 0.7,
        "top_k": 40,
        "random": 42,  # fails type checking and IDE would flag this
    }
)
# Users can extend for custom options
class MyAnthropicOptions(AnthropicChatOptions, total=False):
    custom_field: str

client = AnthropicChatClient[MyAnthropicOptions](...)
# Usage of custom options with full type safety
response = await client.get_response(
    messages=messages,
    options={
        "temperature": 0.7,
        "top_k": 40,
        "custom_field": "value",
    }
)
```
**Pros:**
- Full type safety with static analysis
- IDE autocompletion for all options
- Errors detected by the type checker before the code runs
- Self-documenting through type hints
- Users can extend options for their specific needs or for new model capabilities
**Cons:**
- More complex implementation
- Some `# type: ignore` comments are needed for TypedDict field overrides
- Minor: Requires `TypeVar` with a default (Python 3.13+ or `typing_extensions`)
> [!NOTE]
> In .NET this is already achieved through overloads of the `GetResponseAsync` method for each provider-specific options class, e.g., `AnthropicChatOptions`, `OpenAIChatOptions`, etc., so this decision does not apply to .NET.
### Implementation Details
1. **Base Protocol**: `ChatClientProtocol[TOptions]` is generic over the options type, with the default set to `ChatOptions` (the new TypedDict).
2. **Provider TypedDicts**: Each provider defines its options by extending `ChatOptions`. Providers can even override fields with type `None` to indicate that they are not supported.
3. **TypeVar Pattern**: `TProviderOptions = TypeVar("TProviderOptions", bound=TypedDict, default=ProviderChatOptions, contravariant=True)`
4. **Option Translation**: Common options keep their common names, and each provider's Options class documents how they are used. The provider maps them in `_prepare_options` (e.g., for Anthropic, `user` → `metadata.user_id`), so that common options stay easy to use.
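A minimal, stdlib-only sketch of the pattern above. `ChatOptions`, `AnthropicChatOptions` and `SketchAnthropicClient` here are simplified, hypothetical stand-ins rather than the framework's classes, the TypeVar has no default (that needs Python 3.13+ or `typing_extensions`), and the body of `_prepare_options` is an assumed illustration of the `user` → `metadata.user_id` translation. It also shows why the benefit is static only: TypedDicts are plain dicts at runtime, so a misspelled key is caught by a type checker, not when the code runs.

```python
from typing import Any, Generic, TypedDict, TypeVar


class ChatOptions(TypedDict, total=False):
    """Simplified stand-in for the common options TypedDict."""

    temperature: float
    user: str


class AnthropicChatOptions(ChatOptions, total=False):
    """Provider-specific extension (illustrative field only)."""

    top_k: int


# Stdlib TypeVar without a default; the ADR pattern also sets default=... and variance.
TOptions = TypeVar("TOptions", bound=ChatOptions)


class SketchAnthropicClient(Generic[TOptions]):
    """Hypothetical client, generic over its options TypedDict."""

    def _prepare_options(self, options: TOptions) -> dict[str, Any]:
        # Copy so the caller's dict is not mutated.
        prepared: dict[str, Any] = dict(options)
        # Keep `user` as a common option, but map it to the provider's metadata field.
        if "user" in prepared:
            user = prepared.pop("user")
            prepared.setdefault("metadata", {})["user_id"] = user
        return prepared


client: SketchAnthropicClient[AnthropicChatOptions] = SketchAnthropicClient()
prepared = client._prepare_options({"temperature": 0.7, "top_k": 40, "user": "u-123"})
print(prepared)  # {'temperature': 0.7, 'top_k': 40, 'metadata': {'user_id': 'u-123'}}

# A type checker flags this misspelled key; Python itself accepts it at runtime.
unchecked = client._prepare_options({"temprature": 1.0})
```

The translation lives in the client, so callers always write the common name (`user`) regardless of provider.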
## Decision Outcome
Chosen option: **"Option 2: TypedDict with Generic Type Parameters"**, because it provides full type safety, excellent IDE support with autocompletion, and allows users to extend provider-specific options for their use cases. This generic parameter was also extended to `ChatAgent`, so that the options used in agent construction and in the `run` methods are properly typed as well.
See [typed_options.py](../../python/samples/02-agents/typed_options.py) for a complete example demonstrating the usage of typed options with custom extensions.
================================================
FILE: docs/decisions/0013-python-get-response-simplification.md
================================================
---
status: Accepted
contact: eavanvalkenburg
date: 2026-01-06
deciders: markwallace-microsoft, dmytrostruk, taochenosu, alliscode, moonbox3, sphenry
consulted: sergeymenshykh, rbarreto, dmytrostruk, westey-m
informed:
---
# Simplify Python Get Response API into a single method
## Context and Problem Statement
Currently, chat clients must implement two separate methods to get responses: one for streaming and one for non-streaming. This adds complexity to the client implementations and increases the maintenance burden. The split was likely inherited from .NET, which cannot express the different return types properly with a single method. In Python this is possible (it is, for instance, how the OpenAI Python client works), and a single method would also make the Python version simpler to work with, because there is only one method to learn instead of two.
## Implications of this change
### Current Architecture Overview
The current design has **two separate methods** at each layer:
| Layer | Non-streaming | Streaming |
|-------|---------------|-----------|
| **Protocol** | `get_response()` → `ChatResponse` | `get_streaming_response()` → `AsyncIterable[ChatResponseUpdate]` |
| **BaseChatClient** | `get_response()` (public) | `get_streaming_response()` (public) |
| **Implementation** | `_inner_get_response()` (private) | `_inner_get_streaming_response()` (private) |
### Key Usage Areas Identified
#### 1. **ChatAgent** (_agents.py)
- `run()` → calls `self.chat_client.get_response()`
- `run_stream()` → calls `self.chat_client.get_streaming_response()`
These are parallel methods on the agent, so consolidating the client methods would **not break** the agent API. You could keep `agent.run()` and `agent.run_stream()` unchanged while internally calling `get_response(stream=True/False)`.
#### 2. **Function Invocation Decorator** (_tools.py)
This is **the most impacted area**. Currently:
- `_handle_function_calls_response()` decorates `get_response`
- `_handle_function_calls_streaming_response()` decorates `get_streaming_response`
- The `use_function_invocation` class decorator wraps **both methods separately**
**Impact**: The decorator logic is almost identical (~200 lines each) with small differences:
- Non-streaming collects response, returns it
- Streaming yields updates, returns async iterable
With a unified method, you'd need **one decorator** that:
- Checks the `stream` parameter
- Uses `@overload` so that the declared return type follows the `stream` value
- Handles both paths with conditional logic
- Can be applied to just the method, instead of the whole class
This would **reduce code duplication** but add complexity to a single function.
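A minimal sketch of such a single decorator, assuming the unified method is a plain `def` that returns an awaitable when `stream=False` and an async iterable when `stream=True`. `traced`, `ToyClient` and the `log` list are hypothetical; the wrapper only records events, standing in for the real function-invocation, middleware or tracing logic that would go in each branch.

```python
import asyncio
import functools
from collections.abc import AsyncIterator, Callable
from typing import Any

log: list[str] = []  # records what the wrapper observed, for illustration


def traced(func: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical decorator wrapping both paths of a unified get_response."""

    @functools.wraps(func)
    def wrapper(self: Any, messages: str, *, stream: bool = False, **kwargs: Any) -> Any:
        if stream:
            # Streaming path: wrap the inner async iterable in a new async generator.
            async def _stream() -> AsyncIterator[str]:
                log.append("stream start")
                async for update in func(self, messages, stream=True, **kwargs):
                    yield update
                log.append("stream end")

            return _stream()

        # Non-streaming path: wrap the inner awaitable in a new coroutine.
        async def _single() -> str:
            log.append("call start")
            response = await func(self, messages, stream=False, **kwargs)
            log.append("call end")
            return response

        return _single()

    return wrapper


class ToyClient:
    """Hypothetical client whose get_response is a plain def."""

    @traced
    def get_response(self, messages: str, *, stream: bool = False) -> Any:
        return self._stream(messages) if stream else self._respond(messages)

    async def _respond(self, messages: str) -> str:
        return messages.upper()

    async def _stream(self, messages: str) -> AsyncIterator[str]:
        for word in messages.split():
            yield word.upper()


async def main() -> tuple[str, list[str]]:
    client = ToyClient()
    full = await client.get_response("hello world")
    chunks = [chunk async for chunk in client.get_response("hello world", stream=True)]
    return full, chunks


full, chunks = asyncio.run(main())
print(full, chunks, log)
```

The branch on `stream` happens once, at call time; each path stays short, which is where the reduction in duplicated decorator code would come from.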
#### 3. **Observability/Instrumentation** (observability.py)
Same pattern as function invocation:
- `_trace_get_response()` wraps `get_response`
- `_trace_get_streaming_response()` wraps `get_streaming_response`
- `use_instrumentation` decorator applies both
**Impact**: Would need consolidation into a single tracing wrapper.
#### 4. **Chat Middleware** (_middleware.py)
The `use_chat_middleware` decorator also wraps both methods separately with similar logic.
#### 5. **AG-UI Client** (_client.py)
Wraps both methods to unwrap server function calls:
```python
original_get_streaming_response = chat_client.get_streaming_response
original_get_response = chat_client.get_response
```
#### 6. **Provider Implementations** (all subpackages)
All subclasses implement both `_inner_*` methods, except:
- OpenAI Assistants Client (and similar clients, such as Foundry Agents V1) - it implements `_inner_get_response` by calling `_inner_get_streaming_response`
### Implications of Consolidation
| Aspect | Impact |
|--------|--------|
| **Type Safety** | Overloads work well: `@overload` with `Literal[True]` → `AsyncIterable`, `Literal[False]` → `ChatResponse`. Runtime return type based on `stream` param. |
| **Breaking Change** | **Major breaking change** for anyone implementing custom chat clients. They'd need to update from 2 methods to 1 (or 2 inner methods to 1). |
| **Decorator Complexity** | All 3 decorator systems (function invocation, middleware, observability) would need refactoring to handle both paths in one wrapper. |
| **Code Reduction** | Significant reduction in _tools.py (~200 lines of near-duplicate code) and other decorators. |
| **Samples/Tests** | Many samples call `get_streaming_response()` directly - would need updates. |
| **Protocol Simplification** | `ChatClientProtocol` goes from 2 methods + 1 property to 1 method + 1 property. |
### Recommendation
The consolidation makes sense architecturally, but consider:
1. **The overload pattern with `stream: bool`** works well in Python typing:
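```python
# Plain `def` (not `async def`): callers `await` the non-streaming result and `async for` over the streaming one.
# Only the non-streaming overload has a default, so omitting `stream` resolves to it.
@overload
def get_response(self, messages, *, stream: Literal[False] = False, ...) -> Awaitable[ChatResponse]: ...
@overload
def get_response(self, messages, *, stream: Literal[True], ...) -> AsyncIterable[ChatResponseUpdate]: ...
```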
2. **The decorator complexity** is the biggest concern. The current approach of separate decorators for separate methods is cleaner than conditional logic inside one wrapper.
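A runnable toy version of the overload pattern, as a sketch only. It uses a plain `def` (not `async def`) so that callers can `await` the non-streaming result and `async for` directly over the streaming one, as the appendix sample does; only the non-streaming overload has a default, so a call without `stream` resolves to it. `EchoClient` and the simplified `ChatResponse`/`ChatResponseUpdate` dataclasses are hypothetical stand-ins for the framework types.

```python
from __future__ import annotations

import asyncio
from collections.abc import AsyncIterator, Awaitable
from dataclasses import dataclass
from typing import Literal, overload


@dataclass
class ChatResponse:
    text: str


@dataclass
class ChatResponseUpdate:
    text: str


class EchoClient:
    """Hypothetical client with a single get_response method."""

    @overload
    def get_response(self, message: str, *, stream: Literal[False] = False) -> Awaitable[ChatResponse]: ...

    @overload
    def get_response(self, message: str, *, stream: Literal[True]) -> AsyncIterator[ChatResponseUpdate]: ...

    def get_response(
        self, message: str, *, stream: bool = False
    ) -> Awaitable[ChatResponse] | AsyncIterator[ChatResponseUpdate]:
        # Plain def: hand back a coroutine or an async generator, depending on `stream`.
        if stream:
            return self._stream(message)
        return self._respond(message)

    async def _respond(self, message: str) -> ChatResponse:
        return ChatResponse(text=f"echo: {message}")

    async def _stream(self, message: str) -> AsyncIterator[ChatResponseUpdate]:
        for token in f"echo: {message}".split(" "):
            yield ChatResponseUpdate(text=token)


async def main() -> tuple[str, list[str]]:
    client = EchoClient()
    response = await client.get_response("hi")  # type checkers infer ChatResponse
    updates = [u.text async for u in client.get_response("hi", stream=True)]
    return response.text, updates


text, updates = asyncio.run(main())
print(text, updates)
```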
## Decision Drivers
- Reduce code needed to implement a Chat Client, simplify the public API for chat clients
- Reduce code duplication in decorators and middleware
- Maintain type safety and clarity in method signatures
## Considered Options
1. Status quo: Keep separate methods for streaming and non-streaming
2. Consolidate into a single `get_response` method with a `stream` parameter
3. Option 2 plus merging `agent.run` and `agent.run_stream` (and the corresponding workflow methods) into a single method with a `stream` parameter as well
## Option 1: Status Quo
- Good: Clear separation of streaming vs non-streaming logic
- Good: Aligned with the .NET design, although the naming already differs (`run` in Python, `RunAsync` in .NET)
- Bad: Code duplication in decorators and middleware
- Bad: More complex client implementations
## Option 2: Consolidate into Single Method
- Good: Simplified public API for chat clients
- Good: Reduced code duplication in decorators
- Good: Smaller API footprint for users to get familiar with
- Good: People using OpenAI directly already expect this pattern
- Bad: Increased complexity in decorators and middleware
- Bad: Less alignment with .NET design (`get_response(stream=True)` vs `GetStreamingResponseAsync`)
## Option 3: Consolidate + Merge Agent and Workflow Methods
- Good: Further simplifies agent and workflow implementation
- Good: Single method for all chat interactions
- Good: Smaller API footprint for users to get familiar with
- Good: People using OpenAI directly already expect this pattern
- Good: Workflows already use a single internal method (`_run_workflow_with_tracing`), so this would also eliminate duplication in the public workflow API, with hardly any code changes
- Bad: More breaking changes for agent users
- Bad: Increased complexity in agent implementation
- Bad: More extensive misalignment with .NET design (`run(stream=True)` vs `RunStreamingAsync` in addition to `get_response` change)
## Misc
Smaller questions to consider:
- Should the default be `stream=False` or `stream=True`? (Currently `False`.)
  - Defaulting to `False` makes it simpler for new users, as non-streaming responses are easier to handle.
  - Defaulting to `False` aligns with existing behavior.
  - Streaming delivers the first tokens sooner, so defaulting to `True` could improve perceived latency for common use cases.
- Should this differ between ChatClient, Agent and Workflows? (e.g., Agent and Workflow default to streaming, ChatClient to non-streaming)
## Decision Outcome
Chosen Option: **Option 3: Consolidate + Merge Agent and Workflow Methods**
Since this is the most pythonic option and it reduces the API surface and code duplication the most, we will go with this option.
We will keep the default of `stream=False` for all methods to maintain backward compatibility and simplicity for new users.
# Appendix
## Code Samples for Consolidated Method
### Python - Option 3: Direct ChatClient + Agent with Single Method
```python
# Copyright (c) Microsoft. All rights reserved.
import asyncio
from random import randint
from typing import Annotated
from agent_framework import ChatAgent
from agent_framework.openai import OpenAIChatClient
from pydantic import Field
def get_weather(
    location: Annotated[str, Field(description="The location to get the weather for.")],
) -> str:
    """Get the weather for a given location."""
    conditions = ["sunny", "cloudy", "rainy", "stormy"]
    return f"The weather in {location} is {conditions[randint(0, 3)]} with a high of {randint(10, 30)}°C."
async def main() -> None:
    # Example 1: Direct ChatClient usage with single method
    client = OpenAIChatClient()
    message = "What's the weather in Amsterdam and in Paris?"
    # Non-streaming usage
    print(f"User: {message}")
    response = await client.get_response(message, tools=get_weather)
    print(f"Assistant: {response.text}")
    # Streaming usage - same method, different parameter
    print(f"\nUser: {message}")
    print("Assistant: ", end="")
    async for chunk in client.get_response(message, tools=get_weather, stream=True):
        if chunk.text:
            print(chunk.text, end="")
    print("")
    # Example 2: Agent usage with single method
    agent = ChatAgent(
        chat_client=client,
        tools=get_weather,
        name="WeatherAgent",
        instructions="You are a weather assistant.",
    )
    thread = agent.get_new_thread()
    # Non-streaming agent
    print(f"\nUser: {message}")
    result = await agent.run(message, thread=thread)  # default would be stream=False
    print(f"{agent.name}: {result.text}")
    # Streaming agent - same method, different parameter
    print(f"\nUser: {message}")
    print(f"{agent.name}: ", end="")
    async for update in agent.run(message, thread=thread, stream=True):
        if update.text:
            print(update.text, end="")
    print("")
if __name__ == "__main__":
    asyncio.run(main())
```
### .NET - Current pattern for comparison
```csharp
// Copyright (c) Microsoft. All rights reserved.
using Azure.AI.OpenAI;
using Azure.Identity;
using Microsoft.Agents.AI;
using OpenAI.Chat;
var endpoint = Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT")
?? throw new InvalidOperationException("AZURE_OPENAI_ENDPOINT is not set.");
var deploymentName = Environment.GetEnvironmentVariable("AZURE_OPENAI_DEPLOYMENT_NAME") ?? "gpt-4o-mini";
AIAgent agent = new AzureOpenAIClient(
new Uri(endpoint),
new AzureCliCredential())
.GetChatClient(deploymentName)
.CreateAIAgent(
instructions: "You are good at telling jokes about pirates.",
name: "PirateJoker");
// Non-streaming: returns a single AgentRunResponse
Console.WriteLine("=== Non-streaming ===");
AgentRunResponse result = await agent.RunAsync("Tell me a joke about a pirate.");
Console.WriteLine(result);
// Streaming: returns an IAsyncEnumerable of updates
Console.WriteLine("\n=== Streaming ===");
await foreach (var update in agent.RunStreamingAsync("Tell me a joke about a pirate."))
{
Console.Write(update);
}
Console.WriteLine();
```
================================================
FILE: docs/decisions/0014-feature-collections.md
================================================
---
status: accepted
contact: westey-m
date: 2025-01-21
deciders: sergeymenshykh, markwallace, rbarreto, westey-m, stephentoub
consulted: reubenbond
informed:
---
# Feature Collections
## Context and Problem Statement
When using agents, we often have cases where we want to pass some arbitrary services or data to an agent or some component in the agent execution stack.
These services or data are not necessarily known at compile time and can vary by the agent stack that the user has built.
E.g., there may be an agent decorator or chat client decorator that was added to the stack by the user, and an arbitrary payload needs to be passed to that decorator.
Since these payloads are related to components that are not integral parts of the agent framework, they cannot be added as strongly typed settings to the agent run options.
However, the payloads could be added to the agent run options as loosely typed 'features' that can be retrieved as needed.
In some cases, certain classes of agents may support the same capability, but not all agents do.
Having the configuration for such a capability on the main abstraction would advertise the functionality to all users, even if their chosen agent does not support it.
Today, the user can type-check for specific agent types and call overloads on those types with the strongly typed configuration.
A feature collection would be an alternative way of passing such configuration, without needing to type-check the agent type.
All agents that support the functionality would be able to check for the configuration and use it, simplifying the user code.
If the agent does not support the capability, that configuration would be ignored.
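The idea is language-neutral, so it can be sketched in a few lines of Python; this is an illustration, not the proposed .NET API. `FeatureCollection`, `StructuredOutputFeature`, `PlainAgent` and `StructuredAgent` are hypothetical: features are keyed by their type, an agent that supports a capability looks its feature up, and an agent that does not simply never asks for it. `try_get` returns a `(found, value)` pair so that "not set" and "set to None" stay distinguishable.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, TypeVar

T = TypeVar("T")


class FeatureCollection:
    """Hypothetical type-keyed collection of features."""

    def __init__(self) -> None:
        self._features: dict[type, Any] = {}

    def set(self, feature_type: type[T], value: T | None) -> FeatureCollection:
        self._features[feature_type] = value
        return self  # returning self allows chaining, like a WithFeature helper

    def try_get(self, feature_type: type[T]) -> tuple[bool, T | None]:
        # (found, value): keeps "not set" distinct from "set to None".
        if feature_type in self._features:
            return True, self._features[feature_type]
        return False, None


@dataclass
class StructuredOutputFeature:
    output_type: type


class PlainAgent:
    def run(self, text: str, features: FeatureCollection | None = None) -> Any:
        # No structured-output support: the feature is simply never read.
        return text


class StructuredAgent:
    def run(self, text: str, features: FeatureCollection | None = None) -> Any:
        found, feature = features.try_get(StructuredOutputFeature) if features is not None else (False, None)
        if found and feature is not None:
            return feature.output_type(text)  # toy "structured output": convert the text
        return text


features = FeatureCollection().set(StructuredOutputFeature, StructuredOutputFeature(output_type=int))
print(PlainAgent().run("42", features))       # 42 as a str: the feature is ignored
print(StructuredAgent().run("42", features))  # 42 as an int: the feature is applied
```

The caller passes the same collection to either agent without checking its type; only the agent that understands the feature acts on it.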
### Sample Scenario 1 - Per-Run ChatMessageStore Override for Hosting Libraries
We are building an agent hosting library that can host any agent built using the agent framework.
Where an agent is not built on a service that uses in-service chat history storage, the hosting library wants to force the agent to use
the hosting library's chat history storage implementation.
This chat history storage implementation may be specifically tailored to the type of protocol that the hosting library uses, e.g. conversation id based storage or response id based storage.
The hosting library does not know what type of agent it is hosting, so it cannot provide a strongly typed parameter on the agent.
Instead, it adds the chat history storage implementation to a feature collection, and if the agent supports custom chat history storage, it retrieves the implementation from the feature collection and uses it.
```csharp
// Pseudo-code for an agent hosting library that supports conversation id based hosting.
public async Task<string> HandleConversationsBasedRequestAsync(AIAgent agent, string conversationId, string userInput)
{
var thread = await this._threadStore.GetOrCreateThreadAsync(conversationId);
// The hosting library can set a per-run chat message store via Features that only applies for that run.
// This message store will load and save messages under the conversation id provided.
ConversationsChatMessageStore messageStore = new(this._dbClient, conversationId);
var response = await agent.RunAsync(
userInput,
thread,
options: new AgentRunOptions()
{
Features = new AgentFeatureCollection().WithFeature(messageStore)
});
await this._threadStore.SaveThreadAsync(conversationId, thread);
return response.Text;
}
// Pseudo-code for an agent hosting library that supports response id based hosting.
public async Task<(string responseMessage, string responseId)> HandleResponseIdBasedRequestAsync(AIAgent agent, string previousResponseId, string userInput)
{
var thread = await this._threadStore.GetOrCreateThreadAsync(previousResponseId);
// The hosting library can set a per-run chat message store via Features that only applies for that run.
// This message store will buffer newly added messages until explicitly saved after the run.
ResponsesChatMessageStore messageStore = new(this._dbClient, previousResponseId);
var response = await agent.RunAsync(
userInput,
thread,
options: new AgentRunOptions()
{
Features = new AgentFeatureCollection().WithFeature(messageStore)
});
// Since the message store may not actually have been used at all (if the agent's underlying chat client requires service-based chat history storage),
// we may not have anything to save back to the database.
// We still want to generate a new response id though, so that we can save the updated thread state under that id.
// We should also use the same id to save any buffered messages in the message store if there are any.
var newResponseId = this.GenerateResponseId();
if (messageStore.HasBufferedMessages)
{
await messageStore.SaveBufferedMessagesAsync(newResponseId);
}
// Save the updated thread state under the new response id that was generated by the store.
await this._threadStore.SaveThreadAsync(newResponseId, thread);
return (response.Text, newResponseId);
}
```
### Sample Scenario 2 - Structured output
Currently our base abstraction does not support structured output, since the capability is not supported by all agents.
For those agents that don't support structured output, we could add an agent decorator that takes the response from the underlying agent, and applies structured output parsing on top of it via an additional LLM call.
If we add structured output configuration as a feature, then any agent that supports structured output could retrieve the configuration from the feature collection and apply it, and where it is not supported, the configuration would simply be ignored.
We could add a simple StructuredOutputAgentFeature that can be added to the list of features and also be used to return the generated structured output.
```csharp
internal class StructuredOutputAgentFeature
{
public Type? OutputType { get; set; }
public JsonSerializerOptions? SerializerOptions { get; set; }
public bool? UseJsonSchemaResponseFormat { get; set; }
// Contains the result of the structured output parsing request.
public ChatResponse? ChatResponse { get; set; }
}
```
We can add a simple decorator class that does the chat client invocation.
```csharp
public class StructuredOutputAgent : DelegatingAIAgent
{
private readonly IChatClient _chatClient;
public StructuredOutputAgent(AIAgent innerAgent, IChatClient chatClient)
: base(innerAgent)
{
this._chatClient = Throw.IfNull(chatClient);
}
public override async Task<AgentRunResponse> RunAsync(
IEnumerable<ChatMessage> messages,
AgentThread? thread = null,
AgentRunOptions? options = null,
CancellationToken cancellationToken = default)
{
// Run the inner agent first, to get back the text response we want to convert.
var response = await base.RunAsync(messages, thread, options, cancellationToken).ConfigureAwait(false);
if (options?.Features?.TryGet<StructuredOutputAgentFeature>(out var responseFormatFeature) is true
&& responseFormatFeature.OutputType is not null)
{
// Create the chat options to request structured output.
ChatOptions chatOptions = new()
{
ResponseFormat = ChatResponseFormat.ForJsonSchema(responseFormatFeature.OutputType, responseFormatFeature.SerializerOptions)
};
// Invoke the chat client to transform the text output into structured data.
// The feature is updated with the result.
// The code can be simplified by adding a non-generic structured output GetResponseAsync
// overload that takes Type as input.
responseFormatFeature.ChatResponse = await this._chatClient.GetResponseAsync(
messages: new[]
{
new ChatMessage(ChatRole.System, "You are a json expert and when provided with any text, will convert it to the requested json format."),
new ChatMessage(ChatRole.User, response.Text)
},
options: chatOptions,
cancellationToken: cancellationToken).ConfigureAwait(false);
}
return response;
}
}
```
Finally, we can add an extension method on `AIAgent` that can add the feature to the run options and check the feature for the structured output result and add the deserialized result to the response.
```csharp
public static async Task<AgentRunResponse<T>> RunAsync<T>(
this AIAgent agent,
IEnumerable<ChatMessage> messages,
AgentThread? thread = null,
JsonSerializerOptions? serializerOptions = null,
AgentRunOptions? options = null,
bool? useJsonSchemaResponseFormat = null,
CancellationToken cancellationToken = default)
{
// Create the structured output feature.
var structuredOutputFeature = new StructuredOutputAgentFeature();
structuredOutputFeature.OutputType = typeof(T);
structuredOutputFeature.UseJsonSchemaResponseFormat = useJsonSchemaResponseFormat;
// Run the agent.
options ??= new AgentRunOptions();
options.Features ??= new AgentFeatureCollection();
options.Features.Set(structuredOutputFeature);
var response = await agent.RunAsync(messages, thread, options, cancellationToken).ConfigureAwait(false);
// Deserialize the JSON output.
if (structuredOutputFeature.ChatResponse is not null)
{
var typed = new ChatResponse<T>(structuredOutputFeature.ChatResponse, serializerOptions ?? AgentJsonUtilities.DefaultOptions);
return new AgentRunResponse<T>(response, typed.Result);
}
throw new InvalidOperationException("No structured output response was generated by the agent.");
}
```
We can then use the extension method with any agent that supports structured output or that has
been decorated with the `StructuredOutputAgent` decorator.
```csharp
agent = new StructuredOutputAgent(agent, chatClient);
AgentRunResponse response = await agent.RunAsync([new ChatMessage(
ChatRole.User,
"Please provide information about John Smith, who is a 35-year-old software engineer.")]);
```
## Implementation Options
Three options were considered for implementing feature collections:
- **Option 1**: FeatureCollections similar to ASP.NET Core
- **Option 2**: AdditionalProperties Dictionary
- **Option 3**: IServiceProvider
Here are some comparisons about their suitability for our use case:
| Criteria | Feature Collection | Additional Properties | IServiceProvider |
|------------------|--------------------|-----------------------|------------------|
|Ease of use |✅ Good |❌ Bad |✅ Good |
|User familiarity |❌ Bad |✅ Good |✅ Good |
|Type safety |✅ Good |❌ Bad |✅ Good |
|Ability to modify registered options when progressing down the stack|✅ Supported|✅ Supported|❌ Not supported (IServiceProvider is read-only)|
|Already available in MEAI stack|❌ No|✅ Yes|❌ No|
|Ambiguity with existing AdditionalProperties|❌ Yes|✅ No|❌ Yes|
## IServiceProvider
Service Collections and Service Providers provide a very popular way to register and retrieve services by type and could be used as a way to pass features to agents and chat clients.
However, since IServiceProvider is read-only, it is not possible to modify the registered services when progressing down the execution stack.
E.g. an agent decorator cannot add additional services to the IServiceProvider passed to it when calling into the inner agent.
IServiceProvider also does not expose a way to list all services contained in it, making it difficult to copy services from one provider to another.
This lack of mutability makes IServiceProvider unsuitable for our use case, since we will not be able to use it to build sample scenario 2.
## AdditionalProperties dictionary
The AdditionalProperties dictionary is already available on various options classes in the agent framework as well as in the MEAI stack and
allows storing arbitrary key/value pairs, where the key is a string and the value is an object.
While FeatureCollection uses Type as a key, AdditionalProperties uses string keys.
This means that users need to agree on string keys for specific features. However, it is also possible to use Type.FullName as the key by convention
to avoid key collisions, which is an easy convention to follow.
Since the value of AdditionalProperties is of type object, users need to cast the value to the expected type when retrieving it, which is also
a drawback, but when using the convention of using Type.FullName as a key, there is at least a clear expectation of what type to cast to.
```csharp
// Setting a feature
options.AdditionalProperties[typeof(MyFeature).FullName] = new MyFeature();
// Retrieving a feature
if (options.AdditionalProperties.TryGetValue(typeof(MyFeature).FullName, out var featureObj)
&& featureObj is MyFeature myFeature)
{
// Use myFeature
}
```
It would also be possible to add extension methods to simplify setting and getting features from AdditionalProperties.
Having a base class for features should help make this more feature rich.
```csharp
// Setting a feature, this can use Type.FullName as the key.
options.AdditionalProperties
.WithFeature(new MyFeature());
// Retrieving a feature, this can use Type.FullName as the key.
if (options.AdditionalProperties.TryGetFeature<MyFeature>(out var myFeature))
{
// Use myFeature
}
```
It would also be possible to add feature-specific extension methods to simplify setting and getting features from AdditionalProperties.
```csharp
// Setting a feature
options.AdditionalProperties
.WithMyFeature(new MyFeature());
// Retrieving a feature
if (options.AdditionalProperties.TryGetMyFeature(out var myFeature))
{
// Use myFeature
}
```
## Feature Collection
If we choose the feature collection option, we need to decide on the design of the feature collection itself.
### Feature Collections extension points
We need to decide the set of actions for which feature collections would be supported. Here is the suggested list of actions:
**MAAI.AIAgent:**
1. GetNewThread
   1. E.g. this would allow passing an already existing storage id for the thread to use, or an initialized custom chat message store to use.
1. DeserializeThread
   1. E.g. this would allow passing an already existing storage id for the thread to use, or an initialized custom chat message store to use.
1. Run / RunStreaming
   1. E.g. this would allow passing an override chat message store just for that run, or a desired schema for a structured output middleware component.
**MEAI.ChatClient:**
1. GetResponse / GetStreamingResponse
### Reconciling with existing AdditionalProperties
If we decide to add feature collections separately from the existing AdditionalProperties dictionaries, we need to consider how to explain to users when to use each one.
One possible approach is to implement one on top of the other.
AdditionalProperties could be stored as a feature in the feature collection.
Users would be able to retrieve additional properties from the feature collection, in addition to retrieving it via a dedicated AdditionalProperties property.
E.g. `features.Get<AdditionalPropertiesDictionary>()`
One challenge with this approach is that when setting a value in the AdditionalProperties dictionary, the feature collection would need to be created first if it does not already exist.
```csharp
public class AgentRunOptions
{
public AdditionalPropertiesDictionary? AdditionalProperties { get; set; }
public IAgentFeatureCollection? Features { get; set; }
}
var options = new AgentRunOptions();
// This would need to create the feature collection first, if it does not already exist.
options.AdditionalProperties ??= new AdditionalPropertiesDictionary();
```
Since IAgentFeatureCollection is an interface, AgentRunOptions would need a concrete implementation of the interface to create, meaning that the user cannot choose the implementation.
It also means that if the user doesn't realise that AdditionalProperties is implemented using feature collections, they may set a value on AdditionalProperties, and then later overwrite the entire feature collection, losing the AdditionalProperties feature.
Options to avoid these issues:
1. Make `Features` readonly.
   1. This would prevent the user from overwriting the feature collection after setting AdditionalProperties.
   1. Since the user cannot set their own implementation of IAgentFeatureCollection, having an interface for it may not be necessary.
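A sketch of that combination, in Python for illustration only: `AgentRunOptions` and `AdditionalProperties` here are hypothetical stand-ins, not the .NET types. The `features` property is read-only and always present, and `additional_properties` is stored as a feature and created lazily on first access. Because `features` has no setter, the stored properties cannot be lost by overwriting the collection.

```python
class AdditionalProperties(dict):
    """Hypothetical dict subclass, so the dictionary can be stored keyed by its type."""


class AgentRunOptions:
    """Hypothetical options class with a read-only, always-present feature collection."""

    def __init__(self) -> None:
        # The options class picks the concrete collection; callers cannot replace it.
        self._features: dict[type, object] = {}

    @property
    def features(self) -> dict[type, object]:
        return self._features  # no setter, so the collection cannot be overwritten

    @property
    def additional_properties(self) -> AdditionalProperties:
        # Created lazily on first access and stored as a feature.
        props = self._features.get(AdditionalProperties)
        if not isinstance(props, AdditionalProperties):
            props = AdditionalProperties()
            self._features[AdditionalProperties] = props
        return props


options = AgentRunOptions()
options.additional_properties["trace_id"] = "abc"
print(options.features[AdditionalProperties])  # {'trace_id': 'abc'}
```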
### Feature Collection Implementation
We have two options for implementing feature collections:
1. Create our own [IAgentFeatureCollection interface](https://github.com/microsoft/agent-framework/pull/2354/files#diff-9c42f3e60d70a791af9841d9214e038c6de3eebfc10e3997cb4cdffeb2f1246d) and [implementation](https://github.com/microsoft/agent-framework/pull/2354/files#diff-a435cc738baec500b8799f7f58c1538e3bb06c772a208afc2615ff90ada3f4ca).
2. Reuse the asp.net [IFeatureCollection interface](https://github.com/dotnet/aspnetcore/blob/main/src/Extensions/Features/src/IFeatureCollection.cs) and [implementation](https://github.com/dotnet/aspnetcore/blob/main/src/Extensions/Features/src/FeatureCollection.cs).
#### Roll our own
Advantages:
Creating our own IAgentFeatureCollection interface and implementation has the advantage of being more clearly associated with the agent framework and allows us to
improve on some of the design decisions made in asp.net core's IFeatureCollection.
Drawbacks:
It would mean a different implementation to maintain and test.
#### Reuse asp.net IFeatureCollection
Advantages:
Reusing the asp.net IFeatureCollection has the advantage of being able to reuse the well-established and tested implementation from asp.net
core. Users who are using agents in an asp.net core application may be able to pass feature collections from asp.net core to the agent framework directly.
Drawbacks:
While the package name is `Microsoft.Extensions.Features`, the namespaces of the types are `Microsoft.AspNetCore.Http.Features`, which may create confusion for users of agent framework who are not building web applications or services.
Users may rightly ask: Why do I need to use a class from asp.net core when I'm not building a web application / service?
The current design has some issues that would be good to avoid. E.g. it does not distinguish between a feature being "not set" and being "null": `Get` returns null for both, and there is no `TryGet` method.
Since the [default implementation](https://github.com/dotnet/aspnetcore/blob/main/src/Extensions/Features/src/FeatureCollection.cs) also supports value types, its `Get` throws when a value-type feature is requested but not set, because it cannot return null.
A TryGet method would be more appropriate.
## Feature Layering
One possible scenario when adding support for feature collections is to allow layering of features by scope.
The following levels of scope could be supported:
1. Application - Application wide features that apply to all agents / chat clients
2. Artifact (Agent / ChatClient) - Features that apply to all runs of a specific agent or chat client instance
3. Action (GetNewThread / Run / GetResponse) - Features that apply to a single action only
When retrieving a feature from the collection, the search would start from the most specific scope (Action) and progress to the least specific scope (Application), returning the first matching feature found.
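This most-specific-first lookup is what Python's `collections.ChainMap` does, so it can be sketched in a few lines as a language-neutral illustration. The scope contents (`"message_store"`, `"retries"`) are made-up placeholder keys, and string keys are used for brevity instead of types. The sketch also shows that the same action scope can sit over a different artifact scope further down the stack.

```python
from collections import ChainMap

# Placeholder feature values per scope; string keys are used for brevity.
application = {"message_store": "app-store", "retries": 1}
agent_artifact = {"retries": 3}
client_artifact = {"retries": 5}
action = {"message_store": "per-run-store"}

# ChainMap searches its maps left to right: Action, then Artifact, then Application.
agent_view = ChainMap(action, agent_artifact, application)
# Further down the stack the action scope is the same, but the artifact scope changes.
client_view = ChainMap(action, client_artifact, application)

print(agent_view["message_store"], agent_view["retries"])    # per-run-store 3
print(client_view["message_store"], client_view["retries"])  # per-run-store 5
```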
Introducing layering adds some challenges:
- There may be multiple feature collections at the same scope level, e.g. an Agent that uses a ChatClient where both have their own feature collections.
  - Do we layer the agent feature collection over the chat client feature collection (Application -> ChatClient -> Agent -> Run), or only use the agent feature collection in the agent (Application -> Agent -> Run), and the chat client feature collection in the chat client (Application -> ChatClient -> Run)?
- The appropriate base feature collection may change when progressing down the stack, e.g. when an Agent calls a ChatClient, the action feature collection stays the same, but the artifact feature collection changes.
- Who creates the feature collection hierarchy?
  - Since the hierarchy changes as it progresses down the execution stack, and the caller can only pass in the action level feature collection, the callee needs to combine it with its own artifact level feature collection and the application level feature collection. Each action will need to build the appropriate feature collection hierarchy at the start of its execution.
- For Artifact level features, it seems odd to pass them in as a bag of untyped features, when we are constructing a known artifact type and therefore can have typed settings.
  - E.g. today we have a strongly typed setting on ChatClientAgentOptions to configure a ChatMessageStore for the agent.
- To avoid global statics for application level features, the user would need to pass in the application level feature collection to each artifact that they create.
  - This would be very odd if the user already has to use strongly typed settings for each feature that they want to set at the artifact level.
### Layering Options
1. No layering - only a single feature collection is supported per action (the caller can still create a layered collection if desired, but the callee does not do any layering automatically).
   1. Fallback is to any features configured on the artifact via strongly typed settings.
1. Full layering - support layering at all levels (Application -> Artifact -> Action).
   1. Only apply applicable artifact-level features when calling into that artifact.
   1. Apply upstream artifact features when calling into downstream artifacts, e.g. the feature hierarchy in ChatClientAgent would be `Application -> Agent -> Run` and in ChatClient would be `Application -> ChatClient -> Agent -> Run` or `Application -> Agent -> ChatClient -> Run`.
   1. The user needs to provide the application-level feature collection to each artifact that they create, and artifact features are passed via strongly typed settings.
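A minimal Python sketch of the first option: the callee consults only the single per-action collection and otherwise falls back to a strongly typed artifact setting. `AgentOptions`, `chat_message_store`, and `resolve_store` are hypothetical names, loosely modeled on the `ChatClientAgentOptions` example in the challenges list.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class AgentOptions:
    """Hypothetical strongly typed artifact-level settings."""
    chat_message_store: Optional[str] = None


def resolve_store(run_features: Optional[Mapping[str, Any]], options: AgentOptions) -> Optional[str]:
    # No layering: only the single per-action collection is consulted...
    if run_features is not None and "chat_message_store" in run_features:
        return run_features["chat_message_store"]
    # ...and, failing that, the artifact's strongly typed setting is used.
    return options.chat_message_store


opts = AgentOptions(chat_message_store="agent-default-store")
print(resolve_store({"chat_message_store": "per-run-store"}, opts))  # per-run-store
print(resolve_store({}, opts))    # agent-default-store
print(resolve_store(None, opts))  # agent-default-store
```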
### Accessing application level features Options
We need to consider how application level features would be accessed if supported.
1. The user provides the application-level feature collection to each artifact that the user constructs.
   1. Passing the application-level feature collection to each artifact is tedious for the user.
1. There is a static application-level feature collection that can be accessed globally.
   1. Statics create issues with testing and isolation.
## Decisions
- Feature Collections Container: Use AdditionalProperties
- Feature Layering: No layering - only a single collection/dictionary is supported per action. Application layers can be added later if needed.
================================================
FILE: docs/decisions/0015-agent-run-context.md
================================================
---
status: proposed
contact: westey-m
date: 2026-01-27
deciders: sergeymenshykh, markwallace, rbarreto, dmytrostruk, westey-m, eavanvalkenburg, stephentoub, lokitoth, alliscode, taochenosu, moonbox3
consulted:
informed:
---
# AgentRunContext for Agent Run
## Context and Problem Statement
During an agent run, various components involved in the execution (middleware, filters, tools, nested agents, etc.) may need access to contextual information about the current run, such as:
1. The agent that is executing the run
2. The session associated with the run
3. The request messages passed to the agent
4. The run options controlling the agent's behavior
Additionally, some components may need to modify this context during execution, for example:
- Replacing the session with a different one
- Modifying the request messages before they reach the agent core
- Updating or replacing the run options entirely
Currently, there is no standardized way to access or modify this context from arbitrary code that executes during an agent run, especially from deeply nested call stacks where the context is not explicitly passed.
## Sample Scenario
When using an agent as an `AIFunction`, developers may want to pass context from the parent agent run to the child agent run. For example, they may want to copy the chat history to the child agent, or share the same session across both agents.
To enable these scenarios, we need a way to access the parent agent run context, including e.g. the parent agent itself, the parent agent session, and the parent run options from function tool calls.
```csharp
public static AIFunction AsAIFunctionWithSessionPropagation(this ChatClientAgent agent, AIFunctionFactoryOptions? options = null)
{
    Throw.IfNull(agent);

    [Description("Invoke an agent to retrieve some information.")]
    async Task<string> InvokeAgentAsync(
        [Description("Input query to invoke the agent.")] string query,
        CancellationToken cancellationToken)
    {
        // Get the session from the parent agent and pass it to the child agent.
        var session = AIAgent.CurrentRunContext?.Session;

        // Alternatively, the developer may want to create a new session but copy over the chat history from the parent agent.
        // var parentChatHistory = AIAgent.CurrentRunContext?.Session?.GetService>();
        // if (parentChatHistory != null)
        // {
        //     var chp = new InMemoryChatHistoryProvider();
        //     foreach (var message in parentChatHistory)
        //     {
        //         chp.Add(message);
        //     }
        //     session = agent.GetNewSession(chp);
        // }

        var response = await agent.RunAsync(query, session: session, cancellationToken: cancellationToken).ConfigureAwait(false);
        return response.Text;
    }

    options ??= new();
    options.Name ??= SanitizeAgentName(agent.Name);
    options.Description ??= agent.Description;
    return AIFunctionFactory.Create(InvokeAgentAsync, options);
}
```
## Decision Drivers
- Components executing during an agent run need access to run context without explicit parameter passing through every layer
- Context should flow naturally across async calls without manual propagation
- The design should allow modification of context properties by agent decorators (e.g., replacing options or session)
- Solution should be consistent with patterns used in similar frameworks (e.g., `FunctionInvokingChatClient.CurrentContext`, `HttpContext.Current`, `Activity.Current`)
## Considered Options
- **Option 1**: Pass context explicitly through all method signatures
- **Option 2**: Use `AsyncLocal` to provide ambient context accessible anywhere during the run
- **Option 3**: Use a combination of explicit parameters for `RunCoreAsync` and `AsyncLocal` for ambient access
## Decision Outcome
Chosen option: **Option 3** - Combination of explicit parameters and AsyncLocal ambient access.
This approach provides the best of both worlds:
1. **Explicit parameters are passed to `RunCoreAsync`**: The core agent implementation receives the parameters explicitly, making it clear what data is available and enabling easy unit testing. Any modification of these in a decorator will require calling `RunAsync` on the inner agent with the updated parameters, which would result in the inner agent creating a new `AgentRunContext` instance.
```csharp
public async Task RunAsync(
    IEnumerable<ChatMessage> messages,
    AgentSession? session = null,
    AgentRunOptions? options = null,
    CancellationToken cancellationToken = default)
{
    CurrentRunContext = new(this, session, messages as IReadOnlyCollection<ChatMessage> ?? messages.ToList(), options);
    return await this.RunCoreAsync(messages, session, options, cancellationToken).ConfigureAwait(false);
}
```
2. **`AsyncLocal` for ambient access**: The context is stored in an `AsyncLocal` field, making it accessible from any code executing during the agent run via a static property.
The main scenario for this is to allow deeply nested components (e.g., tools, chat client middleware) to access the context without needing to pass it through every method signature. These are external components that cannot easily be modified to accept additional parameters. For internal components, we prefer passing any parameters explicitly.
```csharp
public static AgentRunContext? CurrentRunContext
{
    get => s_currentContext.Value;
    protected set => s_currentContext.Value = value;
}
```
### AgentRunContext Design
The `AgentRunContext` class encapsulates all run-related state:
```csharp
public class AgentRunContext
{
    public AgentRunContext(
        AIAgent agent,
        AgentSession? session,
        IReadOnlyCollection<ChatMessage> requestMessages,
        AgentRunOptions? agentRunOptions)
    {
        this.Agent = agent;
        this.Session = session;
        this.RequestMessages = requestMessages;
        this.RunOptions = agentRunOptions;
    }

    public AIAgent Agent { get; }
    public AgentSession? Session { get; }
    public IReadOnlyCollection<ChatMessage> RequestMessages { get; }
    public AgentRunOptions? RunOptions { get; }
}
```
Key design decisions:
- **All properties are read-only**: While some sub-properties of the provided values (like `AgentRunOptions.AllowBackgroundResponses`) may be mutable, the `AgentRunContext` itself is immutable, and we want to discourage anyone from modifying the values in the context. Modifying the context is unlikely to produce the desired behavior, as the values will typically already have been used by the time any custom code accesses them.
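The distinction between a read-only context and mutable sub-properties mirrors the shallow immutability of Python's frozen dataclasses. A small analogy with hypothetical type names:

```python
import dataclasses
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunOptions:
    """Hypothetical mutable options object (like AgentRunOptions)."""
    allow_background_responses: bool = False


@dataclass(frozen=True)
class RunContextSnapshot:
    """Hypothetical read-only context: fields cannot be reassigned."""
    session_id: Optional[str]
    run_options: Optional[RunOptions]


ctx = RunContextSnapshot(session_id="s1", run_options=RunOptions())

rejected = False
try:
    ctx.session_id = "s2"  # reassigning a field is rejected
except dataclasses.FrozenInstanceError:
    rejected = True
print(rejected)  # True

# Immutability is shallow: nested objects can still be mutated.
ctx.run_options.allow_background_responses = True
print(ctx.run_options.allow_background_responses)  # True
```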
### Benefits
1. **Ambient Access**: Any code executing during the run can access context via `AIAgent.CurrentRunContext` without needing explicit parameters
2. **Async Flow**: `AsyncLocal` automatically flows across async/await boundaries
3. **Modifiability**: Decorators can replace the session, messages, or options by calling the inner agent's `RunAsync` with updated values, which creates a new context for the inner run
4. **Testability**: The explicit parameter to `RunCoreAsync` makes unit testing straightforward
================================================
FILE: docs/decisions/0016-python-context-middleware.md
================================================
---
# These are optional elements. Feel free to remove any of them.
status: accepted
contact: eavanvalkenburg
date: 2026-02-09
deciders: eavanvalkenburg, markwallace-microsoft, sphenry, alliscode, johanst, brettcannon, westey-m
consulted: taochenosu, moonbox3, dmytrostruk, giles17
---
# Unifying Context Management with ContextPlugin
## Context and Problem Statement
The Agent Framework Python SDK currently has multiple abstractions for managing conversation context:
| Concept | Purpose | Location |
|---------|---------|----------|
| `ContextProvider` | Injects instructions, messages, and tools before/after invocations | `_memory.py` |
| `ChatMessageStore` | Stores and retrieves conversation history | `_threads.py` |
| `AgentThread` | Manages conversation state and coordinates storage | `_threads.py` |
This creates cognitive overhead for developers doing "Context Engineering" - the practice of dynamically managing what context (history, RAG results, instructions, tools) is sent to the model. Users must understand:
- When to use `ContextProvider` vs `ChatMessageStore`
- How `AgentThread` coordinates between them
- Different lifecycle hooks (`invoking()`, `invoked()`, `thread_created()`)
**How can we simplify context management into a single, composable pattern that handles all context-related concerns?**
## Decision Drivers
- **Simplicity**: Reduce the number of concepts users must learn
- **Composability**: Enable multiple context sources to be combined flexibly
- **Consistency**: Follow existing patterns in the framework
- **Flexibility**: Support both stateless and session-specific context engineering
- **Attribution**: Enable tracking which provider added which messages/tools
- **Zero-config**: Simple use cases should work without configuration
## Related Issues
This ADR addresses the following issues from the parent issue [#3575](https://github.com/microsoft/agent-framework/issues/3575):
| Issue | Title | How Addressed |
|-------|-------|---------------|
| [#3587](https://github.com/microsoft/agent-framework/issues/3587) | Rename AgentThread to AgentSession | ✅ `AgentThread` → `AgentSession` (clean break, no alias). See [§7 Renaming](#7-renaming-thread--session). |
| [#3588](https://github.com/microsoft/agent-framework/issues/3588) | Add get_new_session, get_session_by_id methods | ✅ `agent.create_session()` and `agent.get_session(service_session_id)`. See [§9 Session Management Methods](#9-session-management-methods). |
| [#3589](https://github.com/microsoft/agent-framework/issues/3589) | Move serialize method into the agent | ✅ No longer needed. `AgentSession` provides `to_dict()`/`from_dict()` for serialization. Providers write JSON-serializable values to `session.state`. See [§8 Serialization](#8-session-serializationdeserialization). |
| [#3590](https://github.com/microsoft/agent-framework/issues/3590) | Design orthogonal ChatMessageStore for service vs local | ✅ `HistoryProvider` works orthogonally: configure `load_messages=False` when service manages storage. Multiple history providers allowed. See [§3 Unified Storage](#3-unified-storage). |
| [#3601](https://github.com/microsoft/agent-framework/issues/3601) | Rename ChatMessageStore to ChatHistoryProvider | 🔒 **Closed** - Superseded by this ADR. `ChatMessageStore` removed entirely, replaced by `StorageContextMiddleware`. |
## Current State Analysis
### ContextProvider (Current)
```python
class ContextProvider(ABC):
    async def thread_created(self, thread_id: str | None) -> None:
        """Called when a new thread is created."""
        pass

    async def invoked(
        self,
        request_messages: ChatMessage | Sequence[ChatMessage],
        response_messages: ChatMessage | Sequence[ChatMessage] | None = None,
        invoke_exception: Exception | None = None,
        **kwargs: Any,
    ) -> None:
        """Called after the agent receives a response."""
        pass

    @abstractmethod
    async def invoking(self, messages: ChatMessage | MutableSequence[ChatMessage], **kwargs: Any) -> Context:
        """Called before model invocation. Returns Context with instructions, messages, tools."""
        pass
```
**Limitations:**
- No clear way to compose multiple providers
- No source attribution for debugging
### ChatMessageStore (Current)
```python
class ChatMessageStoreProtocol(Protocol):
    async def list_messages(self) -> list[ChatMessage]: ...
    async def add_messages(self, messages: Sequence[ChatMessage]) -> None: ...
    async def serialize(self, **kwargs: Any) -> dict[str, Any]: ...

    @classmethod
    async def deserialize(cls, state: MutableMapping[str, Any], **kwargs: Any) -> "ChatMessageStoreProtocol": ...
```
**Limitations:**
- Only handles message storage, no context injection
- Separate concept from `ContextProvider`
- No control over what gets stored (RAG context vs user messages)
- No control over which runs first, the `ContextProvider` or the `ChatMessageStore` (ordering ambiguity); this is controlled by the framework
### AgentThread (Current)
```python
class AgentThread:
    def __init__(
        self,
        *,
        service_thread_id: str | None = None,
        message_store: ChatMessageStoreProtocol | None = None,
        context_provider: ContextProvider | None = None,
    ) -> None: ...
```
**Limitations:**
- Coordinates storage and context separately
- Only one `context_provider` and one `ChatMessageStore` (no composition)
## Key Design Considerations
The following key decisions shape the ContextProvider design:
| # | Decision | Rationale |
|---|----------|-----------|
| 1 | **Agent vs Session Ownership** | Agent owns provider instances; Session owns state as mutable dict. Providers shared across sessions, state isolated per session. |
| 2 | **Execution Pattern** | **ContextProvider** with `before_run`/`after_run` methods (hooks pattern). Simpler mental model than wrapper/onion pattern. |
| 3 | **State Management** | Whole state dict (`dict[str, Any]`) passed to each plugin. Dict is mutable, so no return value needed. |
| 4 | **Default Storage at Runtime** | `InMemoryHistoryProvider` auto-added when no providers are configured, no `options.conversation_id` is set, and `options.store` is not True. Evaluated at runtime so users can modify the pipeline first. |
| 5 | **Multiple Storage Allowed** | Warn at session creation if multiple or zero history providers have `load_messages=True` (likely misconfiguration). |
| 6 | **Single Storage Class** | One `HistoryProvider` configured for memory/audit/evaluation - no separate classes. |
| 7 | **Mandatory source_id** | Required parameter forces explicit naming for attribution in `context_messages` dict. |
| 8 | **Explicit Load Behavior** | `load_messages: bool = True` - explicit configuration with no automatic detection. For history, `before_run` is skipped entirely when `load_messages=False`. |
| 9 | **Dict-based Context** | `context_messages: dict[str, list[ChatMessage]]` keyed by source_id maintains order and enables filtering. Messages can have an `attribution` marker in `additional_properties` for external filtering scenarios. |
| 10 | **Selective Storage** | `store_context_messages` and `store_context_from` control what gets persisted from other plugins. |
| 11 | **Tool Attribution** | `extend_tools()` automatically sets `tool.metadata["context_source"] = source_id`. |
| 12 | **Clean Break** | Remove `AgentThread`, old `ContextProvider`, `ChatMessageStore` completely; replace with new `ContextProvider` (hooks pattern), `HistoryProvider`, `AgentSession`. PR1 uses temporary names (`_ContextProviderBase`, `_HistoryProviderBase`) to coexist with old types; PR2 renames to final names after old types are removed. No compatibility shims (preview). |
| 13 | **Plugin Ordering** | User-defined order; storage sees prior plugins (pre-processing) or all plugins (post-processing). |
| 14 | **Session Serialization via `to_dict`/`from_dict`** | `AgentSession` provides `to_dict()` and `from_dict()` for round-tripping. Providers must ensure values they write to `session.state` are JSON-serializable. No `serialize()`/`restore()` methods on providers. |
| 15 | **Session Management Methods** | `agent.create_session()` and `agent.get_session(service_session_id)` for clear lifecycle management. |
## Considered Options
### Option 1: Status Quo - Keep Separate Abstractions
Keep `ContextProvider`, `ChatMessageStore`, and `AgentThread` as separate concepts, with updated naming and minor improvements but no fundamental changes to the API or execution model.
**Pros:**
- No migration required
- Familiar to existing users
- Each concept has a focused responsibility
- Existing documentation and examples remain valid
**Cons:**
- Cognitive overhead: three concepts to learn for context management
- No composability: only one `ContextProvider` per thread
- Inconsistent with middleware pattern used elsewhere in the framework
- `invoking()`/`invoked()` split makes related pre/post logic harder to follow
- No source attribution for debugging which provider added which context
- `ChatMessageStore` and `ContextProvider` overlap conceptually but are separate APIs
### Option 2: ContextMiddleware - Wrapper Pattern
Create a unified `ContextMiddleware` base class that uses the onion/wrapper pattern (like existing `AgentMiddleware`, `ChatMiddleware`) to handle all context-related concerns. This includes a `StorageContextMiddleware` subclass specifically for history persistence.
**Class hierarchy:**
- `ContextMiddleware` (base) - for general context injection (RAG, instructions, tools)
- `StorageContextMiddleware(ContextMiddleware)` - for conversation history storage (in-memory, Redis, Cosmos, etc.)
```python
class ContextMiddleware(ABC):
    def __init__(self, source_id: str, *, session_id: str | None = None):
        self.source_id = source_id
        self.session_id = session_id

    @abstractmethod
    async def process(self, context: SessionContext, next: ContextMiddlewareNext) -> None:
        """Wrap the context flow - modify before next(), process after."""
        # Pre-processing: add context, modify messages
        context.add_messages(self.source_id, [...])
        await next(context)  # Call next middleware or terminal handler
        # Post-processing: log, store, react to response
        await self.store(context.response_messages)
```
**Pros:**
- Single concept for all context engineering
- Familiar pattern from other middleware in the framework (`AgentMiddleware`, `ChatMiddleware`)
- Natural composition via pipeline with clear execution order
- Pre/post processing in one method keeps related logic together
- Source attribution built-in
- Full control over the invocation chain (can short-circuit, retry, wrap with try/catch)
- Exception handling naturally scoped to the middleware that caused it
**Cons:**
- Forgetting `await next(context)` silently breaks the chain
- Stack depth increases with each middleware layer
- Harder to implement middleware that only needs pre OR post processing
- Streaming is more complicated
### Option 3: ContextHooks - Pre/Post Pattern
Create a `ContextHooks` base class with explicit `before_run()` and `after_run()` methods, diverging from the wrapper pattern used by middleware. This includes a `HistoryContextHooks` subclass specifically for history persistence.
**Class hierarchy:**
- `ContextHooks` (base) - for general context injection (RAG, instructions, tools)
- `HistoryContextHooks(ContextHooks)` - for conversation history storage (in-memory, Redis, Cosmos, etc.)
```python
class ContextHooks(ABC):
    def __init__(self, source_id: str, *, session_id: str | None = None):
        self.source_id = source_id
        self.session_id = session_id

    async def before_run(self, context: SessionContext) -> None:
        """Called before model invocation. Modify context here."""
        pass

    async def after_run(self, context: SessionContext) -> None:
        """Called after model invocation. React to response here."""
        pass
```
> **Note on naming:** Both the class name (`ContextHooks`) and method names (`before_run`/`after_run`) are open for discussion. The names used throughout this ADR are placeholders pending a final decision. See alternative naming options below.
**Alternative class naming options:**
| Name | Rationale |
|------|-----------|
| `ContextHooks` | Emphasizes the hook-based nature, familiar from React/Git hooks |
| `ContextHandler` | Generic term for something that handles context events |
| `ContextInterceptor` | Common in Java/Spring, emphasizes interception points |
| `ContextProcessor` | Emphasizes processing at defined stages |
| `ContextPlugin` | Emphasizes extensibility, familiar from build tools |
| `SessionHooks` | Ties to `AgentSession`, emphasizes session lifecycle |
| `InvokeHooks` | Directly describes what's being hooked (the invoke call) |
**Alternative method naming options:**
| before / after | Rationale |
|----------------|-----------|
| `before_run` / `after_run` | Matches `agent.run()` terminology |
| `before_invoke` / `after_invoke` | Emphasizes invocation lifecycle |
| `invoking` / `invoked` | Matches current Python `ContextProvider` and .NET naming |
| `pre_invoke` / `post_invoke` | Common prefix convention |
| `on_invoking` / `on_invoked` | Event-style naming |
| `prepare` / `finalize` | Action-oriented naming |
**Example usage:**
```python
class RAGHooks(ContextHooks):
    async def before_run(self, context: SessionContext) -> None:
        docs = await self.retrieve_documents(context.input_messages[-1].text)
        context.add_messages(self.source_id, [ChatMessage.system(f"Context: {docs}")])

    async def after_run(self, context: SessionContext) -> None:
        await self.store_interaction(context.input_messages, context.response_messages)

# Pipeline execution is linear, not nested:
# 1. hook1.before_run(context)
# 2. hook2.before_run(context)
# 3. (model invocation)
# 4. hook2.after_run(context)  # Reverse order for symmetry
# 5. hook1.after_run(context)

agent = ChatAgent(
    chat_client=client,
    context_hooks=[
        InMemoryStorageHooks("memory"),
        RAGHooks("rag"),
    ]
)
```
**Pros:**
- Simpler mental model: "before" runs before, "after" runs after - no nesting to understand
- Clearer separation between what context hooks do and what agent middleware can do
- Impossible to forget calling `next()` - the framework handles sequencing
- Easier to implement hooks that only need one phase (just override one method)
- Lower cognitive overhead for developers new to middleware patterns
- Clearer separation of concerns: pre-processing logic separate from post-processing
- Easier to test: no need to mock `next` callable, just call methods directly
- Flatter stack traces when debugging
- More similar to the current `ContextProvider` API (`invoking`/`invoked`), easing migration
- Explicit about what happens when: no hidden control flow
**Cons:**
- Diverges from the wrapper pattern used by `AgentMiddleware` and `ChatMiddleware`
- Less powerful: cannot short-circuit the chain or implement retry logic (mitigated by `AgentMiddleware`, which still exists and can be used for these scenarios)
- No "around" advice: cannot wrap invocation in try/catch or timing block
- Exception in `before_run` may leave state inconsistent if no cleanup in `after_run`
- Two methods to implement instead of one (though both are optional)
- Harder to share state between `before_run` and `after_run` (requires instance variables or the session `state` dict)
- Cannot control whether subsequent hooks run (no early termination)
## Detailed Design
This section covers the design decisions that apply to both approaches. Where the approaches differ, both are shown.
### 1. Execution Pattern
The core difference between the two options is the execution model:
**Option 2 - Middleware (Wrapper/Onion):**
```python
class ContextMiddleware(ABC):
    @abstractmethod
    async def process(self, context: SessionContext, next: ContextMiddlewareNext) -> None:
        """Abstract: subclasses must implement the full pre/invoke/post flow."""
        ...

# Subclass must implement process():
class RAGMiddleware(ContextMiddleware):
    async def process(self, context, next):
        context.add_messages(self.source_id, [...])  # Pre-processing
        await next(context)  # Call next middleware
        await self.store(context.response_messages)  # Post-processing
```
**Option 3 - Hooks (Linear):**
```python
class ContextHooks:
    async def before_run(self, context: SessionContext) -> None:
        """Default no-op. Override to add pre-invocation logic."""
        pass

    async def after_run(self, context: SessionContext) -> None:
        """Default no-op. Override to add post-invocation logic."""
        pass

# Subclass overrides only the hooks it needs:
class RAGHooks(ContextHooks):
    async def before_run(self, context):
        context.add_messages(self.source_id, [...])

    async def after_run(self, context):
        await self.store(context.response_messages)
```
**Execution flow comparison:**
```
Middleware (Wrapper/Onion):         Hooks (Linear):
┌────────────────────────────┐      ┌─────────────────────────┐
│ middleware1.process()      │      │ hook1.before_run()      │
│ ┌────────────────────────┐ │      │ hook2.before_run()      │
│ │ middleware2.process()  │ │      │ hook3.before_run()      │
│ │ ┌────────────────────┐ │ │      ├─────────────────────────┤
│ │ │       invoke       │ │ │  vs  │         invoke          │
│ │ └────────────────────┘ │ │      ├─────────────────────────┤
│ │ (post-processing)      │ │      │ hook3.after_run()       │
│ └────────────────────────┘ │      │ hook2.after_run()       │
│ (post-processing)          │      │ hook1.after_run()       │
└────────────────────────────┘      └─────────────────────────┘
```
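The ordering in the diagram can be checked with a minimal, framework-free Python sketch. `make_middleware`, `run_middleware`, `Hook`, and `run_hooks` are hypothetical stand-ins, not the proposed `ContextMiddleware`/`ContextHooks` APIs; they only record the order in which each pattern runs its steps.

```python
import asyncio
from typing import Awaitable, Callable, List

trace: List[str] = []


# Middleware (onion): each layer wraps the rest of the chain.
def make_middleware(name: str):
    async def process(next_: Callable[[], Awaitable[None]]) -> None:
        trace.append(f"{name}.pre")
        await next_()  # forgetting this silently skips everything downstream
        trace.append(f"{name}.post")
    return process


async def run_middleware(layers, invoke: Callable[[], Awaitable[None]]) -> None:
    # Each layer receives a callable that continues with the next layer.
    async def call(index: int) -> None:
        if index == len(layers):
            await invoke()
        else:
            await layers[index](lambda: call(index + 1))
    await call(0)


# Hooks (linear): before_run in order, after_run in reverse order.
class Hook:
    def __init__(self, name: str) -> None:
        self.name = name

    async def before_run(self) -> None:
        trace.append(f"{self.name}.before_run")

    async def after_run(self) -> None:
        trace.append(f"{self.name}.after_run")


async def run_hooks(hooks: List[Hook], invoke: Callable[[], Awaitable[None]]) -> None:
    for hook in hooks:
        await hook.before_run()
    await invoke()
    for hook in reversed(hooks):
        await hook.after_run()


async def invoke() -> None:
    trace.append("invoke")


asyncio.run(run_middleware([make_middleware("m1"), make_middleware("m2")], invoke))
middleware_trace = list(trace)
trace.clear()
asyncio.run(run_hooks([Hook("h1"), Hook("h2")], invoke))
hooks_trace = list(trace)

print(middleware_trace)  # ['m1.pre', 'm2.pre', 'invoke', 'm2.post', 'm1.post']
print(hooks_trace)       # ['h1.before_run', 'h2.before_run', 'invoke', 'h2.after_run', 'h1.after_run']
```

Both produce the same order when every middleware calls `next`; the difference is who controls sequencing. In the onion, each layer decides whether and when to continue; in the linear pattern, the runner does.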
### 2. Agent vs Session Ownership
Where provider instances live (agent-level vs session-level) is an orthogonal decision that applies to both execution patterns. Each combination has different consequences:
| | **Agent owns instances** | **Session owns instances** |
|--|--------------------------|---------------------------|
| **Middleware (Option 2)** | Agent holds the middleware chain; all sessions share it. Per-session state must be externalized (e.g., passed via context). Pipeline ordering is fixed across sessions. | Each session gets its own middleware chain (via factories). Middleware can hold per-session state internally. Requires factory pattern to construct per-session instances. |
| **Hooks (Option 3)** | Agent holds provider instances; all sessions share them. Per-session state lives in `session.state` dict. Simple flat iteration, no pipeline to construct. | Each session gets its own provider instances (via factories). Providers can hold per-session state internally. Adds factory complexity without the pipeline benefit. |
**Key trade-offs:**
- **Agent-owned + Middleware**: The nested call chain makes it awkward to share — each `process()` call captures `next` in its closure, which may carry session-specific assumptions. Externalizing state is harder when it's interleaved with the wrapping flow.
- **Session-owned + Middleware**: Natural fit — each session gets its own chain with isolated state. But requires factories and heavier sessions.
- **Agent-owned + Hooks**: Natural fit — `before_run`/`after_run` are stateless calls that receive everything they need as parameters (`session`, `context`, `state`). No pipeline to construct, lightweight sessions.
- **Session-owned + Hooks**: Works but adds factory overhead without clear benefit — hooks don't need per-instance state since `session.state` handles isolation.
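A minimal Python sketch of the "agent-owned + hooks" combination: a single provider instance is shared by every session, and isolation comes from namespacing its data under `source_id` inside `session.state`. `Session`, `TurnCounterHooks`, and the keying scheme are assumptions for illustration, and the hook signature is simplified to take only the session.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Session:
    """Hypothetical stand-in for AgentSession: a bag of JSON-friendly state."""
    state: Dict[str, Any] = field(default_factory=dict)


class TurnCounterHooks:
    """One shared instance; all per-session data lives in session.state."""

    def __init__(self, source_id: str) -> None:
        self.source_id = source_id  # no per-session fields on the instance

    async def before_run(self, session: Session) -> None:
        my_state = session.state.setdefault(self.source_id, {"turns": 0})
        my_state["turns"] += 1


shared = TurnCounterHooks("counter")
s1, s2 = Session(), Session()


async def main() -> None:
    await shared.before_run(s1)
    await shared.before_run(s1)
    await shared.before_run(s2)


asyncio.run(main())
print(s1.state)  # {'counter': {'turns': 2}}
print(s2.state)  # {'counter': {'turns': 1}}
```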
### 3. Unified Storage
Instead of a separate `ChatMessageStore`, storage is a subclass of the base context type:
**Middleware:**
```python
class StorageContextMiddleware(ContextMiddleware):
    def __init__(
        self,
        source_id: str,
        *,
        load_messages: bool = True,
        store_inputs: bool = True,
        store_responses: bool = True,
        store_context_messages: bool = False,
        store_context_from: Sequence[str] | None = None,
    ): ...
```
**Hooks:**
```python
class StorageContextHooks(ContextHooks):
    def __init__(
        self,
        source_id: str,
        *,
        load_messages: bool = True,
        store_inputs: bool = True,
        store_responses: bool = True,
        store_context_messages: bool = False,
        store_context_from: Sequence[str] | None = None,
    ): ...
```
**Load Behavior:**
- `load_messages=True` (default): Load messages from storage in `before_run`/pre-processing
- `load_messages=False`: Skip loading; for `StorageContextHooks`, the `before_run` hook is not called at all
**Comparison to Current:**
| Aspect | ChatMessageStore (Current) | Storage Middleware/Hooks (New) |
|--------|---------------------------|------------------------------|
| Load messages | Always via `list_messages()` | Configurable `load_messages` flag |
| Store messages | Always via `add_messages()` | Configurable `store_*` flags |
| What to store | All messages | Selective: inputs, responses, context |
| Injected context | Not supported | `store_context_messages=True/False` + `store_context_from=[source_ids]` for filtering |
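A minimal sketch of how the `store_*` flags might combine to select what gets persisted. The function `messages_to_store`, the use of plain strings for messages, and the ordering (context, then inputs, then responses) are assumptions for illustration, not the proposed implementation.

```python
from typing import Dict, List, Optional, Sequence


def messages_to_store(
    input_messages: List[str],
    response_messages: List[str],
    context_messages: Dict[str, List[str]],
    *,
    store_inputs: bool = True,
    store_responses: bool = True,
    store_context_messages: bool = False,
    store_context_from: Optional[Sequence[str]] = None,
) -> List[str]:
    """Hypothetical selection logic for the store_* flags (messages as plain strings)."""
    selected: List[str] = []
    if store_context_messages:
        for source_id, messages in context_messages.items():
            # When store_context_from is given, only those sources are persisted.
            if store_context_from is None or source_id in store_context_from:
                selected.extend(messages)
    if store_inputs:
        selected.extend(input_messages)
    if store_responses:
        selected.extend(response_messages)
    return selected


ctx = {"rag": ["doc snippet"], "memory": ["user likes tea"]}
print(messages_to_store(["hi"], ["hello"], ctx))
# ['hi', 'hello']
print(messages_to_store(["hi"], ["hello"], ctx, store_context_messages=True, store_context_from=["memory"]))
# ['user likes tea', 'hi', 'hello']
```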
### 4. Source Attribution via `source_id`
Both approaches require a `source_id` for attribution (identical implementation):
```python
class SessionContext:
    context_messages: dict[str, list[ChatMessage]]

    def add_messages(self, source_id: str, messages: Sequence[ChatMessage]) -> None:
        if source_id not in self.context_messages:
            self.context_messages[source_id] = []
        self.context_messages[source_id].extend(messages)

    def get_messages(
        self,
        sources: Sequence[str] | None = None,
        exclude_sources: Sequence[str] | None = None,
    ) -> list[ChatMessage]:
        """Get messages, optionally filtered by source."""
        ...
```
**Benefits:**
- Debug which middleware/hooks added which messages
- Filter messages by source (e.g., exclude RAG from storage)
- Multiple instances of same type distinguishable
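The body of `get_messages` is elided above. One possible implementation of the filtering is sketched below, assuming `sources` acts as an allow-list and `exclude_sources` as a deny-list; `SessionContextSketch` is a hypothetical reduced class with messages simplified to strings. Because Python dicts preserve insertion order, messages are grouped by source in the order each source first contributed.

```python
from typing import Dict, List, Optional, Sequence


class SessionContextSketch:
    """Hypothetical reduced SessionContext; messages are plain strings here."""

    def __init__(self) -> None:
        self.context_messages: Dict[str, List[str]] = {}

    def add_messages(self, source_id: str, messages: Sequence[str]) -> None:
        self.context_messages.setdefault(source_id, []).extend(messages)

    def get_messages(
        self,
        sources: Optional[Sequence[str]] = None,
        exclude_sources: Optional[Sequence[str]] = None,
    ) -> List[str]:
        result: List[str] = []
        for source_id, messages in self.context_messages.items():
            if sources is not None and source_id not in sources:
                continue  # not in the allow-list
            if exclude_sources is not None and source_id in exclude_sources:
                continue  # explicitly excluded
            result.extend(messages)
        return result


ctx = SessionContextSketch()
ctx.add_messages("memory", ["m1"])
ctx.add_messages("rag", ["r1", "r2"])
ctx.add_messages("memory", ["m2"])
print(ctx.get_messages())                         # ['m1', 'm2', 'r1', 'r2']
print(ctx.get_messages(exclude_sources=["rag"]))  # ['m1', 'm2']
print(ctx.get_messages(sources=["rag"]))          # ['r1', 'r2']
```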
**Message-level Attribution:**
In addition to source-based filtering, individual `ChatMessage` objects should have an `attribution` marker in their `additional_properties` dict. This enables external scenarios to filter messages after the full list has been composed from input and context messages:
```python
# Setting attribution on a message
message = ChatMessage(
    role="system",
    text="Relevant context from knowledge base",
    additional_properties={"attribution": "knowledge_base"}
)
# Filtering by attribution (external scenario)
all_messages = context.get_all_messages(include_input=True)
filtered = [m for m in all_messages if m.additional_properties.get("attribution") != "ephemeral"]
```
This is useful for scenarios where filtering by `source_id` is not sufficient, such as when messages from the same source need different treatment.
> **Note:** The `attribution` marker is intended for runtime filtering only and should **not** be propagated to storage. Storage middleware should strip `attribution` from `additional_properties` before persisting messages.
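A minimal sketch of stripping the marker before persistence. It returns copies so the in-memory messages keep their `attribution` for runtime filtering. `Message` is a hypothetical stand-in for `ChatMessage`, and `strip_attribution` is an illustrative helper, not a framework API.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Dict, List


@dataclass
class Message:
    """Hypothetical stand-in for ChatMessage."""
    role: str
    text: str
    additional_properties: Dict[str, Any] = field(default_factory=dict)


def strip_attribution(messages: List[Message]) -> List[Message]:
    # Return copies without the runtime-only "attribution" key; inputs are not mutated.
    return [
        replace(
            m,
            additional_properties={k: v for k, v in m.additional_properties.items() if k != "attribution"},
        )
        for m in messages
    ]


original = Message("system", "kb context", {"attribution": "knowledge_base", "lang": "en"})
stored = strip_attribution([original])
print(stored[0].additional_properties)  # {'lang': 'en'}
print(original.additional_properties)   # {'attribution': 'knowledge_base', 'lang': 'en'}
```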
### 5. Default Storage Behavior
Zero-config works out of the box (both approaches):
```python
# No middleware/hooks configured - still gets conversation history!
agent = ChatAgent(chat_client=client, name="assistant")
session = agent.create_session()
response = await agent.run("Hello!", session=session)
response = await agent.run("What did I say?", session=session) # Remembers!
```
Default in-memory storage is added at runtime **only when**:
- No `service_session_id` (service not managing storage)
- `options.store` is not `True` (user not expecting service storage)
- **No pipeline configured at all** (pipeline is empty or None)
**Important:** If the user configures *any* middleware/hooks (even non-storage ones), the framework does **not** automatically add storage. This is intentional:
- Once users start customizing the pipeline, we treat them as advanced users who know what they are doing, so they should configure storage explicitly
- Automatic insertion would create ordering ambiguity
- Explicit configuration is clearer than implicit behavior
### 6. Instance vs Factory
Both approaches support shared instances and per-session factories:
**Middleware:**
```python
# Instance (shared across sessions)
agent = ChatAgent(context_middleware=[RAGContextMiddleware("rag")])
# Factory (new instance per session)
def create_cache(session_id: str | None) -> ContextMiddleware:
    return SessionCacheMiddleware("cache", session_id=session_id)
agent = ChatAgent(context_middleware=[create_cache])
```
**Hooks:**
```python
# Instance (shared across sessions)
agent = ChatAgent(context_hooks=[RAGContextHooks("rag")])
# Factory (new instance per session)
def create_cache(session_id: str | None) -> ContextHooks:
    return SessionCacheHooks("cache", session_id=session_id)
agent = ChatAgent(context_hooks=[create_cache])
```
### 7. Renaming: Thread → Session
`AgentThread` becomes `AgentSession` to better reflect its purpose:
- "Thread" implies a sequence of messages
- "Session" better captures the broader scope (state, pipeline, lifecycle)
- Aligns with a recent change in the .NET SDK
### 8. Session Serialization/Deserialization
There are two approaches to session serialization:
**Option A: Direct serialization on `AgentSession`**
The session itself provides `to_dict()` and `from_dict()`. The caller controls when and where to persist:
```python
# Serialize
data = session.to_dict() # → {"type": "session", "session_id": ..., "service_session_id": ..., "state": {...}}
json_str = json.dumps(data) # Store anywhere (database, file, cache, etc.)
# Deserialize
data = json.loads(json_str)
session = AgentSession.from_dict(data) # Reconstructs session with all state intact
```
**Option B: Serialization through the agent**
The agent provides `save_session()`/`load_session()` methods that coordinate with providers (e.g., letting providers hook into the serialization process, or validating state before persisting). This adds flexibility but also complexity — providers would need lifecycle hooks for serialization, and the agent becomes responsible for persistence concerns.
**Provider contract (both options):** Any values a provider writes to `session.state`, whether directly or through lifecycle hooks, **must be JSON-serializable** (dicts, lists, strings, numbers, booleans, None).
**Comparison to Current:**
| Aspect | Current (`AgentThread`) | New (`AgentSession`) |
|--------|------------------------|---------------------|
| Serialization | `ChatMessageStore.serialize()` + custom logic | `session.to_dict()` → plain dict |
| Deserialization | `ChatMessageStore.deserialize()` + factory | `AgentSession.from_dict(data)` |
| Provider state | Instance state, needs custom ser/deser | Plain dict values in `session.state` |
### 9. Session Management Methods
Both approaches use identical agent methods:
```python
class ChatAgent:
    def create_session(self, *, session_id: str | None = None) -> AgentSession:
        """Create a new session."""
        ...
    def get_session(self, service_session_id: str, *, session_id: str | None = None) -> AgentSession:
        """Get a session for a service-managed session ID."""
        ...
```
**Usage (identical for both):**
```python
session = agent.create_session()
session = agent.create_session(session_id="custom-id")
session = agent.get_session("existing-service-session-id")
session = agent.get_session("existing-service-session-id", session_id="custom-id")
```
### 10. Accessing Context from Other Middleware/Hooks
Non-storage middleware/hooks can read context added by others via `context.context_messages`. However, they should operate under the assumption that **only the current input messages are available** - there is no implicit conversation history.
If historical context is needed (e.g., RAG using last few messages), maintain a **self-managed buffer**, which would look something like this:
**Middleware:**
```python
class RAGWithBufferMiddleware(ContextMiddleware):
    def __init__(self, source_id: str, retriever: Retriever, *, buffer_window: int = 5):
        super().__init__(source_id)
        self._retriever = retriever
        self._buffer_window = buffer_window
        self._message_buffer: list[ChatMessage] = []
    async def process(self, context: SessionContext, next: ContextMiddlewareNext) -> None:
        # Use buffer + current input for retrieval
        recent = self._message_buffer[-self._buffer_window * 2:]
        query = self._build_query(recent + list(context.input_messages))
        docs = await self._retriever.search(query)
        context.add_messages(self.source_id, [ChatMessage.system(f"Context: {docs}")])
        await next(context)
        # Update buffer
        self._message_buffer.extend(context.input_messages)
        if context.response_messages:
            self._message_buffer.extend(context.response_messages)
```
**Hooks:**
```python
class RAGWithBufferHooks(ContextHooks):
    def __init__(self, source_id: str, retriever: Retriever, *, buffer_window: int = 5):
        super().__init__(source_id)
        self._retriever = retriever
        self._buffer_window = buffer_window
        self._message_buffer: list[ChatMessage] = []
    async def before_run(self, context: SessionContext) -> None:
        recent = self._message_buffer[-self._buffer_window * 2:]
        query = self._build_query(recent + list(context.input_messages))
        docs = await self._retriever.search(query)
        context.add_messages(self.source_id, [ChatMessage.system(f"Context: {docs}")])
    async def after_run(self, context: SessionContext) -> None:
        self._message_buffer.extend(context.input_messages)
        if context.response_messages:
            self._message_buffer.extend(context.response_messages)
```
**Simple RAG (input only, no buffer):**
```python
# Middleware
async def process(self, context, next):
    query = " ".join(msg.text for msg in context.input_messages if msg.text)
    docs = await self._retriever.search(query)
    context.add_messages(self.source_id, [ChatMessage.system(f"Context: {docs}")])
    await next(context)
# Hooks
async def before_run(self, context):
    query = " ".join(msg.text for msg in context.input_messages if msg.text)
    docs = await self._retriever.search(query)
    context.add_messages(self.source_id, [ChatMessage.system(f"Context: {docs}")])
```
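The buffered examples above keep every message and slice off the last `buffer_window * 2` entries on each call. As a result, the buffer itself grows without bound. A `collections.deque` with `maxlen` keeps only what the query needs. This sketch assumes one user and one assistant message per turn (hence `* 2`) and reduces messages to plain strings. `RecentMessageBuffer` is a hypothetical name.

```python
from collections import deque


class RecentMessageBuffer:
    """Keeps at most `buffer_window` turns of message text (user + assistant per turn)."""

    def __init__(self, buffer_window: int = 5) -> None:
        self._messages: deque[str] = deque(maxlen=buffer_window * 2)

    def extend(self, messages: list[str]) -> None:
        # Once maxlen is reached, the oldest entries drop off the left end automatically.
        self._messages.extend(messages)

    def build_query(self, current_input: list[str]) -> str:
        # Recent history first, then the current input, as in the examples above.
        return " ".join([*self._messages, *current_input])


buffer = RecentMessageBuffer(buffer_window=1)
buffer.extend(["user: hi", "assistant: hello"])
buffer.extend(["user: tell me about RAG", "assistant: it retrieves documents"])
query = buffer.build_query(["user: and chunking?"])
```

Under the instance-ownership model this ADR ultimately chooses (instances in the agent, state in the session), such a buffer would live in the session's state rather than on a shared instance.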
### Migration Impact
| Current | Middleware (Option 2) | Hooks (Option 3) |
|---------|----------------------|------------------|
| `ContextProvider` | `ContextMiddleware` | `ContextHooks` |
| `invoking()` | Before `await next(context)` | `before_run()` |
| `invoked()` | After `await next(context)` | `after_run()` |
| `ChatMessageStore` | `StorageContextMiddleware` | `StorageContextHooks` |
| `AgentThread` | `AgentSession` | `AgentSession` |
### Example: Current vs New
**Current:**
```python
class MyContextProvider(ContextProvider):
    async def invoking(self, messages, **kwargs) -> Context:
        docs = await self.retrieve_documents(messages[-1].text)
        return Context(messages=[ChatMessage.system(f"Context: {docs}")])
    async def invoked(self, request, response, **kwargs) -> None:
        await self.store_interaction(request, response)
thread = await agent.get_new_thread(message_store=ChatMessageStore())
thread.context_provider = provider
response = await agent.run("Hello", thread=thread)
```
**New (Middleware):**
```python
class RAGMiddleware(ContextMiddleware):
    async def process(self, context: SessionContext, next) -> None:
        docs = await self.retrieve_documents(context.input_messages[-1].text)
        context.add_messages(self.source_id, [ChatMessage.system(f"Context: {docs}")])
        await next(context)
        await self.store_interaction(context.input_messages, context.response_messages)
agent = ChatAgent(
    chat_client=client,
    context_middleware=[InMemoryStorageMiddleware("memory"), RAGMiddleware("rag")]
)
session = agent.create_session()
response = await agent.run("Hello", session=session)
```
**New (Hooks):**
```python
class RAGHooks(ContextHooks):
    async def before_run(self, context: SessionContext) -> None:
        docs = await self.retrieve_documents(context.input_messages[-1].text)
        context.add_messages(self.source_id, [ChatMessage.system(f"Context: {docs}")])
    async def after_run(self, context: SessionContext) -> None:
        await self.store_interaction(context.input_messages, context.response_messages)
agent = ChatAgent(
    chat_client=client,
    context_hooks=[InMemoryStorageHooks("memory"), RAGHooks("rag")]
)
session = agent.create_session()
response = await agent.run("Hello", session=session)
```
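Both styles produce the same call order when run the way this ADR describes. Middleware nests around `next`, while hooks run `before_run` in list order and `after_run` in reverse, as in the runner code later in this ADR. The sketch below makes this concrete with an event log. All class and function names are hypothetical stand-ins, and the model call is replaced by a log entry.

```python
import asyncio
from collections.abc import Awaitable, Callable

Next = Callable[[list[str]], Awaitable[None]]


class LoggingMiddleware:
    def __init__(self, name: str) -> None:
        self.name = name

    async def process(self, log: list[str], next: Next) -> None:
        log.append(f"{self.name}:before")
        await next(log)  # everything downstream, including the model call
        log.append(f"{self.name}:after")


class LoggingHooks:
    def __init__(self, name: str) -> None:
        self.name = name

    async def before_run(self, log: list[str]) -> None:
        log.append(f"{self.name}:before")

    async def after_run(self, log: list[str]) -> None:
        log.append(f"{self.name}:after")


async def run_middleware(chain: list[LoggingMiddleware], log: list[str]) -> None:
    async def call(index: int, log: list[str]) -> None:
        if index == len(chain):
            log.append("model")  # innermost step: the actual model invocation
            return
        await chain[index].process(log, lambda inner_log: call(index + 1, inner_log))

    await call(0, log)


async def run_hooks(hooks: list[LoggingHooks], log: list[str]) -> None:
    for hook in hooks:
        await hook.before_run(log)
    log.append("model")
    for hook in reversed(hooks):  # reverse order mirrors middleware unwinding
        await hook.after_run(log)


mw_log: list[str] = []
hook_log: list[str] = []
asyncio.run(run_middleware([LoggingMiddleware("memory"), LoggingMiddleware("rag")], mw_log))
asyncio.run(run_hooks([LoggingHooks("memory"), LoggingHooks("rag")], hook_log))
```

The difference lies in the shape of the API, not in the ordering: middleware can also skip or wrap `next` (for example, with `try`/`finally`), which flat hooks cannot express.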
### Instance Ownership Options (for reference)
#### Option A: Instances in Session
The `AgentSession` owns the actual middleware/hooks instances. The pipeline is created when the session is created, and instances are stored in the session.
```python
class AgentSession:
    """Session owns the middleware instances."""
    def __init__(
        self,
        *,
        session_id: str | None = None,
        context_pipeline: ContextMiddlewarePipeline | None = None,  # Owns instances
    ):
        self._session_id = session_id or str(uuid.uuid4())
        self._context_pipeline = context_pipeline  # Actual instances live here
class ChatAgent:
    def __init__(
        self,
        chat_client: ...,
        *,
        context_middleware: Sequence[ContextMiddlewareConfig] | None = None,
    ):
        self._context_middleware_config = list(context_middleware or [])
    def create_session(self, *, session_id: str | None = None) -> AgentSession:
        """Create session with resolved middleware instances."""
        resolved_id = session_id or str(uuid.uuid4())
        # Resolve factories and create actual instances
        pipeline = None
        if self._context_middleware_config:
            pipeline = ContextMiddlewarePipeline.from_config(
                self._context_middleware_config,
                session_id=resolved_id,
            )
        return AgentSession(
            session_id=resolved_id,
            context_pipeline=pipeline,  # Session owns the instances
        )
    async def run(self, input: str, *, session: AgentSession) -> AgentResponse:
        # Session's pipeline executes
        context = await session.run_context_pipeline(input_messages)
        # ... invoke model ...
```
**Pros:**
- Self-contained session - all state and behavior together
- Middleware can maintain per-session instance state naturally
- Session given to another agent will work the same way
**Cons:**
- Session becomes heavier (instances + state)
- Complicated serialization - serialization needs to deal with instances, which might include non-serializable things like clients or connections
- Harder to share stateless middleware across sessions efficiently
- Factories must be re-resolved for each session
#### Option B: Instances in Agent, State in Session (CHOSEN)
The agent owns and manages the middleware/hooks instances. The `AgentSession` only stores state data that middleware reads/writes. The agent's runner executes the pipeline using the session's state.
Two variants exist for how state is stored in the session:
##### Option B1: Simple Dict State (CHOSEN)
The session stores state as a simple `dict[str, Any]`. Each plugin receives the **whole state dict**, and since dicts are mutable in Python, plugins can modify it in place without needing to return a value.
```python
class AgentSession:
    """Session only holds state as a simple dict."""
    def __init__(self, *, session_id: str | None = None):
        self.session_id = session_id or str(uuid.uuid4())
        self.service_session_id: str | None = None
        self.state: dict[str, Any] = {}  # Mutable state dict
class ChatAgent:
    def __init__(
        self,
        chat_client: ...,
        *,
        context_providers: Sequence[ContextProvider] | None = None,
    ):
        # Agent owns the actual plugin instances
        self._context_providers = list(context_providers or [])
    def create_session(self, *, session_id: str | None = None) -> AgentSession:
        """Create lightweight session with just state."""
        return AgentSession(session_id=session_id)
    async def run(self, input: str, *, session: AgentSession) -> AgentResponse:
        context = SessionContext(
            session_id=session.session_id,
            input_messages=[...],
        )
        # Before-run plugins
        for plugin in self._context_providers:
            # Skip before_run for HistoryProviders that don't load messages
            if isinstance(plugin, HistoryProvider) and not plugin.load_messages:
                continue
            await plugin.before_run(self, session, context, session.state)
        # assemble final input messages from context
        # ... actual running, i.e. `get_response` for ChatAgent ...
        # After-run plugins (reverse order)
        for plugin in reversed(self._context_providers):
            await plugin.after_run(self, session, context, session.state)
# Plugin that maintains state - modifies dict in place
class InMemoryHistoryProvider(ContextProvider):
    async def before_run(
        self,
        agent: "SupportsAgentRun",
        session: AgentSession,
        context: SessionContext,
        state: dict[str, Any],
    ) -> None:
        # Read from state (use source_id as key for namespace)
        my_state = state.get(self.source_id, {})
        messages = my_state.get("messages", [])
        context.extend_messages(self.source_id, messages)
    async def after_run(
        self,
        agent: "SupportsAgentRun",
        session: AgentSession,
        context: SessionContext,
        state: dict[str, Any],
    ) -> None:
        # Modify state dict in place - no return needed
        my_state = state.setdefault(self.source_id, {})
        messages = my_state.get("messages", [])
        my_state["messages"] = [
            *messages,
            *context.input_messages,
            *(context.response.messages or []),
        ]
# Stateless plugin - ignores state
class TimeContextProvider(ContextProvider):
    async def before_run(
        self,
        agent: "SupportsAgentRun",
        session: AgentSession,
        context: SessionContext,
        state: dict[str, Any],
    ) -> None:
        context.extend_instructions(self.source_id, f"Current time: {datetime.now()}")
    async def after_run(
        self,
        agent: "SupportsAgentRun",
        session: AgentSession,
        context: SessionContext,
        state: dict[str, Any],
    ) -> None:
        pass  # No state, nothing to do after
```
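To see the B1 data flow end to end without the framework, here is a stripped-down simulation. Everything in it is an assumption for illustration: messages are plain strings, the model is a stub that reports how many messages it received, and `SessionContext`/`run` carry only what this section uses. The simulation shows that two runs against the same session accumulate history purely through the mutable `state` dict, and that the dict stays JSON-serializable.

```python
import asyncio
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SessionContext:
    input_messages: list[str]
    context_messages: dict[str, list[str]] = field(default_factory=dict)
    response_messages: list[str] = field(default_factory=list)


class InMemoryHistoryProvider:
    def __init__(self, source_id: str = "in_memory") -> None:
        self.source_id = source_id

    async def before_run(self, context: SessionContext, state: dict[str, Any]) -> None:
        history = state.get(self.source_id, {}).get("messages", [])
        context.context_messages.setdefault(self.source_id, []).extend(history)

    async def after_run(self, context: SessionContext, state: dict[str, Any]) -> None:
        my_state = state.setdefault(self.source_id, {})  # namespace by source_id
        my_state["messages"] = [
            *my_state.get("messages", []),
            *context.input_messages,
            *context.response_messages,
        ]


async def run(providers: list[InMemoryHistoryProvider], state: dict[str, Any], text: str) -> list[str]:
    context = SessionContext(input_messages=[text])
    for provider in providers:
        await provider.before_run(context, state)
    prompt = [m for msgs in context.context_messages.values() for m in msgs] + context.input_messages
    context.response_messages = [f"echo({len(prompt)} msgs): {text}"]  # stub model
    for provider in reversed(providers):
        await provider.after_run(context, state)
    return prompt


session_state: dict[str, Any] = {}
providers = [InMemoryHistoryProvider()]
asyncio.run(run(providers, session_state, "Hello!"))
second_prompt = asyncio.run(run(providers, session_state, "What did I say?"))
serialized = json.dumps(session_state)  # plain data, so no custom encoder needed
```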
##### Option B2: SessionState Object
The session stores state in a dedicated `SessionState` object. Each hook receives its own state slice through a mutable wrapper that writes back automatically.
```python
class HookState:
    """Mutable wrapper for a single hook's state.
    Changes are written back to the session state automatically.
    """
    def __init__(self, session_state: dict[str, dict[str, Any]], source_id: str):
        self._session_state = session_state
        self._source_id = source_id
        if source_id not in session_state:
            session_state[source_id] = {}
    def get(self, key: str, default: Any = None) -> Any:
        return self._session_state[self._source_id].get(key, default)
    def set(self, key: str, value: Any) -> None:
        self._session_state[self._source_id][key] = value
    def update(self, values: dict[str, Any]) -> None:
        self._session_state[self._source_id].update(values)
class SessionState:
    """Structured state container for a session."""
    def __init__(self, session_id: str):
        self.session_id = session_id
        self.service_session_id: str | None = None
        self._hook_state: dict[str, dict[str, Any]] = {}  # source_id -> state
    def get_hook_state(self, source_id: str) -> HookState:
        """Get mutable state wrapper for a specific hook."""
        return HookState(self._hook_state, source_id)
class AgentSession:
    """Session holds a SessionState object."""
    def __init__(self, *, session_id: str | None = None):
        self._session_id = session_id or str(uuid.uuid4())
        self._state = SessionState(self._session_id)
    @property
    def state(self) -> SessionState:
        return self._state
class ContextHooksRunner:
    """Agent-owned runner that executes hooks with session state."""
    def __init__(self, hooks: Sequence[ContextHooks]):
        self._hooks = list(hooks)
    async def run_before(
        self,
        context: SessionContext,
        session_state: SessionState,
    ) -> None:
        """Run before_run for all hooks."""
        for hook in self._hooks:
            my_state = session_state.get_hook_state(hook.source_id)
            await hook.before_run(context, my_state)
    async def run_after(
        self,
        context: SessionContext,
        session_state: SessionState,
    ) -> None:
        """Run after_run for all hooks in reverse order."""
        for hook in reversed(self._hooks):
            my_state = session_state.get_hook_state(hook.source_id)
            await hook.after_run(context, my_state)
# Hook uses HookState wrapper - no return needed
class InMemoryStorageHooks(ContextHooks):
    async def before_run(
        self,
        context: SessionContext,
        state: HookState,  # Mutable wrapper
    ) -> None:
        messages = state.get("messages", [])
        context.add_messages(self.source_id, messages)
    async def after_run(
        self,
        context: SessionContext,
        state: HookState,  # Mutable wrapper
    ) -> None:
        messages = state.get("messages", [])
        state.set("messages", [
            *messages,
            *context.input_messages,
            *(context.response_messages or []),
        ])
# Stateless hook - state wrapper provided but not used
class TimeContextHooks(ContextHooks):
    async def before_run(
        self,
        context: SessionContext,
        state: HookState,
    ) -> None:
        context.add_instructions(self.source_id, f"Current time: {datetime.now()}")
    async def after_run(
        self,
        context: SessionContext,
        state: HookState,
    ) -> None:
        pass  # Nothing to do
```
**Option B Pros (both variants):**
- Lightweight sessions - just data, serializable via `to_dict()`/`from_dict()`
- Plugin instances shared across sessions (more memory efficient)
- Clearer separation: agent = behavior, session = state
**Option B Cons (both variants):**
- More complex execution model (agent + session coordination)
- Plugins must explicitly read/write state (no implicit instance variables)
- A session handed to another agent may not work (that agent may have a different plugin configuration)
**B1 vs B2:**
| Aspect | B1: Simple Dict (CHOSEN) | B2: SessionState Object |
|--------|-----------------|-------------------------|
| Simplicity | Simpler, less abstraction | More structure, helper methods |
| State passing | Whole dict passed, mutate in place | Mutable wrapper, no return needed |
| Type safety | `dict[str, Any]` - loose | Can add type hints on methods |
| Extensibility | Add keys as needed | Can add methods/validation |
| Serialization | Direct JSON serialization | Need custom serialization |
#### Comparison
| Aspect | Option A: Instances in Session | Option B: Instances in Agent (CHOSEN) |
|--------|-------------------------------|------------------------------|
| Session weight | Heavier (instances + state) | Lighter (state only) |
| Plugin sharing | Per-session instances | Shared across sessions |
| Instance state | Natural (instance variables) | Explicit (state dict) |
| Serialization | Serialize session + plugins | `session.to_dict()`/`AgentSession.from_dict()` |
| Factory handling | Resolved at session creation | Not needed (state dict handles per-session needs) |
| Signature | `before_run(context)` | `before_run(agent, session, context, state)` |
| Session portability | Works with any agent | Tied to the agent's plugin configuration |
#### Factories Not Needed with Option B
With Option B (instances in agent, state in session), the plugins are shared across sessions and the explicit state dict handles per-session needs. Therefore, **factory support is not needed**:
- State is externalized to the session's `state: dict[str, Any]`
- If a plugin needs per-session initialization, it can do so in `before_run` on first call (checking if state is empty)
- All plugins are shared across sessions (more memory efficient)
- Plugins use `state.setdefault(self.source_id, {})` to namespace their state
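The "initialize on first call" idea from the list above might look like this under the whole-dict model described here. This is a sketch only. `CounterProvider` and its `turns` key are hypothetical. One shared instance serves two sessions, and each session's data lives in that session's own state dict.

```python
import asyncio
from typing import Any


class CounterProvider:
    """Shared, stateless instance; all per-session data lives in `state`."""

    def __init__(self, source_id: str = "counter") -> None:
        self.source_id = source_id

    async def before_run(self, state: dict[str, Any]) -> None:
        my_state = state.setdefault(self.source_id, {})
        if not my_state:
            # First call for this session: per-session setup goes here instead of a factory.
            my_state["turns"] = 0
        my_state["turns"] += 1


provider = CounterProvider()
session_a: dict[str, Any] = {}
session_b: dict[str, Any] = {}
for state in (session_a, session_a, session_b):
    asyncio.run(provider.before_run(state))
```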
---
## Decision Outcome
### Decision 1: Execution Pattern
**Chosen: Option 3 - Hooks (Pre/Post Pattern)** with the following naming:
- **Class name:** `ContextProvider` (emphasizes extensibility, familiar from build tools, and does not favor reading or writing)
- **Method names:** `before_run` / `after_run` (matches `agent.run()` terminology)
Rationale:
- Simpler mental model: "before" runs before, "after" runs after - no nesting to understand
- Easier to implement plugins that only need one phase (just override one method)
- More similar to the current `ContextProvider` API (`invoking`/`invoked`), easing migration
- Clearer separation between what this does vs what Agent Middleware can do
Both options share:
- Agent vs Session ownership model
- `source_id` attribution
- Natively serializable sessions (state dict is JSON-serializable)
- Session management methods (`create_session`, `get_session`)
- Renaming `AgentThread` → `AgentSession`
### Decision 2: Instance Ownership (Orthogonal)
**Chosen: Option B1 - Instances in Agent, State in Session (Simple Dict)**
The agent (any `SupportsAgentRun` implementation) owns and manages the `ContextProvider` instances. The `AgentSession` only stores state as a mutable `dict[str, Any]`. Each plugin receives the **whole state dict** (not just its own slice), and since a dict is mutable, no return value is needed - plugins modify the dict in place.
Rationale for B over A:
- Lightweight sessions - just data, serializable via `to_dict()`/`from_dict()`
- Plugin instances shared across sessions (more memory efficient)
- Clearer separation: agent = behavior, session = state
- Factories not needed - state dict handles per-session needs
Rationale for B1 over B2: Simpler is better. The whole state dict is passed to each plugin, and since Python dicts are mutable, plugins can modify state in place without returning anything. This is the most Pythonic approach.
> **Note on trust:** Since all `ContextProvider` instances reason over conversation messages (which may contain sensitive user data), they should be **trusted by default**. This is also why we allow all plugins to see all state - if a plugin is untrusted, it shouldn't be in the pipeline at all. The whole state dict is passed rather than isolated slices because plugins that handle messages already have access to the full conversation context.
### Addendum (2026-02-17): Provider-scoped hook state and default source IDs
This addendum introduces a **breaking change** that supersedes earlier references in this ADR where hooks received the
entire `session.state` object as their `state` parameter.
#### Hook state contract
- `before_run` and `after_run` now receive a **provider-scoped** mutable state dict.
- The framework passes `session.state.setdefault(provider.source_id, {})` to hook `state`.
- Cross-provider/global inspection remains available through `session.state` on `AgentSession`.
#### Session requirement and fallback behavior
- Provider hooks must use session-backed scoped state; there is no ad-hoc `{}` fallback state.
- If providers run without a caller-supplied session, the framework creates an internal run-scoped `AgentSession` and
passes provider-scoped state from that session.
#### Migration guidance
Migrate provider implementations and samples from nested access to scoped access:
- `state[self.source_id]["key"]` → `state["key"]`
- `state.setdefault(self.source_id, {})["key"]` → `state["key"]`
#### DEFAULT_SOURCE_ID standardization
Aligned with and extending [PR #3944](https://github.com/microsoft/agent-framework/pull/3944), all built-in/connector
providers in this surface now define a `DEFAULT_SOURCE_ID` and allow constructor override via `source_id`.
Naming convention:
- snake_case
- close to the provider class name
- history providers may use `*_memory` where differentiation is useful
Defaults introduced by this change:
- `InMemoryHistoryProvider.DEFAULT_SOURCE_ID = "in_memory"`
- `Mem0ContextProvider.DEFAULT_SOURCE_ID = "mem0"`
- `RedisContextProvider.DEFAULT_SOURCE_ID = "redis"`
- `RedisHistoryProvider.DEFAULT_SOURCE_ID = "redis_memory"`
- `AzureAISearchContextProvider.DEFAULT_SOURCE_ID = "azure_ai_search"`
- `FoundryMemoryProvider.DEFAULT_SOURCE_ID = "foundry_memory"`
## Comparison to .NET Implementation
The .NET Agent Framework provides equivalent functionality through a different structure. Both implementations achieve the same goals using idioms natural to their respective languages.
### Concept Mapping
| .NET Concept | Python (Chosen) |
|--------------|-----------------|
| `AIContextProvider` (abstract base) | `ContextProvider` |
| `ChatHistoryProvider` (abstract base) | `HistoryProvider` |
| `AIContext` (return from `InvokingAsync`) | `SessionContext` (mutable, passed through) |
| `AgentSession` / `ChatClientAgentSession` | `AgentSession` |
| `InMemoryChatHistoryProvider` | `InMemoryHistoryProvider` |
| `ChatClientAgentOptions` factory delegates | Not needed - state dict handles per-session needs |
### Feature Equivalence
Both platforms provide the same core capabilities:
| Capability | .NET | Python |
|------------|------|--------|
| Inject context before invocation | `AIContextProvider.InvokingAsync()` → returns `AIContext` with `Instructions`, `Messages`, `Tools` | `ContextProvider.before_run()` → mutates `SessionContext` in place |
| React after invocation | `AIContextProvider.InvokedAsync()` | `ContextProvider.after_run()` |
| Load conversation history | `ChatHistoryProvider.InvokingAsync()` → returns `IEnumerable` | `HistoryProvider.before_run()` → calls `context.extend_messages()` |
| Store conversation history | `ChatHistoryProvider.InvokedAsync()` | `HistoryProvider.after_run()` → calls `save_messages()` |
| Session serialization | `Serialize()` on providers → `JsonElement` | `session.to_dict()`/`AgentSession.from_dict()` — providers write JSON-serializable values to `session.state` |
| Factory-based creation | Factory delegates on `ChatClientAgentOptions` | Not needed - state dict handles per-session needs |
| Default storage | Auto-injects `InMemoryChatHistoryProvider` when no `ChatHistoryProvider` or `ConversationId` is set | Auto-injects `InMemoryHistoryProvider` when no providers are configured and neither `conversation_id` nor `store=True` is set |
| Service-managed history | `ConversationId` property (mutually exclusive with `ChatHistoryProvider`) | `service_session_id` on `AgentSession` |
| Message reduction | `IChatReducer` on `InMemoryChatHistoryProvider` | Not yet designed (see Open Discussion: Context Compaction) |
### Implementation Differences
The implementations differ in ways idiomatic to each language:
| Aspect | .NET Approach | Python Approach |
|--------|---------------|-----------------|
| **Context providers** | Separate `AIContextProvider` and `ChatHistoryProvider` (one of each per session) | Unified list of `ContextProvider` (multiple) |
| **Composition** | One of each provider type per session | Unlimited providers in pipeline |
| **Context passing** | `InvokingAsync()` returns `AIContext` (instructions + messages + tools) | `before_run()` mutates `SessionContext` in place |
| **Response access** | `InvokedContext` carries response messages | `SessionContext.response` carries full `AgentResponse` (messages, response_id, usage_details, etc.) |
| **Type system** | Strict abstract classes, compile-time checks | Duck typing, protocols, runtime flexibility |
| **Configuration** | Factory delegates on `ChatClientAgentOptions` | Direct instantiation, list of instances |
| **State management** | Instance state in providers, serialized via `JsonElement` | Explicit state dict in session, serialized via `session.to_dict()` |
| **Default storage** | Auto-injects `InMemoryChatHistoryProvider` when neither `ChatHistoryProvider` nor `ConversationId` is set | Auto-injects `InMemoryHistoryProvider` when no providers are configured and neither `conversation_id` nor `store=True` is set |
| **Source tracking** | Limited - `message.source_id` in observability/DevUI only | Built-in `source_id` on every provider, keyed in `context_messages` dict |
| **Service discovery** | `GetService()` on providers and sessions | Not applicable - Python uses direct references |
### Design Trade-offs
Each approach has trade-offs that align with language conventions:
**.NET's separate provider types:**
- Clearer separation between context injection and history storage
- Easier to detect "missing storage" and auto-inject defaults (checks for `ChatHistoryProvider` or `ConversationId`)
- Type system enforces single provider of each type
- `AIContext` return type makes it clear what context is being added (instructions vs messages vs tools)
- `GetService()` pattern enables provider discovery without tight coupling
**Python's unified pipeline:**
- Single abstraction for all context concerns
- Multiple instances of same type (e.g., multiple storage backends with different `source_id`s)
- More explicit - customization means owning full configuration
- `source_id` enables filtering/debugging across all sources
- Mutable `SessionContext` avoids allocating return objects
- Explicit state dict makes serialization trivial (no `JsonElement` layer)
Neither approach is inherently better - they reflect different language philosophies while achieving equivalent functionality. The Python design embraces the "we're all consenting adults" philosophy, while .NET provides more compile-time guardrails.
---
## Open Discussion: Context Compaction
### Problem Statement
A common need for long-running agents is **context compaction** - automatically summarizing or truncating conversation history when approaching token limits. This is particularly important for agents that make many tool calls in succession (10s or 100s), where the context can grow unboundedly.
Currently, this is challenging because:
- `ChatMessageStore.list_messages()` is only called once at the start of `agent.run()`, not during the tool loop
- `ChatMiddleware` operates on a copy of messages, so modifications don't persist across tool loop iterations
- The function calling loop happens deep within the `ChatClient`, which is below the agent level
### Design Question
Should `ContextProvider` be invoked:
1. **Only at agent invocation boundaries** (current proposal) - before/after each `agent.run()` call
2. **During the tool loop** - before/after each model call within a single `agent.run()`
### Boundary vs In-Run Compaction
While boundary and in-run compaction could potentially use the same mechanism, they have **different goals and behaviors**:
**Boundary compaction** (before/after `agent.run()`):
- **Before run**: Keep context manageable - load a compacted view of history
- **After run**: Keep storage compact - summarize/truncate before persisting
- Useful for maintaining reasonable context sizes across conversation turns
- One reason to have **multiple storage plugins**: persist compacted history for use during runs, while also storing the full uncompacted history for auditing and evaluations
**In-run compaction** (during function calling loops):
- Relevant for **function calling scenarios** where many tool calls accumulate
- Typically **in-memory only**: intermediate compaction does not need to be persisted, and it is only useful when the conversation/session is _not_ managed by the service
- Different strategies apply:
  - Remove old function call/result pairs entirely
  - Keep only the most recent N tool interactions
  - Replace call/result pairs with a single summary message (with a different role)
  - Summarize several function call/result pairs into one larger context message
### Service-Managed vs Local Storage
**Important:** In-run compaction is relevant only for **non-service-managed histories**. When using service-managed storage (`service_session_id` is set):
- The service handles history management internally
- Only the new calls and results are sent to/from the service each turn
- The service applies its own compaction strategy, which we do not control
For local storage, a full message list is sent to the model each time, making compaction the client's responsibility.
### Options
**Option A: Invocation-boundary only (current proposal)**
- Simpler mental model
- Consistent with `AgentMiddleware` pattern
- In-run compaction would need to happen via a separate mechanism (e.g., `ChatMiddleware` at the client level)
- Risk: Different compaction mechanisms at different layers could be confusing
**Option B: Also during tool loops**
- Single mechanism for all context manipulation
- More powerful but more complex
- Requires coordination with `ChatClient` internals
- Risk: Performance overhead if plugins are expensive
**Option C: Unified approach across layers**
- Define a single context compaction abstraction that works at both agent and client levels
- `ContextPlugin` could delegate to `ChatMiddleware` for mid-loop execution
- Requires deeper architectural thought
### Potential Extension Points (for any option)
Regardless of the chosen approach, these extension points could support compaction:
- A `CompactionStrategy` that can be shared between plugins and function calling configuration
- Hooks for `ChatClient` to notify the agent layer when context limits are approaching
- A unified `ContextManager` that coordinates compaction across layers
- **Message-level attribution**: The `attribution` marker in `ChatMessage.additional_properties` can be used during compaction to identify messages that should be preserved (e.g., `attribution: "important"`) or that are safe to remove (e.g., `attribution: "ephemeral"`). This prevents accidental filtering of critical context during aggressive compaction.
> **Note:** The .NET SDK currently has a `ChatReducer` interface for context reduction/compaction. We should consider adopting similar naming in Python (e.g., `ChatReducer` or `ContextReducer`) for cross-platform consistency.
**This section requires further discussion.**
## Implementation Plan
See **Appendix A** for class hierarchy, API signatures, and user experience examples.
See the **Workplan** at the end for PR breakdown and reference implementation.
---
## Appendix A: API Overview
### Class Hierarchy
```
ContextProvider (base - hooks pattern)
├── HistoryProvider (storage subclass)
│ ├── InMemoryHistoryProvider (built-in)
│ ├── RedisHistoryProvider (packages/redis)
│ └── CosmosHistoryProvider (packages/azure-ai)
├── AzureAISearchContextProvider (packages/azure-ai-search)
├── Mem0ContextProvider (packages/mem0)
└── (custom user providers)
AgentSession (lightweight state container)
SessionContext (per-invocation state)
```
### ContextProvider
```python
from abc import ABC
from typing import Any


class ContextProvider(ABC):
    """Base class for context providers (hooks pattern).

    Context providers participate in the context engineering pipeline,
    adding context before model invocation and processing responses after.

    Attributes:
        source_id: Unique identifier for this provider instance (required).
            Used for message/tool attribution so other providers can filter.
    """

    def __init__(self, source_id: str):
        self.source_id = source_id

    async def before_run(
        self,
        agent: "SupportsAgentRun",
        session: "AgentSession",
        context: "SessionContext",
        state: dict[str, Any],
    ) -> None:
        """Called before model invocation. Override to add context."""
        pass

    async def after_run(
        self,
        agent: "SupportsAgentRun",
        session: "AgentSession",
        context: "SessionContext",
        state: dict[str, Any],
    ) -> None:
        """Called after model invocation. Override to process response."""
        pass
```
> **Serialization contract:** Any values a provider writes to `state` must be JSON-serializable. Sessions are serialized via `session.to_dict()` and restored via `AgentSession.from_dict()`.
> **Agent-agnostic:** The `agent` parameter is typed as `SupportsAgentRun` (the base protocol), not `ChatAgent`. Context providers work with any agent implementation.
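The serialization contract above can be checked cheaply during development. The helper below is a hypothetical sketch, not part of the proposed API: it tries `json.dumps` on each provider's slice of `state` and reports the offending `source_id`. Note that JSON-serializable does not mean round-trip identical; for example, tuples come back as lists after `AgentSession.from_dict()`.

```python
import json
from typing import Any


def check_state_serializable(state: dict[str, Any]) -> None:
    """Raise TypeError naming the first provider key whose state is not JSON-serializable."""
    for source_id, value in state.items():
        try:
            json.dumps(value)
        except (TypeError, ValueError) as exc:  # ValueError covers circular references
            raise TypeError(f"state[{source_id!r}] is not JSON-serializable: {exc}") from exc


check_state_serializable({"prefs": {"preferences": {"tone": "formal"}}})  # passes

try:
    check_state_serializable({"cache": {"seen": {1, 2}}})  # sets are not JSON types
except TypeError as err:
    print(err)
```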
### HistoryProvider
```python
from abc import abstractmethod
from collections.abc import Sequence


class HistoryProvider(ContextProvider):
    """Base class for conversation history storage providers.

    Subclasses only need to implement get_messages() and save_messages().
    The default before_run/after_run handle loading and storing based on
    configuration flags. Override them for custom behavior.

    A single class configured for different use cases:
    - Primary memory storage (loads + stores messages)
    - Audit/logging storage (stores only, doesn't load)
    - Evaluation storage (stores only for later analysis)

    Loading behavior:
    - `load_messages=True` (default): Load messages from storage in before_run
    - `load_messages=False`: Agent skips `before_run` entirely (audit/logging mode)

    Storage behavior:
    - `store_inputs`: Store input messages (default True)
    - `store_responses`: Store response messages (default True)
    - `store_context_messages`: Also store context from other providers (default False)
    - `store_context_from`: Only store from specific source_ids (default None = all)
    """

    def __init__(
        self,
        source_id: str,
        *,
        load_messages: bool = True,
        store_inputs: bool = True,
        store_responses: bool = True,
        store_context_messages: bool = False,
        store_context_from: Sequence[str] | None = None,
    ):
        super().__init__(source_id)
        self.load_messages = load_messages
        self.store_inputs = store_inputs
        self.store_responses = store_responses
        self.store_context_messages = store_context_messages
        self.store_context_from = store_context_from

    # --- Subclasses implement these ---

    @abstractmethod
    async def get_messages(self, session_id: str | None) -> list[ChatMessage]:
        """Retrieve stored messages for this session."""
        ...

    @abstractmethod
    async def save_messages(self, session_id: str | None, messages: Sequence[ChatMessage]) -> None:
        """Persist messages for this session."""
        ...

    # --- Default implementations (override for custom behavior) ---

    async def before_run(self, agent, session, context, state) -> None:
        """Load history into context. Skipped by the agent when load_messages=False."""
        history = await self.get_messages(context.session_id)
        context.extend_messages(self.source_id, history)

    async def after_run(self, agent, session, context, state) -> None:
        """Store messages based on store_* configuration flags."""
        messages_to_store: list[ChatMessage] = []
        # Optionally include context from other providers
        if self.store_context_messages:
            if self.store_context_from:
                messages_to_store.extend(context.get_messages(sources=self.store_context_from))
            else:
                messages_to_store.extend(context.get_messages(exclude_sources=[self.source_id]))
        if self.store_inputs:
            messages_to_store.extend(context.input_messages)
        if self.store_responses and context.response is not None and context.response.messages:
            messages_to_store.extend(context.response.messages)
        if messages_to_store:
            await self.save_messages(context.session_id, messages_to_store)
```
### SessionContext
```python
from collections.abc import Sequence
from typing import Any


class SessionContext:
    """Per-invocation state passed through the context provider pipeline.

    Created fresh for each agent.run() call. Providers read from and write to
    the mutable fields to add context before invocation and process responses after.

    Attributes:
        session_id: The ID of the current session
        service_session_id: Service-managed session ID (if present)
        input_messages: New messages being sent to the agent (set by caller)
        context_messages: Dict mapping source_id -> messages added by that provider.
            Maintains insertion order (provider execution order).
        instructions: Additional instructions - providers can append here
        tools: Additional tools - providers can append here
        response (property): After invocation, contains the full AgentResponse (set by agent).
            Includes response.messages, response.response_id, response.agent_id,
            response.usage_details, etc. Read-only property - use AgentMiddleware to modify.
        options: Options passed to agent.run() - READ-ONLY, for reflection only
        metadata: Shared metadata dictionary for cross-provider communication
    """

    def __init__(
        self,
        *,
        session_id: str | None = None,
        service_session_id: str | None = None,
        input_messages: list[ChatMessage],
        context_messages: dict[str, list[ChatMessage]] | None = None,
        instructions: list[str] | None = None,
        tools: list[ToolProtocol] | None = None,
        options: dict[str, Any] | None = None,
        metadata: dict[str, Any] | None = None,
    ):
        ...  # other field assignments elided
        self._response: "AgentResponse | None" = None

    @property
    def response(self) -> "AgentResponse | None":
        """The agent's response. Set by the framework after invocation, read-only for providers."""
        ...

    def extend_messages(self, source_id: str, messages: Sequence[ChatMessage]) -> None:
        """Add context messages from a specific source."""
        ...

    def extend_instructions(self, source_id: str, instructions: str | Sequence[str]) -> None:
        """Add instructions to be prepended to the conversation."""
        ...

    def extend_tools(self, source_id: str, tools: Sequence[ToolProtocol]) -> None:
        """Add tools with source attribution in tool.metadata."""
        ...

    def get_messages(
        self,
        *,
        sources: Sequence[str] | None = None,
        exclude_sources: Sequence[str] | None = None,
        include_input: bool = False,
        include_response: bool = False,
    ) -> list[ChatMessage]:
        """Get context messages, optionally filtered and optionally including input/response.

        Returns messages in provider execution order (dict insertion order),
        with input and response appended if requested.
        """
        ...
```
### AgentSession (Decision B1)
```python
import uuid
from typing import Any


class AgentSession:
    """A conversation session with an agent.

    Lightweight state container. Provider instances are owned by the agent,
    not the session. The session only holds session IDs and a mutable state dict.
    """

    def __init__(self, *, session_id: str | None = None):
        self._session_id = session_id or str(uuid.uuid4())
        self.service_session_id: str | None = None
        self.state: dict[str, Any] = {}

    @property
    def session_id(self) -> str:
        return self._session_id

    def to_dict(self) -> dict[str, Any]:
        """Serialize session to a plain dict."""
        return {
            "type": "session",
            "session_id": self._session_id,
            "service_session_id": self.service_session_id,
            "state": self.state,
        }

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "AgentSession":
        """Restore session from a dict."""
        session = cls(session_id=data["session_id"])
        session.service_session_id = data.get("service_session_id")
        session.state = data.get("state", {})
        return session
```
### ChatAgent Integration
```python
class ChatAgent:
    def __init__(
        self,
        chat_client: ...,
        *,
        context_providers: Sequence[ContextProvider] | None = None,
    ):
        self._context_providers = list(context_providers or [])

    def create_session(self, *, session_id: str | None = None) -> AgentSession:
        """Create a new lightweight session."""
        return AgentSession(session_id=session_id)

    def get_session(self, service_session_id: str, *, session_id: str | None = None) -> AgentSession:
        """Get or create a session for a service-managed session ID."""
        session = AgentSession(session_id=session_id)
        session.service_session_id = service_session_id
        return session

    async def run(self, input: str, *, session: AgentSession, options: dict[str, Any] | None = None) -> AgentResponse:
        options = options or {}
        # Auto-add InMemoryHistoryProvider only when no providers are configured and
        # the service is not managing history (no service session, conversation_id, or store=True)
        service_managed = bool(
            session.service_session_id
            or options.get("conversation_id")
            or options.get("store") is True
        )
        if not self._context_providers and not service_managed:
            self._context_providers.append(InMemoryHistoryProvider("memory"))
        context = SessionContext(
            session_id=session.session_id,
            service_session_id=session.service_session_id,
            input_messages=[...],  # `input` converted to ChatMessage objects (elided)
            options=options,
        )
        # Before-run providers (forward order, skip HistoryProviders with load_messages=False)
        for provider in self._context_providers:
            if isinstance(provider, HistoryProvider) and not provider.load_messages:
                continue
            await provider.before_run(self, session, context, session.state)
        # ... assemble messages, invoke model ...
        context._response = response  # Set the full AgentResponse for after_run access
        # After-run providers (reverse order)
        for provider in reversed(self._context_providers):
            await provider.after_run(self, session, context, session.state)
        return response
```
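The ordering rule in `run()` (hooks run forward before the model call and in reverse after it) can be demonstrated with stub providers that only record when they are called. `RecordingProvider` and `run_pipeline` are hypothetical names for this sketch; hook arguments are omitted for brevity.

```python
import asyncio


class RecordingProvider:
    """Hypothetical provider that only logs when each hook runs."""

    def __init__(self, source_id: str, log: list[str]) -> None:
        self.source_id = source_id
        self.log = log

    async def before_run(self) -> None:
        self.log.append(f"before:{self.source_id}")

    async def after_run(self) -> None:
        self.log.append(f"after:{self.source_id}")


async def run_pipeline(providers: list[RecordingProvider]) -> None:
    # Forward order: later providers can see what earlier ones added
    for p in providers:
        await p.before_run()
    # ... model invocation would happen here ...
    # Reverse order: providers unwind like nested context managers
    for p in reversed(providers):
        await p.after_run()


log: list[str] = []
asyncio.run(run_pipeline([RecordingProvider("memory", log), RecordingProvider("rag", log)]))
print(log)  # ['before:memory', 'before:rag', 'after:rag', 'after:memory']
```

The reverse unwind mirrors how middleware and nested `async with` blocks behave, so the first provider to contribute context is also the last to see the finished turn.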
### Message/Tool Attribution
The `SessionContext` provides explicit methods for adding context:
```python
# Adding messages (keyed by source_id in context_messages dict)
context.extend_messages(self.source_id, messages)
# Adding instructions (flat list, source_id for debugging)
context.extend_instructions(self.source_id, "Be concise and helpful.")
context.extend_instructions(self.source_id, ["Instruction 1", "Instruction 2"])
# Adding tools (source attribution added to tool.metadata automatically)
context.extend_tools(self.source_id, [my_tool, another_tool])
# Getting all context messages in provider execution order
all_context = context.get_messages()
# Including input and response messages too
full_conversation = context.get_messages(include_input=True, include_response=True)
# Filtering by source
memory_messages = context.get_messages(sources=["memory"])
non_rag_messages = context.get_messages(exclude_sources=["rag"])
# Direct access to check specific sources
if "memory" in context.context_messages:
    history = context.context_messages["memory"]
```
---
## User Experience Examples
### Example 0: Zero-Config Default (Simplest Use Case)
```python
from agent_framework import ChatAgent
# No providers configured - but conversation history still works!
agent = ChatAgent(
chat_client=client,
name="assistant",
# No context_providers specified
)
# Create session - InMemoryHistoryProvider is added automatically on run because no service-managed storage (conversation_id / store=True) is requested
session = agent.create_session()
response = await agent.run("Hello, my name is Alice!", session=session)
# Conversation history is preserved automatically
response = await agent.run("What's my name?", session=session)
# Agent remembers: "Your name is Alice!"
# With service-managed session - no default storage added (service handles it)
service_session = agent.get_session(service_session_id="thread_abc123")
# With store=True in options - user expects service storage, no default added
response = await agent.run("Hello!", session=session, options={"store": True})
```
### Example 1: Explicit Memory Storage
```python
from agent_framework import ChatAgent, InMemoryHistoryProvider
# Explicit provider configuration (same behavior as default, but explicit)
agent = ChatAgent(
chat_client=client,
name="assistant",
context_providers=[
InMemoryHistoryProvider(source_id="memory")
]
)
# Create session and chat
session = agent.create_session()
response = await agent.run("Hello!", session=session)
# Messages are automatically stored and loaded on next invocation
response = await agent.run("What did I say before?", session=session)
```
### Example 2: RAG + Memory + Audit (Memory and Audit Both Use HistoryProvider)
```python
from agent_framework import ChatAgent
from agent_framework.azure import CosmosHistoryProvider, AzureAISearchContextProvider
from agent_framework.redis import RedisHistoryProvider
# RAG provider that injects relevant documents
search_provider = AzureAISearchContextProvider(
source_id="rag",
endpoint="https://...",
index_name="documents",
)
# Primary memory storage (loads + stores)
# load_messages=True (default) - loads and stores messages
memory_provider = RedisHistoryProvider(
source_id="memory",
redis_url="redis://...",
)
# Audit storage - SAME CLASS, different configuration
# load_messages=False = never loads, just stores for audit
audit_provider = CosmosHistoryProvider(
source_id="audit",
connection_string="...",
load_messages=False, # Don't load - just store for audit
)
agent = ChatAgent(
chat_client=client,
name="assistant",
context_providers=[
memory_provider, # First: loads history
search_provider, # Second: adds RAG context
audit_provider, # Third: stores for audit (no load)
]
)
```
### Example 3: Custom Context Providers
```python
import json
from datetime import datetime

from agent_framework import ChatAgent, ContextProvider, InMemoryHistoryProvider


class TimeContextProvider(ContextProvider):
    """Adds current time to the context."""

    async def before_run(self, agent, session, context, state) -> None:
        context.extend_instructions(
            self.source_id,
            f"Current date and time: {datetime.now().isoformat()}",
        )


class UserPreferencesProvider(ContextProvider):
    """Tracks and applies user preferences from conversation."""

    async def before_run(self, agent, session, context, state) -> None:
        prefs = state.get(self.source_id, {}).get("preferences", {})
        if prefs:
            context.extend_instructions(
                self.source_id,
                f"User preferences: {json.dumps(prefs)}",
            )

    async def after_run(self, agent, session, context, state) -> None:
        # Extract preferences from response and store in session state
        if context.response is None:
            return
        for msg in context.response.messages or []:
            if "preference:" in (msg.text or "").lower():
                my_state = state.setdefault(self.source_id, {})
                my_state.setdefault("preferences", {})
                # ... extract and store preference


# Compose providers - each with mandatory source_id
agent = ChatAgent(
    chat_client=client,
    context_providers=[
        InMemoryHistoryProvider(source_id="memory"),
        TimeContextProvider(source_id="time"),
        UserPreferencesProvider(source_id="prefs"),
    ],
)
```
### Example 4: Filtering by Source (Using Dict-Based Context)
```python
from agent_framework import ChatMessage, ContextProvider


class SelectiveContextProvider(ContextProvider):
    """Provider that only processes messages from specific sources."""

    async def before_run(self, agent, session, context, state) -> None:
        # Check what sources have added messages so far
        print(f"Sources so far: {list(context.context_messages.keys())}")
        # Get messages excluding RAG context
        non_rag_messages = context.get_messages(exclude_sources=["rag"])
        # Or get only memory messages
        if "memory" in context.context_messages:
            memory_only = context.context_messages["memory"]
        # Do something with filtered messages...
        # e.g., sentiment analysis, topic extraction


class RAGContextProvider(ContextProvider):
    """Provider that adds RAG context."""

    async def before_run(self, agent, session, context, state) -> None:
        # Search for relevant documents based on input
        relevant_docs = await self._search(context.input_messages)
        # Add RAG context using explicit method
        rag_messages = [
            ChatMessage(role="system", text=f"Relevant info: {doc}")
            for doc in relevant_docs
        ]
        context.extend_messages(self.source_id, rag_messages)
```
### Example 5: Explicit Storage Configuration for Service-Managed Sessions
```python
# HistoryProvider uses explicit configuration - no automatic detection.
# load_messages=True (default): Load messages from storage
# load_messages=False: Skip loading (useful for audit-only storage)
agent = ChatAgent(
chat_client=client,
context_providers=[
RedisHistoryProvider(
source_id="memory",
redis_url="redis://...",
# load_messages=True is the default
)
]
)
session = agent.create_session()
# Normal run - loads and stores messages
response = await agent.run("Hello!", session=session)
# For service-managed sessions, configure storage explicitly:
# - Use load_messages=False when service handles history
service_storage = RedisHistoryProvider(
source_id="audit",
redis_url="redis://...",
load_messages=False, # Don't load - service manages history
)
agent_with_service = ChatAgent(
chat_client=client,
context_providers=[service_storage]
)
service_session = agent_with_service.get_session(service_session_id="thread_abc123")
response = await agent_with_service.run("Hello!", session=service_session)
# History provider stores for audit but doesn't load (service handles history)
```
### Example 6: Multiple Instances of Same Provider Type
```python
# You can have multiple instances of the same provider class
# by using different source_ids
agent = ChatAgent(
chat_client=client,
context_providers=[
# Primary storage for conversation history
RedisHistoryProvider(
source_id="conversation_memory",
redis_url="redis://primary...",
load_messages=True, # This one loads
),
# Secondary storage for audit (different Redis instance)
RedisHistoryProvider(
source_id="audit_log",
redis_url="redis://audit...",
load_messages=False, # This one just stores
),
]
)
# Warning will NOT be logged because only one has load_messages=True
```
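The warning mentioned in the last comment (listed in the PR 2 workplan as "multiple/zero history providers have `load_messages=True`") could look roughly like the check below. This is a hypothetical sketch: `ProviderInfo` stands in for real provider instances, and two behaviors are assumptions rather than part of the proposal. The zero-loader warning only fires when history providers exist but none of them load, and it is suppressed for service-managed sessions (as in Example 5), where not loading is intentional.

```python
import warnings
from dataclasses import dataclass


@dataclass
class ProviderInfo:
    """Hypothetical stand-in carrying only what the check needs."""

    source_id: str
    is_history_provider: bool
    load_messages: bool = True


def check_history_loaders(providers: list[ProviderInfo], *, service_managed: bool = False) -> None:
    """Warn when history would be loaded twice or not at all."""
    history = [p for p in providers if p.is_history_provider]
    loaders = [p.source_id for p in history if p.load_messages]
    if len(loaders) > 1:
        warnings.warn(f"Multiple history providers load messages: {loaders}; history may be duplicated.")
    elif history and not loaders and not service_managed:
        # Assumption: no warning when the service manages history itself
        warnings.warn("No history provider loads messages; earlier turns will not reach the model.")


# Example 6 configuration: one loader plus one store-only audit provider -> no warning
check_history_loaders([
    ProviderInfo("conversation_memory", True, load_messages=True),
    ProviderInfo("audit_log", True, load_messages=False),
])
```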
### Example 7: Provider Ordering - RAG Before vs After Memory
The order of providers determines what context each one can see. This is especially important for RAG, which may benefit from seeing conversation history.
```python
from agent_framework import ChatAgent, ChatMessage, ContextProvider, InMemoryHistoryProvider
from agent_framework.azure import CosmosHistoryProvider


class RAGContextProvider(ContextProvider):
    """RAG provider that retrieves relevant documents based on available context."""

    async def before_run(self, agent, session, context, state) -> None:
        # Build query from what we can see
        query_parts = []
        # We can always see the current input
        for msg in context.input_messages:
            query_parts.append(msg.text)
        # Can we see history? Depends on provider order!
        history = context.get_messages()  # Gets context from providers that ran before us
        if history:
            # Include recent history for better RAG context
            recent = history[-3:]  # Last 3 messages
            for msg in recent:
                query_parts.append(msg.text)
        query = " ".join(query_parts)
        documents = await self._retrieve_documents(query)
        # Add retrieved documents as context
        rag_messages = [ChatMessage(role="system", text=f"Relevant context:\n{doc}") for doc in documents]
        context.extend_messages(self.source_id, rag_messages)

    async def _retrieve_documents(self, query: str) -> list[str]:
        # ... vector search implementation
        return ["doc1", "doc2"]


# =============================================================================
# SCENARIO A: RAG runs BEFORE Memory
# =============================================================================
# RAG only sees the current input message - no conversation history
# Use when: RAG should be based purely on the current query
agent_rag_first = ChatAgent(
    chat_client=client,
    context_providers=[
        RAGContextProvider("rag"),  # Runs first - only sees input_messages
        InMemoryHistoryProvider("memory"),  # Runs second - loads/stores history
    ],
)
# Flow:
# 1. RAG.before_run():
#    - context.input_messages = ["What's the weather?"]
#    - context.get_messages() = [] (empty - memory hasn't run yet)
#    - RAG query based on: "What's the weather?" only
#    - Adds: context_messages["rag"] = [retrieved docs]
#
# 2. Memory.before_run():
#    - Loads history: context_messages["memory"] = [previous conversation]
#
# 3. Agent invocation with: rag docs + history + input
#    (context_messages are flattened in insertion order, so RAG docs come first)
#
# 4. Memory.after_run():
#    - Stores: input + response (not RAG docs by default)
#
# 5. RAG.after_run():
#    - (nothing to do)

# =============================================================================
# SCENARIO B: RAG runs AFTER Memory
# =============================================================================
# RAG sees conversation history - can use it for better retrieval
# Use when: RAG should consider conversation context for better results
agent_memory_first = ChatAgent(
    chat_client=client,
    context_providers=[
        InMemoryHistoryProvider("memory"),  # Runs first - loads history
        RAGContextProvider("rag"),  # Runs second - sees history + input
    ],
)
# Flow:
# 1. Memory.before_run():
#    - Loads history: context_messages["memory"] = [previous conversation]
#
# 2. RAG.before_run():
#    - context.input_messages = ["What's the weather?"]
#    - context.get_messages() = [previous conversation] (sees history!)
#    - RAG query based on: recent history + "What's the weather?"
#    - Better retrieval because RAG understands conversation context
#    - Adds: context_messages["rag"] = [more relevant docs]
#
# 3. Agent invocation with: history + rag docs + input
#
# 4. RAG.after_run():
#    - (nothing to do)
#
# 5. Memory.after_run():
#    - Stores: input + response

# =============================================================================
# SCENARIO C: RAG after Memory, with selective storage
# =============================================================================
# Memory first for better RAG, plus separate audit that stores RAG context
agent_full_context = ChatAgent(
    chat_client=client,
    context_providers=[
        InMemoryHistoryProvider("memory"),  # Primary history storage
        RAGContextProvider("rag"),  # Gets history context for better retrieval
        PersonaContextProvider("persona"),  # Adds persona instructions (custom provider, not defined here)
        # Audit storage - stores everything including RAG results
        CosmosHistoryProvider(
            "audit",
            connection_string="...",
            load_messages=False,  # Don't load (memory handles that)
            store_context_messages=True,  # Store RAG + persona context too
        ),
    ],
)
```
---
### Workplan
The implementation is split into 2 PRs to limit scope and simplify review.
```
PR1 (New Types) ──► PR2 (Agent Integration + Cleanup)
```
#### PR 1: New Types
**Goal:** Create all new types. No changes to existing code yet. Because the old `ContextProvider` class (in `_memory.py`) still exists during this PR, the new base class uses the **temporary name `_ContextProviderBase`** to avoid import collisions. All new provider implementations reference `_ContextProviderBase` / `_HistoryProviderBase` in PR1.
**Core Package - `packages/core/agent_framework/_sessions.py`:**
- [ ] `SessionContext` class with explicit add/get methods
- [ ] `_ContextProviderBase` base class with `before_run()`/`after_run()` (temporary name; renamed to `ContextProvider` in PR2)
- [ ] `_HistoryProviderBase(_ContextProviderBase)` derived class with load_messages/store flags (temporary; renamed to `HistoryProvider` in PR2)
- [ ] `AgentSession` class with `state: dict[str, Any]`, `to_dict()`, `from_dict()`
- [ ] `InMemoryHistoryProvider(_HistoryProviderBase)`
**External Packages (new classes alongside existing ones, temporary `_` prefix):**
- [ ] `packages/azure-ai-search/` - create `_AzureAISearchContextProvider(_ContextProviderBase)` — constructor keeps existing params, adds `source_id` (see compatibility notes below)
- [ ] `packages/redis/` - create `_RedisHistoryProvider(_HistoryProviderBase)` — constructor keeps existing `RedisChatMessageStore` connection params, adds `source_id` + storage flags
- [ ] `packages/redis/` - create `_RedisContextProvider(_ContextProviderBase)` — constructor keeps existing `RedisProvider` vector/search params, adds `source_id`
- [ ] `packages/mem0/` - create `_Mem0ContextProvider(_ContextProviderBase)` — constructor keeps existing params, adds `source_id`
**Constructor Compatibility Notes:**
The existing provider constructors can be preserved with minimal additions:
| Existing Class | New Class (PR1 temporary name) | Constructor Changes |
|---|---|---|
| `AzureAISearchContextProvider(ContextProvider)` | `_AzureAISearchContextProvider(_ContextProviderBase)` | Add `source_id: str` (required). All existing params (`endpoint`, `index_name`, `api_key`, `mode`, `top_k`, etc.) stay the same. `invoking()` → `before_run()`, `invoked()` → `after_run()`. |
| `Mem0Provider(ContextProvider)` | `_Mem0ContextProvider(_ContextProviderBase)` | Add `source_id: str` (required). All existing params (`mem0_client`, `api_key`, `agent_id`, `user_id`, etc.) stay the same. `scope_to_per_operation_thread_id` → maps to session_id scoping via `before_run`. |
| `RedisChatMessageStore` | `_RedisHistoryProvider(_HistoryProviderBase)` | Add `source_id: str` (required) + `load_messages`, `store_inputs`, `store_responses` flags. Keep connection params (`redis_url`, `credential_provider`, `host`, `port`, `ssl`). Drop `thread_id` (now from `context.session_id`), `messages` (state managed via `session.state`), `max_messages` (→ message reduction concern). |
| `RedisProvider(ContextProvider)` | `_RedisContextProvider(_ContextProviderBase)` | Add `source_id: str` (required). Keep vector/search params (`redis_url`, `index_name`, `redis_vectorizer`, etc.). Drop `thread_id` scoping (now from `context.session_id`). |
**Testing:**
- [ ] Unit tests for `SessionContext` methods (extend_messages, get_messages, extend_instructions, extend_tools)
- [ ] Unit tests for `_HistoryProviderBase` load/store flags
- [ ] Unit tests for `InMemoryHistoryProvider` state persistence via session.state
- [ ] Unit tests for source attribution (mandatory source_id)
---
#### PR 2: Agent Integration + Cleanup
**Goal:** Wire up new types into `ChatAgent` and remove old types.
**Changes to `ChatAgent`:**
- [ ] Replace `thread` parameter with `session` in `agent.run()`
- [ ] Add `context_providers` parameter to `ChatAgent.__init__()`
- [ ] Add `create_session()` method
- [ ] Verify `session.to_dict()`/`AgentSession.from_dict()` round-trip in integration tests
- [ ] Wire up provider iteration (before_run forward, after_run reverse)
- [ ] Add validation warning if multiple/zero history providers have `load_messages=True`
- [ ] Wire up default `InMemoryHistoryProvider` behavior (auto-add when no providers are configured and the service is not managing history, i.e. no `conversation_id` and no `store=True`)
**Remove Legacy Types:**
- [ ] `packages/core/agent_framework/_memory.py` - remove old `ContextProvider` class
- [ ] `packages/core/agent_framework/_threads.py` - remove `ChatMessageStore`, `ChatMessageStoreProtocol`, `AgentThread`
- [ ] Remove old provider classes from `azure-ai-search`, `redis`, `mem0`
**Rename Temporary Types → Final Names:**
- [ ] `_ContextProviderBase` → `ContextProvider` in `_sessions.py`
- [ ] `_HistoryProviderBase` → `HistoryProvider` in `_sessions.py`
- [ ] `_AzureAISearchContextProvider` → `AzureAISearchContextProvider` in `packages/azure-ai-search/`
- [ ] `_Mem0ContextProvider` → `Mem0ContextProvider` in `packages/mem0/`
- [ ] `_RedisHistoryProvider` → `RedisHistoryProvider` in `packages/redis/`
- [ ] `_RedisContextProvider` → `RedisContextProvider` in `packages/redis/`
- [ ] Update all imports across packages and `__init__.py` exports to use final names
**Public API (root package exports):**
All base classes and `InMemoryHistoryProvider` are exported from the root package:
```python
from agent_framework import (
ContextProvider,
HistoryProvider,
InMemoryHistoryProvider,
SessionContext,
AgentSession,
)
```
**Documentation & Samples:**
- [ ] Update all samples in `samples/` to use new API
- [ ] Write migration guide
- [ ] Update API documentation
**Testing:**
- [ ] Unit tests for provider execution order (before_run forward, after_run reverse)
- [ ] Unit tests for validation warnings (multiple/zero loaders)
- [ ] Unit tests for session serialization (`session.to_dict()`/`AgentSession.from_dict()` round-trip)
- [ ] Integration test: agent with `context_providers` + `session` works
- [ ] Integration test: full conversation with memory persistence
- [ ] Ensure all existing tests still pass (with updated API)
- [ ] Verify no references to removed types remain
---
#### CHANGELOG (single entry for release)
- **[BREAKING]** Replaced the legacy `ContextProvider` (`invoking`/`invoked`) with a new `ContextProvider` based on the hooks pattern (`before_run`/`after_run`)
- **[BREAKING]** Replaced `ChatMessageStore` with `HistoryProvider`
- **[BREAKING]** Replaced `AgentThread` with `AgentSession`
- **[BREAKING]** Replaced `thread` parameter with `session` in `agent.run()`
- Added `SessionContext` for invocation state with source attribution
- Added `InMemoryHistoryProvider` for conversation history
- `AgentSession` provides `to_dict()`/`from_dict()` for serialization (no special serialize/restore on providers)
---
#### Estimated Sizes
| PR | New Lines | Modified Lines | Risk |
|----|-----------|----------------|------|
| PR1 | ~500 | ~0 | Low |
| PR2 | ~150 | ~400 | Medium |
---
#### Implementation Detail: Decorator-based Providers
For simple use cases, a class-based provider can be verbose. A decorator API allows registering plain functions as `before_run` or `after_run` hooks for a more Pythonic setup:
```python
from agent_framework import ChatAgent, ChatMessage, after_run, before_run

agent = ChatAgent(chat_client=client)


@before_run(agent)
async def add_system_prompt(agent, session, context, state):
    """Inject a system prompt before every invocation."""
    context.extend_messages("system", [ChatMessage(role="system", text="You are helpful.")])


@after_run(agent)
async def log_response(agent, session, context, state):
    """Log the response after every invocation."""
    if context.response is not None:
        print(f"Response: {context.response.text}")
```
Under the hood, the decorators create a `ContextProvider` instance wrapping the function and append it to `agent._context_providers`:
```python
def before_run(agent: ChatAgent, *, source_id: str = "decorated"):
    def decorator(fn):
        provider = _FunctionContextProvider(source_id=source_id, before_fn=fn)
        agent._context_providers.append(provider)
        return fn

    return decorator


def after_run(agent: ChatAgent, *, source_id: str = "decorated"):
    def decorator(fn):
        provider = _FunctionContextProvider(source_id=source_id, after_fn=fn)
        agent._context_providers.append(provider)
        return fn

    return decorator
```
This is a convenience layer — the class-based API remains the primary interface for providers that need configuration, state, or both hooks.
---
#### Reference Implementation
Full implementation code for the chosen design (hooks pattern, Decision B1).
##### SessionContext
```python
# Copyright (c) Microsoft. All rights reserved.
# Allows forward references (e.g. AgentSession) in annotations.
from __future__ import annotations

from abc import ABC, abstractmethod
from collections.abc import Awaitable, Callable, Sequence
from typing import Any

from ._tools import ToolProtocol
from ._types import AgentResponse, ChatMessage  # AgentResponse assumed to live alongside ChatMessage


class SessionContext:
    """Per-invocation state passed through the context provider pipeline.

    Created fresh for each agent.run() call. Providers read from and write to
    the mutable fields to add context before invocation and process responses after.

    Attributes:
        session_id: The ID of the current session
        service_session_id: Service-managed session ID (if present, service handles storage)
        input_messages: The new messages being sent to the agent (read-only, set by caller)
        context_messages: Dict mapping source_id -> messages added by that provider.
            Maintains insertion order (provider execution order). Use extend_messages()
            to add messages with proper source attribution.
        instructions: Additional instructions - providers can append here
        tools: Additional tools - providers can append here
        response (property): After invocation, contains the full AgentResponse (set by agent).
            Includes response.messages, response.response_id, response.agent_id,
            response.usage_details, etc.
            Read-only property - use AgentMiddleware to modify responses.
        options: Options passed to agent.run() - READ-ONLY, for reflection only
        metadata: Shared metadata dictionary for cross-provider communication

    Note:
        - `options` is read-only; changes will NOT be merged back into the agent run
        - `response` is a read-only property; use AgentMiddleware to modify responses
        - `instructions` and `tools` are merged by the agent into the run options
        - `context_messages` values are flattened in order when building the final input
    """

    def __init__(
        self,
        *,
        session_id: str | None = None,
        service_session_id: str | None = None,
        input_messages: list[ChatMessage],
        context_messages: dict[str, list[ChatMessage]] | None = None,
        instructions: list[str] | None = None,
        tools: list[ToolProtocol] | None = None,
        options: dict[str, Any] | None = None,
        metadata: dict[str, Any] | None = None,
    ):
        self.session_id = session_id
        self.service_session_id = service_session_id
        self.input_messages = input_messages
        self.context_messages: dict[str, list[ChatMessage]] = context_messages or {}
        self.instructions: list[str] = instructions or []
        self.tools: list[ToolProtocol] = tools or []
        self._response: AgentResponse | None = None
        self.options = options or {}  # READ-ONLY - for reflection only
        self.metadata = metadata or {}

    @property
    def response(self) -> AgentResponse | None:
        """The agent's response. Set by the framework after invocation, read-only for providers."""
        return self._response

    def extend_messages(self, source_id: str, messages: Sequence[ChatMessage]) -> None:
        """Add context messages from a specific source.

        Messages are stored keyed by source_id, maintaining insertion order
        based on provider execution order.

        Args:
            source_id: The provider source_id adding these messages
            messages: The messages to add
        """
        if source_id not in self.context_messages:
            self.context_messages[source_id] = []
        self.context_messages[source_id].extend(messages)

    def extend_instructions(self, source_id: str, instructions: str | Sequence[str]) -> None:
        """Add instructions to be prepended to the conversation.

        Instructions are added to a flat list. The source_id is recorded
        in metadata for debugging but instructions are not keyed by source.

        Args:
            source_id: The provider source_id adding these instructions
            instructions: A single instruction string or sequence of strings
        """
        if isinstance(instructions, str):
            instructions = [instructions]
        self.instructions.extend(instructions)
        # Record attribution for debugging, as described above (the key name is illustrative).
        self.metadata.setdefault("instruction_sources", []).append(source_id)

    def extend_tools(self, source_id: str, tools: Sequence[ToolProtocol]) -> None:
        """Add tools to be available for this invocation.

        Tools are added with source attribution in their metadata.

        Args:
            source_id: The provider source_id adding these tools
            tools: The tools to add
        """
        for tool in tools:
            if hasattr(tool, "metadata") and isinstance(tool.metadata, dict):
                tool.metadata["context_source"] = source_id
        self.tools.extend(tools)

    def get_messages(
        self,
        *,
        sources: Sequence[str] | None = None,
        exclude_sources: Sequence[str] | None = None,
        include_input: bool = False,
        include_response: bool = False,
    ) -> list[ChatMessage]:
        """Get context messages, optionally filtered and including input/response.

        Returns messages in provider execution order (dict insertion order),
        with input and response appended if requested.

        Args:
            sources: If provided, only include context messages from these sources
            exclude_sources: If provided, exclude context messages from these sources
            include_input: If True, append input_messages after context
            include_response: If True, append response.messages at the end

        Returns:
            Flattened list of messages in conversation order
        """
        result: list[ChatMessage] = []
        for source_id, messages in self.context_messages.items():
            if sources is not None and source_id not in sources:
                continue
            if exclude_sources is not None and source_id in exclude_sources:
                continue
            result.extend(messages)
        if include_input and self.input_messages:
            result.extend(self.input_messages)
        if include_response and self.response:
            result.extend(self.response.messages)
        return result
```
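The ordering contract of `get_messages()` is easiest to see with concrete data. This standalone sketch mirrors its flattening and filtering logic, using plain strings in place of `ChatMessage` and omitting `include_response`; `flatten_context` is a hypothetical name, not part of the framework.

```python
from __future__ import annotations

from collections.abc import Sequence


def flatten_context(
    context_messages: dict[str, list[str]],
    input_messages: list[str],
    *,
    sources: Sequence[str] | None = None,
    exclude_sources: Sequence[str] | None = None,
    include_input: bool = False,
) -> list[str]:
    """Mirror of SessionContext.get_messages() using strings as messages."""
    result: list[str] = []
    # Dicts preserve insertion order, i.e. the order providers called extend_messages().
    for source_id, messages in context_messages.items():
        if sources is not None and source_id not in sources:
            continue
        if exclude_sources is not None and source_id in exclude_sources:
            continue
        result.extend(messages)
    # New input always comes after all provider context.
    if include_input:
        result.extend(input_messages)
    return result


context_messages = {"memory": ["old user msg", "old reply"], "rag": ["retrieved doc"]}
inputs = ["new question"]

print(flatten_context(context_messages, inputs, include_input=True))
# ['old user msg', 'old reply', 'retrieved doc', 'new question']
print(flatten_context(context_messages, inputs, exclude_sources=["memory"]))
# ['retrieved doc']
```

The second call is the pattern `HistoryProvider` uses when storing context: it excludes its own `source_id` so previously loaded history is not saved a second time.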
##### ContextProvider
```python
class ContextProvider(ABC):
    """Base class for context providers (hooks pattern).

    Context providers participate in the context engineering pipeline,
    adding context before model invocation and processing responses after.

    Attributes:
        source_id: Unique identifier for this provider instance (required).
            Used for message/tool attribution so other providers can filter.
    """

    def __init__(self, source_id: str):
        """Initialize the provider.

        Args:
            source_id: Unique identifier for this provider instance.
                Used for message/tool attribution.
        """
        self.source_id = source_id

    async def before_run(
        self,
        agent: "SupportsAgentRun",
        session: AgentSession,
        context: SessionContext,
        state: dict[str, Any],
    ) -> None:
        """Called before model invocation.

        Override to add context (messages, instructions, tools) to the
        SessionContext before the model is invoked.

        Args:
            agent: The agent running this invocation
            session: The current session
            context: The invocation context - add messages/instructions/tools here
            state: The session's mutable state dict
        """
        pass

    async def after_run(
        self,
        agent: "SupportsAgentRun",
        session: AgentSession,
        context: SessionContext,
        state: dict[str, Any],
    ) -> None:
        """Called after model invocation.

        Override to process the response (store messages, extract info, etc.).
        The context.response.messages will be populated at this point.

        Args:
            agent: The agent that ran this invocation
            session: The current session
            context: The invocation context with response populated
            state: The session's mutable state dict
        """
        pass
```
> **Serialization contract:** Any values a provider writes to `state` must be JSON-serializable.
> Sessions are serialized via `session.to_dict()` and restored via `AgentSession.from_dict()`.
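One way a provider could enforce this contract early, rather than discovering a failure when the session is persisted, is a JSON round-trip check. This is a minimal sketch; the helper name `check_state_serializable` is hypothetical and not part of the framework.

```python
import json
from typing import Any


def check_state_serializable(state: dict[str, Any]) -> None:
    """Raise TypeError early if provider state would break JSON persistence of to_dict()."""
    try:
        # Round-trip through JSON, which is what persisting to_dict() output typically needs.
        json.loads(json.dumps(state))
    except TypeError as exc:
        raise TypeError(f"Provider state is not JSON-serializable: {exc}") from exc


check_state_serializable({"memory": {"turns": 3, "summary": "..."}})  # passes

try:
    check_state_serializable({"rag": {"seen_ids": {1, 2}}})  # sets are not JSON types
except TypeError as err:
    print(err)
```

A round-trip can also change values without failing: tuples come back as lists and non-string dict keys come back as strings, so providers are safer storing state in JSON-native shapes from the start.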
##### HistoryProvider
```python
class HistoryProvider(ContextProvider):
    """Base class for conversation history storage providers.

    A single class that can be configured for different use cases:
    - Primary memory storage (loads + stores messages)
    - Audit/logging storage (stores only, doesn't load)
    - Evaluation storage (stores only for later analysis)

    Loading behavior (when to add messages to context_messages[source_id]):
    - `load_messages=True` (default): Load messages from storage
    - `load_messages=False`: Agent skips `before_run` entirely (audit/logging mode)

    Storage behavior:
    - `store_inputs`: Store input messages (default True)
    - `store_responses`: Store response messages (default True)
    - `store_context_messages`: Also store other providers' context messages (default False);
      `store_context_from` restricts this to the listed source_ids
    - Storage always happens unless explicitly disabled, regardless of load_messages

    Warning: When the agent validates its providers (at the start of each run), a warning is emitted if:
    - Multiple history providers have `load_messages=True` (likely duplicate loading)
    - Zero history providers have `load_messages=True` (likely missing primary storage)

    Examples:
        # Primary memory - loads and stores
        memory = InMemoryHistoryProvider(source_id="memory")

        # Audit storage - stores only, doesn't add to context
        audit = RedisHistoryProvider(
            source_id="audit",
            load_messages=False,
            redis_url="redis://...",
        )

        # Full audit - stores everything including RAG context
        full_audit = CosmosHistoryProvider(
            source_id="full_audit",
            load_messages=False,
            store_context_messages=True,
        )
    """

    def __init__(
        self,
        source_id: str,
        *,
        load_messages: bool = True,
        store_responses: bool = True,
        store_inputs: bool = True,
        store_context_messages: bool = False,
        store_context_from: Sequence[str] | None = None,
    ):
        super().__init__(source_id)
        self.load_messages = load_messages
        self.store_responses = store_responses
        self.store_inputs = store_inputs
        self.store_context_messages = store_context_messages
        self.store_context_from = list(store_context_from) if store_context_from else None

    @abstractmethod
    async def get_messages(self, session_id: str | None) -> list[ChatMessage]:
        """Retrieve stored messages for this session."""
        pass

    @abstractmethod
    async def save_messages(
        self,
        session_id: str | None,
        messages: Sequence[ChatMessage],
    ) -> None:
        """Persist messages for this session."""
        pass

    def _get_context_messages_to_store(self, context: SessionContext) -> list[ChatMessage]:
        """Get context messages that should be stored based on configuration."""
        if not self.store_context_messages:
            return []
        if self.store_context_from is not None:
            return context.get_messages(sources=self.store_context_from)
        # Default: store every other provider's context, but not this provider's own history.
        return context.get_messages(exclude_sources=[self.source_id])

    async def before_run(self, agent, session, context, state) -> None:
        """Load history into context. Skipped by the agent when load_messages=False."""
        history = await self.get_messages(context.session_id)
        context.extend_messages(self.source_id, history)

    async def after_run(self, agent, session, context, state) -> None:
        """Store messages based on configuration."""
        messages_to_store: list[ChatMessage] = []
        messages_to_store.extend(self._get_context_messages_to_store(context))
        if self.store_inputs:
            messages_to_store.extend(context.input_messages)
        # Guard: response is None until the framework sets it.
        if self.store_responses and context.response is not None and context.response.messages:
            messages_to_store.extend(context.response.messages)
        if messages_to_store:
            await self.save_messages(context.session_id, messages_to_store)
```
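`InMemoryHistoryProvider` is referenced in this section but not defined. The following is a minimal sketch of the storage behind its two abstract methods, assuming an unbounded dict keyed by `session_id` and plain strings in place of `ChatMessage` so it runs standalone. In the actual design the class would subclass `HistoryProvider`; the name `InMemoryHistoryStore` is hypothetical.

```python
from __future__ import annotations

import asyncio
from collections.abc import Sequence


class InMemoryHistoryStore:
    """Hypothetical backing store for an in-memory history provider."""

    def __init__(self) -> None:
        # One append-only message list per session_id (None is allowed as a key).
        self._messages: dict[str | None, list[str]] = {}

    async def get_messages(self, session_id: str | None) -> list[str]:
        # Return a copy so callers cannot mutate stored history by accident.
        return list(self._messages.get(session_id, []))

    async def save_messages(self, session_id: str | None, messages: Sequence[str]) -> None:
        self._messages.setdefault(session_id, []).extend(messages)


async def demo() -> list[str]:
    store = InMemoryHistoryStore()
    # Turn 1: after_run stores the input, then the response.
    await store.save_messages("s1", ["user: hi", "assistant: hello"])
    # Turn 2: before_run loads everything saved so far for this session.
    return await store.get_messages("s1")


print(asyncio.run(demo()))  # ['user: hi', 'assistant: hello']
```

Nothing here survives a process restart, which is why the audit examples in the `HistoryProvider` docstring point at external stores.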
##### AgentSession
```python
import uuid
import warnings
from collections.abc import Sequence
from typing import Any


class AgentSession:
    """A conversation session with an agent.

    Lightweight state container. Provider instances are owned by the agent,
    not the session. The session only holds session IDs and a mutable state dict.

    Attributes:
        session_id: Unique identifier for this session
        service_session_id: Service-managed session ID (if using service-side storage)
        state: Mutable state dict shared with all providers
    """

    def __init__(
        self,
        *,
        session_id: str | None = None,
        service_session_id: str | None = None,
    ):
        """Initialize the session.

        Note: Prefer using agent.create_session() instead of direct construction.

        Args:
            session_id: Optional session ID (generated if not provided)
            service_session_id: Optional service-managed session ID
        """
        self._session_id = session_id or str(uuid.uuid4())
        self.service_session_id = service_session_id
        self.state: dict[str, Any] = {}

    @property
    def session_id(self) -> str:
        """The unique identifier for this session."""
        return self._session_id

    def to_dict(self) -> dict[str, Any]:
        """Serialize session to a plain dict for storage/transfer."""
        return {
            "type": "session",
            "session_id": self._session_id,
            "service_session_id": self.service_session_id,
            "state": self.state,
        }

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "AgentSession":
        """Restore session from a previously serialized dict."""
        session = cls(
            session_id=data["session_id"],
            service_session_id=data.get("service_session_id"),
        )
        session.state = data.get("state", {})
        return session


class ChatAgent:
    def __init__(
        self,
        chat_client: ...,
        *,
        context_providers: Sequence[ContextProvider] | None = None,
    ):
        self._context_providers = list(context_providers or [])

    def create_session(
        self,
        *,
        session_id: str | None = None,
    ) -> AgentSession:
        """Create a new lightweight session.

        Args:
            session_id: Optional session ID (generated if not provided)
        """
        return AgentSession(session_id=session_id)

    def get_session(
        self,
        service_session_id: str,
        *,
        session_id: str | None = None,
    ) -> AgentSession:
        """Create a session bound to a service-managed session ID.

        Args:
            service_session_id: Service-managed session ID
            session_id: Optional session ID (generated if not provided)
        """
        return AgentSession(session_id=session_id, service_session_id=service_session_id)

    def _ensure_default_storage(self, session: AgentSession, options: dict[str, Any]) -> None:
        """Add default InMemoryHistoryProvider if needed.

        Default storage is added when ALL of these are true:
        - A session is provided (always the case here)
        - No context_providers configured
        - Either options.conversation_id is set or options.store is True
        """
        if self._context_providers:
            return
        if options.get("conversation_id") or options.get("store") is True:
            self._context_providers.append(InMemoryHistoryProvider("memory"))

    def _validate_providers(self) -> None:
        """Warn if history provider configuration looks like a mistake."""
        storage_providers = [
            p for p in self._context_providers
            if isinstance(p, HistoryProvider)
        ]
        if not storage_providers:
            return
        loaders = [p for p in storage_providers if p.load_messages is True]
        if len(loaders) > 1:
            warnings.warn(
                "Multiple history providers configured to load messages: "
                f"{[p.source_id for p in loaders]}. "
                "This may cause duplicate messages in context.",
                UserWarning,
            )
        elif len(loaders) == 0:
            warnings.warn(
                "History providers configured but none have load_messages=True: "
                f"{[p.source_id for p in storage_providers]}. "
                "No conversation history will be loaded.",
                UserWarning,
            )

    async def run(self, input: str, *, session: AgentSession, options: dict[str, Any] | None = None) -> ...:
        """Run the agent with the given input."""
        options = options or {}
        # Ensure default storage on first run
        self._ensure_default_storage(session, options)
        self._validate_providers()
        context = SessionContext(
            session_id=session.session_id,
            service_session_id=session.service_session_id,
            input_messages=[...],
            options=options,
        )
        # Before-run providers (forward order, skip HistoryProviders with load_messages=False)
        for provider in self._context_providers:
            if isinstance(provider, HistoryProvider) and not provider.load_messages:
                continue
            await provider.before_run(self, session, context, session.state)
        # ... assemble final messages from context, invoke model, set context._response ...
        # After-run providers (reverse order)
        for provider in reversed(self._context_providers):
            await provider.after_run(self, session, context, session.state)
        # ... return the AgentResponse ...


# Session serialization is trivial because session.state is a plain dict:
#
#     json_str = json.dumps(session.to_dict())                # serialize
#     session = AgentSession.from_dict(json.loads(json_str))  # deserialize
```
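The hook ordering in `run()` (forward for `before_run`, reverse for `after_run`) nests providers the way middleware is nested. This standalone sketch traces that ordering with a hypothetical `TracingProvider` whose hooks take no arguments, to keep the focus on order rather than on the full hook signature.

```python
import asyncio


class TracingProvider:
    """Hypothetical provider that records when its hooks run."""

    def __init__(self, source_id: str, trace: list[str]) -> None:
        self.source_id = source_id
        self.trace = trace

    async def before_run(self) -> None:
        self.trace.append(f"before:{self.source_id}")

    async def after_run(self) -> None:
        self.trace.append(f"after:{self.source_id}")


async def run_pipeline(providers: list[TracingProvider]) -> None:
    # Same loop shape as ChatAgent.run(): forward order for before_run...
    for provider in providers:
        await provider.before_run()
    # ...the model would be invoked here...
    # ...and reverse order for after_run.
    for provider in reversed(providers):
        await provider.after_run()


trace: list[str] = []
asyncio.run(run_pipeline([TracingProvider("memory", trace), TracingProvider("rag", trace)]))
print(trace)
# ['before:memory', 'before:rag', 'after:rag', 'after:memory']
```

With this ordering, the first registered provider (typically primary history storage) is the outermost layer: its `after_run` runs last, after every later provider has processed the response.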
================================================
FILE: docs/decisions/0016-structured-output.md
================================================
---
status: proposed
contact: sergeymenshykh
date: 2026-01-22
deciders: rbarreto, westey-m, stephentoub
informed: {}
---
# Structured Output
Structured output is a valuable aspect of any agent system, since it forces an agent to produce output in a required format that may include required fields.
This allows easily turning unstructured data into structured data using a general-purpose language model.
## Context and Problem Statement
Structured output is currently supported only by `ChatClientAgent` and can be configured in two ways:
**Approach 1: ResponseFormat + Deserialize**
Specify the SO type schema via the `ChatClientAgent{Run}Options.ChatOptions.ResponseFormat` property at agent creation or invocation time, then use `JsonSerializer.Deserialize` to extract the structured data from the response text.
```csharp
// SO type can be provided at agent creation time
ChatClientAgent agent = chatClient.AsAIAgent(new ChatClientAgentOptions()
{
    Name = "...",
    ChatOptions = new() { ResponseFormat = ChatResponseFormat.ForJsonSchema<PersonInfo>() }
});

AgentResponse response = await agent.RunAsync("...");
PersonInfo personInfo = response.Deserialize<PersonInfo>(JsonSerializerOptions.Web);
Console.WriteLine($"Name: {personInfo.Name}");
Console.WriteLine($"Age: {personInfo.Age}");
Console.WriteLine($"Occupation: {personInfo.Occupation}");

// Alternatively, SO type can be provided at agent invocation time
response = await agent.RunAsync("...", new ChatClientAgentRunOptions()
{
    ChatOptions = new() { ResponseFormat = ChatResponseFormat.ForJsonSchema<PersonInfo>() }
});

personInfo = response.Deserialize<PersonInfo>(JsonSerializerOptions.Web);
Console.WriteLine($"Name: {personInfo.Name}");
Console.WriteLine($"Age: {personInfo.Age}");
Console.WriteLine($"Occupation: {personInfo.Occupation}");
```
**Approach 2: Generic RunAsync**
Supply the SO type as a generic parameter to `RunAsync` and access the parsed result directly via the `Result` property.
```csharp
ChatClientAgent agent = ...;
AgentResponse<PersonInfo> response = await agent.RunAsync<PersonInfo>("...");
Console.WriteLine($"Name: {response.Result.Name}");
Console.WriteLine($"Age: {response.Result.Age}");
Console.WriteLine($"Occupation: {response.Result.Occupation}");
```
Note: `RunAsync<T>` is an instance method of `ChatClientAgent` and is not part of the `AIAgent` base class, since not all agents support structured output.
Approach 1 is perceived as cumbersome by the community, as it requires additional effort when using primitive or collection types - the SO schema may need to be wrapped in an artificial JSON object. Otherwise, the caller will encounter an error like _Invalid schema for response_format 'Movie': schema must be a JSON Schema of 'type: "object"', got 'type: "array"'_.
This occurs because OpenAI and compatible APIs require a JSON object as the root schema.
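The root-object restriction is easy to see in schema terms. The following is a language-neutral sketch (written in Python for brevity; it is not the framework's C# API) of what wrapping a non-object schema and unwrapping the reply involves. The `data` property name and both helper functions are hypothetical.

```python
import json
from typing import Any

WRAPPER_PROPERTY = "data"  # hypothetical property name for the artificial wrapper object


def wrap_schema(schema: dict[str, Any]) -> tuple[dict[str, Any], bool]:
    """Wrap a non-object root schema in an object; return (schema, was_wrapped)."""
    if schema.get("type") == "object":
        return schema, False
    wrapped = {
        "type": "object",
        "properties": {WRAPPER_PROPERTY: schema},
        "required": [WRAPPER_PROPERTY],
        "additionalProperties": False,
    }
    return wrapped, True


def unwrap_result(text: str, was_wrapped: bool) -> Any:
    """Parse the model's JSON text and strip the wrapper only if one was added."""
    value = json.loads(text)
    return value[WRAPPER_PROPERTY] if was_wrapped else value


movies_schema = {"type": "array", "items": {"type": "string"}}
schema, wrapped = wrap_schema(movies_schema)
# The model is asked for `schema` and replies with an object such as:
reply = '{"data": ["Alien", "Heat"]}'
print(unwrap_result(reply, wrapped))  # ['Alien', 'Heat']
```

`unwrap_result` needs the `was_wrapped` flag produced by `wrap_schema`, so unwrapping is only safe when the same component performs both steps.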
Approach 1 is also necessary in scenarios where (a) agents can only be configured with SO at creation time (such as with `AIProjectClient`), (b) the SO type is not known at compile time, or (c) the JSON schema is represented as text (for declarative agents) or as a `JsonElement`.
Approach 2 is more convenient and works seamlessly with primitives and collections. However, it requires the SO type to be known at compile time, making it less flexible.
Additionally, since the `RunAsync<T>` methods are instance methods of `ChatClientAgent` and are not part of the `AIAgent` base class, applying decorators like `OpenTelemetryAgent` on top of `ChatClientAgent` prevents users from accessing `RunAsync<T>`, meaning structured output is not available with decorated agents.
Given the different scenarios above in which structured output can be used, there is no one-size-fits-all solution. Each approach has its own advantages and limitations,
and the two can complement each other to provide a comprehensive structured output experience across various use cases.
## Approaches Overview
1. SO usage via `ResponseFormat` property
2. SO usage via `RunAsync` generic method
## 1. SO usage via `ResponseFormat` property
This approach should be used in the following scenarios:
- 1.1 SO result as text is sufficient as is, and deserialization is not required
- 1.2 SO for inter-agent collaboration
- 1.3 SO can only be configured at agent creation time (such as with `AIProjectClient`)
- 1.4 SO type is not known at compile time and represented by System.Type
- 1.5 SO is represented by JSON schema and there's no corresponding .NET type either at compile time or at runtime
- 1.6 SO in streaming scenarios, where the SO response is produced in parts
**Note: Primitives and arrays are not supported by this approach.**
When a caller provides a schema via `ResponseFormat`, they are explicitly telling the framework what schema to use. The framework passes that schema through as-is and
is not responsible for transforming it. Because the framework does not own the schema, it cannot wrap primitives or arrays into a JSON object to satisfy API requirements,
nor can it unwrap the response afterward - the caller controls the schema and is responsible for ensuring it is compatible with the underlying API.
This is in contrast to the `RunAsync` approach (section 2), where the caller provides a type `T` and says "make it work." In that case, the caller does not
dictate the schema - the framework infers the schema from `T`, owns the end-to-end pipeline (schema generation, API invocation, and deserialization), and can
therefore wrap and unwrap primitives and arrays transparently.
Additionally, in streaming scenarios (1.6), the framework cannot reliably unwrap a response it did not wrap, since it has no way of knowing whether the caller wrapped the schema. Wrapping and unwrapping can only be done safely when the framework owns the entire lifecycle, from schema creation through deserialization, which is only the case with `RunAsync`.
If a caller needs to work with primitives or arrays via the `ResponseFormat` approach, they can easily create a wrapper type around them:
```csharp
public class MovieListWrapper
{
    public List<Movie> Movies { get; set; }
}
```
### 1.1 SO result as text is sufficient as is, and deserialization is not required
In this scenario, the caller only needs the raw JSON text returned by the model and does not need to deserialize it into a .NET type.
The SO schema is specified via `ResponseFormat` at agent creation or invocation time, and the response text is consumed directly from the `AgentResponse`.
```csharp
AIAgent agent = chatClient.AsAIAgent();

AgentRunOptions runOptions = new()
{
    ResponseFormat = ChatResponseFormat.ForJsonSchema<PersonInfo>()
};

AgentResponse response = await agent.RunAsync("...", options: runOptions);
Console.WriteLine(response.Text);
```
### 1.2 SO for inter-agent collaboration
This scenario assumes a multi-agent setup where agents collaborate by passing messages to each other.
One agent produces structured output as text that is then passed directly as input to the next agent, without intermediate deserialization.
```csharp
// First agent extracts structured data from unstructured input
AIAgent extractionAgent = chatClient.AsAIAgent(new ChatClientAgentOptions()
{
    Name = "ExtractionAgent",
    ChatOptions = new()
    {
        Instructions = "Extract person information from the provided text.",
        ResponseFormat = ChatResponseFormat.ForJsonSchema<PersonInfo>()
    }
});

AgentResponse extractionResponse = await extractionAgent.RunAsync("John Smith is a 35-year-old software engineer.");

// Pass the message with structured output text directly to the next agent
ChatMessage soMessage = extractionResponse.Messages.Last();

AIAgent summaryAgent = chatClient.AsAIAgent(new ChatClientAgentOptions()
{
    Name = "SummaryAgent",
    ChatOptions = new() { Instructions = "Given the following structured person data, write a short professional bio." }
});

AgentResponse summaryResponse = await summaryAgent.RunAsync(soMessage);
Console.WriteLine(summaryResponse);
```
### 1.3 SO configured at agent creation time
In this scenario, the SO schema can only be configured at agent creation time (such as with `AIProjectClient`) and cannot be changed on a per-run basis.
The caller specifies the `ResponseFormat` when creating the agent, and all subsequent invocations use the same schema.
```csharp
AIProjectClient client = ...;

AIAgent agent = await client.CreateAIAgentAsync(model: "", new ChatClientAgentOptions()
{
    Name = "...",
    ChatOptions = new() { ResponseFormat = ChatResponseFormat.ForJsonSchema<PersonInfo>() }
});

AgentResponse response = await agent.RunAsync("Please provide information about John Smith.");

PersonInfo personInfo = JsonSerializer.Deserialize<PersonInfo>(response.Text, JsonSerializerOptions.Web)!;
Console.WriteLine($"Name: {personInfo.Name}");
Console.WriteLine($"Age: {personInfo.Age}");
Console.WriteLine($"Occupation: {personInfo.Occupation}");
```
### 1.4 SO type not known at compile time and represented by System.Type
In this scenario, the SO type is not known at compile time and is provided as a `System.Type` at runtime. This is useful for dynamic scenarios where the schema is determined programmatically,
such as when building tooling or frameworks that work with user-defined types.
```csharp
Type soType = GetStructuredOutputTypeFromConfiguration(); // e.g., typeof(PersonInfo)

ChatResponseFormat responseFormat = ChatResponseFormat.ForJsonSchema(soType);

AgentResponse response = await agent.RunAsync("...", new ChatClientAgentRunOptions()
{
    ChatOptions = new() { ResponseFormat = responseFormat }
});

PersonInfo personInfo = (PersonInfo)JsonSerializer.Deserialize(response.Text, soType, JsonSerializerOptions.Web)!;
```
### 1.5 SO represented by JSON schema with no corresponding .NET type
In this scenario, the SO schema is represented as raw JSON schema text or a `JsonElement`, and there is no corresponding .NET type available at compile time or runtime.
This is typical for declarative agents or scenarios where schemas are loaded from external configuration.
```csharp
// JSON schema provided as a string, e.g., loaded from a configuration file
string jsonSchema = """
    {
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "age": { "type": "integer" },
        "occupation": { "type": "string" }
      },
      "required": ["name", "age", "occupation"]
    }
    """;

// ForJsonSchema accepts the schema as a JsonElement
ChatResponseFormat responseFormat = ChatResponseFormat.ForJsonSchema(
    schema: JsonDocument.Parse(jsonSchema).RootElement,
    schemaName: "PersonInfo");

AgentResponse response = await agent.RunAsync("...", new ChatClientAgentRunOptions()
{
    ChatOptions = new() { ResponseFormat = responseFormat }
});

// Consume the SO result as text since there's no .NET type to deserialize into
Console.WriteLine(response.Text);
```
### 1.6 SO in streaming scenarios
In this scenario, the SO response is produced incrementally in parts via streaming. The caller specifies the `ResponseFormat` and consumes the response chunks as they arrive.
Deserialization is performed after all chunks have been received.
```csharp
AIAgent agent = chatClient.AsAIAgent(new ChatClientAgentOptions()
{
    Name = "HelpfulAssistant",
    ChatOptions = new()
    {
        Instructions = "You are a helpful assistant.",
        ResponseFormat = ChatResponseFormat.ForJsonSchema<PersonInfo>()
    }
});

IAsyncEnumerable<AgentResponseUpdate> updates = agent.RunStreamingAsync("Please provide information about John Smith, who is a 35-year-old software engineer.");
AgentResponse response = await updates.ToAgentResponseAsync();

// Deserialize the complete SO result after streaming is finished
PersonInfo personInfo = JsonSerializer.Deserialize<PersonInfo>(response.Text, JsonSerializerOptions.Web)!;
```
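Why deserialization waits for the end of the stream can be shown with plain JSON text. This sketch uses hypothetical chunk boundaries and Python's `json` module rather than the C# API; after each chunk it checks whether the accumulated text parses.

```python
import json

# Hypothetical text chunks as they might arrive from a streaming structured-output response.
chunks = ['{"name": "John Sm', 'ith", "age": 3', '5, "occupation": "software engineer"}']

buffer = ""
parseable: list[bool] = []
for chunk in chunks:
    buffer += chunk
    try:
        json.loads(buffer)
        parseable.append(True)
    except json.JSONDecodeError:
        # A prefix of a JSON document is generally not valid JSON on its own.
        parseable.append(False)

print(parseable)  # [False, False, True]

person = json.loads(buffer)  # deserialize only once the stream has finished
print(person["name"], person["age"])  # John Smith 35
```

Only the final buffer is a complete document. A framework that had wrapped the schema would additionally have to strip the wrapper from these partial updates, which is the difficulty described for streaming with `ResponseFormat`.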
## 2. SO usage via `RunAsync` generic method
This approach provides a convenient way to work with structured output on a per-run basis when the target type is known at compile time and a typed instance of the result
is required.
### Decision Drivers
1. Support arrays and primitives as SO types
2. Support complex types as SO types
3. Work with `AIAgent` decorators (e.g., `OpenTelemetryAgent`)
4. Enable SO for all AI agents, regardless of whether they natively support it
### Considered Options
1. `RunAsync` as an instance method of `AIAgent` class delegating to virtual `RunCoreAsync`
2. `RunAsync` as an extension method using feature collection
3. `RunAsync` as a method of the new `ITypedAIAgent` interface
4. `RunAsync` as an instance method of `AIAgent` class working via the new `AgentRunOptions.ResponseFormat` property
### 1. `RunAsync` as an instance method of `AIAgent` class delegating to virtual `RunCoreAsync`
This option adds the `RunAsync` method directly to the `AIAgent` base class.
```csharp
public abstract class AIAgent
{
    public Task<AgentResponse<T>> RunAsync<T>(
        IEnumerable<ChatMessage> messages,
        AgentSession? session = null,
        JsonSerializerOptions? serializerOptions = null,
        AgentRunOptions? options = null,
        CancellationToken cancellationToken = default)
        => this.RunCoreAsync<T>(messages, session, serializerOptions, options, cancellationToken);

    protected virtual Task<AgentResponse<T>> RunCoreAsync<T>(
        IEnumerable<ChatMessage> messages,
        AgentSession? session = null,
        JsonSerializerOptions? serializerOptions = null,
        AgentRunOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        throw new NotSupportedException($"The agent of type '{this.GetType().FullName}' does not support typed responses.");
    }
}
```
Agents with native SO support override the `RunCoreAsync` method to provide their implementation. If not overridden, the method throws a `NotSupportedException`.
Users will call the generic `RunAsync` method directly on the agent:
```csharp
AIAgent agent = chatClient.AsAIAgent(name: "HelpfulAssistant", instructions: "You are a helpful assistant.");
AgentResponse<PersonInfo> response = await agent.RunAsync<PersonInfo>("Please provide information about John Smith, who is a 35-year-old software engineer.");
```
Decision drivers satisfied:
1. Support arrays and primitives as SO types
2. Support complex types as SO types
3. Work with `AIAgent` decorators (e.g., `OpenTelemetryAgent`)
4. Enable SO for all AI agents, regardless of whether they natively support it
Pros:
- The `AIAgent.RunAsync` method is easily discoverable.
- Both the SO decorator and `ChatClientAgent` have compile-time access to the type `T`, allowing them to use the native `IChatClient.GetResponseAsync` API, which handles primitives and collections seamlessly.
Cons:
- Agents without native SO support will still expose `RunAsync`, which may be misleading.
- `ChatClientAgent` exposing `RunAsync` may be misleading when the underlying chat client does not support SO.
- All `AIAgent` decorators must override `RunCoreAsync` to properly handle `RunAsync` calls.
### 2. `RunAsync` as an extension method using feature collection
This option uses the Agent Framework feature collection (implemented via `AgentRunOptions.AdditionalProperties`) to pass a `StructuredOutputFeature` to agents, signaling that SO is requested.
Agents with native SO support check for this feature. If present, they read the target type, build the schema, invoke the underlying API, and store the response back in the feature.
```csharp
public class StructuredOutputFeature
{
    public StructuredOutputFeature(Type outputType)
    {
        this.OutputType = outputType;
    }

    [JsonIgnore]
    public Type OutputType { get; set; }

    public JsonSerializerOptions? SerializerOptions { get; set; }

    public AgentResponse? Response { get; set; }
}
```
The `RunAsync` extension method for `AIAgent` adds this feature to the collection.
```csharp
public static async Task<AgentResponse<T>> RunAsync<T>(
    this AIAgent agent,
    IEnumerable<ChatMessage> messages,
    AgentSession? session = null,
    JsonSerializerOptions? serializerOptions = null,
    AgentRunOptions? options = null,
    CancellationToken cancellationToken = default)
{
    // Create the structured output feature.
    StructuredOutputFeature structuredOutputFeature = new(typeof(T))
    {
        SerializerOptions = serializerOptions,
    };

    // Register it in the feature collection.
    ((options ??= new AgentRunOptions()).AdditionalProperties ??= []).Add(typeof(StructuredOutputFeature).FullName!, structuredOutputFeature);

    var response = await agent.RunAsync(messages, session, options, cancellationToken).ConfigureAwait(false);

    if (structuredOutputFeature.Response is not null)
    {
        return new StructuredOutputResponse<T>(structuredOutputFeature.Response, response, serializerOptions);
    }

    throw new InvalidOperationException("No structured output response was generated by the agent.");
}
```
Users will call the `RunAsync` extension method directly on the agent:
```csharp
AIAgent agent = chatClient.AsAIAgent(name: "HelpfulAssistant", instructions: "You are a helpful assistant.");
AgentResponse<PersonInfo> response = await agent.RunAsync<PersonInfo>("Please provide information about John Smith, who is a 35-year-old software engineer.");
```
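The feature-collection handshake can be modeled in a few lines. This is a language-neutral sketch in Python; `native_so_agent_run`, `plain_agent_run`, and `run_typed` are hypothetical names, not framework APIs. It mirrors the C# extension method above, with a plain dict standing in for `AdditionalProperties`.

```python
from dataclasses import dataclass
from typing import Any, Callable

FEATURE_KEY = "StructuredOutputFeature"  # plays the role of typeof(StructuredOutputFeature).FullName


@dataclass
class StructuredOutputFeature:
    """Python stand-in for the C# StructuredOutputFeature above."""

    output_type: type
    response: Any = None


def native_so_agent_run(prompt: str, additional_properties: dict[str, Any]) -> str:
    """Hypothetical agent with native SO support: it looks for the feature and fills it in."""
    feature = additional_properties.get(FEATURE_KEY)
    if feature is not None:
        # A real agent would build a schema from feature.output_type and call the model.
        feature.response = feature.output_type(42)
    return "plain text response"


def plain_agent_run(prompt: str, additional_properties: dict[str, Any]) -> str:
    """Hypothetical agent without SO support: it ignores unknown features."""
    return "plain text response"


def run_typed(run: Callable[[str, dict[str, Any]], str], prompt: str, output_type: type) -> Any:
    # Register the feature, run the agent, then check whether the agent filled it in.
    feature = StructuredOutputFeature(output_type)
    run(prompt, {FEATURE_KEY: feature})
    if feature.response is None:
        raise RuntimeError("No structured output response was generated by the agent.")
    return feature.response


print(run_typed(native_so_agent_run, "...", int))  # 42
```

Because the feature travels inside the options, a decorator that forwards options unchanged needs no modification (the pro listed for this option); the flip side is that an agent which ignores the feature is only detected at run time, here by `run_typed(plain_agent_run, ...)` raising.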
Decision drivers satisfied:
1. Support arrays and primitives as SO types
2. Support complex types as SO types
3. Work with `AIAgent` decorators (e.g., `OpenTelemetryAgent`)
4. Enable SO for all AI agents, regardless of whether they natively support it
Pros:
- The `RunAsync` extension method is easily discoverable.
- The `AIAgent` public API surface remains unchanged.
- No changes required to `AIAgent` decorators.
Cons:
- Agents without native SO support will still expose `RunAsync`, which may be misleading.
- `ChatClientAgent` exposing `RunAsync` may be misleading when the underlying chat client does not support SO.
### 3. `RunAsync` as a method of the new `ITypedAIAgent` interface
This option defines a new `ITypedAIAgent` interface that agents with SO support implement. Agents without SO support do not implement it, allowing users to check for SO capability via interface detection.
The interface:
```csharp
public interface ITypedAIAgent
{
Task<AgentResponse<T>> RunAsync<T>(
IEnumerable