Repository: karpathy/llm-council
Branch: master
Commit: 92e1fccb1bdc
Files: 34
Total size: 66.9 KB

Directory structure:
gitextract_tdw_7erc/
├── .gitignore
├── .python-version
├── CLAUDE.md
├── README.md
├── backend/
│   ├── __init__.py
│   ├── config.py
│   ├── council.py
│   ├── main.py
│   ├── openrouter.py
│   └── storage.py
├── frontend/
│   ├── .gitignore
│   ├── README.md
│   ├── eslint.config.js
│   ├── index.html
│   ├── package.json
│   ├── src/
│   │   ├── App.css
│   │   ├── App.jsx
│   │   ├── api.js
│   │   ├── components/
│   │   │   ├── ChatInterface.css
│   │   │   ├── ChatInterface.jsx
│   │   │   ├── Sidebar.css
│   │   │   ├── Sidebar.jsx
│   │   │   ├── Stage1.css
│   │   │   ├── Stage1.jsx
│   │   │   ├── Stage2.css
│   │   │   ├── Stage2.jsx
│   │   │   ├── Stage3.css
│   │   │   └── Stage3.jsx
│   │   ├── index.css
│   │   └── main.jsx
│   └── vite.config.js
├── main.py
├── pyproject.toml
└── start.sh

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
# Python-generated files
__pycache__/
*.py[oc]
build/
dist/
wheels/
*.egg-info

# Virtual environments
.venv

# Keys and secrets
.env

# Data files
data/

# Frontend
frontend/node_modules/
frontend/dist/
frontend/.vite/

================================================
FILE: .python-version
================================================
3.10

================================================
FILE: CLAUDE.md
================================================
# CLAUDE.md - Technical Notes for LLM Council

This file contains technical details, architectural decisions, and important implementation notes for future development sessions.

## Project Overview

LLM Council is a 3-stage deliberation system where multiple LLMs collaboratively answer user questions. The key innovation is anonymized peer review in Stage 2, preventing models from playing favorites.
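
To make the Stage 2 bookkeeping concrete, here is a minimal, self-contained sketch of what `stage2_collect_rankings()`, `parse_ranking_from_text()` and `calculate_aggregate_rankings()` do together: label responses "Response A", "Response B", ... in Stage 1 order, read the labels from each evaluator's "FINAL RANKING:" section, and average each model's position. The helper names (`anonymize`, `parse_final_ranking`, `aggregate`) and the `model-x`/`model-y`/`model-z` ids are hypothetical. No OpenRouter calls are made, and the parser skips the backend's preference for numbered-list lines, so treat this as an illustration under those assumptions rather than the project's implementation.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple


def anonymize(stage1: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical helper: label responses "Response A", "Response B", ... in Stage 1 order."""
    return {f"Response {chr(65 + i)}": model for i, model in enumerate(stage1)}


def parse_final_ranking(text: str) -> List[str]:
    """Labels after "FINAL RANKING:", or anywhere in the text if the header is missing."""
    _, header, tail = text.partition("FINAL RANKING:")
    return re.findall(r"Response [A-Z]", tail if header else text)


def aggregate(
    evaluations: List[str], label_to_model: Dict[str, str]
) -> List[Tuple[str, float, int]]:
    """(model, average position, vote count), best (lowest average) first."""
    positions: Dict[str, List[int]] = defaultdict(list)
    for text in evaluations:
        for pos, label in enumerate(parse_final_ranking(text), start=1):
            if label in label_to_model:  # ignore labels that were never assigned
                positions[label_to_model[label]].append(pos)
    rows = [(m, round(sum(p) / len(p), 2), len(p)) for m, p in positions.items()]
    return sorted(rows, key=lambda row: row[1])  # stable sort keeps ties in first-seen order


if __name__ == "__main__":
    # Hypothetical model ids; the real ones live in backend/config.py.
    label_to_model = anonymize({"model-x": "...", "model-y": "...", "model-z": "..."})
    evaluations = [
        "Response A is thin...\nFINAL RANKING:\n1. Response B\n2. Response A\n3. Response C",
        "FINAL RANKING:\n1. Response B\n2. Response C\n3. Response A",
    ]
    print(aggregate(evaluations, label_to_model))
    # -> [('model-y', 1.0, 2), ('model-x', 2.5, 2), ('model-z', 2.5, 2)]
```

Only the text after the header is scanned, so "Response A is thin..." in the evaluation body does not count as a vote; that is the same reason the real prompt insists on a strict "FINAL RANKING:" section.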
## Architecture

### Backend Structure (`backend/`)

**`config.py`**
- Contains `COUNCIL_MODELS` (list of OpenRouter model identifiers)
- Contains `CHAIRMAN_MODEL` (model that synthesizes final answer)
- Uses environment variable `OPENROUTER_API_KEY` from `.env`
- Backend runs on **port 8001** (NOT 8000 - user had another app on 8000)

**`openrouter.py`**
- `query_model()`: Single async model query
- `query_models_parallel()`: Parallel queries using `asyncio.gather()`
- Returns dict with 'content' and optional 'reasoning_details'
- Graceful degradation: returns None on failure, continues with successful responses

**`council.py`** - The Core Logic
- `stage1_collect_responses()`: Parallel queries to all council models
- `stage2_collect_rankings()`:
  - Anonymizes responses as "Response A, B, C, etc."
  - Creates `label_to_model` mapping for de-anonymization
  - Prompts models to evaluate and rank (with strict format requirements)
  - Returns tuple: (rankings_list, label_to_model_dict)
  - Each ranking includes both raw text and `parsed_ranking` list
- `stage3_synthesize_final()`: Chairman synthesizes from all responses + rankings
- `parse_ranking_from_text()`: Extracts "FINAL RANKING:" section, handles both numbered lists and plain format
- `calculate_aggregate_rankings()`: Computes average rank position across all peer evaluations

**`storage.py`**
- JSON-based conversation storage in `data/conversations/`
- Each conversation: `{id, created_at, title, messages[]}`
- Assistant messages contain: `{role, stage1, stage2, stage3}`
- Note: metadata (label_to_model, aggregate_rankings) is NOT persisted to storage, only returned via API

**`main.py`**
- FastAPI app with CORS enabled for localhost:5173 and localhost:3000
- POST `/api/conversations/{id}/message` returns metadata in addition to stages
- Metadata includes: label_to_model mapping and aggregate_rankings

### Frontend Structure (`frontend/src/`)

**`App.jsx`**
- Main orchestration: manages conversations list and current conversation
- Handles message sending and metadata storage
- Important: metadata is stored in the UI state for display but not persisted to backend JSON

**`components/ChatInterface.jsx`**
- Multiline textarea (3 rows, resizable)
- Enter to send, Shift+Enter for new line
- User messages wrapped in markdown-content class for padding

**`components/Stage1.jsx`**
- Tab view of individual model responses
- ReactMarkdown rendering with markdown-content wrapper

**`components/Stage2.jsx`**
- **Critical Feature**: Tab view showing RAW evaluation text from each model
- De-anonymization happens CLIENT-SIDE for display (models receive anonymous labels)
- Shows "Extracted Ranking" below each evaluation so users can validate parsing
- Aggregate rankings shown with average position and vote count
- Explanatory text clarifies that boldface model names are for readability only

**`components/Stage3.jsx`**
- Final synthesized answer from chairman
- Green-tinted background (#f0fff0) to highlight conclusion

**Styling (`*.css`)**
- Light mode theme (not dark mode)
- Primary color: #4a90e2 (blue)
- Global markdown styling in `index.css` with `.markdown-content` class
- 12px padding on all markdown content to prevent cluttered appearance

## Key Design Decisions

### Stage 2 Prompt Format
The Stage 2 prompt is very specific to ensure parseable output:
```
1. Evaluate each response individually first
2. Provide "FINAL RANKING:" header
3. Numbered list format: "1. Response C", "2. Response A", etc.
4. No additional text after ranking section
```

This strict format allows reliable parsing while still getting thoughtful evaluations.

### De-anonymization Strategy
- Models receive: "Response A", "Response B", etc.
- Backend creates mapping: `{"Response A": "openai/gpt-5.1", ...}`
- Frontend displays model names in **bold** for readability
- Users see explanation that original evaluation used anonymous labels
- This prevents bias while maintaining transparency

### Error Handling Philosophy
- Continue with successful responses if some models fail (graceful degradation)
- Never fail the entire request due to single model failure
- Log errors but don't expose to user unless all models fail

### UI/UX Transparency
- All raw outputs are inspectable via tabs
- Parsed rankings shown below raw text for validation
- Users can verify the system's interpretation of model outputs
- This builds trust and allows debugging of edge cases

## Important Implementation Details

### Relative Imports
All backend modules use relative imports (e.g., `from .config import ...`) not absolute imports. This is critical for Python's module system to work correctly when running as `python -m backend.main`.

### Port Configuration
- Backend: 8001 (changed from 8000 to avoid conflict)
- Frontend: 5173 (Vite default)
- Update both `backend/main.py` and `frontend/src/api.js` if changing

### Markdown Rendering
All ReactMarkdown components must be wrapped in an element with the `markdown-content` class for proper spacing. This class is defined globally in `index.css`.

### Model Configuration
Models are hardcoded in `backend/config.py`. The chairman can be the same as or different from the council members. The current default is Gemini as chairman per user preference.

## Common Gotchas

1. **Module Import Errors**: Always run backend as `python -m backend.main` from project root, not from backend directory
2. **CORS Issues**: Frontend must match allowed origins in `main.py` CORS middleware
3. **Ranking Parse Failures**: If models don't follow format, fallback regex extracts any "Response X" patterns in order
4. **Missing Metadata**: Metadata is ephemeral (not persisted), only available in API responses

## Future Enhancement Ideas

- Configurable council/chairman via UI instead of config file
- Streaming responses instead of batch loading
- Export conversations to markdown/PDF
- Model performance analytics over time
- Custom ranking criteria (not just accuracy/insight)
- Support for reasoning models (o1, etc.) with special handling

## Testing Notes

Use `test_openrouter.py` to verify API connectivity and test different model identifiers before adding to council. The script tests both streaming and non-streaming modes.

## Data Flow Summary

```
User Query
    ↓
Stage 1: Parallel queries → [individual responses]
    ↓
Stage 2: Anonymize → Parallel ranking queries → [evaluations + parsed rankings]
    ↓
Aggregate Rankings Calculation → [sorted by avg position]
    ↓
Stage 3: Chairman synthesis with full context
    ↓
Return: {stage1, stage2, stage3, metadata}
    ↓
Frontend: Display with tabs + validation UI
```

The entire flow is async/parallel where possible to minimize latency.

================================================
FILE: README.md
================================================
# LLM Council

![llmcouncil](header.jpg)

The idea of this repo is that instead of asking a question to your favorite LLM provider (e.g.
OpenAI GPT 5.1, Google Gemini 3.0 Pro, Anthropic Claude Sonnet 4.5, xAI Grok 4, etc.), you can group them into your "LLM Council". This repo is a simple, local web app that essentially looks like ChatGPT except it uses OpenRouter to send your query to multiple LLMs, it then asks them to review and rank each other's work, and finally a Chairman LLM produces the final response.

In a bit more detail, here is what happens when you submit a query:

1. **Stage 1: First opinions**. The user query is given to all LLMs individually, and the responses are collected. The individual responses are shown in a "tab view", so that the user can inspect them all one by one.
2. **Stage 2: Review**. Each individual LLM is given the responses of the other LLMs. Under the hood, the LLM identities are anonymized so that the LLM can't play favorites when judging their outputs. The LLM is asked to rank them on accuracy and insight.
3. **Stage 3: Final response**. The designated Chairman of the LLM Council takes all of the models' responses and compiles them into a single final answer that is presented to the user.

## Vibe Code Alert

This project was 99% vibe coded as a fun Saturday hack because I wanted to explore and evaluate a number of LLMs side by side in the process of [reading books together with LLMs](https://x.com/karpathy/status/1990577951671509438). It's nice and useful to see multiple responses side by side, and also the cross-opinions of all LLMs on each other's outputs. I'm not going to support it in any way; it's provided here as is for other people's inspiration and I don't intend to improve it. Code is ephemeral now and libraries are over; ask your LLM to change it in whatever way you like.

## Setup

### 1. Install Dependencies

The project uses [uv](https://docs.astral.sh/uv/) for project management.

**Backend:**
```bash
uv sync
```

**Frontend:**
```bash
cd frontend
npm install
cd ..
```

### 2. Configure API Key

Create a `.env` file in the project root:

```bash
OPENROUTER_API_KEY=sk-or-v1-...
```

Get your API key at [openrouter.ai](https://openrouter.ai/). Make sure to purchase the credits you need, or sign up for automatic top up.

### 3. Configure Models (Optional)

Edit `backend/config.py` to customize the council:

```python
COUNCIL_MODELS = [
    "openai/gpt-5.1",
    "google/gemini-3-pro-preview",
    "anthropic/claude-sonnet-4.5",
    "x-ai/grok-4",
]

CHAIRMAN_MODEL = "google/gemini-3-pro-preview"
```

## Running the Application

**Option 1: Use the start script**
```bash
./start.sh
```

**Option 2: Run manually**

Terminal 1 (Backend):
```bash
uv run python -m backend.main
```

Terminal 2 (Frontend):
```bash
cd frontend
npm run dev
```

Then open http://localhost:5173 in your browser.

## Tech Stack

- **Backend:** FastAPI (Python 3.10+), async httpx, OpenRouter API
- **Frontend:** React + Vite, react-markdown for rendering
- **Storage:** JSON files in `data/conversations/`
- **Package Management:** uv for Python, npm for JavaScript

================================================
FILE: backend/__init__.py
================================================
"""LLM Council backend package."""

================================================
FILE: backend/config.py
================================================
"""Configuration for the LLM Council."""

import os
from dotenv import load_dotenv

load_dotenv()

# OpenRouter API key
OPENROUTER_API_KEY = os.getenv("OPENROUTER_API_KEY")

# Council members - list of OpenRouter model identifiers
COUNCIL_MODELS = [
    "openai/gpt-5.1",
    "google/gemini-3-pro-preview",
    "anthropic/claude-sonnet-4.5",
    "x-ai/grok-4",
]

# Chairman model - synthesizes final response
CHAIRMAN_MODEL = "google/gemini-3-pro-preview"

# OpenRouter API endpoint
OPENROUTER_API_URL = "https://openrouter.ai/api/v1/chat/completions"

# Data directory for conversation storage
DATA_DIR = "data/conversations"

================================================
FILE: backend/council.py
================================================
"""3-stage LLM Council orchestration."""

from typing import List, Dict, Any, Tuple
from .openrouter import query_models_parallel, query_model
from .config import COUNCIL_MODELS, CHAIRMAN_MODEL


async def stage1_collect_responses(user_query: str) -> List[Dict[str, Any]]:
    """
    Stage 1: Collect individual responses from all council models.

    Args:
        user_query: The user's question

    Returns:
        List of dicts with 'model' and 'response' keys
    """
    messages = [{"role": "user", "content": user_query}]

    # Query all models in parallel
    responses = await query_models_parallel(COUNCIL_MODELS, messages)

    # Format results
    stage1_results = []
    for model, response in responses.items():
        if response is not None:  # Only include successful responses
            stage1_results.append({
                "model": model,
                "response": response.get('content', '')
            })

    return stage1_results


async def stage2_collect_rankings(
    user_query: str,
    stage1_results: List[Dict[str, Any]]
) -> Tuple[List[Dict[str, Any]], Dict[str, str]]:
    """
    Stage 2: Each model ranks the anonymized responses.

    Args:
        user_query: The original user query
        stage1_results: Results from Stage 1

    Returns:
        Tuple of (rankings list, label_to_model mapping)
    """
    # Create anonymized labels for responses (Response A, Response B, etc.)
    labels = [chr(65 + i) for i in range(len(stage1_results))]  # A, B, C, ...

    # Create mapping from label to model name
    label_to_model = {
        f"Response {label}": result['model']
        for label, result in zip(labels, stage1_results)
    }

    # Build the ranking prompt
    responses_text = "\n\n".join([
        f"Response {label}:\n{result['response']}"
        for label, result in zip(labels, stage1_results)
    ])

    ranking_prompt = f"""You are evaluating different responses to the following question:

Question: {user_query}

Here are the responses from different models (anonymized):

{responses_text}

Your task:
1. First, evaluate each response individually. For each response, explain what it does well and what it does poorly.
2. Then, at the very end of your response, provide a final ranking.

IMPORTANT: Your final ranking MUST be formatted EXACTLY as follows:
- Start with the line "FINAL RANKING:" (all caps, with colon)
- Then list the responses from best to worst as a numbered list
- Each line should be: number, period, space, then ONLY the response label (e.g., "1. Response A")
- Do not add any other text or explanations in the ranking section

Example of the correct format for your ENTIRE response:

Response A provides good detail on X but misses Y...
Response B is accurate but lacks depth on Z...
Response C offers the most comprehensive answer...

FINAL RANKING:
1. Response C
2. Response A
3. Response B

Now provide your evaluation and ranking:"""

    messages = [{"role": "user", "content": ranking_prompt}]

    # Get rankings from all council models in parallel
    responses = await query_models_parallel(COUNCIL_MODELS, messages)

    # Format results
    stage2_results = []
    for model, response in responses.items():
        if response is not None:
            full_text = response.get('content', '')
            parsed = parse_ranking_from_text(full_text)
            stage2_results.append({
                "model": model,
                "ranking": full_text,
                "parsed_ranking": parsed
            })

    return stage2_results, label_to_model


async def stage3_synthesize_final(
    user_query: str,
    stage1_results: List[Dict[str, Any]],
    stage2_results: List[Dict[str, Any]]
) -> Dict[str, Any]:
    """
    Stage 3: Chairman synthesizes final response.

    Args:
        user_query: The original user query
        stage1_results: Individual model responses from Stage 1
        stage2_results: Rankings from Stage 2

    Returns:
        Dict with 'model' and 'response' keys
    """
    # Build comprehensive context for chairman
    stage1_text = "\n\n".join([
        f"Model: {result['model']}\nResponse: {result['response']}"
        for result in stage1_results
    ])

    stage2_text = "\n\n".join([
        f"Model: {result['model']}\nRanking: {result['ranking']}"
        for result in stage2_results
    ])

    chairman_prompt = f"""You are the Chairman of an LLM Council. Multiple AI models have provided responses to a user's question, and then ranked each other's responses.

Original Question: {user_query}

STAGE 1 - Individual Responses:
{stage1_text}

STAGE 2 - Peer Rankings:
{stage2_text}

Your task as Chairman is to synthesize all of this information into a single, comprehensive, accurate answer to the user's original question. Consider:
- The individual responses and their insights
- The peer rankings and what they reveal about response quality
- Any patterns of agreement or disagreement

Provide a clear, well-reasoned final answer that represents the council's collective wisdom:"""

    messages = [{"role": "user", "content": chairman_prompt}]

    # Query the chairman model
    response = await query_model(CHAIRMAN_MODEL, messages)

    if response is None:
        # Fallback if chairman fails
        return {
            "model": CHAIRMAN_MODEL,
            "response": "Error: Unable to generate final synthesis."
        }

    return {
        "model": CHAIRMAN_MODEL,
        "response": response.get('content', '')
    }


def parse_ranking_from_text(ranking_text: str) -> List[str]:
    """
    Parse the FINAL RANKING section from the model's response.

    Args:
        ranking_text: The full text response from the model

    Returns:
        List of response labels in ranked order
    """
    import re

    # Look for "FINAL RANKING:" section
    if "FINAL RANKING:" in ranking_text:
        # Extract everything after "FINAL RANKING:"
        parts = ranking_text.split("FINAL RANKING:")
        if len(parts) >= 2:
            ranking_section = parts[1]
            # Try to extract numbered list format (e.g., "1. Response A")
            # This pattern looks for: number, period, optional space, "Response X"
            numbered_matches = re.findall(r'\d+\.\s*Response [A-Z]', ranking_section)
            if numbered_matches:
                # Extract just the "Response X" part
                return [re.search(r'Response [A-Z]', m).group() for m in numbered_matches]

            # Fallback: Extract all "Response X" patterns in order
            matches = re.findall(r'Response [A-Z]', ranking_section)
            return matches

    # Fallback: try to find any "Response X" patterns in order
    matches = re.findall(r'Response [A-Z]', ranking_text)
    return matches


def calculate_aggregate_rankings(
    stage2_results: List[Dict[str, Any]],
    label_to_model: Dict[str, str]
) -> List[Dict[str, Any]]:
    """
    Calculate aggregate rankings across all models.

    Args:
        stage2_results: Rankings from each model
        label_to_model: Mapping from anonymous labels to model names

    Returns:
        List of dicts with model name and average rank, sorted best to worst
    """
    from collections import defaultdict

    # Track positions for each model
    model_positions = defaultdict(list)

    for ranking in stage2_results:
        ranking_text = ranking['ranking']

        # Parse the ranking from the structured format
        parsed_ranking = parse_ranking_from_text(ranking_text)

        for position, label in enumerate(parsed_ranking, start=1):
            if label in label_to_model:
                model_name = label_to_model[label]
                model_positions[model_name].append(position)

    # Calculate average position for each model
    aggregate = []
    for model, positions in model_positions.items():
        if positions:
            avg_rank = sum(positions) / len(positions)
            aggregate.append({
                "model": model,
                "average_rank": round(avg_rank, 2),
                "rankings_count": len(positions)
            })

    # Sort by average rank (lower is better)
    aggregate.sort(key=lambda x: x['average_rank'])

    return aggregate


async def generate_conversation_title(user_query: str) -> str:
    """
    Generate a short title for a conversation based on the first user message.

    Args:
        user_query: The first user message

    Returns:
        A short title (3-5 words)
    """
    title_prompt = f"""Generate a very short title (3-5 words maximum) that summarizes the following question.
The title should be concise and descriptive. Do not use quotes or punctuation in the title.

Question: {user_query}

Title:"""

    messages = [{"role": "user", "content": title_prompt}]

    # Use gemini-2.5-flash for title generation (fast and cheap)
    response = await query_model("google/gemini-2.5-flash", messages, timeout=30.0)

    if response is None:
        # Fallback to a generic title
        return "New Conversation"

    title = response.get('content', 'New Conversation').strip()

    # Clean up the title - remove quotes, limit length
    title = title.strip('"\'')

    # Truncate if too long
    if len(title) > 50:
        title = title[:47] + "..."

    return title


async def run_full_council(user_query: str) -> Tuple[List, List, Dict, Dict]:
    """
    Run the complete 3-stage council process.

    Args:
        user_query: The user's question

    Returns:
        Tuple of (stage1_results, stage2_results, stage3_result, metadata)
    """
    # Stage 1: Collect individual responses
    stage1_results = await stage1_collect_responses(user_query)

    # If no models responded successfully, return error
    if not stage1_results:
        return [], [], {
            "model": "error",
            "response": "All models failed to respond. Please try again."
        }, {}

    # Stage 2: Collect rankings
    stage2_results, label_to_model = await stage2_collect_rankings(user_query, stage1_results)

    # Calculate aggregate rankings
    aggregate_rankings = calculate_aggregate_rankings(stage2_results, label_to_model)

    # Stage 3: Synthesize final answer
    stage3_result = await stage3_synthesize_final(
        user_query,
        stage1_results,
        stage2_results
    )

    # Prepare metadata
    metadata = {
        "label_to_model": label_to_model,
        "aggregate_rankings": aggregate_rankings
    }

    return stage1_results, stage2_results, stage3_result, metadata

================================================
FILE: backend/main.py
================================================
"""FastAPI backend for LLM Council."""

from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
from typing import List, Dict, Any
import uuid
import json
import asyncio

from . import storage
from .council import run_full_council, generate_conversation_title, stage1_collect_responses, stage2_collect_rankings, stage3_synthesize_final, calculate_aggregate_rankings

app = FastAPI(title="LLM Council API")

# Enable CORS for local development
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:5173", "http://localhost:3000"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)


class CreateConversationRequest(BaseModel):
    """Request to create a new conversation."""
    pass


class SendMessageRequest(BaseModel):
    """Request to send a message in a conversation."""
    content: str


class ConversationMetadata(BaseModel):
    """Conversation metadata for list view."""
    id: str
    created_at: str
    title: str
    message_count: int


class Conversation(BaseModel):
    """Full conversation with all messages."""
    id: str
    created_at: str
    title: str
    messages: List[Dict[str, Any]]


@app.get("/")
async def root():
    """Health check endpoint."""
    return {"status": "ok", "service": "LLM Council API"}


@app.get("/api/conversations", response_model=List[ConversationMetadata])
async def list_conversations():
    """List all conversations (metadata only)."""
    return storage.list_conversations()


@app.post("/api/conversations", response_model=Conversation)
async def create_conversation(request: CreateConversationRequest):
    """Create a new conversation."""
    conversation_id = str(uuid.uuid4())
    conversation = storage.create_conversation(conversation_id)
    return conversation


@app.get("/api/conversations/{conversation_id}", response_model=Conversation)
async def get_conversation(conversation_id: str):
    """Get a specific conversation with all its messages."""
    conversation = storage.get_conversation(conversation_id)
    if conversation is None:
        raise HTTPException(status_code=404, detail="Conversation not found")
    return conversation


@app.post("/api/conversations/{conversation_id}/message")
async def send_message(conversation_id: str, request: SendMessageRequest):
    """
    Send a message and run the 3-stage council process.
    Returns the complete response with all stages.
    """
    # Check if conversation exists
    conversation = storage.get_conversation(conversation_id)
    if conversation is None:
        raise HTTPException(status_code=404, detail="Conversation not found")

    # Check if this is the first message
    is_first_message = len(conversation["messages"]) == 0

    # Add user message
    storage.add_user_message(conversation_id, request.content)

    # If this is the first message, generate a title
    if is_first_message:
        title = await generate_conversation_title(request.content)
        storage.update_conversation_title(conversation_id, title)

    # Run the 3-stage council process
    stage1_results, stage2_results, stage3_result, metadata = await run_full_council(
        request.content
    )

    # Add assistant message with all stages
    storage.add_assistant_message(
        conversation_id,
        stage1_results,
        stage2_results,
        stage3_result
    )

    # Return the complete response with metadata
    return {
        "stage1": stage1_results,
        "stage2": stage2_results,
        "stage3": stage3_result,
        "metadata": metadata
    }


@app.post("/api/conversations/{conversation_id}/message/stream")
async def send_message_stream(conversation_id: str, request: SendMessageRequest):
    """
    Send a message and stream the 3-stage council process.
    Returns Server-Sent Events as each stage completes.
    """
    # Check if conversation exists
    conversation = storage.get_conversation(conversation_id)
    if conversation is None:
        raise HTTPException(status_code=404, detail="Conversation not found")

    # Check if this is the first message
    is_first_message = len(conversation["messages"]) == 0

    async def event_generator():
        try:
            # Add user message
            storage.add_user_message(conversation_id, request.content)

            # Start title generation in parallel (don't await yet)
            title_task = None
            if is_first_message:
                title_task = asyncio.create_task(generate_conversation_title(request.content))

            # Stage 1: Collect responses
            yield f"data: {json.dumps({'type': 'stage1_start'})}\n\n"
            stage1_results = await stage1_collect_responses(request.content)
            yield f"data: {json.dumps({'type': 'stage1_complete', 'data': stage1_results})}\n\n"

            # Stage 2: Collect rankings
            yield f"data: {json.dumps({'type': 'stage2_start'})}\n\n"
            stage2_results, label_to_model = await stage2_collect_rankings(request.content, stage1_results)
            aggregate_rankings = calculate_aggregate_rankings(stage2_results, label_to_model)
            yield f"data: {json.dumps({'type': 'stage2_complete', 'data': stage2_results, 'metadata': {'label_to_model': label_to_model, 'aggregate_rankings': aggregate_rankings}})}\n\n"

            # Stage 3: Synthesize final answer
            yield f"data: {json.dumps({'type': 'stage3_start'})}\n\n"
            stage3_result = await stage3_synthesize_final(request.content, stage1_results, stage2_results)
            yield f"data: {json.dumps({'type': 'stage3_complete', 'data': stage3_result})}\n\n"

            # Wait for title generation if it was started
            if title_task:
                title = await title_task
                storage.update_conversation_title(conversation_id, title)
                yield f"data: {json.dumps({'type': 'title_complete', 'data': {'title': title}})}\n\n"

            # Save complete assistant message
            storage.add_assistant_message(
                conversation_id,
                stage1_results,
                stage2_results,
                stage3_result
            )

            # Send completion event
            yield f"data: {json.dumps({'type': 'complete'})}\n\n"

        except Exception as e:
            # Send error event
            yield f"data: {json.dumps({'type': 'error', 'message': str(e)})}\n\n"

    return StreamingResponse(
        event_generator(),
        media_type="text/event-stream",
        headers={
            "Cache-Control": "no-cache",
            "Connection": "keep-alive",
        }
    )


if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8001)

================================================
FILE: backend/openrouter.py
================================================
"""OpenRouter API client for making LLM requests."""

import httpx
from typing import List, Dict, Any, Optional
from .config import OPENROUTER_API_KEY, OPENROUTER_API_URL


async def query_model(
    model: str,
    messages: List[Dict[str, str]],
    timeout: float = 120.0
) -> Optional[Dict[str, Any]]:
    """
    Query a single model via OpenRouter API.

    Args:
        model: OpenRouter model identifier (e.g., "openai/gpt-4o")
        messages: List of message dicts with 'role' and 'content'
        timeout: Request timeout in seconds

    Returns:
        Response dict with 'content' and optional 'reasoning_details', or None if failed
    """
    headers = {
        "Authorization": f"Bearer {OPENROUTER_API_KEY}",
        "Content-Type": "application/json",
    }

    payload = {
        "model": model,
        "messages": messages,
    }

    try:
        async with httpx.AsyncClient(timeout=timeout) as client:
            response = await client.post(
                OPENROUTER_API_URL,
                headers=headers,
                json=payload
            )
            response.raise_for_status()

            data = response.json()
            message = data['choices'][0]['message']

            return {
                'content': message.get('content'),
                'reasoning_details': message.get('reasoning_details')
            }

    except Exception as e:
        print(f"Error querying model {model}: {e}")
        return None


async def query_models_parallel(
    models: List[str],
    messages: List[Dict[str, str]]
) -> Dict[str, Optional[Dict[str, Any]]]:
    """
    Query multiple models in parallel.

    Args:
        models: List of OpenRouter model identifiers
        messages: List of message dicts to send to each model

    Returns:
        Dict mapping model identifier to response dict (or None if failed)
    """
    import asyncio

    # Create tasks for all models
    tasks = [query_model(model, messages) for model in models]

    # Wait for all to complete
    responses = await asyncio.gather(*tasks)

    # Map models to their responses
    return {model: response for model, response in zip(models, responses)}

================================================
FILE: backend/storage.py
================================================
"""JSON-based storage for conversations."""

import json
import os
from datetime import datetime
from typing import List, Dict, Any, Optional
from pathlib import Path
from .config import DATA_DIR


def ensure_data_dir():
    """Ensure the data directory exists."""
    Path(DATA_DIR).mkdir(parents=True, exist_ok=True)


def get_conversation_path(conversation_id: str) -> str:
    """Get the file path for a conversation."""
    return os.path.join(DATA_DIR, f"{conversation_id}.json")


def create_conversation(conversation_id: str) -> Dict[str, Any]:
    """
    Create a new conversation.

    Args:
        conversation_id: Unique identifier for the conversation

    Returns:
        New conversation dict
    """
    ensure_data_dir()

    conversation = {
        "id": conversation_id,
        "created_at": datetime.utcnow().isoformat(),
        "title": "New Conversation",
        "messages": []
    }

    # Save to file
    path = get_conversation_path(conversation_id)
    with open(path, 'w') as f:
        json.dump(conversation, f, indent=2)

    return conversation


def get_conversation(conversation_id: str) -> Optional[Dict[str, Any]]:
    """
    Load a conversation from storage.

    Args:
        conversation_id: Unique identifier for the conversation

    Returns:
        Conversation dict or None if not found
    """
    path = get_conversation_path(conversation_id)

    if not os.path.exists(path):
        return None

    with open(path, 'r') as f:
        return json.load(f)


def save_conversation(conversation: Dict[str, Any]):
    """
    Save a conversation to storage.

    Args:
        conversation: Conversation dict to save
    """
    ensure_data_dir()

    path = get_conversation_path(conversation['id'])
    with open(path, 'w') as f:
        json.dump(conversation, f, indent=2)


def list_conversations() -> List[Dict[str, Any]]:
    """
    List all conversations (metadata only).

    Returns:
        List of conversation metadata dicts
    """
    ensure_data_dir()

    conversations = []
    for filename in os.listdir(DATA_DIR):
        if filename.endswith('.json'):
            path = os.path.join(DATA_DIR, filename)
            with open(path, 'r') as f:
                data = json.load(f)
                # Return metadata only
                conversations.append({
                    "id": data["id"],
                    "created_at": data["created_at"],
                    "title": data.get("title", "New Conversation"),
                    "message_count": len(data["messages"])
                })

    # Sort by creation time, newest first
    conversations.sort(key=lambda x: x["created_at"], reverse=True)

    return conversations


def add_user_message(conversation_id: str, content: str):
    """
    Add a user message to a conversation.

    Args:
        conversation_id: Conversation identifier
        content: User message content
    """
    conversation = get_conversation(conversation_id)
    if conversation is None:
        raise ValueError(f"Conversation {conversation_id} not found")

    conversation["messages"].append({
        "role": "user",
        "content": content
    })

    save_conversation(conversation)


def add_assistant_message(
    conversation_id: str,
    stage1: List[Dict[str, Any]],
    stage2: List[Dict[str, Any]],
    stage3: Dict[str, Any]
):
    """
    Add an assistant message with all 3 stages to a conversation.

    Args:
        conversation_id: Conversation identifier
        stage1: List of individual model responses
        stage2: List of model rankings
        stage3: Final synthesized response
    """
    conversation = get_conversation(conversation_id)
    if conversation is None:
        raise ValueError(f"Conversation {conversation_id} not found")

    conversation["messages"].append({
        "role": "assistant",
        "stage1": stage1,
        "stage2": stage2,
        "stage3": stage3
    })

    save_conversation(conversation)


def update_conversation_title(conversation_id: str, title: str):
    """
    Update the title of a conversation.
    Args:
        conversation_id: Conversation identifier
        title: New title for the conversation
    """
    conversation = get_conversation(conversation_id)
    if conversation is None:
        raise ValueError(f"Conversation {conversation_id} not found")

    conversation["title"] = title
    save_conversation(conversation)

================================================
FILE: frontend/.gitignore
================================================
# Logs
logs
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*
pnpm-debug.log*
lerna-debug.log*

node_modules
dist
dist-ssr
*.local

# Editor directories and files
.vscode/*
!.vscode/extensions.json
.idea
.DS_Store
*.suo
*.ntvs*
*.njsproj
*.sln
*.sw?

================================================
FILE: frontend/README.md
================================================
# React + Vite

This template provides a minimal setup to get React working in Vite with HMR and some ESLint rules.

Currently, two official plugins are available:

- [@vitejs/plugin-react](https://github.com/vitejs/vite-plugin-react/blob/main/packages/plugin-react) uses [Babel](https://babeljs.io/) (or [oxc](https://oxc.rs) when used in [rolldown-vite](https://vite.dev/guide/rolldown)) for Fast Refresh
- [@vitejs/plugin-react-swc](https://github.com/vitejs/vite-plugin-react/blob/main/packages/plugin-react-swc) uses [SWC](https://swc.rs/) for Fast Refresh

## React Compiler

The React Compiler is not enabled on this template because of its impact on dev & build performances. To add it, see [this documentation](https://react.dev/learn/react-compiler/installation).

## Expanding the ESLint configuration

If you are developing a production application, we recommend using TypeScript with type-aware lint rules enabled. Check out the [TS template](https://github.com/vitejs/vite/tree/main/packages/create-vite/template-react-ts) for information on how to integrate TypeScript and [`typescript-eslint`](https://typescript-eslint.io) in your project.
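For readers tracing `backend/storage.py` earlier in this dump: each conversation is one `{id}.json` file, every update is a read-modify-write of the whole file, and `list_conversations()` returns only `id`, `created_at`, `title` and `message_count`, newest first. The following is a minimal, self-contained sketch of that round trip, not the repository's code. It assumes a temporary directory stands in for `DATA_DIR`, passes `created_at` explicitly instead of calling `datetime.utcnow()` so that the ordering is deterministic, and uses hypothetical helper names (`create`, `append_message`, `list_metadata`).

```python
import json
import os
import tempfile
from typing import Any, Dict, List

# Hypothetical stand-in for backend.config.DATA_DIR
DATA_DIR = tempfile.mkdtemp()


def _path(conversation_id: str) -> str:
    # One JSON file per conversation, named after its id
    return os.path.join(DATA_DIR, f"{conversation_id}.json")


def create(conversation_id: str, created_at: str) -> Dict[str, Any]:
    conversation = {"id": conversation_id, "created_at": created_at,
                    "title": "New Conversation", "messages": []}
    with open(_path(conversation_id), "w") as f:
        json.dump(conversation, f, indent=2)
    return conversation


def append_message(conversation_id: str, message: Dict[str, Any]) -> None:
    # Read the whole file, append, and write the whole file back,
    # mirroring add_user_message() followed by save_conversation()
    with open(_path(conversation_id)) as f:
        conversation = json.load(f)
    conversation["messages"].append(message)
    with open(_path(conversation_id), "w") as f:
        json.dump(conversation, f, indent=2)


def list_metadata() -> List[Dict[str, Any]]:
    items = []
    for filename in os.listdir(DATA_DIR):
        if filename.endswith(".json"):
            with open(os.path.join(DATA_DIR, filename)) as f:
                data = json.load(f)
            items.append({"id": data["id"], "created_at": data["created_at"],
                          "title": data.get("title", "New Conversation"),
                          "message_count": len(data["messages"])})
    # ISO-8601 timestamps in the same format compare chronologically as
    # strings, so a reverse string sort yields newest first
    items.sort(key=lambda x: x["created_at"], reverse=True)
    return items


create("a", "2024-01-01T10:00:00")
create("b", "2024-01-02T09:30:00")
append_message("a", {"role": "user", "content": "Hello"})
for item in list_metadata():
    print(item["id"], item["message_count"])  # prints "b 0" then "a 1"
```

Because every write rewrites the full file and nothing is locked, this design suits a single local user. Concurrent writers to the same conversation could overwrite each other's changes.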
================================================
FILE: frontend/eslint.config.js
================================================
import js from '@eslint/js'
import globals from 'globals'
import reactHooks from 'eslint-plugin-react-hooks'
import reactRefresh from 'eslint-plugin-react-refresh'
import { defineConfig, globalIgnores } from 'eslint/config'

export default defineConfig([
  globalIgnores(['dist']),
  {
    files: ['**/*.{js,jsx}'],
    extends: [
      js.configs.recommended,
      reactHooks.configs.flat.recommended,
      reactRefresh.configs.vite,
    ],
    languageOptions: {
      ecmaVersion: 2020,
      globals: globals.browser,
      parserOptions: {
        ecmaVersion: 'latest',
        ecmaFeatures: { jsx: true },
        sourceType: 'module',
      },
    },
    rules: {
      'no-unused-vars': ['error', { varsIgnorePattern: '^[A-Z_]' }],
    },
  },
])

================================================
FILE: frontend/index.html
================================================
frontend
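The `sendMessageStream` client in `frontend/src/api.js` reads the response body as a sequence of `data: {...}` lines. Each line carries one JSON event with a `type` field such as `stage1_complete` or `complete`. A single network read can end in the middle of a line, or even in the middle of a multi-byte UTF-8 character, so a parser has to carry the unfinished tail over to the next read. The following is a minimal Python sketch of that buffering, which could be useful for consuming the stream from a script. The function name `iter_sse_events` and the chunk contents are hypothetical.

It is not a full Server-Sent Events parser. It ignores `event:` and `id:` fields, multi-line `data:` payloads, and `\r\n` line endings. Unlike the JS client, which logs malformed payloads and skips them, it lets `json.loads` raise.

```python
import codecs
import json
from typing import Any, Dict, Iterable, Iterator

DATA_PREFIX = "data: "


def iter_sse_events(chunks: Iterable[bytes]) -> Iterator[Dict[str, Any]]:
    """Yield the JSON payload of every `data: ` line in a chunked byte stream."""
    # An incremental decoder holds back a multi-byte UTF-8 character whose
    # bytes are split across two chunks instead of raising on it.
    decoder = codecs.getincrementaldecoder("utf-8")()
    buffer = ""
    for chunk in chunks:
        buffer += decoder.decode(chunk)
        lines = buffer.split("\n")
        # The last piece is an unfinished line ("" if the chunk ended on "\n");
        # keep it and prepend it to the next chunk.
        buffer = lines.pop()
        for line in lines:
            if line.startswith(DATA_PREFIX):
                yield json.loads(line[len(DATA_PREFIX):])
    # End of stream: flush the decoder and handle a final unterminated line.
    buffer += decoder.decode(b"", final=True)
    if buffer.startswith(DATA_PREFIX):
        yield json.loads(buffer[len(DATA_PREFIX):])


# The second event's JSON is split across two network reads.
chunks = [
    b'data: {"type": "stage1_start"}\n\ndata: {"type": "stage',
    b'1_complete", "data": []}\n\ndata: {"type": "complete"}\n\n',
]
for event in iter_sse_events(chunks):
    print(event["type"])  # stage1_start, stage1_complete, complete
```

If the chunks were instead parsed one by one, as in a naive split per read, the `stage1_complete` event would be cut into two halves and neither half would parse as JSON.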
================================================
FILE: frontend/package.json
================================================
{
  "name": "frontend",
  "private": true,
  "version": "0.0.0",
  "type": "module",
  "scripts": {
    "dev": "vite",
    "build": "vite build",
    "lint": "eslint .",
    "preview": "vite preview"
  },
  "dependencies": {
    "react": "^19.2.0",
    "react-dom": "^19.2.0",
    "react-markdown": "^10.1.0"
  },
  "devDependencies": {
    "@eslint/js": "^9.39.1",
    "@types/react": "^19.2.5",
    "@types/react-dom": "^19.2.3",
    "@vitejs/plugin-react": "^5.1.1",
    "eslint": "^9.39.1",
    "eslint-plugin-react-hooks": "^7.0.1",
    "eslint-plugin-react-refresh": "^0.4.24",
    "globals": "^16.5.0",
    "vite": "^7.2.4"
  }
}

================================================
FILE: frontend/src/App.css
================================================
* {
  box-sizing: border-box;
}

.app {
  display: flex;
  height: 100vh;
  width: 100vw;
  overflow: hidden;
  background: #ffffff;
  color: #333;
  font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', 'Roboto', 'Oxygen',
    'Ubuntu', 'Cantarell', 'Fira Sans', 'Droid Sans', 'Helvetica Neue',
    sans-serif;
}

================================================
FILE: frontend/src/App.jsx
================================================
import { useState, useEffect } from 'react';
import Sidebar from './components/Sidebar';
import ChatInterface from './components/ChatInterface';
import { api } from './api';
import './App.css';

function App() {
  const [conversations, setConversations] = useState([]);
  const [currentConversationId, setCurrentConversationId] = useState(null);
  const [currentConversation, setCurrentConversation] = useState(null);
  const [isLoading, setIsLoading] = useState(false);

  // Load conversations on mount
  useEffect(() => {
    loadConversations();
  }, []);

  // Load conversation details when selected
  useEffect(() => {
    if (currentConversationId) {
      loadConversation(currentConversationId);
    }
  }, [currentConversationId]);

  const loadConversations = async () => {
    try {
      const convs = await api.listConversations();
      setConversations(convs);
    } catch (error) {
      console.error('Failed to load conversations:', error);
    }
  };

  const loadConversation = async (id) => {
    try {
      const conv = await api.getConversation(id);
      setCurrentConversation(conv);
    } catch (error) {
      console.error('Failed to load conversation:', error);
    }
  };

  const handleNewConversation = async () => {
    try {
      const newConv = await api.createConversation();
      setConversations([
        { id: newConv.id, created_at: newConv.created_at, message_count: 0 },
        ...conversations,
      ]);
      setCurrentConversationId(newConv.id);
    } catch (error) {
      console.error('Failed to create conversation:', error);
    }
  };

  const handleSelectConversation = (id) => {
    setCurrentConversationId(id);
  };

  const handleSendMessage = async (content) => {
    if (!currentConversationId) return;

    setIsLoading(true);
    try {
      // Optimistically add user message to UI
      const userMessage = { role: 'user', content };
      setCurrentConversation((prev) => ({
        ...prev,
        messages: [...prev.messages, userMessage],
      }));

      // Create a partial assistant message that will be updated progressively
      const assistantMessage = {
        role: 'assistant',
        stage1: null,
        stage2: null,
        stage3: null,
        metadata: null,
        loading: {
          stage1: false,
          stage2: false,
          stage3: false,
        },
      };

      // Add the partial assistant message
      setCurrentConversation((prev) => ({
        ...prev,
        messages: [...prev.messages, assistantMessage],
      }));

      // Send message with streaming
      await api.sendMessageStream(currentConversationId, content, (eventType, event) => {
        switch (eventType) {
          case 'stage1_start':
            setCurrentConversation((prev) => {
              const messages = [...prev.messages];
              const lastMsg = messages[messages.length - 1];
              lastMsg.loading.stage1 = true;
              return { ...prev, messages };
            });
            break;

          case 'stage1_complete':
            setCurrentConversation((prev) => {
              const messages = [...prev.messages];
              const lastMsg = messages[messages.length - 1];
              lastMsg.stage1 = event.data;
              lastMsg.loading.stage1 = false;
              return { ...prev, messages };
            });
            break;

          case 'stage2_start':
            setCurrentConversation((prev) => {
              const messages = [...prev.messages];
              const lastMsg = messages[messages.length - 1];
              lastMsg.loading.stage2 = true;
              return { ...prev, messages };
            });
            break;

          case 'stage2_complete':
            setCurrentConversation((prev) => {
              const messages = [...prev.messages];
              const lastMsg = messages[messages.length - 1];
              lastMsg.stage2 = event.data;
              lastMsg.metadata = event.metadata;
              lastMsg.loading.stage2 = false;
              return { ...prev, messages };
            });
            break;

          case 'stage3_start':
            setCurrentConversation((prev) => {
              const messages = [...prev.messages];
              const lastMsg = messages[messages.length - 1];
              lastMsg.loading.stage3 = true;
              return { ...prev, messages };
            });
            break;

          case 'stage3_complete':
            setCurrentConversation((prev) => {
              const messages = [...prev.messages];
              const lastMsg = messages[messages.length - 1];
              lastMsg.stage3 = event.data;
              lastMsg.loading.stage3 = false;
              return { ...prev, messages };
            });
            break;

          case 'title_complete':
            // Reload conversations to get updated title
            loadConversations();
            break;

          case 'complete':
            // Stream complete, reload conversations list
            loadConversations();
            setIsLoading(false);
            break;

          case 'error':
            console.error('Stream error:', event.message);
            setIsLoading(false);
            break;

          default:
            console.log('Unknown event type:', eventType);
        }
      });
    } catch (error) {
      console.error('Failed to send message:', error);
      // Remove optimistic messages on error
      setCurrentConversation((prev) => ({
        ...prev,
        messages: prev.messages.slice(0, -2),
      }));
      setIsLoading(false);
    }
  };

  return (
  );
}

export default App;

================================================
FILE: frontend/src/api.js
================================================
/**
 * API client for the LLM Council backend.
 */

const API_BASE = 'http://localhost:8001';

export const api = {
  /**
   * List all conversations.
   */
  async listConversations() {
    const response = await fetch(`${API_BASE}/api/conversations`);
    if (!response.ok) {
      throw new Error('Failed to list conversations');
    }
    return response.json();
  },

  /**
   * Create a new conversation.
   */
  async createConversation() {
    const response = await fetch(`${API_BASE}/api/conversations`, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({}),
    });
    if (!response.ok) {
      throw new Error('Failed to create conversation');
    }
    return response.json();
  },

  /**
   * Get a specific conversation.
   */
  async getConversation(conversationId) {
    const response = await fetch(
      `${API_BASE}/api/conversations/${conversationId}`
    );
    if (!response.ok) {
      throw new Error('Failed to get conversation');
    }
    return response.json();
  },

  /**
   * Send a message in a conversation.
   */
  async sendMessage(conversationId, content) {
    const response = await fetch(
      `${API_BASE}/api/conversations/${conversationId}/message`,
      {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({ content }),
      }
    );
    if (!response.ok) {
      throw new Error('Failed to send message');
    }
    return response.json();
  },

  /**
   * Send a message and receive streaming updates.
   * @param {string} conversationId - The conversation ID
   * @param {string} content - The message content
   * @param {function} onEvent - Callback function for each event: (eventType, data) => void
   * @returns {Promise}
   */
  async sendMessageStream(conversationId, content, onEvent) {
    const response = await fetch(
      `${API_BASE}/api/conversations/${conversationId}/message/stream`,
      {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({ content }),
      }
    );

    if (!response.ok) {
      throw new Error('Failed to send message');
    }

    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    // A single read can end partway through an SSE line, so keep the
    // unfinished tail and prepend it to the next read instead of parsing it.
    let buffer = '';

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      // { stream: true } keeps multi-byte characters split across reads intact
      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split('\n');
      // The last element is incomplete ('' if the read ended on a newline)
      buffer = lines.pop();

      for (const line of lines) {
        if (line.startsWith('data: ')) {
          const data = line.slice(6);
          try {
            const event = JSON.parse(data);
            onEvent(event.type, event);
          } catch (e) {
            console.error('Failed to parse SSE event:', e);
          }
        }
      }
    }
  },
};

================================================
FILE: frontend/src/components/ChatInterface.css
================================================
.chat-interface {
  flex: 1;
  display: flex;
  flex-direction: column;
  height: 100vh;
  background: #ffffff;
}

.messages-container {
  flex: 1;
  overflow-y: auto;
  padding: 24px;
}

.empty-state {
  display: flex;
  flex-direction: column;
  align-items: center;
  justify-content: center;
  height: 100%;
  color: #666;
  text-align: center;
}

.empty-state h2 {
  margin: 0 0 8px 0;
  font-size: 24px;
  color: #333;
}

.empty-state p {
  margin: 0;
  font-size: 16px;
}

.message-group {
  margin-bottom: 32px;
}

.user-message,
.assistant-message {
  margin-bottom: 16px;
}

.message-label {
  font-size: 12px;
  font-weight: 600;
  color: #666;
  margin-bottom: 8px;
  text-transform: uppercase;
  letter-spacing: 0.5px;
}

.user-message .message-content {
  background: #f0f7ff;
  padding: 16px;
  border-radius: 8px;
  border: 1px solid #d0e7ff;
  color: #333;
  line-height: 1.6;
  max-width: 80%;
  white-space: pre-wrap;
}

.loading-indicator {
  display: flex;
  align-items: center;
  gap: 12px;
  padding: 16px;
  color: #666;
  font-size: 14px;
}

.stage-loading {
  display: flex;
  align-items: center;
  gap: 12px;
  padding: 16px;
  margin: 12px 0;
  background: #f9fafb;
  border-radius: 8px;
  border: 1px solid #e0e0e0;
  color: #666;
  font-size: 14px;
  font-style: italic;
}

.spinner {
  width: 20px;
  height: 20px;
  border: 2px solid #e0e0e0;
  border-top-color: #4a90e2;
  border-radius: 50%;
  animation: spin 0.8s linear infinite;
}

@keyframes spin {
  to {
    transform: rotate(360deg);
  }
}

.input-form {
  display: flex;
  align-items: flex-end;
  gap: 12px;
  padding: 24px;
  border-top: 1px solid #e0e0e0;
  background: #fafafa;
}

.message-input {
  flex: 1;
  padding: 14px;
  background: #ffffff;
  border: 1px solid #d0d0d0;
  border-radius: 8px;
  color: #333;
  font-size: 15px;
  font-family: inherit;
  line-height: 1.5;
  outline: none;
  resize: vertical;
  min-height: 80px;
  max-height: 300px;
}

.message-input:focus {
  border-color: #4a90e2;
  box-shadow: 0 0 0 3px rgba(74, 144, 226, 0.1);
}

.message-input:disabled {
  opacity: 0.5;
  cursor: not-allowed;
  background: #f5f5f5;
}

.send-button {
  padding: 14px 28px;
  background: #4a90e2;
  border: 1px solid #4a90e2;
  border-radius: 8px;
  color: #fff;
  font-size: 15px;
  font-weight: 600;
  cursor: pointer;
  transition: background 0.2s;
  white-space: nowrap;
  align-self: flex-end;
}

.send-button:hover:not(:disabled) {
  background: #357abd;
  border-color: #357abd;
}

.send-button:disabled {
  opacity: 0.5;
  cursor: not-allowed;
  background: #ccc;
  border-color: #ccc;
}

================================================
FILE: frontend/src/components/ChatInterface.jsx
================================================
import { useState, useEffect, useRef } from 'react';
import ReactMarkdown from 'react-markdown';
import Stage1 from './Stage1';
import Stage2 from './Stage2';
import Stage3 from './Stage3';
import './ChatInterface.css';

export default function ChatInterface({
  conversation,
  onSendMessage,
  isLoading,
}) {
  const [input, setInput] = useState('');
  const messagesEndRef = useRef(null);

  const scrollToBottom = () => {
    messagesEndRef.current?.scrollIntoView({ behavior: 'smooth' });
  };

  useEffect(() => {
    scrollToBottom();
  }, [conversation]);

  const handleSubmit = (e) => {
    e.preventDefault();
    if (input.trim() && !isLoading) {
      onSendMessage(input);
      setInput('');
    }
  };

  const handleKeyDown = (e) => {
    // Submit on Enter (without Shift)
    if (e.key === 'Enter' && !e.shiftKey) {
      e.preventDefault();
      handleSubmit(e);
    }
  };

  if (!conversation) {
    return (

Welcome to LLM Council

Create a new conversation to get started

); } return (
{conversation.messages.length === 0 ? (

Start a conversation

Ask a question to consult the LLM Council

) : ( conversation.messages.map((msg, index) => (
{msg.role === 'user' ? (
You
{msg.content}
) : (
LLM Council
{/* Stage 1 */} {msg.loading?.stage1 && (
Running Stage 1: Collecting individual responses...
)} {msg.stage1 && } {/* Stage 2 */} {msg.loading?.stage2 && (
Running Stage 2: Peer rankings...
)} {msg.stage2 && ( )} {/* Stage 3 */} {msg.loading?.stage3 && (
Running Stage 3: Final synthesis...
)} {msg.stage3 && }
)}
)) )} {isLoading && (
Consulting the council...
)}
{conversation.messages.length === 0 && (