[
  {
    "path": ".github/ISSUE_TEMPLATE/bug_report.md",
    "content": "---\nname: Bug report\nabout: Create a report to help us improve LocalGPT\ntitle: '[BUG] '\nlabels: 'bug'\nassignees: ''\n\n---\n\n## 🐛 Bug Description\nA clear and concise description of what the bug is.\n\n## 🔄 Steps to Reproduce\n1. Go to '...'\n2. Click on '...'\n3. Scroll down to '...'\n4. See error\n\n## ✅ Expected Behavior\nA clear and concise description of what you expected to happen.\n\n## ❌ Actual Behavior\nA clear and concise description of what actually happened.\n\n## 📸 Screenshots\nIf applicable, add screenshots to help explain your problem.\n\n## 🖥️ Environment Information\n**Desktop/Server:**\n- OS: [e.g. macOS 13.4, Ubuntu 20.04, Windows 11]\n- Python Version: [e.g. 3.11.5]\n- Node.js Version: [e.g. 23.10.0]\n- Ollama Version: [e.g. 0.9.5]\n- Docker Version: [e.g. 24.0.6] (if using Docker)\n\n**Browser (if web interface issue):**\n- Browser: [e.g. Chrome, Safari, Firefox]\n- Version: [e.g. 118.0.0.0]\n\n## 📋 System Health Check\nPlease run `python system_health_check.py` and paste the output:\n\n```\n[Paste system health check output here]\n```\n\n## 📝 Error Logs\nPlease include relevant error messages or logs:\n\n```\n[Paste error logs here]\n```\n\n## 🔧 Configuration\n- Deployment method: [Docker / Direct Python]\n- Models used: [e.g. qwen3:0.6b, qwen3:8b]\n- Document types: [e.g. PDF, DOCX, TXT]\n\n## 📎 Additional Context\nAdd any other context about the problem here.\n\n## 🤔 Possible Solution\nIf you have ideas for fixing the issue, please share them here. "
  },
  {
    "path": ".github/ISSUE_TEMPLATE/feature_request.md",
    "content": "---\nname: Feature request\nabout: Suggest an idea for LocalGPT\ntitle: '[FEATURE] '\nlabels: 'enhancement'\nassignees: ''\n\n---\n\n## 🚀 Feature Request\n\n### 📝 Is your feature request related to a problem? Please describe.\nA clear and concise description of what the problem is. Ex. I'm always frustrated when [...]\n\n### 💡 Describe the solution you'd like\nA clear and concise description of what you want to happen.\n\n### 🔄 Describe alternatives you've considered\nA clear and concise description of any alternative solutions or features you've considered.\n\n### 🎯 Use Case\nDescribe the specific use case or scenario where this feature would be valuable:\n- Who would use this feature?\n- When would they use it?\n- How would it improve their workflow?\n\n### 📋 Acceptance Criteria\nWhat would need to be implemented for this feature to be considered complete?\n- [ ] Criterion 1\n- [ ] Criterion 2\n- [ ] Criterion 3\n\n### 🏗️ Implementation Ideas\nIf you have ideas about how this could be implemented, please share:\n- Which components would be affected?\n- Any technical considerations?\n- Potential challenges?\n\n### 📊 Priority\nHow important is this feature to you?\n- [ ] Critical - Blocking my use case\n- [ ] High - Would significantly improve my workflow\n- [ ] Medium - Nice to have\n- [ ] Low - Minor improvement\n\n### 📎 Additional Context\nAdd any other context, screenshots, mockups, or examples about the feature request here.\n\n### 🔗 Related Issues\nLink any related issues or discussions: "
  },
  {
    "path": ".github/pull_request_template.md",
    "content": "## 📝 Description\n\nBrief description of what this PR does.\n\nFixes #(issue number) <!-- If applicable -->\n\n## 🎯 Type of Change\n\n- [ ] 🐛 Bug fix (non-breaking change which fixes an issue)\n- [ ] ✨ New feature (non-breaking change which adds functionality)\n- [ ] 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)\n- [ ] 📚 Documentation update\n- [ ] 🧪 Test improvements\n- [ ] 🔧 Code refactoring\n- [ ] 🎨 UI/UX improvements\n\n## 🧪 Testing\n\n### Test Environment\n- [ ] Tested with Docker deployment\n- [ ] Tested with direct Python deployment\n- [ ] Tested on macOS\n- [ ] Tested on Linux\n- [ ] Tested on Windows\n\n### Test Cases\n- [ ] All existing tests pass\n- [ ] New tests added for new functionality\n- [ ] Manual testing completed\n- [ ] System health check passes\n\n```bash\n# Commands used for testing\npython system_health_check.py\npython run_system.py --health\n# Add any specific test commands here\n```\n\n## 📋 Checklist\n\n### Code Quality\n- [ ] Code follows the project's coding standards\n- [ ] Self-review of the code completed\n- [ ] Code is properly commented\n- [ ] Type hints added (Python)\n- [ ] No console.log statements left in production code\n\n### Documentation\n- [ ] Documentation updated (if applicable)\n- [ ] API documentation updated (if applicable)\n- [ ] README updated (if applicable)\n- [ ] CONTRIBUTING.md guidelines followed\n\n### Dependencies\n- [ ] No new dependencies added, or new dependencies are justified\n- [ ] requirements.txt updated (if applicable)\n- [ ] package.json updated (if applicable)\n\n## 🖥️ Screenshots (if applicable)\n\nAdd screenshots to help reviewers understand the changes.\n\n## 📊 Performance Impact\n\nDescribe any performance implications:\n- [ ] No performance impact\n- [ ] Performance improved\n- [ ] Performance may be affected (explain below)\n\n## 🔄 Migration Notes\n\nIf this is a breaking change, describe what users need to do:\n- [ ] No 
migration needed\n- [ ] Migration steps documented below\n\n## 📎 Additional Notes\n\nAny additional information that reviewers should know. "
  },
  {
    "path": ".gitignore",
    "content": "# See https://help.github.com/articles/ignoring-files/ for more about ignoring files.\n\n# dependencies\n/node_modules\n/.pnp\n.pnp.*\n.yarn/*\n!.yarn/patches\n!.yarn/plugins\n!.yarn/releases\n!.yarn/versions\n\n# testing\n/coverage\n\n# next.js\n/.next/\n/out/\n\n# production\n/build\n\n# misc\n.DS_Store\n*.pem\n\n# debug\nnpm-debug.log*\nyarn-debug.log*\nyarn-error.log*\n.pnpm-debug.log*\n\n# env files (can opt-in for committing if needed)\n.env*\n\n# vercel\n.vercel\n\n# typescript\n*.tsbuildinfo\nnext-env.d.ts\n\n# Python\n__pycache__/\n*.pyc\n\n# Local Data\n/index_store\n/shared_uploads\nchat_history.db\n*.pkl\n\n# Backend generated files\nbackend/shared_uploads/\n\n# Vector DB artefacts\nlancedb/\nindex_store/overviews/\n\n# Logs and runtime output\nlogs/\n*.log\n\n# SQLite or other database files\n*.db\n#backend/*.db\n# backend/chat_history.db\nbackend/chroma_db/\nbackend/chroma_db/**\n\n# Document and user-uploaded files (PDFs, images, etc.)\nrag_system/documents/\n*.pdf\n\n# Ensure docker.env remains tracked\n!docker.env\n!backend/chat_data.db\n\n"
  },
  {
    "path": "CONTRIBUTING.md",
    "content": "# Contributing to LocalGPT\n\nThank you for your interest in contributing to LocalGPT! This guide will help you get started with contributing to our private document intelligence platform.\n\n## 🚀 Quick Start for Contributors\n\n### Prerequisites\n- Python 3.8+ (we test with 3.11.5)\n- Node.js 16+ (we test with 23.10.0)\n- Git\n- Ollama (for local AI models)\n\n### Development Setup\n\n1. **Fork and Clone**\n   ```bash\n   # Fork the repository on GitHub, then clone your fork\n   git clone https://github.com/YOUR_USERNAME/multimodal_rag.git\n   cd multimodal_rag\n   \n   # Add upstream remote\n   git remote add upstream https://github.com/PromtEngineer/multimodal_rag.git\n   ```\n\n2. **Set Up Development Environment**\n   ```bash\n   # Install Python dependencies\n   pip install -r requirements.txt\n   \n   # Install Node.js dependencies\n   npm install\n   \n   # Install Ollama and models\n   curl -fsSL https://ollama.ai/install.sh | sh\n   ollama pull qwen3:0.6b\n   ollama pull qwen3:8b\n   ```\n\n3. **Verify Setup**\n   ```bash\n   # Run health check\n   python system_health_check.py\n   \n   # Start development system\n   python run_system.py --mode dev\n   ```\n\n## 📋 Development Workflow\n\n### Branch Strategy\n\nWe use a feature branch workflow:\n\n- `main` - Production-ready code\n- `docker` - Docker deployment features and documentation\n- `feature/*` - New features\n- `fix/*` - Bug fixes\n- `docs/*` - Documentation updates\n\n### Making Changes\n\n1. **Create a Feature Branch**\n   ```bash\n   # Update your main branch\n   git checkout main\n   git pull upstream main\n   \n   # Create feature branch\n   git checkout -b feature/your-feature-name\n   ```\n\n2. **Make Your Changes**\n   - Follow our [coding standards](#coding-standards)\n   - Write tests for new functionality\n   - Update documentation as needed\n\n3. 
**Test Your Changes**\n   ```bash\n   # Run health checks\n   python system_health_check.py\n   \n   # Test specific components\n   python -m pytest tests/ -v\n   \n   # Test system integration\n   python run_system.py --health\n   ```\n\n4. **Commit Your Changes**\n   ```bash\n   git add .\n   git commit -m \"feat: add new feature description\"\n   ```\n\n5. **Push and Create PR**\n   ```bash\n   git push origin feature/your-feature-name\n   # Create pull request on GitHub\n   ```\n\n## 🎯 Types of Contributions\n\n### 🐛 Bug Fixes\n- Check existing issues first\n- Include reproduction steps\n- Add tests to prevent regression\n\n### ✨ New Features\n- Discuss in issues before implementing\n- Follow existing architecture patterns\n- Include comprehensive tests\n- Update documentation\n\n### 📚 Documentation\n- Fix typos and improve clarity\n- Add examples and use cases\n- Update API documentation\n- Improve setup guides\n\n### 🧪 Testing\n- Add unit tests\n- Improve integration tests\n- Add performance benchmarks\n- Test edge cases\n\n## 📝 Coding Standards\n\n### Python Code Style\n\nWe follow PEP 8 with some modifications:\n\n```python\n# Use type hints\ndef process_document(file_path: str, config: Dict[str, Any]) -> ProcessingResult:\n    \"\"\"Process a document with the given configuration.\n    \n    Args:\n        file_path: Path to the document file\n        config: Processing configuration dictionary\n        \n    Returns:\n        ProcessingResult object with metadata and chunks\n    \"\"\"\n    pass\n\n# Use descriptive variable names\nembedding_model_name = \"Qwen/Qwen3-Embedding-0.6B\"\nretrieval_results = retriever.search(query, top_k=20)\n\n# Use dataclasses for structured data\n@dataclass\nclass IndexingConfig:\n    embedding_batch_size: int = 50\n    enable_late_chunking: bool = True\n    chunk_size: int = 512\n```\n\n### TypeScript/React Code Style\n\n```typescript\n// Use TypeScript interfaces\ninterface ChatMessage {\n  id: string;\n  content: 
string;\n  role: 'user' | 'assistant';\n  timestamp: Date;\n  sources?: DocumentSource[];\n}\n\n// Use functional components with hooks\nconst ChatInterface: React.FC<ChatProps> = ({ sessionId }) => {\n  const [messages, setMessages] = useState<ChatMessage[]>([]);\n  \n  const handleSendMessage = useCallback(async (content: string) => {\n    // Implementation\n  }, [sessionId]);\n  \n  return (\n    <div className=\"chat-interface\">\n      {/* Component JSX */}\n    </div>\n  );\n};\n```\n\n### File Organization\n\n```\nrag_system/\n├── agent/           # ReAct agent implementation\n├── indexing/        # Document processing and indexing\n├── retrieval/       # Search and retrieval components\n├── pipelines/       # End-to-end processing pipelines\n├── rerankers/       # Result reranking implementations\n└── utils/           # Shared utilities\n\nsrc/\n├── components/      # React components\n├── lib/            # Utility functions and API clients\n└── app/            # Next.js app router pages\n```\n\n## 🧪 Testing Guidelines\n\n### Unit Tests\n```python\n# Test file: tests/test_embeddings.py\nimport numpy as np\nimport pytest\nfrom rag_system.indexing.embedders import HuggingFaceEmbedder\n\ndef test_embedding_generation():\n    embedder = HuggingFaceEmbedder(\"sentence-transformers/all-MiniLM-L6-v2\")\n    embeddings = embedder.create_embeddings([\"test text\"])\n    \n    assert embeddings.shape[0] == 1\n    assert embeddings.shape[1] == 384  # Model dimension\n    assert embeddings.dtype == np.float32\n```\n\n### Integration Tests\n```python\n# Test file: tests/test_integration.py\nfrom rag_system.main import get_agent\n\ndef test_end_to_end_indexing():\n    \"\"\"Test complete document indexing pipeline.\"\"\"\n    agent = get_agent(\"test\")\n    result = agent.index_documents([\"test_document.pdf\"])\n    \n    assert result.success\n    assert len(result.indexed_chunks) > 0\n```\n\n### Frontend Tests\n```typescript\n// Test file: src/components/__tests__/ChatInterface.test.tsx\nimport { render, screen, fireEvent } 
from '@testing-library/react';\nimport { ChatInterface } from '../ChatInterface';\n\ntest('sends message when form is submitted', async () => {\n  render(<ChatInterface sessionId=\"test-session\" />);\n  \n  const input = screen.getByPlaceholderText('Type your message...');\n  const button = screen.getByRole('button', { name: /send/i });\n  \n  fireEvent.change(input, { target: { value: 'test message' } });\n  fireEvent.click(button);\n  \n  expect(screen.getByText('test message')).toBeInTheDocument();\n});\n```\n\n## 📖 Documentation Standards\n\n### Code Documentation\n```python\ndef create_index(\n    documents: List[str],\n    config: IndexingConfig,\n    progress_callback: Optional[Callable[[float], None]] = None\n) -> IndexingResult:\n    \"\"\"Create a searchable index from documents.\n    \n    This function processes documents through the complete indexing pipeline:\n    1. Text extraction and chunking\n    2. Embedding generation\n    3. Vector database storage\n    4. BM25 index creation\n    \n    Args:\n        documents: List of document file paths to index\n        config: Indexing configuration with model settings and parameters\n        progress_callback: Optional callback function for progress updates\n        \n    Returns:\n        IndexingResult containing success status, metrics, and any errors\n        \n    Raises:\n        IndexingError: If document processing fails\n        ModelLoadError: If embedding model cannot be loaded\n        \n    Example:\n        >>> config = IndexingConfig(embedding_batch_size=32)\n        >>> result = create_index([\"doc1.pdf\", \"doc2.pdf\"], config)\n        >>> print(f\"Indexed {result.chunk_count} chunks\")\n    \"\"\"\n```\n\n### API Documentation\n```python\n# Use OpenAPI/FastAPI documentation\n@app.post(\"/chat\", response_model=ChatResponse)\nasync def chat_endpoint(request: ChatRequest) -> ChatResponse:\n    \"\"\"Chat with indexed documents.\n    \n    Send a natural language query and receive an 
AI-generated response\n    based on the indexed document collection.\n    \n    - **query**: The user's question or prompt\n    - **session_id**: Chat session identifier\n    - **search_type**: Type of search (vector, hybrid, bm25)\n    - **retrieval_k**: Number of documents to retrieve\n    \n    Returns a response with the AI-generated answer and source documents.\n    \"\"\"\n```\n\n## 🔧 Development Tools\n\n### Recommended VS Code Extensions\n```json\n{\n  \"recommendations\": [\n    \"ms-python.python\",\n    \"ms-python.pylint\",\n    \"ms-python.black-formatter\",\n    \"bradlc.vscode-tailwindcss\",\n    \"esbenp.prettier-vscode\",\n    \"ms-vscode.vscode-typescript-next\"\n  ]\n}\n```\n\n### Pre-commit Hooks\n```bash\n# Install pre-commit\npip install pre-commit\n\n# Set up hooks\npre-commit install\n\n# Run manually\npre-commit run --all-files\n```\n\n### Development Scripts\n```bash\n# Lint Python code\npython -m pylint rag_system/\n\n# Format Python code\npython -m black rag_system/\n\n# Type check\npython -m mypy rag_system/\n\n# Lint TypeScript\nnpm run lint\n\n# Format TypeScript\nnpm run format\n```\n\n## 🐛 Issue Reporting\n\n### Bug Reports\nWhen reporting bugs, please include:\n\n1. **Environment Information**\n   ```\n   - OS: macOS 13.4\n   - Python: 3.11.5\n   - Node.js: 23.10.0\n   - Ollama: 0.9.5\n   ```\n\n2. **Steps to Reproduce**\n   ```\n   1. Start system with `python run_system.py`\n   2. Upload document via web interface\n   3. Ask question \"What is this document about?\"\n   4. Error occurs during response generation\n   ```\n\n3. **Expected vs Actual Behavior**\n4. **Error Messages and Logs**\n5. 
**Screenshots (if applicable)**\n\n### Feature Requests\nInclude:\n- **Use Case**: Why is this feature needed?\n- **Proposed Solution**: How should it work?\n- **Alternatives**: What other approaches were considered?\n- **Additional Context**: Any relevant examples or references\n\n## 📦 Release Process\n\n### Version Numbering\nWe use semantic versioning (semver):\n- `MAJOR.MINOR.PATCH`\n- Major: Breaking changes\n- Minor: New features (backward compatible)\n- Patch: Bug fixes\n\n### Release Checklist\n- [ ] All tests pass\n- [ ] Documentation updated\n- [ ] Version bumped in relevant files\n- [ ] Changelog updated\n- [ ] Docker images built and tested\n- [ ] Release notes prepared\n\n## 🤝 Community Guidelines\n\n### Code of Conduct\n- Be respectful and inclusive\n- Focus on constructive feedback\n- Help others learn and grow\n- Maintain professional communication\n\n### Getting Help\n- **GitHub Issues**: For bugs and feature requests\n- **GitHub Discussions**: For questions and general discussion\n- **Documentation**: Check existing docs first\n- **Code Review**: Provide thoughtful, actionable feedback\n\n## 🎯 Project Priorities\n\n### Current Focus Areas\n1. **Performance Optimization**: Improving indexing and retrieval speed\n2. **Model Support**: Adding more embedding and generation models\n3. **User Experience**: Enhancing the web interface\n4. **Documentation**: Improving setup and usage guides\n5. 
**Testing**: Expanding test coverage\n\n### Architecture Goals\n- **Modularity**: Components should be loosely coupled\n- **Extensibility**: Easy to add new models and features\n- **Performance**: Optimize for speed and memory usage\n- **Reliability**: Robust error handling and recovery\n- **Privacy**: Keep user data secure and local\n\n## 📚 Additional Resources\n\n### Learning Resources\n- [RAG System Architecture Overview](Documentation/architecture_overview.md)\n- [API Reference](Documentation/api_reference.md)\n- [Deployment Guide](Documentation/deployment_guide.md)\n- [Troubleshooting Guide](DOCKER_TROUBLESHOOTING.md)\n\n### External References\n- [LangChain Documentation](https://python.langchain.com/)\n- [Ollama Documentation](https://ollama.ai/docs)\n- [Next.js Documentation](https://nextjs.org/docs)\n- [FastAPI Documentation](https://fastapi.tiangolo.com/)\n\n---\n\n## 🙏 Thank You!\n\nThank you for contributing to LocalGPT! Your contributions help make private document intelligence accessible to everyone.\n\nFor questions about contributing, please:\n1. Check existing documentation\n2. Search existing issues\n3. Create a new issue with the `question` label\n4. Join our community discussions\n\nHappy coding! 🚀 "
  },
  {
    "path": "DOCKER_README.md",
    "content": "# 🐳 LocalGPT Docker Deployment Guide\n\nThis guide covers running LocalGPT using Docker containers with local Ollama for optimal performance.\n\n## 🚀 Quick Start\n\n### Complete Setup (5 Minutes)\n```bash\n# 1. Install Ollama locally\ncurl -fsSL https://ollama.ai/install.sh | sh\n\n# 2. Start Ollama server\nollama serve\n\n# 3. Install required models (in another terminal)\nollama pull qwen3:0.6b\nollama pull qwen3:8b\n\n# 4. Clone and start LocalGPT\ngit clone https://github.com/your-org/rag-system.git\ncd rag-system\n./start-docker.sh\n\n# 5. Access the application\nopen http://localhost:3000\n```\n\n## 📋 Prerequisites\n\n- **Docker Desktop** installed and running\n- **Ollama** installed locally (required for best performance)\n- **8GB+ RAM** (16GB recommended for larger models)\n- **10GB+ free disk space**\n\n## 🏗️ Architecture\n\n### Current Setup (Local Ollama + Docker Containers)\n```\n┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐\n│   Frontend      │────│    Backend      │────│    RAG API      │\n│  (Container)    │    │  (Container)    │    │  (Container)    │\n│   Port: 3000    │    │   Port: 8000    │    │   Port: 8001    │\n└─────────────────┘    └─────────────────┘    └─────────────────┘\n                                                        │\n                                                        │ API calls\n                                                        ▼\n                                               ┌─────────────────┐\n                                               │     Ollama      │\n                                               │ (Local/Host)    │\n                                               │   Port: 11434   │\n                                               └─────────────────┘\n```\n\n**Why Local Ollama?**\n- ✅ Better performance (direct GPU access)\n- ✅ Simpler setup (one less container)\n- ✅ Easier model management\n- ✅ More reliable connection\n\n## 🛠️ Container Details\n\n### Frontend 
Container (rag-frontend)\n- **Image**: Custom Node.js 18 build\n- **Port**: 3000\n- **Purpose**: Next.js web interface\n- **Health Check**: HTTP GET to /\n- **Memory**: ~500MB\n\n### Backend Container (rag-backend) \n- **Image**: Custom Python 3.11 build\n- **Port**: 8000\n- **Purpose**: Session management, chat history, API gateway\n- **Health Check**: HTTP GET to /health\n- **Memory**: ~300MB\n\n### RAG API Container (rag-api)\n- **Image**: Custom Python 3.11 build\n- **Port**: 8001\n- **Purpose**: Document indexing, retrieval, AI processing\n- **Health Check**: HTTP GET to /models\n- **Memory**: ~2GB (varies with model usage)\n\n## 📂 Volume Mounts & Data\n\n### Persistent Data\n- `./lancedb/` → Vector database storage\n- `./index_store/` → Document indexes and metadata\n- `./shared_uploads/` → Uploaded document files\n- `./backend/chat_data.db` → SQLite chat history database\n\n### Shared Between Containers\nAll containers share access to document storage and databases through bind mounts.\n\n## 🔧 Configuration\n\n### Environment Variables (docker.env)\n```bash\n# Ollama Configuration\nOLLAMA_HOST=http://host.docker.internal:11434\n\n# Service Configuration  \nNODE_ENV=production\nRAG_API_URL=http://rag-api:8001\nNEXT_PUBLIC_API_URL=http://localhost:8000\n\n# Database Paths (inside containers)\nDATABASE_PATH=/app/backend/chat_data.db\nLANCEDB_PATH=/app/lancedb\nUPLOADS_PATH=/app/shared_uploads\n```\n\n### Model Configuration\nThe system uses these models by default:\n- **Embedding**: `Qwen/Qwen3-Embedding-0.6B` (1024 dimensions)\n- **Generation**: `qwen3:0.6b` (fast) or `qwen3:8b` (high quality)\n- **Reranking**: Built-in cross-encoder\n\n## 🎯 Management Commands\n\n### Start/Stop Services\n```bash\n# Start all services\n./start-docker.sh\n\n# Stop all services\n./start-docker.sh stop\n\n# Restart services\n./start-docker.sh stop && ./start-docker.sh\n```\n\n### Monitor Services\n```bash\n# Check container status\n./start-docker.sh status\ndocker compose ps\n\n# 
View live logs\n./start-docker.sh logs\ndocker compose logs -f\n\n# View specific service logs\ndocker compose logs -f rag-api\ndocker compose logs -f backend\ndocker compose logs -f frontend\n```\n\n### Manual Docker Compose\n```bash\n# Start manually\ndocker compose --env-file docker.env up --build -d\n\n# Stop manually\ndocker compose down\n\n# Rebuild specific service\ndocker compose build --no-cache rag-api\ndocker compose up -d rag-api\n```\n\n### Health Checks\n```bash\n# Test all endpoints\ncurl -f http://localhost:3000 && echo \"✅ Frontend OK\"\ncurl -f http://localhost:8000/health && echo \"✅ Backend OK\"\ncurl -f http://localhost:8001/models && echo \"✅ RAG API OK\"\ncurl -f http://localhost:11434/api/tags && echo \"✅ Ollama OK\"\n```\n\n## 🐞 Debugging\n\n### Access Container Shells\n```bash\n# RAG API container (most debugging happens here)\ndocker compose exec rag-api bash\n\n# Backend container\ndocker compose exec backend bash\n\n# Frontend container\ndocker compose exec frontend sh\n```\n\n### Common Debug Commands\n```bash\n# Test RAG system initialization\ndocker compose exec rag-api python -c \"\nfrom rag_system.main import get_agent\nagent = get_agent('default')\nprint('✅ RAG System OK')\n\"\n\n# Test Ollama connection from container\ndocker compose exec rag-api curl http://host.docker.internal:11434/api/tags\n\n# Check environment variables\ndocker compose exec rag-api env | grep OLLAMA\n\n# View Python packages\ndocker compose exec rag-api pip list | grep -E \"(torch|transformers|lancedb)\"\n```\n\n### Resource Monitoring\n```bash\n# Monitor container resources\ndocker stats\n\n# Check disk usage\ndocker system df\ndf -h ./lancedb ./shared_uploads\n\n# Check memory usage by service\ndocker stats --format \"table {{.Name}}\\t{{.CPUPerc}}\\t{{.MemUsage}}\\t{{.MemPerc}}\"\n```\n\n## 🚨 Troubleshooting\n\n### Common Issues\n\n#### Container Won't Start\n```bash\n# Check logs for specific error\ndocker compose logs [service-name]\n\n# Rebuild from 
scratch\n./start-docker.sh stop\ndocker system prune -f\n./start-docker.sh\n\n# Check for port conflicts\nlsof -i :3000 -i :8000 -i :8001\n```\n\n#### Can't Connect to Ollama\n```bash\n# Verify Ollama is running\ncurl http://localhost:11434/api/tags\n\n# Restart Ollama\npkill ollama\nollama serve\n\n# Test from container\ndocker compose exec rag-api curl http://host.docker.internal:11434/api/tags\n```\n\n#### Memory Issues\n```bash\n# Check memory usage\ndocker stats --no-stream\nfree -h  # On host\n\n# Increase Docker memory limit\n# Docker Desktop → Settings → Resources → Memory → 8GB+\n\n# Use smaller models\nollama pull qwen3:0.6b  # Instead of qwen3:8b\n```\n\n#### Frontend Build Errors\n```bash\n# Clean build\ndocker compose build --no-cache frontend\ndocker compose up -d frontend\n\n# Check frontend logs\ndocker compose logs frontend\n```\n\n#### Database/Storage Issues\n```bash\n# Check file permissions\nls -la backend/chat_data.db\nls -la lancedb/\n\n# Reset permissions\nchmod 664 backend/chat_data.db\nchmod -R 755 lancedb/ shared_uploads/\n\n# Test database access\ndocker compose exec backend sqlite3 /app/backend/chat_data.db \".tables\"\n```\n\n### Performance Issues\n\n#### Slow Response Times\n- Use faster models: `qwen3:0.6b` instead of `qwen3:8b`\n- Increase Docker memory allocation\n- Ensure SSD storage for databases\n- Monitor with `docker stats`\n\n#### High Memory Usage\n- Reduce batch sizes in configuration\n- Use smaller embedding models\n- Clear unused Docker resources: `docker system prune`\n\n### Complete Reset\n```bash\n# Nuclear option - reset everything\n./start-docker.sh stop\ndocker system prune -a --volumes\nrm -rf lancedb/* shared_uploads/* backend/chat_data.db\n./start-docker.sh\n```\n\n## 🏆 Success Criteria\n\nYour Docker deployment is successful when:\n\n- ✅ `./start-docker.sh status` shows all containers healthy\n- ✅ All health checks pass (see commands above)  \n- ✅ You can access http://localhost:3000\n- ✅ You can upload 
documents and create indexes\n- ✅ You can chat with your documents\n- ✅ No errors in container logs\n\n### Performance Benchmarks\n\n**Good Performance:**\n- Container startup: < 2 minutes\n- Index creation: < 2 min per 100MB document\n- Query response: < 30 seconds\n- Memory usage: < 4GB total containers\n\n**Optimal Performance:**\n- Container startup: < 1 minute\n- Index creation: < 1 min per 100MB document  \n- Query response: < 10 seconds\n- Memory usage: < 2GB total containers\n\n## 📚 Additional Resources\n\n- **Detailed Troubleshooting**: See `DOCKER_TROUBLESHOOTING.md`\n- **Complete Documentation**: See `Documentation/docker_usage.md`\n- **System Architecture**: See `Documentation/architecture_overview.md`\n- **Direct Development**: See main `README.md` for non-Docker setup\n\n---\n\n**Happy Dockerizing! 🐳** Need help? Check the troubleshooting guide or open an issue. "
  },
  {
    "path": "DOCKER_TROUBLESHOOTING.md",
    "content": "# 🐳 Docker Troubleshooting Guide - LocalGPT\n\n_Last updated: 2025-01-07_\n\nThis guide helps diagnose and fix Docker-related issues with LocalGPT's containerized deployment.\n\n---\n\n## 🏁 Quick Health Check\n\n### System Status Check\n```bash\n# Check Docker daemon\ndocker version\n\n# Check Ollama status  \ncurl http://localhost:11434/api/tags\n\n# Check containers\n./start-docker.sh status\n\n# Test all endpoints\ncurl -f http://localhost:3000 && echo \"✅ Frontend OK\"\ncurl -f http://localhost:8000/health && echo \"✅ Backend OK\"\ncurl -f http://localhost:8001/models && echo \"✅ RAG API OK\"\ncurl -f http://localhost:11434/api/tags && echo \"✅ Ollama OK\"\n```\n\n### Expected Success Output\n```\n✅ Frontend OK\n✅ Backend OK\n✅ RAG API OK\n✅ Ollama OK\n```\n\n---\n\n## 🚨 Common Issues & Solutions\n\n### 1. Docker Daemon Issues\n\n#### Problem: \"Cannot connect to Docker daemon\"\n```\nCannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?\n```\n\n#### Solution A: Restart Docker Desktop (macOS/Windows)\n```bash\n# Quit Docker Desktop completely\n# macOS: Click Docker icon → \"Quit Docker Desktop\"\n# Windows: Right-click Docker icon → \"Quit Docker Desktop\"\n\n# Wait for it to fully shut down\nsleep 10\n\n# Start Docker Desktop\nopen -a Docker  # macOS\n# Windows: Click Docker Desktop from Start menu\n\n# Wait for Docker to be ready (2-3 minutes)\ndocker version\n```\n\n#### Solution B: Linux Docker Service\n```bash\n# Check Docker service status\nsudo systemctl status docker\n\n# Restart Docker service\nsudo systemctl restart docker\n\n# Enable auto-start\nsudo systemctl enable docker\n\n# Test connection\ndocker version\n```\n\n#### Solution C: Hard Reset\n```bash\n# Kill all Docker processes\nsudo pkill -f docker\n\n# Remove socket files\nsudo rm -f /var/run/docker.sock\nsudo rm -f ~/.docker/run/docker.sock  # macOS (per-user Docker Desktop socket)\n\n# Restart Docker Desktop\nopen -a Docker  # macOS\n```\n\n### 2. 
Ollama Connection Issues\n\n#### Problem: RAG API can't connect to Ollama\n```\nConnectionError: Failed to connect to Ollama at http://host.docker.internal:11434\n```\n\n#### Solution A: Verify Ollama is Running\n```bash\n# Check if Ollama is running\ncurl http://localhost:11434/api/tags\n\n# If not running, start it\nollama serve\n\n# Install required models\nollama pull qwen3:0.6b\nollama pull qwen3:8b\n```\n\n#### Solution B: Test from Container\n```bash\n# Test Ollama connection from RAG API container\ndocker compose exec rag-api curl http://host.docker.internal:11434/api/tags\n\n# If this fails, check Docker network settings\ndocker network ls\ndocker network inspect rag_system_old_default\n```\n\n#### Solution C: Alternative Ollama Host\n```bash\n# Edit docker.env to use different host\necho \"OLLAMA_HOST=http://172.17.0.1:11434\" >> docker.env\n\n# Or use IP address\necho \"OLLAMA_HOST=http://$(ipconfig getifaddr en0):11434\" >> docker.env  # macOS\n```\n\n### 3. Container Build Failures\n\n#### Problem: Frontend build fails\n```\nERROR: Failed to build frontend container\n```\n\n#### Solution: Clean Build\n```bash\n# Stop containers\n./start-docker.sh stop\n\n# Clean Docker cache\ndocker system prune -f\ndocker builder prune -f\n\n# Rebuild frontend only\ndocker compose build --no-cache frontend\ndocker compose up -d frontend\n\n# Check logs\ndocker compose logs frontend\n```\n\n#### Problem: Python package installation fails\n```\nERROR: Could not install packages due to an EnvironmentError\n```\n\n#### Solution: Update Dependencies\n```bash\n# Check requirements file exists\nls -la requirements-docker.txt\n\n# Test package installation locally\npip install -r requirements-docker.txt --dry-run\n\n# Rebuild with updated base image\ndocker compose build --no-cache --pull rag-api\n```\n\n### 4. 
Port Conflicts\n\n#### Problem: \"Port already in use\"\n```\nError starting userland proxy: listen tcp4 0.0.0.0:3000: bind: address already in use\n```\n\n#### Solution: Find and Kill Conflicting Processes\n```bash\n# Check what's using the ports\nlsof -i :3000 -i :8000 -i :8001\n\n# Kill specific processes\npkill -f \"npm run dev\"      # Frontend\npkill -f \"server.py\"        # Backend\npkill -f \"api_server\"       # RAG API\n\n# Or kill by port\nsudo kill -9 $(lsof -t -i:3000)\nsudo kill -9 $(lsof -t -i:8000)\nsudo kill -9 $(lsof -t -i:8001)\n\n# Restart containers\n./start-docker.sh\n```\n\n### 5. Memory Issues\n\n#### Problem: Containers crash due to OOM (Out of Memory)\n```\nContainer killed due to memory limit\n```\n\n#### Solution: Increase Docker Memory\n```bash\n# Check current memory usage\ndocker stats --no-stream\n\n# Increase Docker Desktop memory allocation\n# Docker Desktop → Settings → Resources → Memory → 8GB+\n\n# Monitor memory usage\ndocker stats\n\n# Use smaller models if needed\nollama pull qwen3:0.6b  # Instead of qwen3:8b\n```\n\n#### Problem: System running slow\n```bash\n# Check host memory\nfree -h  # Linux\nvm_stat  # macOS\n\n# Clean up Docker resources\ndocker system prune -f\ndocker volume prune -f\n```\n\n### 6. 
Volume Mount Issues\n\n#### Problem: Permission denied accessing files\n```\nPermission denied: /app/lancedb\n```\n\n#### Solution: Fix Permissions\n```bash\n# Create directories if they don't exist\nmkdir -p lancedb index_store shared_uploads backend\n\n# Fix permissions\nchmod -R 755 lancedb index_store shared_uploads\nchmod 664 backend/chat_data.db\n\n# Check ownership\nls -la lancedb/ shared_uploads/ backend/\n\n# Reset permissions if needed\nsudo chown -R $USER:$USER lancedb shared_uploads backend\n```\n\n#### Problem: Database file not found\n```\nNo such file or directory: '/app/backend/chat_data.db'\n```\n\n#### Solution: Initialize Database\n```bash\n# Create empty database file\ntouch backend/chat_data.db\n\n# Or initialize with schema\npython -c \"\nfrom backend.database import ChatDatabase\ndb = ChatDatabase()\ndb.init_database()\nprint('Database initialized')\n\"\n\n# Restart containers\n./start-docker.sh stop\n./start-docker.sh\n```\n\n---\n\n## 🔍 Advanced Debugging\n\n### Container-Level Debugging\n\n#### Access Container Shells\n```bash\n# RAG API container (most issues happen here)\ndocker compose exec rag-api bash\n\n# Check environment variables\ndocker compose exec rag-api env | grep -E \"(OLLAMA|RAG|NODE)\"\n\n# Test Python imports\ndocker compose exec rag-api python -c \"\nimport sys\nprint('Python version:', sys.version)\nfrom rag_system.main import get_agent\nprint('✅ RAG system imports work')\n\"\n\n# Backend container\ndocker compose exec backend bash\npython -c \"\nfrom backend.database import ChatDatabase\nprint('✅ Database imports work')\n\"\n\n# Frontend container  \ndocker compose exec frontend sh\nnpm --version\nnode --version\n```\n\n#### Check Container Resources\n```bash\n# Monitor real-time resource usage\ndocker stats\n\n# Check individual container health\ndocker compose ps\ndocker inspect rag-api --format='{{.State.Health.Status}}'\n\n# View container configurations\ndocker compose config\n```\n\n#### Network 
Debugging\n```bash\n# Check network connectivity\ndocker compose exec rag-api ping backend\ndocker compose exec backend ping rag-api\ndocker compose exec rag-api ping host.docker.internal\n\n# Check DNS resolution\ndocker compose exec rag-api nslookup host.docker.internal\n\n# Test HTTP connections\ndocker compose exec rag-api curl -v http://backend:8000/health\ndocker compose exec rag-api curl -v http://host.docker.internal:11434/api/tags\n```\n\n### Log Analysis\n\n#### Container Logs\n```bash\n# View all logs\n./start-docker.sh logs\n\n# Follow specific service logs\ndocker compose logs -f rag-api\ndocker compose logs -f backend\ndocker compose logs -f frontend\n\n# Search for errors\ndocker compose logs rag-api 2>&1 | grep -i error\ndocker compose logs backend 2>&1 | grep -i \"traceback\\|error\"\n\n# Save logs to file\ndocker compose logs > docker-debug.log 2>&1\n```\n\n#### System Logs\n```bash\n# Docker daemon logs (Linux)\njournalctl -u docker.service -f\n\n# macOS: Check Console app for Docker logs\n# Windows: Check Event Viewer\n```\n\n---\n\n## 🧪 Testing & Validation\n\n### Manual Container Testing\n\n#### Test Individual Containers\n```bash\n# Test RAG API alone\ndocker build -f Dockerfile.rag-api -t test-rag-api .\ndocker run --rm -p 8001:8001 -e OLLAMA_HOST=http://host.docker.internal:11434 test-rag-api &\nsleep 30\ncurl http://localhost:8001/models\npkill -f test-rag-api\n\n# Test Backend alone\ndocker build -f Dockerfile.backend -t test-backend .\ndocker run --rm -p 8000:8000 test-backend &\nsleep 30\ncurl http://localhost:8000/health\npkill -f test-backend\n```\n\n#### Integration Testing\n```bash\n# Full system test\n./start-docker.sh\n\n# Wait for all services to be ready\nsleep 60\n\n# Test complete workflow\ncurl -X POST http://localhost:8000/sessions \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"title\": \"Test Session\"}'\n\n# Test document upload (if you have a test PDF)\n# curl -X POST http://localhost:8000/upload -F 
\"file=@test.pdf\"\n\n# Clean up\n./start-docker.sh stop\n```\n\n### Automated Testing Script\n\nCreate `test-docker-health.sh`:\n```bash\n#!/bin/bash\nset -e\n\necho \"🐳 Docker Health Test Starting...\"\n\n# Start containers\n./start-docker.sh\n\n# Wait for services\necho \"⏳ Waiting for services to start...\"\nsleep 60\n\n# Test endpoints\necho \"🔍 Testing endpoints...\"\ncurl -f http://localhost:3000 && echo \"✅ Frontend OK\" || echo \"❌ Frontend FAIL\"\ncurl -f http://localhost:8000/health && echo \"✅ Backend OK\" || echo \"❌ Backend FAIL\"  \ncurl -f http://localhost:8001/models && echo \"✅ RAG API OK\" || echo \"❌ RAG API FAIL\"\ncurl -f http://localhost:11434/api/tags && echo \"✅ Ollama OK\" || echo \"❌ Ollama FAIL\"\n\n# Test container health\necho \"🔍 Checking container health...\"\ndocker compose ps\n\necho \"🎉 Health test complete!\"\n```\n\n---\n\n## 🔄 Recovery Procedures\n\n### Complete System Reset\n\n#### Soft Reset\n```bash\n# Stop containers\n./start-docker.sh stop\n\n# Clean up Docker resources\ndocker system prune -f\n\n# Restart containers\n./start-docker.sh\n```\n\n#### Hard Reset (⚠️ Deletes all data)\n```bash\n# Stop everything\n./start-docker.sh stop\n\n# Remove all containers, images, and volumes\ndocker system prune -a --volumes\n\n# Remove local data (CAUTION: This deletes all your documents and chat history)\nrm -rf lancedb/* shared_uploads/* backend/chat_data.db\n\n# Rebuild from scratch\n./start-docker.sh\n```\n\n#### Selective Reset\n\nReset only specific components:\n```bash\n# Reset just the database\n./start-docker.sh stop\nrm backend/chat_data.db\n./start-docker.sh\n\n# Reset just vector storage\n./start-docker.sh stop\nrm -rf lancedb/*\n./start-docker.sh\n\n# Reset just uploaded documents\nrm -rf shared_uploads/*\n```\n\n---\n\n## 📊 Performance Optimization\n\n### Resource Monitoring\n```bash\n# Monitor containers continuously\nwatch -n 5 'docker stats --no-stream'\n\n# Check disk usage\ndocker system df\ndu -sh lancedb 
shared_uploads backend\n\n# Monitor host resources\nhtop  # Linux\ntop   # macOS/Windows\n```\n\n### Performance Tuning\n```bash\n# Use smaller models for better performance\nollama pull qwen3:0.6b  # Instead of qwen3:8b\n\n# Reduce Docker memory if needed\n# Docker Desktop → Settings → Resources → Memory\n\n# Clean up regularly\ndocker system prune -f\ndocker volume prune -f\n```\n\n---\n\n## 🆘 When All Else Fails\n\n### Alternative Deployment Options\n\n#### 1. Direct Development (No Docker)\n```bash\n# Stop Docker containers\n./start-docker.sh stop\n\n# Use direct development instead\npython run_system.py\n```\n\n#### 2. Minimal Docker (RAG API only)\n```bash\n# Run only RAG API in Docker\ndocker build -f Dockerfile.rag-api -t rag-api .\ndocker run -p 8001:8001 rag-api\n\n# Run other components directly\ncd backend && python server.py &\nnpm run dev\n```\n\n#### 3. Hybrid Approach\n```bash\n# Run some services in Docker, others directly\ndocker compose up -d rag-api\ncd backend && python server.py &\nnpm run dev\n```\n\n### Getting Help\n\n#### Diagnostic Information to Collect\n```bash\n# System information\ndocker version\ndocker compose version\nuname -a\n\n# Container information\ndocker compose ps\ndocker compose config\n\n# Resource information\ndocker stats --no-stream\ndocker system df\n\n# Error logs\ndocker compose logs > docker-errors.log 2>&1\n```\n\n#### Support Channels\n1. **Check GitHub Issues**: Search existing issues for similar problems\n2. **Documentation**: Review the complete documentation in `Documentation/`\n3. 
**Create Issue**: Include diagnostic information above\n\n---\n\n## ✅ Success Checklist\n\nYour Docker deployment is working correctly when:\n\n- ✅ `docker version` shows Docker is running\n- ✅ `curl http://localhost:11434/api/tags` shows Ollama is accessible\n- ✅ `./start-docker.sh status` shows all containers healthy\n- ✅ All health check URLs return 200 OK\n- ✅ You can access the frontend at http://localhost:3000\n- ✅ You can create document indexes successfully\n- ✅ You can chat with your documents\n- ✅ No error messages in container logs\n\n**If all boxes are checked, your Docker deployment is successful! 🎉**\n\n---\n\n**Still having issues?** Check the main `DOCKER_README.md` or create an issue with your diagnostic information. "
  },
  {
    "path": "Dockerfile.backend",
    "content": "FROM python:3.11-slim\n\n# Set working directory\nWORKDIR /app\n\n# Install system dependencies\nRUN apt-get update && apt-get install -y \\\n    curl \\\n    && rm -rf /var/lib/apt/lists/*\n\n# Copy requirements and install Python dependencies (using Docker-specific requirements)\nCOPY requirements-docker.txt ./requirements.txt\nRUN pip install --no-cache-dir -r requirements.txt\n\n# Copy backend code and dependencies\nCOPY backend/ ./backend/\nCOPY rag_system/ ./rag_system/\n\n# Create necessary directories\nRUN mkdir -p shared_uploads logs backend\n\n# Expose port\nEXPOSE 8000\n\n# Health check\nHEALTHCHECK --interval=30s --timeout=10s --start-period=30s --retries=3 \\\n    CMD curl -f http://localhost:8000/health || exit 1\n\n# Run the backend server\nWORKDIR /app/backend\nCMD [\"python\", \"server.py\"]  "
  },
  {
    "path": "Dockerfile.frontend",
    "content": "FROM node:18-alpine\n\n# Set working directory\nWORKDIR /app\n\n# Install curl for the health check (not included in the Alpine base image)\nRUN apk add --no-cache curl\n\n# Install dependencies (including dev dependencies for build)\nCOPY package.json package-lock.json ./\nRUN npm ci\n\n# Copy source code and configuration files\nCOPY src/ ./src/\nCOPY public/ ./public/\nCOPY next.config.ts ./\nCOPY tsconfig.json ./\nCOPY tailwind.config.js ./\nCOPY postcss.config.mjs ./\nCOPY eslint.config.mjs ./\n\n# Build the application (skip linting for Docker)\nENV NEXT_LINT=false\nRUN npm run build\n\n# Expose port\nEXPOSE 3000\n\n# Health check\nHEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \\\n    CMD curl -f http://localhost:3000 || exit 1\n\n# Start the application\nCMD [\"npm\", \"start\"] "
  },
  {
    "path": "Dockerfile.rag-api",
    "content": "FROM python:3.11-slim\n\n# Set working directory\nWORKDIR /app\n\n# Install system dependencies\nRUN apt-get update && apt-get install -y \\\n    curl \\\n    build-essential \\\n    && rm -rf /var/lib/apt/lists/*\n\n# Copy requirements and install Python dependencies (using Docker-specific requirements)\nCOPY requirements-docker.txt ./requirements.txt\nRUN pip install --no-cache-dir -r requirements.txt\n\n# Copy RAG system code and backend dependencies\nCOPY rag_system/ ./rag_system/\nCOPY backend/ ./backend/\n\n# Create necessary directories\nRUN mkdir -p lancedb index_store shared_uploads logs\n\n# Expose port\nEXPOSE 8001\n\n# Health check\nHEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \\\n    CMD curl -f http://localhost:8001/models || exit 1\n\n# Run the RAG API server\nCMD [\"python\", \"-m\", \"rag_system.api_server\"] "
  },
  {
    "path": "Documentation/api_reference.md",
    "content": "# 📚 API Reference (Backend & RAG API)\n\n_Last updated: 2025-01-07_\n\n---\n\n## Backend HTTP API (Python `backend/server.py`)\n**Base URL**: `http://localhost:8000`\n\n| Endpoint | Method | Description | Request Body | Success Response |\n|----------|--------|-------------|--------------|------------------|\n| `/health` | GET | Health probe incl. Ollama status & DB stats | – | 200 JSON `{ status, ollama_running, available_models, database_stats }` |\n| `/chat` | POST | Stateless chat (no session) | `{ message:str, model?:str, conversation_history?:[{role,content}]}` | 200 `{ response:str, model:str, message_count:int }` |\n| `/sessions` | GET | List all sessions | – | `{ sessions:ChatSession[], total:int }` |\n| `/sessions` | POST | Create session | `{ title?:str, model?:str }` | 201 `{ session:ChatSession, session_id }` |\n| `/sessions/<id>` | GET | Get session + msgs | – | `{ session, messages }` |\n| `/sessions/<id>` | DELETE | Delete session | – | `{ message, deleted_session_id }` |\n| `/sessions/<id>/rename` | POST | Rename session | `{ title:str }` | `{ message, session }` |\n| `/sessions/<id>/messages` | POST | Session chat (builds history) | See ChatRequest + retrieval opts ▼ | `{ response, session, user_message_id, ai_message_id }` |\n| `/sessions/<id>/documents` | GET | List uploaded docs | – | `{ files:string[], file_count:int, session }` |\n| `/sessions/<id>/upload` | POST multipart | Upload docs to session | field `files[]` | `{ message, uploaded_files, processing_results?, session_documents?, total_session_documents? }` |\n| `/sessions/<id>/index` | POST | Trigger RAG indexing for session | `{ latechunk?, doclingChunk?, chunkSize?, ... 
}` | `{ message }` |\n| `/sessions/<id>/indexes` | GET | List indexes linked to session | – | `{ indexes, total }` |\n| `/sessions/<sid>/indexes/<idxid>` | POST | Link index to session | – | `{ message }` |\n| `/sessions/cleanup` | GET | Remove empty sessions | – | `{ message, cleanup_count }` |\n| `/models` | GET | List generation / embedding models | – | `{ generation_models:str[], embedding_models:str[] }` |\n| `/indexes` | GET | List all indexes | – | `{ indexes, total }` |\n| `/indexes` | POST | Create index | `{ name:str, description?:str, metadata?:dict }` | `{ index_id }` |\n| `/indexes/<id>` | GET | Get single index | – | `{ index }` |\n| `/indexes/<id>` | DELETE | Delete index | – | `{ message, index_id }` |\n| `/indexes/<id>/upload` | POST multipart | Upload docs to index | field `files[]` | `{ message, uploaded_files }` |\n| `/indexes/<id>/build` | POST | Build / rebuild index (RAG) | `{ latechunk?, doclingChunk?, ...}` | 200 `{ response?, message?}` (idempotent) |\n\n---\n\n## RAG API (Python `rag_system/api_server.py`)\n**Base URL**: `http://localhost:8001`\n\n| Endpoint | Method | Description | Request Body | Success Response |\n|----------|--------|-------------|--------------|------------------|\n| `/chat` | POST | Run RAG query with full pipeline | See RAG ChatRequest ▼ | `{ answer:str, source_documents:[], reasoning?:str, confidence?:float }` |\n| `/chat/stream` | POST | Run RAG query with SSE streaming | Same as /chat | Server-Sent Events stream |\n| `/index` | POST | Index documents with full configuration | See Index Request ▼ | `{ message:str, indexed_files:[], table_name:str }` |\n| `/models` | GET | List available models | – | `{ generation_models:str[], embedding_models:str[] }` |\n\n### RAG ChatRequest (Advanced Options)\n```jsonc\n{\n  \"query\": \"string\",                    // Required – user question\n  \"session_id\": \"string\",               // Optional – for session context\n  \"table_name\": \"string\",               // Optional 
– specific index table\n  \"compose_sub_answers\": true,          // Optional – compose sub-answers \n  \"query_decompose\": true,              // Optional – decompose complex queries\n  \"ai_rerank\": false,                   // Optional – AI-powered reranking\n  \"context_expand\": false,              // Optional – context expansion\n  \"verify\": true,                       // Optional – answer verification\n  \"retrieval_k\": 20,                    // Optional – number of chunks to retrieve\n  \"context_window_size\": 1,             // Optional – context window size\n  \"reranker_top_k\": 10,                 // Optional – top-k after reranking\n  \"search_type\": \"hybrid\",              // Optional – \"hybrid|dense|fts\"\n  \"dense_weight\": 0.7,                  // Optional – dense search weight (0-1)\n  \"force_rag\": false,                   // Optional – bypass triage, force RAG\n  \"provence_prune\": false,              // Optional – sentence-level pruning\n  \"provence_threshold\": 0.8,            // Optional – pruning threshold\n  \"model\": \"qwen3:8b\"                   // Optional – generation model override\n}\n```\n\n### Index Request (Document Indexing)\n```jsonc\n{\n  \"file_paths\": [\"path1.pdf\", \"path2.pdf\"],  // Required – files to index\n  \"session_id\": \"string\",                     // Required – session identifier\n  \"chunk_size\": 512,                          // Optional – chunk size (default: 512)\n  \"chunk_overlap\": 64,                        // Optional – chunk overlap (default: 64)\n  \"enable_latechunk\": true,                   // Optional – enable late chunking\n  \"enable_docling_chunk\": false,              // Optional – enable DocLing chunking\n  \"retrieval_mode\": \"hybrid\",                 // Optional – \"hybrid|dense|fts\"\n  \"window_size\": 2,                           // Optional – context window\n  \"enable_enrich\": true,                      // Optional – enable enrichment\n  \"embedding_model\": 
\"Qwen/Qwen3-Embedding-0.6B\",  // Optional – embedding model\n  \"enrich_model\": \"qwen3:0.6b\",               // Optional – enrichment model\n  \"overview_model_name\": \"qwen3:0.6b\",        // Optional – overview model\n  \"batch_size_embed\": 50,                     // Optional – embedding batch size\n  \"batch_size_enrich\": 25                     // Optional – enrichment batch size\n}\n```\n\n> **Note on CORS** – All endpoints include `Access-Control-Allow-Origin: *` header.\n\n---\n\n## Frontend Wrapper (`src/lib/api.ts`)\nThe React/Next.js frontend calls the backend via a typed wrapper. Important methods & payloads:\n\n| Method | Backend Endpoint | Payload Shape |\n|--------|------------------|---------------|\n| `checkHealth()` | `/health` | – |\n| `sendMessage({ message, model?, conversation_history? })` | `/chat` | ChatRequest |\n| `getSessions()` | `/sessions` | – |\n| `createSession(title?, model?)` | `/sessions` | – |\n| `getSession(sessionId)` | `/sessions/<id>` | – |\n| `sendSessionMessage(sessionId, message, opts)` | `/sessions/<id>/messages` | `ChatRequest + retrieval opts` |\n| `uploadFiles(sessionId, files[])` | `/sessions/<id>/upload` | multipart |\n| `indexDocuments(sessionId)` | `/sessions/<id>/index` | opts similar to buildIndex |\n| `buildIndex(indexId, opts)` | `/indexes/<id>/build` | Index build options |\n| `linkIndexToSession` | `/sessions/<sid>/indexes/<idx>` | – |\n\n---\n\n## Payload Definitions (Canonical)\n\n### ChatRequest (frontend ⇄ backend)\n```jsonc\n{\n  \"message\": \"string\",              // Required – raw user text\n  \"model\": \"string\",                // Optional – generation model id\n  \"conversation_history\": [         // Optional – prior turn list\n    { \"role\": \"user|assistant\", \"content\": \"string\" }\n  ]\n}\n```\n\n### Session Chat Extended Options\n```jsonc\n{\n  \"composeSubAnswers\": true,\n  \"decompose\": true,\n  \"aiRerank\": false,\n  \"contextExpand\": false,\n  \"verify\": true,\n  
\"retrievalK\": 10,\n  \"contextWindowSize\": 5,\n  \"rerankerTopK\": 20,\n  \"searchType\": \"fts|hybrid|dense\",\n  \"denseWeight\": 0.75,\n  \"force_rag\": false\n}\n```\n\n### Index Build Options\n```jsonc\n{\n  \"latechunk\": true,\n  \"doclingChunk\": false,\n  \"chunkSize\": 512,\n  \"chunkOverlap\": 64,\n  \"retrievalMode\": \"hybrid|dense|fts\",\n  \"windowSize\": 2,\n  \"enableEnrich\": true,\n  \"embeddingModel\": \"Qwen/Qwen3-Embedding-0.6B\",\n  \"enrichModel\": \"qwen3:0.6b\",\n  \"overviewModel\": \"qwen3:0.6b\",\n  \"batchSizeEmbed\": 64,\n  \"batchSizeEnrich\": 32\n}\n```\n\n---\n\n_This reference is derived from static code analysis of `backend/server.py`, `rag_system/api_server.py`, and `src/lib/api.ts`. Keep it in sync with route or type changes._ "
  },
  {
    "path": "Documentation/architecture_overview.md",
    "content": "# 🏗️ System Architecture Overview\n\n_Last updated: 2025-07-06_\n\nThis document explains how data and control flow through the Advanced **RAG System** — from a user's browser all the way to model inference and back.  It is intended as the **ground-truth reference** for engineers and integrators.\n\n---\n\n## 1. Bird's-Eye Diagram\n\n```mermaid\nflowchart LR\n    subgraph Client\n        U[\"👤  User (Browser)\"]\n        FE[\"Next.js Front-end\\nReact Components\"]\n        U --> FE\n    end\n\n    subgraph Network\n        FE -->|HTTP/JSON| BE[\"Python HTTP Server\\nbackend/server.py\"]\n    end\n\n    subgraph Core[\"rag_system core package\"]\n        BE --> LOOP[\"Agent Loop\\n(rag_system/agent/loop.py)\"]\n        BE --> IDX[\"Indexing Pipeline\\n(pipelines/indexing_pipeline.py)\"]\n\n        LOOP --> RP[\"Retrieval Pipeline\\n(pipelines/retrieval_pipeline.py)\"]\n        LOOP --> VER[\"Verifier (Grounding Check)\"]\n        RP --> RET[\"Retrievers\\nBM25 | Dense | Hybrid\"]\n        RP --> RER[\"AI Reranker\"]\n        RP --> SYNT[\"Answer Synthesiser\"]\n    end\n\n    subgraph Storage\n        LDB[(\"LanceDB Vector Tables\")]\n        SQL[(\"SQLite – chat & metadata\")]\n    end\n\n    subgraph Models\n        OLLAMA[\"Ollama Server\\n(qwen3, etc.)\"]\n        HF[\"HuggingFace Hosted\\nEmbedding/Reranker Models\"]\n    end\n\n    %% data edges\n    IDX -->|chunks & embeddings| LDB\n    RET -->|vector search| LDB\n    LOOP -->|LLM calls| OLLAMA\n    RP -->|LLM calls| OLLAMA\n    VER -->|LLM calls| OLLAMA\n    RP -->|rerank| HF\n\n    BE -->|CRUD| SQL\n```\n\n---\n\n### Data-flow Narrative\n1. **User** interacts with the Next.js UI; messages are posted via `src/lib/api.ts`.\n2. **backend/server.py** receives JSON over HTTP, applies CORS, and proxies the request into `rag_system`.\n3. **Agent Loop** decides (via _Triage_) whether to perform Retrieval-Augmented Generation (RAG) or direct LLM answering.\n4. If RAG is chosen:\n   1. 
**Retrieval Pipeline** fetches candidates from **LanceDB** using BM25 + dense vectors.\n   2. **AI Reranker** (HF model) sorts snippets.\n   3. **Answer Synthesiser** calls **Ollama** to write the final answer.\n5. Answers can be **Verified** for grounding (optional flag).\n6. Index-building is an offline path triggered from the UI — PDF/📄 files are chunked, embedded and stored in LanceDB.\n\n---\n\n## 2. Component Documents\nThe table below links to deep-dives for each major component.\n\n| **Component** | **Documentation** |\n|---------------|-------------------|\n| Agent Loop | [`system_overview.md`](system_overview.md) |\n| Indexing Pipeline | [`indexing_pipeline.md`](indexing_pipeline.md) |\n| Retrieval Pipeline | [`retrieval_pipeline.md`](retrieval_pipeline.md) |\n| Verifier | [`verifier.md`](verifier.md) |\n| Triage System | [`triage_system.md`](triage_system.md) |\n\n---\n\n> **Change-management**: whenever architecture changes (new micro-service, different DB, etc.) update this overview diagram first, then individual component docs. "
  },
  {
    "path": "Documentation/deployment_guide.md",
    "content": "# 🚀 RAG System Deployment Guide\n\n_Last updated: 2025-01-07_\n\nThis guide provides comprehensive instructions for deploying the RAG system using both Docker and direct development approaches.\n\n---\n\n## 🎯 Deployment Options\n\n### Option 1: Docker Deployment (Production) 🐳\n- **Best for**: Production environments, containerized deployments, scaling\n- **Pros**: Isolated, reproducible, easy to manage\n- **Cons**: Slightly more complex setup, resource overhead\n\n### Option 2: Direct Development (Development) 💻\n- **Best for**: Development, debugging, customization\n- **Pros**: Direct access to code, faster iteration, easier debugging\n- **Cons**: More dependencies to manage\n\n---\n\n## 1. Prerequisites\n\n### 1.1 System Requirements\n\n#### **Minimum Requirements**\n- **CPU**: 4 cores, 2.5GHz+\n- **RAM**: 8GB (16GB recommended)\n- **Storage**: 50GB free space\n- **OS**: Linux, macOS, or Windows with WSL2\n\n#### **Recommended Requirements**\n- **CPU**: 8+ cores, 3.0GHz+\n- **RAM**: 32GB+ (for large models)\n- **Storage**: 200GB+ SSD\n- **GPU**: NVIDIA GPU with 8GB+ VRAM (optional, for acceleration)\n\n### 1.2 Common Dependencies\n\n**Both deployment methods require:**\n```bash\n# Ollama (required for both approaches)\ncurl -fsSL https://ollama.ai/install.sh | sh\n\n# Git for cloning\ngit 2.30+\n```\n\n### 1.3 Docker-Specific Dependencies\n\n**For Docker deployment:**\n```bash\n# Docker & Docker Compose\nDocker Engine 24.0+\nDocker Compose 2.20+\n```\n\n### 1.4 Direct Development Dependencies\n\n**For direct development:**\n```bash\n# Python & Node.js\nPython 3.8+\nNode.js 16+\nnpm 8+\n```\n\n---\n\n## 2. 
🐳 Docker Deployment\n\n### 2.1 Installation\n\n#### **Step 1: Install Docker**\n\n**Ubuntu/Debian:**\n```bash\n# Install Docker\ncurl -fsSL https://get.docker.com -o get-docker.sh\nsudo sh get-docker.sh\nsudo usermod -aG docker $USER\nnewgrp docker\n\n# Install Docker Compose V2\nsudo apt-get update\nsudo apt-get install docker-compose-plugin\n```\n\n**macOS:**\n```bash\n# Install Docker Desktop\nbrew install --cask docker\n# Or download from: https://www.docker.com/products/docker-desktop\n```\n\n**Windows:**\n```bash\n# Install Docker Desktop with WSL2 backend\n# Download from: https://www.docker.com/products/docker-desktop\n```\n\n#### **Step 2: Clone Repository**\n```bash\ngit clone https://github.com/your-org/rag-system.git\ncd rag-system\n```\n\n#### **Step 3: Install Ollama**\n```bash\n# Install Ollama (runs locally even with Docker)\ncurl -fsSL https://ollama.ai/install.sh | sh\n\n# Start Ollama\nollama serve\n\n# In another terminal, install models\nollama pull qwen3:0.6b\nollama pull qwen3:8b\n```\n\n#### **Step 4: Launch Docker System**\n```bash\n# Start all containers using the convenience script\n./start-docker.sh\n\n# Or manually:\ndocker compose --env-file docker.env up --build -d\n```\n\n#### **Step 5: Verify Deployment**\n```bash\n# Check container status\ndocker compose ps\n\n# Test all endpoints\ncurl http://localhost:3000      # Frontend\ncurl http://localhost:8000/health  # Backend\ncurl http://localhost:8001/models  # RAG API\ncurl http://localhost:11434/api/tags  # Ollama\n```\n\n### 2.2 Docker Management\n\n#### **Container Operations**\n```bash\n# Start system\n./start-docker.sh\n\n# Stop system\n./start-docker.sh stop\n\n# View logs\n./start-docker.sh logs\n\n# Check status\n./start-docker.sh status\n\n# Manual Docker Compose commands\ndocker compose ps                    # Check status\ndocker compose logs -f              # Follow logs\ndocker compose down                 # Stop all containers\ndocker compose up --build -d        # 
Rebuild and restart\n```\n\n#### **Individual Container Management**\n```bash\n# Restart specific service\ndocker compose restart rag-api\n\n# View specific service logs\ndocker compose logs -f backend\n\n# Execute commands in container\ndocker compose exec rag-api python -c \"print('Hello')\"\n```\n\n---\n\n## 3. 💻 Direct Development\n\n### 3.1 Installation\n\n#### **Step 1: Install Dependencies**\n\n**Python Dependencies:**\n```bash\n# Clone repository\ngit clone https://github.com/your-org/rag-system.git\ncd rag-system\n\n# Create virtual environment (recommended)\npython -m venv venv\nsource venv/bin/activate  # On Windows: venv\\Scripts\\activate\n\n# Install Python packages\npip install -r requirements.txt\n```\n\n**Node.js Dependencies:**\n```bash\n# Install Node.js dependencies\nnpm install\n```\n\n#### **Step 2: Install and Configure Ollama**\n```bash\n# Install Ollama\ncurl -fsSL https://ollama.ai/install.sh | sh\n\n# Start Ollama\nollama serve\n\n# In another terminal, install models\nollama pull qwen3:0.6b\nollama pull qwen3:8b\n```\n\n#### **Step 3: Launch System**\n\n**Option A: Integrated Launcher (Recommended)**\n```bash\n# Start all components with one command\npython run_system.py\n```\n\n**Option B: Manual Component Startup**\n```bash\n# Terminal 1: RAG API\npython -m rag_system.api_server\n\n# Terminal 2: Backend\ncd backend && python server.py\n\n# Terminal 3: Frontend\nnpm run dev\n\n# Access at http://localhost:3000\n```\n\n#### **Step 4: Verify Installation**\n```bash\n# Check system health\npython system_health_check.py\n\n# Test endpoints\ncurl http://localhost:3000      # Frontend\ncurl http://localhost:8000/health  # Backend\ncurl http://localhost:8001/models  # RAG API\n```\n\n### 3.2 Direct Development Management\n\n#### **System Operations**\n```bash\n# Start system\npython run_system.py\n\n# Check system health\npython system_health_check.py\n\n# Stop system\n# Press Ctrl+C in terminal running run_system.py\n```\n\n#### **Individual 
Component Management**\n```bash\n# Start components individually\npython -m rag_system.api_server    # RAG API on port 8001\ncd backend && python server.py     # Backend on port 8000\nnpm run dev                         # Frontend on port 3000\n\n# Development tools\nnpm run build                       # Build frontend for production\npip install -r requirements.txt --upgrade  # Update Python packages\n```\n\n---\n\n## 4. Architecture Comparison\n\n### 4.1 Docker Architecture\n\n```mermaid\ngraph TB\n    subgraph \"Docker Containers\"\n        Frontend[Frontend Container<br/>Next.js<br/>Port 3000]\n        Backend[Backend Container<br/>Python API<br/>Port 8000]\n        RAG[RAG API Container<br/>Document Processing<br/>Port 8001]\n    end\n    \n    subgraph \"Local System\"\n        Ollama[Ollama Server<br/>Port 11434]\n    end\n    \n    Frontend --> Backend\n    Backend --> RAG\n    RAG --> Ollama\n```\n\n### 4.2 Direct Development Architecture\n\n```mermaid\ngraph TB\n    subgraph \"Local Processes\"\n        Frontend[Next.js Dev Server<br/>Port 3000]\n        Backend[Python Backend<br/>Port 8000]\n        RAG[RAG API<br/>Port 8001]\n        Ollama[Ollama Server<br/>Port 11434]\n    end\n    \n    Frontend --> Backend\n    Backend --> RAG\n    RAG --> Ollama\n```\n\n---\n\n## 5. 
Configuration\n\n### 5.1 Environment Variables\n\n#### **Docker Configuration (`docker.env`)**\n```bash\n# Ollama Configuration\nOLLAMA_HOST=http://host.docker.internal:11434\n\n# Service Configuration\nNODE_ENV=production\nRAG_API_URL=http://rag-api:8001\nNEXT_PUBLIC_API_URL=http://localhost:8000\n```\n\n#### **Direct Development Configuration**\n```bash\n# Environment variables are set automatically by run_system.py\n# Override in environment if needed:\nexport OLLAMA_HOST=http://localhost:11434\nexport RAG_API_URL=http://localhost:8001\n```\n\n### 5.2 Model Configuration\n\n#### **Default Models**\n```python\n# Embedding Models\nEMBEDDING_MODELS = [\n    \"Qwen/Qwen3-Embedding-0.6B\",  # Fast, 1024 dimensions\n    \"Qwen/Qwen3-Embedding-4B\",    # High quality, 2560 dimensions\n]\n\n# Generation Models  \nGENERATION_MODELS = [\n    \"qwen3:0.6b\",  # Fast responses\n    \"qwen3:8b\",    # High quality\n]\n```\n\n### 5.3 Performance Tuning\n\n#### **Memory Settings**\n```bash\n# For Docker: Increase memory allocation\n# Docker Desktop → Settings → Resources → Memory → 16GB+\n\n# For Direct Development: Monitor with\nhtop  # or top on macOS\n```\n\n#### **Model Settings**\n```python\n# Batch sizes (adjust based on available RAM)\nEMBEDDING_BATCH_SIZE = 50   # Reduce if OOM\nENRICHMENT_BATCH_SIZE = 25  # Reduce if OOM\n\n# Chunk settings\nCHUNK_SIZE = 512           # Text chunk size\nCHUNK_OVERLAP = 64         # Overlap between chunks\n```\n\n---\n\n## 6. 
Operational Procedures\n\n### 6.1 System Monitoring\n\n#### **Health Checks**\n```bash\n# Comprehensive system check\ncurl -f http://localhost:3000 && echo \"✅ Frontend OK\"\ncurl -f http://localhost:8000/health && echo \"✅ Backend OK\"\ncurl -f http://localhost:8001/models && echo \"✅ RAG API OK\"\ncurl -f http://localhost:11434/api/tags && echo \"✅ Ollama OK\"\n```\n\n#### **Performance Monitoring**\n```bash\n# Docker monitoring\ndocker stats\n\n# Direct development monitoring\nhtop           # Overall system\nnvidia-smi     # GPU usage (if available)\n```\n\n### 6.2 Log Management\n\n#### **Docker Logs**\n```bash\n# All services\ndocker compose logs -f\n\n# Specific service\ndocker compose logs -f rag-api\n\n# Save logs to file\ndocker compose logs > system.log 2>&1\n```\n\n#### **Direct Development Logs**\n```bash\n# Logs are printed to terminal\n# Redirect to file if needed:\npython run_system.py > system.log 2>&1\n```\n\n### 6.3 Backup and Restore\n\n#### **Data Backup**\n```bash\n# Create backup directory\nmkdir -p backups/$(date +%Y%m%d)\n\n# Backup databases and indexes\ncp -r backend/chat_data.db backups/$(date +%Y%m%d)/\ncp -r lancedb backups/$(date +%Y%m%d)/\ncp -r index_store backups/$(date +%Y%m%d)/\n\n# For Docker: also backup volumes\ndocker compose down\ndocker run --rm -v rag_system_old_ollama_data:/data -v $(pwd)/backups:/backup alpine tar czf /backup/ollama_models_$(date +%Y%m%d).tar.gz -C /data .\n```\n\n#### **Data Restore**\n```bash\n# Stop system\n./start-docker.sh stop  # Docker\n# Or Ctrl+C for direct development\n\n# Restore files to their original locations\ncp backups/YYYYMMDD/chat_data.db backend/\ncp -r backups/YYYYMMDD/lancedb backups/YYYYMMDD/index_store ./\n\n# Restart system (use the command for your deployment method)\n./start-docker.sh  # Docker\npython run_system.py  # Direct development\n```\n\n---\n\n## 7. 
Troubleshooting\n\n### 7.1 Common Issues\n\n#### **Port Conflicts**\n```bash\n# Check what's using ports\nlsof -i :3000 -i :8000 -i :8001 -i :11434\n\n# For Docker: Stop conflicting containers\n./start-docker.sh stop\n\n# For Direct: Kill processes\npkill -f \"npm run dev\"\npkill -f \"server.py\"\npkill -f \"api_server\"\n```\n\n#### **Docker Issues**\n```bash\n# Docker daemon not running\ndocker version  # Check if daemon responds\n\n# Restart Docker Desktop (macOS/Windows)\n# Or restart docker service (Linux)\nsudo systemctl restart docker\n\n# Clear Docker cache\ndocker system prune -f\n```\n\n#### **Ollama Issues**\n```bash\n# Check Ollama status\ncurl http://localhost:11434/api/tags\n\n# Restart Ollama\npkill ollama\nollama serve\n\n# Reinstall models\nollama pull qwen3:0.6b\nollama pull qwen3:8b\n```\n\n### 7.2 Performance Issues\n\n#### **Memory Problems**\n```bash\n# Check memory usage\nfree -h           # Linux\nvm_stat           # macOS\ndocker stats      # Docker containers\n\n# Solutions:\n# 1. Increase system RAM\n# 2. Reduce batch sizes in configuration\n# 3. Use smaller models (qwen3:0.6b instead of qwen3:8b)\n```\n\n#### **Slow Response Times**\n```bash\n# Check model loading\ncurl http://localhost:11434/api/tags\n\n# Monitor component response times\ntime curl http://localhost:8001/models\n\n# Solutions:\n# 1. Use SSD storage\n# 2. Increase CPU cores\n# 3. Use GPU acceleration (if available)\n```\n\n---\n\n## 8. 
Production Considerations\n\n### 8.1 Security\n\n#### **Network Security**\n```bash\n# Use reverse proxy (nginx/traefik) for production\n# Enable HTTPS/TLS\n# Restrict port access with firewall\n```\n\n#### **Data Security**\n```bash\n# Enable authentication in production\n# Encrypt sensitive data\n# Regular security updates\n```\n\n### 8.2 Scaling\n\n#### **Horizontal Scaling**\n```bash\n# Use Docker Swarm or Kubernetes\n# Load balance frontend and backend\n# Scale RAG API instances based on load\n```\n\n#### **Resource Optimization**\n```bash\n# Use dedicated GPU nodes for AI workloads\n# Implement model caching\n# Optimize batch processing\n```\n\n---\n\n## 9. Success Criteria\n\n### 9.1 Deployment Verification\n\nYour deployment is successful when:\n\n- ✅ All health checks pass\n- ✅ Frontend loads at http://localhost:3000\n- ✅ You can create document indexes\n- ✅ You can chat with uploaded documents\n- ✅ No error messages in logs\n\n### 9.2 Performance Benchmarks\n\n**Acceptable Performance:**\n- Index creation: < 2 minutes per 100MB document\n- Query response: < 30 seconds for complex questions\n- Memory usage: < 8GB total system memory\n\n**Optimal Performance:**\n- Index creation: < 1 minute per 100MB document  \n- Query response: < 10 seconds for complex questions\n- Memory usage: < 16GB total system memory\n\n---\n\n**Happy Deploying! 🚀** "
  },
  {
    "path": "Documentation/docker_usage.md",
    "content": "# 🐳 Docker Usage Guide - RAG System\n\n_Last updated: 2025-01-07_\n\nThis guide provides practical Docker commands and procedures for running the RAG system in containerized environments with local Ollama.\n\n---\n\n## 📋 Prerequisites\n\n### Required Setup\n- Docker Desktop installed and running\n- Ollama installed locally (even for Docker deployment)\n- 8GB+ RAM available\n\n### Architecture Overview\n```\n┌─────────────────────────────────────┐\n│           Docker Containers        │\n├─────────────────────────────────────┤\n│ Frontend (Port 3000)               │\n│ Backend (Port 8000)                │\n│ RAG API (Port 8001)                │\n└─────────────────────────────────────┘\n            │\n            ▼\n┌─────────────────────────────────────┐\n│         Local System               │\n├─────────────────────────────────────┤\n│ Ollama Server (Port 11434)         │\n└─────────────────────────────────────┘\n```\n\n---\n\n## 1. Quick Start Commands\n\n### Step 1: Clone and Setup\n\n```bash\n# Clone repository\ngit clone <your-repository-url>\ncd rag_system_old\n\n# Verify Docker is running\ndocker version\n```\n\n### Step 2: Install and Configure Ollama (Required)\n\n**⚠️ Important**: Even with Docker, Ollama must be installed locally for optimal performance.\n\n```bash\n# Install Ollama\ncurl -fsSL https://ollama.ai/install.sh | sh\n\n# Start Ollama (in one terminal)\nollama serve\n\n# Install required models (in another terminal)\nollama pull qwen3:0.6b      # Fast model (650MB)\nollama pull qwen3:8b        # High-quality model (4.7GB)\n\n# Verify models are installed\nollama list\n\n# Test Ollama connection\ncurl http://localhost:11434/api/tags\n```\n\n### Step 3: Start Docker Containers\n\n```bash\n# Start all containers\n./start-docker.sh\n\n# Stop all containers\n./start-docker.sh stop\n\n# View logs\n./start-docker.sh logs\n\n# Check status\n./start-docker.sh status\n\n# Restart containers\n./start-docker.sh 
stop\n./start-docker.sh\n```\n\n### Step 4: Access the Services\n\nOnce running, access the system at:\n- **Frontend**: http://localhost:3000\n- **Backend API**: http://localhost:8000  \n- **RAG API**: http://localhost:8001\n- **Ollama**: http://localhost:11434\n\n---\n\n## 2. Container Management\n\n### 2.1 Using the Convenience Script\n\n```bash\n# Start all containers\n./start-docker.sh\n\n# Stop all containers\n./start-docker.sh stop\n\n# View logs\n./start-docker.sh logs\n\n# Check status\n./start-docker.sh status\n\n# Restart containers\n./start-docker.sh stop\n./start-docker.sh\n```\n\n### 2.2 Manual Docker Compose Commands\n\n```bash\n# Start all services\ndocker compose --env-file docker.env up --build -d\n\n# Check status\ndocker compose ps\n\n# View logs\ndocker compose logs -f\n\n# Stop all services\ndocker compose down\n\n# Force rebuild\ndocker compose build --no-cache\ndocker compose up --build -d\n```\n\n### 2.3 Individual Service Management\n\n```bash\n# Start specific service\ndocker compose up -d frontend\ndocker compose up -d backend\ndocker compose up -d rag-api\n\n# Restart specific service\ndocker compose restart rag-api\n\n# Stop specific service\ndocker compose stop backend\n\n# View specific service logs\ndocker compose logs -f rag-api\n```\n\n---\n\n## 3. 
Development Workflow\n\n### 3.1 Code Changes\n\n```bash\n# After frontend changes\ndocker compose restart frontend\n\n# After backend changes  \ndocker compose restart backend\n\n# After RAG system changes\ndocker compose restart rag-api\n\n# Rebuild after dependency changes\ndocker compose build --no-cache rag-api\ndocker compose up -d rag-api\n```\n\n### 3.2 Debugging Containers\n\n```bash\n# Access container shell\ndocker compose exec frontend sh\ndocker compose exec backend bash\ndocker compose exec rag-api bash\n\n# Run commands in container\ndocker compose exec rag-api python -c \"from rag_system.main import get_agent; print('✅ RAG System OK')\"\ndocker compose exec backend curl http://localhost:8000/health\n\n# Check environment variables\ndocker compose exec rag-api env | grep OLLAMA\n```\n\n### 3.3 Development vs Production\n\n```bash\n# Development mode (if docker-compose.dev.yml exists)\ndocker compose -f docker-compose.yml -f docker-compose.dev.yml up -d\n\n# Production mode (default)\ndocker compose --env-file docker.env up -d\n```\n\n---\n\n## 4. Logging & Monitoring\n\n### 4.1 Log Management\n\n```bash\n# View all logs\ndocker compose logs\n\n# View specific service logs\ndocker compose logs frontend\ndocker compose logs backend\ndocker compose logs rag-api\n\n# Follow logs in real-time\ndocker compose logs -f\n\n# View last N lines\ndocker compose logs --tail=100\n\n# View logs with timestamps\ndocker compose logs -t\n\n# Save logs to file\ndocker compose logs > system.log 2>&1\n\n# View logs since specific time\ndocker compose logs --since=2h\ndocker compose logs --since=2025-01-01T00:00:00\n```\n\n### 4.2 System Monitoring\n\n```bash\n# Monitor resource usage\ndocker stats\n\n# Monitor specific containers\ndocker stats rag-frontend rag-backend rag-api\n\n# Check container health\ndocker compose ps\n\n# System information\ndocker system info\ndocker system df\n```\n\n---\n\n## 5. 
Ollama Integration\n\n### 5.1 Ollama Setup\n\n```bash\n# Install Ollama (one-time setup)\ncurl -fsSL https://ollama.ai/install.sh | sh\n\n# Start Ollama server\nollama serve\n\n# Check Ollama status\ncurl http://localhost:11434/api/tags\n\n# Install models\nollama pull qwen3:0.6b      # Fast model\nollama pull qwen3:8b        # High-quality model\n\n# List installed models\nollama list\n```\n\n### 5.2 Ollama Management\n\n```bash\n# Check model status from container\ndocker compose exec rag-api curl http://host.docker.internal:11434/api/tags\n\n# Test Ollama connection\ncurl -X POST http://localhost:11434/api/generate \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"model\": \"qwen3:0.6b\", \"prompt\": \"Hello\", \"stream\": false}'\n\n# Monitor Ollama logs (if running with logs)\n# Ollama logs appear in the terminal where you ran 'ollama serve'\n```\n\n### 5.3 Model Management\n\n```bash\n# Update models\nollama pull qwen3:0.6b\nollama pull qwen3:8b\n\n# Remove unused models\nollama rm old-model-name\n\n# Check model information\nollama show qwen3:0.6b\n```\n\n---\n\n## 6. 
Data Management\n\n### 6.1 Volume Management\n\n```bash\n# List volumes\ndocker volume ls\n\n# View volume usage\ndocker system df -v\n\n# Backup volumes\ndocker run --rm -v rag_system_old_lancedb:/data -v $(pwd)/backup:/backup alpine tar czf /backup/lancedb_backup.tar.gz -C /data .\n\n# Clean unused volumes\ndocker volume prune\n```\n\n### 6.2 Database Management\n\n```bash\n# Access SQLite database\ndocker compose exec backend sqlite3 /app/backend/chat_data.db\n\n# Backup database\ncp backend/chat_data.db backup/chat_data_$(date +%Y%m%d).db\n\n# Check LanceDB tables from container\ndocker compose exec rag-api python -c \"\nimport lancedb\ndb = lancedb.connect('/app/lancedb')\nprint('Tables:', db.table_names())\n\"\n```\n\n### 6.3 File Management\n\n```bash\n# Access shared files\ndocker compose exec rag-api ls -la /app/shared_uploads\n\n# Copy files to/from containers\ndocker cp local_file.pdf rag-api:/app/shared_uploads/\ndocker cp rag-api:/app/shared_uploads/file.pdf ./local_file.pdf\n\n# Check disk usage\ndocker compose exec rag-api df -h\n```\n\n---\n\n## 7. 
Troubleshooting\n\n### 7.1 Common Issues\n\n#### Container Won't Start\n```bash\n# Check Docker daemon\ndocker version\n\n# Check for port conflicts\nlsof -i :3000 -i :8000 -i :8001\n\n# Check container logs\ndocker compose logs [service-name]\n\n# Restart Docker Desktop\n# macOS/Windows: Restart Docker Desktop\n# Linux: sudo systemctl restart docker\n```\n\n#### Ollama Connection Issues\n```bash\n# Check Ollama is running\ncurl http://localhost:11434/api/tags\n\n# Restart Ollama\npkill ollama\nollama serve\n\n# Check from container\ndocker compose exec rag-api curl http://host.docker.internal:11434/api/tags\n```\n\n#### Performance Issues\n```bash\n# Check resource usage\ndocker stats\n\n# Increase Docker memory (Docker Desktop Settings)\n# Recommended: 8GB+ for Docker\n\n# Check container health\ndocker compose ps\n```\n\n### 7.2 Reset and Clean\n\n```bash\n# Stop everything\n./start-docker.sh stop\n\n# Clean containers and images\ndocker system prune -a\n\n# Clean volumes (⚠️ deletes data)\ndocker volume prune\n\n# Complete reset (⚠️ deletes everything)\ndocker compose down -v\ndocker system prune -a --volumes\n```\n\n### 7.3 Health Checks\n\n```bash\n# Comprehensive health check\ncurl -f http://localhost:3000 && echo \"✅ Frontend OK\"\ncurl -f http://localhost:8000/health && echo \"✅ Backend OK\"\ncurl -f http://localhost:8001/models && echo \"✅ RAG API OK\"\ncurl -f http://localhost:11434/api/tags && echo \"✅ Ollama OK\"\n\n# Check all container status\ndocker compose ps\n\n# Test model loading\ndocker compose exec rag-api python -c \"\nfrom rag_system.main import get_agent\nagent = get_agent('default')\nprint('✅ RAG System initialized successfully')\n\"\n```\n\n---\n\n## 8. 
Advanced Usage\n\n### 8.1 Production Deployment\n\n```bash\n# Use production environment\nexport NODE_ENV=production\n\n# Start in detached mode (resource limits, if any, come from docker-compose.yml)\ndocker compose --env-file docker.env up -d\n\n# Enable automatic restarts for this project's containers only\ndocker update --restart unless-stopped $(docker compose ps -q)\n```\n\n### 8.2 Scaling\n\n```bash\n# Scale specific services\n# (fails if a scaled service binds a fixed host port or sets container_name)\ndocker compose up -d --scale backend=2 --scale rag-api=2\n\n# Use Docker Swarm for clustering\ndocker swarm init\ndocker stack deploy -c docker-compose.yml rag-system\n```\n\n### 8.3 Security\n\n```bash\n# Scan images for vulnerabilities\ndocker scout cves rag-frontend\ndocker scout cves rag-backend\ndocker scout cves rag-api\n\n# Update base images\ndocker compose build --no-cache --pull\n```\n\n---\n\n## 9. Configuration\n\n### 9.1 Environment Variables\n\nThe system uses `docker.env` for configuration:\n\n```bash\n# Ollama configuration\nOLLAMA_HOST=http://host.docker.internal:11434\n\n# Service configuration\nNODE_ENV=production\nRAG_API_URL=http://rag-api:8001\nNEXT_PUBLIC_API_URL=http://localhost:8000\n```\n\n### 9.2 Custom Configuration\n\n```bash\n# Create custom environment file\ncp docker.env docker.custom.env\n\n# Edit custom configuration\nnano docker.custom.env\n\n# Use custom configuration\ndocker compose --env-file docker.custom.env up -d\n```\n\n---\n\n## 10. 
Success Checklist\n\nYour Docker deployment is successful when:\n\n- ✅ All containers are running: `docker compose ps`\n- ✅ Ollama is accessible: `curl http://localhost:11434/api/tags`\n- ✅ Frontend loads: `curl http://localhost:3000`\n- ✅ Backend responds: `curl http://localhost:8000/health`\n- ✅ RAG API works: `curl http://localhost:8001/models`\n- ✅ You can create indexes and chat with documents\n\n### Performance Expectations\n\n**Acceptable Performance:**\n- Container startup: < 2 minutes\n- Memory usage: < 4GB Docker containers + Ollama\n- Response time: < 30 seconds for complex queries\n\n**Optimal Performance:**\n- Container startup: < 1 minute  \n- Memory usage: < 2GB Docker containers + Ollama\n- Response time: < 10 seconds for complex queries\n\n---\n\n**Happy Containerizing! 🐳** "
  },
  {
    "path": "Documentation/improvement_plan.md",
    "content": "# RAG System – Improvement Road-map\n\n_Revision: 2025-07-05_\n\nThis document captures high-impact enhancements identified during the July 2025 code-review.  Items are grouped by theme and include a short rationale plus suggested implementation notes.  **No code has been changed – this file is planning only.**\n\n---\n\n## 1. Retrieval Accuracy & Speed\n\n| ID | Item | Rationale | Notes |\n|----|------|-----------|-------|\n| 1.1 | Late-chunk result merging | Returned snippets can be single late-chunks → fragmented. | After retrieval, gather sibling chunks (±1) and concatenate before reranking / display. |\n| 1.2 | Tiered retrieval (ANN pre-filter) | Large indexes → LanceDB full scan can be slow. | Use in-memory FAISS/HNSW to narrow to top-N, then exact LanceDB search. |\n| 1.3 | Dynamic fusion weights | Different corpora favour dense vs BM25 differently. | Learn weight on small validation set; store in index `metadata`. |\n| 1.4 | Query expansion via KG | Use extracted entities to enrich queries. | Requires Graph-RAG path clean-up first. |\n\n## 2. Routing / Triage\n\n| ID | Item | Rationale |\n|----|------|-----------|\n| 2.1 | Embed + cache document overviews | LLM router costs tokens; cosine-similarity pre-check is cheaper. |\n| 2.2 | Session-level routing memo | Avoid repeated LLM triage for follow-up queries. |\n| 2.3 | Remove legacy pattern rules | Simplifies maintenance once overview & ML routing mature. |\n\n## 3. Indexing Pipeline\n\n| ID | Item | Rationale |\n|----|------|-----------|\n| 3.1 | Parallel document conversion | PDF→MD + chunking is serial today; speed gains possible. |\n| 3.2 | Incremental indexing | Re-embedding whole corpus wastes time. |\n| 3.3 | Auto GPU dtype selection | Use FP16 on CUDA / MPS for memory and speed. |\n| 3.4 | Post-build health check | Catch broken indexes (dim mismatch etc.) early. |\n\n## 4. Embedding Model Management\n\n* **Registry file** mapping tag → dims/source/license.  
UI & backend validate against it.\n* **Embedder pool** caches loaded HF/Ollama weights per model to save RAM.\n\n## 5. Database & Storage\n\n* LanceDB table GC for orphaned tables.\n* Scheduled SQLite `VACUUM` when fragmentation > X %.\n\n## 6. Observability & Ops\n\n* JSON structured logging.\n* `/metrics` endpoint for Prometheus.\n* Deep health-probe (`/health/deep`) exercising end-to-end query.\n\n## 7. Front-end UX\n\n* SSE-driven progress bar for indexing.\n* Matched-term highlighting in retrieved snippets.\n* Preset buttons (Fast / Balanced / High-Recall) for retrieval settings.\n\n## 8. Testing & CI\n\n* Replace deleted BM25 tests with LanceDB hybrid tests.\n* Integration test: build → query → assert ≥1 doc.\n* GitHub Action that spins up Ollama, pulls small embedding model, runs smoke test.\n\n## 9. Codebase Hygiene\n\n* Graph-RAG integration (currently disabled, can be implemented if needed).\n* Consolidate duplicate config keys (`embedding_model_name`, etc.).\n* Run `mypy --strict`, pylint, and black in CI.\n\n---\n\n### 🧹 System Cleanup (Priority: **HIGH**)\nReduce complexity and improve maintainability.\n\n* **✅ COMPLETED**: Remove experimental DSPy integration and unused modules (35+ files removed)  \n* **✅ COMPLETED**: Clean up duplicate or obsolete documentation files\n* **✅ COMPLETED**: Remove unused import statements and dependencies  \n* **✅ COMPLETED**: Consolidate similar configuration files\n* **✅ COMPLETED**: Remove broken or non-functional ReAct agent implementation\n\n### Priority Matrix (suggested order)\n\n1.  **Critical reliability**: 3.4, 5.1, 9.2\n2.  **User-visible wins**: 1.1, 7.1, 7.2\n3.  **Performance**: 1.2, 3.1, 3.3\n4.  **Long-term maintainability**: 2.3, 9.1, 9.3\n\nFeel free to rearrange based on team objectives and resource availability. "
  },
  {
    "path": "Documentation/indexing_pipeline.md",
    "content": "# 🗂️ Indexing Pipeline\n\n_Implementation entry-point: `rag_system/pipelines/indexing_pipeline.py` + helpers in `indexing/` & `ingestion/`._\n\n## Overview\nTransforms raw documents (PDF, TXT, etc.) into search-ready **chunks** with embeddings, storing them in LanceDB and generating auxiliary assets (overviews, context summaries).\n\n## High-Level Diagram\n```mermaid\nflowchart TD\n    A[\"Uploaded Files\"] --> B{Converter}\n    B -->|PDF→text| C[\"Plain Text\"]\n    C --> D{Chunker}\n    D -->|docling| D1[DocLing Chunking]\n    D -->|latechunk| D2[Late Chunking]\n    D -->|standard| D3[Fixed-size]\n    D1 & D2 & D3 --> E[\"Contextual Enricher\"]\n    E -->|local ctx summary| F[\"Embedding Generator\"]\n    F -->|vectors| G[(LanceDB Table)]\n    E --> H[\"Overview Builder\"]\n    H -->|JSONL| OVR[[`index_store/overviews/<idx>.jsonl`]]\n```\n\n## Steps in Detail\n| Step | Module | Key Classes | Notes |\n|------|--------|------------|-------|\n| Conversion | `ingestion/pdf_converter.py` | `PDFConverter` | Uses `Docling` library to extract text with structure preservation. |\n| Chunking | `ingestion/chunking.py`, `indexing/latechunk.py`, `ingestion/docling_chunker.py` | `MarkdownRecursiveChunker`, `DoclingChunker` | Controlled by flags `latechunk`, `doclingChunk`, `chunkSize`, `chunkOverlap`. |\n| Contextual Enrichment | `indexing/contextualizer.py` | `ContextualEnricher` | Generates per-chunk summaries (LLM call). |\n| Embedding | `indexing/embedders.py`, `indexing/representations.py` | `QwenEmbedder`, `EmbeddingGenerator` | Batch size tunable (`batchSizeEmbed`). Uses Qwen3-Embedding models. |\n| LanceDB Ingest | `index_store/lancedb/…` | – | Each index has a dedicated table `text_pages_<index_id>`. |\n| Overview | `indexing/overview_builder.py` | `OverviewBuilder` | First-N chunks summarised for triage routing. |\n\n### Control Flow (Code)\n1. 
**backend/server.py → handle_build_index()** collects the files and options and POSTs them to the `/index` endpoint of the advanced RAG API (a local process).\n2. **indexing_pipeline.IndexingPipeline.run()** orchestrates conversion → chunking → enrichment → embedding → storage.\n3. Metadata (chunk_size, models, etc.) is stored in the SQLite `indexes` table.\n\n## Configuration Flags\n| Flag | Description | Default |\n|------|-------------|---------|\n| `latechunk` | Merge k adjacent sibling chunks at query time | false |\n| `doclingChunk` | Use DocLing structural chunking | false |\n| `chunkSize` / `chunkOverlap` | Standard fixed slicing | 512 / 64 |\n| `enableEnrich` | Run contextual summaries | true |\n| `embeddingModel` | Override embedder | `Qwen/Qwen3-Embedding-0.6B` |\n| `overviewModel` | Model used in `OverviewBuilder` | `qwen3:0.6b` |\n| `batchSizeEmbed / Enrich` | Batch sizes | 50 / 25 |\n\n## Error Handling\n* Duplicate LanceDB table ➟ now idempotent (commit `af99b38`).\n* Failed PDF parse ➟ chunker skips the file and logs a warning.\n\n## Extension Ideas\n* Add an OCR layer before PDF conversion.\n* Store embeddings in a remote LanceDB instance (update URL in config).\n\n## Detailed Implementation Analysis\n\n### Pipeline Architecture Pattern\nThe `IndexingPipeline` uses a **sequential processing pattern** with parallel batch operations. 
Each stage processes all documents before moving to the next stage, enabling efficient memory usage and progress tracking.\n\n```python\ndef run(self, file_paths: List[str]):\n    with timer(\"Complete Indexing Pipeline\"):\n        # Stage 1: Document Processing & Chunking\n        all_chunks = []\n        doc_chunks_map = {}\n        \n        # Stage 2: Contextual Enrichment (optional)\n        if self.contextual_enricher:\n            all_chunks = self.contextual_enricher.enrich_batch(all_chunks)\n        \n        # Stage 3: Dense Indexing (embedding + storage)\n        if self.vector_indexer:\n            self.vector_indexer.index_chunks(all_chunks, table_name)\n        \n        # Stage 4: Graph Extraction (optional)\n        if self.graph_extractor:\n            self.graph_extractor.extract_and_store(all_chunks)\n```\n\n### Document Processing Deep-Dive\n\n#### PDF Conversion Strategy\n```python\n# PDFConverter uses Docling for robust text extraction with structure\ndef convert_to_markdown(self, file_path: str) -> List[Tuple[str, Dict, Any]]:\n    # Quick heuristic: if PDF has text layer, skip OCR for speed\n    use_ocr = not self._pdf_has_text(file_path)\n    converter = self.converter_ocr if use_ocr else self.converter_no_ocr\n    \n    result = converter.convert(file_path)\n    markdown_content = result.document.export_to_markdown()\n    \n    metadata = {\"source\": file_path}\n    # Return DoclingDocument object for advanced chunkers\n    return [(markdown_content, metadata, result.document)]\n```\n\n**Benefits**:\n- Preserves document structure (headings, lists, tables)\n- Automatic OCR fallback for image-based PDFs\n- Maintains page-level metadata for source attribution\n- Structured output supports advanced chunking strategies\n\n#### Chunking Strategy Selection\n```python\n# Dynamic chunker selection based on config\nchunker_mode = config.get(\"chunker_mode\", \"legacy\")\n\nif chunker_mode == \"docling\":\n    self.chunker = DoclingChunker(\n      
  max_tokens=chunk_size,\n        overlap=overlap_sentences,\n        tokenizer_model=\"Qwen/Qwen3-Embedding-0.6B\"\n    )\nelse:\n    self.chunker = MarkdownRecursiveChunker(\n        max_chunk_size=chunk_size,\n        min_chunk_size=min(chunk_overlap, chunk_size // 4)\n    )\n```\n\n#### Recursive Markdown Chunking Algorithm\n```python\ndef chunk(self, text: str, document_id: str, metadata: Dict) -> List[Dict]:\n    # Simplified sketch of the recursive splitting logic\n    # Priority hierarchy for splitting\n    separators = [\n        \"\\n\\n# \",      # H1 headers (highest priority)\n        \"\\n\\n## \",     # H2 headers\n        \"\\n\\n### \",    # H3 headers\n        \"\\n\\n\",        # Paragraph breaks\n        \"\\n\",          # Line breaks\n        \". \",          # Sentence boundaries\n        \" \"            # Word boundaries (last resort)\n    ]\n    \n    def split(piece: str, level: int) -> List[str]:\n        # Keep pieces that fit or that cannot be split any further\n        if len(piece) <= self.max_chunk_size or level == len(separators):\n            return [piece]\n        parts = []\n        for part in piece.split(separators[level]):\n            # Recursively split large parts on the next separator\n            parts.extend(split(part, level + 1))\n        return parts\n    \n    chunks = []\n    for part in split(text, 0):\n        if not part.strip():\n            continue\n        \n        # Add overlap from previous chunk\n        if chunks and self.chunk_overlap > 0:\n            overlap_text = chunks[-1][\"text\"][-self.chunk_overlap:]\n            part = overlap_text + \" \" + part\n        \n        chunks.append({\n            \"text\": part,\n            \"document_id\": document_id,\n            \"metadata\": {**metadata, \"chunk_index\": len(chunks)}\n        })\n    \n    return chunks\n```\n\n### DocLing Chunking Implementation\n\n#### Token-Aware Sentence Packing\n```python\nclass DoclingChunker:\n    def __init__(self, max_tokens: int = 512, overlap: int = 1, \n                 tokenizer_model: str = 
\"Qwen/Qwen3-Embedding-0.6B\"):\n        self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_model)\n        self.max_tokens = max_tokens\n        self.overlap = overlap  # sentences of overlap\n    \n    def split_markdown(self, markdown: str, document_id: str, metadata: Dict):\n        sentences = self._sentence_split(markdown)\n        chunks = []\n        window = []\n        \n        while sentences:\n            # Add sentences until token limit\n            while (sentences and \n                   self._token_len(\" \".join(window + [sentences[0]])) <= self.max_tokens):\n                window.append(sentences.pop(0))\n            \n            if not window:  # Single sentence > limit\n                window.append(sentences.pop(0))\n            \n            # Create chunk\n            chunk_text = \" \".join(window)\n            chunks.append({\n                \"chunk_id\": f\"{document_id}_{len(chunks)}\",\n                \"text\": chunk_text,\n                \"metadata\": {\n                    **metadata,\n                    \"chunk_index\": len(chunks),\n                    \"heading_path\": metadata.get(\"heading_path\", []),\n                    \"block_type\": metadata.get(\"block_type\", \"paragraph\")\n                }\n            })\n            \n            # Add overlap for next chunk\n            if self.overlap and sentences:\n                overlap_sentences = window[-self.overlap:]\n                sentences = overlap_sentences + sentences\n            window = []\n        \n        return chunks\n```\n\n#### Document Structure Preservation\n```python\ndef chunk_document(self, doc, document_id: str, metadata: Dict):\n    \"\"\"Walk DoclingDocument tree and emit structured chunks.\"\"\"\n    chunks = []\n    current_heading_path = []\n    buffer = []\n    \n    # Process document elements in reading order\n    for txt_item in doc.texts:\n        role = getattr(txt_item, \"role\", None)\n        \n        if role == 
\"heading\":\n            self._flush_buffer(buffer, chunks, current_heading_path)\n            level = getattr(txt_item, \"level\", 1)\n            # Update heading hierarchy\n            current_heading_path = current_heading_path[:level-1]\n            current_heading_path.append(txt_item.text.strip())\n            continue\n        \n        # Accumulate text in token-aware buffer\n        text_piece = txt_item.text\n        if self._buffer_would_exceed_limit(buffer, text_piece):\n            self._flush_buffer(buffer, chunks, current_heading_path)\n        \n        buffer.append(text_piece)\n    \n    self._flush_buffer(buffer, chunks, current_heading_path)\n    return chunks\n```\n\n### Contextual Enrichment Implementation\n\n#### Batch Processing Pattern\n```python\nclass ContextualEnricher:\n    def enrich_batch(self, chunks: List[Dict]) -> List[Dict]:\n        enriched_chunks = []\n        \n        # Process in batches to manage memory\n        for i in range(0, len(chunks), self.batch_size):\n            batch = chunks[i:i + self.batch_size]\n            \n            # Parallel enrichment within batch\n            with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:\n                futures = [\n                    executor.submit(self._enrich_single_chunk, chunk)\n                    for chunk in batch\n                ]\n                \n                for future in concurrent.futures.as_completed(futures):\n                    enriched_chunks.append(future.result())\n        \n        return enriched_chunks\n```\n\n#### Contextual Prompt Engineering\n```python\ndef _generate_context_summary(self, chunk_text: str, surrounding_context: str) -> str:\n    prompt = f\"\"\"\n    Analyze this text chunk and provide a concise summary that captures:\n    1. Main topics and key information\n    2. Context within the broader document\n    3. 
Relevance for search and retrieval\n    \n    Document Context:\n    {surrounding_context}\n    \n    Chunk to Analyze:\n    {chunk_text}\n    \n    Summary (max 2 sentences):\n    \"\"\"\n    \n    response = self.llm_client.complete(\n        prompt=prompt,\n        model=self.ollama_config[\"enrichment_model\"]  # qwen3:0.6b\n    )\n    \n    return response.strip()\n```\n\n### Embedding Generation Pipeline\n\n#### Model Selection Strategy\n```python\ndef select_embedder(model_name: str, ollama_host: str = None):\n    \"\"\"Select appropriate embedder based on model name.\"\"\"\n    if \"Qwen3-Embedding\" in model_name:\n        return QwenEmbedder(model_name=model_name)\n    elif \"bge-\" in model_name:\n        return BGEEmbedder(model_name=model_name)\n    elif ollama_host and model_name in [\"nomic-embed-text\"]:\n        return OllamaEmbedder(model_name=model_name, host=ollama_host)\n    else:\n        # Default to Qwen embedder\n        return QwenEmbedder(model_name=\"Qwen/Qwen3-Embedding-0.6B\")\n```\n\n#### Batch Embedding Generation\n```python\nclass QwenEmbedder:\n    def create_embeddings(self, texts: List[str]) -> np.ndarray:\n        \"\"\"Generate embeddings in batches for efficiency.\"\"\"\n        embeddings = []\n        \n        for i in range(0, len(texts), self.batch_size):\n            batch = texts[i:i + self.batch_size]\n            \n            # Tokenize and encode\n            inputs = self.tokenizer(\n                batch, \n                padding=True, \n                truncation=True, \n                max_length=512,\n                return_tensors='pt'\n            )\n            \n            with torch.no_grad():\n                outputs = self.model(**inputs)\n                # Mean pooling over token embeddings\n                batch_embeddings = outputs.last_hidden_state.mean(dim=1)\n                embeddings.append(batch_embeddings.cpu().numpy())\n        \n        return np.vstack(embeddings)\n```\n\n### LanceDB 
Storage Implementation\n\n#### Table Management Strategy\n```python\nclass LanceDBManager:\n    def create_table_if_not_exists(self, table_name: str, schema: Schema):\n        \"\"\"Create LanceDB table with proper schema.\"\"\"\n        try:\n            table = self.db.open_table(table_name)\n            print(f\"Table {table_name} already exists\")\n            return table\n        except FileNotFoundError:\n            # Table doesn't exist, create it\n            table = self.db.create_table(\n                table_name,\n                schema=schema,\n                mode=\"create\"\n            )\n            print(f\"Created new table: {table_name}\")\n            return table\n    \n    def index_chunks(self, chunks: List[Dict], table_name: str):\n        \"\"\"Store chunks with embeddings in LanceDB.\"\"\"\n        table = self.get_table(table_name)\n        \n        # Prepare data for insertion\n        records = []\n        for chunk in chunks:\n            record = {\n                \"chunk_id\": chunk[\"chunk_id\"],\n                \"text\": chunk[\"text\"],\n                \"vector\": chunk[\"embedding\"].tolist(),\n                \"metadata\": json.dumps(chunk[\"metadata\"]),\n                \"document_id\": chunk[\"metadata\"][\"document_id\"],\n                \"chunk_index\": chunk[\"metadata\"][\"chunk_index\"]\n            }\n            records.append(record)\n        \n        # Batch insert\n        table.add(records)\n        \n        # Create vector index for fast similarity search\n        table.create_index(\"vector\", config=IvfPq(num_partitions=256))\n```\n\n### Overview Building for Query Routing\n\n#### Document Summarization Strategy\n```python\nclass OverviewBuilder:\n    def build_overview(self, chunks: List[Dict], document_id: str) -> Dict:\n        \"\"\"Generate document overview for query routing.\"\"\"\n        # Take first N chunks for overview (usually most important)\n        sample_chunks = 
chunks[:self.max_chunks_for_overview]\n        combined_text = \"\\n\\n\".join([c[\"text\"] for c in sample_chunks])\n        \n        overview_prompt = f\"\"\"\n        Analyze this document and create a brief overview that includes:\n        1. Main topic and purpose\n        2. Key themes and concepts\n        3. Document type and domain\n        4. Relevant search keywords\n        \n        Document text:\n        {combined_text}\n        \n        Overview (max 3 sentences):\n        \"\"\"\n        \n        overview = self.llm_client.complete(\n            prompt=overview_prompt,\n            model=self.overview_model  # qwen3:0.6b for speed\n        )\n        \n        return {\n            \"document_id\": document_id,\n            \"overview\": overview.strip(),\n            \"chunk_count\": len(chunks),\n            \"keywords\": self._extract_keywords(combined_text),\n            \"created_at\": datetime.now().isoformat()\n        }\n    \n    def save_overview(self, overview: Dict):\n        \"\"\"Save overview to JSONL file for query routing.\"\"\"\n        overview_path = f\"./index_store/overviews/{overview['document_id']}.jsonl\"\n        \n        with open(overview_path, 'w') as f:\n            json.dump(overview, f)\n```\n\n### Performance Optimizations\n\n#### Memory Management\n```python\nclass IndexingPipeline:\n    def __init__(self, config: Dict, ollama_client: OllamaClient, ollama_config: Dict):\n        # Lazy initialization to save memory\n        self._pdf_converter = None\n        self._chunker = None\n        self._embedder = None\n        \n    def _get_embedder(self):\n        \"\"\"Lazy load embedder to avoid memory overhead.\"\"\"\n        if self._embedder is None:\n            model_name = self.config.get(\"embedding_model_name\", \"Qwen/Qwen3-Embedding-0.6B\")\n            self._embedder = select_embedder(model_name)\n        return self._embedder\n    \n    def process_document_batch(self, file_paths: List[str]):\n        
\"\"\"Process documents in batches to manage memory.\"\"\"\n        for batch_start in range(0, len(file_paths), self.batch_size):\n            batch = file_paths[batch_start:batch_start + self.batch_size]\n            \n            # Process batch\n            self._process_batch(batch)\n            \n            # Cleanup to free memory\n            if hasattr(self, '_embedder') and self._embedder:\n                self._embedder.cleanup()\n```\n\n#### Parallel Processing\n```python\ndef run_parallel_processing(self, file_paths: List[str]):\n    \"\"\"Process multiple documents in parallel.\"\"\"\n    with concurrent.futures.ProcessPoolExecutor(max_workers=4) as executor:\n        futures = []\n        \n        for file_path in file_paths:\n            future = executor.submit(self._process_single_file, file_path)\n            futures.append(future)\n        \n        # Collect results\n        results = []\n        for future in concurrent.futures.as_completed(futures):\n            try:\n                result = future.result(timeout=300)  # 5 minute timeout\n                results.append(result)\n            except Exception as e:\n                print(f\"Error processing file: {e}\")\n        \n        return results\n```\n\n### Error Handling and Recovery\n\n#### Graceful Degradation\n```python\ndef run(self, file_paths: List[str], table_name: str):\n    \"\"\"Main pipeline with comprehensive error handling.\"\"\"\n    processed_files = []\n    failed_files = []\n    \n    for file_path in file_paths:\n        try:\n            # Attempt processing\n            chunks = self._process_single_file(file_path)\n            \n            if chunks:\n                # Store successfully processed chunks\n                self._store_chunks(chunks, table_name)\n                processed_files.append(file_path)\n            else:\n                print(f\"⚠️ No chunks generated from {file_path}\")\n                failed_files.append((file_path, \"No chunks 
generated\"))\n                \n        except Exception as e:\n            print(f\"❌ Error processing {file_path}: {e}\")\n            failed_files.append((file_path, str(e)))\n            continue  # Continue with other files\n    \n    # Return summary\n    return {\n        \"processed\": len(processed_files),\n        \"failed\": len(failed_files),\n        \"processed_files\": processed_files,\n        \"failed_files\": failed_files\n    }\n```\n\n#### Recovery Mechanisms\n```python\ndef recover_from_partial_failure(self, table_name: str, document_id: str):\n    \"\"\"Recover from partial indexing failures.\"\"\"\n    try:\n        # Check what was already processed\n        table = self.db_manager.get_table(table_name)\n        existing_chunks = table.search().where(f\"document_id = '{document_id}'\").to_list()\n        \n        if existing_chunks:\n            print(f\"Found {len(existing_chunks)} existing chunks for {document_id}\")\n            return True\n            \n        # Cleanup partial data\n        self._cleanup_partial_data(table_name, document_id)\n        return False\n        \n    except Exception as e:\n        print(f\"Recovery failed: {e}\")\n        return False\n```\n\n### Configuration and Customization\n\n#### Pipeline Configuration Options\n```python\nDEFAULT_CONFIG = {\n    \"chunking\": {\n        \"strategy\": \"docling\",  # \"docling\", \"recursive\", \"fixed\"\n        \"max_tokens\": 512,\n        \"overlap\": 64,\n        \"min_chunk_size\": 100\n    },\n    \"embedding\": {\n        \"model_name\": \"Qwen/Qwen3-Embedding-0.6B\",\n        \"batch_size\": 32,\n        \"max_length\": 512\n    },\n    \"enrichment\": {\n        \"enabled\": True,\n        \"model\": \"qwen3:0.6b\",\n        \"batch_size\": 16\n    },\n    \"overview\": {\n        \"enabled\": True,\n        \"max_chunks\": 5,\n        \"model\": \"qwen3:0.6b\"\n    },\n    \"storage\": {\n        \"create_index\": True,\n        \"index_type\": 
\"IvfPq\",\n        \"num_partitions\": 256\n    }\n}\n```\n\n#### Custom Processing Hooks\n```python\nclass IndexingPipeline:\n    def __init__(self, config: Dict, hooks: Dict = None):\n        self.hooks = hooks or {}\n    \n    def _run_hook(self, hook_name: str, *args, **kwargs):\n        \"\"\"Execute custom processing hooks.\"\"\"\n        if hook_name in self.hooks:\n            return self.hooks[hook_name](*args, **kwargs)\n        return None\n    \n    def process_chunk(self, chunk: Dict) -> Dict:\n        \"\"\"Process single chunk with custom hooks.\"\"\"\n        # Pre-processing hook\n        chunk = self._run_hook(\"pre_chunk_process\", chunk) or chunk\n        \n        # Standard processing\n        if self.contextual_enricher:\n            chunk = self.contextual_enricher.enrich_chunk(chunk)\n        \n        # Post-processing hook\n        chunk = self._run_hook(\"post_chunk_process\", chunk) or chunk\n        \n        return chunk\n```\n\n---\n\n## Current Implementation Status\n\n### Completed Features ✅\n- DocLing-based PDF processing with OCR fallback\n- Multiple chunking strategies (DocLing, Recursive, Fixed-size)\n- Qwen3-Embedding-0.6B integration\n- Contextual enrichment with qwen3:0.6b\n- LanceDB storage with vector indexing\n- Overview generation for query routing\n- Batch processing and parallel execution\n- Comprehensive error handling\n\n### In Development 🚧\n- Graph extraction and knowledge graph building\n- Multimodal processing for images and tables\n- Advanced late-chunking optimization\n- Distributed processing support\n\n### Planned Features 📋\n- Custom model fine-tuning pipeline\n- Real-time incremental indexing\n- Cross-document relationship extraction\n- Advanced metadata enrichment\n\n---\n\n## Performance Benchmarks\n\n| Document Type | Processing Speed | Memory Usage | Storage Efficiency |\n|---------------|------------------|--------------|-------------------|\n| Text PDFs | 2-5 pages/sec | 2-4GB | 1MB/100 pages |\n| 
Image PDFs | 0.5-1 page/sec | 4-8GB | 2MB/100 pages |\n| Technical Docs | 1-3 pages/sec | 3-6GB | 1.5MB/100 pages |\n| Research Papers | 2-4 pages/sec | 2-4GB | 1.2MB/100 pages |\n\n## Extension Points\n\n### Custom Chunkers\n```python\nclass CustomChunker(BaseChunker):\n    def chunk(self, text: str, document_id: str, metadata: Dict) -> List[Dict]:\n        # Implement custom chunking logic\n        pass\n```\n\n### Custom Embedders\n```python\nclass CustomEmbedder(BaseEmbedder):\n    def create_embeddings(self, texts: List[str]) -> np.ndarray:\n        # Implement custom embedding generation\n        pass\n```\n\n### Custom Enrichers\n```python\nclass CustomEnricher(BaseEnricher):\n    def enrich_chunk(self, chunk: Dict) -> Dict:\n        # Implement custom enrichment logic\n        pass\n``` "
  },
  {
    "path": "Documentation/installation_guide.md",
    "content": "# 📦 RAG System Installation Guide\n\n_Last updated: 2025-01-07_\n\nThis guide provides step-by-step instructions for installing and setting up the RAG system using either Docker or direct development approaches.\n\n---\n\n## 🎯 Installation Options\n\n### Option 1: Docker Deployment (Production Ready) 🐳\n- **Best for**: Production environments, isolated setups, easy management\n- **Requirements**: Docker Desktop + Local Ollama\n- **Setup time**: ~10 minutes\n\n### Option 2: Direct Development (Developer Friendly) 💻\n- **Best for**: Development, customization, debugging\n- **Requirements**: Python + Node.js + Ollama\n- **Setup time**: ~15 minutes\n\n---\n\n## 1. Prerequisites\n\n### 1.1 System Requirements\n\n#### **Minimum Requirements**\n- **CPU**: 4 cores, 2.5GHz+\n- **RAM**: 8GB (16GB recommended)\n- **Storage**: 50GB free space\n- **OS**: macOS 10.15+, Ubuntu 20.04+, Windows 10+\n\n#### **Recommended Requirements**\n- **CPU**: 8+ cores, 3.0GHz+\n- **RAM**: 32GB+ (for large models)\n- **Storage**: 200GB+ SSD\n- **GPU**: NVIDIA GPU with 8GB+ VRAM (optional)\n\n### 1.2 Common Dependencies\n\n**Required for both approaches:**\n- **Ollama**: AI model runtime (always required)\n- **Git**: 2.30+ for cloning repository\n\n**Docker-specific:**\n- **Docker Desktop**: 24.0+ with Docker Compose\n\n**Direct Development-specific:**\n- **Python**: 3.8+ \n- **Node.js**: 16+ with npm\n\n---\n\n## 2. 
Ollama Installation (Required for Both)\n\n### 2.1 Install Ollama\n\n#### **macOS/Linux:**\n```bash\n# Install Ollama\ncurl -fsSL https://ollama.ai/install.sh | sh\n\n# Verify installation\nollama --version\n```\n\n#### **Windows:**\n```bash\n# Download from: https://ollama.ai/download\n# Run the installer and follow setup wizard\n```\n\n### 2.2 Configure Ollama\n\n```bash\n# Start Ollama server\nollama serve\n\n# In another terminal, install required models\nollama pull qwen3:0.6b      # Fast model (650MB)\nollama pull qwen3:8b        # High-quality model (4.7GB)\n\n# Verify models are installed\nollama list\n\n# Test Ollama\nollama run qwen3:0.6b \"Hello, how are you?\"\n```\n\n**⚠️ Important**: Keep Ollama running (`ollama serve`) for the entire setup process.\n\n---\n\n## 3. 🐳 Docker Installation & Setup\n\n### 3.1 Install Docker\n\n#### **macOS:**\n```bash\n# Install Docker Desktop via Homebrew\nbrew install --cask docker\n\n# Or download from: https://www.docker.com/products/docker-desktop/\n# Start Docker Desktop from Applications\n\n# Verify installation\ndocker --version\ndocker compose version\n```\n\n#### **Ubuntu/Debian:**\n```bash\n# Update system\nsudo apt-get update\n\n# Install Docker using convenience script\ncurl -fsSL https://get.docker.com -o get-docker.sh\nsudo sh get-docker.sh\n\n# Add user to docker group\nsudo usermod -aG docker $USER\nnewgrp docker\n\n# Install Docker Compose V2\nsudo apt-get install docker-compose-plugin\n\n# Verify installation\ndocker --version\ndocker compose version\n```\n\n#### **Windows:**\n1. Download Docker Desktop from https://www.docker.com/products/docker-desktop/\n2. Run installer and enable WSL 2 integration\n3. Restart computer and start Docker Desktop\n4. 
Verify in PowerShell: `docker --version`\n\n### 3.2 Clone and Setup RAG System\n\n```bash\n# Clone repository\ngit clone <your-repository-url>\ncd rag_system_old\n\n# Verify Ollama is running\ncurl http://localhost:11434/api/tags\n\n# Start Docker containers\n./start-docker.sh\n\n# Wait for containers to start (2-3 minutes)\nsleep 120\n\n# Verify deployment\n./start-docker.sh status\n```\n\n### 3.3 Test Docker Deployment\n\n```bash\n# Test all endpoints\ncurl -f http://localhost:3000 && echo \"✅ Frontend OK\"\ncurl -f http://localhost:8000/health && echo \"✅ Backend OK\"\ncurl -f http://localhost:8001/models && echo \"✅ RAG API OK\"\ncurl -f http://localhost:11434/api/tags && echo \"✅ Ollama OK\"\n\n# Access the application\nopen http://localhost:3000\n```\n\n---\n\n## 4. 💻 Direct Development Setup\n\n### 4.1 Install Development Dependencies\n\n#### **Python Setup:**\n```bash\n# Clone repository\ngit clone https://github.com/your-org/rag-system.git\ncd rag-system\n\n# Create virtual environment (recommended)\npython -m venv venv\n\n# Activate virtual environment\nsource venv/bin/activate  # macOS/Linux\n# venv\\Scripts\\activate   # Windows\n\n# Install Python dependencies\npip install -r requirements.txt\n\n# Verify Python setup\npython -c \"import torch; print('✅ PyTorch OK')\"\npython -c \"import transformers; print('✅ Transformers OK')\"\npython -c \"import lancedb; print('✅ LanceDB OK')\"\n```\n\n#### **Node.js Setup:**\n```bash\n# Install Node.js dependencies\nnpm install\n\n# Verify Node.js setup\nnode --version  # Should be 16+\nnpm --version\nnpm list --depth=0\n```\n\n### 4.2 Start Direct Development\n\n```bash\n# Ensure Ollama is running\ncurl http://localhost:11434/api/tags\n\n# Start all components with one command\npython run_system.py\n\n# Or start components manually in separate terminals:\n# Terminal 1: python -m rag_system.api_server\n# Terminal 2: cd backend && python server.py  \n# Terminal 3: npm run dev\n```\n\n### 4.3 Test Direct 
Development\n\n```bash\n# Check system health\npython system_health_check.py\n\n# Test endpoints\ncurl -f http://localhost:3000 && echo \"✅ Frontend OK\"\ncurl -f http://localhost:8000/health && echo \"✅ Backend OK\"\ncurl -f http://localhost:8001/models && echo \"✅ RAG API OK\"\n\n# Access the application\nopen http://localhost:3000\n```\n\n---\n\n## 5. Detailed Installation Steps\n\n### 5.1 Repository Setup\n\n```bash\n# Clone repository\ngit clone https://github.com/your-org/rag-system.git\ncd rag-system\n\n# Check repository structure\nls -la\n\n# Create required directories\nmkdir -p lancedb index_store shared_uploads logs backend\ntouch backend/chat_data.db\n\n# Set permissions\nchmod -R 755 lancedb index_store shared_uploads\nchmod 664 backend/chat_data.db\n```\n\n### 5.2 Configuration\n\n#### **Environment Variables**\nFor Docker (automatic via `docker.env`):\n```bash\nOLLAMA_HOST=http://host.docker.internal:11434\nNODE_ENV=production\nRAG_API_URL=http://rag-api:8001\nNEXT_PUBLIC_API_URL=http://localhost:8000\n```\n\nFor Direct Development (set automatically by `run_system.py`):\n```bash\nOLLAMA_HOST=http://localhost:11434\nRAG_API_URL=http://localhost:8001\nNEXT_PUBLIC_API_URL=http://localhost:8000\n```\n\n#### **Model Configuration**\nThe system defaults to these models:\n- **Embedding**: `Qwen/Qwen3-Embedding-0.6B` (1024 dimensions)\n- **Generation**: `qwen3:0.6b` for fast responses, `qwen3:8b` for quality\n- **Reranking**: Built-in cross-encoder\n\n### 5.3 Database Initialization\n\n```bash\n# Initialize SQLite database\npython -c \"\nfrom backend.database import ChatDatabase\ndb = ChatDatabase()\ndb.init_database()\nprint('✅ Database initialized')\n\"\n\n# Verify database\nsqlite3 backend/chat_data.db \".tables\"\n```\n\n---\n\n## 6. 
Verification & Testing\n\n### 6.1 System Health Checks\n\n#### **Comprehensive Health Check:**\n```bash\n# For Docker deployment\n./start-docker.sh status\ndocker compose ps\n\n# For Direct development\npython system_health_check.py\n\n# Universal health check\ncurl -f http://localhost:3000 && echo \"✅ Frontend OK\"\ncurl -f http://localhost:8000/health && echo \"✅ Backend OK\"\ncurl -f http://localhost:8001/models && echo \"✅ RAG API OK\"\ncurl -f http://localhost:11434/api/tags && echo \"✅ Ollama OK\"\n```\n\n#### **RAG System Test:**\n```bash\n# Test RAG system initialization\npython -c \"\nfrom rag_system.main import get_agent\nagent = get_agent('default')\nprint('✅ RAG System initialized successfully')\n\"\n\n# Test embedding generation\npython -c \"\nfrom rag_system.main import get_agent\nagent = get_agent('default')\nembedder = agent.retrieval_pipeline._get_text_embedder()\ntest_emb = embedder.create_embeddings(['Hello world'])\nprint(f'✅ Embedding generated: {test_emb.shape}')\n\"\n```\n\n### 6.2 Functional Testing\n\n#### **Document Upload Test:**\n1. Access http://localhost:3000\n2. Click \"Create New Index\"\n3. Upload a PDF document\n4. Configure settings and build index\n5. Test chat functionality\n\n#### **API Testing:**\n```bash\n# Test session creation\ncurl -X POST http://localhost:8000/sessions \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"title\": \"Test Session\"}'\n\n# Test models endpoint\ncurl http://localhost:8001/models\n\n# Test health endpoints\ncurl http://localhost:8000/health\ncurl http://localhost:8001/health\n```\n\n---\n\n## 7. 
Troubleshooting Installation\n\n### 7.1 Common Issues\n\n#### **Ollama Issues:**\n```bash\n# Ollama not responding\ncurl http://localhost:11434/api/tags\n\n# If fails, restart Ollama\npkill ollama\nollama serve\n\n# Reinstall models if needed\nollama pull qwen3:0.6b\nollama pull qwen3:8b\n```\n\n#### **Docker Issues:**\n```bash\n# Docker daemon not running\ndocker version\n\n# Restart Docker Desktop (macOS/Windows)\n# Or restart docker service (Linux)\nsudo systemctl restart docker\n\n# Clear Docker cache if build fails\ndocker system prune -f\n```\n\n#### **Python Issues:**\n```bash\n# Check Python version\npython --version  # Should be 3.8+\n\n# Check virtual environment\nwhich python\npip list | grep torch\n\n# Reinstall dependencies\npip install -r requirements.txt --force-reinstall\n```\n\n#### **Node.js Issues:**\n```bash\n# Check Node version\nnode --version  # Should be 16+\n\n# Clear and reinstall\nrm -rf node_modules package-lock.json\nnpm install\n```\n\n### 7.2 Performance Issues\n\n#### **Memory Problems:**\n```bash\n# Check system memory\nfree -h  # Linux\nvm_stat  # macOS\n\n# For Docker: Increase memory allocation\n# Docker Desktop → Settings → Resources → Memory → 8GB+\n\n# Use smaller models\nollama pull qwen3:0.6b  # Instead of qwen3:8b\n```\n\n#### **Slow Performance:**\n- Use SSD storage for databases (`lancedb/`, `shared_uploads/`)\n- Increase CPU cores if possible\n- Close unnecessary applications\n- Use smaller batch sizes in configuration\n\n---\n\n## 8. 
Post-Installation Setup\n\n### 8.1 Model Optimization\n\n```bash\n# Install additional models (optional)\nollama pull nomic-embed-text        # Alternative embedding model\nollama pull llama3.1:8b            # Alternative generation model\n\n# Test model switching\ncurl -X POST http://localhost:8001/chat \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"query\": \"Hello\", \"model\": \"qwen3:8b\"}'\n```\n\n### 8.2 Security Configuration\n\n```bash\n# Set proper file permissions\nchmod 600 backend/chat_data.db    # Restrict database access\nchmod 700 lancedb/                # Restrict vector DB access\n\n# Configure firewall (production)\nsudo ufw allow 3000/tcp           # Frontend\nsudo ufw deny 8000/tcp            # Backend (internal only)\nsudo ufw deny 8001/tcp            # RAG API (internal only)\n```\n\n### 8.3 Backup Setup\n\n```bash\n# Create backup script\ncat > backup_system.sh << 'EOF'\n#!/bin/bash\nBACKUP_DIR=\"backups/$(date +%Y%m%d_%H%M%S)\"\nmkdir -p \"$BACKUP_DIR\"\n\n# Backup databases and indexes\ncp -r backend/chat_data.db \"$BACKUP_DIR/\"\ncp -r lancedb \"$BACKUP_DIR/\"\ncp -r index_store \"$BACKUP_DIR/\"\ncp -r shared_uploads \"$BACKUP_DIR/\"\n\necho \"Backup completed: $BACKUP_DIR\"\nEOF\n\nchmod +x backup_system.sh\n```\n\n---\n\n## 9. Success Criteria\n\n### 9.1 Installation Complete When:\n\n- ✅ All health checks pass without errors\n- ✅ Frontend loads at http://localhost:3000\n- ✅ All models are installed and responding\n- ✅ You can create document indexes\n- ✅ You can chat with uploaded documents\n- ✅ No error messages in logs/terminal\n\n### 9.2 Performance Benchmarks\n\n**Acceptable Performance:**\n- System startup: < 5 minutes\n- Index creation: < 2 minutes per 100MB document\n- Query response: < 30 seconds\n- Memory usage: < 8GB total\n\n**Optimal Performance:**\n- System startup: < 2 minutes\n- Index creation: < 1 minute per 100MB document\n- Query response: < 10 seconds\n- Memory usage: < 4GB total\n\n---\n\n## 10. 
Next Steps\n\n### 10.1 Getting Started\n\n1. **Upload Documents**: Create your first index with PDF documents\n2. **Explore Features**: Try different query types and models\n3. **Customize**: Adjust model settings and chunk sizes\n4. **Scale**: Add more documents and create multiple indexes\n\n### 10.2 Additional Resources\n\n- **Quick Start**: See `Documentation/quick_start.md`\n- **Docker Usage**: See `Documentation/docker_usage.md`\n- **System Architecture**: See `Documentation/architecture_overview.md`\n- **API Reference**: See `Documentation/api_reference.md`\n\n---\n\n**Congratulations! 🎉** Your RAG system is now ready to use. Visit http://localhost:3000 to start chatting with your documents. "
  },
  {
    "path": "Documentation/prompt_inventory.md",
    "content": "# 📜 Prompt Inventory (Ground-Truth)\n\n_All generation / verification prompts currently hard-coded in the codebase._  \n_Last updated: 2025-07-06_\n\n> Edit process: if you change a prompt in code, please **update this file** or, once we migrate to the central registry, delete the entry here.\n\n---\n\n## 1. Indexing / Context Enrichment\n\n| ID | File & Lines | Variable / Builder | Purpose |\n|----|--------------|--------------------|---------|\n| `overview_builder.default` | `rag_system/indexing/overview_builder.py` `12-21` | `DEFAULT_PROMPT` | Generate 1-paragraph document overview for search-time routing.\n| `contextualizer.system` | `rag_system/indexing/contextualizer.py` `11` | `SYSTEM_PROMPT` | System instruction: explain summarisation role.\n| `contextualizer.local_context` | same file `13-15` | `LOCAL_CONTEXT_PROMPT_TEMPLATE` | Human message – wraps neighbouring chunks.\n| `contextualizer.chunk` | same file `17-19` | `CHUNK_PROMPT_TEMPLATE` | Human message – shows the target chunk.\n| `graph_extractor.entities` | `rag_system/indexing/graph_extractor.py` `20-31` | `entity_prompt` | Ask LLM to list entities.\n| `graph_extractor.relationships` | same file `53-64` | `relationship_prompt` | Ask LLM to list relationships.\n\n## 2. Retrieval / Query Transformation\n\n| ID | File & Lines | Purpose |\n|----|--------------|---------|\n| `query_transformer.expand` | `rag_system/retrieval/query_transformer.py` `10-26` | Produce query rewrites (keywords, boolean). |\n| `hyde.hypothetical_doc` | same `115-122` | HyDE hypothetical document generator. |\n| `graph_query.translate` | same `124-140` | Translate user question to JSON KG query. |\n\n## 3. Pipeline Answer Synthesis\n\n| ID | File & Lines | Purpose |\n|----|--------------|---------|\n| `retrieval_pipeline.synth_final` | `rag_system/pipelines/retrieval_pipeline.py` `217-256` | Turn verified facts into answer (with directives 1-6). |\n\n## 4. 
Agent – Classical Loop\n\n| ID | File & Lines | Purpose |\n|----|--------------|---------|\n| `agent.loop.initial_thought` | `rag_system/agent/loop.py` `157-180` | First LLM call to think about query. |\n| `agent.loop.verify_path` | same `190-205` | Secondary thought loop. |\n| `agent.loop.compose_sub` | same `506-542` | Compose answer from sub-answers. |\n| `agent.loop.router` | same `648-660` | Decide which subsystem handles query. |\n\n## 5. Verifier\n\n| ID | File & Lines | Purpose |\n|----|--------------|---------|\n| `verifier.fact_check` | `rag_system/agent/verifier.py` `18-58` | Strict JSON-format grounding verifier. |\n\n## 6. Backend Router (Fast path)\n\n| ID | File & Lines | Purpose |\n|----|--------------|---------|\n| `backend.router` | `backend/server.py` `435-448` | Decide \"RAG vs direct LLM\" before heavy processing. |\n\n## 7. Miscellaneous\n\n| ID | File & Lines | Purpose |\n|----|--------------|---------|\n| `vision.placeholder` | `rag_system/utils/ollama_client.py` `169` | Dummy prompt for VLM colour check. |\n\n---\n\n### Missing / To-Do\n1. Verify whether **ReActAgent.PROMPT_TEMPLATE** captures every placeholder – some earlier lines may need explicit ID when we move to central registry.\n2. Search TS/JS code once the backend prompts are ported (currently none).\n\n---\n\n**Next step:** create `rag_system/prompts/registry.yaml` and start moving each prompt above into a key–value entry with identical IDs. Update callers gradually using the helper proposed earlier. "
  },
  {
    "path": "Documentation/quick_start.md",
    "content": "# ⚡ Quick Start Guide - RAG System\n\n_Get up and running in 5 minutes!_\n\n---\n\n## 🚀 Choose Your Deployment Method\n\n### Option 1: Docker Deployment (Production Ready) 🐳\n\nBest for: Production deployments, isolated environments, easy scaling\n\n### Option 2: Direct Development (Developer Friendly) 💻  \n\nBest for: Development, customization, debugging, faster iteration\n\n---\n\n## 🐳 Docker Deployment\n\n### Prerequisites\n- Docker Desktop installed and running\n- 8GB+ RAM available\n- Internet connection\n\n### Step 1: Clone and Setup\n\n```bash\n# Clone repository\ngit clone <your-repository-url>\ncd rag_system_old\n\n# Ensure Docker is running\ndocker version\n```\n\n### Step 2: Install Ollama Locally\n\n**Even with Docker, Ollama runs locally for better performance:**\n\n```bash\n# Install Ollama\ncurl -fsSL https://ollama.ai/install.sh | sh\n\n# Start Ollama (in one terminal)\nollama serve\n\n# Install models (in another terminal)\nollama pull qwen3:0.6b\nollama pull qwen3:8b\n```\n\n### Step 3: Start Docker Containers\n\n```bash\n# Start all containers\n./start-docker.sh\n\n# Or manually:\ndocker compose --env-file docker.env up --build -d\n```\n\n### Step 4: Verify Deployment\n\n```bash\n# Check container status\ndocker compose ps\n\n# Test endpoints\ncurl http://localhost:3000      # Frontend\ncurl http://localhost:8000/health  # Backend  \ncurl http://localhost:8001/models  # RAG API\n```\n\n### Step 5: Access Application\n\nOpen your browser to: **http://localhost:3000**\n\n---\n\n## 💻 Direct Development\n\n### Prerequisites\n- Python 3.8+\n- Node.js 16+ and npm\n- 8GB+ RAM available\n\n### Step 1: Clone and Install Dependencies\n\n```bash\n# Clone repository\ngit clone <your-repository-url>\ncd rag_system_old\n\n# Install Python dependencies\npip install -r requirements.txt\n\n# Install Node.js dependencies  \nnpm install\n```\n\n### Step 2: Install and Configure Ollama\n\n```bash\n# Install Ollama\ncurl -fsSL 
https://ollama.ai/install.sh | sh\n\n# Start Ollama (in one terminal)\nollama serve\n\n# Install models (in another terminal)\nollama pull qwen3:0.6b\nollama pull qwen3:8b\n```\n\n### Step 3: Start the System\n\n```bash\n# Start all components with one command\npython run_system.py\n```\n\n**Or start components manually in separate terminals:**\n\n```bash\n# Terminal 1: RAG API\npython -m rag_system.api_server\n\n# Terminal 2: Backend\ncd backend && python server.py\n\n# Terminal 3: Frontend\nnpm run dev\n```\n\n### Step 4: Verify Installation\n\n```bash\n# Check system health\npython system_health_check.py\n\n# Test endpoints\ncurl http://localhost:3000      # Frontend\ncurl http://localhost:8000/health  # Backend\ncurl http://localhost:8001/models  # RAG API\n```\n\n### Step 5: Access Application\n\nOpen your browser to: **http://localhost:3000**\n\n---\n\n## 🎯 First Use Guide\n\n### 1. Create a Chat Session\n- Click \"New Chat\" in the interface\n- Give your session a descriptive name\n\n### 2. Upload Documents\n- Click \"Create New Index\" button\n- Upload PDF files from your computer\n- Configure processing options:\n  - **Chunk Size**: 512 (recommended)\n  - **Embedding Model**: Qwen/Qwen3-Embedding-0.6B\n  - **Enable Enrichment**: Yes\n- Click \"Build Index\" and wait for processing\n\n### 3. 
Start Chatting\n- Select your built index\n- Ask questions about your documents:\n  - \"What is this document about?\"\n  - \"Summarize the key points\"\n  - \"What are the main findings?\"\n  - \"Compare the arguments in sections 3 and 5\"\n\n---\n\n## 🔧 Management Commands\n\n### Docker Commands\n\n```bash\n# Container management\n./start-docker.sh                    # Start all containers\n./start-docker.sh stop              # Stop all containers\n./start-docker.sh logs              # View logs\n./start-docker.sh status            # Check status\n\n# Manual Docker Compose\ndocker compose ps                    # Check status\ndocker compose logs -f              # Follow logs\ndocker compose down                 # Stop containers\ndocker compose up --build -d        # Rebuild and start\n```\n\n### Direct Development Commands\n\n```bash\n# System management\npython run_system.py               # Start all services\npython system_health_check.py      # Check system health\n\n# Individual components\npython -m rag_system.api_server    # RAG API only\ncd backend && python server.py     # Backend only\nnpm run dev                         # Frontend only\n\n# Stop: press Ctrl+C in the terminal running the services\n```\n\n---\n\n## 🆘 Quick Troubleshooting\n\n### Docker Issues\n\n**Containers not starting?**\n```bash\n# Check Docker daemon\ndocker version\n\n# Restart Docker Desktop and try again\n./start-docker.sh\n```\n\n**Port conflicts?**\n```bash\n# Check what's using ports\nlsof -i :3000 -i :8000 -i :8001\n\n# If this project's containers hold the ports, stop them\n./start-docker.sh stop\n```\n\n### Direct Development Issues\n\n**Import errors?**\n```bash\n# Check Python installation\npython --version  # Should be 3.8+\n\n# Reinstall dependencies\npip install -r requirements.txt --force-reinstall\n```\n\n**Node.js errors?**\n```bash\n# Check Node version\nnode --version    # Should be 16+\n\n# Reinstall dependencies\nrm -rf node_modules package-lock.json\nnpm install\n```\n\n### Common 
Issues\n\n**Ollama not responding?**\n```bash\n# Check if Ollama is running\ncurl http://localhost:11434/api/tags\n\n# Restart Ollama\npkill ollama\nollama serve\n```\n\n**Out of memory?**\n```bash\n# Check memory usage\ndocker stats  # For Docker\nhtop          # For direct development\n\n# Recommended: 16GB+ RAM for optimal performance\n```\n\n---\n\n## 📊 System Verification\n\nRun this comprehensive check:\n\n```bash\n# Check all endpoints\ncurl -f http://localhost:3000 && echo \"✅ Frontend OK\"\ncurl -f http://localhost:8000/health && echo \"✅ Backend OK\"  \ncurl -f http://localhost:8001/models && echo \"✅ RAG API OK\"\ncurl -f http://localhost:11434/api/tags && echo \"✅ Ollama OK\"\n\n# For Docker: Check containers\ndocker compose ps\n```\n\n---\n\n## 🎉 Success!\n\nIf you see:\n- ✅ All services responding\n- ✅ Frontend accessible at http://localhost:3000  \n- ✅ No error messages\n\nYou're ready to start using LocalGPT!\n\n### What's Next?\n\n1. **📚 Upload Documents**: Add your PDF files to create indexes\n2. **💬 Start Chatting**: Ask questions about your documents\n3. **🔧 Customize**: Explore different models and settings\n4. **📖 Learn More**: Check the full documentation below\n\n### 📁 Key Files\n\n```\nrag-system/\n├── 🐳 start-docker.sh           # Docker deployment script\n├── 🏃 run_system.py             # Direct development launcher\n├── 🩺 system_health_check.py    # System verification\n├── 📋 requirements.txt          # Python dependencies\n├── 📦 package.json              # Node.js dependencies\n├── 📁 Documentation/            # Complete documentation\n└── 📁 rag_system/              # Core system code\n```\n\n### 📖 Additional Resources\n\n- **🏗️ Architecture**: See `Documentation/architecture_overview.md`\n- **🔧 Configuration**: See `Documentation/system_overview.md`  \n- **🚀 Deployment**: See `Documentation/deployment_guide.md`\n- **🐛 Troubleshooting**: See `DOCKER_TROUBLESHOOTING.md`\n\n---\n\n**Happy RAG-ing! 
🚀** \n\n---\n\n## 🛠️ Indexing Scripts\n\nThe repository includes several convenient scripts for document indexing:\n\n### Simple Index Creation Script\n\nFor quick document indexing without the UI:\n\n```bash\n# Basic usage\n./simple_create_index.sh \"Index Name\" \"document.pdf\"\n\n# Multiple documents\n./simple_create_index.sh \"Research Papers\" \"paper1.pdf\" \"paper2.pdf\" \"notes.txt\"\n\n# Using wildcards\n./simple_create_index.sh \"Invoice Collection\" ./invoices/*.pdf\n```\n\n**Supported file types**: PDF, TXT, DOCX, MD\n\n### Batch Indexing Script\n\nFor processing large document collections:\n\n```bash\n# Using the Python batch indexing script\npython demo_batch_indexing.py\n\n# Or using the direct indexing script\npython create_index_script.py\n```\n\nThese scripts automatically:\n- ✅ Check prerequisites (Ollama running, Python dependencies)\n- ✅ Validate document formats\n- ✅ Create database entries\n- ✅ Process documents with the RAG pipeline\n- ✅ Generate searchable indexes\n\n--- "
  },
  {
    "path": "Documentation/retrieval_pipeline.md",
    "content": "# 📥 Retrieval Pipeline\n\n_Maps to `rag_system/pipelines/retrieval_pipeline.py` and helpers in `retrieval/`, `rerankers/`._\n\n## Role\nGiven a **user query** and one or more indexed tables, retrieve the most relevant text chunks and synthesise an answer.\n\n## Sub-components\n| Stage | Module | Key Classes / Fns | Notes |\n|-------|--------|-------------------|-------|\n| Query Pre-processing | `retrieval/query_transformer.py` | `QueryTransformer`, `HyDEGenerator`, `GraphQueryTranslator` | Expands, rewrites, or translates the raw query. |\n| Retrieval | `retrieval/retrievers.py` | `BM25Retriever`, `DenseRetriever`, `HybridRetriever` | Abstract over LanceDB vector + FTS search. |\n| Reranking | `rerankers/reranker.py` | `ColBERTSmall`, fallback `bge-reranker` | Optionally improves result ordering. |\n| Synthesis | `pipelines/retrieval_pipeline.py` | `_synthesize_final_answer()` | Calls LLM with evidence snippets. |\n\n## End-to-End Flow\n\n```mermaid\nflowchart LR\n    Q[\"User Query\"] --> XT[\"Query Transformer\"]\n    XT -->|variants| RETRIEVE\n    subgraph Retrieval\n        RET_BM25[BM25] --> MERGE\n        RET_DENSE[Dense Vector] --> MERGE\n        style RET_BM25 fill:#444,stroke:#ccc,color:#fff\n        style RET_DENSE fill:#444,stroke:#ccc,color:#fff\n    end\n    MERGE --> RERANK\n    RERANK --> K[[\"Top-K Chunks\"]]\n    K --> SYNTH[\"Answer Synthesiser\\n(LLM)\"]\n    SYNTH --> A[\"Answer + Sources\"]\n```\n\n### Narrative\n1. **Query Transformer** may expand the query (keyword list, HyDE doc, KG translation) depending on `searchType`.\n2. **Retrievers** execute BM25 and/or dense similarity against LanceDB.  Combination controlled by `retrievalMode` and `denseWeight`.\n3. **Reranker** (if `aiRerank=true` or hybrid search) scores snippets; top `rerankerTopK` chosen.\n4. 
**Synthesiser** streams an LLM completion using the prompt described in `prompt_inventory.md` (`retrieval_pipeline.synth_final`).\n\n## Configuration Flags (passed from UI → backend)\n| Flag | Default | Effect |\n|------|---------|--------|\n| `searchType` | `fts` | UI label (FTS / Dense / Hybrid). |\n| `retrievalK` | 10 | Initial candidate count per retriever. |\n| `contextWindowSize` | 5 | How many adjacent chunks to merge (late-chunk). |\n| `rerankerTopK` | 20 | How many docs to pass into AI reranker. |\n| `denseWeight` | 0.5 | Linear mix weight for the dense score when `searchType` is `hybrid`. |\n| `aiRerank` | bool | Toggle reranker. |\n| `verify` | bool | If true, pass answer to **Verifier** component. |\n\n## Interfaces\n* Reads from **LanceDB** tables `text_pages_<index>`.\n* Calls the **Ollama** generation model specified in `PIPELINE_CONFIGS`.\n* Exposes the `RetrievalPipeline.answer_stream()` iterator consumed by the SSE API.\n\n## Extension Points\n* Plug in a new retriever by subclassing `BaseRetriever` and registering it in `retrievers.py`.\n* Swap the reranker model via `EXTERNAL_MODELS['reranker_model']`.\n* The answer prompt can be overridden by passing `prompt_override` to `_synthesize_final_answer()` (not yet surfaced in UI).\n\n## Detailed Implementation Analysis\n\n### Core Architecture Pattern\nThe `RetrievalPipeline` uses **lazy initialization** for all components to avoid heavy memory usage during startup. Each component (embedder, retrievers, rerankers) is loaded only when first accessed via private `_get_*()` methods.\n\n```python\ndef _get_text_embedder(self):\n    if self.text_embedder is None:\n        self.text_embedder = select_embedder(\n            self.config.get(\"embedding_model_name\", \"Qwen/Qwen3-Embedding-0.6B\"),\n            self.ollama_config.get(\"host\")\n        )\n    return self.text_embedder\n```\n\n### Thread Safety Implementation\n**Critical Issue**: The ColBERT reranker and model loading are not thread-safe. 
The system uses multiple locks:\n\n```python\nfrom threading import Lock\n\n# Global locks to prevent race conditions\n_rerank_lock: Lock = Lock()           # Protects .rank() calls\n_ai_reranker_init_lock: Lock = Lock() # Prevents concurrent model loading\n_sentence_pruner_lock: Lock = Lock()  # Serializes Provence model init\n```\n\nWhen multiple queries run in parallel, only one thread at a time can initialize heavy models or perform reranking operations.\n\n### Retrieval Strategy Deep-Dive\n\n#### 1. Multi-Vector Dense Retrieval (`_get_dense_retriever()`)\n```python\nself.dense_retriever = MultiVectorRetriever(\n    db_manager,           # LanceDB connection\n    text_embedder,        # Qwen3-Embedding embedder\n    vision_model=None,    # Optional multimodal\n    fusion_config={}      # Score combination rules\n)\n```\n\n**Process**:\n1. Query → embedding vector (1024D for Qwen3-Embedding-0.6B)\n2. LanceDB ANN search using IVF-PQ index\n3. Cosine similarity scoring\n4. Returns top-K with metadata\n\n#### 2. BM25 Full-Text Search (`_get_bm25_retriever()`)\n```sql\n-- Uses SQLite FTS5 under the hood.\n-- FTS5's bm25() returns lower values for better matches, so ascending order ranks best-first.\nSELECT chunk_id, text, bm25(fts_table) AS score\nFROM fts_table\nWHERE fts_table MATCH ?\nORDER BY bm25(fts_table)\nLIMIT ?\n```\n\n**Token Processing**:\n- Stemming via Porter algorithm\n- Stop-word removal\n- N-gram tokenization (configurable)\n\n#### 3. Hybrid Score Fusion\nWhen both retrievers are enabled:\n```python\nfinal_score = (1 - dense_weight) * bm25_score + dense_weight * dense_score\n```\nThe default `dense_weight = 0.7` (raised from 0.5) favors semantic over lexical matching.\n\n### Late-Chunk Merging Algorithm\n\n**Problem**: Small chunks lose context; large chunks dilute relevance.  
\n**Solution**: Retrieve small chunks, then expand with neighbors.\n\n```python\ndef _get_surrounding_chunks_lancedb(self, chunk, window_size):\n    start_index = max(0, chunk_index - window_size)\n    end_index = chunk_index + window_size\n    \n    sql_filter = f\"document_id = '{document_id}' AND chunk_index >= {start_index} AND chunk_index <= {end_index}\"\n    results = tbl.search().where(sql_filter).to_list()\n    \n    # Sort by chunk_index to maintain document order\n    return sorted(results, key=lambda x: x.get(\"chunk_index\", 0))\n```\n\n**Benefits**:\n- Maintains granular search precision\n- Provides richer context for answer generation\n- Configurable window size (default: 5 chunks = ~2500 tokens)\n\n### AI Reranker Implementation\n\n#### ColBERT Strategy (via rerankers-lib)\n```python\nfrom rerankers import Reranker\nself.ai_reranker = Reranker(\"answerdotai/answerai-colbert-small-v1\", model_type=\"colbert\")\n\n# Usage\nscores = reranker.rank(query, [doc.text for doc in candidates])\n```\n\n**ColBERT Architecture**:\n- **Query encoding**: Each token → 128D vector\n- **Document encoding**: Each token → 128D vector  \n- **Interaction**: MaxSim between all query-doc token pairs\n- **Advantage**: Fine-grained token-level matching\n\n#### Fallback: BGE Cross-Encoder\n```python\n# When ColBERT fails/unavailable\nfrom sentence_transformers import CrossEncoder\nmodel = CrossEncoder('BAAI/bge-reranker-base')\nscores = model.predict([(query, doc.text) for doc in candidates])\n```\n\n### Answer Synthesis Pipeline\n\n#### Prompt Engineering Pattern\n```python\ndef _synthesize_final_answer(self, query: str, facts: str, *, event_callback=None):\n    prompt = f\"\"\"\nYou are an AI assistant specialised in answering questions from retrieved context.\n\nContext you receive\n• VERIFIED FACTS – text snippets retrieved from the user's documents.\n• ORIGINAL QUESTION – the user's actual query.\n\nInstructions\n1. 
Evaluate each snippet for relevance to the ORIGINAL QUESTION\n2. Synthesise an answer **using only information from relevant snippets**\n3. If snippets contradict, mention the contradiction explicitly\n4. If insufficient information: \"I could not find that information in the provided documents.\"\n5. Provide thorough, well-structured answer with relevant numbers/names\n6. Do **not** introduce external knowledge\n\n–––––  Retrieved Snippets  –––––\n{facts}\n––––––––––––––––––––––––––––––\n\nORIGINAL QUESTION: \"{query}\"\n\"\"\"\n\n    response = self.llm_client.complete_stream(\n        prompt=prompt,\n        model=self.ollama_config[\"generation_model\"]  # qwen3:8b\n    )\n    \n    for chunk in response:\n        if event_callback:\n            event_callback({\"type\": \"answer_chunk\", \"content\": chunk})\n        yield chunk\n```\n\n**Advanced Features**:\n- **Source Attribution**: Automatic citation generation\n- **Confidence Scoring**: Based on retrieval scores and snippet relevance\n- **Answer Verification**: Optional grounding check via Verifier component\n\n### Query Processing and Transformation\n\n#### Query Decomposition\n```python\nclass QueryDecomposer:\n    def decompose_query(self, query: str) -> List[str]:\n        \"\"\"Break complex queries into simpler sub-queries.\"\"\"\n        decomposition_prompt = f\"\"\"\n        Break down this complex question into 2-4 simpler sub-questions that would help answer the original question.\n        \n        Original question: {query}\n        \n        Sub-questions:\n        1.\n        2.\n        3.\n        4.\n        \"\"\"\n        \n        response = self.llm_client.complete(\n            prompt=decomposition_prompt,\n            model=self.enrichment_model  # qwen3:0.6b for speed\n        )\n        \n        # Parse response into list of sub-queries\n        return self._parse_subqueries(response)\n```\n\n#### HyDE (Hypothetical Document Embeddings)\n```python\nclass HyDEGenerator:\n    def 
generate_hypothetical_doc(self, query: str) -> str:\n        \"\"\"Generate hypothetical document that would answer the query.\"\"\"\n        hyde_prompt = f\"\"\"\n        Generate a hypothetical document passage that would perfectly answer this question:\n        \n        Question: {query}\n        \n        Hypothetical passage:\n        \"\"\"\n        \n        response = self.llm_client.complete(\n            prompt=hyde_prompt,\n            model=self.enrichment_model\n        )\n        \n        return response.strip()\n```\n\n### Caching and Performance Optimization\n\n#### Semantic Query Caching\n```python\nclass RetrievalPipeline:\n    def __init__(self, config, ollama_client, ollama_config):\n        # TTL cache for embeddings and results\n        self.query_cache = TTLCache(maxsize=100, ttl=300)  # 5 min TTL\n        self.embedding_cache = LRUCache(maxsize=500)\n        self.semantic_threshold = 0.98  # Similarity threshold for cache hits\n    \n    def get_cached_result(self, query: str, session_id: str = None) -> Optional[Dict]:\n        \"\"\"Check for semantically similar cached queries.\"\"\"\n        query_embedding = self._get_text_embedder().create_embeddings([query])[0]\n        \n        for cached_query, cached_data in self.query_cache.items():\n            cached_embedding = cached_data[\"embedding\"]\n            similarity = cosine_similarity([query_embedding], [cached_embedding])[0][0]\n            \n            if similarity > self.semantic_threshold:\n                # Check session scope if configured\n                if self.cache_scope == \"session\" and cached_data.get(\"session_id\") != session_id:\n                    continue\n                \n                print(f\"🎯 Cache hit: {similarity:.3f} similarity\")\n                return cached_data[\"result\"]\n        \n        return None\n```\n\n#### Batch Processing Optimizations\n```python\ndef process_query_batch(self, queries: List[str]) -> List[Dict]:\n    \"\"\"Process 
multiple queries efficiently.\"\"\"\n    # Batch embed all queries\n    query_embeddings = self._get_text_embedder().create_embeddings(queries)\n    \n    # Batch search\n    results = []\n    for i, query in enumerate(queries):\n        embedding = query_embeddings[i]\n        \n        # Search with pre-computed embedding\n        dense_results = self._search_dense_with_embedding(embedding)\n        bm25_results = self._search_bm25(query)\n        \n        # Combine and rerank\n        combined = self._combine_results(dense_results, bm25_results)\n        reranked = self._rerank_batch([query], [combined])[0]\n        \n        results.append(reranked)\n    \n    return results\n```\n\n### Advanced Search Features\n\n#### Conversational Context Integration\n```python\ndef answer_with_history(self, query: str, conversation_history: List[Dict], **kwargs):\n    \"\"\"Answer query with conversation context.\"\"\"\n    # Build conversational context\n    context_prompt = self._build_conversation_context(conversation_history)\n    \n    # Expand query with context\n    expanded_query = f\"{context_prompt}\\n\\nCurrent question: {query}\"\n    \n    # Process with expanded context\n    return self.answer_stream(expanded_query, **kwargs)\n\ndef _build_conversation_context(self, history: List[Dict]) -> str:\n    \"\"\"Build context from conversation history.\"\"\"\n    context_parts = []\n    \n    for turn in history[-3:]:  # Last 3 turns for context\n        if turn.get(\"role\") == \"user\":\n            context_parts.append(f\"Previous question: {turn['content']}\")\n        elif turn.get(\"role\") == \"assistant\":\n            # Extract key points from previous answers\n            context_parts.append(f\"Previous context: {turn['content'][:200]}...\")\n    \n    return \"\\n\".join(context_parts)\n```\n\n#### Multi-Index Search\n```python\ndef search_multiple_indexes(self, query: str, index_ids: List[str], **kwargs):\n    \"\"\"Search across multiple document 
indexes.\"\"\"\n    all_results = []\n    \n    for index_id in index_ids:\n        table_name = f\"text_pages_{index_id}\"\n        \n        try:\n            # Search individual index\n            index_results = self._search_single_index(query, table_name, **kwargs)\n            \n            # Add index metadata\n            for result in index_results:\n                result[\"source_index\"] = index_id\n            \n            all_results.extend(index_results)\n            \n        except Exception as e:\n            print(f\"⚠️ Error searching index {index_id}: {e}\")\n            continue\n    \n    # Global reranking across all indexes\n    if len(all_results) > kwargs.get(\"retrieval_k\", 20):\n        all_results = self._rerank_global(query, all_results, **kwargs)\n    \n    return all_results\n```\n\n### Error Handling and Resilience\n\n#### Graceful Degradation\n```python\ndef answer_stream(self, query: str, **kwargs):\n    \"\"\"Main answer method with comprehensive error handling.\"\"\"\n    try:\n        # Try full pipeline\n        return self._answer_stream_full_pipeline(query, **kwargs)\n        \n    except Exception as e:\n        print(f\"⚠️ Full pipeline failed: {e}\")\n        \n        try:\n            # Fallback: Dense-only search\n            kwargs[\"search_type\"] = \"dense\"\n            kwargs[\"ai_rerank\"] = False\n            return self._answer_stream_fallback(query, **kwargs)\n            \n        except Exception as e2:\n            print(f\"⚠️ Fallback failed: {e2}\")\n            \n            # Last resort: Direct LLM answer\n            return self._direct_llm_answer(query)\n\ndef _direct_llm_answer(self, query: str):\n    \"\"\"Direct LLM answer as last resort.\"\"\"\n    prompt = f\"\"\"\n    The document retrieval system is temporarily unavailable. 
\n    Please provide a helpful response acknowledging this limitation.\n    \n    User question: {query}\n    \n    Response:\n    \"\"\"\n    \n    response = self.llm_client.complete_stream(\n        prompt=prompt,\n        model=self.ollama_config[\"generation_model\"]\n    )\n    \n    yield \"⚠️ Document search unavailable. Providing general response:\\n\\n\"\n    \n    for chunk in response:\n        yield chunk\n```\n\n#### Recovery Mechanisms\n```python\ndef recover_from_embedding_failure(self, query: str, **kwargs):\n    \"\"\"Recover when embedding model fails.\"\"\"\n    print(\"🔄 Attempting embedding model recovery...\")\n    \n    # Try to reinitialize embedder\n    try:\n        self.text_embedder = None  # Clear failed instance\n        embedder = self._get_text_embedder()  # Reinitialize\n        \n        # Test with simple query\n        test_embedding = embedder.create_embeddings([\"test\"])\n        \n        if test_embedding is not None:\n            print(\"✅ Embedding model recovered\")\n            return True\n            \n    except Exception as e:\n        print(f\"❌ Recovery failed: {e}\")\n    \n    # Fallback to BM25-only search\n    kwargs[\"search_type\"] = \"bm25\"\n    kwargs[\"ai_rerank\"] = False\n    print(\"🔄 Falling back to keyword search only\")\n    \n    return False\n```\n\n### Performance Monitoring and Metrics\n\n#### Query Performance Tracking\n```python\nclass PerformanceTracker:\n    def __init__(self):\n        self.metrics = {\n            \"query_count\": 0,\n            \"avg_response_time\": 0,\n            \"cache_hit_rate\": 0,\n            \"error_rate\": 0,\n            \"embedding_time\": 0,\n            \"retrieval_time\": 0,\n            \"reranking_time\": 0,\n            \"synthesis_time\": 0\n        }\n    \n    @contextmanager\n    def track_query(self, query: str):\n        \"\"\"Context manager for tracking query performance.\"\"\"\n        start_time = time.time()\n        \n        try:\n        
    yield\n            \n            # Success metrics (query_count is incremented once, in the finally block)\n            duration = time.time() - start_time\n            n = self.metrics[\"query_count\"]\n            self.metrics[\"avg_response_time\"] = (\n                (self.metrics[\"avg_response_time\"] * n + duration) / (n + 1)\n            )\n            # A successful query lowers the running error rate\n            self.metrics[\"error_rate\"] = self.metrics[\"error_rate\"] * n / (n + 1)\n            \n        except Exception:\n            # Error metrics: fold one more failure into the running rate\n            n = self.metrics[\"query_count\"]\n            self.metrics[\"error_rate\"] = (\n                self.metrics[\"error_rate\"] * n + 1\n            ) / (n + 1)\n            \n            raise\n        \n        finally:\n            # Count every query exactly once, whether it succeeded or failed\n            self.metrics[\"query_count\"] += 1\n```\n\n#### Resource Usage Monitoring\n```python\ndef monitor_memory_usage(self):\n    \"\"\"Monitor memory usage of pipeline components.\"\"\"\n    import psutil\n    import gc\n    \n    process = psutil.Process()\n    memory_info = process.memory_info()\n    \n    print(f\"Memory Usage: {memory_info.rss / 1024 / 1024:.1f} MB\")\n    \n    # Component-specific monitoring\n    if hasattr(self, 'text_embedder') and self.text_embedder:\n        print(f\"Embedder loaded: {type(self.text_embedder).__name__}\")\n    \n    if hasattr(self, 'ai_reranker') and self.ai_reranker:\n        print(f\"Reranker loaded: {type(self.ai_reranker).__name__}\")\n    \n    # Suggest cleanup if memory usage is high\n    if memory_info.rss > 8 * 1024 * 1024 * 1024:  # 8GB\n        print(\"⚠️ High memory usage detected - consider cleanup\")\n        gc.collect()\n```\n\n---\n\n## Configuration Reference\n\n### Default Pipeline Configuration\n```python\nRETRIEVAL_CONFIG = {\n    \"retriever\": \"multivector\",\n    \"search_type\": \"hybrid\",\n    \"retrieval_k\": 20,\n    \"reranker_top_k\": 10,\n    \"dense_weight\": 0.7,\n    \"late_chunking\": {\n        \"enabled\": True,\n        \"window_size\": 5\n    },\n    \"ai_rerank\": True,\n    
\"verify_answers\": False,\n    \"cache_enabled\": True,\n    \"cache_ttl\": 300,\n    \"semantic_cache_threshold\": 0.98\n}\n```\n\n### Model Configuration\n```python\nMODEL_CONFIG = {\n    \"embedding_model\": \"Qwen/Qwen3-Embedding-0.6B\",\n    \"generation_model\": \"qwen3:8b\",\n    \"enrichment_model\": \"qwen3:0.6b\",\n    \"reranker_model\": \"answerdotai/answerai-colbert-small-v1\",\n    \"fallback_reranker\": \"BAAI/bge-reranker-base\"\n}\n```\n\n### Performance Tuning\n```python\nPERFORMANCE_CONFIG = {\n    \"batch_sizes\": {\n        \"embedding\": 32,\n        \"reranking\": 16,\n        \"synthesis\": 1\n    },\n    \"timeouts\": {\n        \"embedding\": 30,\n        \"retrieval\": 60,\n        \"reranking\": 30,\n        \"synthesis\": 120\n    },\n    \"memory_limits\": {\n        \"max_cache_size\": 1000,\n        \"max_results_per_query\": 100,\n        \"chunk_size_limit\": 2048\n    }\n}\n```\n\n## Extension Examples\n\n### Custom Retriever Implementation\n```python\nclass CustomRetriever(BaseRetriever):\n    def search(self, query: str, k: int = 10) -> List[Dict]:\n        \"\"\"Implement custom search logic.\"\"\"\n        # Your custom retrieval implementation\n        pass\n    \n    def get_embeddings(self, texts: List[str]) -> np.ndarray:\n        \"\"\"Generate embeddings for custom retrieval.\"\"\"\n        # Your custom embedding logic\n        pass\n```\n\n### Custom Reranker Implementation\n```python\nclass CustomReranker(BaseReranker):\n    def rank(self, query: str, documents: List[Dict]) -> List[Dict]:\n        \"\"\"Implement custom reranking logic.\"\"\"\n        # Your custom reranking implementation\n        pass\n```\n\n### Custom Query Transformer\n```python\nclass CustomQueryTransformer:\n    def transform(self, query: str, context: Dict = None) -> str:\n        \"\"\"Transform query based on context.\"\"\"\n        # Your custom query transformation logic\n        pass\n``` "
  },
  {
    "path": "Documentation/system_overview.md",
    "content": "# 🏗️ RAG System - Complete System Overview\n\n_Last updated: 2025-01-09_\n\nThis document provides a comprehensive overview of the Advanced Retrieval-Augmented Generation (RAG) System, covering its architecture, components, data flow, and operational characteristics.\n\n---\n\n## 1. System Architecture\n\n### 1.1 High-Level Architecture\n\nThe RAG system implements a sophisticated 4-tier microservices architecture:\n\n```mermaid\ngraph TB\n    subgraph \"Client Layer\"\n        Browser[👤 User Browser]\n        UI[Next.js Frontend<br/>React/TypeScript]\n        Browser --> UI\n    end\n    \n    subgraph \"API Gateway Layer\"\n        Backend[Backend Server<br/>Python HTTP Server<br/>Port 8000]\n        UI -->|REST API| Backend\n    end\n    \n    subgraph \"Processing Layer\"\n        RAG[RAG API Server<br/>Document Processing<br/>Port 8001]\n        Backend -->|Internal API| RAG\n    end\n    \n    subgraph \"LLM Service Layer\"\n        Ollama[Ollama Server<br/>LLM Inference<br/>Port 11434]\n        RAG -->|Model Calls| Ollama\n    end\n    \n    subgraph \"Storage Layer\"\n        SQLite[(SQLite Database<br/>Sessions & Metadata)]\n        LanceDB[(LanceDB<br/>Vector Embeddings)]\n        FileSystem[File System<br/>Documents & Indexes]\n        \n        Backend --> SQLite\n        RAG --> LanceDB\n        RAG --> FileSystem\n    end\n```\n\n### 1.2 Component Breakdown\n\n| Component | Technology | Port | Purpose |\n|-----------|------------|------|---------|\n| **Frontend** | Next.js 15, React 19, TypeScript | 3000 | User interface, chat interactions |\n| **Backend** | Python 3.11, HTTP Server | 8000 | API gateway, session management, routing |\n| **RAG API** | Python 3.11, Advanced NLP | 8001 | Document processing, retrieval, generation |\n| **Ollama** | Go-based LLM server | 11434 | Local LLM inference (embedding, generation) |\n| **SQLite** | Embedded database | - | Sessions, messages, index metadata |\n| **LanceDB** | Vector database | - | 
Document embeddings, similarity search |\n\n---\n\n## 2. Core Functionality\n\n### 2.1 Intelligent Dual-Layer Routing\n\nThe system's key innovation is its **dual-layer routing architecture**, which optimizes both speed and intelligence:\n\n#### **Layer 1: Speed Optimization Routing**\n- **Location**: `backend/server.py`\n- **Purpose**: Send simple queries to the Direct LLM (~1.3s) and complex queries to the RAG Pipeline (~20s)\n- **Decision Logic**: Pattern matching, keyword detection, query complexity analysis\n\n```text\n# Example routing decisions\n\"Hello!\" → Direct LLM (greeting pattern)\n\"What does the document say about pricing?\" → RAG Pipeline (document keyword)\n\"What's 2+2?\" → Direct LLM (simple + short)\n\"Summarize the key findings from the report\" → RAG Pipeline (complex + indicators)\n```\n\n#### **Layer 2: Intelligence Optimization Routing**\n- **Location**: `rag_system/agent/loop.py`\n- **Purpose**: Within the RAG pipeline, route each query to the best-suited processing method\n- **Methods**: \n  - `direct_answer`: General knowledge queries\n  - `rag_query`: Document-specific queries requiring retrieval\n  - `graph_query`: Entity relationship queries (future feature)\n\n### 2.2 Document Processing Pipeline\n\n#### **Indexing Process**\n1. **Document Upload**: PDF files uploaded via web interface\n2. **Text Extraction**: Docling library extracts text with layout preservation\n3. **Chunking**: Intelligent chunking with configurable strategies (DocLing, Late Chunking, Standard)\n4. **Embedding**: Text converted to vector embeddings using Qwen models\n5. 
**Synthesis**: LLM generates final answer using retrieved context\n\n### 2.3 Advanced Features\n\n#### **Query Decomposition**\n- Complex queries automatically broken into sub-queries\n- Parallel processing of sub-queries for efficiency\n- Intelligent composition of final answers\n\n#### **Contextual Enrichment**\n- Conversation history integration\n- Context-aware query expansion\n- Session-based memory management\n\n#### **Verification System**\n- Answer verification against source documents\n- Confidence scoring and grounding checks\n- Source attribution and citation\n\n---\n\n## 3. Data Architecture\n\n### 3.1 Storage Systems\n\n#### **SQLite Database** (`backend/chat_data.db`)\n```sql\n-- Core tables\nsessions          -- Chat sessions with metadata\nmessages          -- Individual messages and responses\nindexes           -- Document index metadata\nsession_indexes   -- Links sessions to their indexes\n```\n\n#### **LanceDB Vector Store** (`./lancedb/`)\n```\ntables/\n├── text_pages_[uuid]     -- Document text embeddings\n├── image_pages_[uuid]    -- Image embeddings (future)\n└── metadata_[uuid]       -- Document metadata\n```\n\n#### **File System** (`./index_store/`)\n```\nindex_store/\n├── overviews/           -- Document summaries for routing\n├── bm25/               -- BM25 keyword indexes\n└── graph/              -- Knowledge graph data\n```\n\n### 3.2 Data Flow\n\n1. **Document Upload** → File System (`shared_uploads/`)\n2. **Processing** → Embeddings stored in LanceDB\n3. **Metadata** → Index info stored in SQLite\n4. **Query** → Search LanceDB + SQLite coordination\n5. **Response** → Message history stored in SQLite\n\n---\n\n## 4. 
Model Architecture\n\n### 4.1 Configurable Model Pipeline\n\nThe system supports multiple embedding and generation models with automatic switching:\n\n#### **Current Model Configuration**\n```python\nEXTERNAL_MODELS = {\n    \"embedding_model\": \"Qwen/Qwen3-Embedding-0.6B\",  # 1024D\n    \"reranker_model\": \"answerdotai/answerai-colbert-small-v1\",  # ColBERT reranker\n    \"vision_model\": \"Qwen/Qwen-VL-Chat\",  # Vision model for multimodal\n    \"fallback_reranker\": \"BAAI/bge-reranker-base\",  # Backup reranker\n}\n\nOLLAMA_CONFIG = {\n    \"generation_model\": \"qwen3:8b\",  # High-quality generation\n    \"enrichment_model\": \"qwen3:0.6b\",  # Fast enrichment/routing\n    \"host\": \"http://localhost:11434\"\n}\n```\n\n#### **Model Switching**\n- **Per-Session**: Each chat session can use different embedding models\n- **Automatic**: System automatically switches models based on index metadata\n- **Dynamic**: Models loaded just-in-time to optimize memory usage\n\n### 4.2 Supported Models\n\n#### **Embedding Models**\n- `Qwen/Qwen3-Embedding-0.6B` (1024D) - Default, fast and high-quality\n\n#### **Generation Models** (via Ollama)\n- `qwen3:8b` - Primary generation model (high quality)\n- `qwen3:0.6b` - Fast enrichment and routing model\n\n#### **Reranking Models**\n- `answerdotai/answerai-colbert-small-v1` - Primary ColBERT reranker\n- `BAAI/bge-reranker-base` - Fallback cross-encoder reranker\n\n#### **Vision Models** (Multimodal)\n- `Qwen/Qwen-VL-Chat` - Vision-language model for image processing\n\n---\n\n## 5. 
Pipeline Configurations\n\n### 5.1 Default Production Pipeline\n\n```python\nPIPELINE_CONFIGS = {\n    \"default\": {\n        \"description\": \"Production-ready pipeline with hybrid search, AI reranking, and verification\",\n        \"storage\": {\n            \"lancedb_uri\": \"./lancedb\",\n            \"text_table_name\": \"text_pages_v3\", \n            \"bm25_path\": \"./index_store/bm25\",\n            \"graph_path\": \"./index_store/graph/knowledge_graph.gml\"\n        },\n        \"retrieval\": {\n            \"retriever\": \"multivector\",\n            \"search_type\": \"hybrid\",\n            \"late_chunking\": {\n                \"enabled\": True,\n                \"table_suffix\": \"_lc_v3\"\n            },\n            \"dense\": { \n                \"enabled\": True,\n                \"weight\": 0.7\n            },\n            \"bm25\": { \n                \"enabled\": True,\n                \"index_name\": \"rag_bm25_index\"\n            }\n        },\n        \"embedding_model_name\": \"Qwen/Qwen3-Embedding-0.6B\",\n        \"reranker\": {\n            \"enabled\": True,\n            \"model_name\": \"answerdotai/answerai-colbert-small-v1\",\n            \"top_k\": 20\n        }\n    }\n}\n```\n\n### 5.2 Processing Options\n\n#### **Chunking Strategies**\n- **Standard**: Fixed-size chunks with overlap\n- **DocLing**: Structure-aware chunking using DocLing library\n- **Late Chunking**: Small chunks expanded at query time\n\n#### **Enrichment Options**\n- **Contextual Enrichment**: AI-generated chunk summaries\n- **Overview Building**: Document-level summaries for routing\n- **Graph Extraction**: Entity and relationship extraction\n\n---\n\n## 6. 
Performance Characteristics\n\n### 6.1 Response Times\n\n| Operation | Time Range | Notes |\n|-----------|------------|-------|\n| Simple Chat | 1-3 seconds | Direct LLM, no retrieval |\n| Document Query | 5-15 seconds | Includes retrieval and reranking |\n| Complex Analysis | 15-30 seconds | Multi-step reasoning |\n| Document Indexing | 2-5 min/100MB | Depends on enrichment settings |\n\n### 6.2 Memory Usage\n\n| Component | Memory Usage | Notes |\n|-----------|--------------|-------|\n| Embedding Model | 1-2GB | Qwen3-Embedding-0.6B |\n| Generation Model | 8-16GB | qwen3:8b |\n| Reranker Model | 500MB-1GB | ColBERT reranker |\n| Database Cache | 500MB-2GB | LanceDB and SQLite |\n\n### 6.3 Scalability\n\n- **Concurrent Users**: 5-10 users with 16GB RAM\n- **Document Capacity**: 10,000+ documents per index\n- **Query Throughput**: 10-20 queries/minute per instance\n- **Storage**: Approximately 1MB per 100 pages indexed\n\n---\n\n## 7. Security & Privacy\n\n### 7.1 Data Privacy\n\n- **Local Processing**: All AI models run locally via Ollama\n- **No External Calls**: No data sent to external APIs\n- **Document Isolation**: Documents stored locally with session-based access\n- **User Isolation**: Each session maintains separate context\n\n---\n\n## 8. 
Configuration & Customization\n\n### 8.1 Model Configuration\nModels can be configured in `rag_system/main.py`:\n\n```python\n# Embedding model configuration\nEXTERNAL_MODELS = {\n    \"embedding_model\": \"Qwen/Qwen3-Embedding-0.6B\",  # Your preferred model\n    \"reranker_model\": \"answerdotai/answerai-colbert-small-v1\",\n}\n\n# Generation model configuration\nOLLAMA_CONFIG = {\n    \"generation_model\": \"qwen3:8b\",  # Your LLM model\n    \"enrichment_model\": \"qwen3:0.6b\",  # Your fast model\n}\n```\n\n### 8.2 Pipeline Configuration\nProcessing behavior configured in `PIPELINE_CONFIGS`:\n\n```python\nPIPELINE_CONFIGS = {\n    \"retrieval\": {\n        \"search_type\": \"hybrid\",\n        \"dense\": {\"weight\": 0.7},\n        \"bm25\": {\"enabled\": True}\n    },\n    \"chunking\": {\n        \"chunk_size\": 512,\n        \"chunk_overlap\": 64,\n        \"enable_latechunk\": True,\n        \"enable_docling\": True\n    }\n}\n```\n\n### 8.3 UI Configuration\nFrontend behavior configured in environment variables:\n\n```bash\nNEXT_PUBLIC_API_URL=http://localhost:8000\nNEXT_PUBLIC_ENABLE_STREAMING=true\nNEXT_PUBLIC_MAX_FILE_SIZE=50MB\n```\n\n---\n\n## 9. 
Monitoring & Observability\n\n### 9.1 Logging System\n- **Structured Logging**: JSON-formatted logs with timestamps\n- **Log Levels**: DEBUG, INFO, WARNING, ERROR\n- **Log Rotation**: Automatic log file rotation\n- **Component Isolation**: Separate logs per service\n\n### 9.2 Health Monitoring\n- **Health Endpoints**: `/health` on all services\n- **Service Dependencies**: Cascading health checks\n- **Performance Metrics**: Response times, error rates\n- **Resource Monitoring**: Memory, CPU, disk usage\n\n### 9.3 Debugging Features\n- **Debug Mode**: Detailed operation tracing\n- **Query Inspection**: Step-by-step query processing\n- **Model Switching Logs**: Embedding model change tracking\n- **Error Reporting**: Comprehensive error context\n\n---\n\n## ⚙️ Configuration Modes\n\nThe system supports multiple configuration modes optimized for different use cases:\n\n### **Default Mode** (`\"default\"`)\n- **Description**: Production-ready pipeline with full features\n- **Search**: Hybrid (dense + BM25) with 0.7 dense weight\n- **Reranking**: AI-powered ColBERT reranker\n- **Query Processing**: Query decomposition enabled\n- **Verification**: Grounding verification enabled\n- **Performance**: ~3-8 seconds per query\n- **Memory**: ~10-16GB (with models loaded)\n\n### **Fast Mode** (`\"fast\"`)  \n- **Description**: Speed-optimized pipeline with minimal overhead\n- **Search**: Vector-only (no BM25, no late chunking)\n- **Reranking**: Disabled\n- **Query Processing**: Single-pass, no decomposition\n- **Verification**: Disabled\n- **Performance**: ~1-3 seconds per query\n- **Memory**: ~8-12GB (with models loaded)\n\n### **BM25 Mode** (`\"bm25\"`)\n- **Description**: Traditional keyword-based search\n- **Search**: BM25 only\n- **Use Case**: Exact keyword matching, legacy compatibility\n\n### **Graph RAG Mode** (`\"graph_rag\"`)\n- **Description**: Knowledge graph integration (currently disabled)\n- **Status**: Available for future implementation\n- **Use Case**: 
Relationship-aware retrieval\n\n---\n\n## 10. Development & Extension\n\n### 10.1 Architecture Principles\n- **Modular Design**: Clear separation of concerns\n- **Configuration-Driven**: Behavior controlled via config files\n- **Lazy Loading**: Components loaded on-demand\n- **Thread Safety**: Proper synchronization for concurrent access\n\n### 10.2 Extension Points\n- **Custom Retrievers**: Implement `BaseRetriever` interface\n- **Custom Chunkers**: Extend chunking strategies\n- **Custom Models**: Add new embedding or generation models\n- **Custom Pipelines**: Create specialized processing workflows\n\n### 10.3 Testing Strategy\n- **Unit Tests**: Individual component testing\n- **Integration Tests**: End-to-end workflow testing\n- **Performance Tests**: Load and stress testing\n- **Health Checks**: Automated system validation\n\n---\n\n> **Note**: This overview reflects the current implementation as of 2025-01-09. For the latest changes, check the git history and individual component documentation. "
  },
  {
    "path": "Documentation/triage_system.md",
    "content": "# 🔀 Triage / Routing System\n\n_Maps to `rag_system/agent/loop.Agent._should_use_rag`, `_route_using_overviews`, and the fast-path router in `backend/server.py`._\n\n## Purpose\nDetermine, for every incoming query, whether it should be answered by:\n1. **Direct LLM Generation** (no retrieval): faster and cheaper.\n2. **Retrieval-Augmented Generation (RAG)**: used when the answer likely requires document context.\n\n## Decision Signals\n| Signal | Source | Notes |\n|--------|--------|-------|\n| Keyword/regex check | `backend/server.py` (fast path) | Hard-coded quick wins (`what time`, `define`, etc.). |\n| Index presence | SQLite (session → indexes) | If no indexes are linked, use direct LLM. |\n| Overview routing | `_route_using_overviews()` | Uses document overviews and the enrichment model to predict relevance. |\n| LLM router prompt | `agent/loop.py` lines 648-665 | Final arbiter (Ollama call, JSON output). |\n\n## High-level Flow\n```mermaid\nflowchart TD\n    Q[\"Incoming Query\"] --> S1{Session\\nHas Indexes?}\n    S1 -- no --> LLM[\"Direct LLM Generation\"]\n    S1 -- yes --> S2{Fast Regex\\nHeuristics}\n    S2 -- match --> LLM\n    S2 -- no --> S3{Overview\\nRelevance > τ?}\n    S3 -- low --> LLM\n    S3 -- high --> S4[\"LLM Router\\n(prompt @648)\"]\n    S4 -- \"route: RAG\" --> RAG[\"Retrieval Pipeline\"]\n    S4 -- \"route: DIRECT\" --> LLM\n```\n\n## Detailed Sequence (Code-level)\n1. **backend/server.py**\n   * `handle_session_chat()` builds `router_prompt` (line ~435) and makes a **first pass** decision before calling the heavy agent code.\n2. **agent.loop._should_use_rag()**\n   * Re-evaluates using richer features (e.g., token count, query type).\n3. **Overviews Phase** (`_route_using_overviews()`)\n   * Loads JSONL overviews file per index.\n   * Calls enrichment model (`qwen3:0.6b`) with prompt: _\"Does this overview mention … ? \"_ → returns yes/no.\n4. 
**LLM Router** (prompt lines 648-665)\n   * JSON-only response `{ \"route\": \"RAG\" | \"DIRECT\" }`.\n\n## Interfaces & Dependencies\n| Component | Calls / Data |\n|-----------|--------------|\n| SQLite `chat_sessions` | Reads `indexes` column to know linked index IDs. |\n| LanceDB Overviews | Reads `index_store/overviews/<idx>.jsonl`. |\n| `OllamaClient` | Generates LLM router decision. |\n\n## Config Flags\n* `PIPELINE_CONFIGS.triage.enabled` – global toggle.\n* Env var `TRIAGE_OVERVIEW_THRESHOLD` – min similarity score to prefer RAG (default 0.35).\n\n## Failure / Fallback Modes\n1. If overview file missing → skip to LLM router.\n2. If LLM router errors → default to RAG (safer) but log warning.\n\n---\n\n_Keep this document updated whenever routing heuristics, thresholds, or prompt wording change._ "
  },
  {
    "path": "Documentation/verifier.md",
    "content": "# ✅ Answer Verifier\n\n_File: `rag_system/agent/verifier.py`_\n\n## Objective\nAssess whether an answer produced by RAG is **grounded** in the retrieved context snippets.\n\n## Prompt (see `prompt_inventory.md` `verifier.fact_check`)\nStrict JSON schema:\n```jsonc\n{\n  \"verdict\": \"SUPPORTED\" | \"NOT_SUPPORTED\" | \"NEEDS_CLARIFICATION\",\n  \"is_grounded\": true | false,\n  \"reasoning\": \"< ≤30 words >\",\n  \"confidence_score\": 0-100\n}\n```\n\n## Sequence Diagram\n```mermaid\nsequenceDiagram\n    participant RP as Retrieval Pipeline\n    participant V as Verifier\n    participant LLM as Ollama\n\n    RP->>V: query, context, answer\n    V->>LLM: verification prompt\n    LLM-->>V: JSON verdict\n    V-->>RP: VerificationResult\n```\n\n## Usage Sites\n| Caller | Code | When |\n|--------|------|------|\n| `RetrievalPipeline.answer_stream()` | `pipelines/retrieval_pipeline.py` | When the frontend sends the `verify=true` flag. |\n| `Agent.loop.run()` | fallback path | Experimental for composed answers. |\n\n## Config\n| Flag | Default | Meaning |\n|------|---------|---------|\n| `verify` | false | Frontend toggle; if true, the verifier runs. |\n| `generation_model` | `qwen3:8b` | Same model as answer generation. |\n\n## Failure Modes\n* If the LLM returns invalid JSON → parse exception handled, result = NOT_SUPPORTED.\n* If the verification call times out → pipeline logs the timeout but still returns the answer (unverified).\n\n---\n_Keep updated when schema or usage flags change._ "
  },
  {
    "path": "LICENSE",
    "content": "MIT License\n\nCopyright (c) 2025 PromptEngineer\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n"
  },
  {
    "path": "README.md",
    "content": "# LocalGPT - Private Document Intelligence Platform\n\n<div align=\"center\">\n\n<p align=\"center\">\n<a href=\"https://trendshift.io/repositories/2947\" target=\"_blank\"><img src=\"https://trendshift.io/api/badge/repositories/2947\" alt=\"PromtEngineer%2FlocalGPT | Trendshift\" style=\"width: 250px; height: 55px;\" width=\"250\" height=\"55\"/></a>\n</p>\n\n[![GitHub Stars](https://img.shields.io/github/stars/PromtEngineer/localGPT?style=flat-square)](https://github.com/PromtEngineer/localGPT/stargazers)\n[![GitHub Forks](https://img.shields.io/github/forks/PromtEngineer/localGPT?style=flat-square)](https://github.com/PromtEngineer/localGPT/network/members)\n[![GitHub Issues](https://img.shields.io/github/issues/PromtEngineer/localGPT?style=flat-square)](https://github.com/PromtEngineer/localGPT/issues)\n[![GitHub Pull Requests](https://img.shields.io/github/issues-pr/PromtEngineer/localGPT?style=flat-square)](https://github.com/PromtEngineer/localGPT/pulls)\n[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg?style=flat-square)](https://www.python.org/downloads/)\n[![License](https://img.shields.io/badge/license-MIT-green.svg?style=flat-square)](LICENSE)\n[![Docker](https://img.shields.io/badge/docker-supported-blue.svg?style=flat-square)](https://www.docker.com/)\n\n<p align=\"center\">\n    <a href=\"https://x.com/engineerrprompt\">\n      <img src=\"https://img.shields.io/badge/Follow%20on%20X-000000?style=for-the-badge&logo=x&logoColor=white\" alt=\"Follow on X\" />\n    </a>\n    <a href=\"https://discord.gg/tUDWAFGc\">\n      <img src=\"https://img.shields.io/badge/Join%20our%20Discord-5865F2?style=for-the-badge&logo=discord&logoColor=white\" alt=\"Join our Discord\" />\n    </a>\n  </p>\n</div>\n\n## 🚀 What is LocalGPT?\n\nLocalGPT is a **fully private, on-premise Document Intelligence platform**. 
Ask questions, summarise, and uncover insights from your files with state-of-the-art AI, while no data ever leaves your machine.\n\nMore than a traditional RAG (Retrieval-Augmented Generation) tool, LocalGPT features a **hybrid search engine** that blends semantic similarity, keyword matching, and [Late Chunking](https://jina.ai/news/late-chunking-in-long-context-embedding-models/) for long-context precision. A **smart router** automatically selects between RAG and direct LLM answering for every query, while **contextual enrichment** and sentence-level [Context Pruning](https://huggingface.co/naver/provence-reranker-debertav3-v1) surface only the most relevant content. An independent **verification** pass adds an extra layer of accuracy.\n\nThe architecture is **modular and lightweight**: enable only the components you need. With a pure-Python RAG core and few framework dependencies, LocalGPT is simple to deploy, run, and maintain.\n\n## ▶️ Video\nWatch this [video](https://youtu.be/JTbtGH3secI) to get started with LocalGPT. 
\n\n| Home | Create Index | Chat |\n|------|--------------|------|\n| ![](Documentation/images/Home.png) | ![](Documentation/images/Index%20Creation.png) | ![](Documentation/images/Retrieval%20Process.png) |\n\n## ✨ Features\n\n- **Utmost Privacy**: Your data remains on your computer; nothing is sent to external services.\n- **Versatile Model Support**: Seamlessly integrate a variety of open-source models via Ollama.\n- **Diverse Embeddings**: Choose from a range of open-source embeddings.\n- **Reuse Your LLM**: Once downloaded, reuse your LLM without the need for repeated downloads.\n- **Chat History**: Remembers your previous conversations (in a session).\n- **API**: LocalGPT has an API that you can use for building RAG applications.\n- **GPU, CPU, HPU & MPS Support**: Supports multiple platforms out of the box. Chat with your data using `CUDA`, `CPU`, `HPU (Intel® Gaudi®)`, `MPS`, and more!\n\n### 📖 Document Processing\n- **Multi-format Support**: Currently only PDF is supported; DOCX, TXT, Markdown, and other formats are not yet available\n- **Contextual Enrichment**: Enhanced document understanding with AI-generated context, inspired by [Contextual Retrieval](https://www.anthropic.com/news/contextual-retrieval)\n- **Batch Processing**: Handle multiple documents simultaneously\n\n### 🤖 AI-Powered Chat\n- **Natural Language Queries**: Ask questions in plain English\n- **Source Attribution**: Every answer includes document references\n- **Smart Routing**: Automatically chooses between RAG and direct LLM responses\n- **Query Decomposition**: Breaks complex queries into sub-questions for better answers\n- **Semantic Caching**: TTL-based caching with similarity matching for faster responses\n- **Session-Aware History**: Maintains conversation context across interactions\n- **Answer Verification**: Independent verification pass for accuracy\n- **Multiple AI Models**: Ollama for inference, HuggingFace for embeddings and reranking\n\n\n### 🛠️ Developer-Friendly\n- **RESTful APIs**: Complete API access for 
integration\n- **Real-time Progress**: Live updates during document processing\n- **Flexible Configuration**: Customize models, chunk sizes, and search parameters\n- **Extensible Architecture**: Plugin system for custom components\n\n### 🎨 Modern Interface\n- **Intuitive Web UI**: Clean, responsive design\n- **Session Management**: Organize conversations by topic\n- **Index Management**: Easy document collection management\n- **Real-time Chat**: Streaming responses for immediate feedback\n\n---\n\n## 🚀 Quick Start\n\nNote: Installation has so far been tested only on macOS.\n\n### Prerequisites\n- Python 3.8 or higher (tested with Python 3.11.5)\n- Node.js 16+ and npm (tested with Node.js 23.10.0, npm 10.9.2)\n- Docker (optional, for containerized deployment)\n- 8GB+ RAM (16GB+ recommended)\n- Ollama (required for both deployment approaches)\n\n### ***NOTE***\nUntil this branch is merged into the main branch, please clone this branch for installation:\n\n```bash\ngit clone -b localgpt-v2 https://github.com/PromtEngineer/localGPT.git\ncd localGPT\n```\n\n### Option 1: Docker Deployment\n\n```bash\n# Clone the repository\ngit clone https://github.com/PromtEngineer/localGPT.git\ncd localGPT\n\n# Install Ollama locally (required even for Docker)\ncurl -fsSL https://ollama.ai/install.sh | sh\nollama pull qwen3:0.6b\nollama pull qwen3:8b\n\n# Start Ollama\nollama serve\n\n# Start with Docker (in a new terminal)\n./start-docker.sh\n\n# Access the application\nopen http://localhost:3000\n```\n\n**Docker Management Commands:**\n```bash\n# Check container status\ndocker compose ps\n\n# View logs\ndocker compose logs -f\n\n# Stop containers\n./start-docker.sh stop\n```\n\n### Option 2: Direct Development (Recommended for Development)\n\n```bash\n# Clone the repository\ngit clone https://github.com/PromtEngineer/localGPT.git\ncd localGPT\n\n# Install Python dependencies\npip install -r requirements.txt\n\n# Key dependencies installed:\n# - torch==2.4.1, transformers==4.51.0 (AI 
models)\n# - lancedb (vector database)\n# - rank_bm25, fuzzywuzzy (search algorithms)\n# - sentence_transformers, rerankers (embedding/reranking)\n# - docling (document processing)\n# - colpali-engine (multimodal processing - support coming soon)\n\n# Install Node.js dependencies\nnpm install\n\n# Install and start Ollama\ncurl -fsSL https://ollama.ai/install.sh | sh\nollama pull qwen3:0.6b\nollama pull qwen3:8b\nollama serve\n\n# Start the system (in a new terminal)\npython run_system.py\n\n# Access the application\nopen http://localhost:3000\n```\n\n**System Management:**\n```bash\n# Check system health (comprehensive diagnostics)\npython system_health_check.py\n\n# Check service status and health\npython run_system.py --health\n\n# Start in production mode\npython run_system.py --mode prod\n\n# Skip frontend (backend + RAG API only)\npython run_system.py --no-frontend\n\n# View aggregated logs\npython run_system.py --logs-only\n\n# Stop all services\npython run_system.py --stop\n# Or press Ctrl+C in the terminal running python run_system.py\n```\n\n**Service Architecture:**\nThe `run_system.py` launcher manages four key services:\n- **Ollama Server** (port 11434): AI model serving\n- **RAG API Server** (port 8001): Document processing and retrieval\n- **Backend Server** (port 8000): Session management and API endpoints\n- **Frontend Server** (port 3000): React/Next.js web interface\n\n### Option 3: Manual Component Startup\n\n```bash\n# Terminal 1: Start Ollama\nollama serve\n\n# Terminal 2: Start RAG API\npython -m rag_system.api_server\n\n# Terminal 3: Start Backend\ncd backend && python server.py\n\n# Terminal 4: Start Frontend\nnpm run dev\n\n# Access at http://localhost:3000\n```\n\n---\n\n### Detailed Installation\n\n#### 1. 
Install System Dependencies\n\n**Ubuntu/Debian:**\n```bash\nsudo apt update\nsudo apt install python3.8 python3-pip nodejs npm docker.io docker-compose\n```\n\n**macOS:**\n```bash\nbrew install python@3.8 node npm docker docker-compose\n```\n\n**Windows:**\n```bash\n# Install Python 3.8+, Node.js, and Docker Desktop\n# Then use PowerShell or WSL2\n```\n\n#### 2. Install AI Models\n\n**Install Ollama (Recommended):**\n```bash\n# Install Ollama\ncurl -fsSL https://ollama.ai/install.sh | sh\n\n# Pull recommended models\nollama pull qwen3:0.6b          # Fast generation model\nollama pull qwen3:8b            # High-quality generation model\n```\n\n#### 3. Configure Environment\n\n```bash\n# Copy environment template\ncp .env.example .env\n\n# Edit configuration\nnano .env\n```\n\n**Key Configuration Options:**\n```env\n# AI Models (referenced in rag_system/main.py)\nOLLAMA_HOST=http://localhost:11434\n\n# Database Paths (used by backend and RAG system)\nDATABASE_PATH=./backend/chat_data.db\nVECTOR_DB_PATH=./lancedb\n\n# Server Settings (used by run_system.py)\nBACKEND_PORT=8000\nFRONTEND_PORT=3000\nRAG_API_PORT=8001\n\n# Optional: Override default models\nGENERATION_MODEL=qwen3:8b\nENRICHMENT_MODEL=qwen3:0.6b\nEMBEDDING_MODEL=Qwen/Qwen3-Embedding-0.6B\nRERANKER_MODEL=answerdotai/answerai-colbert-small-v1\n```\n\n#### 4. Initialize the System\n\n```bash\n# Run system health check\npython system_health_check.py\n\n# Initialize databases\npython -c \"from backend.database import ChatDatabase; ChatDatabase().init_database()\"\n\n# Test installation\npython -c \"from rag_system.main import get_agent; print('✅ Installation successful!')\"\n\n# Validate complete setup\npython run_system.py --health\n```\n\n---\n\n## 🎯 Getting Started\n\n### 1. Create Your First Index\n\nAn **index** is a collection of processed documents that you can chat with.\n\n#### Using the Web Interface:\n1. Open http://localhost:3000\n2. Click \"Create New Index\"\n3. 
Upload your documents (PDF, DOCX, TXT)\n4. Configure processing options\n5. Click \"Build Index\"\n\n#### Using Scripts:\n```bash\n# Simple script approach\n./simple_create_index.sh \"My Documents\" \"path/to/document.pdf\"\n\n# Interactive script\npython create_index_script.py\n```\n\n#### Using API:\n```bash\n# Create index\ncurl -X POST http://localhost:8000/indexes \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"name\": \"My Index\", \"description\": \"My documents\"}'\n\n# Upload documents\ncurl -X POST http://localhost:8000/indexes/INDEX_ID/upload \\\n  -F \"files=@document.pdf\"\n\n# Build index\ncurl -X POST http://localhost:8000/indexes/INDEX_ID/build\n```\n\n### 2. Start Chatting\n\nOnce your index is built:\n\n1. **Create a Chat Session**: Click \"New Chat\" or use an existing session\n2. **Select Your Index**: Choose which document collection to query\n3. **Ask Questions**: Type natural language questions about your documents\n4. **Get Answers**: Receive AI-generated responses with source citations\n\n### 3. 
Advanced Features\n\n#### Custom Model Configuration\n```bash\n# Use different models for different tasks\ncurl -X POST http://localhost:8000/sessions \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"title\": \"High Quality Session\",\n    \"model\": \"qwen3:8b\",\n    \"embedding_model\": \"Qwen/Qwen3-Embedding-4B\"\n  }'\n```\n\n#### Batch Document Processing\n```bash\n# Process multiple documents at once\npython demo_batch_indexing.py --config batch_indexing_config.json\n```\n\n#### API Integration\n```python\nimport requests\n\n# Chat with your documents via API\nresponse = requests.post('http://localhost:8000/chat', json={\n    'query': 'What are the key findings in the research papers?',\n    'session_id': 'your-session-id',\n    'search_type': 'hybrid',\n    'retrieval_k': 20\n})\n\nprint(response.json()['response'])\n```\n\n---\n\n## 🔧 Configuration\n\n### Model Configuration\n\nLocalGPT supports multiple AI model providers with centralized configuration:\n\n#### Ollama Models (Local Inference)\n```python\nOLLAMA_CONFIG = {\n    \"host\": \"http://localhost:11434\",\n    \"generation_model\": \"qwen3:8b\",        # Main text generation\n    \"enrichment_model\": \"qwen3:0.6b\"       # Lightweight routing/enrichment\n}\n```\n\n#### External Models (HuggingFace Direct)\n```python\nEXTERNAL_MODELS = {\n    \"embedding_model\": \"Qwen/Qwen3-Embedding-0.6B\",           # 1024 dimensions\n    \"reranker_model\": \"answerdotai/answerai-colbert-small-v1\", # ColBERT reranker\n    \"fallback_reranker\": \"BAAI/bge-reranker-base\"             # Backup reranker\n}\n```\n\n### Pipeline Configuration\n\nLocalGPT offers two main pipeline configurations:\n\n#### Default Pipeline (Production-Ready)\n```python\n\"default\": {\n    \"description\": \"Production-ready pipeline with hybrid search, AI reranking, and verification\",\n    \"storage\": {\n        \"lancedb_uri\": \"./lancedb\",\n        \"text_table_name\": \"text_pages_v3\",\n        \"bm25_path\": 
\"./index_store/bm25\"\n    },\n    \"retrieval\": {\n        \"retriever\": \"multivector\",\n        \"search_type\": \"hybrid\",\n        \"late_chunking\": {\"enabled\": True},\n        \"dense\": {\"enabled\": True, \"weight\": 0.7},\n        \"bm25\": {\"enabled\": True}\n    },\n    \"reranker\": {\n        \"enabled\": True,\n        \"type\": \"ai\",\n        \"strategy\": \"rerankers-lib\",\n        \"model_name\": \"answerdotai/answerai-colbert-small-v1\",\n        \"top_k\": 10\n    },\n    \"query_decomposition\": {\"enabled\": True, \"max_sub_queries\": 3},\n    \"verification\": {\"enabled\": True},\n    \"retrieval_k\": 20,\n    \"contextual_enricher\": {\"enabled\": True, \"window_size\": 1}\n}\n```\n\n#### Fast Pipeline (Speed-Optimized)\n```python\n\"fast\": {\n    \"description\": \"Speed-optimized pipeline with minimal overhead\",\n    \"retrieval\": {\n        \"search_type\": \"vector_only\",\n        \"late_chunking\": {\"enabled\": False}\n    },\n    \"reranker\": {\"enabled\": False},\n    \"query_decomposition\": {\"enabled\": False},\n    \"verification\": {\"enabled\": False},\n    \"retrieval_k\": 10,\n    \"contextual_enricher\": {\"enabled\": False}\n}\n```\n\n### Search Configuration\n\n```python\nSEARCH_CONFIG = {\n    'hybrid': {\n        'dense_weight': 0.7,\n        'sparse_weight': 0.3,\n        'retrieval_k': 20,\n        'reranker_top_k': 10\n    }\n}\n```\n---\n\n## 🛠️ Troubleshooting\n\n### Common Issues\n\n#### Installation Problems\n```bash\n# Check Python version\npython --version  # Should be 3.8+\n\n# Check dependencies\npip list | grep -E \"(torch|transformers|lancedb)\"\n\n# Reinstall dependencies\npip install -r requirements.txt --force-reinstall\n```\n\n#### Model Loading Issues\n```bash\n# Check Ollama status\nollama list\ncurl http://localhost:11434/api/tags\n\n# Pull missing models\nollama pull qwen3:0.6b\n```\n\n#### Database Issues\n```bash\n# Check database connectivity\npython -c \"from backend.database 
import ChatDatabase; db = ChatDatabase(); print('✅ Database OK')\"\n\n# Reset database (WARNING: This deletes all data)\nrm backend/chat_data.db\npython -c \"from backend.database import ChatDatabase; ChatDatabase().init_database()\"\n```\n\n#### Performance Issues\n```bash\n# Check system resources\npython system_health_check.py\n\n# Monitor memory usage\nhtop  # or Task Manager on Windows\n\n# Optimize for low-memory systems\nexport PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512\n```\n\n### Getting Help\n\n1. **Check Logs**: The system creates structured logs in the `logs/` directory:\n   - `logs/system.log`: Main system events and errors\n   - `logs/ollama.log`: Ollama server logs\n   - `logs/rag-api.log`: RAG API processing logs\n   - `logs/backend.log`: Backend server logs\n   - `logs/frontend.log`: Frontend build and runtime logs\n\n2. **System Health**: Run comprehensive diagnostics:\n   ```bash\n   python system_health_check.py  # Full system diagnostics\n   python run_system.py --health  # Service status check\n   ```\n\n3. **Health Endpoints**: Check individual service health:\n   - Backend: `http://localhost:8000/health`\n   - RAG API: `http://localhost:8001/health`\n   - Ollama: `http://localhost:11434/api/tags`\n\n4. **Documentation**: Check the [Technical Documentation](TECHNICAL_DOCS.md)\n5. **GitHub Issues**: Report bugs and request features\n6. 
**Community**: Join our Discord/Slack community\n\n---\n\n## 🔗 API Reference\n\n### Core Endpoints\n\n#### Chat API\n```http\n# Session-based chat (recommended)\nPOST /sessions/{session_id}/chat\nContent-Type: application/json\n\n{\n  \"query\": \"What are the main topics discussed?\",\n  \"search_type\": \"hybrid\",\n  \"retrieval_k\": 20,\n  \"ai_rerank\": true,\n  \"context_window_size\": 5\n}\n\n# Legacy chat endpoint\nPOST /chat\nContent-Type: application/json\n\n{\n  \"query\": \"What are the main topics discussed?\",\n  \"session_id\": \"uuid\",\n  \"search_type\": \"hybrid\",\n  \"retrieval_k\": 20\n}\n```\n\n#### Index Management\n```http\n# Create index\nPOST /indexes\nContent-Type: application/json\n{\n  \"name\": \"My Index\",\n  \"description\": \"Description\",\n  \"config\": \"default\"\n}\n\n# Get all indexes\nGET /indexes\n\n# Get specific index\nGET /indexes/{id}\n\n# Upload documents to index\nPOST /indexes/{id}/upload\nContent-Type: multipart/form-data\nfiles: [file1.pdf, file2.pdf, ...]\n\n# Build index (process uploaded documents)\nPOST /indexes/{id}/build\nContent-Type: application/json\n{\n  \"config_mode\": \"default\",\n  \"enable_enrich\": true,\n  \"chunk_size\": 512\n}\n\n# Delete index\nDELETE /indexes/{id}\n```\n\n#### Session Management\n```http\n# Create session\nPOST /sessions\nContent-Type: application/json\n{\n  \"title\": \"My Session\",\n  \"model\": \"qwen3:0.6b\"\n}\n\n# Get all sessions\nGET /sessions\n\n# Get specific session\nGET /sessions/{session_id}\n\n# Get session documents\nGET /sessions/{session_id}/documents\n\n# Get session indexes\nGET /sessions/{session_id}/indexes\n\n# Link index to session\nPOST /sessions/{session_id}/indexes/{index_id}\n\n# Delete session\nDELETE /sessions/{session_id}\n\n# Rename session\nPOST /sessions/{session_id}/rename\nContent-Type: application/json\n{\n  \"new_title\": \"Updated Session Name\"\n}\n```\n\n### Advanced Features\n\n#### Query Decomposition\nThe system can break complex 
queries into sub-questions for better answers:\n```http\nPOST /sessions/{session_id}/chat\nContent-Type: application/json\n\n{\n  \"query\": \"Compare the methodologies and analyze their effectiveness\",\n  \"query_decompose\": true,\n  \"compose_sub_answers\": true\n}\n```\n\n#### Answer Verification\nIndependent verification pass for accuracy using a separate verification model:\n```http\nPOST /sessions/{session_id}/chat\nContent-Type: application/json\n\n{\n  \"query\": \"What are the key findings?\",\n  \"verify\": true\n}\n```\n\n#### Contextual Enrichment\nDocument context enrichment during indexing for better understanding:\n```bash\n# Enable during index building\nPOST /indexes/{id}/build\n{\n  \"enable_enrich\": true,\n  \"window_size\": 2\n}\n```\n\n#### Late Chunking\nBetter context preservation by chunking after embedding:\n```bash\n# Configure in pipeline\n\"late_chunking\": {\"enabled\": true}\n```\n\n#### Streaming Chat\n```http\nPOST /chat/stream\nContent-Type: application/json\n\n{\n  \"query\": \"Explain the methodology\",\n  \"session_id\": \"uuid\",\n  \"stream\": true\n}\n```\n\n#### Batch Processing\n```bash\n# Using the batch indexing script\npython demo_batch_indexing.py --config batch_indexing_config.json\n\n# Example batch configuration (batch_indexing_config.json):\n{\n  \"index_name\": \"Sample Batch Index\",\n  \"index_description\": \"Example batch index configuration\",\n  \"documents\": [\n    \"./rag_system/documents/invoice_1039.pdf\",\n    \"./rag_system/documents/invoice_1041.pdf\"\n  ],\n  \"processing\": {\n    \"chunk_size\": 512,\n    \"chunk_overlap\": 64,\n    \"enable_enrich\": true,\n    \"enable_latechunk\": true,\n    \"enable_docling\": true,\n    \"embedding_model\": \"Qwen/Qwen3-Embedding-0.6B\",\n    \"generation_model\": \"qwen3:0.6b\",\n    \"retrieval_mode\": \"hybrid\",\n    \"window_size\": 2\n  }\n}\n```\n\n```http\n# API endpoint for batch processing\nPOST /batch/index\nContent-Type: application/json\n\n{\n  
\"file_paths\": [\"doc1.pdf\", \"doc2.pdf\"],\n  \"config\": {\n    \"chunk_size\": 512,\n    \"enable_enrich\": true,\n    \"enable_latechunk\": true,\n    \"enable_docling\": true\n  }\n}\n```\n\nFor complete API documentation, see [API_REFERENCE.md](API_REFERENCE.md).\n\n---\n\n## 🏗️ Architecture\n\nLocalGPT is built with a modular, scalable architecture:\n\n```mermaid\ngraph TB\n    UI[Web Interface] --> API[Backend API]\n    API --> Agent[RAG Agent]\n    Agent --> Retrieval[Retrieval Pipeline]\n    Agent --> Generation[Generation Pipeline]\n\n    Retrieval --> Vector[Vector Search]\n    Retrieval --> BM25[BM25 Search]\n    Retrieval --> Rerank[Reranking]\n\n    Vector --> LanceDB[(LanceDB)]\n    BM25 --> BM25DB[(BM25 Index)]\n\n    Generation --> Ollama[Ollama Models]\n    Generation --> HF[Hugging Face Models]\n\n    API --> SQLite[(SQLite DB)]\n```\n\nOverview of the retrieval agent:\n\n```mermaid\ngraph TD\n    classDef llmcall fill:#e6f3ff,stroke:#007bff;\n    classDef pipeline fill:#e6ffe6,stroke:#28a745;\n    classDef cache fill:#fff3e0,stroke:#fd7e14;\n    classDef logic fill:#f8f9fa,stroke:#6c757d;\n    classDef thread stroke-dasharray: 5 5;\n\n    A(Start: Agent.run) --> B[\"asyncio.run(_run_async)\"];\n    B --> C{_run_async};\n\n    C --> C1[Get Chat History];\n    C1 --> T1[Build Triage Prompt <br/> Query + Doc Overviews ];\n    T1 --> T2[\"(asyncio.to_thread)<br/>LLM Triage: RAG or LLM_DIRECT?\"]; class T2 llmcall,thread;\n    T2 --> T3{Decision?};\n\n    T3 -- RAG --> RAG_Path;\n    T3 -- LLM_DIRECT --> LLM_Path;\n\n    subgraph RAG Path\n        RAG_Path --> R1[Format Query + History];\n        R1 --> R2[\"(asyncio.to_thread)<br/>Generate Query Embedding\"]; class R2 pipeline,thread;\n        R2 --> R3{{Check Semantic Cache}}; class R3 cache;\n        R3 -- Hit --> R_Cache_Hit(Return Cached Result);\n        R_Cache_Hit --> R_Hist_Update;\n        R3 -- Miss --> R4{Decomposition <br/> Enabled?};\n\n        R4 -- Yes --> 
R5[\"(asyncio.to_thread)<br/>Decompose Raw Query\"]; class R5 llmcall,thread;\n        R5 --> R6{{Run Sub-Queries <br/> Parallel RAG Pipeline}}; class R6 pipeline,thread;\n        R6 --> R7[Collect Results & Docs];\n        R7 --> R8[\"(asyncio.to_thread)<br/>Compose Final Answer\"]; class R8 llmcall,thread;\n        R8 --> V1(RAG Answer);\n\n        R4 -- No --> R9[\"(asyncio.to_thread)<br/>Run Single Query <br/>(RAG Pipeline)\"]; class R9 pipeline,thread;\n        R9 --> V1;\n\n        V1 --> V2{{Verification <br/> await verify_async}}; class V2 llmcall;\n        V2 --> V3(Final RAG Result);\n        V3 --> R_Cache_Store{{Store in Semantic Cache}}; class R_Cache_Store cache;\n        R_Cache_Store --> FinalResult;\n    end\n\n    subgraph Direct LLM Path\n        LLM_Path --> L1[Format Query + History];\n        L1 --> L2[\"(asyncio.to_thread)<br/>Generate Direct LLM Answer <br/> (No RAG)\"]; class L2 llmcall,thread;\n        L2 --> FinalResult(Final Direct Result);\n    end\n\n    FinalResult --> R_Hist_Update(Update Chat History);\n    R_Hist_Update --> ZZZ(End: Return Result);\n```\n\n---\n\n## 🤝 Contributing\n\nWe welcome contributions from developers of all skill levels! LocalGPT is an open-source project that benefits from community involvement.\n\n### 🚀 Quick Start for Contributors\n\n```bash\n# Fork and clone the repository\ngit clone https://github.com/PromtEngineer/localGPT.git\ncd localGPT\n\n# Set up development environment\npip install -r requirements.txt\nnpm install\n\n# Install Ollama and models (ollama pull takes one model per invocation)\ncurl -fsSL https://ollama.ai/install.sh | sh\nollama pull qwen3:0.6b\nollama pull qwen3:8b\n\n# Verify setup\npython system_health_check.py\npython run_system.py --mode dev\n```\n\n### 📋 How to Contribute\n\n1. **🐛 Report Bugs**: Use our [bug report template](.github/ISSUE_TEMPLATE/bug_report.md)\n2. **💡 Request Features**: Use our [feature request template](.github/ISSUE_TEMPLATE/feature_request.md)\n3. 
**🔧 Submit Code**: Follow our [development workflow](CONTRIBUTING.md#development-workflow)\n4. **📚 Improve Docs**: Help make our documentation better\n\n### 📖 Detailed Guidelines\n\nOur contributing guide covers:\n- Development setup and workflow\n- Coding standards and best practices\n- Testing requirements\n- Documentation standards\n- Release process\n\n**👉 See our [CONTRIBUTING.md](CONTRIBUTING.md) guide**\n\n---\n\n## 📄 License\n\nThis project is licensed under the MIT License; see the [LICENSE](LICENSE) file for details. Models are covered by their own licenses, so check each model's terms before use.\n\n---\n\n## 📞 Support\n\n- **Documentation**: [Technical Docs](TECHNICAL_DOCS.md)\n- **Issues**: [GitHub Issues](https://github.com/PromtEngineer/localGPT/issues)\n- **Discussions**: [GitHub Discussions](https://github.com/PromtEngineer/localGPT/discussions)\n- **Business Deployment and Customization**: [Contact Us](https://tally.so/r/wv6R2d)\n\n---\n\n<div align=\"center\">\n\n## Star History\n\n[![Star History Chart](https://api.star-history.com/svg?repos=PromtEngineer/localGPT&type=Date)](https://star-history.com/#PromtEngineer/localGPT&Date)\n\n</div>\n"
  },
  {
    "path": "WATSONX_README.md",
    "content": "# Watson X Integration with Granite Models\n\nThis branch adds support for IBM Watson X AI with Granite models as an alternative to Ollama for running LocalGPT.\n\n## Overview\n\nLocalGPT now supports two LLM backends:\n1. **Ollama** (default): Run models locally using Ollama\n2. **Watson X**: Use IBM's Granite models hosted on Watson X AI\n\n## What Changed\n\n- Added `WatsonXClient` class in `rag_system/utils/watsonx_client.py` that provides an Ollama-compatible interface for Watson X\n- Updated `factory.py` and `main.py` to support backend switching via environment variable\n- Added `ibm-watsonx-ai` SDK dependency to `requirements.txt`\n- Configuration now supports both backends through environment variables\n\n## Prerequisites\n\nTo use Watson X with Granite models, you need:\n\n1. IBM Cloud account with Watson X access\n2. Watson X API key\n3. Watson X project ID\n\n### Getting Your Credentials\n\n1. Go to [IBM Cloud](https://cloud.ibm.com/)\n2. Navigate to Watson X AI service\n3. Create or select a project\n4. Get your API key from IBM Cloud IAM\n5. 
Copy your project ID from the Watson X project settings\n\n## Configuration\n\n### Environment Variables\n\nCreate a `.env` file or set these environment variables:\n\n```bash\n# Choose LLM backend (default: ollama)\nLLM_BACKEND=watsonx\n\n# Watson X Configuration\nWATSONX_API_KEY=your_api_key_here\nWATSONX_PROJECT_ID=your_project_id_here\nWATSONX_URL=https://us-south.ml.cloud.ibm.com\n\n# Model Configuration\nWATSONX_GENERATION_MODEL=ibm/granite-13b-chat-v2\nWATSONX_ENRICHMENT_MODEL=ibm/granite-8b-japanese\n```\n\n### Available Granite Models\n\nWatson X offers several Granite models:\n- `ibm/granite-13b-chat-v2` - General purpose chat model\n- `ibm/granite-13b-instruct-v2` - Instruction-following model\n- `ibm/granite-20b-multilingual` - Multilingual support\n- `ibm/granite-8b-japanese` - Lightweight Japanese model\n- `ibm/granite-3b-code-instruct` - Code generation model\n\nFor a full list of available models, visit the [Watson X documentation](https://www.ibm.com/docs/en/watsonx/saas?topic=solutions-supported-foundation-models).\n\n## Installation\n\n1. 
Install the Watson X SDK (quote the requirement so the shell does not treat `>=` as a redirect):\n```bash\npip install \"ibm-watsonx-ai>=1.3.39\"\n```\n\nOr install all dependencies:\n```bash\npip install -r rag_system/requirements.txt\n```\n\n## Usage\n\n### Running with Watson X\n\nOnce configured, set the environment variable and run as normal:\n\n```bash\nexport LLM_BACKEND=watsonx\npython -m rag_system.main api\n```\n\nOr in Python:\n\n```python\nimport os\nos.environ['LLM_BACKEND'] = 'watsonx'\n\nfrom rag_system.factory import get_agent\n\n# Get agent with Watson X backend\nagent = get_agent(mode=\"default\")\n\n# Use as normal\nresult = agent.run(\"What is artificial intelligence?\")\nprint(result)\n```\n\n### Switching Between Backends\n\nYou can switch between Ollama and Watson X by changing `LLM_BACKEND`:\n\n```bash\n# Use Ollama (local)\nexport LLM_BACKEND=ollama\npython -m rag_system.main api\n\n# Use Watson X (cloud)\nexport LLM_BACKEND=watsonx\npython -m rag_system.main api\n```\n\n## Features\n\nThe Watson X client supports the key features used by LocalGPT:\n\n- ✅ Text generation / completion\n- ✅ Async generation\n- ✅ Streaming responses\n- ✅ Embeddings (if using Watson X embedding models)\n- ✅ Custom generation parameters (temperature, max_tokens, top_p, top_k)\n- ⚠️ Image/multimodal support (limited, depends on model availability)\n\n## API Compatibility\n\nThe `WatsonXClient` provides the same interface as `OllamaClient`:\n\n```python\nfrom rag_system.utils.watsonx_client import WatsonXClient\n\nclient = WatsonXClient(\n    api_key=\"your_api_key\",\n    project_id=\"your_project_id\"\n)\n\n# Generate completion\nresponse = client.generate_completion(\n    model=\"ibm/granite-13b-chat-v2\",\n    prompt=\"Explain quantum computing\"\n)\n\nprint(response['response'])\n\n# Stream completion\nfor chunk in client.stream_completion(\n    model=\"ibm/granite-13b-chat-v2\",\n    prompt=\"Write a story about AI\"\n):\n    print(chunk, end='', flush=True)\n```\n\n## Limitations\n\n1. 
**Embedding Models**: Watson X uses different embedding models than Ollama. Make sure to configure embedding models appropriately in `main.py` if needed.\n\n2. **Multimodal Support**: Image support varies by model availability in Watson X. Not all Granite models support multimodal inputs.\n\n3. **Streaming**: Streaming support depends on the Watson X SDK version and may fall back to returning the full response at once.\n\n4. **Rate Limits**: Watson X has API rate limits that may differ from local Ollama usage. Monitor your usage accordingly.\n\n## Troubleshooting\n\n### Authentication Errors\n\nIf you see authentication errors:\n- Verify your API key is correct\n- Check that your project ID matches an existing Watson X project\n- Ensure your IBM Cloud account has Watson X access\n\n### Model Not Found\n\nIf you get model not found errors:\n- Verify the model ID is correct (e.g., `ibm/granite-13b-chat-v2`)\n- Check that the model is available in your Watson X instance\n- Some models may require additional permissions\n\n### Connection Errors\n\nIf you experience connection issues:\n- Check your internet connection\n- Verify the Watson X URL is correct for your region\n- Check IBM Cloud status page for service outages\n\n## Cost Considerations\n\nUnlike local Ollama, Watson X is a cloud service with usage-based pricing:\n- Token-based pricing for generation\n- Consider your query volume\n- Monitor usage through IBM Cloud dashboard\n\n## Reverting to Ollama\n\nTo switch back to local Ollama:\n\n```bash\nunset LLM_BACKEND  # or set LLM_BACKEND=ollama\npython -m rag_system.main api\n```\n\n## Support\n\nFor Watson X specific issues:\n- [IBM Watson X Documentation](https://www.ibm.com/docs/en/watsonx/saas)\n- [Watson X Developer Hub](https://www.ibm.com/watsonx/developer/)\n- [IBM Cloud Support](https://cloud.ibm.com/docs/get-support)\n\nFor LocalGPT issues:\n- [LocalGPT GitHub Issues](https://github.com/PromtEngineer/localGPT/issues)\n\n## Contributing\n\nIf you find 
issues with the Watson X integration or want to add features:\n1. Create an issue describing the problem/feature\n2. Submit a pull request with your changes\n3. Ensure all tests pass\n\n## License\n\nThis integration follows the same license as LocalGPT (MIT License).\n"
  },
  {
    "path": "backend/README.md",
    "content": "# localGPT Backend\n\nSimple Python backend that connects your frontend to Ollama for local LLM chat.\n\n## Prerequisites\n\n1. **Install Ollama** (if not already installed):\n   ```bash\n   # Visit https://ollama.ai or run:\n   curl -fsSL https://ollama.ai/install.sh | sh\n   ```\n\n2. **Start Ollama**:\n   ```bash\n   ollama serve\n   ```\n\n3. **Pull a model** (optional, server will suggest if needed):\n   ```bash\n   ollama pull llama3.2\n   ```\n\n## Setup\n\n1. **Install Python dependencies**:\n   ```bash\n   pip install -r requirements.txt\n   ```\n\n2. **Test Ollama connection**:\n   ```bash\n   python ollama_client.py\n   ```\n\n3. **Start the backend server**:\n   ```bash\n   python server.py\n   ```\n\nServer will run on `http://localhost:8000`\n\n## API Endpoints\n\n### Health Check\n```bash\nGET /health\n```\nReturns server status and available models.\n\n### Chat\n```bash\nPOST /chat\nContent-Type: application/json\n\n{\n  \"message\": \"Hello!\",\n  \"model\": \"llama3.2:latest\",\n  \"conversation_history\": []\n}\n```\n\nReturns:\n```json\n{\n  \"response\": \"Hello! How can I help you?\",\n  \"model\": \"llama3.2:latest\",\n  \"message_count\": 1\n}\n```\n\n## Testing\n\nTest the chat endpoint:\n```bash\ncurl -X POST http://localhost:8000/chat \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"message\": \"Hello!\", \"model\": \"llama3.2:latest\"}'\n```\n\n## Frontend Integration\n\nYour React frontend should connect to:\n- **Backend**: `http://localhost:8000`\n- **Chat endpoint**: `http://localhost:8000/chat`\n\n## What's Next\n\nThis simple backend is ready for:\n- ✅ **Real-time chat** with local LLMs\n- 🔜 **Document upload** for RAG\n- 🔜 **Vector database** integration\n- 🔜 **Streaming responses**\n- 🔜 **Chat history** persistence "
  },
  {
    "path": "backend/database.py",
    "content": "import sqlite3\nimport uuid\nimport json\nfrom datetime import datetime\nfrom typing import List, Dict, Optional, Tuple\n\nclass ChatDatabase:\n    def __init__(self, db_path: str = None):\n        if db_path is None:\n            # Auto-detect environment and set appropriate path\n            import os\n            if os.path.exists(\"/app\"):  # Docker environment\n                self.db_path = \"/app/backend/chat_data.db\"\n            else:  # Local development environment\n                self.db_path = \"backend/chat_data.db\"\n        else:\n            self.db_path = db_path\n        self.init_database()\n    \n    def init_database(self):\n        \"\"\"Initialize the SQLite database with required tables\"\"\"\n        conn = sqlite3.connect(self.db_path)\n        cursor = conn.cursor()\n        \n        # Enable foreign keys\n        conn.execute(\"PRAGMA foreign_keys = ON\")\n        \n        # Sessions table\n        conn.execute('''\n            CREATE TABLE IF NOT EXISTS sessions (\n                id TEXT PRIMARY KEY,\n                title TEXT NOT NULL,\n                created_at TEXT NOT NULL,\n                updated_at TEXT NOT NULL,\n                model_used TEXT NOT NULL,\n                message_count INTEGER DEFAULT 0\n            )\n        ''')\n        \n        # Messages table\n        conn.execute('''\n            CREATE TABLE IF NOT EXISTS messages (\n                id TEXT PRIMARY KEY,\n                session_id TEXT NOT NULL,\n                content TEXT NOT NULL,\n                sender TEXT NOT NULL CHECK (sender IN ('user', 'assistant')),\n                timestamp TEXT NOT NULL,\n                metadata TEXT DEFAULT '{}',\n                FOREIGN KEY (session_id) REFERENCES sessions (id) ON DELETE CASCADE\n            )\n        ''')\n        \n        # Create indexes for better performance\n        conn.execute('CREATE INDEX IF NOT EXISTS idx_messages_session_id ON messages(session_id)')\n        
conn.execute('CREATE INDEX IF NOT EXISTS idx_messages_timestamp ON messages(timestamp)')\n        conn.execute('CREATE INDEX IF NOT EXISTS idx_sessions_updated_at ON sessions(updated_at)')\n        \n        # Documents table\n        conn.execute('''\n            CREATE TABLE IF NOT EXISTS session_documents (\n                id INTEGER PRIMARY KEY AUTOINCREMENT,\n                session_id TEXT NOT NULL,\n                file_path TEXT NOT NULL,\n                indexed INTEGER DEFAULT 0,\n                FOREIGN KEY (session_id) REFERENCES sessions (id) ON DELETE CASCADE\n            )\n        ''')\n        conn.execute('CREATE INDEX IF NOT EXISTS idx_session_documents_session_id ON session_documents(session_id)')\n        \n        # --- NEW: Index persistence tables ---\n        cursor.execute('''\n            CREATE TABLE IF NOT EXISTS indexes (\n                id TEXT PRIMARY KEY,\n                name TEXT UNIQUE,\n                description TEXT,\n                created_at TEXT,\n                updated_at TEXT,\n                vector_table_name TEXT,\n                metadata TEXT\n            )\n        ''')\n\n        cursor.execute('''\n            CREATE TABLE IF NOT EXISTS index_documents (\n                id INTEGER PRIMARY KEY AUTOINCREMENT,\n                index_id TEXT,\n                original_filename TEXT,\n                stored_path TEXT,\n                FOREIGN KEY(index_id) REFERENCES indexes(id)\n            )\n        ''')\n\n        cursor.execute('''\n            CREATE TABLE IF NOT EXISTS session_indexes (\n                id INTEGER PRIMARY KEY AUTOINCREMENT,\n                session_id TEXT,\n                index_id TEXT,\n                linked_at TEXT,\n                FOREIGN KEY(session_id) REFERENCES sessions(id),\n                FOREIGN KEY(index_id) REFERENCES indexes(id)\n            )\n        ''')\n        \n        conn.commit()\n        conn.close()\n        print(\"✅ Database initialized successfully\")\n    
\n    def create_session(self, title: str, model: str) -> str:\n        \"\"\"Create a new chat session\"\"\"\n        session_id = str(uuid.uuid4())\n        now = datetime.now().isoformat()\n        \n        conn = sqlite3.connect(self.db_path)\n        conn.execute('''\n            INSERT INTO sessions (id, title, created_at, updated_at, model_used)\n            VALUES (?, ?, ?, ?, ?)\n        ''', (session_id, title, now, now, model))\n        conn.commit()\n        conn.close()\n        \n        print(f\"📝 Created new session: {session_id[:8]}... - {title}\")\n        return session_id\n    \n    def get_sessions(self, limit: int = 50) -> List[Dict]:\n        \"\"\"Get all chat sessions, ordered by most recent\"\"\"\n        conn = sqlite3.connect(self.db_path)\n        conn.row_factory = sqlite3.Row\n        \n        cursor = conn.execute('''\n            SELECT id, title, created_at, updated_at, model_used, message_count\n            FROM sessions\n            ORDER BY updated_at DESC\n            LIMIT ?\n        ''', (limit,))\n        \n        sessions = [dict(row) for row in cursor.fetchall()]\n        conn.close()\n        \n        return sessions\n    \n    def get_session(self, session_id: str) -> Optional[Dict]:\n        \"\"\"Get a specific session\"\"\"\n        conn = sqlite3.connect(self.db_path)\n        conn.row_factory = sqlite3.Row\n        \n        cursor = conn.execute('''\n            SELECT id, title, created_at, updated_at, model_used, message_count\n            FROM sessions\n            WHERE id = ?\n        ''', (session_id,))\n        \n        row = cursor.fetchone()\n        conn.close()\n        \n        return dict(row) if row else None\n    \n    def add_message(self, session_id: str, content: str, sender: str, metadata: Dict = None) -> str:\n        \"\"\"Add a message to a session\"\"\"\n        message_id = str(uuid.uuid4())\n        now = datetime.now().isoformat()\n        metadata_json = json.dumps(metadata or {})\n 
       \n        conn = sqlite3.connect(self.db_path)\n        \n        # Add the message\n        conn.execute('''\n            INSERT INTO messages (id, session_id, content, sender, timestamp, metadata)\n            VALUES (?, ?, ?, ?, ?, ?)\n        ''', (message_id, session_id, content, sender, now, metadata_json))\n        \n        # Update session timestamp and message count\n        conn.execute('''\n            UPDATE sessions \n            SET updated_at = ?, \n                message_count = message_count + 1\n            WHERE id = ?\n        ''', (now, session_id))\n        \n        conn.commit()\n        conn.close()\n        \n        return message_id\n    \n    def get_messages(self, session_id: str, limit: int = 100) -> List[Dict]:\n        \"\"\"Get all messages for a session\"\"\"\n        conn = sqlite3.connect(self.db_path)\n        conn.row_factory = sqlite3.Row\n        \n        cursor = conn.execute('''\n            SELECT id, content, sender, timestamp, metadata\n            FROM messages\n            WHERE session_id = ?\n            ORDER BY timestamp ASC\n            LIMIT ?\n        ''', (session_id, limit))\n        \n        messages = []\n        for row in cursor.fetchall():\n            message = dict(row)\n            message['metadata'] = json.loads(message['metadata'])\n            messages.append(message)\n        \n        conn.close()\n        return messages\n    \n    def get_conversation_history(self, session_id: str) -> List[Dict]:\n        \"\"\"Get conversation history in the format expected by Ollama\"\"\"\n        messages = self.get_messages(session_id)\n        \n        history = []\n        for msg in messages:\n            history.append({\n                \"role\": msg[\"sender\"],\n                \"content\": msg[\"content\"]\n            })\n        \n        return history\n    \n    def update_session_title(self, session_id: str, title: str):\n        \"\"\"Update session title\"\"\"\n        conn = 
sqlite3.connect(self.db_path)\n        conn.execute('''\n            UPDATE sessions \n            SET title = ?, updated_at = ?\n            WHERE id = ?\n        ''', (title, datetime.now().isoformat(), session_id))\n        conn.commit()\n        conn.close()\n    \n    def delete_session(self, session_id: str) -> bool:\n        \"\"\"Delete a session and all its messages\"\"\"\n        conn = sqlite3.connect(self.db_path)\n        # SQLite leaves foreign keys off on each new connection, so ON DELETE CASCADE\n        # does not fire here; remove child rows explicitly before the session row.\n        conn.execute('DELETE FROM messages WHERE session_id = ?', (session_id,))\n        conn.execute('DELETE FROM session_documents WHERE session_id = ?', (session_id,))\n        conn.execute('DELETE FROM session_indexes WHERE session_id = ?', (session_id,))\n        cursor = conn.execute('DELETE FROM sessions WHERE id = ?', (session_id,))\n        deleted = cursor.rowcount > 0\n        conn.commit()\n        conn.close()\n        \n        if deleted:\n            print(f\"🗑️ Deleted session: {session_id[:8]}...\")\n        \n        return deleted\n    \n    def cleanup_empty_sessions(self) -> int:\n        \"\"\"Remove sessions with no messages\"\"\"\n        conn = sqlite3.connect(self.db_path)\n        \n        # Find sessions with no messages\n        cursor = conn.execute('''\n            SELECT s.id FROM sessions s\n            LEFT JOIN messages m ON s.id = m.session_id\n            WHERE m.id IS NULL\n        ''')\n        \n        empty_sessions = [row[0] for row in cursor.fetchall()]\n        \n        # Delete empty sessions\n        deleted_count = 0\n        for session_id in empty_sessions:\n            cursor = conn.execute('DELETE FROM sessions WHERE id = ?', (session_id,))\n            if cursor.rowcount > 0:\n                deleted_count += 1\n                print(f\"🗑️ Cleaned up empty session: {session_id[:8]}...\")\n        \n        conn.commit()\n        conn.close()\n        \n        if deleted_count > 0:\n            print(f\"✨ Cleaned up {deleted_count} empty sessions\")\n        \n        return deleted_count\n    \n    def get_stats(self) -> Dict:\n        \"\"\"Get database statistics\"\"\"\n        conn = sqlite3.connect(self.db_path)\n        \n        # Get session count\n        cursor = conn.execute('SELECT COUNT(*) FROM sessions')\n        session_count = 
cursor.fetchone()[0]\n        \n        # Get message count\n        cursor = conn.execute('SELECT COUNT(*) FROM messages')\n        message_count = cursor.fetchone()[0]\n        \n        # Get most used model\n        cursor = conn.execute('''\n            SELECT model_used, COUNT(*) as count\n            FROM sessions\n            GROUP BY model_used\n            ORDER BY count DESC\n            LIMIT 1\n        ''')\n        most_used_model = cursor.fetchone()\n        \n        conn.close()\n        \n        return {\n            \"total_sessions\": session_count,\n            \"total_messages\": message_count,\n            \"most_used_model\": most_used_model[0] if most_used_model else None\n        }\n\n    def add_document_to_session(self, session_id: str, file_path: str) -> int:\n        \"\"\"Adds a document file path to a session.\"\"\"\n        conn = sqlite3.connect(self.db_path)\n        cursor = conn.execute(\n            \"INSERT INTO session_documents (session_id, file_path) VALUES (?, ?)\",\n            (session_id, file_path)\n        )\n        doc_id = cursor.lastrowid\n        conn.commit()\n        conn.close()\n        print(f\"📄 Added document '{file_path}' to session {session_id[:8]}...\")\n        return doc_id\n\n    def get_documents_for_session(self, session_id: str) -> List[str]:\n        \"\"\"Retrieves all document file paths for a given session.\"\"\"\n        conn = sqlite3.connect(self.db_path)\n        cursor = conn.execute(\n            \"SELECT file_path FROM session_documents WHERE session_id = ?\",\n            (session_id,)\n        )\n        paths = [row[0] for row in cursor.fetchall()]\n        conn.close()\n        return paths\n\n    # -------- Index helpers ---------\n\n    def create_index(self, name: str, description: str|None = None, metadata: dict | None = None) -> str:\n        idx_id = str(uuid.uuid4())\n        created = datetime.now().isoformat()\n        vector_table = f\"text_pages_{idx_id}\"\n        conn 
= sqlite3.connect(self.db_path)\n        conn.execute('''\n            INSERT INTO indexes (id, name, description, created_at, updated_at, vector_table_name, metadata)\n            VALUES (?,?,?,?,?,?,?)\n        ''', (idx_id, name, description, created, created, vector_table, json.dumps(metadata or {})))\n        conn.commit()\n        conn.close()\n        print(f\"📂 Created new index '{name}' ({idx_id[:8]})\")\n        return idx_id\n\n    def get_index(self, index_id: str) -> dict | None:\n        conn = sqlite3.connect(self.db_path)\n        conn.row_factory = sqlite3.Row\n        cur = conn.execute('SELECT * FROM indexes WHERE id=?', (index_id,))\n        row = cur.fetchone()\n        if not row:\n            conn.close()\n            return None\n        idx = dict(row)\n        idx['metadata'] = json.loads(idx['metadata'] or '{}')\n        cur = conn.execute('SELECT original_filename, stored_path FROM index_documents WHERE index_id=?', (index_id,))\n        docs = [{'filename': r[0], 'stored_path': r[1]} for r in cur.fetchall()]\n        idx['documents'] = docs\n        conn.close()\n        return idx\n\n    def list_indexes(self) -> list[dict]:\n        conn = sqlite3.connect(self.db_path)\n        conn.row_factory = sqlite3.Row\n        rows = conn.execute('SELECT * FROM indexes').fetchall()\n        res = []\n        for r in rows:\n            item = dict(r)\n            item['metadata'] = json.loads(item['metadata'] or '{}')\n            # attach documents list for convenience\n            docs_cur = conn.execute('SELECT original_filename, stored_path FROM index_documents WHERE index_id=?', (item['id'],))\n            docs = [{'filename':d[0],'stored_path':d[1]} for d in docs_cur.fetchall()]\n            item['documents'] = docs\n            res.append(item)\n        conn.close()\n        return res\n\n    def add_document_to_index(self, index_id: str, filename: str, stored_path: str):\n        conn = sqlite3.connect(self.db_path)\n        
conn.execute('INSERT INTO index_documents (index_id, original_filename, stored_path) VALUES (?,?,?)', (index_id, filename, stored_path))\n        conn.commit()\n        conn.close()\n\n    def link_index_to_session(self, session_id: str, index_id: str):\n        conn = sqlite3.connect(self.db_path)\n        conn.execute('INSERT INTO session_indexes (session_id, index_id, linked_at) VALUES (?,?,?)', (session_id, index_id, datetime.now().isoformat()))\n        conn.commit()\n        conn.close()\n\n    def get_indexes_for_session(self, session_id: str) -> list[str]:\n        conn = sqlite3.connect(self.db_path)\n        cursor = conn.execute('SELECT index_id FROM session_indexes WHERE session_id=? ORDER BY linked_at', (session_id,))\n        ids = [r[0] for r in cursor.fetchall()]\n        conn.close()\n        return ids\n\n    def delete_index(self, index_id: str) -> bool:\n        \"\"\"Delete an index and its related records (documents, session links). Returns True if deleted.\"\"\"\n        conn = sqlite3.connect(self.db_path)\n        try:\n            # Get vector table name before deletion (optional, for LanceDB cleanup)\n            cur = conn.execute('SELECT vector_table_name FROM indexes WHERE id = ?', (index_id,))\n            row = cur.fetchone()\n            vector_table_name = row[0] if row else None\n\n            # Remove child rows first due to foreign‐key constraints\n            conn.execute('DELETE FROM index_documents WHERE index_id = ?', (index_id,))\n            conn.execute('DELETE FROM session_indexes WHERE index_id = ?', (index_id,))\n            cursor = conn.execute('DELETE FROM indexes WHERE id = ?', (index_id,))\n            deleted = cursor.rowcount > 0\n            conn.commit()\n        finally:\n            conn.close()\n\n        if deleted:\n            print(f\"🗑️ Deleted index {index_id[:8]}... 
and related records\")\n            # Optional: attempt to drop LanceDB table if available\n            if vector_table_name:\n                try:\n                    from rag_system.indexing.embedders import LanceDBManager\n                    import os\n                    db_path = os.getenv('LANCEDB_PATH') or './rag_system/index_store/lancedb'\n                    ldb = LanceDBManager(db_path)\n                    db = ldb.db\n                    if hasattr(db, 'table_names') and vector_table_name in db.table_names():\n                        db.drop_table(vector_table_name)\n                        print(f\"🚮 Dropped LanceDB table '{vector_table_name}'\")\n                except Exception as e:\n                    print(f\"⚠️ Could not drop LanceDB table '{vector_table_name}': {e}\")\n        return deleted\n\n    def update_index_metadata(self, index_id: str, updates: dict):\n        \"\"\"Merge new key/values into an index's metadata JSON column.\"\"\"\n        conn = sqlite3.connect(self.db_path)\n        conn.row_factory = sqlite3.Row\n        cur = conn.execute('SELECT metadata FROM indexes WHERE id=?', (index_id,))\n        row = cur.fetchone()\n        if row is None:\n            conn.close()\n            raise ValueError(\"Index not found\")\n        existing = json.loads(row['metadata'] or '{}')\n        existing.update(updates)\n        conn.execute('UPDATE indexes SET metadata=?, updated_at=? 
WHERE id=?', (json.dumps(existing), datetime.now().isoformat(), index_id))\n        conn.commit()\n        conn.close()\n\n    def inspect_and_populate_index_metadata(self, index_id: str) -> dict:\n        \"\"\"\n        Inspect LanceDB table to extract metadata for older indexes.\n        Returns the inferred metadata or empty dict if inspection fails.\n        \"\"\"\n        try:\n            # Get index info\n            index_info = self.get_index(index_id)\n            if not index_info:\n                return {}\n            \n            # Check if metadata is already populated\n            if index_info.get('metadata') and len(index_info['metadata']) > 0:\n                return index_info['metadata']\n            \n            # Try to inspect the LanceDB table\n            vector_table_name = index_info.get('vector_table_name')\n            if not vector_table_name:\n                return {}\n            \n            try:\n                # Try to import the RAG system modules\n                try:\n                    from rag_system.indexing.embedders import LanceDBManager\n                    import os\n                    \n                    # Use the same path as the system\n                    db_path = os.getenv('LANCEDB_PATH') or './rag_system/index_store/lancedb'\n                    ldb = LanceDBManager(db_path)\n                    \n                    # Check if table exists\n                    if not hasattr(ldb.db, 'table_names') or vector_table_name not in ldb.db.table_names():\n                        # Table doesn't exist - this means the index was never properly built\n                        inferred_metadata = {\n                            'status': 'incomplete',\n                            'issue': 'Vector table not found - index may not have been built properly',\n                            'vector_table_expected': vector_table_name,\n                            'available_tables': list(ldb.db.table_names()) if 
hasattr(ldb.db, 'table_names') else [],\n                            'metadata_inferred_at': datetime.now().isoformat(),\n                            'metadata_source': 'lancedb_inspection'\n                        }\n                        self.update_index_metadata(index_id, inferred_metadata)\n                        print(f\"⚠️ Index {index_id[:8]}... appears incomplete - vector table missing\")\n                        return inferred_metadata\n                    \n                    # Get table and inspect schema/data\n                    table = ldb.db.open_table(vector_table_name)\n                    \n                    # Get a sample record to inspect - use correct LanceDB API\n                    try:\n                        # Try to get sample data using proper LanceDB methods\n                        sample_df = table.to_pandas()\n                        if len(sample_df) == 0:\n                            inferred_metadata = {\n                                'status': 'empty',\n                                'issue': 'Vector table exists but contains no data',\n                                'metadata_inferred_at': datetime.now().isoformat(),\n                                'metadata_source': 'lancedb_inspection'\n                            }\n                            self.update_index_metadata(index_id, inferred_metadata)\n                            return inferred_metadata\n                        \n                        # Take only first row for inspection\n                        sample_df = sample_df.head(1)\n                    except Exception as e:\n                        print(f\"⚠️ Could not read data from table {vector_table_name}: {e}\")\n                        return {}\n                    \n                    # Infer metadata from table structure\n                    inferred_metadata = {\n                        'status': 'functional',\n                        'total_chunks': len(table.to_pandas()),  # Get total 
count\n                    }\n                    \n                    # Check vector dimensions\n                    if 'vector' in sample_df.columns:\n                        vector_data = sample_df['vector'].iloc[0]\n                        # Accept any sized sequence: pandas may return the vector as a numpy array, not a list\n                        if hasattr(vector_data, '__len__'):\n                            inferred_metadata['vector_dimensions'] = len(vector_data)\n                            \n                            # Try to infer embedding model from vector dimensions\n                            dim_to_model = {\n                                384: 'BAAI/bge-small-en-v1.5 (or similar)',\n                                512: 'sentence-transformers/distiluse-base-multilingual-cased-v2 (or similar)',\n                                768: 'BAAI/bge-base-en-v1.5 (or similar)', \n                                1024: 'Qwen/Qwen3-Embedding-0.6B (or similar)',\n                                1536: 'text-embedding-ada-002 (or similar)'\n                            }\n                            if len(vector_data) in dim_to_model:\n                                inferred_metadata['embedding_model_inferred'] = dim_to_model[len(vector_data)]\n                    \n                    # Try to parse metadata from sample record\n                    if 'metadata' in sample_df.columns:\n                        try:\n                            sample_metadata = json.loads(sample_df['metadata'].iloc[0])\n                            # Look for common metadata fields that might give us clues\n                            if 'document_id' in sample_metadata:\n                                inferred_metadata['has_document_structure'] = True\n                            if 'chunk_index' in sample_metadata:\n                                inferred_metadata['has_chunk_indexing'] = True\n                            if 'original_text' in sample_metadata:\n                                inferred_metadata['has_contextual_enrichment'] = True\n                                
inferred_metadata['retrieval_mode_inferred'] = 'hybrid (contextual enrichment detected)'\n                            \n                            # Check for chunk size patterns\n                            if 'text' in sample_df.columns:\n                                text_length = len(sample_df['text'].iloc[0])\n                                if text_length > 0:\n                                    inferred_metadata['sample_chunk_length'] = text_length\n                                    # Rough chunk size estimation\n                                    estimated_tokens = text_length // 4  # rough estimate: 4 chars per token\n                                    if estimated_tokens < 300:\n                                        inferred_metadata['chunk_size_inferred'] = '256 tokens (estimated)'\n                                    elif estimated_tokens < 600:\n                                        inferred_metadata['chunk_size_inferred'] = '512 tokens (estimated)'\n                                    else:\n                                        inferred_metadata['chunk_size_inferred'] = '1024+ tokens (estimated)'\n                                        \n                        except (json.JSONDecodeError, KeyError):\n                            pass\n                    \n                    # Check if FTS index exists\n                    try:\n                        indices = table.list_indices()\n                        fts_exists = any('fts' in idx.name.lower() for idx in indices)\n                        if fts_exists:\n                            inferred_metadata['has_fts_index'] = True\n                            inferred_metadata['retrieval_mode_inferred'] = 'hybrid (FTS + vector)'\n                        else:\n                            inferred_metadata['retrieval_mode_inferred'] = 'vector-only'\n                    except:\n                        pass\n                    \n                    # Add inspection timestamp\n            
        inferred_metadata['metadata_inferred_at'] = datetime.now().isoformat()\n                    inferred_metadata['metadata_source'] = 'lancedb_inspection'\n                    \n                    # Update the database with inferred metadata\n                    if inferred_metadata:\n                        self.update_index_metadata(index_id, inferred_metadata)\n                        print(f\"🔍 Inferred metadata for index {index_id[:8]}...: {len(inferred_metadata)} fields\")\n                    \n                    return inferred_metadata\n                    \n                except ImportError as import_error:\n                    # RAG system modules not available - provide basic fallback metadata\n                    print(f\"⚠️ RAG system modules not available for inspection: {import_error}\")\n                    \n                    # Check if this is actually a legacy index by looking at creation date\n                    created_at = index_info.get('created_at', '')\n                    is_recent = False\n                    if created_at:\n                        try:\n                            # Import only timedelta: re-importing datetime here would make it a\n                            # function-local name and break the earlier datetime.now() calls\n                            from datetime import timedelta\n                            created_date = datetime.fromisoformat(created_at.replace('Z', '+00:00'))\n                            # Consider indexes created in the last 30 days as \"recent\"\n                            is_recent = created_date > datetime.now().replace(tzinfo=created_date.tzinfo) - timedelta(days=30)\n                        except Exception:\n                            pass\n                    \n                    # Provide basic fallback metadata with better status detection\n                    if is_recent:\n                        status = 'functional'\n                        issue = 'Detailed configuration inspection requires RAG system modules, but index appears functional'\n                    else:\n                        status = 'legacy'\n                        issue = 'This index was 
created before metadata tracking was implemented. Configuration details are not available.'\n                    \n                    fallback_metadata = {\n                        'status': status,\n                        'issue': issue,\n                        'metadata_inferred_at': datetime.now().isoformat(),\n                        'metadata_source': 'fallback_inspection',\n                        'documents_count': len(index_info.get('documents', [])),\n                        'created_at': index_info.get('created_at', 'unknown'),\n                        'inspection_limitation': 'Backend server cannot access full RAG system modules for detailed inspection'\n                    }\n                    \n                    # Try to infer some basic info from the vector table name\n                    if vector_table_name:\n                        fallback_metadata['vector_table_name'] = vector_table_name\n                        fallback_metadata['note'] = 'Vector table exists but detailed inspection requires RAG system modules'\n                    \n                    self.update_index_metadata(index_id, fallback_metadata)\n                    status_msg = \"recent but limited inspection\" if is_recent else \"legacy\"\n                    print(f\"📝 Added fallback metadata for {status_msg} index {index_id[:8]}...\")\n                    return fallback_metadata\n                    \n            except Exception as e:\n                print(f\"⚠️ Could not inspect LanceDB table for index {index_id[:8]}...: {e}\")\n                return {}\n                \n        except Exception as e:\n            print(f\"⚠️ Failed to inspect index metadata for {index_id[:8]}...: {e}\")\n            return {}\n\ndef generate_session_title(first_message: str, max_length: int = 50) -> str:\n    \"\"\"Generate a session title from the first message\"\"\"\n    # Clean up the message\n    title = first_message.strip()\n    \n    # Remove common prefixes\n    prefixes = 
[\"hey\", \"hi\", \"hello\", \"can you\", \"please\", \"i want\", \"i need\"]\n    title_lower = title.lower()\n    for prefix in prefixes:\n        remainder = title_lower[len(prefix):]\n        # Strip whole-word prefixes only, so \"history\" does not lose its \"hi\"\n        if title_lower.startswith(prefix) and not remainder[:1].isalnum():\n            title = title[len(prefix):].lstrip(\" ,.!?:;-\").strip()\n            break\n    \n    # Capitalize first letter\n    if title:\n        title = title[0].upper() + title[1:]\n    \n    # Truncate if too long\n    if len(title) > max_length:\n        title = title[:max_length].strip() + \"...\"\n    \n    # Fallback\n    if not title or len(title) < 3:\n        title = \"New Chat\"\n    \n    return title\n\n# Global database instance\ndb = ChatDatabase()\n\nif __name__ == \"__main__\":\n    # Test the database\n    print(\"🧪 Testing database...\")\n    \n    # Create a test session\n    session_id = db.create_session(\"Test Chat\", \"llama3.2:latest\")\n    \n    # Add some messages\n    db.add_message(session_id, \"Hello!\", \"user\")\n    db.add_message(session_id, \"Hi there! How can I help you?\", \"assistant\")\n    \n    # Get messages\n    messages = db.get_messages(session_id)\n    print(f\"📨 Messages: {len(messages)}\")\n    \n    # Get sessions\n    sessions = db.get_sessions()\n    print(f\"📋 Sessions: {len(sessions)}\")\n    \n    # Get stats\n    stats = db.get_stats()\n    print(f\"📊 Stats: {stats}\")\n    \n    print(\"✅ Database test completed!\")  "
  },
  {
    "path": "backend/ollama_client.py",
    "content": "import requests\nimport json\nimport os\nfrom typing import List, Dict, Optional\n\nclass OllamaClient:\n    def __init__(self, base_url: Optional[str] = None):\n        if base_url is None:\n            base_url = os.getenv(\"OLLAMA_HOST\", \"http://localhost:11434\")\n        self.base_url = base_url\n        self.api_url = f\"{base_url}/api\"\n    \n    def is_ollama_running(self) -> bool:\n        \"\"\"Check if Ollama server is running\"\"\"\n        try:\n            response = requests.get(f\"{self.base_url}/api/tags\", timeout=5)\n            return response.status_code == 200\n        except requests.exceptions.RequestException:\n            return False\n    \n    def list_models(self) -> List[str]:\n        \"\"\"Get list of available models\"\"\"\n        try:\n            response = requests.get(f\"{self.api_url}/tags\")\n            if response.status_code == 200:\n                models = response.json().get(\"models\", [])\n                return [model[\"name\"] for model in models]\n            return []\n        except requests.exceptions.RequestException as e:\n            print(f\"Error fetching models: {e}\")\n            return []\n    \n    def pull_model(self, model_name: str) -> bool:\n        \"\"\"Pull a model if not available\"\"\"\n        try:\n            response = requests.post(\n                f\"{self.api_url}/pull\",\n                json={\"name\": model_name},\n                stream=True\n            )\n            \n            if response.status_code == 200:\n                print(f\"Pulling model {model_name}...\")\n                for line in response.iter_lines():\n                    if line:\n                        data = json.loads(line)\n                        if \"status\" in data:\n                            print(f\"Status: {data['status']}\")\n                        if data.get(\"status\") == \"success\":\n                            return True\n                return True\n            
return False\n        except requests.exceptions.RequestException as e:\n            print(f\"Error pulling model: {e}\")\n            return False\n    \n    def chat(self, message: str, model: str = \"llama3.2\", conversation_history: List[Dict] = None, enable_thinking: bool = True) -> str:\n        \"\"\"Send a chat message to Ollama\"\"\"\n        if conversation_history is None:\n            conversation_history = []\n        \n        # Add user message to conversation\n        messages = conversation_history + [{\"role\": \"user\", \"content\": message}]\n        \n        try:\n            payload = {\n                \"model\": model,\n                \"messages\": messages,\n                \"stream\": False,\n            }\n            \n            # Multiple approaches to disable thinking tokens\n            if not enable_thinking:\n                payload.update({\n                    \"think\": False,  # Native Ollama parameter\n                    \"options\": {\n                        \"think\": False,\n                        \"thinking\": False,\n                        \"temperature\": 0.7,\n                        \"top_p\": 0.9\n                    }\n                })\n            else:\n                payload[\"think\"] = True\n            \n            response = requests.post(\n                f\"{self.api_url}/chat\",\n                json=payload,\n                timeout=60\n            )\n            \n            if response.status_code == 200:\n                result = response.json()\n                response_text = result[\"message\"][\"content\"]\n                \n                # Additional cleanup: remove any thinking tokens that might slip through\n                if not enable_thinking:\n                    # Remove common thinking token patterns\n                    import re\n                    response_text = re.sub(r'<think>.*?</think>', '', response_text, flags=re.DOTALL | re.IGNORECASE)\n                    
response_text = re.sub(r'<thinking>.*?</thinking>', '', response_text, flags=re.DOTALL | re.IGNORECASE)\n                    response_text = response_text.strip()\n                \n                return response_text\n            else:\n                return f\"Error: {response.status_code} - {response.text}\"\n                \n        except requests.exceptions.RequestException as e:\n            return f\"Connection error: {e}\"\n    \n    def chat_stream(self, message: str, model: str = \"llama3.2\", conversation_history: List[Dict] = None, enable_thinking: bool = True):\n        \"\"\"Stream chat response from Ollama\"\"\"\n        if conversation_history is None:\n            conversation_history = []\n        \n        messages = conversation_history + [{\"role\": \"user\", \"content\": message}]\n        \n        try:\n            payload = {\n                \"model\": model,\n                \"messages\": messages,\n                \"stream\": True,\n            }\n            \n            # Multiple approaches to disable thinking tokens\n            if not enable_thinking:\n                payload.update({\n                    \"think\": False,  # Native Ollama parameter\n                    \"options\": {\n                        \"think\": False,\n                        \"thinking\": False,\n                        \"temperature\": 0.7,\n                        \"top_p\": 0.9\n                    }\n                })\n            else:\n                payload[\"think\"] = True\n            \n            response = requests.post(\n                f\"{self.api_url}/chat\",\n                json=payload,\n                stream=True,\n                timeout=60\n            )\n            \n            if response.status_code == 200:\n                for line in response.iter_lines():\n                    if line:\n                        try:\n                            data = json.loads(line)\n                            if \"message\" in data 
and \"content\" in data[\"message\"]:\n                                content = data[\"message\"][\"content\"]\n                                \n                                # Filter out thinking tokens in streaming mode\n                                if not enable_thinking:\n                                    # Skip content that looks like thinking tokens\n                                    if '<think>' in content.lower() or '<thinking>' in content.lower():\n                                        continue\n                                \n                                yield content\n                        except json.JSONDecodeError:\n                            continue\n            else:\n                yield f\"Error: {response.status_code} - {response.text}\"\n                \n        except requests.exceptions.RequestException as e:\n            yield f\"Connection error: {e}\"\n\ndef main():\n    \"\"\"Test the Ollama client\"\"\"\n    client = OllamaClient()\n    \n    # Check if Ollama is running\n    if not client.is_ollama_running():\n        print(\"❌ Ollama is not running. Please start Ollama first.\")\n        print(\"Install: https://ollama.ai\")\n        print(\"Run: ollama serve\")\n        return\n    \n    print(\"✅ Ollama is running!\")\n    \n    # List available models\n    models = client.list_models()\n    print(f\"Available models: {models}\")\n    \n    # Try to use llama3.2, pull if needed\n    model_name = \"llama3.2\"\n    if model_name not in [m.split(\":\")[0] for m in models]:\n        print(f\"Model {model_name} not found. Pulling...\")\n        if client.pull_model(model_name):\n            print(f\"✅ Model {model_name} pulled successfully!\")\n        else:\n            print(f\"❌ Failed to pull model {model_name}\")\n            return\n    \n    # Test chat\n    print(\"\\n🤖 Testing chat...\")\n    response = client.chat(\"Hello! 
Can you tell me a short joke?\", model_name)\n    print(f\"AI: {response}\")\n\nif __name__ == \"__main__\":\n    main()    "
  },
  {
    "path": "backend/requirements.txt",
    "content": "requests\npython-dotenv\nPyPDF2 "
  },
  {
    "path": "backend/server.py",
    "content": "import json\nimport http.server\nimport socketserver\nimport cgi\nimport os\nimport uuid\nfrom urllib.parse import urlparse, parse_qs\nimport requests  # 🆕 Import requests for making HTTP calls\nimport sys\nfrom datetime import datetime\n\n# Add parent directory to path so we can import rag_system modules\nsys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))\n\n# Import RAG system modules for complete metadata\ntry:\n    from rag_system.main import PIPELINE_CONFIGS\n    RAG_SYSTEM_AVAILABLE = True\n    print(\"✅ RAG system modules accessible from backend\")\nexcept ImportError as e:\n    PIPELINE_CONFIGS = {}\n    RAG_SYSTEM_AVAILABLE = False\n    print(f\"⚠️ RAG system modules not available: {e}\")\n\nfrom ollama_client import OllamaClient\nfrom database import db, generate_session_title\nimport simple_pdf_processor as pdf_module\nfrom simple_pdf_processor import initialize_simple_pdf_processor\nfrom typing import List, Dict, Any\nimport re\n\n# 🆕 Reusable TCPServer with address reuse enabled\nclass ReusableTCPServer(socketserver.TCPServer):\n    allow_reuse_address = True\n\nclass ChatHandler(http.server.BaseHTTPRequestHandler):\n    def __init__(self, *args, **kwargs):\n        self.ollama_client = OllamaClient()\n        super().__init__(*args, **kwargs)\n    \n    def do_OPTIONS(self):\n        \"\"\"Handle CORS preflight requests\"\"\"\n        self.send_response(200)\n        self.send_header('Access-Control-Allow-Origin', '*')\n        self.send_header('Access-Control-Allow-Methods', 'GET, POST, DELETE, OPTIONS')\n        self.send_header('Access-Control-Allow-Headers', 'Content-Type')\n        self.end_headers()\n    \n    def do_GET(self):\n        \"\"\"Handle GET requests\"\"\"\n        parsed_path = urlparse(self.path)\n        \n        if parsed_path.path == '/health':\n            self.send_json_response({\n                \"status\": \"ok\",\n                \"ollama_running\": 
self.ollama_client.is_ollama_running(),\n                \"available_models\": self.ollama_client.list_models(),\n                \"database_stats\": db.get_stats()\n            })\n        elif parsed_path.path == '/sessions':\n            self.handle_get_sessions()\n        elif parsed_path.path == '/sessions/cleanup':\n            self.handle_cleanup_sessions()\n        elif parsed_path.path == '/models':\n            self.handle_get_models()\n        elif parsed_path.path == '/indexes':\n            self.handle_get_indexes()\n        elif parsed_path.path.startswith('/indexes/') and parsed_path.path.count('/') == 2:\n            index_id = parsed_path.path.split('/')[-1]\n            self.handle_get_index(index_id)\n        elif parsed_path.path.startswith('/sessions/') and parsed_path.path.endswith('/documents'):\n            session_id = parsed_path.path.split('/')[-2]\n            self.handle_get_session_documents(session_id)\n        elif parsed_path.path.startswith('/sessions/') and parsed_path.path.endswith('/indexes'):\n            session_id = parsed_path.path.split('/')[-2]\n            self.handle_get_session_indexes(session_id)\n        elif parsed_path.path.startswith('/sessions/') and parsed_path.path.count('/') == 2:\n            session_id = parsed_path.path.split('/')[-1]\n            self.handle_get_session(session_id)\n        else:\n            self.send_response(404)\n            self.end_headers()\n    \n    def do_POST(self):\n        \"\"\"Handle POST requests\"\"\"\n        parsed_path = urlparse(self.path)\n        \n        if parsed_path.path == '/chat':\n            self.handle_chat()\n        elif parsed_path.path == '/sessions':\n            self.handle_create_session()\n        elif parsed_path.path == '/indexes':\n            self.handle_create_index()\n        elif parsed_path.path.startswith('/indexes/') and parsed_path.path.endswith('/upload'):\n            index_id = parsed_path.path.split('/')[-2]\n            
self.handle_index_file_upload(index_id)\n        elif parsed_path.path.startswith('/indexes/') and parsed_path.path.endswith('/build'):\n            index_id = parsed_path.path.split('/')[-2]\n            self.handle_build_index(index_id)\n        elif parsed_path.path.startswith('/sessions/') and '/indexes/' in parsed_path.path:\n            parts = parsed_path.path.split('/')\n            session_id = parts[2]\n            index_id = parts[4]\n            self.handle_link_index_to_session(session_id, index_id)\n        elif parsed_path.path.startswith('/sessions/') and parsed_path.path.endswith('/messages'):\n            session_id = parsed_path.path.split('/')[-2]\n            self.handle_session_chat(session_id)\n        elif parsed_path.path.startswith('/sessions/') and parsed_path.path.endswith('/upload'):\n            session_id = parsed_path.path.split('/')[-2]\n            self.handle_file_upload(session_id)\n        elif parsed_path.path.startswith('/sessions/') and parsed_path.path.endswith('/index'):\n            session_id = parsed_path.path.split('/')[-2]\n            self.handle_index_documents(session_id)\n        elif parsed_path.path.startswith('/sessions/') and parsed_path.path.endswith('/rename'):\n            session_id = parsed_path.path.split('/')[-2]\n            self.handle_rename_session(session_id)\n        else:\n            self.send_response(404)\n            self.end_headers()\n\n    def do_DELETE(self):\n        \"\"\"Handle DELETE requests\"\"\"\n        parsed_path = urlparse(self.path)\n        \n        if parsed_path.path.startswith('/sessions/') and parsed_path.path.count('/') == 2:\n            session_id = parsed_path.path.split('/')[-1]\n            self.handle_delete_session(session_id)\n        elif parsed_path.path.startswith('/indexes/') and parsed_path.path.count('/') == 2:\n            index_id = parsed_path.path.split('/')[-1]\n            self.handle_delete_index(index_id)\n        else:\n            
self.send_response(404)\n            self.end_headers()\n    \n    def handle_chat(self):\n        \"\"\"Handle legacy chat requests (without sessions)\"\"\"\n        try:\n            content_length = int(self.headers['Content-Length'])\n            post_data = self.rfile.read(content_length)\n            data = json.loads(post_data.decode('utf-8'))\n            \n            message = data.get('message', '')\n            model = data.get('model', 'llama3.2:latest')\n            conversation_history = data.get('conversation_history', [])\n            \n            if not message:\n                self.send_json_response({\n                    \"error\": \"Message is required\"\n                }, status_code=400)\n                return\n            \n            # Check if Ollama is running\n            if not self.ollama_client.is_ollama_running():\n                self.send_json_response({\n                    \"error\": \"Ollama is not running. Please start Ollama first.\"\n                }, status_code=503)\n                return\n            \n            # Get response from Ollama\n            response = self.ollama_client.chat(message, model, conversation_history)\n            \n            self.send_json_response({\n                \"response\": response,\n                \"model\": model,\n                \"message_count\": len(conversation_history) + 1\n            })\n            \n        except json.JSONDecodeError:\n            self.send_json_response({\n                \"error\": \"Invalid JSON\"\n            }, status_code=400)\n        except Exception as e:\n            self.send_json_response({\n                \"error\": f\"Server error: {str(e)}\"\n            }, status_code=500)\n    \n    def handle_get_sessions(self):\n        \"\"\"Get all chat sessions\"\"\"\n        try:\n            sessions = db.get_sessions()\n            self.send_json_response({\n                \"sessions\": sessions,\n                \"total\": len(sessions)\n  
          })\n        except Exception as e:\n            self.send_json_response({\n                \"error\": f\"Failed to get sessions: {str(e)}\"\n            }, status_code=500)\n    \n    def handle_cleanup_sessions(self):\n        \"\"\"Clean up empty sessions\"\"\"\n        try:\n            cleanup_count = db.cleanup_empty_sessions()\n            self.send_json_response({\n                \"message\": f\"Cleaned up {cleanup_count} empty sessions\",\n                \"cleanup_count\": cleanup_count\n            })\n        except Exception as e:\n            self.send_json_response({\n                \"error\": f\"Failed to cleanup sessions: {str(e)}\"\n            }, status_code=500)\n    \n    def handle_get_session(self, session_id: str):\n        \"\"\"Get a specific session with its messages\"\"\"\n        try:\n            session = db.get_session(session_id)\n            if not session:\n                self.send_json_response({\n                    \"error\": \"Session not found\"\n                }, status_code=404)\n                return\n            \n            messages = db.get_messages(session_id)\n            \n            self.send_json_response({\n                \"session\": session,\n                \"messages\": messages\n            })\n        except Exception as e:\n            self.send_json_response({\n                \"error\": f\"Failed to get session: {str(e)}\"\n            }, status_code=500)\n    \n    def handle_get_session_documents(self, session_id: str):\n        \"\"\"Return documents and basic info for a session.\"\"\"\n        try:\n            session = db.get_session(session_id)\n            if not session:\n                self.send_json_response({\"error\": \"Session not found\"}, status_code=404)\n                return\n\n            docs = db.get_documents_for_session(session_id)\n\n            # Extract original filenames from stored paths\n            filenames = [os.path.basename(p).split('_', 1)[-1] if '_' 
in os.path.basename(p) else os.path.basename(p) for p in docs]\n\n            self.send_json_response({\n                \"session\": session,\n                \"files\": filenames,\n                \"file_count\": len(docs)\n            })\n        except Exception as e:\n            self.send_json_response({\"error\": f\"Failed to get documents: {str(e)}\"}, status_code=500)\n    \n    def handle_create_session(self):\n        \"\"\"Create a new chat session\"\"\"\n        try:\n            content_length = int(self.headers['Content-Length'])\n            post_data = self.rfile.read(content_length)\n            data = json.loads(post_data.decode('utf-8'))\n            \n            title = data.get('title', 'New Chat')\n            model = data.get('model', 'llama3.2:latest')\n            \n            session_id = db.create_session(title, model)\n            session = db.get_session(session_id)\n            \n            self.send_json_response({\n                \"session\": session,\n                \"session_id\": session_id\n            }, status_code=201)\n            \n        except json.JSONDecodeError:\n            self.send_json_response({\n                \"error\": \"Invalid JSON\"\n            }, status_code=400)\n        except Exception as e:\n            self.send_json_response({\n                \"error\": f\"Failed to create session: {str(e)}\"\n            }, status_code=500)\n    \n    def handle_session_chat(self, session_id: str):\n        \"\"\"\n        Handle chat within a specific session.\n        Intelligently routes between direct LLM (fast) and RAG pipeline (document-aware).\n        \"\"\"\n        try:\n            session = db.get_session(session_id)\n            if not session:\n                self.send_json_response({\"error\": \"Session not found\"}, status_code=404)\n                return\n            \n            content_length = int(self.headers['Content-Length'])\n            post_data = 
self.rfile.read(content_length)\n            data = json.loads(post_data.decode('utf-8'))\n            message = data.get('message', '')\n\n            if not message:\n                self.send_json_response({\"error\": \"Message is required\"}, status_code=400)\n                return\n\n            if session['message_count'] == 0:\n                title = generate_session_title(message)\n                db.update_session_title(session_id, title)\n\n            # Add user message to database first\n            user_message_id = db.add_message(session_id, message, \"user\")\n            \n            # 🎯 SMART ROUTING: Decide between direct LLM vs RAG\n            idx_ids = db.get_indexes_for_session(session_id)\n            force_rag = bool(data.get(\"force_rag\", False))\n            use_rag = True if force_rag else self._should_use_rag(message, idx_ids)\n            \n            if use_rag:\n                # 🔍 --- Use RAG Pipeline for Document-Related Queries ---\n                print(f\"🔍 Using RAG pipeline for document query: '{message[:50]}...'\")\n                response_text, source_docs = self._handle_rag_query(session_id, message, data, idx_ids)\n            else:\n                # ⚡ --- Use Direct LLM for General Queries (FAST) ---\n                print(f\"⚡ Using direct LLM for general query: '{message[:50]}...'\")\n                response_text, source_docs = self._handle_direct_llm_query(session_id, message, session)\n\n            # Add AI response to database\n            ai_message_id = db.add_message(session_id, response_text, \"assistant\")\n            \n            updated_session = db.get_session(session_id)\n            \n            # Send response with proper error handling\n            self.send_json_response({\n                \"response\": response_text,\n                \"session\": updated_session,\n                \"source_documents\": source_docs,\n                \"used_rag\": use_rag\n            })\n            \n        
except BrokenPipeError:\n            # Client disconnected - this is normal for long queries, just log it\n            print(f\"⚠️  Client disconnected during RAG processing for query: '{message[:30]}...'\")\n        except json.JSONDecodeError:\n            self.send_json_response({\n                \"error\": \"Invalid JSON\"\n            }, status_code=400)\n        except Exception as e:\n            print(f\"❌ Server error in session chat: {str(e)}\")\n            try:\n                self.send_json_response({\n                    \"error\": f\"Server error: {str(e)}\"\n                }, status_code=500)\n            except BrokenPipeError:\n                print(f\"⚠️  Client disconnected during error response\")\n    \n    def _should_use_rag(self, message: str, idx_ids: List[str]) -> bool:\n        \"\"\"\n        🧠 ENHANCED: Determine if a query should use RAG pipeline using document overviews.\n        \n        Args:\n            message: The user's query\n            idx_ids: List of index IDs associated with the session\n            \n        Returns:\n            bool: True if should use RAG, False for direct LLM\n        \"\"\"\n        # No indexes = definitely no RAG needed\n        if not idx_ids:\n            return False\n\n        # Load document overviews for intelligent routing\n        try:\n            doc_overviews = self._load_document_overviews(idx_ids)\n            if doc_overviews:\n                return self._route_using_overviews(message, doc_overviews)\n        except Exception as e:\n            print(f\"⚠️ Overview-based routing failed, falling back to simple routing: {e}\")\n        \n        # Fallback to simple pattern matching if overviews unavailable\n        return self._simple_pattern_routing(message, idx_ids)\n\n    def _load_document_overviews(self, idx_ids: List[str]) -> List[str]:\n        \"\"\"Load and aggregate overviews for the given index IDs.\n        \n        Strategy:\n        1. 
Attempt to load each index's dedicated overview file.\n        2. Aggregate all overviews found across available files (deduplicated).\n        3. If none of the index files exist, fall back to the legacy global overview file.\n        \"\"\"\n        import os, json\n\n        aggregated: list[str] = []\n\n        # 1️⃣  Collect overviews from per-index files\n        for idx in idx_ids:\n            candidate_paths = [\n                f\"../index_store/overviews/{idx}.jsonl\",\n                f\"index_store/overviews/{idx}.jsonl\",\n                f\"./index_store/overviews/{idx}.jsonl\",\n            ]\n            for p in candidate_paths:\n                if os.path.exists(p):\n                    print(f\"📖 Loading overviews from: {p}\")\n                    try:\n                        with open(p, \"r\", encoding=\"utf-8\") as f:\n                            for line in f:\n                                if not line.strip():\n                                    continue\n                                try:\n                                    record = json.loads(line)\n                                    overview = record.get(\"overview\", \"\").strip()\n                                    if overview:\n                                        aggregated.append(overview)\n                                except json.JSONDecodeError:\n                                    continue  # skip malformed lines\n                        break  # Stop after the first existing path for this idx\n                    except Exception as e:\n                        print(f\"⚠️ Error reading {p}: {e}\")\n                        break  # Don't keep trying other paths for this idx if read failed\n\n        # 2️⃣  Fall back to legacy global file if no per-index overviews found\n        if not aggregated:\n            legacy_paths = [\n                \"../index_store/overviews/overviews.jsonl\",\n                \"index_store/overviews/overviews.jsonl\",\n                
\"./index_store/overviews/overviews.jsonl\",\n            ]\n            for p in legacy_paths:\n                if os.path.exists(p):\n                    print(f\"⚠️ Falling back to legacy overviews file: {p}\")\n                    try:\n                        with open(p, \"r\", encoding=\"utf-8\") as f:\n                            for line in f:\n                                if not line.strip():\n                                    continue\n                                try:\n                                    record = json.loads(line)\n                                    overview = record.get(\"overview\", \"\").strip()\n                                    if overview:\n                                        aggregated.append(overview)\n                                except json.JSONDecodeError:\n                                    continue\n                    except Exception as e:\n                        print(f\"⚠️ Error reading legacy overviews file {p}: {e}\")\n                    break\n\n        # Limit for performance\n        if aggregated:\n            print(f\"✅ Loaded {len(aggregated)} document overviews from {len(idx_ids)} index(es)\")\n        else:\n            print(f\"⚠️ No overviews found for indices {idx_ids}\")\n        return aggregated[:40]\n\n    def _route_using_overviews(self, query: str, overviews: List[str]) -> bool:\n        \"\"\"\n        🎯 Use document overviews and LLM to make intelligent routing decisions.\n        \n        Returns True if RAG should be used, False for direct LLM.\n        \"\"\"\n        if not overviews:\n            return False\n        \n        # Format overviews for the routing prompt\n        overviews_block = \"\\n\".join(f\"[{i+1}] {ov}\" for i, ov in enumerate(overviews))\n        \n        router_prompt = f\"\"\"You are an AI router deciding whether a user question should be answered via:\n• \"USE_RAG\" – search the user's private documents (described below)  \n• \"DIRECT_LLM\" – 
reply from general knowledge (greetings, public facts, unrelated topics)\n\nCRITICAL PRINCIPLE: When documents exist in the KB, strongly prefer USE_RAG unless the query is purely conversational or completely unrelated to any possible document content.\n\nRULES:\n1. If ANY overview clearly relates to the question (entities, numbers, addresses, dates, amounts, companies, technical terms) → USE_RAG\n2. For document operations (summarize, analyze, explain, extract, find) → USE_RAG  \n3. For greetings only (\"Hi\", \"Hello\", \"Thanks\") → DIRECT_LLM\n4. For pure math/world knowledge clearly unrelated to documents → DIRECT_LLM\n5. When in doubt → USE_RAG\n\nDOCUMENT OVERVIEWS:\n{overviews_block}\n\nDECISION EXAMPLES:\n• \"What invoice amounts are mentioned?\" → USE_RAG (document-specific)\n• \"Who is PromptX AI LLC?\" → USE_RAG (entity in documents)  \n• \"What is the DeepSeek model?\" → USE_RAG (mentioned in documents)\n• \"Summarize the research paper\" → USE_RAG (document operation)\n• \"What is 2+2?\" → DIRECT_LLM (pure math)\n• \"Hi there\" → DIRECT_LLM (greeting only)\n\nUSER QUERY: \"{query}\"\n\nRespond with exactly one word: USE_RAG or DIRECT_LLM\"\"\"\n\n        try:\n            # Use Ollama to make the routing decision\n            response = self.ollama_client.chat(\n                message=router_prompt,\n                model=\"qwen3:0.6b\",  # Fast model for routing\n                enable_thinking=False  # Fast routing\n            )\n            \n            # The response is directly the text, not a dict\n            decision = response.strip().upper()\n            \n            # Parse decision\n            if \"USE_RAG\" in decision:\n                print(f\"🎯 Overview-based routing: USE_RAG for query: '{query[:50]}...'\")\n                return True\n            elif \"DIRECT_LLM\" in decision:\n                print(f\"⚡ Overview-based routing: DIRECT_LLM for query: '{query[:50]}...'\")\n                return False\n            else:\n         
       print(f\"⚠️ Unclear routing decision '{decision}', defaulting to RAG\")\n                return True  # Default to RAG when uncertain\n                \n        except Exception as e:\n            print(f\"❌ LLM routing failed: {e}, falling back to pattern matching\")\n            return self._simple_pattern_routing(query, [])\n\n    def _simple_pattern_routing(self, message: str, idx_ids: List[str]) -> bool:\n        \"\"\"\n        📝 FALLBACK: Simple pattern-based routing (original logic).\n        \"\"\"\n        message_lower = message.lower()\n        \n        # Always use Direct LLM for greetings and casual conversation\n        greeting_patterns = [\n            'hello', 'hi', 'hey', 'greetings', 'good morning', 'good afternoon', 'good evening',\n            'how are you', 'how do you do', 'nice to meet', 'pleasure to meet',\n            'thanks', 'thank you', 'bye', 'goodbye', 'see you', 'talk to you later',\n            'test', 'testing', 'check', 'ping', 'just saying', 'nevermind',\n            'ok', 'okay', 'alright', 'got it', 'understood', 'i see'\n        ]\n        \n        # Check for greeting patterns\n        for pattern in greeting_patterns:\n            if pattern in message_lower:\n                return False  # Use Direct LLM for greetings\n        \n        # Keywords that strongly suggest document-related queries\n        rag_indicators = [\n            'document', 'doc', 'file', 'pdf', 'text', 'content', 'page',\n            'according to', 'based on', 'mentioned', 'states', 'says',\n            'what does', 'summarize', 'summary', 'analyze', 'analysis',\n            'quote', 'citation', 'reference', 'source', 'evidence',\n            'explain from', 'extract', 'find in', 'search for'\n        ]\n        \n        # Check for strong RAG indicators\n        for indicator in rag_indicators:\n            if indicator in message_lower:\n                return True\n        \n        # Question words + substantial length might benefit 
from RAG\n        question_words = ['what', 'how', 'when', 'where', 'why', 'who', 'which']\n        starts_with_question = any(message_lower.startswith(word) for word in question_words)\n        \n        if starts_with_question and len(message) > 40:\n            return True\n        \n        # Very short messages - use direct LLM\n        if len(message.strip()) < 20:\n            return False\n        \n        # Default to Direct LLM unless there's clear indication of document query\n        return False\n    \n    def _handle_direct_llm_query(self, session_id: str, message: str, session: dict):\n        \"\"\"\n        Handle query using direct Ollama client with thinking disabled for speed.\n        \n        Returns:\n            tuple: (response_text, empty_source_docs)\n        \"\"\"\n        try:\n            # Get conversation history for context\n            conversation_history = db.get_conversation_history(session_id)\n            \n            # Use the session's model or default\n            model = session.get('model', 'qwen3:8b')  # Default to fast model\n            \n            # Direct Ollama call with thinking disabled for speed\n            response_text = self.ollama_client.chat(\n                message=message,\n                model=model,\n                conversation_history=conversation_history,\n                enable_thinking=False  # ⚡ DISABLE THINKING FOR SPEED\n            )\n            \n            return response_text, []  # No source docs for direct LLM\n            \n        except Exception as e:\n            print(f\"❌ Direct LLM error: {e}\")\n            return f\"Error processing query: {str(e)}\", []\n    \n    def _handle_rag_query(self, session_id: str, message: str, data: dict, idx_ids: List[str]):\n        \"\"\"\n        Handle query using the full RAG pipeline (delegates to the advanced RAG API running on port 8001).\n\n        Returns:\n            tuple[str, List[dict]]: (response_text, source_documents)\n   
     \"\"\"\n        # Defaults\n        response_text = \"\"\n        source_docs: List[dict] = []\n\n        # Build payload for RAG API\n        rag_api_url = \"http://localhost:8001/chat\"\n        table_name = f\"text_pages_{idx_ids[-1]}\" if idx_ids else None\n        payload: Dict[str, Any] = {\n            \"query\": message,\n            \"session_id\": session_id,\n        }\n        if table_name:\n            payload[\"table_name\"] = table_name\n\n        # Copy optional parameters from the incoming request\n        optional_params: Dict[str, tuple[type, str]] = {\n            \"compose_sub_answers\": (bool, \"compose_sub_answers\"),\n            \"query_decompose\": (bool, \"query_decompose\"),\n            \"ai_rerank\": (bool, \"ai_rerank\"),\n            \"context_expand\": (bool, \"context_expand\"),\n            \"verify\": (bool, \"verify\"),\n            \"retrieval_k\": (int, \"retrieval_k\"),\n            \"context_window_size\": (int, \"context_window_size\"),\n            \"reranker_top_k\": (int, \"reranker_top_k\"),\n            \"search_type\": (str, \"search_type\"),\n            \"dense_weight\": (float, \"dense_weight\"),\n            \"provence_prune\": (bool, \"provence_prune\"),\n            \"provence_threshold\": (float, \"provence_threshold\"),\n        }\n        for key, (caster, payload_key) in optional_params.items():\n            val = data.get(key)\n            if val is not None:\n                try:\n                    payload[payload_key] = caster(val)  # type: ignore[arg-type]\n                except Exception:\n                    payload[payload_key] = val\n\n        try:\n            rag_response = requests.post(rag_api_url, json=payload)\n            if rag_response.status_code == 200:\n                rag_data = rag_response.json()\n                response_text = rag_data.get(\"answer\", \"No answer found.\")\n                source_docs = rag_data.get(\"source_documents\", [])\n            else:\n              
  response_text = f\"Error from RAG API ({rag_response.status_code}): {rag_response.text}\"\n                print(f\"❌ RAG API error: {response_text}\")\n        except requests.exceptions.ConnectionError:\n            response_text = \"Could not connect to the RAG API server. Please ensure it is running.\"\n            print(\"❌ Connection to RAG API failed (port 8001).\")\n        except Exception as e:\n            response_text = f\"Error processing RAG query: {str(e)}\"\n            print(f\"❌ RAG processing error: {e}\")\n\n        # Strip any <think>/<thinking> tags that might slip through\n        response_text = re.sub(r'<(think|thinking)>.*?</\\1>', '', response_text, flags=re.DOTALL | re.IGNORECASE).strip()\n\n        return response_text, source_docs\n\n    def handle_delete_session(self, session_id: str):\n        \"\"\"Delete a session and its messages\"\"\"\n        try:\n            deleted = db.delete_session(session_id)\n            if deleted:\n                self.send_json_response({'deleted': deleted})\n            else:\n                self.send_json_response({'error': 'Session not found'}, status_code=404)\n        except Exception as e:\n            self.send_json_response({'error': str(e)}, status_code=500)\n    \n    def handle_file_upload(self, session_id: str):\n        \"\"\"Handle file uploads, save them, and associate with the session.\"\"\"\n        form = cgi.FieldStorage(\n            fp=self.rfile,\n            headers=self.headers,\n            environ={'REQUEST_METHOD': 'POST', 'CONTENT_TYPE': self.headers['Content-Type']}\n        )\n\n        uploaded_files = []\n        if 'files' in form:\n            files = form['files']\n            if not isinstance(files, list):\n                files = [files]\n            \n            upload_dir = \"shared_uploads\"\n            os.makedirs(upload_dir, exist_ok=True)\n\n            for file_item in files:\n                if file_item.filename:\n                    # Create a 
unique filename to avoid overwrites\n                    unique_filename = f\"{uuid.uuid4()}_{file_item.filename}\"\n                    file_path = os.path.join(upload_dir, unique_filename)\n                    \n                    with open(file_path, 'wb') as f:\n                        f.write(file_item.file.read())\n                    \n                    # Store the absolute path for the indexing service\n                    absolute_file_path = os.path.abspath(file_path)\n                    db.add_document_to_session(session_id, absolute_file_path)\n                    uploaded_files.append({\"filename\": file_item.filename, \"stored_path\": absolute_file_path})\n\n        if not uploaded_files:\n            self.send_json_response({\"error\": \"No files were uploaded\"}, status_code=400)\n            return\n            \n        self.send_json_response({\n            \"message\": f\"Successfully uploaded {len(uploaded_files)} files.\",\n            \"uploaded_files\": uploaded_files\n        })\n\n    def handle_index_documents(self, session_id: str):\n        \"\"\"Triggers indexing for all documents in a session.\"\"\"\n        print(f\"🔥 Received request to index documents for session {session_id[:8]}...\")\n        try:\n            file_paths = db.get_documents_for_session(session_id)\n            if not file_paths:\n                self.send_json_response({\"message\": \"No documents to index for this session.\"}, status_code=200)\n                return\n\n            print(f\"Found {len(file_paths)} documents to index. 
Sending to RAG API...\")\n            \n            rag_api_url = \"http://localhost:8001/index\"\n            rag_response = requests.post(rag_api_url, json={\"file_paths\": file_paths, \"session_id\": session_id})\n\n            if rag_response.status_code == 200:\n                print(\"✅ RAG API successfully indexed documents.\")\n                # Merge key config values into index metadata\n                idx_meta = {\n                    \"session_linked\": True,\n                    \"retrieval_mode\": \"hybrid\",\n                }\n                try:\n                    db.update_index_metadata(session_id, idx_meta)  # session_id used as index_id in text table naming\n                except Exception as e:\n                    print(f\"⚠️ Failed to update index metadata for session index: {e}\")\n                self.send_json_response(rag_response.json())\n            else:\n                error_info = rag_response.text\n                print(f\"❌ RAG API indexing failed ({rag_response.status_code}): {error_info}\")\n                self.send_json_response({\"error\": f\"Indexing failed: {error_info}\"}, status_code=500)\n\n        except Exception as e:\n            print(f\"❌ Exception during indexing: {str(e)}\")\n            self.send_json_response({\"error\": f\"An unexpected error occurred: {str(e)}\"}, status_code=500)\n            \n    def handle_pdf_upload(self, session_id: str):\n        \"\"\"\n        Processes PDF files: extracts text and stores it in the database.\n        DEPRECATED: This is the old method. Use handle_file_upload instead.\n        \"\"\"\n        # This function is now deprecated in favor of the new indexing workflow\n        # but is kept for potential legacy/compatibility reasons.\n        # For new functionality, it should not be used.\n        self.send_json_response({\n            \"warning\": \"This upload method is deprecated. 
Use the new file upload and indexing flow.\",\n            \"message\": \"No action taken.\"\n        }, status_code=410) # 410 Gone\n\n    def handle_get_models(self):\n        \"\"\"Get available models from both Ollama and HuggingFace, grouped by capability\"\"\"\n        try:\n            generation_models = []\n            embedding_models = []\n            \n            # Get Ollama models if available\n            if self.ollama_client.is_ollama_running():\n                all_ollama_models = self.ollama_client.list_models()\n                \n                # Very naive classification - same logic as RAG API server\n                ollama_embedding_models = [m for m in all_ollama_models if any(k in m for k in ['embed','bge','embedding','text'])]\n                ollama_generation_models = [m for m in all_ollama_models if m not in ollama_embedding_models]\n                \n                generation_models.extend(ollama_generation_models)\n                embedding_models.extend(ollama_embedding_models)\n            \n            # Add supported HuggingFace embedding models\n            huggingface_embedding_models = [\n                \"Qwen/Qwen3-Embedding-0.6B\",\n                \"Qwen/Qwen3-Embedding-4B\", \n                \"Qwen/Qwen3-Embedding-8B\"\n            ]\n            embedding_models.extend(huggingface_embedding_models)\n            \n            # Sort models for consistent ordering\n            generation_models.sort()\n            embedding_models.sort()\n            \n            self.send_json_response({\n                \"generation_models\": generation_models,\n                \"embedding_models\": embedding_models\n            })\n        except Exception as e:\n            self.send_json_response({\n                \"error\": f\"Could not list models: {str(e)}\"\n            }, status_code=500)\n\n    def handle_get_indexes(self):\n        try:\n            data = db.list_indexes()\n            self.send_json_response({'indexes': 
data, 'total': len(data)})\n        except Exception as e:\n            self.send_json_response({'error': str(e)}, status_code=500)\n    \n    def handle_get_index(self, index_id: str):\n        try:\n            data = db.get_index(index_id)\n            if not data:\n                self.send_json_response({'error': 'Index not found'}, status_code=404)\n                return\n            self.send_json_response(data)\n        except Exception as e:\n            self.send_json_response({'error': str(e)}, status_code=500)\n    \n    def handle_create_index(self):\n        try:\n            content_length = int(self.headers['Content-Length'])\n            post_data = self.rfile.read(content_length)\n            data = json.loads(post_data.decode('utf-8'))\n            name = data.get('name')\n            description = data.get('description')\n            metadata = data.get('metadata', {})\n            \n            if not name:\n                self.send_json_response({'error': 'Name required'}, status_code=400)\n                return\n            \n            # Add complete metadata from RAG system configuration if available\n            if RAG_SYSTEM_AVAILABLE and PIPELINE_CONFIGS.get('default'):\n                default_config = PIPELINE_CONFIGS['default']\n                complete_metadata = {\n                    'status': 'created',\n                    'metadata_source': 'rag_system_config',\n                    'created_at': json.loads(json.dumps(datetime.now().isoformat())),\n                    'chunk_size': 512,  # From default config\n                    'chunk_overlap': 64,  # From default config\n                    'retrieval_mode': 'hybrid',  # From default config\n                    'window_size': 5,  # From default config\n                    'embedding_model': 'Qwen/Qwen3-Embedding-0.6B',  # From default config\n                    'enrich_model': 'qwen3:0.6b',  # From default config\n                    'overview_model': 'qwen3:0.6b',  # 
From default config\n                    'enable_enrich': True,  # From default config\n                    'latechunk': True,  # From default config\n                    'docling_chunk': True,  # From default config\n                    'note': 'Default configuration from RAG system'\n                }\n                # Merge with any provided metadata\n                complete_metadata.update(metadata)\n                metadata = complete_metadata\n            \n            idx_id = db.create_index(name, description, metadata)\n            self.send_json_response({'index_id': idx_id}, status_code=201)\n        except Exception as e:\n            self.send_json_response({'error': str(e)}, status_code=500)\n    \n    def handle_index_file_upload(self, index_id: str):\n        \"\"\"Reuse file upload logic but store docs under index.\"\"\"\n        form = cgi.FieldStorage(fp=self.rfile, headers=self.headers, environ={'REQUEST_METHOD':'POST', 'CONTENT_TYPE': self.headers['Content-Type']})\n        uploaded_files=[]\n        if 'files' in form:\n            files=form['files']\n            if not isinstance(files, list):\n                files=[files]\n            upload_dir='shared_uploads'\n            os.makedirs(upload_dir, exist_ok=True)\n            for f in files:\n                if f.filename:\n                    unique=f\"{uuid.uuid4()}_{f.filename}\"\n                    path=os.path.join(upload_dir, unique)\n                    with open(path,'wb') as out: out.write(f.file.read())\n                    db.add_document_to_index(index_id, f.filename, os.path.abspath(path))\n                    uploaded_files.append({'filename':f.filename,'stored_path':os.path.abspath(path)})\n        if not uploaded_files:\n            self.send_json_response({'error':'No files uploaded'}, status_code=400); return\n        self.send_json_response({'message':f\"Uploaded {len(uploaded_files)} files\",\"uploaded_files\":uploaded_files})\n    \n    def handle_build_index(self, 
index_id: str):\n        try:\n            index=db.get_index(index_id)\n            if not index:\n                self.send_json_response({'error':'Index not found'}, status_code=404); return\n            file_paths=[d['stored_path'] for d in index.get('documents',[])]\n            if not file_paths:\n                self.send_json_response({'error':'No documents to index'}, status_code=400); return\n\n            # Parse request body for optional flags and configuration\n            latechunk = False\n            docling_chunk = False\n            chunk_size = 512\n            chunk_overlap = 64\n            retrieval_mode = 'hybrid'\n            window_size = 2\n            enable_enrich = True\n            embedding_model = None\n            enrich_model = None\n            batch_size_embed = 50\n            batch_size_enrich = 25\n            overview_model = None\n            \n            if 'Content-Length' in self.headers and int(self.headers['Content-Length']) > 0:\n                try:\n                    length = int(self.headers['Content-Length'])\n                    body = self.rfile.read(length)\n                    opts = json.loads(body.decode('utf-8'))\n                    latechunk = bool(opts.get('latechunk', False))\n                    docling_chunk = bool(opts.get('doclingChunk', False))\n                    chunk_size = int(opts.get('chunkSize', 512))\n                    chunk_overlap = int(opts.get('chunkOverlap', 64))\n                    retrieval_mode = str(opts.get('retrievalMode', 'hybrid'))\n                    window_size = int(opts.get('windowSize', 2))\n                    enable_enrich = bool(opts.get('enableEnrich', True))\n                    embedding_model = opts.get('embeddingModel')\n                    enrich_model = opts.get('enrichModel')\n                    batch_size_embed = int(opts.get('batchSizeEmbed', 50))\n                    batch_size_enrich = int(opts.get('batchSizeEnrich', 25))\n                    
overview_model = opts.get('overviewModel')\n                except Exception:\n                    # Keep defaults on parse error\n                    pass\n\n            # Set per-index overview file path\n            overview_path = f\"index_store/overviews/{index_id}.jsonl\"\n\n            # Ensure config_override includes overview_path\n            def ensure_overview_path(cfg: dict):\n                cfg[\"overview_path\"] = overview_path\n            \n            # we'll inject later when we build config_override\n\n            # Delegate to advanced RAG API same as session indexing\n            rag_api_url = \"http://localhost:8001/index\"\n            import requests, json as _json\n            # Use the index's dedicated LanceDB table so retrieval matches\n            table_name = index.get(\"vector_table_name\")\n            payload = {\n                \"file_paths\": file_paths,\n                \"session_id\": index_id,  # reuse index_id for progress tracking\n                \"table_name\": table_name,\n                \"chunk_size\": chunk_size,\n                \"chunk_overlap\": chunk_overlap,\n                \"retrieval_mode\": retrieval_mode,\n                \"window_size\": window_size,\n                \"enable_enrich\": enable_enrich,\n                \"batch_size_embed\": batch_size_embed,\n                \"batch_size_enrich\": batch_size_enrich\n            }\n            if latechunk:\n                payload[\"enable_latechunk\"] = True\n            if docling_chunk:\n                payload[\"enable_docling_chunk\"] = True\n            if embedding_model:\n                payload[\"embedding_model\"] = embedding_model\n            if enrich_model:\n                payload[\"enrich_model\"] = enrich_model\n            if overview_model:\n                payload[\"overview_model_name\"] = overview_model\n                \n            rag_resp = requests.post(rag_api_url, json=payload)\n            if rag_resp.status_code==200:\n         
       meta_updates = {\n                    \"chunk_size\": chunk_size,\n                    \"chunk_overlap\": chunk_overlap,\n                    \"retrieval_mode\": retrieval_mode,\n                    \"window_size\": window_size,\n                    \"enable_enrich\": enable_enrich,\n                    \"latechunk\": latechunk,\n                    \"docling_chunk\": docling_chunk,\n                }\n                if embedding_model:\n                    meta_updates[\"embedding_model\"] = embedding_model\n                if enrich_model:\n                    meta_updates[\"enrich_model\"] = enrich_model\n                if overview_model:\n                    meta_updates[\"overview_model\"] = overview_model\n                try:\n                    db.update_index_metadata(index_id, meta_updates)\n                except Exception as e:\n                    print(f\"⚠️ Failed to update index metadata: {e}\")\n\n                self.send_json_response({\n                    \"response\": rag_resp.json(),\n                    **meta_updates\n                })\n            else:\n                # Gracefully handle scenario where table already exists (idempotent build)\n                try:\n                    err_json = rag_resp.json()\n                except Exception:\n                    err_json = {}\n                err_text = err_json.get('error') if isinstance(err_json, dict) else rag_resp.text\n                if err_text and 'already exists' in err_text:\n                    # Treat as non-fatal; return message indicating index previously built\n                    self.send_json_response({\n                        \"message\": \"Index already built – skipping rebuild.\",\n                        \"note\": err_text\n                })\n                else:\n                    self.send_json_response({\"error\":f\"RAG indexing failed: {rag_resp.text}\"}, status_code=500)\n        except Exception as e:\n            
self.send_json_response({'error':str(e)}, status_code=500)\n    \n    def handle_link_index_to_session(self, session_id: str, index_id: str):\n        try:\n            db.link_index_to_session(session_id, index_id)\n            self.send_json_response({'message':'Index linked to session'})\n        except Exception as e:\n            self.send_json_response({'error':str(e)}, status_code=500)\n\n    def handle_get_session_indexes(self, session_id: str):\n        try:\n            idx_ids = db.get_indexes_for_session(session_id)\n            indexes = []\n            for idx_id in idx_ids:\n                idx = db.get_index(idx_id)\n                if idx:\n                    # Try to populate metadata for older indexes that have empty metadata\n                    if not idx.get('metadata') or len(idx['metadata']) == 0:\n                        print(f\"🔍 Attempting to infer metadata for index {idx_id[:8]}...\")\n                        inferred_metadata = db.inspect_and_populate_index_metadata(idx_id)\n                        if inferred_metadata:\n                            # Refresh the index data with the new metadata\n                            idx = db.get_index(idx_id)\n                    indexes.append(idx)\n            self.send_json_response({'indexes': indexes, 'total': len(indexes)})\n        except Exception as e:\n            self.send_json_response({'error': str(e)}, status_code=500)\n\n    def handle_delete_index(self, index_id: str):\n        \"\"\"Remove an index, its documents, links, and the underlying LanceDB table.\"\"\"\n        try:\n            deleted = db.delete_index(index_id)\n            if deleted:\n                self.send_json_response({'message': 'Index deleted successfully', 'index_id': index_id})\n            else:\n                self.send_json_response({'error': 'Index not found'}, status_code=404)\n        except Exception as e:\n            self.send_json_response({'error': str(e)}, status_code=500)\n\n    def 
handle_rename_session(self, session_id: str):\n        \"\"\"Rename an existing session title\"\"\"\n        try:\n            session = db.get_session(session_id)\n            if not session:\n                self.send_json_response({\"error\": \"Session not found\"}, status_code=404)\n                return\n\n            content_length = int(self.headers.get('Content-Length', 0))\n            if content_length == 0:\n                self.send_json_response({\"error\": \"Request body required\"}, status_code=400)\n                return\n\n            post_data = self.rfile.read(content_length)\n            data = json.loads(post_data.decode('utf-8'))\n            new_title: str = data.get('title', '').strip()\n\n            if not new_title:\n                self.send_json_response({\"error\": \"Title cannot be empty\"}, status_code=400)\n                return\n\n            db.update_session_title(session_id, new_title)\n            updated_session = db.get_session(session_id)\n\n            self.send_json_response({\n                \"message\": \"Session renamed successfully\",\n                \"session\": updated_session\n            })\n\n        except json.JSONDecodeError:\n            self.send_json_response({\"error\": \"Invalid JSON\"}, status_code=400)\n        except Exception as e:\n            self.send_json_response({\"error\": f\"Failed to rename session: {str(e)}\"}, status_code=500)\n\n    def send_json_response(self, data, status_code: int = 200):\n        \"\"\"Send a JSON (UTF-8) response with CORS headers. 
Safe against client disconnects.\"\"\"\n        try:\n            self.send_response(status_code)\n            self.send_header('Content-Type', 'application/json')\n            self.send_header('Access-Control-Allow-Origin', '*')\n            self.send_header('Access-Control-Allow-Methods', 'GET, POST, PUT, DELETE, OPTIONS')\n            self.send_header('Access-Control-Allow-Headers', 'Content-Type, Authorization')\n            self.send_header('Access-Control-Allow-Credentials', 'true')\n            self.end_headers()\n        \n            response_bytes = json.dumps(data, indent=2).encode('utf-8')\n            self.wfile.write(response_bytes)\n        except BrokenPipeError:\n            # Client disconnected before we could finish sending\n            print(\"⚠️  Client disconnected during response – ignoring.\")\n        except Exception as e:\n            print(f\"❌ Error sending response: {e}\")\n    \n    def log_message(self, format, *args):\n        \"\"\"Custom log format\"\"\"\n        print(f\"[{self.date_time_string()}] {format % args}\")\n\ndef main():\n    \"\"\"Main function to initialize and start the server\"\"\"\n    PORT = 8000  # 🆕 Define port\n    try:\n        # Initialize the database\n        print(\"✅ Database initialized successfully\")\n\n        # Initialize the PDF processor\n        try:\n            # Announce the step before running it so the log order matches execution\n            print(\"📄 Initializing simple PDF processing...\")\n            pdf_module.initialize_simple_pdf_processor()\n            if pdf_module.simple_pdf_processor:\n                print(\"✅ Simple PDF processor initialized\")\n            else:\n                print(\"⚠️ PDF processing could not be initialized.\")\n        except Exception as e:\n            print(f\"❌ Error initializing PDF processor: {e}\")\n            print(\"⚠️ PDF processing disabled - server will run without RAG functionality\")\n\n        # Set a global reference to the initialized processor if needed elsewhere\n        global pdf_processor\n        pdf_processor = 
pdf_module.simple_pdf_processor\n        if pdf_processor:\n            print(\"✅ Global PDF processor initialized\")\n        else:\n            print(\"⚠️ PDF processing disabled - server will run without RAG functionality\")\n        \n        # Cleanup empty sessions on startup\n        print(\"🧹 Cleaning up empty sessions...\")\n        cleanup_count = db.cleanup_empty_sessions()\n        if cleanup_count > 0:\n            print(f\"✨ Cleaned up {cleanup_count} empty sessions\")\n        else:\n            print(\"✨ No empty sessions to clean up\")\n\n        # Start the server\n        with ReusableTCPServer((\"\", PORT), ChatHandler) as httpd:\n            print(f\"🚀 Starting localGPT backend server on port {PORT}\")\n            print(f\"📍 Chat endpoint: http://localhost:{PORT}/chat\")\n            print(f\"🔍 Health check: http://localhost:{PORT}/health\")\n            \n            # Test Ollama connection\n            client = OllamaClient()\n            if client.is_ollama_running():\n                models = client.list_models()\n                print(f\"✅ Ollama is running with {len(models)} models\")\n                print(f\"📋 Available models: {', '.join(models[:3])}{'...' if len(models) > 3 else ''}\")\n            else:\n                print(\"⚠️  Ollama is not running. Please start Ollama:\")\n                print(\"   Install: https://ollama.ai\")\n                print(\"   Run: ollama serve\")\n            \n            print(f\"\\n🌐 Frontend should connect to: http://localhost:{PORT}\")\n            print(\"💬 Ready to chat!\\n\")\n            \n            httpd.serve_forever()\n    except KeyboardInterrupt:\n        print(\"\\n🛑 Server stopped\")\n\nif __name__ == \"__main__\":\n    main() "
  },
  {
    "path": "backend/simple_pdf_processor.py",
    "content": "\"\"\"\nSimple PDF Processing Service\nHandles PDF upload and text extraction for RAG functionality\n\"\"\"\n\nimport uuid\nfrom typing import List, Dict, Any\nimport PyPDF2\nfrom io import BytesIO\nimport sqlite3\nfrom datetime import datetime\n\nclass SimplePDFProcessor:\n    def __init__(self, db_path: str = \"chat_data.db\"):\n        \"\"\"Initialize simple PDF processor with SQLite storage\"\"\"\n        self.db_path = db_path\n        self.init_database()\n        print(\"✅ Simple PDF processor initialized\")\n    \n    def init_database(self):\n        \"\"\"Initialize SQLite database for storing PDF content\"\"\"\n        conn = sqlite3.connect(self.db_path)\n        conn.execute('''\n            CREATE TABLE IF NOT EXISTS pdf_documents (\n                id TEXT PRIMARY KEY,\n                session_id TEXT NOT NULL,\n                filename TEXT NOT NULL,\n                content TEXT NOT NULL,\n                created_at TEXT NOT NULL\n            )\n        ''')\n        \n        conn.commit()\n        conn.close()\n    \n    def extract_text_from_pdf(self, pdf_bytes: bytes) -> str:\n        \"\"\"Extract text from PDF bytes\"\"\"\n        try:\n            print(f\"📄 Starting PDF text extraction ({len(pdf_bytes)} bytes)\")\n            pdf_file = BytesIO(pdf_bytes)\n            pdf_reader = PyPDF2.PdfReader(pdf_file)\n            \n            print(f\"📖 PDF has {len(pdf_reader.pages)} pages\")\n            \n            text = \"\"\n            for page_num, page in enumerate(pdf_reader.pages):\n                print(f\"📄 Processing page {page_num + 1}\")\n                try:\n                    page_text = page.extract_text()\n                    if page_text.strip():\n                        text += f\"\\n--- Page {page_num + 1} ---\\n\"\n                        text += page_text + \"\\n\"\n                    print(f\"✅ Page {page_num + 1}: extracted {len(page_text)} characters\")\n                except Exception as 
page_error:\n                    print(f\"❌ Error on page {page_num + 1}: {str(page_error)}\")\n                    continue\n            \n            print(f\"📄 Total extracted text: {len(text)} characters\")\n            return text.strip()\n            \n        except Exception as e:\n            print(f\"❌ Error extracting text from PDF: {str(e)}\")\n            print(f\"❌ Error type: {type(e).__name__}\")\n            return \"\"\n    \n    def process_pdf(self, pdf_bytes: bytes, filename: str, session_id: str) -> Dict[str, Any]:\n        \"\"\"Process a PDF file and store in database\"\"\"\n        print(f\"📄 Processing PDF: {filename}\")\n        \n        # Extract text\n        text = self.extract_text_from_pdf(pdf_bytes)\n        if not text:\n            return {\n                \"success\": False,\n                \"error\": \"Could not extract text from PDF\",\n                \"filename\": filename\n            }\n        \n        print(f\"📝 Extracted {len(text)} characters from {filename}\")\n        \n        # Store in database\n        document_id = str(uuid.uuid4())\n        now = datetime.now().isoformat()\n        \n        try:\n            conn = sqlite3.connect(self.db_path)\n            \n            # Store document\n            conn.execute('''\n                INSERT INTO pdf_documents (id, session_id, filename, content, created_at)\n                VALUES (?, ?, ?, ?, ?)\n            ''', (document_id, session_id, filename, text, now))\n            \n            conn.commit()\n            conn.close()\n            \n            print(f\"💾 Stored document {filename} in database\")\n            \n            return {\n                \"success\": True,\n                \"filename\": filename,\n                \"file_id\": document_id,\n                \"text_length\": len(text)\n            }\n            \n        except Exception as e:\n            print(f\"❌ Error storing in database: {str(e)}\")\n            return {\n             
   \"success\": False,\n                \"error\": f\"Database storage failed: {str(e)}\",\n                \"filename\": filename\n            }\n    \n    def get_session_documents(self, session_id: str) -> List[Dict[str, Any]]:\n        \"\"\"Get all documents for a session\"\"\"\n        try:\n            conn = sqlite3.connect(self.db_path)\n            conn.row_factory = sqlite3.Row\n            \n            cursor = conn.execute('''\n                SELECT id, filename, created_at\n                FROM pdf_documents\n                WHERE session_id = ?\n                ORDER BY created_at DESC\n            ''', (session_id,))\n            \n            documents = [dict(row) for row in cursor.fetchall()]\n            conn.close()\n            \n            return documents\n            \n        except Exception as e:\n            print(f\"❌ Error getting session documents: {str(e)}\")\n            return []\n    \n    def get_document_content(self, session_id: str) -> str:\n        \"\"\"Get all document content for a session (for LLM context)\"\"\"\n        try:\n            conn = sqlite3.connect(self.db_path)\n            \n            cursor = conn.execute('''\n                SELECT filename, content\n                FROM pdf_documents\n                WHERE session_id = ?\n                ORDER BY created_at ASC\n            ''', (session_id,))\n            \n            rows = cursor.fetchall()\n            conn.close()\n            \n            if not rows:\n                return \"\"\n            \n            # Combine all document content\n            combined_content = \"\"\n            for filename, content in rows:\n                combined_content += f\"\\n\\n=== Document: {filename} ===\\n\\n\"\n                combined_content += content\n            \n            return combined_content.strip()\n            \n        except Exception as e:\n            print(f\"❌ Error getting document content: {str(e)}\")\n            return \"\"\n    
\n    def delete_session_documents(self, session_id: str) -> bool:\n        \"\"\"Delete all documents for a session\"\"\"\n        try:\n            conn = sqlite3.connect(self.db_path)\n            cursor = conn.execute('''\n                DELETE FROM pdf_documents\n                WHERE session_id = ?\n            ''', (session_id,))\n            \n            deleted_count = cursor.rowcount\n            conn.commit()\n            conn.close()\n            \n            if deleted_count > 0:\n                print(f\"🗑️ Deleted {deleted_count} documents for session {session_id[:8]}...\")\n            \n            return deleted_count > 0\n            \n        except Exception as e:\n            print(f\"❌ Error deleting session documents: {str(e)}\")\n            return False\n\n\n# Global instance\nsimple_pdf_processor = None\n\ndef initialize_simple_pdf_processor():\n    \"\"\"Initialize the global PDF processor\"\"\"\n    global simple_pdf_processor\n    try:\n        simple_pdf_processor = SimplePDFProcessor()\n        print(\"✅ Global PDF processor initialized\")\n    except Exception as e:\n        print(f\"❌ Failed to initialize PDF processor: {str(e)}\")\n        simple_pdf_processor = None\n\ndef get_simple_pdf_processor():\n    \"\"\"Get the global PDF processor instance\"\"\"\n    global simple_pdf_processor\n    if simple_pdf_processor is None:\n        initialize_simple_pdf_processor()\n    return simple_pdf_processor\n\nif __name__ == \"__main__\":\n    # Test the simple PDF processor\n    print(\"🧪 Testing simple PDF processor...\")\n    \n    processor = SimplePDFProcessor()\n    print(\"✅ Simple PDF processor test completed!\") "
  },
  {
    "path": "backend/test_backend.py",
    "content": "#!/usr/bin/env python3\n\"\"\"\nSimple test script for the localGPT backend\n\"\"\"\n\nimport requests\n\ndef test_health_endpoint():\n    \"\"\"Test the health endpoint\"\"\"\n    print(\"🔍 Testing health endpoint...\")\n    try:\n        response = requests.get(\"http://localhost:8000/health\", timeout=5)\n        if response.status_code == 200:\n            data = response.json()\n            print(f\"✅ Health check passed\")\n            print(f\"   Ollama running: {data['ollama_running']}\")\n            print(f\"   Models available: {len(data['available_models'])}\")\n            return True\n        else:\n            print(f\"❌ Health check failed: {response.status_code}\")\n            return False\n    except requests.exceptions.RequestException as e:\n        print(f\"❌ Health check failed: {e}\")\n        return False\n\ndef test_chat_endpoint():\n    \"\"\"Test the chat endpoint\"\"\"\n    print(\"\\n💬 Testing chat endpoint...\")\n    \n    test_message = {\n        \"message\": \"Say 'Hello World' and nothing else.\",\n        \"model\": \"llama3.2:latest\"\n    }\n    \n    try:\n        response = requests.post(\n            \"http://localhost:8000/chat\",\n            headers={\"Content-Type\": \"application/json\"},\n            json=test_message,\n            timeout=30\n        )\n        \n        if response.status_code == 200:\n            data = response.json()\n            print(f\"✅ Chat test passed\")\n            print(f\"   Model: {data['model']}\")\n            print(f\"   Response: {data['response']}\")\n            print(f\"   Message count: {data['message_count']}\")\n            return True\n        else:\n            print(f\"❌ Chat test failed: {response.status_code}\")\n            print(f\"   Response: {response.text}\")\n            return False\n            \n    except requests.exceptions.RequestException as e:\n        print(f\"❌ Chat test failed: {e}\")\n        return False\n\ndef 
test_conversation_history():\n    \"\"\"Test conversation with history\"\"\"\n    print(\"\\n🗨️  Testing conversation history...\")\n    \n    # First message\n    conversation = []\n    \n    message1 = {\n        \"message\": \"My name is Alice. Remember this.\",\n        \"model\": \"llama3.2:latest\",\n        \"conversation_history\": conversation\n    }\n    \n    try:\n        response1 = requests.post(\n            \"http://localhost:8000/chat\",\n            headers={\"Content-Type\": \"application/json\"},\n            json=message1,\n            timeout=30\n        )\n        \n        if response1.status_code == 200:\n            data1 = response1.json()\n            \n            # Add to conversation history\n            conversation.append({\"role\": \"user\", \"content\": \"My name is Alice. Remember this.\"})\n            conversation.append({\"role\": \"assistant\", \"content\": data1[\"response\"]})\n            \n            # Second message asking about the name\n            message2 = {\n                \"message\": \"What is my name?\",\n                \"model\": \"llama3.2:latest\", \n                \"conversation_history\": conversation\n            }\n            \n            response2 = requests.post(\n                \"http://localhost:8000/chat\",\n                headers={\"Content-Type\": \"application/json\"},\n                json=message2,\n                timeout=30\n            )\n            \n            if response2.status_code == 200:\n                data2 = response2.json()\n                print(f\"✅ Conversation history test passed\")\n                print(f\"   First response: {data1['response']}\")\n                print(f\"   Second response: {data2['response']}\")\n                \n                # Check if the AI remembered the name\n                if \"alice\" in data2['response'].lower():\n                    print(f\"✅ AI correctly remembered the name!\")\n                else:\n                    
print(f\"⚠️  AI might not have remembered the name\")\n                return True\n            else:\n                print(f\"❌ Second message failed: {response2.status_code}\")\n                return False\n        else:\n            print(f\"❌ First message failed: {response1.status_code}\")\n            return False\n            \n    except requests.exceptions.RequestException as e:\n        print(f\"❌ Conversation test failed: {e}\")\n        return False\n\ndef main():\n    print(\"🧪 Testing localGPT Backend\")\n    print(\"=\" * 40)\n    \n    # Test health endpoint\n    health_ok = test_health_endpoint()\n    if not health_ok:\n        print(\"\\n❌ Backend server is not running or not healthy\")\n        print(\"   Make sure to run: python server.py\")\n        return\n    \n    # Test basic chat\n    chat_ok = test_chat_endpoint()\n    if not chat_ok:\n        print(\"\\n❌ Chat functionality is not working\")\n        return\n    \n    # Test conversation history\n    conversation_ok = test_conversation_history()\n    \n    print(\"\\n\" + \"=\" * 40)\n    if health_ok and chat_ok and conversation_ok:\n        print(\"🎉 All tests passed! Backend is ready for frontend integration.\")\n        # Only advertise frontend readiness when every test succeeded\n        print(\"\\n🔗 Ready to connect to frontend at http://localhost:3000\")\n    else:\n        print(\"⚠️  Some tests failed. Check the issues above.\")\n\nif __name__ == \"__main__\":\n    main() "
  },
  {
    "path": "backend/test_ollama_connectivity.py",
    "content": "#!/usr/bin/env python3\n\nimport os\nimport sys\n\ndef test_ollama_connectivity():\n    \"\"\"Test Ollama connectivity from within Docker container\"\"\"\n    print(\"🧪 Testing Ollama Connectivity\")\n    print(\"=\" * 40)\n    \n    ollama_host = os.getenv('OLLAMA_HOST', 'Not set')\n    print(f\"OLLAMA_HOST environment variable: {ollama_host}\")\n    \n    try:\n        from ollama_client import OllamaClient\n        client = OllamaClient()\n        print(f\"OllamaClient base_url: {client.base_url}\")\n        \n        is_running = client.is_ollama_running()\n        print(f\"Ollama running: {is_running}\")\n        \n        if is_running:\n            models = client.list_models()\n            print(f\"Available models: {models}\")\n            print(\"✅ Ollama connectivity test passed!\")\n            return True\n        else:\n            print(\"❌ Ollama connectivity test failed!\")\n            return False\n            \n    except Exception as e:\n        print(f\"❌ Error testing Ollama connectivity: {e}\")\n        return False\n\nif __name__ == \"__main__\":\n    success = test_ollama_connectivity()\n    sys.exit(0 if success else 1)\n"
  },
  {
    "path": "batch_indexing_config.json",
    "content": "{\n  \"index_name\": \"Sample Batch Index\",\n  \"index_description\": \"Example batch index configuration\",\n  \"documents\": [\n    \"./rag_system/documents/invoice_1039.pdf\",\n    \"./rag_system/documents/invoice_1041.pdf\"\n  ],\n  \"processing\": {\n    \"chunk_size\": 512,\n    \"chunk_overlap\": 64,\n    \"enable_enrich\": true,\n    \"enable_latechunk\": true,\n    \"enable_docling\": true,\n    \"embedding_model\": \"Qwen/Qwen3-Embedding-0.6B\",\n    \"generation_model\": \"qwen3:0.6b\",\n    \"retrieval_mode\": \"hybrid\",\n    \"window_size\": 2\n  }\n}"
  },
  {
    "path": "create_index_script.py",
    "content": "#!/usr/bin/env python3\n\"\"\"\nInteractive Index Creation Script for LocalGPT RAG System\n\nThis script provides a user-friendly interface for creating document indexes\nusing the LocalGPT RAG system. It supports both single documents and batch\nprocessing of multiple documents.\n\nUsage:\n    python create_index_script.py\n    python create_index_script.py --batch batch_indexing_config.json\n    python create_index_script.py --config custom_config.json\n    python create_index_script.py --create-sample\n\"\"\"\n\nimport os\nimport sys\nimport json\nimport argparse\nfrom typing import List, Optional\nfrom pathlib import Path\n\n# Add the project root to the path so we can import rag_system modules\nsys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))\n\ntry:\n    from rag_system.main import PIPELINE_CONFIGS, get_agent\n    from rag_system.pipelines.indexing_pipeline import IndexingPipeline\n    from rag_system.utils.ollama_client import OllamaClient\n    from backend.database import ChatDatabase\nexcept ImportError as e:\n    print(f\"❌ Error importing required modules: {e}\")\n    print(\"Please ensure you're running this script from the project root directory.\")\n    sys.exit(1)\n\n\nclass IndexCreator:\n    \"\"\"Interactive index creation utility.\"\"\"\n    \n    def __init__(self, config_path: Optional[str] = None):\n        \"\"\"Initialize the index creator with optional custom configuration.\"\"\"\n        self.db = ChatDatabase()\n        self.config = self._load_config(config_path)\n        \n        # Initialize Ollama client\n        self.ollama_client = OllamaClient()\n        self.ollama_config = {\n            \"generation_model\": \"qwen3:0.6b\",\n            \"embedding_model\": \"qwen3:0.6b\"\n        }\n        \n        # Initialize indexing pipeline\n        self.pipeline = IndexingPipeline(\n            self.config, \n            self.ollama_client, \n            self.ollama_config\n        )\n    \n    def _load_config(self, config_path: Optional[str] = None) -> dict:\n        \"\"\"Load 
configuration from file or use default.\"\"\"\n        if config_path and os.path.exists(config_path):\n            try:\n                with open(config_path, 'r') as f:\n                    return json.load(f)\n            except Exception as e:\n                print(f\"⚠️  Error loading config from {config_path}: {e}\")\n                print(\"Using default configuration...\")\n        \n        return PIPELINE_CONFIGS.get(\"default\", {})\n    \n    def get_user_input(self, prompt: str, default: str = \"\") -> str:\n        \"\"\"Get user input with optional default value.\"\"\"\n        if default:\n            user_input = input(f\"{prompt} [{default}]: \").strip()\n            return user_input if user_input else default\n        return input(f\"{prompt}: \").strip()\n    \n    def select_documents(self) -> List[str]:\n        \"\"\"Interactive document selection.\"\"\"\n        print(\"\\n📁 Document Selection\")\n        print(\"=\" * 50)\n        \n        documents = []\n        \n        while True:\n            print(\"\\nOptions:\")\n            print(\"1. Add a single document\")\n            print(\"2. Add all documents from a directory\")\n            print(\"3. Finish and proceed with selected documents\")\n            print(\"4. 
Show selected documents\")\n            \n            choice = self.get_user_input(\"Select an option (1-4)\", \"1\")\n            \n            if choice == \"1\":\n                doc_path = self.get_user_input(\"Enter document path\")\n                if os.path.exists(doc_path):\n                    documents.append(os.path.abspath(doc_path))\n                    print(f\"✅ Added: {doc_path}\")\n                else:\n                    print(f\"❌ File not found: {doc_path}\")\n            \n            elif choice == \"2\":\n                dir_path = self.get_user_input(\"Enter directory path\")\n                if os.path.isdir(dir_path):\n                    supported_extensions = ['.pdf', '.txt', '.docx', '.md', '.html', '.htm']\n                    found_docs = []\n                    \n                    for ext in supported_extensions:\n                        # The recursive pattern also matches files directly in dir_path,\n                        # so a single glob avoids adding top-level files twice\n                        found_docs.extend(Path(dir_path).glob(f\"**/*{ext}\"))\n                    \n                    if found_docs:\n                        print(f\"Found {len(found_docs)} documents:\")\n                        for doc in found_docs:\n                            print(f\"  - {doc}\")\n                        \n                        if self.get_user_input(\"Add all these documents? (y/n)\", \"y\").lower() == 'y':\n                            documents.extend([str(doc.absolute()) for doc in found_docs])\n                            print(f\"✅ Added {len(found_docs)} documents\")\n                    else:\n                        print(\"❌ No supported documents found in directory\")\n                else:\n                    print(f\"❌ Directory not found: {dir_path}\")\n            \n            elif choice == \"3\":\n                if documents:\n                    break\n                else:\n                    print(\"❌ No documents selected. 
Please add at least one document.\")\n            \n            elif choice == \"4\":\n                if documents:\n                    print(f\"\\n📄 Selected documents ({len(documents)}):\")\n                    for i, doc in enumerate(documents, 1):\n                        print(f\"  {i}. {doc}\")\n                else:\n                    print(\"No documents selected yet.\")\n            \n            else:\n                print(\"Invalid choice. Please select 1-4.\")\n        \n        return documents\n    \n    def configure_processing(self) -> dict:\n        \"\"\"Interactive processing configuration.\"\"\"\n        print(\"\\n⚙️  Processing Configuration\")\n        print(\"=\" * 50)\n        \n        print(\"Configure how documents will be processed:\")\n        \n        # Basic settings\n        chunk_size = int(self.get_user_input(\"Chunk size\", \"512\"))\n        chunk_overlap = int(self.get_user_input(\"Chunk overlap\", \"64\"))\n        \n        # Advanced settings\n        print(\"\\nAdvanced options:\")\n        enable_enrich = self.get_user_input(\"Enable contextual enrichment? (y/n)\", \"y\").lower() == 'y'\n        enable_latechunk = self.get_user_input(\"Enable late chunking? (y/n)\", \"y\").lower() == 'y'\n        enable_docling = self.get_user_input(\"Enable Docling chunking? 
(y/n)\", \"y\").lower() == 'y'\n        \n        # Model selection\n        print(\"\\nModel Configuration:\")\n        embedding_model = self.get_user_input(\"Embedding model\", \"Qwen/Qwen3-Embedding-0.6B\")\n        generation_model = self.get_user_input(\"Generation model\", \"qwen3:0.6b\")\n        \n        return {\n            \"chunk_size\": chunk_size,\n            \"chunk_overlap\": chunk_overlap,\n            \"enable_enrich\": enable_enrich,\n            \"enable_latechunk\": enable_latechunk,\n            \"enable_docling\": enable_docling,\n            \"embedding_model\": embedding_model,\n            \"generation_model\": generation_model,\n            \"retrieval_mode\": \"hybrid\",\n            \"window_size\": 2\n        }\n    \n    def create_index_interactive(self) -> None:\n        \"\"\"Run the interactive index creation process.\"\"\"\n        print(\"🚀 LocalGPT Index Creation Tool\")\n        print(\"=\" * 50)\n        \n        # Get index details\n        index_name = self.get_user_input(\"Enter index name\")\n        index_description = self.get_user_input(\"Enter index description (optional)\")\n        \n        # Select documents\n        documents = self.select_documents()\n        \n        # Configure processing\n        processing_config = self.configure_processing()\n        \n        # Confirm creation\n        print(\"\\n📋 Index Summary\")\n        print(\"=\" * 50)\n        print(f\"Name: {index_name}\")\n        print(f\"Description: {index_description or 'None'}\")\n        print(f\"Documents: {len(documents)}\")\n        print(f\"Chunk size: {processing_config['chunk_size']}\")\n        print(f\"Enrichment: {'Enabled' if processing_config['enable_enrich'] else 'Disabled'}\")\n        print(f\"Embedding model: {processing_config['embedding_model']}\")\n        \n        if self.get_user_input(\"\\nProceed with index creation? 
(y/n)\", \"y\").lower() != 'y':\n            print(\"❌ Index creation cancelled.\")\n            return\n        \n        # Create the index\n        try:\n            print(\"\\n🔥 Creating index...\")\n            \n            # Create index record in database\n            index_id = self.db.create_index(\n                name=index_name,\n                description=index_description,\n                metadata=processing_config\n            )\n            \n            # Add documents to index\n            for doc_path in documents:\n                filename = os.path.basename(doc_path)\n                self.db.add_document_to_index(index_id, filename, doc_path)\n            \n            # Process documents through pipeline\n            print(\"📚 Processing documents...\")\n            self.pipeline.process_documents(documents)\n            \n            print(f\"\\n✅ Index '{index_name}' created successfully!\")\n            print(f\"Index ID: {index_id}\")\n            print(f\"Processed {len(documents)} documents\")\n            \n            # Test the index\n            if self.get_user_input(\"\\nTest the index with a sample query? 
(y/n)\", \"y\").lower() == 'y':\n                self.test_index(index_id)\n                \n        except Exception as e:\n            print(f\"❌ Error creating index: {e}\")\n            import traceback\n            traceback.print_exc()\n    \n    def test_index(self, index_id: str) -> None:\n        \"\"\"Test the created index with a sample query.\"\"\"\n        try:\n            print(\"\\n🧪 Testing Index\")\n            print(\"=\" * 50)\n            \n            # Get agent for testing\n            agent = get_agent(\"default\")\n            \n            # Test query\n            test_query = self.get_user_input(\"Enter a test query\", \"What is this document about?\")\n            \n            print(f\"\\nProcessing query: {test_query}\")\n            response = agent.run(test_query, table_name=f\"text_pages_{index_id}\")\n            \n            print(f\"\\n🤖 Response:\")\n            print(response)\n            \n        except Exception as e:\n            print(f\"❌ Error testing index: {e}\")\n    \n    def batch_create_from_config(self, config_file: str) -> None:\n        \"\"\"Create index from batch configuration file.\"\"\"\n        try:\n            with open(config_file, 'r') as f:\n                batch_config = json.load(f)\n            \n            index_name = batch_config.get(\"index_name\", \"Batch Index\")\n            index_description = batch_config.get(\"index_description\", \"\")\n            documents = batch_config.get(\"documents\", [])\n            processing_config = batch_config.get(\"processing\", {})\n            \n            if not documents:\n                print(\"❌ No documents specified in batch configuration\")\n                return\n            \n            # Validate documents exist\n            valid_documents = []\n            for doc_path in documents:\n                if os.path.exists(doc_path):\n                    valid_documents.append(doc_path)\n                else:\n                    
print(f\"⚠️  Document not found: {doc_path}\")\n            \n            if not valid_documents:\n                print(\"❌ No valid documents found\")\n                return\n            \n            print(f\"🚀 Creating batch index: {index_name}\")\n            print(f\"📄 Processing {len(valid_documents)} documents...\")\n            \n            # Create index\n            index_id = self.db.create_index(\n                name=index_name,\n                description=index_description,\n                metadata=processing_config\n            )\n            \n            # Add documents\n            for doc_path in valid_documents:\n                filename = os.path.basename(doc_path)\n                self.db.add_document_to_index(index_id, filename, doc_path)\n            \n            # Process documents\n            self.pipeline.process_documents(valid_documents)\n            \n            print(f\"✅ Batch index '{index_name}' created successfully!\")\n            print(f\"Index ID: {index_id}\")\n            \n        except Exception as e:\n            print(f\"❌ Error creating batch index: {e}\")\n            import traceback\n            traceback.print_exc()\n\n\ndef create_sample_batch_config():\n    \"\"\"Create a sample batch configuration file.\"\"\"\n    sample_config = {\n        \"index_name\": \"Sample Batch Index\",\n        \"index_description\": \"Example batch index configuration\",\n        \"documents\": [\n            \"./rag_system/documents/invoice_1039.pdf\",\n            \"./rag_system/documents/invoice_1041.pdf\"\n        ],\n        \"processing\": {\n            \"chunk_size\": 512,\n            \"chunk_overlap\": 64,\n            \"enable_enrich\": True,\n            \"enable_latechunk\": True,\n            \"enable_docling\": True,\n            \"embedding_model\": \"Qwen/Qwen3-Embedding-0.6B\",\n            \"generation_model\": \"qwen3:0.6b\",\n            \"retrieval_mode\": \"hybrid\",\n            \"window_size\": 2\n     
   }\n    }\n    \n    with open(\"batch_indexing_config.json\", \"w\") as f:\n        json.dump(sample_config, f, indent=2)\n    \n    print(\"📄 Sample batch configuration created: batch_indexing_config.json\")\n\n\ndef main():\n    \"\"\"Main entry point for the script.\"\"\"\n    parser = argparse.ArgumentParser(description=\"LocalGPT Index Creation Tool\")\n    parser.add_argument(\"--batch\", help=\"Batch configuration file\", type=str)\n    parser.add_argument(\"--config\", help=\"Custom pipeline configuration file\", type=str)\n    parser.add_argument(\"--create-sample\", action=\"store_true\", help=\"Create sample batch config\")\n    \n    args = parser.parse_args()\n    \n    if args.create_sample:\n        create_sample_batch_config()\n        return\n    \n    try:\n        creator = IndexCreator(config_path=args.config)\n        \n        if args.batch:\n            creator.batch_create_from_config(args.batch)\n        else:\n            creator.create_index_interactive()\n            \n    except KeyboardInterrupt:\n        print(\"\\n\\n❌ Operation cancelled by user.\")\n    except Exception as e:\n        print(f\"❌ Unexpected error: {e}\")\n        import traceback\n        traceback.print_exc()\n\n\nif __name__ == \"__main__\":\n    main()  "
  },
  {
    "path": "demo_batch_indexing.py",
    "content": "#!/usr/bin/env python3\n\"\"\"\nDemo Batch Indexing Script for LocalGPT RAG System\n\nThis script demonstrates how to perform batch indexing of multiple documents\nusing configuration files. It's designed to showcase the full capabilities\nof the indexing pipeline with various configuration options.\n\nUsage:\n    python demo_batch_indexing.py --config batch_indexing_config.json\n    python demo_batch_indexing.py --create-sample-config\n    python demo_batch_indexing.py --help\n\"\"\"\n\nimport os\nimport sys\nimport json\nimport argparse\nimport time\nimport logging\nfrom typing import List, Dict, Any, Optional\nfrom pathlib import Path\nfrom datetime import datetime\n\n# Add the project root to the path so we can import rag_system modules\nsys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))\n\ntry:\n    from rag_system.main import PIPELINE_CONFIGS\n    from rag_system.pipelines.indexing_pipeline import IndexingPipeline\n    from rag_system.utils.ollama_client import OllamaClient\n    from backend.database import ChatDatabase\nexcept ImportError as e:\n    print(f\"❌ Error importing required modules: {e}\")\n    print(\"Please ensure you're running this script from the project root directory.\")\n    sys.exit(1)\n\n# Configure logging\nlogging.basicConfig(\n    level=logging.INFO,\n    format=\"%(asctime)s | %(levelname)-7s | %(name)s | %(message)s\",\n)\n\n\nclass BatchIndexingDemo:\n    \"\"\"Demonstration of batch indexing capabilities.\"\"\"\n    \n    def __init__(self, config_path: str):\n        \"\"\"Initialize the batch indexing demo.\"\"\"\n        self.config_path = config_path\n        self.config = self._load_config()\n        self.db = ChatDatabase()\n        \n        # Initialize Ollama client\n        self.ollama_client = OllamaClient()\n        \n        # Initialize pipeline with merged configuration\n        self.pipeline_config = self._merge_configurations()\n        self.pipeline = IndexingPipeline(\n            
self.pipeline_config,\n            self.ollama_client,\n            self.config.get(\"ollama_config\", {\n                \"generation_model\": \"qwen3:0.6b\",\n                \"embedding_model\": \"qwen3:0.6b\"\n            })\n        )\n    \n    def _load_config(self) -> Dict[str, Any]:\n        \"\"\"Load batch indexing configuration from file.\"\"\"\n        try:\n            with open(self.config_path, 'r') as f:\n                config = json.load(f)\n            print(f\"✅ Loaded configuration from {self.config_path}\")\n            return config\n        except FileNotFoundError:\n            print(f\"❌ Configuration file not found: {self.config_path}\")\n            sys.exit(1)\n        except json.JSONDecodeError as e:\n            print(f\"❌ Invalid JSON in configuration file: {e}\")\n            sys.exit(1)\n    \n    def _merge_configurations(self) -> Dict[str, Any]:\n        \"\"\"Merge batch config with default pipeline config.\"\"\"\n        # Start with default pipeline configuration\n        merged_config = PIPELINE_CONFIGS.get(\"default\", {}).copy()\n        \n        # Override with batch-specific settings\n        batch_settings = self.config.get(\"pipeline_settings\", {})\n        \n        # Deep merge for nested dictionaries\n        def deep_merge(base: dict, override: dict) -> dict:\n            result = base.copy()\n            for key, value in override.items():\n                if key in result and isinstance(result[key], dict) and isinstance(value, dict):\n                    result[key] = deep_merge(result[key], value)\n                else:\n                    result[key] = value\n            return result\n        \n        return deep_merge(merged_config, batch_settings)\n    \n    def validate_documents(self, documents: List[str]) -> List[str]:\n        \"\"\"Validate and filter document paths.\"\"\"\n        valid_documents = []\n        \n        print(f\"📋 Validating {len(documents)} documents...\")\n        \n        for 
doc_path in documents:\n            # Handle relative paths\n            if not os.path.isabs(doc_path):\n                doc_path = os.path.abspath(doc_path)\n            \n            if os.path.exists(doc_path):\n                # Check file extension\n                ext = Path(doc_path).suffix.lower()\n                if ext in ['.pdf', '.txt', '.docx', '.md', '.html', '.htm']:\n                    valid_documents.append(doc_path)\n                    print(f\"  ✅ {doc_path}\")\n                else:\n                    print(f\"  ⚠️  Unsupported file type: {doc_path}\")\n            else:\n                print(f\"  ❌ File not found: {doc_path}\")\n        \n        print(f\"📊 {len(valid_documents)} valid documents found\")\n        return valid_documents\n    \n    def create_indexes(self) -> List[str]:\n        \"\"\"Create multiple indexes based on configuration.\"\"\"\n        indexes = self.config.get(\"indexes\", [])\n        created_indexes = []\n        \n        for index_config in indexes:\n            index_id = self.create_single_index(index_config)\n            if index_id:\n                created_indexes.append(index_id)\n        \n        return created_indexes\n    \n    def create_single_index(self, index_config: Dict[str, Any]) -> Optional[str]:\n        \"\"\"Create a single index from configuration.\"\"\"\n        try:\n            # Extract index metadata\n            index_name = index_config.get(\"name\", \"Unnamed Index\")\n            index_description = index_config.get(\"description\", \"\")\n            documents = index_config.get(\"documents\", [])\n            \n            if not documents:\n                print(f\"⚠️  No documents specified for index '{index_name}', skipping...\")\n                return None\n            \n            # Validate documents\n            valid_documents = self.validate_documents(documents)\n            if not valid_documents:\n                print(f\"❌ No valid documents found for index 
'{index_name}'\")\n                return None\n            \n            print(f\"\\n🚀 Creating index: {index_name}\")\n            print(f\"📄 Processing {len(valid_documents)} documents\")\n            \n            # Create index record in database\n            index_metadata = {\n                \"created_by\": \"demo_batch_indexing.py\",\n                \"created_at\": datetime.now().isoformat(),\n                \"document_count\": len(valid_documents),\n                \"config_used\": index_config.get(\"processing_options\", {})\n            }\n            \n            index_id = self.db.create_index(\n                name=index_name,\n                description=index_description,\n                metadata=index_metadata\n            )\n            \n            # Add documents to index\n            for doc_path in valid_documents:\n                filename = os.path.basename(doc_path)\n                self.db.add_document_to_index(index_id, filename, doc_path)\n            \n            # Process documents through pipeline\n            start_time = time.time()\n            self.pipeline.process_documents(valid_documents)\n            processing_time = time.time() - start_time\n            \n            print(f\"✅ Index '{index_name}' created successfully!\")\n            print(f\"   Index ID: {index_id}\")\n            print(f\"   Processing time: {processing_time:.2f} seconds\")\n            print(f\"   Documents processed: {len(valid_documents)}\")\n            \n            return index_id\n            \n        except Exception as e:\n            print(f\"❌ Error creating index '{index_name}': {e}\")\n            import traceback\n            traceback.print_exc()\n            return None\n    \n    def demonstrate_features(self):\n        \"\"\"Demonstrate various indexing features.\"\"\"\n        print(\"\\n🎯 Batch Indexing Demo Features:\")\n        print(\"=\" * 50)\n        \n        # Show configuration\n        print(f\"📋 Configuration file: 
{self.config_path}\")\n        print(f\"📊 Number of indexes to create: {len(self.config.get('indexes', []))}\")\n        \n        # Show pipeline settings\n        pipeline_settings = self.config.get(\"pipeline_settings\", {})\n        if pipeline_settings:\n            print(\"\\n⚙️  Pipeline Settings:\")\n            for key, value in pipeline_settings.items():\n                print(f\"   {key}: {value}\")\n        \n        # Show model configuration\n        ollama_config = self.config.get(\"ollama_config\", {})\n        if ollama_config:\n            print(\"\\n🤖 Model Configuration:\")\n            for key, value in ollama_config.items():\n                print(f\"   {key}: {value}\")\n    \n    def run_demo(self):\n        \"\"\"Run the complete batch indexing demo.\"\"\"\n        print(\"🚀 LocalGPT Batch Indexing Demo\")\n        print(\"=\" * 50)\n        \n        # Show demo features\n        self.demonstrate_features()\n        \n        # Create indexes\n        print(f\"\\n📚 Starting batch indexing process...\")\n        start_time = time.time()\n        \n        created_indexes = self.create_indexes()\n        \n        total_time = time.time() - start_time\n        \n        # Summary\n        print(f\"\\n📊 Batch Indexing Summary\")\n        print(\"=\" * 50)\n        print(f\"✅ Successfully created {len(created_indexes)} indexes\")\n        print(f\"⏱️  Total processing time: {total_time:.2f} seconds\")\n        \n        if created_indexes:\n            print(f\"\\n📋 Created Indexes:\")\n            for i, index_id in enumerate(created_indexes, 1):\n                index_info = self.db.get_index(index_id)\n                if index_info:\n                    print(f\"   {i}. 
{index_info['name']} ({index_id[:8]}...)\")\n                    print(f\"      Documents: {len(index_info.get('documents', []))}\")\n        \n        print(f\"\\n🎉 Demo completed successfully!\")\n        print(f\"💡 You can now use these indexes in the LocalGPT interface.\")\n\n\ndef create_sample_config():\n    \"\"\"Create a comprehensive sample configuration file.\"\"\"\n    sample_config = {\n        \"description\": \"Demo batch indexing configuration showcasing various features\",\n        \"pipeline_settings\": {\n            \"embedding_model_name\": \"Qwen/Qwen3-Embedding-0.6B\",\n            \"indexing\": {\n                \"embedding_batch_size\": 50,\n                \"enrichment_batch_size\": 25,\n                \"enable_progress_tracking\": True\n            },\n            \"contextual_enricher\": {\n                \"enabled\": True,\n                \"window_size\": 2,\n                \"model_name\": \"qwen3:0.6b\"\n            },\n            \"chunking\": {\n                \"chunk_size\": 512,\n                \"chunk_overlap\": 64,\n                \"enable_latechunk\": True,\n                \"enable_docling\": True\n            },\n            \"retrievers\": {\n                \"dense\": {\n                    \"enabled\": True,\n                    \"lancedb_table_name\": \"demo_text_pages\"\n                },\n                \"bm25\": {\n                    \"enabled\": True,\n                    \"index_name\": \"demo_bm25_index\"\n                }\n            },\n            \"storage\": {\n                \"lancedb_uri\": \"./index_store/lancedb\",\n                \"bm25_path\": \"./index_store/bm25\"\n            }\n        },\n        \"ollama_config\": {\n            \"generation_model\": \"qwen3:0.6b\",\n            \"embedding_model\": \"qwen3:0.6b\"\n        },\n        \"indexes\": [\n            {\n                \"name\": \"Sample Invoice Collection\",\n                \"description\": \"Demo index containing sample 
invoice documents\",\n                \"documents\": [\n                    \"./rag_system/documents/invoice_1039.pdf\",\n                    \"./rag_system/documents/invoice_1041.pdf\"\n                ],\n                \"processing_options\": {\n                    \"chunk_size\": 512,\n                    \"enable_enrichment\": True,\n                    \"retrieval_mode\": \"hybrid\"\n                }\n            },\n            {\n                \"name\": \"Research Papers Demo\",\n                \"description\": \"Demo index for research papers and whitepapers\",\n                \"documents\": [\n                    \"./rag_system/documents/Newwhitepaper_Agents2.pdf\"\n                ],\n                \"processing_options\": {\n                    \"chunk_size\": 1024,\n                    \"enable_enrichment\": True,\n                    \"retrieval_mode\": \"dense\"\n                }\n            }\n        ]\n    }\n    \n    config_filename = \"batch_indexing_config.json\"\n    with open(config_filename, \"w\") as f:\n        json.dump(sample_config, f, indent=2)\n    \n    print(f\"✅ Sample configuration created: {config_filename}\")\n    print(f\"📝 Edit this file to customize your batch indexing setup\")\n    print(f\"🚀 Run: python demo_batch_indexing.py --config {config_filename}\")\n\n\ndef main():\n    \"\"\"Main entry point for the demo script.\"\"\"\n    parser = argparse.ArgumentParser(\n        description=\"LocalGPT Batch Indexing Demo\",\n        formatter_class=argparse.RawDescriptionHelpFormatter,\n        epilog=\"\"\"\nExamples:\n  python demo_batch_indexing.py --config batch_indexing_config.json\n  python demo_batch_indexing.py --create-sample-config\n  \nThis demo showcases the advanced batch indexing capabilities of LocalGPT,\nincluding multi-index creation, advanced configuration options, and\ncomprehensive processing pipelines.\n        \"\"\"\n    )\n    \n    parser.add_argument(\n        \"--config\",\n        type=str,\n 
       default=\"batch_indexing_config.json\",\n        help=\"Path to batch indexing configuration file\"\n    )\n    \n    parser.add_argument(\n        \"--create-sample-config\",\n        action=\"store_true\",\n        help=\"Create a sample configuration file\"\n    )\n    \n    args = parser.parse_args()\n    \n    if args.create_sample_config:\n        create_sample_config()\n        return\n    \n    if not os.path.exists(args.config):\n        print(f\"❌ Configuration file not found: {args.config}\")\n        print(f\"💡 Create a sample config with: python {sys.argv[0]} --create-sample-config\")\n        sys.exit(1)\n    \n    try:\n        demo = BatchIndexingDemo(args.config)\n        demo.run_demo()\n        \n    except KeyboardInterrupt:\n        print(\"\\n\\n❌ Demo cancelled by user.\")\n    except Exception as e:\n        print(f\"❌ Demo failed: {e}\")\n        import traceback\n        traceback.print_exc()\n\n\nif __name__ == \"__main__\":\n    main()  "
  },
  {
    "path": "docker-compose.local-ollama.yml",
    "content": "services:\n  # RAG API server (connects to host Ollama)\n  rag-api:\n    build:\n      context: .\n      dockerfile: Dockerfile.rag-api\n    container_name: rag-api\n    ports:\n      - \"8001:8001\"\n    environment:\n      - OLLAMA_HOST=http://host.docker.internal:11434\n      - NODE_ENV=production\n    volumes:\n      - ./lancedb:/app/lancedb\n      - ./index_store:/app/index_store\n      - ./shared_uploads:/app/shared_uploads\n    healthcheck:\n      test: [\"CMD\", \"curl\", \"-f\", \"http://localhost:8001/models\"]\n      interval: 30s\n      timeout: 10s\n      retries: 3\n    restart: unless-stopped\n    networks:\n      - rag-network\n\n  # Backend API server\n  backend:\n    build:\n      context: .\n      dockerfile: Dockerfile.backend\n    container_name: rag-backend\n    ports:\n      - \"8000:8000\"\n    environment:\n      - NODE_ENV=production\n      - RAG_API_URL=http://rag-api:8001\n    volumes:\n      - ./backend/chat_data.db:/app/backend/chat_data.db\n      - ./shared_uploads:/app/shared_uploads\n    depends_on:\n      rag-api:\n        condition: service_healthy\n    healthcheck:\n      test: [\"CMD\", \"curl\", \"-f\", \"http://localhost:8000/health\"]\n      interval: 30s\n      timeout: 10s\n      retries: 3\n    restart: unless-stopped\n    networks:\n      - rag-network\n\n  # Frontend Next.js application\n  frontend:\n    build:\n      context: .\n      dockerfile: Dockerfile.frontend\n    container_name: rag-frontend\n    ports:\n      - \"3000:3000\"\n    environment:\n      - NODE_ENV=production\n      - NEXT_PUBLIC_API_URL=http://localhost:8000\n    depends_on:\n      backend:\n        condition: service_healthy\n    healthcheck:\n      test: [\"CMD\", \"curl\", \"-f\", \"http://localhost:3000\"]\n      interval: 30s\n      timeout: 10s\n      retries: 3\n    restart: unless-stopped\n    networks:\n      - rag-network\n\nnetworks:\n  rag-network:\n    driver: bridge "
  },
  {
    "path": "docker-compose.yml",
    "content": "services:\n  # Ollama service for LLM inference (optional - can use host Ollama instead)\n  ollama:\n    image: ollama/ollama:latest\n    container_name: rag-ollama\n    ports:\n      - \"11434:11434\"\n    volumes:\n      - ollama_data:/root/.ollama\n    environment:\n      - OLLAMA_HOST=0.0.0.0\n    healthcheck:\n      test: [\"CMD\", \"curl\", \"-f\", \"http://localhost:11434/api/tags\"]\n      interval: 30s\n      timeout: 10s\n      retries: 3\n    restart: unless-stopped\n    networks:\n      - rag-network\n    profiles:\n      - with-ollama  # Optional service - enable with --profile with-ollama\n\n  # RAG API server\n  rag-api:\n    build:\n      context: .\n      dockerfile: Dockerfile.rag-api\n    container_name: rag-api\n    ports:\n      - \"8001:8001\"\n    environment:\n      # Use host Ollama by default, or containerized Ollama if enabled\n      - OLLAMA_HOST=${OLLAMA_HOST:-http://host.docker.internal:11434}\n      - NODE_ENV=production\n    volumes:\n      - ./lancedb:/app/lancedb\n      - ./index_store:/app/index_store\n      - ./shared_uploads:/app/shared_uploads\n    healthcheck:\n      test: [\"CMD\", \"curl\", \"-f\", \"http://localhost:8001/models\"]\n      interval: 30s\n      timeout: 10s\n      retries: 3\n    restart: unless-stopped\n    networks:\n      - rag-network\n\n  # Backend API server\n  backend:\n    build:\n      context: .\n      dockerfile: Dockerfile.backend\n    container_name: rag-backend\n    ports:\n      - \"8000:8000\"\n    environment:\n      - NODE_ENV=production\n      - RAG_API_URL=http://rag-api:8001\n      - OLLAMA_HOST=${OLLAMA_HOST:-http://172.18.0.1:11434}\n    volumes:\n      - ./backend:/app/backend\n      - ./shared_uploads:/app/shared_uploads\n    depends_on:\n      rag-api:\n        condition: service_healthy\n    healthcheck:\n      test: [\"CMD\", \"curl\", \"-f\", \"http://localhost:8000/health\"]\n      interval: 30s\n      timeout: 10s\n      retries: 3\n    restart: unless-stopped\n   
 networks:\n      - rag-network\n\n  # Frontend Next.js application\n  frontend:\n    build:\n      context: .\n      dockerfile: Dockerfile.frontend\n    container_name: rag-frontend\n    ports:\n      - \"3000:3000\"\n    environment:\n      - NODE_ENV=production\n      - NEXT_PUBLIC_API_URL=http://localhost:8000\n    depends_on:\n      backend:\n        condition: service_healthy\n    healthcheck:\n      test: [\"CMD\", \"curl\", \"-f\", \"http://localhost:3000\"]\n      interval: 30s\n      timeout: 10s\n      retries: 3\n    restart: unless-stopped\n    networks:\n      - rag-network\n\nvolumes:\n  ollama_data:\n    driver: local\n\nnetworks:\n  rag-network:\n    driver: bridge    "
  },
  {
    "path": "docker.env",
    "content": "# Docker environment configuration\n# Set this to use local Ollama instance running on host\n# Note: Using Docker gateway IP instead of host.docker.internal for Linux compatibility\nOLLAMA_HOST=http://172.18.0.1:11434\n\n# Alternative: Use containerized Ollama (uncomment and run with --profile with-ollama)\n# OLLAMA_HOST=http://ollama:11434\n\n# Other configuration\nNODE_ENV=production\nNEXT_PUBLIC_API_URL=http://localhost:8000\nRAG_API_URL=http://rag-api:8001   "
  },
  {
    "path": "env.example.watsonx",
    "content": "# ====================================================================\n# LocalGPT Watson X Configuration Example\n# ====================================================================\n# This file shows how to configure LocalGPT to use IBM Watson X AI\n# with Granite models instead of local Ollama.\n#\n# Copy this file to .env and fill in your credentials:\n#   cp .env.example.watsonx .env\n# ====================================================================\n\n# LLM Backend Selection\n# Options: \"ollama\" (default) or \"watsonx\"\nLLM_BACKEND=watsonx\n\n# ====================================================================\n# Watson X Credentials\n# ====================================================================\n# Get these from your IBM Cloud Watson X project:\n# 1. Go to https://cloud.ibm.com/\n# 2. Navigate to Watson X AI service\n# 3. Create or select a project\n# 4. Get API key from IBM Cloud IAM\n# 5. Copy project ID from project settings\n\n# Your IBM Cloud API key\nWATSONX_API_KEY=your_api_key_here\n\n# Your Watson X project ID\nWATSONX_PROJECT_ID=your_project_id_here\n\n# Watson X service URL (default: us-south region)\n# Options:\n#   - https://us-south.ml.cloud.ibm.com (US South)\n#   - https://eu-de.ml.cloud.ibm.com (Frankfurt)\n#   - https://eu-gb.ml.cloud.ibm.com (London)\n#   - https://jp-tok.ml.cloud.ibm.com (Tokyo)\nWATSONX_URL=https://us-south.ml.cloud.ibm.com\n\n# ====================================================================\n# Model Configuration\n# ====================================================================\n# Granite models available on Watson X\n\n# Main generation model for answering queries\n# Options:\n#   - ibm/granite-13b-chat-v2 (recommended for chat)\n#   - ibm/granite-13b-instruct-v2 (for instructions)\n#   - ibm/granite-20b-multilingual (for multilingual)\n#   - ibm/granite-3b-code-instruct (for code)\nWATSONX_GENERATION_MODEL=ibm/granite-13b-chat-v2\n\n# Lightweight model for enrichment 
and routing\n# Use a smaller model for better performance on simple tasks\nWATSONX_ENRICHMENT_MODEL=ibm/granite-8b-japanese\n\n# ====================================================================\n# Optional: Ollama Configuration (fallback)\n# ====================================================================\n# These settings are used if LLM_BACKEND=ollama\n\nOLLAMA_HOST=http://localhost:11434\n"
  },
  {
    "path": "eslint.config.mjs",
    "content": "import { dirname } from \"path\";\nimport { fileURLToPath } from \"url\";\nimport { FlatCompat } from \"@eslint/eslintrc\";\n\nconst __filename = fileURLToPath(import.meta.url);\nconst __dirname = dirname(__filename);\n\nconst compat = new FlatCompat({\n  baseDirectory: __dirname,\n});\n\nconst eslintConfig = [\n  ...compat.extends(\"next/core-web-vitals\", \"next/typescript\"),\n];\n\nexport default eslintConfig;\n"
  },
  {
    "path": "next.config.ts",
    "content": "import type { NextConfig } from \"next\";\n\nconst nextConfig: NextConfig = {\n  /* config options here */\n  eslint: {\n    // Warning: This allows production builds to successfully complete even if your project has ESLint errors.\n    ignoreDuringBuilds: true,\n  },\n  typescript: {\n    // Warning: This allows production builds to successfully complete even if your project has type errors.\n    ignoreBuildErrors: true,\n  },\n};\n\nexport default nextConfig;\n"
  },
  {
    "path": "package.json",
    "content": "{\n  \"name\": \"multimodal_rag\",\n  \"version\": \"0.1.0\",\n  \"private\": true,\n  \"scripts\": {\n    \"dev\": \"next dev\",\n    \"build\": \"next build\",\n    \"start\": \"next start\",\n    \"lint\": \"next lint\"\n  },\n  \"dependencies\": {\n    \"@radix-ui/react-avatar\": \"^1.1.10\",\n    \"@radix-ui/react-dropdown-menu\": \"^2.1.15\",\n    \"@radix-ui/react-scroll-area\": \"^1.2.9\",\n    \"@radix-ui/react-separator\": \"^1.1.7\",\n    \"@radix-ui/react-slot\": \"^1.2.3\",\n    \"class-variance-authority\": \"^0.7.1\",\n    \"clsx\": \"^2.1.1\",\n    \"framer-motion\": \"^12.16.0\",\n    \"lucide-react\": \"^0.513.0\",\n    \"next\": \"15.3.3\",\n    \"react\": \"^19.0.0\",\n    \"react-dom\": \"^19.0.0\",\n    \"react-markdown\": \"^10.1.0\",\n    \"remark-gfm\": \"^4.0.1\",\n    \"tailwind-merge\": \"^3.3.0\"\n  },\n  \"devDependencies\": {\n    \"@eslint/eslintrc\": \"^3\",\n    \"@tailwindcss/postcss\": \"^4\",\n    \"@types/node\": \"^20\",\n    \"@types/react\": \"^19\",\n    \"@types/react-dom\": \"^19\",\n    \"eslint\": \"^9\",\n    \"eslint-config-next\": \"15.3.3\",\n    \"tailwindcss\": \"^4\",\n    \"tw-animate-css\": \"^1.3.4\",\n    \"typescript\": \"^5\"\n  }\n}\n"
  },
  {
    "path": "postcss.config.mjs",
    "content": "const config = {\n  plugins: [\"@tailwindcss/postcss\"],\n};\n\nexport default config;\n"
  },
  {
    "path": "rag_system/DOCUMENTATION.md",
    "content": "# RAG System Documentation\n\nThis document provides a detailed overview of the RAG (Retrieval-Augmented Generation) system, its architecture, and how to use it.\n\n## System Overview\n\nThis RAG system is a sophisticated, multimodal question-answering system designed to work with a variety of documents. It can understand and process both the text and the visual layout of documents, and it uses a knowledge graph to understand the relationships between the entities in the documents.\n\nThe system is built around an agentic workflow that allows it to:\n\n*   **Decompose complex questions** into smaller, more manageable sub-questions.\n*   **Triage queries** to determine if they can be answered directly or if they require retrieval from the knowledge base.\n*   **Verify answers** against the retrieved context to ensure they are accurate and supported by the documents.\n\n## Architecture\n\nThe system is composed of two main pipelines: an indexing pipeline and a retrieval pipeline.\n\n### Indexing Pipeline\n\nThe indexing pipeline is responsible for processing the documents and building the knowledge base. It performs the following steps:\n\n1.  **Text Extraction**: The pipeline uses `PyMuPDF` to extract the text from each page of the PDF documents, preserving the original layout.\n2.  **Text Embedding**: The extracted text is then passed to a text embedding model (`Qwen/Qwen3-Embedding-0.6B`) to create numerical vector representations of the text.\n3.  **Knowledge Graph Creation**: The text is also passed to a `GraphExtractor` that uses a large language model (`qwen2.5vl:7b`) to extract entities and their relationships. This information is then used to build a knowledge graph, which is stored as a `.gml` file.\n4.  **Indexing**: The text embeddings and the knowledge graph are then stored in a LanceDB database.\n\n### Retrieval Pipeline\n\nThe retrieval pipeline is responsible for answering user queries. 
It uses an agentic workflow that includes the following steps:\n\n1.  **Triage**: The agent first triages the user's query to determine if it can be answered directly or if it requires retrieval from the knowledge base.\n2.  **Query Decomposition**: If the query is complex, the agent uses a `QueryDecomposer` to break it down into smaller, more manageable sub-questions.\n3.  **Retrieval**: The agent then uses a `MultiVectorRetriever` and a `GraphRetriever` to retrieve relevant information from the knowledge base.\n4.  **Verification**: The retrieved context is then passed to a `Verifier` that uses an LLM to check if the context is sufficient to answer the query.\n5.  **Synthesis**: Finally, the agent uses an LLM to synthesize a final answer from the verified context.\n\n## API Endpoints\n\nThe system provides the following command-line subcommands, invoked through `rag_system/main.py`:\n\n*   `index`: Runs the indexing pipeline to process the documents and build the knowledge base.\n*   `chat`: Runs the retrieval pipeline to answer a user's query.\n*   `show_graph`: Displays the knowledge graph in a human-readable format and provides a visual representation of it.\n\n### Usage\n\nTo run the system, use the following commands:\n\n```bash\n# Activate the virtual environment\nsource rag_system/rag_venv/bin/activate\n\n# Index the documents\npython rag_system/main.py index\n\n# Ask a question\npython rag_system/main.py chat \"Your question here\"\n\n# Show the knowledge graph\npython rag_system/main.py show_graph\n```\n"
  },
  {
    "path": "rag_system/README.md",
    "content": "# Multimodal RAG System\n\nThis document provides a detailed overview of the multimodal Retrieval-Augmented Generation (RAG) system implemented in this directory. The system is designed to process and understand information from PDF documents, combining both textual and visual data to answer complex queries.\n\n## 1. Overview\n\nThis RAG system is a sophisticated pipeline that leverages state-of-the-art open-source models to provide accurate, context-aware answers from a document corpus. Unlike traditional RAG systems that only process text, this implementation is fully multimodal. It extracts and indexes both text and images from PDFs, allowing a Vision Language Model (VLM) to reason over both modalities when generating a final answer.\n\nThe core capabilities include:\n-   **Multimodal Indexing**: Extracts text and images from PDFs and creates separate vector embeddings for each.\n-   **Hybrid Retrieval**: Combines dense vector search (for semantic similarity) with traditional keyword-based search (BM25) for robust retrieval.\n-   **Advanced Reranking**: Utilizes a powerful reranker model to improve the relevance of retrieved documents before they are passed to the generator.\n-   **VLM-Powered Synthesis**: Employs a Vision Language Model to synthesize the final answer, allowing it to analyze both the text and the images from the retrieved document chunks.\n\n## 2. Architecture\n\nThe system is composed of several key Python modules that work together to form the RAG pipeline.\n\n### Key Modules:\n\n-   `main.py`: The main entry point for the application. 
It contains the configuration for all models and pipelines and orchestrates the indexing and retrieval processes.\n-   `rag_system/pipelines/`: Contains the high-level orchestration for indexing and retrieval.\n    -   `indexing_pipeline.py`: Manages the process of converting raw PDFs into indexed, searchable data.\n    -   `retrieval_pipeline.py`: Handles the end-to-end process of taking a user query, retrieving relevant information, and generating a final answer.\n-   `rag_system/indexing/`: Contains all modules related to data processing and indexing.\n    -   `multimodal.py`: Responsible for extracting text and images from PDFs and generating embeddings using the configured vision model (`colqwen2-v1.0`).\n    -   `representations.py`: Defines the text embedding model (`Qwen2-7B-instruct`) and other data representation generators.\n    -   `embedders.py`: Manages the connection to the **LanceDB** vector database and handles the indexing of vector embeddings.\n-   `rag_system/retrieval/`: Contains modules for retrieving and ranking documents.\n    -   `retrievers.py`: Implements the logic for searching the vector database to find relevant text and image chunks.\n    -   `reranker.py`: Contains the `QwenReranker` class, which re-ranks the retrieved documents for improved relevance.\n-   `rag_system/agent/`: Contains the `Agent` loop that interacts with the user and the RAG pipelines.\n-   `rag_system/utils/`: Contains utility clients, such as the `OllamaClient` for interacting with the Ollama server.\n\n### Data Flow:\n\n1.  **Indexing**:\n    -   The `MultimodalProcessor` reads a PDF and splits it into pages.\n    -   For each page, it extracts the raw text and a full-page image.\n    -   The `QwenEmbedder` generates a vector embedding for the text.\n    -   The `LocalVisionModel` (using `colqwen2-v1.0`) generates a vector embedding for the image.\n    -   The `VectorIndexer` stores these embeddings in separate tables within a **LanceDB** database.\n2.  
**Retrieval**:\n    -   A user submits a query to the `Agent`.\n    -   The `RetrievalPipeline`'s `MultiVectorRetriever` searches both the text and image tables in LanceDB for relevant chunks.\n    -   The retrieved documents are passed to the `QwenReranker`, which re-orders them based on relevance to the query.\n    -   The top-ranked documents (containing both text and image references) are passed to the Vision Language Model (`qwen2.5vl:7b`).\n    -   The VLM analyzes the text and images to extract key facts.\n    -   A final text generation model (`llama3`) synthesizes these facts into a coherent, human-readable answer.\n\n## 3. Models\n\nThis system relies on the following open-source models.\n\n| Component             | Model                               | Framework      | Purpose                                     |\n| --------------------- | ----------------------------------- | -------------- | ------------------------------------------- |\n| **Image Embedding**   | `vidore/colqwen2-v1.0`              | `colpali`      | Generates vector embeddings from images.    |\n| **Text Embedding**    | `Qwen/Qwen2-7B-instruct`            | `transformers` | Generates vector embeddings from text.      |\n| **Reranker**          | `Qwen/Qwen-reranker`                | `transformers` | Re-ranks retrieved documents for relevance. |\n| **Vision Language Model** | `qwen2.5vl:7b`                      | `Ollama`       | Extracts facts from text and images.        |\n| **Text Generation**   | `llama3`                            | `Ollama`       | Synthesizes the final answer.               |\n\n## 4. Configuration\n\nAll system configurations are centralized in `main.py`.\n\n-   **`OLLAMA_CONFIG`**: Defines the models that will be run via the Ollama server. This includes the final text generation model and the Vision Language Model.\n-   **`PIPELINE_CONFIGS`**: Contains the configurations for both the `indexing` and `retrieval` pipelines. 
Here you can specify:\n    -   The paths for the LanceDB database and source documents.\n    -   The names of the tables to be used for text and image embeddings.\n    -   The Hugging Face model names for the text embedder, vision model, and reranker.\n    -   Parameters for the reranker and retrieval process (e.g., `top_k`, `retrieval_k`).\n\nTo change a model, simply update the corresponding model name in this configuration file.\n\n## 5. Usage\n\nTo run the system, you first need to ensure the required models are available.\n\n### Prerequisites:\n\n1.  **Install Dependencies**:\n    ```bash\n    pip install -r requirements.txt\n    ```\n2.  **Download Ollama Models**:\n    ```bash\n    ollama pull llama3\n    ollama pull qwen2.5vl:7b\n    ```\n3.  **Hugging Face Models**: The `transformers` and `colpali` libraries will automatically download the required models the first time they are used. Ensure you have a stable internet connection.\n\n### Running the System:\n\n1.  **Execute the Main Script**:\n    ```bash\n    python rag_system/main.py\n    ```\n2.  **Indexing**: The script will first run the indexing pipeline, processing any documents in the `rag_system/documents` directory and storing their embeddings in LanceDB.\n3.  **Querying**: Once indexing is complete, the RAG agent will be ready. You can ask questions about the documents you have indexed.\n    ```\n    > What was the revenue growth in Q3?\n    ```\n4.  **Exit**: To stop the agent, type `quit`.\n"
  },
  {
    "path": "rag_system/__init__.py",
    "content": "import logging\nimport os\n\n# ---------------------------------------------------------\n# Global logging setup for the entire `rag_system` package.\n# ---------------------------------------------------------\n# You can control verbosity with an env variable, e.g.:\n#   export RAG_LOG_LEVEL=DEBUG  (or INFO / WARNING / ERROR)\n# If not set, we default to INFO to avoid excessive noise.\n# ---------------------------------------------------------\n_level_str = os.getenv(\"RAG_LOG_LEVEL\", \"INFO\").upper()\n_level = getattr(logging, _level_str, logging.INFO)\n\n# Only configure root logger if it hasn't been configured yet\nif not logging.getLogger().handlers:\n    logging.basicConfig(\n        level=_level,\n        format=\"%(asctime)s | %(levelname)-8s | %(name)s | %(message)s\",\n    )\nelse:\n    logging.getLogger().setLevel(_level)\n\nlogging.getLogger(__name__).debug(\n    \"Initialized rag_system logging (level=%s)\", _level_str\n)\n\n# ---------------------------------------------------------\n# Authenticate to Hugging Face Hub if a token is provided\n# ---------------------------------------------------------\nfrom typing import Optional\n\n\ndef _hf_auto_login() -> None:\n    \"\"\"Attempt to authenticate with Hugging Face Hub using an env token.\n\n    We support both the new canonical env var name (HF_TOKEN) and the two\n    historical variants to avoid breaking user setups. 
The login call is\n    idempotent: if a cached token already exists, the hub library will simply\n    reuse it, so it is safe to run on every import.\n    \"\"\"\n\n    token: Optional[str] = (\n        os.getenv(\"HF_TOKEN\")\n        or os.getenv(\"HUGGINGFACE_HUB_TOKEN\")\n        or os.getenv(\"HUGGING_FACE_HUB_TOKEN\")\n    )\n\n    if not token:\n        logging.getLogger(__name__).debug(\"No Hugging Face token found in env; proceeding anonymously.\")\n        return\n\n    try:\n        from huggingface_hub import login as hf_login\n\n        hf_login(token=token, add_to_git_credential=False)  # type: ignore\n        logging.getLogger(__name__).info(\"Authenticated to Hugging Face Hub via env token.\")\n    except Exception as exc:  # pragma: no cover – best-effort login\n        logging.getLogger(__name__).warning(\n            \"Failed to log in to Hugging Face Hub automatically: %s\", exc\n        )\n\n\n# Run on module import\n_hf_auto_login() "
  },
  {
    "path": "rag_system/agent/__init__.py",
    "content": ""
  },
  {
    "path": "rag_system/agent/loop.py",
    "content": "from typing import Dict, Any, Optional\nimport json\nimport time, asyncio, os\nimport numpy as np\nimport concurrent.futures\nfrom cachetools import TTLCache, LRUCache\nfrom rag_system.utils.ollama_client import OllamaClient\nfrom rag_system.pipelines.retrieval_pipeline import RetrievalPipeline\nfrom rag_system.agent.verifier import Verifier\nfrom rag_system.retrieval.query_transformer import QueryDecomposer, GraphQueryTranslator\nfrom rag_system.retrieval.retrievers import GraphRetriever\n\nclass Agent:\n    \"\"\"\n    The main agent, now fully wired to use a live Ollama client.\n    \"\"\"\n    def __init__(self, pipeline_configs: Dict[str, Dict], llm_client: OllamaClient, ollama_config: Dict[str, str]):\n        self.pipeline_configs = pipeline_configs\n        self.llm_client = llm_client\n        self.ollama_config = ollama_config\n        \n        gen_model = self.ollama_config[\"generation_model\"]\n        \n        # Initialize the single, persistent retrieval pipeline for this agent\n        self.retrieval_pipeline = RetrievalPipeline(pipeline_configs, self.llm_client, self.ollama_config)\n        \n        self.verifier = Verifier(llm_client, gen_model)\n        self.query_decomposer = QueryDecomposer(llm_client, gen_model)\n        \n        # 🚀 OPTIMIZED: TTL cache now stores embeddings for semantic matching\n        self._cache_max_size = 100  # fallback size limit for manual eviction helper\n        self._query_cache: TTLCache = TTLCache(maxsize=self._cache_max_size, ttl=300)\n        self.semantic_cache_threshold = self.pipeline_configs.get(\"semantic_cache_threshold\", 0.98)\n        # If set to \"session\", semantic-cache hits will be restricted to the same chat session.\n        # Otherwise (default \"global\") answers can be reused across sessions.\n        self.cache_scope = self.pipeline_configs.get(\"cache_scope\", \"global\")  # 'global' or 'session'\n        \n        # 🚀 NEW: In-memory store for conversational history per 
session\n        self.chat_histories: LRUCache = LRUCache(maxsize=100) # Stores history for 100 recent sessions\n\n        graph_config = self.pipeline_configs.get(\"graph_strategy\", {})\n        if graph_config.get(\"enabled\"):\n            self.graph_query_translator = GraphQueryTranslator(llm_client, gen_model)\n            self.graph_retriever = GraphRetriever(graph_config[\"graph_path\"])\n            print(\"Agent initialized with live GraphRAG capabilities.\")\n        else:\n            print(\"Agent initialized (GraphRAG disabled).\")\n\n        # ---- Load document overviews for fast routing ----\n        self._global_overview_path = os.path.join(\"index_store\", \"overviews\", \"overviews.jsonl\")\n        self.doc_overviews: list[str] = []\n        self._current_overview_session: str | None = None  # cache key to avoid rereading on every query\n        self._load_overviews(self._global_overview_path)\n\n    def _load_overviews(self, path: str):\n        \"\"\"Helper to load overviews from a .jsonl file into self.doc_overviews.\"\"\"\n        import json, os\n        self.doc_overviews.clear()\n        if not os.path.exists(path):\n            return\n        try:\n            with open(path, encoding=\"utf-8\") as fh:\n                for line in fh:\n                    try:\n                        rec = json.loads(line)\n                        if isinstance(rec, dict) and rec.get(\"overview\"):\n                            self.doc_overviews.append(rec[\"overview\"].strip())\n                    except Exception:\n                        continue\n            print(f\"📖 Loaded {len(self.doc_overviews)} overviews from {path}\")\n        except Exception as e:\n            print(f\"⚠️  Failed to load document overviews from {path}: {e}\")\n\n    def load_overviews_for_indexes(self, idx_ids: list[str]):\n        \"\"\"Aggregate overviews for the given indexes or fall back to global file.\"\"\"\n        import os, json\n        aggregated: list[str] = 
[]\n        for idx in idx_ids:\n            path = os.path.join(\"index_store\", \"overviews\", f\"{idx}.jsonl\")\n            if os.path.exists(path):\n                try:\n                    with open(path, encoding=\"utf-8\") as fh:\n                        for line in fh:\n                            if not line.strip():\n                                continue\n                            try:\n                                rec = json.loads(line)\n                                ov = rec.get(\"overview\", \"\").strip()\n                                if ov:\n                                    aggregated.append(ov)\n                            except json.JSONDecodeError:\n                                continue\n                except Exception as e:\n                    print(f\"⚠️  Error reading {path}: {e}\")\n        if aggregated:\n            self.doc_overviews = aggregated\n            self._current_overview_session = \"|\".join(idx_ids)  # cache composite key so no overwrite\n            print(f\"📖 Loaded {len(aggregated)} overviews for indexes {[i[:8] for i in idx_ids]}\")\n        else:\n            print(f\"⚠️  No per-index overviews found for {idx_ids}. 
Using global overview file.\")\n            self._load_overviews(self._global_overview_path)\n            self._current_overview_session = \"GLOBAL\"\n\n    def _cosine_similarity(self, v1: np.ndarray, v2: np.ndarray) -> float:\n        \"\"\"Computes cosine similarity between two vectors.\"\"\"\n        if not isinstance(v1, np.ndarray): v1 = np.array(v1)\n        if not isinstance(v2, np.ndarray): v2 = np.array(v2)\n        \n        if v1.shape != v2.shape:\n            raise ValueError(\"Vectors must have the same shape for cosine similarity.\")\n\n        if np.all(v1 == 0) or np.all(v2 == 0):\n            return 0.0\n            \n        dot_product = np.dot(v1, v2)\n        norm_v1 = np.linalg.norm(v1)\n        norm_v2 = np.linalg.norm(v2)\n        \n        # Avoid division by zero\n        if norm_v1 == 0 or norm_v2 == 0:\n            return 0.0\n        \n        return dot_product / (norm_v1 * norm_v2)\n\n    def _find_in_semantic_cache(self, query_embedding: np.ndarray, session_id: Optional[str] = None) -> Optional[Dict[str, Any]]:\n        \"\"\"Finds a semantically similar query in the cache.\"\"\"\n        if not self._query_cache or query_embedding is None:\n            return None\n\n        for key, cached_item in self._query_cache.items():\n            cached_embedding = cached_item.get('embedding')\n            if cached_embedding is None:\n                continue\n\n            # Respect cache scoping: if scope is session-level, skip results from other sessions\n            if self.cache_scope == \"session\" and session_id is not None:\n                if cached_item.get(\"session_id\") != session_id:\n                    continue\n\n            try:\n                similarity = self._cosine_similarity(query_embedding, cached_embedding)\n\n                if similarity >= self.semantic_cache_threshold:\n                    print(f\"🚀 Semantic cache hit! 
Similarity: {similarity:.3f} with cached query '{key}'\")\n                    return cached_item.get('result')\n            except ValueError:\n                # In case of shape mismatch, just skip\n                continue\n\n        return None\n\n    def _format_query_with_history(self, query: str, history: list) -> str:\n        \"\"\"Formats the user query with conversation history for context.\"\"\"\n        if not history:\n            return query\n        \n        formatted_history = \"\\n\".join([f\"User: {turn['query']}\\nAssistant: {turn['answer']}\" for turn in history])\n        \n        prompt = f\"\"\"\nGiven the following conversation history, answer the user's latest query. The history provides context for resolving pronouns or follow-up questions.\n\n--- Conversation History ---\n{formatted_history}\n---\n\nLatest User Query: \"{query}\"\n\"\"\"\n        return prompt\n\n    # ---------------- Asynchronous triage using Ollama ----------------\n    async def _triage_query_async(self, query: str, history: list) -> str:\n        \n        print(f\"🔍 ROUTING DEBUG: Starting triage for query: '{query[:100]}...'\")\n        \n        # 1️⃣ Fast routing using precomputed overviews (if available)\n        print(f\"📖 ROUTING DEBUG: Attempting overview-based routing...\")\n        routed = self._route_via_overviews(query)\n        if routed:\n            print(f\"✅ ROUTING DEBUG: Overview routing decided: '{routed}'\")\n            return routed\n        else:\n            print(f\"❌ ROUTING DEBUG: Overview routing returned None, falling back to LLM triage\")\n\n        if history:\n            # If there's history, the query is likely a follow-up, so we default to RAG.\n            # A more advanced implementation could use an LLM to see if the new query\n            # changes the topic entirely.\n            print(f\"📜 ROUTING DEBUG: History exists, defaulting to 'rag_query'\")\n            return \"rag_query\"\n\n        print(f\"🤖 ROUTING DEBUG: No 
history, using LLM fallback triage...\")\n        prompt = f\"\"\"\nYou are a query routing expert. Analyze the user's question and decide which backend should handle it.\n\nChoose **exactly one** category:\n\n1. \"rag_query\" – Questions about the user's uploaded documents or specific document content that should be searched. Examples: \"What is the invoice amount?\", \"Summarize the research paper\", \"What companies are mentioned?\"\n\n2. \"direct_answer\" – General knowledge questions, greetings, or queries unrelated to uploaded documents. Examples: \"Who are the CEOs of Tesla and Amazon?\", \"What is the capital of France?\", \"Hello\", \"Explain quantum physics\"\n\n3. \"graph_query\" – Specific factual relations for knowledge-graph lookup (currently limited use)\n\nIMPORTANT: For general world knowledge about well-known companies, people, or facts NOT related to uploaded documents, choose \"direct_answer\".\n\nUser query: \"{query}\"\n\nRespond with JSON: {{\"category\": \"<your_choice>\"}}\n\"\"\"\n        resp = self.llm_client.generate_completion(\n            model=self.ollama_config[\"generation_model\"], prompt=prompt, format=\"json\"\n        )\n        try:\n            data = json.loads(resp.get(\"response\", \"{}\"))\n            decision = data.get(\"category\", \"rag_query\")\n            print(f\"🤖 ROUTING DEBUG: LLM fallback triage decided: '{decision}'\")\n            return decision\n        except json.JSONDecodeError:\n            print(f\"❌ ROUTING DEBUG: LLM fallback triage JSON parsing failed, defaulting to 'rag_query'\")\n            return \"rag_query\"\n\n    def _run_graph_query(self, query: str, history: list) -> Dict[str, Any]:\n        contextual_query = self._format_query_with_history(query, history)\n        structured_query = self.graph_query_translator.translate(contextual_query)\n        if not structured_query.get(\"start_node\"):\n            return self.retrieval_pipeline.run(contextual_query, window_size_override=0)\n     
   results = self.graph_retriever.retrieve(structured_query)\n        if not results:\n            return self.retrieval_pipeline.run(contextual_query, window_size_override=0)\n        answer = \", \".join([res['details']['node_id'] for res in results])\n        return {\"answer\": f\"From the knowledge graph: {answer}\", \"source_documents\": results}\n\n    def _get_cache_key(self, query: str, query_type: str) -> str:\n        \"\"\"Generate a cache key for the query\"\"\"\n        # Simple cache key based on query and type\n        return f\"{query_type}:{query.strip().lower()}\"\n    \n    def _cache_result(self, cache_key: str, result: Dict[str, Any], session_id: Optional[str] = None):\n        \"\"\"Cache a result with size limit\"\"\"\n        if len(self._query_cache) >= self._cache_max_size:\n            # Remove oldest entry (simple FIFO eviction)\n            oldest_key = next(iter(self._query_cache))\n            del self._query_cache[oldest_key]\n        \n        self._query_cache[cache_key] = {\n            'result': result,\n            'timestamp': time.time(),\n            'session_id': session_id\n        }\n\n    # ---------------- Public sync API (kept for backwards compatibility) --------------\n    def run(self, query: str, table_name: str = None, session_id: str = None, compose_sub_answers: Optional[bool] = None, query_decompose: Optional[bool] = None, ai_rerank: Optional[bool] = None, context_expand: Optional[bool] = None, verify: Optional[bool] = None, retrieval_k: Optional[int] = None, context_window_size: Optional[int] = None, reranker_top_k: Optional[int] = None, search_type: Optional[str] = None, dense_weight: Optional[float] = None, max_retries: int = 1, event_callback: Optional[callable] = None) -> Dict[str, Any]:\n        \"\"\"Synchronous helper. 
If *event_callback* is supplied, important\n        milestones will be forwarded to that callable as\n\n            event_callback(phase:str, payload:Any)\n        \"\"\"\n        return asyncio.run(self._run_async(query, table_name, session_id, compose_sub_answers, query_decompose, ai_rerank, context_expand, verify, retrieval_k, context_window_size, reranker_top_k, search_type, dense_weight, max_retries, event_callback))\n\n    # ---------------- Main async implementation --------------------------------------\n    async def _run_async(self, query: str, table_name: str = None, session_id: str = None, compose_sub_answers: Optional[bool] = None, query_decompose: Optional[bool] = None, ai_rerank: Optional[bool] = None, context_expand: Optional[bool] = None, verify: Optional[bool] = None, retrieval_k: Optional[int] = None, context_window_size: Optional[int] = None, reranker_top_k: Optional[int] = None, search_type: Optional[str] = None, dense_weight: Optional[float] = None, max_retries: int = 1, event_callback: Optional[callable] = None) -> Dict[str, Any]:\n        start_time = time.time()\n        \n        # Emit analyze event at the start\n        if event_callback:\n            event_callback(\"analyze\", {\"query\": query})\n        \n        # 🚀 NEW: Get conversation history\n        history = self.chat_histories.get(session_id, []) if session_id else []\n        \n        # 🔄 Refresh overviews for this session if available\n        # if session_id and session_id != getattr(self, \"_current_overview_session\", None):\n        #     candidate_path = os.path.join(\"index_store\", \"overviews\", f\"{session_id}.jsonl\")\n        #     if os.path.exists(candidate_path):\n        #         self._load_overviews(candidate_path)\n        #         self._current_overview_session = session_id\n        #     else:\n        #         # Fall back to global overviews if per-session file not found\n        #         if self._current_overview_session != \"GLOBAL\":\n        #   
          self._load_overviews(self._global_overview_path)\n        #             self._current_overview_session = \"GLOBAL\"\n        \n        query_type = await self._triage_query_async(query, history)\n        print(f\"🎯 ROUTING DEBUG: Final triage decision: '{query_type}'\")\n        print(f\"Agent Triage Decision: '{query_type}'\")\n        \n        # Create a contextual query that includes history for most operations\n        contextual_query = self._format_query_with_history(query, history)\n        raw_query = query.strip()\n        \n        # --- Apply runtime AI reranker override (must happen before any retrieval calls) ---\n        if ai_rerank is not None:\n            rr_cfg = self.retrieval_pipeline.config.setdefault(\"reranker\", {})\n            rr_cfg[\"enabled\"] = bool(ai_rerank)\n            if ai_rerank:\n                # Ensure the pipeline knows to use the external ColBERT reranker\n                rr_cfg.setdefault(\"type\", \"ai\")\n                rr_cfg.setdefault(\"strategy\", \"rerankers-lib\")\n                rr_cfg.setdefault(\n                    \"model_name\",\n                    # Falls back to ColBERT-small if the caller did not supply one\n                    self.ollama_config.get(\"rerank_model\", \"answerai-colbert-small-v1\"),\n                )\n\n        # --- Apply runtime retrieval configuration overrides ---\n        if retrieval_k is not None:\n            self.retrieval_pipeline.config[\"retrieval_k\"] = retrieval_k\n            print(f\"🔍 Retrieval K set to: {retrieval_k}\")\n            \n        if context_window_size is not None:\n            self.retrieval_pipeline.config[\"context_window_size\"] = context_window_size\n            print(f\"🔍 Context window size set to: {context_window_size}\")\n            \n        if reranker_top_k is not None:\n            rr_cfg = self.retrieval_pipeline.config.setdefault(\"reranker\", {})\n            rr_cfg[\"top_k\"] = reranker_top_k\n            print(f\"🔍 Reranker 
top K set to: {reranker_top_k}\")\n            \n        if search_type is not None:\n            retrieval_cfg = self.retrieval_pipeline.config.setdefault(\"retrieval\", {})\n            retrieval_cfg[\"search_type\"] = search_type\n            print(f\"🔍 Search type set to: {search_type}\")\n            \n        if dense_weight is not None:\n            dense_cfg = self.retrieval_pipeline.config.setdefault(\"retrieval\", {}).setdefault(\"dense\", {})\n            dense_cfg[\"weight\"] = dense_weight\n            print(f\"🔍 Dense search weight set to: {dense_weight}\")\n\n        query_embedding = None\n        # 🚀 OPTIMIZED: Semantic Cache Check\n        if query_type != \"direct_answer\":\n            text_embedder = self.retrieval_pipeline._get_text_embedder()\n            if text_embedder:\n                # The embedder expects a list, so we wrap the *raw* query only.\n                query_embedding_list = text_embedder.create_embeddings([raw_query])\n                if isinstance(query_embedding_list, np.ndarray):\n                    query_embedding = query_embedding_list[0]\n                else:\n                    # Some embedders return a list – convert if necessary\n                    query_embedding = np.array(query_embedding_list[0])\n\n                cached_result = self._find_in_semantic_cache(query_embedding, session_id)\n\n                if cached_result:\n                    # Update history even on cache hit\n                    if session_id:\n                        history.append({\"query\": query, \"answer\": cached_result.get('answer', 'Cached answer not found.')})\n                        self.chat_histories[session_id] = history\n                    return cached_result\n\n        if query_type == \"direct_answer\":\n            print(f\"✅ ROUTING DEBUG: Executing DIRECT_ANSWER path\")\n            if event_callback:\n                event_callback(\"direct_answer\", {})\n\n            prompt = (\n                \"You are a 
helpful assistant. Read the conversation history below. \"\n                \"If the answer to the user's latest question is already present in the history, quote it concisely. \"\n                \"Otherwise answer from your general world knowledge. Provide a short, factual reply (1‒2 sentences).\\n\\n\"\n                f\"Conversation + Latest Question:\\n{contextual_query}\\n\\nAssistant:\"\n            )\n\n            async def _run_stream():\n                answer_parts: list[str] = []\n\n                def _blocking_stream():\n                    for tok in self.llm_client.stream_completion(\n                        model=self.ollama_config[\"generation_model\"], prompt=prompt\n                    ):\n                        answer_parts.append(tok)\n                        if event_callback:\n                            event_callback(\"token\", {\"text\": tok})\n\n                # Run the blocking generator in a thread so the event loop stays responsive\n                await asyncio.to_thread(_blocking_stream)\n                return \"\".join(answer_parts)\n\n            final_answer = await _run_stream()\n            result = {\"answer\": final_answer, \"source_documents\": []}\n        \n        elif query_type == \"graph_query\" and hasattr(self, 'graph_retriever'):\n            print(f\"✅ ROUTING DEBUG: Executing GRAPH_QUERY path\")\n            result = self._run_graph_query(query, history)\n\n        # --- RAG Query Processing with Optional Query Decomposition ---\n        else: # Default to rag_query\n            print(f\"✅ ROUTING DEBUG: Executing RAG_QUERY path (query_type='{query_type}')\")\n            query_decomp_config = self.pipeline_configs.get(\"query_decomposition\", {})\n            decomp_enabled = query_decomp_config.get(\"enabled\", False)\n            if query_decompose is not None:\n                decomp_enabled = query_decompose\n\n            if decomp_enabled:\n                print(f\"\\n--- Query Decomposition Enabled 
---\")\n                # Use the raw user query (without conversation history) for decomposition to avoid leakage of prior context\n                # Pass the last 5 conversation turns for context resolution within the decomposer\n                recent_history = history[-5:] if history else []\n                sub_queries = self.query_decomposer.decompose(raw_query, recent_history)\n                if event_callback:\n                    event_callback(\"decomposition\", {\"sub_queries\": sub_queries})\n                print(f\"Original query: '{query}' (Contextual: '{contextual_query}')\")\n                print(f\"Decomposed into {len(sub_queries)} sub-queries: {sub_queries}\")\n                \n                # Emit retrieval_started event before any retrievals\n                if event_callback:\n                    event_callback(\"retrieval_started\", {\"count\": len(sub_queries)})\n                \n                # If decomposition produced only a single sub-query, skip the\n                # parallel/composition machinery for efficiency.\n                if len(sub_queries) == 1:\n                    print(\"--- Only one sub-query after decomposition; using direct retrieval path ---\")\n                    result = self.retrieval_pipeline.run(\n                        sub_queries[0],\n                        table_name,\n                        0 if context_expand is False else None,\n                        event_callback=event_callback\n                    )\n                    if event_callback:\n                        event_callback(\"single_query_result\", result)\n                    # Emit retrieval_done and rerank_done for single sub-query\n                    if event_callback:\n                        event_callback(\"retrieval_done\", {\"count\": 1})\n                        event_callback(\"rerank_started\", {\"count\": 1})\n                        event_callback(\"rerank_done\", {\"count\": 1})\n                else:\n                   
 compose_from_sub_answers = query_decomp_config.get(\"compose_from_sub_answers\", True)\n                    if compose_sub_answers is not None:\n                        compose_from_sub_answers = compose_sub_answers\n\n                    print(f\"\\n--- Processing {len(sub_queries)} sub-queries in parallel ---\")\n                    start_time_inner = time.time()\n\n                    # Shared containers\n                    sub_answers = []  # For two-stage composition\n                    all_source_docs = []  # For single-stage aggregation\n                    citations_seen = set()\n\n                    # Emit rerank_started event before parallel retrievals (since each sub-query will rerank)\n                    if event_callback:\n                        event_callback(\"rerank_started\", {\"count\": len(sub_queries)})\n\n                    # Emit token chunks as soon as we receive them. The UI\n                    # keeps answers separated by `index`, so interleaving is\n                    # harmless and gives continuous feedback.\n\n                    def make_cb(idx: int):\n                        def _cb(ev_type: str, payload):\n                            if event_callback is None:\n                                return\n                            if ev_type == \"token\":\n                                event_callback(\"sub_query_token\", {\"index\": idx, \"text\": payload.get(\"text\", \"\"), \"question\": sub_queries[idx]})\n                            else:\n                                event_callback(ev_type, payload)\n                        return _cb\n\n                    with concurrent.futures.ThreadPoolExecutor(max_workers=min(3, len(sub_queries))) as executor:\n                        future_to_query = {\n                            executor.submit(\n                                self.retrieval_pipeline.run,\n                                sub_query,\n                                table_name,\n                                
0 if context_expand is False else None,\n                                make_cb(i),\n                            ): (i, sub_query)\n                            for i, sub_query in enumerate(sub_queries)\n                        }\n\n                        for future in concurrent.futures.as_completed(future_to_query):\n                            i, sub_query = future_to_query[future]\n                            try:\n                                sub_result = future.result()\n                                print(f\"✅ Sub-Query {i+1} completed: '{sub_query}'\")\n\n                                if event_callback:\n                                    event_callback(\"sub_query_result\", {\n                                        \"index\": i,\n                                        \"query\": sub_query,\n                                        \"answer\": sub_result.get(\"answer\", \"\"),\n                                        \"source_documents\": sub_result.get(\"source_documents\", []),\n                                    })\n\n                                if compose_from_sub_answers:\n                                    sub_answers.append({\n                                        \"question\": sub_query,\n                                        \"answer\": sub_result.get(\"answer\", \"\")\n                                    })\n                                    # Keep up to 5 citations per sub-query for traceability\n                                    for doc in sub_result.get(\"source_documents\", [])[:5]:\n                                        if doc['chunk_id'] not in citations_seen:\n                                            all_source_docs.append(doc)\n                                            citations_seen.add(doc['chunk_id'])\n                                else:\n                                    # Aggregate unique docs (single-stage path)\n                                    for doc in sub_result.get('source_documents', 
[]):\n                                        if doc['chunk_id'] not in citations_seen:\n                                            all_source_docs.append(doc)\n                                            citations_seen.add(doc['chunk_id'])\n                            except Exception as e:\n                                print(f\"❌ Sub-Query {i+1} failed: '{sub_query}' - {e}\")\n\n                    parallel_time = time.time() - start_time_inner\n                    print(f\"🚀 Parallel processing completed in {parallel_time:.2f}s\")\n\n                    # Emit retrieval_done and rerank_done after all sub-queries are processed\n                    if event_callback:\n                        event_callback(\"retrieval_done\", {\"count\": len(sub_queries)})\n                        event_callback(\"rerank_done\", {\"count\": len(sub_queries)})\n\n                    if compose_from_sub_answers:\n                        print(\"\\n--- Composing final answer from sub-answers ---\")\n                        compose_prompt = f\"\"\"\nYou are an expert answer composer for a Retrieval-Augmented Generation (RAG) system.\n\nContext:\n• The ORIGINAL QUESTION from the user is shown below.\n• That question was automatically decomposed into simpler SUB-QUESTIONS.\n• Each sub-question has already been answered by an earlier step and the resulting Question→Answer pairs are provided to you in JSON.\n\nYour task:\n1. Read every sub-answer carefully.\n2. Write a single, final answer to the ORIGINAL QUESTION **using only the information contained in the sub-answers**. Do NOT invent facts that are not present.\n3. If the original question includes a comparison (e.g., \"Which, A or B, …\") clearly state the outcome (e.g., \"A > B\"). Quote concrete numbers when available.\n4. If any aspect of the original question cannot be answered with the given sub-answers, explicitly say so (e.g., \"The provided context does not mention …\").\n5. 
Keep the answer concise (≤ 5 sentences) and use a factual, third-person tone.\n\nInput\n------\nORIGINAL QUESTION:\n\"{contextual_query}\"\n\nSUB-ANSWERS (JSON):\n{json.dumps(sub_answers, indent=2)}\n\n------\nFINAL ANSWER:\n\"\"\"\n                        # --- Stream composition answer token-by-token ---\n                        answer_parts: list[str] = []\n\n                        for tok in self.llm_client.stream_completion(\n                            model=self.ollama_config[\"generation_model\"],\n                            prompt=compose_prompt,\n                        ):\n                            answer_parts.append(tok)\n                            if event_callback:\n                                event_callback(\"token\", {\"text\": tok})\n\n                        final_answer = \"\".join(answer_parts) or \"Unable to generate an answer.\"\n\n                        result = {\n                            \"answer\": final_answer,\n                            \"source_documents\": all_source_docs\n                        }\n                        if event_callback:\n                            event_callback(\"final_answer\", result)\n                    else:\n                        print(f\"\\n--- Aggregated {len(all_source_docs)} unique documents from all sub-queries ---\")\n\n                        if all_source_docs:\n                            aggregated_context = \"\\n\\n\".join([doc['text'] for doc in all_source_docs])\n                            final_answer = self.retrieval_pipeline._synthesize_final_answer(contextual_query, aggregated_context)\n                            result = {\n                                \"answer\": final_answer,\n                                \"source_documents\": all_source_docs\n                            }\n                            if event_callback:\n                                event_callback(\"final_answer\", result)\n                        else:\n                            result = 
{\n                                \"answer\": \"I could not find relevant information to answer your question.\",\n                                \"source_documents\": []\n                            }\n                            if event_callback:\n                                event_callback(\"final_answer\", result)\n            else:\n                # Standard retrieval (single-query)\n                retrieved_docs = (self.retrieval_pipeline.retriever.retrieve(\n                    text_query=contextual_query,\n                    table_name=table_name or self.retrieval_pipeline.storage_config[\"text_table_name\"],\n                    k=self.retrieval_pipeline.config.get(\"retrieval_k\", 10),\n                ) if hasattr(self.retrieval_pipeline, \"retriever\") and self.retrieval_pipeline.retriever else [])\n\n                print(\"\\n=== DEBUG: Original retrieval order ===\")\n                for i, d in enumerate(retrieved_docs[:10]):\n                    snippet = (d.get('text','') or '')[:200].replace('\\n',' ')\n                    print(f\"Orig[{i}] id={d.get('chunk_id')} dist={d.get('_distance','') or d.get('score','')}  {snippet}\")\n\n                result = self.retrieval_pipeline.run(contextual_query, table_name, 0 if context_expand is False else None, event_callback=event_callback)\n\n                # After run, result['source_documents'] is reranked list\n                reranked_docs = result.get('source_documents', [])\n                print(\"\\n=== DEBUG: Reranked docs order ===\")\n                for i, d in enumerate(reranked_docs[:10]):\n                    snippet = (d.get('text','') or '')[:200].replace('\\n',' ')\n                    print(f\"ReRank[{i}] id={d.get('chunk_id')} score={d.get('rerank_score','')} {snippet}\")\n        \n        # Verification step (simplified for now) - Skip in fast mode\n        verification_enabled = self.pipeline_configs.get(\"verification\", {}).get(\"enabled\", True)\n        if verify is 
not None:\n            verification_enabled = verify\n            \n        if verification_enabled and result.get(\"source_documents\"):\n            context_str = \"\\n\".join([doc['text'] for doc in result['source_documents']])\n            verification = await self.verifier.verify_async(contextual_query, context_str, result['answer'])\n            \n            score = verification.confidence_score\n\n            # Only include confidence details if we received a non-zero score (0 usually means JSON parse failure)\n            if score > 0:\n                result['answer'] += f\" [Confidence: {score}%]\"\n                # Add warning only when the verifier explicitly reported low confidence / not grounded\n                if (not verification.is_grounded) or score < 50:\n                    result['answer'] += f\" [Warning: Low confidence. Groundedness: {verification.is_grounded}]\"\n            else:\n                # Skip appending any verifier note – 0 likely indicates a parser error\n                print(\"⚠️  Verifier returned 0 confidence – likely JSON parse error; omitting tags.\")\n        else:\n            print(\"🚀 Skipping verification for speed or lack of sources\")\n        \n        # 🚀 NEW: Update history\n        if session_id:\n            history.append({\"query\": query, \"answer\": result['answer']})\n            self.chat_histories[session_id] = history\n            \n        # 🚀 OPTIMIZED: Cache the result for future queries\n        if query_type != \"direct_answer\" and query_embedding is not None:\n            cache_key = raw_query  # Key is for logging/debugging\n            self._query_cache[cache_key] = {\n                \"embedding\": query_embedding,\n                \"result\": result,\n                \"session_id\": session_id,\n            }\n        \n        total_time = time.time() - start_time\n        print(f\"🚀 Total query processing time: {total_time:.2f}s\")\n        \n        return result\n\n    # 
------------------------------------------------------------------\n    def _route_via_overviews(self, query: str) -> str | None:\n        \"\"\"Use document overviews and a small model to decide routing.\n        Returns 'rag_query', 'direct_answer', or None if unsure/disabled.\"\"\"\n        if not self.doc_overviews:\n            print(f\"📖 ROUTING DEBUG: No document overviews available, returning None\")\n            return None\n        \n        print(f\"📖 ROUTING DEBUG: Found {len(self.doc_overviews)} document overviews, using LLM routing...\")\n\n        # Keep prompt concise: if more than 40 overviews, take first 40\n        overviews_snip = self.doc_overviews[:40]\n        overviews_block = \"\\n\".join(f\"[{i+1}] {ov}\" for i, ov in enumerate(overviews_snip))\n\n        router_prompt = f\"\"\"Task: Route query to correct system.\n\nDocuments available: Invoices, DeepSeek-V3 research papers\n\nQuery: \"{query}\"\n\nIs this query asking about:\nA) Greetings/social: \"Hi\", \"Hello\", \"Thanks\", \"What's up\", \"How are you\"\nB) General knowledge: \"CEO of Tesla\", \"capital of France\", \"what is 2+2\"  \nC) Document content: invoice amounts, DeepSeek-V3 details, companies mentioned\n\nIf A or B → {{\"category\": \"direct_answer\"}}\nIf C → {{\"category\": \"rag_query\"}}\n\nResponse:\"\"\"\n        \n        resp = self.llm_client.generate_completion(\n            model=self.ollama_config[\"generation_model\"], prompt=router_prompt, format=\"json\"\n        )\n        try:\n            raw_response = resp.get(\"response\", \"{}\")\n            print(f\"📖 ROUTING DEBUG: Overview LLM raw response: '{raw_response[:200]}...'\")\n            data = json.loads(raw_response)\n            decision = data.get(\"category\", \"rag_query\")\n            print(f\"📖 ROUTING DEBUG: Overview routing final decision: '{decision}'\")\n            return decision\n        except json.JSONDecodeError as e:\n            print(f\"❌ ROUTING DEBUG: Overview routing JSON parsing 
failed: {e}, defaulting to 'rag_query'\")\n            return \"rag_query\"\n"
  },
  {
    "path": "rag_system/agent/verifier.py",
    "content": "import json\nfrom rag_system.utils.ollama_client import OllamaClient\n\nclass VerificationResult:\n    def __init__(self, is_grounded: bool, reasoning: str, verdict: str, confidence_score: int):\n        self.is_grounded = is_grounded\n        self.reasoning = reasoning\n        self.verdict = verdict\n        self.confidence_score = confidence_score\n\nclass Verifier:\n    \"\"\"\n    Verifies if a generated answer is grounded in the provided context using Ollama.\n    \"\"\"\n    def __init__(self, llm_client: OllamaClient, llm_model: str):\n        self.llm_client = llm_client\n        self.llm_model = llm_model\n        print(f\"Initialized Verifier with Ollama model '{self.llm_model}'.\")\n\n    # Synchronous verify() method removed – async version is used everywhere.\n\n    # --- Async wrapper ------------------------------------------------\n    async def verify_async(self, query: str, context: str, answer: str) -> VerificationResult:\n        \"\"\"Async variant that calls the Ollama client asynchronously.\"\"\"\n        prompt = f\"\"\"\n        You are an automated fact-checker. 
Determine whether the ANSWER is fully supported by the CONTEXT and output a single line of JSON.\n\n        # EXAMPLES\n\n        <QUERY>\n        What color is the sky?\n        </QUERY>\n        <CONTEXT>\n        During the day, the sky appears blue due to Rayleigh scattering.\n        </CONTEXT>\n        <ANSWER>\n        The sky is blue during the day.\n        </ANSWER>\n        <OUTPUT>\n        {{\"verdict\": \"SUPPORTED\", \"is_grounded\": true, \"reasoning\": \"The context explicitly supports that the sky is blue during the day.\", \"confidence_score\": 100}}\n        </OUTPUT>\n\n        <QUERY>\n        Where are apples and oranges grown?\n        </QUERY>\n        <CONTEXT>\n        Apples are grown in orchards.\n        </CONTEXT>\n        <ANSWER>\n        Apples are grown in orchards and oranges are grown in groves.\n        </ANSWER>\n        <OUTPUT>\n        {{\"verdict\": \"NOT_SUPPORTED\", \"is_grounded\": false, \"reasoning\": \"The context mentions orchards, but not oranges or groves.\", \"confidence_score\": 80}}\n        </OUTPUT>\n\n        <QUERY>\n        How long is the process?\n        </QUERY>\n        <CONTEXT>\n        The first step takes 3 days. 
The second step takes 5 days.\n        </CONTEXT>\n        <ANSWER>\n        The process takes 3 days.\n        </ANSWER>\n        <OUTPUT>\n        {{\"verdict\": \"NEEDS_CLARIFICATION\", \"is_grounded\": false, \"reasoning\": \"The answer omits the 5 days required for the second step.\", \"confidence_score\": 70}}\n        </OUTPUT>\n\n        # TASK\n\n        <QUERY>\n        \"{query}\"\n        </QUERY>\n        <CONTEXT>\n        \"\"\"\n        prompt += context[:4000]  # Clamp to avoid huge prompts\n        prompt += \"\"\"\n        </CONTEXT>\n        <ANSWER>\n        \"\"\"\n        prompt += answer\n        prompt += \"\"\"\n        </ANSWER>\n        <OUTPUT>\n        \"\"\"\n        resp = await self.llm_client.generate_completion_async(self.llm_model, prompt, format=\"json\")\n        try:\n            data = json.loads(resp.get(\"response\", \"{}\"))\n            return VerificationResult(\n                is_grounded=data.get(\"is_grounded\", False),\n                reasoning=data.get(\"reasoning\", \"async parse error\"),\n                verdict=data.get(\"verdict\", \"NOT_SUPPORTED\"),\n                # Coerce to int: the model may return the score as a string, float, or null,\n                # and callers compare it numerically.\n                confidence_score=int(data.get('confidence_score') or 0)\n            )\n        except (json.JSONDecodeError, AttributeError, TypeError, ValueError):\n            return VerificationResult(False, \"Failed async parse\", \"NOT_SUPPORTED\", 0)\n"
  },
  {
    "path": "rag_system/api_server.py",
    "content": "import json\nimport http.server\nimport socketserver\nfrom urllib.parse import urlparse, parse_qs\nimport os\nimport requests\nimport sys\nimport logging\n\n# Add backend directory to path for database imports\nbackend_dir = os.path.join(os.path.dirname(__file__), '..', 'backend')\nif backend_dir not in sys.path:\n    sys.path.append(backend_dir)\n\nfrom backend.database import ChatDatabase, generate_session_title\nfrom rag_system.main import get_agent\nfrom rag_system.factory import get_indexing_pipeline\n\n# Initialize database connection once at module level\n# Use auto-detection for environment-appropriate path\ndb = ChatDatabase()\n\n# Get the desired agent mode from environment variables, defaulting to 'default'\n# This allows us to easily switch between 'default', 'fast', 'react', etc.\nAGENT_MODE = os.getenv(\"RAG_CONFIG_MODE\", \"default\")\nRAG_AGENT = get_agent(AGENT_MODE)\nINDEXING_PIPELINE = get_indexing_pipeline(AGENT_MODE)\n\n# --- Global Singleton for the RAG Agent ---\n# The agent is initialized once when the server starts.\n# This avoids reloading all the models on every request.\nprint(\"🧠 Initializing RAG Agent with MAXIMUM ACCURACY... (This may take a moment)\")\nif RAG_AGENT is None:\n    print(\"❌ Critical error: RAG Agent could not be initialized. 
Exiting.\")\n    exit(1)\nprint(\"✅ RAG Agent initialized successfully with MAXIMUM ACCURACY.\")\n# ---\n\n# Add helper near top after db & agent init\n# -------------- Helper ----------------\n\ndef _apply_index_embedding_model(idx_ids):\n    \"\"\"Ensure retrieval pipeline uses the embedding model stored with the first index.\"\"\"\n    debug_info = f\"🔧 _apply_index_embedding_model called with idx_ids: {idx_ids}\\n\"\n    \n    if not idx_ids:\n        debug_info += \"⚠️ No index IDs provided\\n\"\n        with open(\"logs/embedding_debug.log\", \"a\") as f:\n            f.write(debug_info)\n        return\n    try:\n        idx = db.get_index(idx_ids[0])\n        debug_info += f\"🔧 Retrieved index: {idx.get('id')} with metadata: {idx.get('metadata', {})}\\n\"\n        model = (idx.get(\"metadata\") or {}).get(\"embedding_model\")\n        debug_info += f\"🔧 Embedding model from metadata: {model}\\n\"\n        if model:\n            rp = RAG_AGENT.retrieval_pipeline\n            current_model = rp.config.get(\"embedding_model_name\")\n            debug_info += f\"🔧 Current embedding model: {current_model}\\n\"\n            rp.update_embedding_model(model)\n            debug_info += f\"🔧 Updated embedding model to: {model}\\n\"\n        else:\n            debug_info += \"⚠️ No embedding model found in metadata\\n\"\n    except Exception as e:\n        debug_info += f\"⚠️ Could not apply index embedding model: {e}\\n\"\n    \n    # Write debug info to file\n    with open(\"logs/embedding_debug.log\", \"a\") as f:\n        f.write(debug_info)\n\ndef _get_table_name_for_session(session_id):\n    \"\"\"Get the correct vector table name for a session by looking up its linked indexes.\"\"\"\n    logger = logging.getLogger(__name__)\n    \n    if not session_id:\n        logger.info(\"❌ No session_id provided\")\n        return None\n    \n    try:\n        # Get indexes linked to this session\n        idx_ids = db.get_indexes_for_session(session_id)\n        
logger.info(f\"🔍 Session {session_id[:8]}... has {len(idx_ids)} indexes: {idx_ids}\")\n        \n        if not idx_ids:\n            logger.warning(f\"⚠️ No indexes found for session {session_id}\")\n            # Use the default table name from config instead of session-specific name\n            from rag_system.main import PIPELINE_CONFIGS\n            default_table = PIPELINE_CONFIGS[\"default\"][\"storage\"][\"text_table_name\"]\n            logger.info(f\"📊 Using default table '{default_table}' for session {session_id[:8]}...\")\n            return default_table\n        \n        # Use the first index's vector table name\n        idx = db.get_index(idx_ids[0])\n        if idx and idx.get('vector_table_name'):\n            table_name = idx['vector_table_name']\n            logger.info(f\"📊 Using table '{table_name}' for session {session_id[:8]}...\")\n            print(f\"📊 RAG API: Using table '{table_name}' for session {session_id[:8]}...\")\n            return table_name\n        else:\n            logger.warning(f\"⚠️ Index found but no vector table name for session {session_id}\")\n            # Use the default table name from config instead of session-specific name\n            from rag_system.main import PIPELINE_CONFIGS\n            default_table = PIPELINE_CONFIGS[\"default\"][\"storage\"][\"text_table_name\"]\n            logger.info(f\"📊 Using default table '{default_table}' for session {session_id[:8]}...\")\n            return default_table\n            \n    except Exception as e:\n        logger.error(f\"❌ Error getting table name for session {session_id}: {e}\")\n        # Use the default table name from config instead of session-specific name\n        from rag_system.main import PIPELINE_CONFIGS\n        default_table = PIPELINE_CONFIGS[\"default\"][\"storage\"][\"text_table_name\"]\n        logger.info(f\"📊 Using default table '{default_table}' for session {session_id[:8]}...\")\n        return default_table\n\nclass 
AdvancedRagApiHandler(http.server.BaseHTTPRequestHandler):\n    def do_OPTIONS(self):\n        \"\"\"Handle CORS preflight requests for frontend integration.\"\"\"\n        self.send_response(200)\n        self.send_header('Access-Control-Allow-Origin', '*')\n        self.send_header('Access-Control-Allow-Methods', 'POST, OPTIONS')\n        self.send_header('Access-Control-Allow-Headers', 'Content-Type')\n        self.end_headers()\n\n    def do_POST(self):\n        \"\"\"Handle POST requests for chat and indexing.\"\"\"\n        parsed_path = urlparse(self.path)\n\n        if parsed_path.path == '/chat':\n            self.handle_chat()\n        elif parsed_path.path == '/chat/stream':\n            self.handle_chat_stream()\n        elif parsed_path.path == '/index':\n            self.handle_index()\n        else:\n            self.send_json_response({\"error\": \"Not Found\"}, status_code=404)\n\n    def do_GET(self):\n        parsed_path = urlparse(self.path)\n\n        if parsed_path.path == '/models':\n            self.handle_models()\n        else:\n            self.send_json_response({\"error\": \"Not Found\"}, status_code=404)\n\n    def handle_chat(self):\n        \"\"\"Handles a chat query by calling the agentic RAG pipeline.\"\"\"\n        try:\n            content_length = int(self.headers['Content-Length'])\n            post_data = self.rfile.read(content_length)\n            data = json.loads(post_data.decode('utf-8'))\n            \n            query = data.get('query')\n            session_id = data.get('session_id')\n            compose_flag = data.get('compose_sub_answers')\n            decomp_flag = data.get('query_decompose')\n            ai_rerank_flag = data.get('ai_rerank')\n            ctx_expand_flag = data.get('context_expand')\n            verify_flag = data.get('verify')\n            \n            # ✨ NEW RETRIEVAL PARAMETERS\n            retrieval_k = data.get('retrieval_k', 20)\n            context_window_size = 
data.get('context_window_size', 1)\n            reranker_top_k = data.get('reranker_top_k', 10)\n            search_type = data.get('search_type', 'hybrid')\n            dense_weight = data.get('dense_weight', 0.7)\n            \n            # 🚩 NEW: Force RAG override from frontend\n            force_rag = bool(data.get('force_rag', False))\n            \n            # 🌿 Provence sentence pruning\n            provence_prune = data.get('provence_prune')\n            provence_threshold = data.get('provence_threshold')\n            \n            # User-selected generation model\n            requested_model = data.get('model')\n            if isinstance(requested_model,str) and requested_model:\n                RAG_AGENT.ollama_config['generation_model']=requested_model\n            \n            if not query:\n                self.send_json_response({\"error\": \"Query is required\"}, status_code=400)\n                return\n\n            # 🔄 UPDATE SESSION TITLE: If this is the first message in the session, update the title\n            if session_id:\n                try:\n                    # Check if this is the first message by calling the backend server\n                    backend_url = f\"http://localhost:8000/sessions/{session_id}\"\n                    session_resp = requests.get(backend_url)\n                    if session_resp.status_code == 200:\n                        session_data = session_resp.json()\n                        session = session_data.get('session', {})\n                        # If message_count is 0, this is the first message\n                        if session.get('message_count', 0) == 0:\n                            # Generate a title from the first message\n                            title = generate_session_title(query)\n                            # Update the session title via backend API\n                            # We'll need to add this endpoint to the backend, for now let's make a direct database call\n                  
          # This is a temporary solution until we add a proper API endpoint\n                            db.update_session_title(session_id, title)\n                            print(f\"📝 Updated session title to: {title}\")\n                            \n                            # 💾 STORE USER MESSAGE: Add the user message to the database\n                            user_message_id = db.add_message(session_id, query, \"user\")\n                            print(f\"💾 Stored user message: {user_message_id}\")\n                        else:\n                            # Not the first message, but still store the user message\n                            user_message_id = db.add_message(session_id, query, \"user\")\n                            print(f\"💾 Stored user message: {user_message_id}\")\n                except Exception as e:\n                    print(f\"⚠️ Failed to update session title or store user message: {e}\")\n                    # Continue with the request even if title update fails\n\n            # Allow explicit table_name override\n            table_name = data.get('table_name')\n            if not table_name and session_id:\n                table_name = _get_table_name_for_session(session_id)\n\n            # Decide execution path\n            print(f\"🔧 Force RAG flag: {force_rag}\")\n            if force_rag:\n                # --- Apply runtime overrides manually because we skip Agent.run()\n                rp_cfg = RAG_AGENT.retrieval_pipeline.config\n                if retrieval_k is not None:\n                    rp_cfg[\"retrieval_k\"] = retrieval_k\n                if reranker_top_k is not None:\n                    rp_cfg.setdefault(\"reranker\", {})[\"top_k\"] = reranker_top_k\n                if search_type is not None:\n                    rp_cfg.setdefault(\"retrieval\", {})[\"search_type\"] = search_type\n                if dense_weight is not None:\n                    rp_cfg.setdefault(\"retrieval\", 
{}).setdefault(\"dense\", {})[\"weight\"] = dense_weight\n\n                # Provence overrides\n                if provence_prune is not None:\n                    rp_cfg.setdefault(\"provence\", {})[\"enabled\"] = bool(provence_prune)\n                if provence_threshold is not None:\n                    rp_cfg.setdefault(\"provence\", {})[\"threshold\"] = float(provence_threshold)\n\n                # 🔄 Apply embedding model for this session (same as in agent path)\n                if session_id:\n                    idx_ids = db.get_indexes_for_session(session_id)\n                    _apply_index_embedding_model(idx_ids)\n\n                # Directly invoke retrieval pipeline to bypass triage\n                result = RAG_AGENT.retrieval_pipeline.run(\n                    query,\n                    table_name=table_name,\n                    window_size_override=context_window_size,\n                )\n            else:\n                # Use full agent with smart routing\n                # Apply Provence overrides even in agent path\n                rp_cfg = RAG_AGENT.retrieval_pipeline.config\n                if provence_prune is not None:\n                    rp_cfg.setdefault(\"provence\", {})[\"enabled\"] = bool(provence_prune)\n                if provence_threshold is not None:\n                    rp_cfg.setdefault(\"provence\", {})[\"threshold\"] = float(provence_threshold)\n\n                # 🔄 Refresh document overviews for this session\n                if session_id:\n                    idx_ids = db.get_indexes_for_session(session_id)\n                    _apply_index_embedding_model(idx_ids)\n                    RAG_AGENT.load_overviews_for_indexes(idx_ids)\n\n                # 🔧 Set index-specific overview path\n                if session_id:\n                    rp_cfg[\"overview_path\"] = f\"index_store/overviews/{session_id}.jsonl\"\n\n                # 🔧 Configure late chunking\n                rp_cfg.setdefault(\"retrievers\", 
{}).setdefault(\"latechunk\", {})[\"enabled\"] = True\n\n                result = RAG_AGENT.run(\n                    query,\n                    table_name=table_name,\n                    session_id=session_id,\n                    compose_sub_answers=compose_flag,\n                    query_decompose=decomp_flag,\n                    ai_rerank=ai_rerank_flag,\n                    context_expand=ctx_expand_flag,\n                    verify=verify_flag,\n                    retrieval_k=retrieval_k,\n                    context_window_size=context_window_size,\n                    reranker_top_k=reranker_top_k,\n                    search_type=search_type,\n                    dense_weight=dense_weight,\n                )\n            \n            # The result is a dict, so we need to dump it to a JSON string\n            self.send_json_response(result)\n            \n            # 💾 STORE AI RESPONSE: Add the AI response to the database\n            if session_id and result and result.get(\"answer\"):\n                try:\n                    ai_message_id = db.add_message(session_id, result[\"answer\"], \"assistant\")\n                    print(f\"💾 Stored AI response: {ai_message_id}\")\n                except Exception as e:\n                    print(f\"⚠️ Failed to store AI response: {e}\")\n                    # Continue even if storage fails\n\n        except json.JSONDecodeError:\n            self.send_json_response({\"error\": \"Invalid JSON\"}, status_code=400)\n        except Exception as e:\n            self.send_json_response({\"error\": f\"Server error: {str(e)}\"}, status_code=500)\n\n    def handle_chat_stream(self):\n        \"\"\"Stream internal phases and final answer using SSE (text/event-stream).\"\"\"\n        try:\n            content_length = int(self.headers['Content-Length'])\n            post_data = self.rfile.read(content_length)\n            data = json.loads(post_data.decode('utf-8'))\n\n            query = data.get('query')\n       
     session_id = data.get('session_id')\n            compose_flag = data.get('compose_sub_answers')\n            decomp_flag = data.get('query_decompose')\n            ai_rerank_flag = data.get('ai_rerank')\n            ctx_expand_flag = data.get('context_expand')\n            verify_flag = data.get('verify')\n            \n            # ✨ NEW RETRIEVAL PARAMETERS\n            retrieval_k = data.get('retrieval_k', 20)\n            context_window_size = data.get('context_window_size', 1)\n            reranker_top_k = data.get('reranker_top_k', 10)\n            search_type = data.get('search_type', 'hybrid')\n            dense_weight = data.get('dense_weight', 0.7)\n\n            # 🚩 NEW: Force RAG override from frontend\n            force_rag = bool(data.get('force_rag', False))\n\n            # 🌿 Provence sentence pruning\n            provence_prune = data.get('provence_prune')\n            provence_threshold = data.get('provence_threshold')\n\n            # User-selected generation model\n            requested_model = data.get('model')\n            if isinstance(requested_model,str) and requested_model:\n                RAG_AGENT.ollama_config['generation_model']=requested_model\n\n            if not query:\n                self.send_json_response({\"error\": \"Query is required\"}, status_code=400)\n                return\n\n            # 🔄 UPDATE SESSION TITLE: If this is the first message in the session, update the title\n            if session_id:\n                try:\n                    # Check if this is the first message by calling the backend server\n                    backend_url = f\"http://localhost:8000/sessions/{session_id}\"\n                    session_resp = requests.get(backend_url)\n                    if session_resp.status_code == 200:\n                        session_data = session_resp.json()\n                        session = session_data.get('session', {})\n                        # If message_count is 0, this is the first message\n     
                   if session.get('message_count', 0) == 0:\n                            # Generate a title from the first message\n                            title = generate_session_title(query)\n                            # Update the session title via backend API\n                            # We'll need to add this endpoint to the backend, for now let's make a direct database call\n                            # This is a temporary solution until we add a proper API endpoint\n                            db.update_session_title(session_id, title)\n                            print(f\"📝 Updated session title to: {title}\")\n                            \n                            # 💾 STORE USER MESSAGE: Add the user message to the database\n                            user_message_id = db.add_message(session_id, query, \"user\")\n                            print(f\"💾 Stored user message: {user_message_id}\")\n                        else:\n                            # Not the first message, but still store the user message\n                            user_message_id = db.add_message(session_id, query, \"user\")\n                            print(f\"💾 Stored user message: {user_message_id}\")\n                except Exception as e:\n                    print(f\"⚠️ Failed to update session title or store user message: {e}\")\n                    # Continue with the request even if title update fails\n\n            # Allow explicit table_name override\n            table_name = data.get('table_name')\n            if not table_name and session_id:\n                table_name = _get_table_name_for_session(session_id)\n\n            # Prepare response headers for SSE\n            self.send_response(200)\n            self.send_header('Content-Type', 'text/event-stream')\n            self.send_header('Cache-Control', 'no-cache')\n            # Keep connection alive for SSE; no manual chunked encoding (Python http.server\n            # does not add chunk sizes 
automatically, so declaring it breaks clients).\n            self.send_header('Connection', 'keep-alive')\n            self.send_header('Access-Control-Allow-Origin', '*')\n            self.end_headers()\n\n            def emit(event_type: str, payload):\n                \"\"\"Send a single SSE event.\n\n                A BrokenPipeError (client disconnected) propagates to the caller,\n                which stops the stream.\n                \"\"\"\n                data_str = json.dumps({\"type\": event_type, \"data\": payload})\n                self.wfile.write(f\"data: {data_str}\\n\\n\".encode('utf-8'))\n                self.wfile.flush()\n\n            # Run the agent synchronously, emitting checkpoints\n            try:\n                if force_rag:\n                    # Apply the retrieval overrides directly to the pipeline config,\n                    # since this path bypasses Agent.run\n                    rp_cfg = RAG_AGENT.retrieval_pipeline.config\n                    if retrieval_k is not None:\n                        rp_cfg[\"retrieval_k\"] = retrieval_k\n                    if reranker_top_k is not None:\n                        rp_cfg.setdefault(\"reranker\", {})[\"top_k\"] = reranker_top_k\n                    if search_type is not None:\n                        rp_cfg.setdefault(\"retrieval\", {})[\"search_type\"] = search_type\n                    if dense_weight is not None:\n                        rp_cfg.setdefault(\"retrieval\", {}).setdefault(\"dense\", {})[\"weight\"] = dense_weight\n\n                    # Provence overrides\n                    if provence_prune is not None:\n                        rp_cfg.setdefault(\"provence\", {})[\"enabled\"] = bool(provence_prune)\n                    if provence_threshold is not None:\n                        rp_cfg.setdefault(\"provence\", {})[\"threshold\"] = float(provence_threshold)\n\n                    # 🔄 Apply embedding model for this session (same as in agent path)\n                    if session_id:\n                        idx_ids = 
db.get_indexes_for_session(session_id)\n                        _apply_index_embedding_model(idx_ids)\n\n                    # 🔧 Set index-specific overview path so each index writes separate file\n                    if session_id:\n                        rp_cfg[\"overview_path\"] = f\"index_store/overviews/{session_id}.jsonl\"\n\n                    # 🔧 Configure late chunking\n                    rp_cfg.setdefault(\"retrievers\", {}).setdefault(\"latechunk\", {})[\"enabled\"] = True\n\n                    # Straight retrieval pipeline with streaming events\n                    final_result = RAG_AGENT.retrieval_pipeline.run(\n                        query,\n                        table_name=table_name,\n                        window_size_override=context_window_size,\n                        event_callback=emit,\n                    )\n                else:\n                    # Provence overrides\n                    rp_cfg = RAG_AGENT.retrieval_pipeline.config\n                    if provence_prune is not None:\n                        rp_cfg.setdefault(\"provence\", {})[\"enabled\"] = bool(provence_prune)\n                    if provence_threshold is not None:\n                        rp_cfg.setdefault(\"provence\", {})[\"threshold\"] = float(provence_threshold)\n\n                    # 🔄 Refresh overviews for this session\n                    if session_id:\n                        idx_ids = db.get_indexes_for_session(session_id)\n                        _apply_index_embedding_model(idx_ids)\n                        RAG_AGENT.load_overviews_for_indexes(idx_ids)\n\n                    # 🔧 Set index-specific overview path\n                    if session_id:\n                        rp_cfg[\"overview_path\"] = f\"index_store/overviews/{session_id}.jsonl\"\n\n                    # 🔧 Configure late chunking\n                    rp_cfg.setdefault(\"retrievers\", {}).setdefault(\"latechunk\", {})[\"enabled\"] = True\n\n                    final_result = 
RAG_AGENT.run(\n                        query,\n                        table_name=table_name,\n                        session_id=session_id,\n                        compose_sub_answers=compose_flag,\n                        query_decompose=decomp_flag,\n                        ai_rerank=ai_rerank_flag,\n                        context_expand=ctx_expand_flag,\n                        verify=verify_flag,\n                        # ✨ NEW RETRIEVAL PARAMETERS\n                        retrieval_k=retrieval_k,\n                        context_window_size=context_window_size,\n                        reranker_top_k=reranker_top_k,\n                        search_type=search_type,\n                        dense_weight=dense_weight,\n                        event_callback=emit,\n                    )\n\n                # Ensure the final answer is sent (in case callback missed it)\n                emit(\"complete\", final_result)\n                \n                # 💾 STORE AI RESPONSE: Add the AI response to the database\n                if session_id and final_result and final_result.get(\"answer\"):\n                    try:\n                        ai_message_id = db.add_message(session_id, final_result[\"answer\"], \"assistant\")\n                        print(f\"💾 Stored AI response: {ai_message_id}\")\n                    except Exception as e:\n                        print(f\"⚠️ Failed to store AI response: {e}\")\n                        # Continue even if storage fails\n            except BrokenPipeError:\n                print(\"🔌 Client disconnected from SSE stream.\")\n            except Exception as e:\n                # Send error event then close\n                error_payload = {\"error\": str(e)}\n                try:\n                    emit(\"error\", error_payload)\n                finally:\n                    print(f\"❌ Stream error: {e}\")\n\n        except json.JSONDecodeError:\n            self.send_json_response({\"error\": \"Invalid JSON\"}, 
status_code=400)\n        except Exception as e:\n            self.send_json_response({\"error\": f\"Server error: {str(e)}\"}, status_code=500)\n\n    def handle_index(self):\n        \"\"\"Triggers the document indexing pipeline for specific files.\"\"\"\n        try:\n            content_length = int(self.headers['Content-Length'])\n            post_data = self.rfile.read(content_length)\n            data = json.loads(post_data.decode('utf-8'))\n            \n            file_paths = data.get('file_paths')\n            session_id = data.get('session_id')\n            compose_flag = data.get('compose_sub_answers')\n            decomp_flag = data.get('query_decompose')\n            ai_rerank_flag = data.get('ai_rerank')\n            ctx_expand_flag = data.get('context_expand')\n            enable_latechunk = bool(data.get(\"enable_latechunk\", False))\n            enable_docling_chunk = bool(data.get(\"enable_docling_chunk\", False))\n            \n            # 🆕 NEW CONFIGURATION OPTIONS:\n            chunk_size = int(data.get(\"chunk_size\", 512))\n            chunk_overlap = int(data.get(\"chunk_overlap\", 64))\n            retrieval_mode = data.get(\"retrieval_mode\", \"hybrid\")\n            window_size = int(data.get(\"window_size\", 2))\n            enable_enrich = bool(data.get(\"enable_enrich\", True))\n            embedding_model = data.get('embeddingModel')\n            enrich_model = data.get('enrichModel')\n            overview_model = data.get('overviewModel') or data.get('overview_model_name')\n            batch_size_embed = int(data.get(\"batch_size_embed\", 50))\n            batch_size_enrich = int(data.get(\"batch_size_enrich\", 25))\n            \n            if not file_paths or not isinstance(file_paths, list):\n                self.send_json_response({\n                    \"error\": \"A 'file_paths' list is required.\"\n                }, status_code=400)\n                return\n\n            # Allow explicit table_name override\n          
  table_name = data.get('table_name')\n            if not table_name and session_id:\n                table_name = _get_table_name_for_session(session_id)\n\n            # The INDEXING_PIPELINE is already initialized. We just need to use it.\n            # If a session-specific table is needed, we can override the config for this run.\n            if table_name:\n                import copy\n                config_override = copy.deepcopy(INDEXING_PIPELINE.config)\n                config_override[\"storage\"][\"text_table_name\"] = table_name\n                config_override.setdefault(\"retrievers\", {}).setdefault(\"dense\", {})[\"lancedb_table_name\"] = table_name\n                \n                # 🔧 Configure late chunking\n                if enable_latechunk:\n                    config_override[\"retrievers\"].setdefault(\"latechunk\", {})[\"enabled\"] = True\n                else:\n                    # ensure disabled if not requested\n                    config_override[\"retrievers\"].setdefault(\"latechunk\", {})[\"enabled\"] = False\n                \n                # 🔧 Configure docling chunking\n                if enable_docling_chunk:\n                    config_override[\"chunker_mode\"] = \"docling\"\n                \n                # 🔧 Configure contextual enrichment (THIS WAS MISSING!)\n                config_override.setdefault(\"contextual_enricher\", {})\n                config_override[\"contextual_enricher\"][\"enabled\"] = enable_enrich\n                config_override[\"contextual_enricher\"][\"window_size\"] = window_size\n                \n                # 🔧 Configure indexing batch sizes\n                config_override.setdefault(\"indexing\", {})\n                config_override[\"indexing\"][\"embedding_batch_size\"] = batch_size_embed\n                config_override[\"indexing\"][\"enrichment_batch_size\"] = batch_size_enrich\n                \n                # 🔧 Configure chunking parameters\n                
config_override.setdefault(\"chunking\", {})\n                config_override[\"chunking\"][\"chunk_size\"] = chunk_size\n                config_override[\"chunking\"][\"chunk_overlap\"] = chunk_overlap\n                \n                # 🔧 Configure embedding model if specified\n                if embedding_model:\n                    config_override[\"embedding_model_name\"] = embedding_model\n                \n                # 🔧 Configure enrichment model if specified\n                if enrich_model:\n                    config_override[\"enrich_model\"] = enrich_model\n                \n                # 🔧 Overview model (can differ from enrichment)\n                if overview_model:\n                    config_override[\"overview_model_name\"] = overview_model\n                \n                print(f\"🔧 INDEXING CONFIG: Contextual Enrichment: {enable_enrich}, Window Size: {window_size}\")\n                print(f\"🔧 CHUNKING CONFIG: Size: {chunk_size}, Overlap: {chunk_overlap}\")\n                print(f\"🔧 MODEL CONFIG: Embedding: {embedding_model or 'default'}, Enrichment: {enrich_model or 'default'}\")\n                \n                # 🔧 Set index-specific overview path so each index writes separate file\n                if session_id:\n                    config_override[\"overview_path\"] = f\"index_store/overviews/{session_id}.jsonl\"\n\n                # 🔧 Configure late chunking\n                config_override.setdefault(\"retrievers\", {}).setdefault(\"latechunk\", {})[\"enabled\"] = True\n\n                # Create a temporary pipeline instance with the overridden config\n                temp_pipeline = INDEXING_PIPELINE.__class__(\n                    config_override, \n                    INDEXING_PIPELINE.llm_client, \n                    INDEXING_PIPELINE.ollama_config\n                )\n                temp_pipeline.run(file_paths)\n            else:\n                # Use the default pipeline with overrides\n                import 
copy\n                config_override = copy.deepcopy(INDEXING_PIPELINE.config)\n                \n                # 🔧 Configure late chunking\n                if enable_latechunk:\n                    config_override.setdefault(\"retrievers\", {}).setdefault(\"latechunk\", {})[\"enabled\"] = True\n                \n                # 🔧 Configure docling chunking\n                if enable_docling_chunk:\n                    config_override[\"chunker_mode\"] = \"docling\"\n                \n                # 🔧 Configure contextual enrichment (THIS WAS MISSING!)\n                config_override.setdefault(\"contextual_enricher\", {})\n                config_override[\"contextual_enricher\"][\"enabled\"] = enable_enrich\n                config_override[\"contextual_enricher\"][\"window_size\"] = window_size\n                \n                # 🔧 Configure indexing batch sizes\n                config_override.setdefault(\"indexing\", {})\n                config_override[\"indexing\"][\"embedding_batch_size\"] = batch_size_embed\n                config_override[\"indexing\"][\"enrichment_batch_size\"] = batch_size_enrich\n                \n                # 🔧 Configure chunking parameters\n                config_override.setdefault(\"chunking\", {})\n                config_override[\"chunking\"][\"chunk_size\"] = chunk_size\n                config_override[\"chunking\"][\"chunk_overlap\"] = chunk_overlap\n                \n                # 🔧 Configure embedding model if specified\n                if embedding_model:\n                    config_override[\"embedding_model_name\"] = embedding_model\n                \n                # 🔧 Configure enrichment model if specified\n                if enrich_model:\n                    config_override[\"enrich_model\"] = enrich_model\n                \n                # 🔧 Overview model (can differ from enrichment)\n                if overview_model:\n                    config_override[\"overview_model_name\"] = 
overview_model\n                \n                print(f\"🔧 INDEXING CONFIG: Contextual Enrichment: {enable_enrich}, Window Size: {window_size}\")\n                print(f\"🔧 CHUNKING CONFIG: Size: {chunk_size}, Overlap: {chunk_overlap}\")\n                print(f\"🔧 MODEL CONFIG: Embedding: {embedding_model or 'default'}, Enrichment: {enrich_model or 'default'}\")\n                \n                # 🔧 Set index-specific overview path so each index writes separate file\n                if session_id:\n                    config_override[\"overview_path\"] = f\"index_store/overviews/{session_id}.jsonl\"\n\n                # 🔧 Configure late chunking\n                config_override.setdefault(\"retrievers\", {}).setdefault(\"latechunk\", {})[\"enabled\"] = True\n\n                # Create temporary pipeline with overridden config\n                temp_pipeline = INDEXING_PIPELINE.__class__(\n                    config_override, \n                    INDEXING_PIPELINE.llm_client, \n                    INDEXING_PIPELINE.ollama_config\n                )\n                temp_pipeline.run(file_paths)\n\n            self.send_json_response({\n                \"message\": f\"Indexing process for {len(file_paths)} file(s) completed successfully.\",\n                \"table_name\": table_name or \"default_text_table\",\n                \"latechunk\": enable_latechunk,\n                \"docling_chunk\": enable_docling_chunk,\n                \"indexing_config\": {\n                    \"chunk_size\": chunk_size,\n                    \"chunk_overlap\": chunk_overlap,\n                    \"retrieval_mode\": retrieval_mode,\n                    \"window_size\": window_size,\n                    \"enable_enrich\": enable_enrich,\n                    \"embedding_model\": embedding_model,\n                    \"enrich_model\": enrich_model,\n                    \"batch_size_embed\": batch_size_embed,\n                    \"batch_size_enrich\": batch_size_enrich\n               
 }\n            })\n\n            if embedding_model:\n                try:\n                    db.update_index_metadata(session_id, {\"embedding_model\": embedding_model})\n                except Exception as e:\n                    print(f\"⚠️ Could not update embedding_model metadata: {e}\")\n\n        except json.JSONDecodeError:\n            self.send_json_response({\"error\": \"Invalid JSON\"}, status_code=400)\n        except Exception as e:\n            self.send_json_response({\"error\": f\"Failed to start indexing: {str(e)}\"}, status_code=500)\n\n    def handle_models(self):\n        \"\"\"Return a list of locally installed Ollama models and supported HuggingFace models, grouped by capability.\"\"\"\n        try:\n            generation_models = []\n            embedding_models = []\n            \n            # Get Ollama models if available\n            try:\n                resp = requests.get(f\"{RAG_AGENT.ollama_config['host']}/api/tags\", timeout=5)\n                resp.raise_for_status()\n                data = resp.json()\n\n                all_ollama_models = [m.get('name') for m in data.get('models', [])]\n\n                # Very naive classification\n                ollama_embedding_models = [m for m in all_ollama_models if any(k in m for k in ['embed','bge','embedding','text'])]\n                ollama_generation_models = [m for m in all_ollama_models if m not in ollama_embedding_models]\n                \n                generation_models.extend(ollama_generation_models)\n                embedding_models.extend(ollama_embedding_models)\n            except Exception as e:\n                print(f\"⚠️ Could not get Ollama models: {e}\")\n            \n            # Add supported HuggingFace embedding models\n            huggingface_embedding_models = [\n                \"Qwen/Qwen3-Embedding-0.6B\",\n                \"Qwen/Qwen3-Embedding-4B\", \n                \"Qwen/Qwen3-Embedding-8B\"\n            ]\n            
embedding_models.extend(huggingface_embedding_models)\n            \n            # Sort models for consistent ordering\n            generation_models.sort()\n            embedding_models.sort()\n\n            self.send_json_response({\n                \"generation_models\": generation_models,\n                \"embedding_models\": embedding_models\n            })\n        except Exception as e:\n            self.send_json_response({\"error\": f\"Could not list models: {e}\"}, status_code=500)\n\n    def send_json_response(self, data, status_code=200):\n        \"\"\"Utility to send a JSON response with CORS headers.\"\"\"\n        self.send_response(status_code)\n        self.send_header('Content-Type', 'application/json')\n        self.send_header('Access-Control-Allow-Origin', '*')\n        self.end_headers()\n        response = json.dumps(data, indent=2)\n        self.wfile.write(response.encode('utf-8'))\n\ndef start_server(port=8001):\n    \"\"\"Starts the API server.\"\"\"\n    # Use a reusable TCP server to avoid \"address in use\" errors on restart\n    class ReusableTCPServer(socketserver.TCPServer):\n        allow_reuse_address = True\n\n    with ReusableTCPServer((\"\", port), AdvancedRagApiHandler) as httpd:\n        print(f\"🚀 Starting Advanced RAG API server on port {port}\")\n        print(f\"💬 Chat endpoint: http://localhost:{port}/chat\")\n        print(f\"✨ Indexing endpoint: http://localhost:{port}/index\")\n        httpd.serve_forever()\n\nif __name__ == \"__main__\":\n    # To run this server: python -m rag_system.api_server\n    start_server() "
  },
  {
    "path": "rag_system/api_server_with_progress.py",
    "content": "import json\nimport threading\nimport time\nfrom typing import Dict, List, Any\nimport logging\nfrom urllib.parse import urlparse, parse_qs\nimport http.server\nimport socketserver\n\n# Import the core logic and batch processing utilities\nfrom rag_system.main import get_agent\nfrom rag_system.utils.batch_processor import ProgressTracker, timer\n\n# Set up logging\nlogging.basicConfig(level=logging.INFO)\nlogger = logging.getLogger(__name__)\n\n# Global progress tracking storage\nACTIVE_PROGRESS_SESSIONS: Dict[str, Dict[str, Any]] = {}\n\n# --- Global Singleton for the RAG Agent ---\nprint(\"🧠 Initializing RAG Agent... (This may take a moment)\")\nRAG_AGENT = get_agent()\nif RAG_AGENT is None:\n    print(\"❌ Critical error: RAG Agent could not be initialized. Exiting.\")\n    exit(1)\nprint(\"✅ RAG Agent initialized successfully.\")\n\nclass ServerSentEventsHandler:\n    \"\"\"Handler for Server-Sent Events (SSE) for real-time progress updates\"\"\"\n    \n    active_connections: Dict[str, Any] = {}\n    \n    @classmethod\n    def add_connection(cls, session_id: str, response_handler):\n        \"\"\"Add a new SSE connection\"\"\"\n        cls.active_connections[session_id] = response_handler\n        logger.info(f\"SSE connection added for session: {session_id}\")\n    \n    @classmethod\n    def remove_connection(cls, session_id: str):\n        \"\"\"Remove an SSE connection\"\"\"\n        if session_id in cls.active_connections:\n            del cls.active_connections[session_id]\n            logger.info(f\"SSE connection removed for session: {session_id}\")\n    \n    @classmethod\n    def send_event(cls, session_id: str, event_type: str, data: Dict[str, Any]):\n        \"\"\"Send an SSE event to a specific session\"\"\"\n        if session_id not in cls.active_connections:\n            return\n        \n        try:\n            handler = cls.active_connections[session_id]\n            event_data = json.dumps(data)\n            message = 
f\"event: {event_type}\\ndata: {event_data}\\n\\n\"\n            handler.wfile.write(message.encode('utf-8'))\n            handler.wfile.flush()\n        except Exception as e:\n            logger.error(f\"Failed to send SSE event: {e}\")\n            cls.remove_connection(session_id)\n\nclass RealtimeProgressTracker(ProgressTracker):\n    \"\"\"Enhanced ProgressTracker that sends updates via Server-Sent Events\"\"\"\n    \n    def __init__(self, total_items: int, operation_name: str, session_id: str):\n        super().__init__(total_items, operation_name)\n        self.session_id = session_id\n        self.last_update = 0\n        self.update_interval = 1  # Update every 1 second\n        \n        # Initialize session progress\n        ACTIVE_PROGRESS_SESSIONS[session_id] = {\n            \"operation_name\": operation_name,\n            \"total_items\": total_items,\n            \"processed_items\": 0,\n            \"errors_encountered\": 0,\n            \"start_time\": self.start_time,\n            \"status\": \"running\",\n            \"current_step\": \"\",\n            \"eta_seconds\": 0,\n            \"throughput\": 0,\n            \"progress_percentage\": 0\n        }\n        \n        # Send initial progress update\n        self._send_progress_update()\n    \n    def update(self, items_processed: int, errors: int = 0, current_step: str = \"\"):\n        \"\"\"Update progress and send notification\"\"\"\n        super().update(items_processed, errors)\n        \n        # Update session data\n        session_data = ACTIVE_PROGRESS_SESSIONS.get(self.session_id)\n        if session_data:\n            session_data.update({\n                \"processed_items\": self.processed_items,\n                \"errors_encountered\": self.errors_encountered,\n                \"current_step\": current_step,\n                \"progress_percentage\": (self.processed_items / self.total_items) * 100,\n            })\n            \n            # Calculate throughput and ETA\n  
          elapsed = time.time() - self.start_time\n            if elapsed > 0:\n                session_data[\"throughput\"] = self.processed_items / elapsed\n                remaining = self.total_items - self.processed_items\n                session_data[\"eta_seconds\"] = remaining / session_data[\"throughput\"] if session_data[\"throughput\"] > 0 else 0\n        \n        # Send update if enough time has passed\n        current_time = time.time()\n        if current_time - self.last_update >= self.update_interval:\n            self._send_progress_update()\n            self.last_update = current_time\n    \n    def finish(self):\n        \"\"\"Mark progress as finished and send final update\"\"\"\n        super().finish()\n        \n        # Update session status\n        session_data = ACTIVE_PROGRESS_SESSIONS.get(self.session_id)\n        if session_data:\n            session_data.update({\n                \"status\": \"completed\",\n                \"progress_percentage\": 100,\n                \"eta_seconds\": 0\n            })\n        \n        # Send final update\n        self._send_progress_update(final=True)\n    \n    def _send_progress_update(self, final: bool = False):\n        \"\"\"Send progress update via Server-Sent Events\"\"\"\n        session_data = ACTIVE_PROGRESS_SESSIONS.get(self.session_id, {})\n        \n        event_data = {\n            \"session_id\": self.session_id,\n            \"progress\": session_data.copy(),\n            \"final\": final,\n            \"timestamp\": time.time()\n        }\n        \n        ServerSentEventsHandler.send_event(self.session_id, \"progress\", event_data)\n\ndef run_indexing_with_progress(file_paths: List[str], session_id: str):\n    \"\"\"Enhanced indexing function with real-time progress tracking\"\"\"\n    from rag_system.pipelines.indexing_pipeline import IndexingPipeline\n    from rag_system.utils.ollama_client import OllamaClient\n    import json\n    \n    try:\n        # Send initial 
status\n        ServerSentEventsHandler.send_event(session_id, \"status\", {\n            \"message\": \"Initializing indexing pipeline...\",\n            \"session_id\": session_id\n        })\n        \n        # Load configuration\n        config_file = \"batch_indexing_config.json\"\n        try:\n            with open(config_file, 'r') as f:\n                config = json.load(f)\n        except FileNotFoundError:\n            # Fallback to default config\n            config = {\n                \"embedding_model_name\": \"Qwen/Qwen3-Embedding-0.6B\",\n                \"indexing\": {\n                    \"embedding_batch_size\": 50,\n                    \"enrichment_batch_size\": 10,\n                    \"enable_progress_tracking\": True\n                },\n                \"contextual_enricher\": {\"enabled\": True, \"window_size\": 1},\n                \"retrievers\": {\n                    \"dense\": {\"enabled\": True, \"lancedb_table_name\": \"default_text_table\"},\n                    \"bm25\": {\"enabled\": True, \"index_name\": \"default_bm25_index\"}\n                },\n                \"storage\": {\n                    \"chunk_store_path\": \"./index_store/chunks/chunks.pkl\",\n                    \"lancedb_uri\": \"./index_store/lancedb\",\n                    \"bm25_path\": \"./index_store/bm25\"\n                }\n            }\n        \n        # Initialize components\n        ollama_client = OllamaClient()\n        ollama_config = {\n            \"generation_model\": \"llama3.2:1b\",\n            \"embedding_model\": \"mxbai-embed-large\"\n        }\n        \n        # Create enhanced pipeline\n        pipeline = IndexingPipeline(config, ollama_client, ollama_config)\n        \n        # Create progress tracker for the overall process\n        total_steps = 6  # Rough estimate of pipeline steps\n        step_tracker = RealtimeProgressTracker(total_steps, \"Document Indexing\", session_id)\n        \n        with timer(\"Complete 
Indexing Pipeline\"):\n            # Step 1: Document Processing\n            step_tracker.update(1, current_step=\"Processing documents...\")\n\n            # Run the indexing pipeline (blocks until indexing is done)\n            pipeline.run(file_paths)\n\n            # pipeline.run() has already returned, so the remaining steps\n            # are marked complete together\n            step_tracker.update(1, current_step=\"Chunking completed...\")\n            step_tracker.update(1, current_step=\"BM25 indexing completed...\")\n            step_tracker.update(1, current_step=\"Contextual enrichment completed...\")\n            step_tracker.update(1, current_step=\"Vector embeddings completed...\")\n            step_tracker.update(1, current_step=\"Indexing finalized...\")\n\n            step_tracker.finish()\n\n            # Send completion notification\n            ServerSentEventsHandler.send_event(session_id, \"completion\", {\n                \"message\": f\"Successfully indexed {len(file_paths)} file(s)\",\n                \"file_count\": len(file_paths),\n                \"session_id\": session_id\n            })\n\n    except Exception as e:\n        # Single error path: log, notify the client exactly once, then re-raise\n        logger.error(f\"Indexing failed for session {session_id}: {e}\")\n        ServerSentEventsHandler.send_event(session_id, \"error\", {\n            \"message\": str(e),\n            \"session_id\": session_id\n        })\n        raise\n\nclass EnhancedRagApiHandler(http.server.BaseHTTPRequestHandler):\n    \"\"\"Enhanced API handler with progress tracking support\"\"\"\n    \n    def do_OPTIONS(self):\n        \"\"\"Handle CORS preflight requests for 
frontend integration.\"\"\"\n        self.send_response(200)\n        self.send_header('Access-Control-Allow-Origin', '*')\n        self.send_header('Access-Control-Allow-Methods', 'POST, GET, OPTIONS')\n        self.send_header('Access-Control-Allow-Headers', 'Content-Type')\n        self.end_headers()\n\n    def do_GET(self):\n        \"\"\"Handle GET requests for progress status and SSE streams\"\"\"\n        parsed_path = urlparse(self.path)\n        \n        if parsed_path.path == '/progress':\n            self.handle_progress_status()\n        elif parsed_path.path == '/stream':\n            self.handle_progress_stream()\n        else:\n            self.send_json_response({\"error\": \"Not Found\"}, status_code=404)\n\n    def do_POST(self):\n        \"\"\"Handle POST requests for chat and indexing.\"\"\"\n        parsed_path = urlparse(self.path)\n\n        if parsed_path.path == '/chat':\n            self.handle_chat()\n        elif parsed_path.path == '/index':\n            self.handle_index_with_progress()\n        else:\n            self.send_json_response({\"error\": \"Not Found\"}, status_code=404)\n\n    def handle_chat(self):\n        \"\"\"Handles a chat query by calling the agentic RAG pipeline.\"\"\"\n        try:\n            content_length = int(self.headers['Content-Length'])\n            post_data = self.rfile.read(content_length)\n            data = json.loads(post_data.decode('utf-8'))\n            \n            query = data.get('query')\n            if not query:\n                self.send_json_response({\"error\": \"Query is required\"}, status_code=400)\n                return\n\n            # Use the single, persistent agent instance to run the query\n            result = RAG_AGENT.run(query)\n            \n            # The result is a dict, so we need to dump it to a JSON string\n            self.send_json_response(result)\n\n        except json.JSONDecodeError:\n            self.send_json_response({\"error\": \"Invalid JSON\"}, 
status_code=400)\n        except Exception as e:\n            self.send_json_response({\"error\": f\"Server error: {str(e)}\"}, status_code=500)\n\n    def handle_index_with_progress(self):\n        \"\"\"Triggers the document indexing pipeline with real-time progress tracking.\"\"\"\n        try:\n            content_length = int(self.headers['Content-Length'])\n            post_data = self.rfile.read(content_length)\n            data = json.loads(post_data.decode('utf-8'))\n            \n            file_paths = data.get('file_paths')\n            session_id = data.get('session_id')\n            \n            if not file_paths or not isinstance(file_paths, list):\n                self.send_json_response({\n                    \"error\": \"A 'file_paths' list is required.\"\n                }, status_code=400)\n                return\n            \n            if not session_id:\n                self.send_json_response({\n                    \"error\": \"A 'session_id' is required for progress tracking.\"\n                }, status_code=400)\n                return\n\n            # Start indexing in a separate thread to avoid blocking\n            def run_indexing_thread():\n                try:\n                    run_indexing_with_progress(file_paths, session_id)\n                except Exception as e:\n                    logger.error(f\"Indexing thread failed: {e}\")\n\n            thread = threading.Thread(target=run_indexing_thread, daemon=True)\n            thread.start()\n\n            # Build the stream URL from the port this server is actually bound to,\n            # rather than hardcoding one that may not match start_enhanced_server(port=...)\n            stream_port = self.server.server_address[1]\n\n            # Return immediate response\n            self.send_json_response({\n                \"message\": f\"Indexing started for {len(file_paths)} file(s)\",\n                \"session_id\": session_id,\n                \"status\": \"started\",\n                \"progress_stream_url\": f\"http://localhost:{stream_port}/stream?session_id={session_id}\"\n            })\n            \n        except json.JSONDecodeError:\n            self.send_json_response({\"error\": 
\"Invalid JSON\"}, status_code=400)\n        except Exception as e:\n            self.send_json_response({\"error\": f\"Failed to start indexing: {str(e)}\"}, status_code=500)\n\n    def handle_progress_status(self):\n        \"\"\"Handle GET requests for current progress status\"\"\"\n        parsed_url = urlparse(self.path)\n        params = parse_qs(parsed_url.query)\n        session_id = params.get('session_id', [None])[0]\n        \n        if not session_id:\n            self.send_json_response({\"error\": \"session_id is required\"}, status_code=400)\n            return\n        \n        progress_data = ACTIVE_PROGRESS_SESSIONS.get(session_id)\n        if not progress_data:\n            self.send_json_response({\"error\": \"No active progress for this session\"}, status_code=404)\n            return\n        \n        self.send_json_response({\n            \"session_id\": session_id,\n            \"progress\": progress_data\n        })\n\n    def handle_progress_stream(self):\n        \"\"\"Handle Server-Sent Events stream for real-time progress\"\"\"\n        parsed_url = urlparse(self.path)\n        params = parse_qs(parsed_url.query)\n        session_id = params.get('session_id', [None])[0]\n        \n        if not session_id:\n            self.send_response(400)\n            self.end_headers()\n            return\n        \n        # Set up SSE headers\n        self.send_response(200)\n        self.send_header('Content-Type', 'text/event-stream')\n        self.send_header('Cache-Control', 'no-cache')\n        self.send_header('Connection', 'keep-alive')\n        self.send_header('Access-Control-Allow-Origin', '*')\n        self.end_headers()\n        \n        # Add this connection to the SSE handler\n        ServerSentEventsHandler.add_connection(session_id, self)\n        \n        # Send initial connection message\n        initial_message = json.dumps({\n            \"session_id\": session_id,\n            \"message\": \"Progress stream 
connected\",\n            \"timestamp\": time.time()\n        })\n        self.wfile.write(f\"event: connected\\ndata: {initial_message}\\n\\n\".encode('utf-8'))\n        self.wfile.flush()\n        \n        # Keep connection alive\n        try:\n            while session_id in ServerSentEventsHandler.active_connections:\n                time.sleep(1)\n                # Send heartbeat\n                heartbeat = json.dumps({\"type\": \"heartbeat\", \"timestamp\": time.time()})\n                self.wfile.write(f\"event: heartbeat\\ndata: {heartbeat}\\n\\n\".encode('utf-8'))\n                self.wfile.flush()\n        except Exception as e:\n            logger.info(f\"SSE connection closed for session {session_id}: {e}\")\n        finally:\n            ServerSentEventsHandler.remove_connection(session_id)\n    \n    def send_json_response(self, data, status_code=200):\n        \"\"\"Utility to send a JSON response with CORS headers.\"\"\"\n        self.send_response(status_code)\n        self.send_header('Content-Type', 'application/json')\n        self.send_header('Access-Control-Allow-Origin', '*')\n        self.end_headers()\n        response = json.dumps(data, indent=2)\n        self.wfile.write(response.encode('utf-8'))\n\ndef start_enhanced_server(port=8000):\n    \"\"\"Start the enhanced API server with a reusable TCP socket.\"\"\"\n    \n    # Use a custom TCPServer that allows address reuse\n    class ReusableTCPServer(socketserver.TCPServer):\n        allow_reuse_address = True\n\n    with ReusableTCPServer((\"\", port), EnhancedRagApiHandler) as httpd:\n        print(f\"🚀 Starting Enhanced RAG API server on port {port}\")\n        print(f\"💬 Chat endpoint: http://localhost:{port}/chat\")\n        print(f\"✨ Indexing endpoint: http://localhost:{port}/index\")\n        print(f\"📊 Progress endpoint: http://localhost:{port}/progress\")\n        print(f\"🌊 Progress stream: http://localhost:{port}/stream\")\n        print(f\"📈 Real-time progress tracking 
enabled via Server-Sent Events!\")\n        httpd.serve_forever()\n\nif __name__ == '__main__':\n    # Start the server on a dedicated thread\n    server_thread = threading.Thread(target=start_enhanced_server)\n    server_thread.daemon = True\n    server_thread.start()\n    \n    print(\"🚀 Enhanced RAG API server with progress tracking is running.\")\n    print(\"Press Ctrl+C to stop.\")\n    \n    # Keep the main thread alive\n    try:\n        while True:\n            time.sleep(1)\n    except KeyboardInterrupt:\n        print(\"\\nStopping server...\") "
  },
  {
    "path": "rag_system/factory.py",
    "content": "from dotenv import load_dotenv\n\ndef get_agent(mode: str = \"default\"):\n    \"\"\"\n    Factory function to get an instance of the RAG agent based on the specified mode.\n    This uses local imports to prevent circular dependencies.\n    \"\"\"\n    from rag_system.agent.loop import Agent\n    from rag_system.utils.ollama_client import OllamaClient\n    from rag_system.main import PIPELINE_CONFIGS, OLLAMA_CONFIG, LLM_BACKEND, WATSONX_CONFIG\n\n    load_dotenv()\n    \n    # Initialize the appropriate LLM client based on backend configuration\n    if LLM_BACKEND.lower() == \"watsonx\":\n        from rag_system.utils.watsonx_client import WatsonXClient\n        \n        if not WATSONX_CONFIG[\"api_key\"] or not WATSONX_CONFIG[\"project_id\"]:\n            raise ValueError(\n                \"Watson X configuration incomplete. Please set WATSONX_API_KEY and WATSONX_PROJECT_ID \"\n                \"environment variables.\"\n            )\n        \n        llm_client = WatsonXClient(\n            api_key=WATSONX_CONFIG[\"api_key\"],\n            project_id=WATSONX_CONFIG[\"project_id\"],\n            url=WATSONX_CONFIG[\"url\"]\n        )\n        llm_config = WATSONX_CONFIG\n    else:\n        llm_client = OllamaClient(host=OLLAMA_CONFIG[\"host\"])\n        llm_config = OLLAMA_CONFIG\n    \n    config = PIPELINE_CONFIGS.get(mode, PIPELINE_CONFIGS['default'])\n    \n    if 'storage' not in config:\n        config['storage'] = {\n            'db_path': 'lancedb',\n            'text_table_name': 'text_pages_default',\n            'image_table_name': 'image_pages'\n        }\n    \n    agent = Agent(\n        pipeline_configs=config, \n        llm_client=llm_client, \n        ollama_config=llm_config\n    )\n    return agent\n\ndef get_indexing_pipeline(mode: str = \"default\"):\n    \"\"\"\n    Factory function to get an instance of the Indexing Pipeline.\n    \"\"\"\n    from rag_system.pipelines.indexing_pipeline import IndexingPipeline\n    from 
rag_system.main import PIPELINE_CONFIGS, OLLAMA_CONFIG, LLM_BACKEND, WATSONX_CONFIG\n    from rag_system.utils.ollama_client import OllamaClient\n\n    load_dotenv()\n    \n    # Initialize the appropriate LLM client based on backend configuration\n    if LLM_BACKEND.lower() == \"watsonx\":\n        from rag_system.utils.watsonx_client import WatsonXClient\n        \n        if not WATSONX_CONFIG[\"api_key\"] or not WATSONX_CONFIG[\"project_id\"]:\n            raise ValueError(\n                \"Watson X configuration incomplete. Please set WATSONX_API_KEY and WATSONX_PROJECT_ID \"\n                \"environment variables.\"\n            )\n        \n        llm_client = WatsonXClient(\n            api_key=WATSONX_CONFIG[\"api_key\"],\n            project_id=WATSONX_CONFIG[\"project_id\"],\n            url=WATSONX_CONFIG[\"url\"]\n        )\n        llm_config = WATSONX_CONFIG\n    else:\n        llm_client = OllamaClient(host=OLLAMA_CONFIG[\"host\"])\n        llm_config = OLLAMA_CONFIG\n    \n    config = PIPELINE_CONFIGS.get(mode, PIPELINE_CONFIGS['default'])\n    \n    return IndexingPipeline(config, llm_client, llm_config)     "
  },
  {
    "path": "rag_system/indexing/__init__.py",
    "content": ""
  },
  {
    "path": "rag_system/indexing/contextualizer.py",
    "content": "from typing import List, Dict, Any\nfrom rag_system.utils.ollama_client import OllamaClient\nfrom rag_system.ingestion.chunking import create_contextual_window\nimport logging\nimport re\n\n# Set up logging\nlogging.basicConfig(level=logging.INFO)\nlogger = logging.getLogger(__name__)\n\n# Define the structured prompt templates, adapted from the example\nSYSTEM_PROMPT = \"You are an expert at summarizing and providing context for document sections based on their local surroundings.\"\n\nLOCAL_CONTEXT_PROMPT_TEMPLATE = \"\"\"<local_context>\n{local_context_text}\n</local_context>\"\"\"\n\nCHUNK_PROMPT_TEMPLATE = \"\"\"Here is the specific chunk we want to situate within the local context provided:\n<chunk>\n{chunk_content}\n</chunk>\n\nBased *only* on the local context provided, give a very short (2-5 sentence) context summary to situate this specific chunk. \nFocus on the chunk's topic and its relation to the immediately surrounding text shown in the local context. \nFocus on the overall theme of the context, and make sure to include topics, concepts, and other relevant information.\nAnswer *only* with the succinct context and nothing else.\"\"\"\n\nclass ContextualEnricher:\n    \"\"\"\n    Enriches chunks with a prepended summary of their surrounding context using Ollama,\n    while preserving the original text.\n    \"\"\"\n    def __init__(self, llm_client: OllamaClient, llm_model: str, batch_size: int = 10):\n        self.llm_client = llm_client\n        self.llm_model = llm_model\n        self.batch_size = batch_size\n        logger.info(f\"Initialized ContextualEnricher with Ollama model '{self.llm_model}' (batch_size={batch_size}).\")\n\n    def _generate_summary(self, local_context_text: str, chunk_text: str) -> str:\n        \"\"\"Generates a contextual summary using a structured, multi-part prompt.\"\"\"\n        # Combine the templates to form the final content for the HumanMessage equivalent\n        human_prompt_content = (\n           
 f\"{LOCAL_CONTEXT_PROMPT_TEMPLATE.format(local_context_text=local_context_text)}\\n\\n\"\n            f\"{CHUNK_PROMPT_TEMPLATE.format(chunk_content=chunk_text)}\"\n        )\n\n        try:\n            # Although we don't use LangChain's message objects, we can simulate the\n            # System + Human message structure in the single prompt for the Ollama client.\n            # A common way is to provide the system prompt and then the user's request.\n            full_prompt = f\"{SYSTEM_PROMPT}\\n\\n{human_prompt_content}\"\n            \n            response = self.llm_client.generate_completion(self.llm_model, full_prompt, enable_thinking=False)\n            summary_raw = response.get('response', '').strip()\n\n            # --- Sanitize the summary to remove chain-of-thought markers ---\n            # Many Qwen models wrap reasoning in <think>...</think> or similar tags.\n            cleaned = re.sub(r'<think[^>]*>.*?</think>', '', summary_raw, flags=re.IGNORECASE | re.DOTALL)\n            # Remove any assistant role tags that may appear\n            cleaned = re.sub(r'<assistant[^>]*>|</assistant>', '', cleaned, flags=re.IGNORECASE)\n            # If the model used an explicit \"Answer:\" delimiter keep only the part after it\n            if 'Answer:' in cleaned:\n                cleaned = cleaned.split('Answer:', 1)[1]\n\n            # Take the first non-empty line to avoid leftover blank lines\n            summary = next((ln.strip() for ln in cleaned.splitlines() if ln.strip()), '')\n\n            # Fallback to raw if cleaning removed everything\n            if not summary:\n                summary = summary_raw\n\n            if not summary or len(summary) < 5:\n                logger.warning(\"Generated context summary is too short or empty. 
Skipping enrichment for this chunk.\")\n                return \"\"\n            \n            return summary\n\n        except Exception as e:\n            logger.error(f\"LLM invocation failed during contextualization: {e}\", exc_info=True)\n            return \"\" # Gracefully fail by returning no summary\n\n    def enrich_chunks(self, chunks: List[Dict[str, Any]], window_size: int = 1) -> List[Dict[str, Any]]:\n        if not chunks:\n            return []\n\n        logger.info(f\"Enriching {len(chunks)} chunks with contextual summaries (window_size={window_size}) using Ollama...\")\n        \n        # Import batch processor\n        from rag_system.utils.batch_processor import BatchProcessor, estimate_memory_usage\n        \n        # Estimate memory usage\n        memory_mb = estimate_memory_usage(chunks)\n        logger.info(f\"Estimated memory usage for contextual enrichment: {memory_mb:.1f}MB\")\n        \n        # Use batch processing for better performance and progress tracking\n        batch_processor = BatchProcessor(batch_size=self.batch_size)\n        \n        def process_chunk_batch(chunk_indices):\n            \"\"\"Process a batch of chunk indices for contextual enrichment\"\"\"\n            batch_results = []\n            for i in chunk_indices:\n                chunk = chunks[i]\n                try:\n                    local_context_text = create_contextual_window(chunks, chunk_index=i, window_size=window_size)\n                    \n                    # The summary is generated based on the original, unmodified text\n                    original_text = chunk['text']\n                    summary = self._generate_summary(local_context_text, original_text)\n                    \n                    new_chunk = chunk.copy()\n                    \n                    # Ensure metadata is a dictionary\n                    if 'metadata' not in new_chunk or not isinstance(new_chunk['metadata'], dict):\n                        
new_chunk['metadata'] = {}\n\n                    # Store original text and summary in metadata\n                    new_chunk['metadata']['original_text'] = original_text\n                    new_chunk['metadata']['contextual_summary'] = \"N/A\"\n\n                    # Prepend the context summary ONLY if it was successfully generated\n                    if summary:\n                        new_chunk['text'] = f\"Context: {summary}\\n\\n---\\n\\n{original_text}\"\n                        new_chunk['metadata']['contextual_summary'] = summary\n                    \n                    batch_results.append(new_chunk)\n                    \n                except Exception as e:\n                    logger.error(f\"Error enriching chunk {i}: {e}\")\n                    # Return original chunk if enrichment fails\n                    batch_results.append(chunk)\n                    \n            return batch_results\n        \n        # Create list of chunk indices for batch processing\n        chunk_indices = list(range(len(chunks)))\n        \n        # Process chunks in batches\n        enriched_chunks = batch_processor.process_in_batches(\n            chunk_indices,\n            process_chunk_batch,\n            \"Contextual Enrichment\"\n        )\n        \n        return enriched_chunks\n    \n    def enrich_chunks_sequential(self, chunks: List[Dict[str, Any]], window_size: int = 1) -> List[Dict[str, Any]]:\n        \"\"\"Sequential enrichment method (legacy) - kept for comparison\"\"\"\n        if not chunks:\n            return []\n\n        logger.info(f\"Enriching {len(chunks)} chunks sequentially (window_size={window_size})...\")\n        enriched_chunks = []\n        \n        for i, chunk in enumerate(chunks):\n            local_context_text = create_contextual_window(chunks, chunk_index=i, window_size=window_size)\n            \n            # The summary is generated based on the original, unmodified text\n            original_text = chunk['text']\n     
       summary = self._generate_summary(local_context_text, original_text)\n            \n            new_chunk = chunk.copy()\n            \n            # Ensure metadata is a dictionary\n            if 'metadata' not in new_chunk or not isinstance(new_chunk['metadata'], dict):\n                new_chunk['metadata'] = {}\n\n            # Store original text and summary in metadata\n            new_chunk['metadata']['original_text'] = original_text\n            new_chunk['metadata']['contextual_summary'] = \"N/A\"\n\n            # Prepend the context summary ONLY if it was successfully generated\n            if summary:\n                new_chunk['text'] = f\"Context: {summary}\\n\\n---\\n\\n{original_text}\"\n                new_chunk['metadata']['contextual_summary'] = summary\n            \n            enriched_chunks.append(new_chunk)\n            \n            if (i + 1) % 10 == 0 or i == len(chunks) - 1:\n                logger.info(f\"  ...processed {i+1}/{len(chunks)} chunks.\")\n            \n        return enriched_chunks"
  },
  {
    "path": "rag_system/indexing/embedders.py",
    "content": "# from rag_system.indexing.representations import BM25Generator\nimport lancedb\nimport pyarrow as pa\nfrom typing import List, Dict, Any\nimport numpy as np\nimport json\n\nclass LanceDBManager:\n    def __init__(self, db_path: str):\n        self.db_path = db_path\n        self.db = lancedb.connect(db_path)\n        print(f\"LanceDB connection established at: {db_path}\")\n\n    def get_table(self, table_name: str):\n        return self.db.open_table(table_name)\n\n    def create_table(self, table_name: str, schema: pa.Schema, mode: str = \"overwrite\"):\n        print(f\"Creating table '{table_name}' with mode '{mode}'...\")\n        return self.db.create_table(table_name, schema=schema, mode=mode)\n\nclass VectorIndexer:\n    \"\"\"\n    Handles the indexing of vector embeddings and rich metadata into LanceDB.\n    The 'text' field is the content that gets embedded (which can be enriched).\n    The original, clean text is stored in the metadata.\n    \"\"\"\n    def __init__(self, db_manager: LanceDBManager):\n        self.db_manager = db_manager\n\n    def index(self, table_name: str, chunks: List[Dict[str, Any]], embeddings: np.ndarray):\n        if len(chunks) != len(embeddings):\n            raise ValueError(\"The number of chunks and embeddings must be the same.\")\n        if not chunks:\n            print(\"No chunks to index.\")\n            return\n\n        vector_dim = embeddings[0].shape[0]\n        \n        # The schema stores the text that was used for the embedding (potentially enriched)\n        # and the full metadata object as a JSON string.\n        schema = pa.schema([\n            pa.field(\"vector\", pa.list_(pa.float32(), vector_dim)),\n            pa.field(\"text\", pa.string(), nullable=False),\n            pa.field(\"chunk_id\", pa.string()),\n            pa.field(\"document_id\", pa.string()),\n            pa.field(\"chunk_index\", pa.int32()),\n            pa.field(\"metadata\", pa.string())\n        ])\n\n        
data = []\n        skipped_count = 0\n        \n        for chunk, vector in zip(chunks, embeddings):\n            # Check for NaN values in the vector\n            if np.isnan(vector).any():\n                print(f\"⚠️ Skipping chunk '{chunk.get('chunk_id', 'unknown')}' due to NaN values in embedding\")\n                skipped_count += 1\n                continue\n                \n            # Check for infinite values in the vector\n            if np.isinf(vector).any():\n                print(f\"⚠️ Skipping chunk '{chunk.get('chunk_id', 'unknown')}' due to infinite values in embedding\")\n                skipped_count += 1\n                continue\n            \n            # Ensure original_text is in metadata if not already present\n            if 'original_text' not in chunk['metadata']:\n                chunk['metadata']['original_text'] = chunk['text']\n\n            # Extract document_id and chunk_index for top-level storage\n            doc_id = chunk.get(\"metadata\", {}).get(\"document_id\", \"unknown\")\n            chunk_idx = chunk.get(\"metadata\", {}).get(\"chunk_index\", -1)\n\n            # Defensive check for text content to ensure it's a non-empty string\n            text_content = chunk.get('text', '')\n            if not text_content or not isinstance(text_content, str):\n                text_content = \"\"\n\n            data.append({\n                \"vector\": vector.tolist(),\n                \"text\": text_content,\n                \"chunk_id\": chunk['chunk_id'],\n                \"document_id\": doc_id,\n                \"chunk_index\": chunk_idx,\n                \"metadata\": json.dumps(chunk)\n            })\n\n        if skipped_count > 0:\n            print(f\"⚠️ Skipped {skipped_count} chunks due to invalid embeddings (NaN or infinite values)\")\n        \n        if not data:\n            print(\"❌ No valid embeddings to index after filtering out NaN/infinite values\")\n            return\n\n        # Incremental indexing: 
append to existing table if present, otherwise create it\n        db = self.db_manager.db  # underlying LanceDB connection\n\n        if hasattr(db, \"table_names\") and table_name in db.table_names():\n            tbl = self.db_manager.get_table(table_name)\n            print(f\"Appending {len(data)} vectors to existing table '{table_name}'.\")\n        else:\n            print(f\"Creating table '{table_name}' (new) and adding {len(data)} vectors...\")\n            tbl = self.db_manager.create_table(table_name, schema=schema, mode=\"create\")\n\n        # Add data with NaN handling configuration\n        try:\n            tbl.add(data, on_bad_vectors='drop')\n            print(f\"✅ Indexed {len(data)} vectors into table '{table_name}'.\")\n        except Exception as e:\n            print(f\"❌ Failed to add data to table: {e}\")\n            # Fallback: try with fill strategy\n            try:\n                print(\"🔄 Retrying with NaN fill strategy...\")\n                tbl.add(data, on_bad_vectors='fill', fill_value=0.0)\n                print(f\"✅ Indexed {len(data)} vectors into table '{table_name}' (with NaN fill).\")\n            except Exception as e2:\n                print(f\"❌ Failed to add data even with NaN fill: {e2}\")\n                raise\n\n# BM25Indexer is no longer needed as we are moving to LanceDB's native FTS.\n# class BM25Indexer:\n#     ...\n\nif __name__ == '__main__':\n    print(\"embedders.py updated for contextual enrichment.\")\n    \n    # This chunk has been \"enriched\". 
The 'text' field contains the context.\n    enriched_chunk = {\n        'chunk_id': 'doc1_0', \n        'text': 'Context: Discusses animals.\\n\\n---\\n\\nOriginal: The cat sat on the mat.', \n        'metadata': {\n            'original_text': 'The cat sat on the mat.',\n            'contextual_summary': 'Discusses animals.',\n            'document_id': 'doc1', \n            'title': 'Pet Stories'\n        }\n    }\n    sample_embeddings = np.random.rand(1, 128).astype('float32')\n\n    DB_PATH = \"./rag_system/index_store/lancedb\"\n    db_manager = LanceDBManager(db_path=DB_PATH)\n    vector_indexer = VectorIndexer(db_manager=db_manager)\n\n    vector_indexer.index(\n        table_name=\"enriched_text_embeddings\", \n        chunks=[enriched_chunk], \n        embeddings=sample_embeddings\n    )\n    \n    try:\n        tbl = db_manager.get_table(\"enriched_text_embeddings\")\n        df = tbl.limit(1).to_pandas()\n        df['metadata'] = df['metadata'].apply(json.loads)\n        print(\"\\n--- Verification ---\")\n        print(\"Embedded Text:\", df['text'].iloc[0])\n        print(\"Original Text from Metadata:\", df['metadata'].iloc[0]['original_text'])\n    except Exception as e:\n        print(f\"Could not verify LanceDB table. Error: {e}\")\n"
  },
  {
    "path": "rag_system/indexing/graph_extractor.py",
    "content": "from typing import List, Dict, Any\nimport json\nfrom rag_system.utils.ollama_client import OllamaClient\n\nclass GraphExtractor:\n    \"\"\"\n    Extracts entities and relationships from text chunks using a live Ollama model.\n    \"\"\"\n    def __init__(self, llm_client: OllamaClient, llm_model: str):\n        self.llm_client = llm_client\n        self.llm_model = llm_model\n        print(f\"Initialized GraphExtractor with Ollama model '{self.llm_model}'.\")\n\n    def extract(self, chunks: List[Dict[str, Any]]) -> Dict[str, List[Dict]]:\n        all_entities = {}\n        all_relationships = set()\n\n        print(f\"Extracting graph from {len(chunks)} chunks with Ollama...\")\n        for i, chunk in enumerate(chunks):\n            # Step 1: Extract Entities\n            entity_prompt = f\"\"\"\n            From the following text, extract key entities (people, companies, locations).\n            Return the answer as a JSON object with a single key 'entities', which is a list of strings.\n            Each entity should be a short, specific name, not a long string of text.\n\n            Text: \"{chunk['text']}\"\n            \"\"\"\n            \n            entity_response = self.llm_client.generate_completion(\n                self.llm_model, \n                entity_prompt,\n                format=\"json\" \n            )\n            \n            entity_response_text = entity_response.get('response', '{}')\n\n            try:\n                entity_data = json.loads(entity_response_text)\n                entities = entity_data.get('entities', [])\n                \n                if not entities:\n                    continue\n\n                # Clean up entities\n                cleaned_entities = []\n                for entity in entities:\n                    if len(entity) < 50 and not any(c in entity for c in \"[]{}()\"):\n                        cleaned_entities.append(entity)\n\n                if not cleaned_entities:\n          
          continue\n\n                # Step 2: Extract Relationships\n                relationship_prompt = f\"\"\"\n                Given the following entities: {cleaned_entities}\n                And the following text: \"{chunk['text']}\"\n                Extract the relationships between the entities.\n                Return the answer as a JSON object with a single key 'relationships', which is a list of objects, each with 'source', 'target', and 'label'.\n                \"\"\"\n\n                relationship_response = self.llm_client.generate_completion(\n                    self.llm_model,\n                    relationship_prompt,\n                    format=\"json\"\n                )\n\n                relationship_response_text = relationship_response.get('response', '{}')\n                relationship_data = json.loads(relationship_response_text)\n\n                for entity_name in cleaned_entities:\n                    all_entities[entity_name] = {\"id\": entity_name, \"type\": \"Unknown\"} # Placeholder type\n\n                for rel in relationship_data.get(\"relationships\", []):\n                    if 'source' in rel and 'target' in rel and 'label' in rel:\n                        all_relationships.add(\n                            (rel['source'], rel['target'], rel['label'])\n                        )\n\n            except json.JSONDecodeError:\n                print(f\"Warning: Could not decode JSON from LLM for chunk {i+1}.\")\n                continue\n        \n        return {\n            \"entities\": list(all_entities.values()),\n            \"relationships\": [{\"source\": s, \"target\": t, \"label\": l} for s, t, l in all_relationships]\n        }\n"
  },
  {
    "path": "rag_system/indexing/latechunk.py",
    "content": "\"\"\"Late Chunking encoder.\n\nThis helper feeds the *entire* document to the embedding model, collects\nper-token hidden-states and then mean-pools those vectors inside pre-defined\nchunk spans.  The end result is one vector per chunk – but each vector has\nbeen produced with knowledge of the *whole* document, alleviating context-loss\nissues of vanilla chunking.\n\nWe purposefully keep this class lightweight and free of LanceDB/Chunking\nlogic so it can be re-used elsewhere (e.g. notebook experiments).\n\"\"\"\n\nfrom __future__ import annotations\n\nfrom typing import List, Tuple\n\nimport torch\nfrom transformers import AutoModel, AutoTokenizer\nimport numpy as np\n\nclass LateChunkEncoder:\n    \"\"\"Generate late-chunked embeddings given character-offset spans.\"\"\"\n\n    def __init__(self, model_name: str = \"Qwen/Qwen3-Embedding-0.6B\", *, max_tokens: int = 8192) -> None:\n        self.model_name = model_name\n        self.max_len = max_tokens\n        self.device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n        # Back-compat: allow short alias without repo namespace\n        repo_id = model_name\n        if \"/\" not in model_name:\n            # Map common aliases to the official repo\n            alias_map = {\n                \"qwen3-embedding-0.6b\": \"Qwen/Qwen3-Embedding-0.6B\",\n            }\n            repo_id = alias_map.get(model_name.lower(), model_name)\n\n        self.tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)\n        self.model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)\n        self.model.to(self.device)\n        self.model.eval()\n\n    @torch.inference_mode()\n    def encode(self, text: str, chunk_spans: List[Tuple[int, int]]) -> List[np.ndarray]:\n        \"\"\"Return one vector *per* span.\n\n        Args:\n            text: Full document text.\n            chunk_spans: List of (char_start, char_end)
 offsets for each chunk.\n\n        Returns:\n            List of numpy float32 arrays – one per chunk.\n        \"\"\"\n        if not chunk_spans:\n            return []\n\n        # Tokenise and obtain per-token hidden states\n        inputs = self.tokenizer(\n            text,\n            return_tensors=\"pt\",\n            return_offsets_mapping=True,\n            truncation=True,\n            max_length=self.max_len,\n        )\n        inputs = {k: v.to(self.device) for k, v in inputs.items()}\n        offsets = inputs.pop(\"offset_mapping\").squeeze(0).cpu().tolist()  # (seq_len, 2)\n\n        out = self.model(**inputs)\n        last_hidden = out.last_hidden_state.squeeze(0)  # (seq_len, dim)\n        last_hidden = last_hidden.cpu()\n\n        # For each chunk span, gather token indices belonging to it\n        vectors: List[np.ndarray] = []\n        for start_char, end_char in chunk_spans:\n            token_indices = [i for i, (s, e) in enumerate(offsets) if s >= start_char and e <= end_char]\n            if not token_indices:\n                # Fallback: if the tokenizer lost the span (e.g. due to truncation), use the first token's hidden state\n                token_indices = [0]\n            chunk_vec = last_hidden[token_indices].mean(dim=0).numpy().astype(\"float32\")\n            \n            # Check for NaN or infinite values\n            if np.isnan(chunk_vec).any() or np.isinf(chunk_vec).any():\n                print(f\"⚠️ Warning: Invalid values detected in late chunk embedding for span ({start_char}, {end_char})\")\n                # Replace invalid values with zeros\n                chunk_vec = np.nan_to_num(chunk_vec, nan=0.0, posinf=0.0, neginf=0.0)\n                print(\"🔄 Replaced invalid values with zeros\")\n            \n            vectors.append(chunk_vec)\n        return vectors "
  },
  {
    "path": "rag_system/indexing/multimodal.py",
    "content": "import fitz  # PyMuPDF\nfrom PIL import Image\nimport torch\nimport os\nfrom typing import List, Dict, Any\n\nfrom rag_system.indexing.embedders import LanceDBManager, VectorIndexer\nfrom rag_system.indexing.representations import QwenEmbedder\n\n\nfrom transformers import ColPaliForRetrieval, ColPaliProcessor, Qwen2TokenizerFast\n\nclass LocalVisionModel:\n    \"\"\"\n    A wrapper for a local vision model (ColPali) from the transformers library.\n    \"\"\"\n    def __init__(self, model_name: str = \"vidore/colqwen2-v1.0\", device: str = \"cpu\"):\n        print(f\"Initializing local vision model '{model_name}' on device '{device}'.\")\n        self.device = device\n        self.model = ColPaliForRetrieval.from_pretrained(model_name).to(self.device).eval()\n        self.tokenizer = Qwen2TokenizerFast.from_pretrained(model_name)\n        self.image_processor = ColPaliProcessor.from_pretrained(model_name).image_processor\n        self.processor = ColPaliProcessor(tokenizer=self.tokenizer, image_processor=self.image_processor)\n        print(\"Local vision model loaded successfully.\")\n\n    def embed_image(self, image: Image.Image) -> torch.Tensor:\n        \"\"\"\n        Generates a multi-vector embedding for a single image.\n        \"\"\"\n        inputs = self.processor(text=\"\", images=image, return_tensors=\"pt\").to(self.device)\n        with torch.no_grad():\n            image_embeds = self.model.get_image_features(**inputs)\n        return image_embeds\n\n\nclass MultimodalProcessor:\n    \"\"\"\n    Processes PDFs into separate text and image embeddings using local models.\n    \"\"\"\n    def __init__(self, vision_model: LocalVisionModel, text_embedder: QwenEmbedder, db_manager: LanceDBManager):\n        self.vision_model = vision_model\n        self.text_embedder = text_embedder\n        self.text_vector_indexer = VectorIndexer(db_manager)\n        self.image_vector_indexer = VectorIndexer(db_manager)\n\n    def process_and_index(\n   
     self, \n        pdf_path: str, \n        text_table_name: str, \n        image_table_name: str\n    ):\n        print(f\"\\n--- Processing PDF for multimodal indexing: {os.path.basename(pdf_path)} ---\")\n        doc = fitz.open(pdf_path)\n        document_id = os.path.basename(pdf_path)\n        \n        all_pages_text_chunks = []\n        all_pages_images = []\n        \n        for page_num in range(len(doc)):\n            page = doc.load_page(page_num)\n            \n            # 1. Extract Text\n            text = page.get_text(\"text\")\n            if not text.strip():\n                text = f\"Page {page_num + 1} contains no extractable text.\"\n            \n            all_pages_text_chunks.append({\n                \"chunk_id\": f\"{document_id}_page_{page_num+1}\",\n                \"text\": text,\n                \"metadata\": {\"document_id\": document_id, \"page_number\": page_num + 1}\n            })\n            \n            # 2. Extract Image\n            pix = page.get_pixmap()\n            img = Image.frombytes(\"RGB\", [pix.width, pix.height], pix.samples)\n            all_pages_images.append(img)\n\n        # --- Batch Indexing ---\n        # Index all text chunks\n        if all_pages_text_chunks:\n            text_embeddings = self.text_embedder.create_embeddings([c['text'] for c in all_pages_text_chunks])\n            self.text_vector_indexer.index(text_table_name, all_pages_text_chunks, text_embeddings)\n            print(f\"Indexed {len(all_pages_text_chunks)} text pages into '{text_table_name}'.\")\n\n        # Index all images\n        if all_pages_images:\n            image_embeddings = self.vision_model.create_image_embeddings(all_pages_images)\n            # We use the text chunks as placeholders for metadata\n            self.image_vector_indexer.index(image_table_name, all_pages_text_chunks, image_embeddings)\n            print(f\"Indexed {len(all_pages_images)} image pages into '{image_table_name}'.\")\n\nif __name__ == 
'__main__':\n    # This test requires an internet connection to download the models.\n    try:\n        # 1. Setup models and dependencies\n        text_embedder = QwenEmbedder()\n        vision_model = LocalVisionModel()\n        db_manager = LanceDBManager(db_path=\"./rag_system/index_store/lancedb\")\n        \n        # 2. Create a dummy PDF\n        dummy_pdf_path = \"multimodal_test.pdf\"\n        doc = fitz.open()\n        page = doc.new_page()\n        page.insert_text((50, 72), \"This is a test page with text and an image.\")\n        doc.save(dummy_pdf_path)\n        \n        # 3. Run the processor\n        processor = MultimodalProcessor(vision_model, text_embedder, db_manager)\n        processor.process_and_index(\n            pdf_path=dummy_pdf_path,\n            text_table_name=\"test_text_pages\",\n            image_table_name=\"test_image_pages\"\n        )\n        \n        # 4. Verify\n        print(\"\\n--- Verification ---\")\n        text_tbl = db_manager.get_table(\"test_text_pages\")\n        img_tbl = db_manager.get_table(\"test_image_pages\")\n        print(f\"Text table has {len(text_tbl)} rows.\")\n        print(f\"Image table has {len(img_tbl)} rows.\")\n\n    except Exception as e:\n        print(f\"\\nAn error occurred during the multimodal test: {e}\")\n        print(\"Please ensure you have an internet connection for model downloads.\")"
  },
  {
    "path": "rag_system/indexing/overview_builder.py",
    "content": "from __future__ import annotations\n\nimport os, json, logging, re\nfrom typing import List, Dict, Any\n\nlogger = logging.getLogger(__name__)\n\nclass OverviewBuilder:\n    \"\"\"Generates and stores a one-paragraph overview for each document.\n    The overview is derived from the first *n* chunks of the document.\n    \"\"\"\n\n    DEFAULT_PROMPT = (\n        \"You will receive the beginning of a document. \"\n        \"In no more than 120 tokens, describe what the document is about, \"\n        \"state its type (e.g. invoice, slide deck, policy, research paper, receipt) \"\n        \"and mention 3-5 important entities, numbers or dates it contains.\\n\\n\"\n        \"DOCUMENT_START:\\n{text}\\n\\nOVERVIEW:\"\n    )\n\n    def __init__(self, llm_client, model: str = \"qwen3:0.6b\", first_n_chunks: int = 5,\n                 out_path: str | None = None):\n        if out_path is None:\n            out_path = \"index_store/overviews/overviews.jsonl\"\n        self.llm_client = llm_client\n        self.model = model\n        self.first_n = first_n_chunks\n        self.out_path = out_path\n        os.makedirs(os.path.dirname(out_path), exist_ok=True)\n\n    def build_and_store(self, doc_id: str, chunks: List[Dict[str, Any]]):\n        if not chunks:\n            return\n        head_text = \"\\n\".join(c[\"text\"] for c in chunks[: self.first_n] if c.get(\"text\"))\n        prompt = self.DEFAULT_PROMPT.format(text=head_text[:5000])  # safety cap\n        try:\n            resp = self.llm_client.generate_completion(model=self.model, prompt=prompt, enable_thinking=False)\n            summary_raw = resp.get(\"response\", \"\")\n            # Remove any lingering <think>...</think> blocks just in case\n            summary = re.sub(r'<think[^>]*>.*?</think>', '', summary_raw, flags=re.IGNORECASE | re.DOTALL).strip()\n        except Exception as e:\n            summary = f\"Failed to generate overview: {e}\"\n        record = {\"doc_id\": doc_id, 
\"overview\": summary.strip()}\n        with open(self.out_path, \"a\", encoding=\"utf-8\") as f:\n            f.write(json.dumps(record, ensure_ascii=False) + \"\\n\")\n\n        logger.info(f\"📄 Overview generated for {doc_id} (stored in {self.out_path})\") "
  },
  {
    "path": "rag_system/indexing/representations.py",
    "content": "from typing import List, Dict, Any, Protocol\nimport numpy as np\nfrom transformers import AutoModel, AutoTokenizer\nimport torch\nimport os\n\n# We keep the protocol to ensure a consistent interface\nclass EmbeddingModel(Protocol):\n    def create_embeddings(self, texts: List[str]) -> np.ndarray: ...\n\n# Global cache for models - use dict to cache by model name\n_MODEL_CACHE = {}\n\n# --- New Ollama Embedder ---\nclass QwenEmbedder(EmbeddingModel):\n    \"\"\"\n    An embedding model that uses a local Hugging Face transformer model.\n    \"\"\"\n    def __init__(self, model_name: str = \"Qwen/Qwen3-Embedding-0.6B\"):\n        self.model_name = model_name\n        # Auto-select the best available device: CUDA > MPS > CPU\n        if torch.cuda.is_available():\n            self.device = \"cuda\"\n        elif getattr(torch.backends, \"mps\", None) and torch.backends.mps.is_available():\n            self.device = \"mps\"\n        else:\n            self.device = \"cpu\"\n\n        # Use model-specific cache\n        if model_name not in _MODEL_CACHE:\n            print(f\"Initializing HF Embedder with model '{model_name}' on device '{self.device}'. 
(first load)\")\n            tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True, padding_side=\"left\")\n            model = AutoModel.from_pretrained(\n                model_name,\n                trust_remote_code=True,\n                torch_dtype=torch.float16 if self.device != \"cpu\" else None,\n            ).to(self.device).eval()\n            _MODEL_CACHE[model_name] = (tokenizer, model)\n            print(f\"QwenEmbedder weights loaded and cached for {model_name}.\")\n        else:\n            print(f\"Reusing cached QwenEmbedder weights for {model_name}.\")\n        \n        self.tokenizer, self.model = _MODEL_CACHE[model_name]\n\n    def create_embeddings(self, texts: List[str]) -> np.ndarray:\n        print(f\"Generating {len(texts)} embeddings with {self.model_name} model...\")\n        inputs = self.tokenizer(texts, padding=True, truncation=True, return_tensors=\"pt\").to(self.device)\n        with torch.no_grad():\n            outputs = self.model(**inputs)\n            last_hidden = outputs.last_hidden_state  # [B, seq, dim]\n            # Pool via last valid token per sequence (recommended for Qwen3)\n            seq_len = inputs[\"attention_mask\"].sum(dim=1) - 1  # index of last token\n            batch_indices = torch.arange(last_hidden.size(0), device=self.device)\n            embeddings = last_hidden[batch_indices, seq_len]\n        \n        # Convert to numpy and validate\n        embeddings_np = embeddings.cpu().numpy()\n        \n        # Check for NaN or infinite values\n        if np.isnan(embeddings_np).any():\n            print(f\"⚠️ Warning: NaN values detected in embeddings from {self.model_name}\")\n            # Replace NaN values with zeros\n            embeddings_np = np.nan_to_num(embeddings_np, nan=0.0, posinf=0.0, neginf=0.0)\n            print(f\"🔄 Replaced NaN values with zeros\")\n        \n        if np.isinf(embeddings_np).any():\n            print(f\"⚠️ Warning: Infinite values detected in 
embeddings from {self.model_name}\")\n            # Replace infinite values with zeros\n            embeddings_np = np.nan_to_num(embeddings_np, nan=0.0, posinf=0.0, neginf=0.0)\n            print(f\"🔄 Replaced infinite values with zeros\")\n        \n        return embeddings_np\n\nclass EmbeddingGenerator:\n    def __init__(self, embedding_model: EmbeddingModel, batch_size: int = 50):\n        self.model = embedding_model\n        self.batch_size = batch_size\n\n    def generate(self, chunks: List[Dict[str, Any]]) -> List[np.ndarray]:\n        \"\"\"Generate embeddings for all chunks using batch processing\"\"\"\n        texts_to_embed = [chunk['text'] for chunk in chunks]\n        if not texts_to_embed: \n            return []\n        \n        from rag_system.utils.batch_processor import BatchProcessor, estimate_memory_usage\n        \n        memory_mb = estimate_memory_usage(chunks)\n        print(f\"Estimated memory usage for {len(chunks)} chunks: {memory_mb:.1f}MB\")\n        \n        batch_processor = BatchProcessor(batch_size=self.batch_size)\n        \n        def process_text_batch(text_batch):\n            if not text_batch:\n                return []\n            batch_embeddings = self.model.create_embeddings(text_batch)\n            return [embedding for embedding in batch_embeddings]\n        \n        all_embeddings = batch_processor.process_in_batches(\n            texts_to_embed,\n            process_text_batch,\n            \"Embedding Generation\"\n        )\n        \n        return all_embeddings\n\nclass OllamaEmbedder(EmbeddingModel):\n    \"\"\"Call Ollama's /api/embeddings endpoint for each text.\"\"\"\n    def __init__(self, model_name: str, host: str | None = None, timeout: int = 60):\n        self.model_name = model_name\n        self.host = (host or os.getenv(\"OLLAMA_HOST\") or \"http://localhost:11434\").rstrip(\"/\")\n        self.timeout = timeout\n\n    def _embed_single(self, text: str):\n        import requests, numpy as np, 
json\n        payload = {\"model\": self.model_name, \"prompt\": text}\n        r = requests.post(f\"{self.host}/api/embeddings\", json=payload, timeout=self.timeout)\n        r.raise_for_status()\n        data = r.json()\n        # Ollama may return {\"embedding\": [...]} or {\"data\": [...]} depending on version\n        vec = data.get(\"embedding\") or data.get(\"data\")\n        if vec is None:\n            raise ValueError(\"Unexpected Ollama embeddings response format\")\n        return np.array(vec, dtype=\"float32\")\n\n    def create_embeddings(self, texts: List[str]):\n        import numpy as np\n        vectors = [self._embed_single(t) for t in texts]\n        embeddings_np = np.vstack(vectors)\n        \n        # Check for NaN or infinite values\n        if np.isnan(embeddings_np).any():\n            print(f\"⚠️ Warning: NaN values detected in Ollama embeddings from {self.model_name}\")\n            # Replace NaN values with zeros\n            embeddings_np = np.nan_to_num(embeddings_np, nan=0.0, posinf=0.0, neginf=0.0)\n            print(f\"🔄 Replaced NaN values with zeros\")\n        \n        if np.isinf(embeddings_np).any():\n            print(f\"⚠️ Warning: Infinite values detected in Ollama embeddings from {self.model_name}\")\n            # Replace infinite values with zeros\n            embeddings_np = np.nan_to_num(embeddings_np, nan=0.0, posinf=0.0, neginf=0.0)\n            print(f\"🔄 Replaced infinite values with zeros\")\n        \n        return embeddings_np\n\ndef select_embedder(model_name: str, ollama_host: str | None = None):\n    \"\"\"Return appropriate EmbeddingModel implementation for the given name.\"\"\"\n    if \"/\" in model_name or model_name.startswith(\"http\"):\n        # Treat as HF model path\n        return QwenEmbedder(model_name=model_name)\n    # Otherwise assume it's an Ollama tag\n    return OllamaEmbedder(model_name=model_name, host=ollama_host)\n\nif __name__ == '__main__':\n    print(\"representations.py cleaned 
up.\")\n    try:\n        qwen_embedder = QwenEmbedder()\n        emb_gen = EmbeddingGenerator(embedding_model=qwen_embedder)\n        \n        sample_chunks = [{'text': 'Hello world'}, {'text': 'This is a test'}]\n        embeddings = emb_gen.generate(sample_chunks)\n        \n        print(f\"\\nSuccessfully generated {len(embeddings)} embeddings.\")\n        print(f\"Shape of first embedding: {embeddings[0].shape}\")\n\n    except Exception as e:\n        print(f\"\\nAn error occurred during the QwenEmbedder test: {e}\")\n        print(\"Please ensure you have an internet connection for model downloads.\")"
  },
  {
    "path": "rag_system/ingestion/__init__.py",
    "content": ""
  },
  {
    "path": "rag_system/ingestion/chunking.py",
    "content": "from typing import List, Dict, Any, Optional\nimport re\nfrom transformers import AutoTokenizer\n\nclass MarkdownRecursiveChunker:\n    \"\"\"\n    A recursive chunker that splits Markdown text based on its semantic structure\n    and embeds document-level metadata into each chunk.\n    \"\"\"\n\n    def __init__(self, max_chunk_size: int = 1500, min_chunk_size: int = 200, tokenizer_model: str = \"Qwen/Qwen3-Embedding-0.6B\"):\n        self.max_chunk_size = max_chunk_size\n        self.min_chunk_size = min_chunk_size\n        self.split_priority = [\"\\n## \", \"\\n### \", \"\\n#### \", \"```\", \"\\n\\n\"]\n        \n        repo_id = tokenizer_model\n        if \"/\" not in tokenizer_model and not tokenizer_model.startswith(\"Qwen/\"):\n            repo_id = {\n                \"qwen3-embedding-0.6b\": \"Qwen/Qwen3-Embedding-0.6B\",\n            }.get(tokenizer_model.lower(), tokenizer_model)\n        \n        try:\n            self.tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)\n        except Exception as e:\n            print(f\"Warning: Failed to load tokenizer {repo_id}: {e}\")\n            print(\"Falling back to character-based approximation (4 chars ≈ 1 token)\")\n            self.tokenizer = None\n\n    def _token_len(self, text: str) -> int:\n        \"\"\"Get token count for text using the tokenizer.\"\"\"\n        if self.tokenizer is not None:\n            return len(self.tokenizer.tokenize(text))\n        else:\n            return max(1, len(text) // 4)\n    \n    def _split_text(self, text: str, separators: List[str]) -> List[str]:\n        final_chunks = []\n        chunks_to_process = [text]\n        \n        for sep in separators:\n            new_chunks = []\n            for chunk in chunks_to_process:\n                if self._token_len(chunk) > self.max_chunk_size:\n                    sub_chunks = re.split(f'({sep})', chunk)\n                    combined = []\n                    i = 0\n          
          while i < len(sub_chunks):\n                        # re.split with a capturing group yields [text, sep, text, sep, ...];\n                        # re-attach each separator to the text that follows it, keeping the leading text\n                        if sub_chunks[i] == sep and i + 1 < len(sub_chunks):\n                            combined.append(sub_chunks[i] + sub_chunks[i+1])\n                            i += 2\n                        else:\n                            if sub_chunks[i]:\n                                combined.append(sub_chunks[i])\n                            i += 1\n                    new_chunks.extend(combined)\n                else:\n                    new_chunks.append(chunk)\n            chunks_to_process = new_chunks\n        \n        final_chunks = []\n        for chunk in chunks_to_process:\n            if self._token_len(chunk) > self.max_chunk_size:\n                words = chunk.split()\n                current_chunk = \"\"\n                for word in words:\n                    test_chunk = current_chunk + \" \" + word if current_chunk else word\n                    if self._token_len(test_chunk) <= self.max_chunk_size:\n                        current_chunk = test_chunk\n                    else:\n                        if current_chunk:\n                            final_chunks.append(current_chunk)\n                        current_chunk = word\n                if current_chunk:\n                    final_chunks.append(current_chunk)\n            else:\n                final_chunks.append(chunk)\n\n        return final_chunks\n\n    def chunk(self, text: str, document_id: str, document_metadata: Optional[Dict[str, Any]] = None) -> List[Dict[str, Any]]:\n        \"\"\"\n        Chunks the Markdown text and injects metadata.\n\n        Args:\n            text: The Markdown text to chunk.\n            document_id: The identifier for the source document.\n            document_metadata: A dictionary of metadata for the source document.\n\n        Returns:\n            A list of dictionaries, where each dictionary is a chunk with metadata.\n        \"\"\"\n        if not text:\n            return []\n\n        
raw_chunks = self._split_text(text, self.split_priority)\n        \n        merged_chunks_text = []\n        current_chunk = \"\"\n        for chunk_text in raw_chunks:\n            test_chunk = current_chunk + chunk_text if current_chunk else chunk_text\n            if not current_chunk or self._token_len(test_chunk) <= self.max_chunk_size:\n                current_chunk = test_chunk\n            elif self._token_len(current_chunk) < self.min_chunk_size:\n                 current_chunk = test_chunk\n            else:\n                merged_chunks_text.append(current_chunk)\n                current_chunk = chunk_text\n        if current_chunk:\n            merged_chunks_text.append(current_chunk)\n\n        final_chunks = []\n        for i, chunk_text in enumerate(merged_chunks_text):\n            # Combine document-level metadata with chunk-specific metadata\n            combined_metadata = (document_metadata or {}).copy()\n            combined_metadata.update({\n                \"document_id\": document_id,\n                \"chunk_number\": i,\n            })\n            \n            final_chunks.append({\n                \"chunk_id\": f\"{document_id}_{i}\", # Create a more unique ID\n                \"text\": chunk_text.strip(),\n                \"metadata\": combined_metadata\n            })\n\n        return final_chunks\n\ndef create_contextual_window(all_chunks: List[Dict[str, Any]], chunk_index: int, window_size: int = 1) -> str:\n    if not (0 <= chunk_index < len(all_chunks)):\n        raise ValueError(\"chunk_index is out of bounds.\")\n    start = max(0, chunk_index - window_size)\n    end = min(len(all_chunks), chunk_index + window_size + 1)\n    context_chunks = all_chunks[start:end]\n    return \" \".join([chunk['text'] for chunk in context_chunks])\n\nif __name__ == '__main__':\n    print(\"chunking.py updated to include document metadata in each chunk.\")\n    \n    sample_markdown = \"# Doc Title\\n\\nContent paragraph.\"\n    doc_meta = 
{\"title\": \"My Awesome Document\", \"author\": \"Jane Doe\", \"year\": 2024}\n    \n    chunker = MarkdownRecursiveChunker()\n    chunks = chunker.chunk(\n        text=sample_markdown, \n        document_id=\"doc456\", \n        document_metadata=doc_meta\n    )\n    \n    print(f\"\\n--- Created {len(chunks)} chunk(s) ---\")\n    for chunk in chunks:\n        print(f\"Chunk ID: {chunk['chunk_id']}\")\n        print(f\"Text: '{chunk['text']}'\")\n        print(f\"Metadata: {chunk['metadata']}\")\n        print(\"-\" * 20)\n"
  },
  {
    "path": "rag_system/ingestion/docling_chunker.py",
    "content": "from __future__ import annotations\n\n\"\"\"Docling-aware chunker (simplified).\n\nFor now we proxy the old MarkdownRecursiveChunker but add:\n• sentence-aware packing to max_tokens with overlap\n• breadcrumb metadata stubs so downstream code already handles them\n\nIn a follow-up we can replace the internals with true Docling element-tree\nwalking once the PDFConverter returns structured nodes.\n\"\"\"\nfrom typing import List, Dict, Any, Tuple\nimport math\nimport re\nfrom itertools import islice\nfrom rag_system.ingestion.chunking import MarkdownRecursiveChunker\nfrom transformers import AutoTokenizer\n\nclass DoclingChunker:\n    def __init__(self, *, max_tokens: int = 512, overlap: int = 1, tokenizer_model: str = \"Qwen/Qwen3-Embedding-0.6B\"):\n        self.max_tokens = max_tokens\n        self.overlap = overlap  # sentences of overlap\n        repo_id = tokenizer_model\n        if \"/\" not in tokenizer_model and not tokenizer_model.startswith(\"Qwen/\"):\n            repo_id = {\n                \"qwen3-embedding-0.6b\": \"Qwen/Qwen3-Embedding-0.6B\",\n            }.get(tokenizer_model.lower(), tokenizer_model)\n        \n        try:\n            self.tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)\n        except Exception as e:\n            print(f\"Warning: Failed to load tokenizer {repo_id}: {e}\")\n            print(\"Falling back to character-based approximation (4 chars ≈ 1 token)\")\n            self.tokenizer = None\n        # Fallback simple sentence splitter (period, question, exclamation, newline)\n        self._sent_re = re.compile(r\"(?<=[\\.\\!\\?])\\s+|\\n+\")\n        self.legacy = MarkdownRecursiveChunker(max_chunk_size=10_000, min_chunk_size=100)\n\n    # ------------------------------------------------------------------\n    def _token_len(self, text: str) -> int:\n        if self.tokenizer is not None:\n            return len(self.tokenizer.tokenize(text))\n        else:\n            # 
Fallback: approximate 4 characters per token\n            return max(1, len(text) // 4)\n\n    def split_markdown(self, markdown: str, *, document_id: str, metadata: Dict[str, Any]) -> List[Dict[str, Any]]:\n        \"\"\"Split one Markdown doc into chunks with max_tokens limit.\"\"\"\n        base_chunks = self.legacy.chunk(markdown, document_id, metadata)\n        new_chunks: List[Dict[str, Any]] = []\n        global_idx = 0\n        for ch in base_chunks:\n            sentences = [s.strip() for s in self._sent_re.split(ch[\"text\"]) if s.strip()]\n            if not sentences:\n                continue\n            window: List[str] = []\n            while sentences:\n                # Add until over limit\n                while sentences and self._token_len(\" \".join(window + [sentences[0]])) <= self.max_tokens:\n                    window.append(sentences.pop(0))\n                if not window:  # single sentence > limit → hard cut\n                    window.append(sentences.pop(0))\n                chunk_text = \" \".join(window)\n                new_chunk = {\n                    \"chunk_id\": f\"{document_id}_{global_idx}\",\n                    \"text\": chunk_text,\n                    \"metadata\": {\n                        **metadata,\n                        \"document_id\": document_id,\n                        \"chunk_index\": global_idx,\n                        \"heading_path\": metadata.get(\"heading_path\", []),\n                        \"heading_level\": len(metadata.get(\"heading_path\", [])),\n                        \"block_type\": metadata.get(\"block_type\", \"paragraph\"),\n                    },\n                }\n                new_chunks.append(new_chunk)\n                global_idx += 1\n                # Overlap: prepend last `overlap` sentences of the current window to the remaining queue.\n                # Only do so when the window holds more than `overlap` sentences; otherwise the same\n                # window would be re-queued and re-emitted on every pass, and the loop would never end.\n                if self.overlap and sentences and len(window) > self.overlap:\n                    back = window[-self.overlap:] if self.overlap <= len(window) else 
window[:]\n                    sentences = back + sentences\n                window = []\n        return new_chunks\n\n    # ------------------------------------------------------------------\n    # Element-tree based chunking (true Docling path)\n    # ------------------------------------------------------------------\n    def chunk_document(self, doc, *, document_id: str, metadata: Dict[str, Any] | None = None) -> List[Dict[str, Any]]:\n        \"\"\"Walk a DoclingDocument and emit chunks.\n\n        Tables / Code / Figures are emitted as atomic chunks.\n        Paragraph-like nodes are sentence-packed to <= max_tokens.\n        \"\"\"\n        metadata = metadata or {}\n\n        def _token_len(txt: str) -> int:\n            if self.tokenizer is not None:\n                return len(self.tokenizer.tokenize(txt))\n            else:\n                # Fallback: approximate 4 characters per token\n                return max(1, len(txt) // 4)\n\n        chunks: List[Dict[str, Any]] = []\n        global_idx = 0\n\n        # Helper to create a chunk and append to list\n        def _add_chunk(text: str, block_type: str, heading_path: List[str], page_no: int | None = None):\n            nonlocal global_idx\n            if not text.strip():\n                return\n            chunk_meta = {\n                **metadata,\n                \"document_id\": document_id,\n                \"chunk_index\": global_idx,\n                \"heading_path\": heading_path,\n                \"heading_level\": len(heading_path),\n                \"block_type\": block_type,\n            }\n            if page_no is not None:\n                chunk_meta[\"page\"] = page_no\n            chunks.append({\n                \"chunk_id\": f\"{document_id}_{global_idx}\",\n                \"text\": text,\n                \"metadata\": chunk_meta,\n            })\n            global_idx += 1\n\n        # The Docling API exposes .body which is a tree of nodes; we fall back to .texts/.tables lists 
if available\n        try:\n            # We walk doc.texts (reading order). We'll buffer consecutive paragraph items\n            current_heading_path: List[str] = []\n            buffer: List[str] = []\n            buffer_tokens = 0\n            buffer_page = None\n\n            def flush_buffer():\n                nonlocal buffer, buffer_tokens, buffer_page\n                if buffer:\n                    _add_chunk(\" \".join(buffer), \"paragraph\", heading_path=current_heading_path[:], page_no=buffer_page)\n                buffer, buffer_tokens, buffer_page = [], 0, None\n\n            # Create quick lookup for table items by id to preserve later insertion order if needed\n            tables_by_anchor = {\n                getattr(t, \"anchor_text_id\", None): t\n                for t in getattr(doc, \"tables\", [])\n                if getattr(t, \"anchor_text_id\", None) is not None\n            }\n\n            for txt_item in getattr(doc, \"texts\", []):\n                # If this text item is a placeholder for a table anchor, emit table first\n                anchor_id = getattr(txt_item, \"id\", None)\n                if anchor_id in tables_by_anchor:\n                    flush_buffer()\n                    tbl = tables_by_anchor[anchor_id]\n                    try:\n                        tbl_md = tbl.export_to_markdown(doc)  # pass doc for deprecation compliance\n                    except Exception:\n                        tbl_md = tbl.export_to_markdown() if hasattr(tbl, \"export_to_markdown\") else str(tbl)\n                    _add_chunk(tbl_md, \"table\", heading_path=current_heading_path[:], page_no=getattr(tbl, \"page_no\", None))\n\n                role = getattr(txt_item, \"role\", None)\n                if role == \"heading\":\n                    flush_buffer()\n                    level = getattr(txt_item, \"level\", 1)\n                    current_heading_path = current_heading_path[: max(0, level - 1)]\n                    
current_heading_path.append(txt_item.text.strip())\n                    continue  # skip heading as content\n\n                text_piece = txt_item.text if hasattr(txt_item, \"text\") else str(txt_item)\n                piece_tokens = _token_len(text_piece)\n                if piece_tokens > self.max_tokens:  # very long paragraph\n                    flush_buffer()\n                    _add_chunk(text_piece, \"paragraph\", heading_path=current_heading_path[:], page_no=getattr(txt_item, \"page_no\", None))\n                    continue\n\n                if buffer_tokens + piece_tokens > self.max_tokens:\n                    flush_buffer()\n\n                buffer.append(text_piece)\n                buffer_tokens += piece_tokens\n                if buffer_page is None:\n                    buffer_page = getattr(txt_item, \"page_no\", None)\n\n            flush_buffer()\n\n            # Emit any remaining tables that were not anchored\n            for tbl in getattr(doc, \"tables\", []):\n                if tbl in tables_by_anchor.values():\n                    continue  # already emitted\n                try:\n                    tbl_md = tbl.export_to_markdown(doc)\n                except Exception:\n                    tbl_md = tbl.export_to_markdown() if hasattr(tbl, \"export_to_markdown\") else str(tbl)\n                _add_chunk(tbl_md, \"table\", heading_path=current_heading_path[:], page_no=getattr(tbl, \"page_no\", None))\n        except Exception as e:\n            print(f\"⚠️  Docling tree walk failed: {e}. 
Falling back to markdown splitter.\")\n            return self.split_markdown(doc.export_to_markdown(), document_id=document_id, metadata=metadata)\n\n        # --------------------------------------------------------------\n        # Second-pass consolidation: merge small consecutive paragraph\n        # chunks that share heading & page into up-to-max_tokens blobs.\n        # --------------------------------------------------------------\n        consolidated: List[Dict[str, Any]] = []\n        buf_txt: List[str] = []\n        buf_meta: Dict[str, Any] | None = None\n\n        def flush_paragraph_buffer():\n            nonlocal buf_txt, buf_meta\n            if not buf_txt:\n                return\n            merged_text = \" \".join(buf_txt)\n            # Re-use meta from first piece but update chunk_id later\n            new_chunk = {\n                \"chunk_id\": buf_meta[\"chunk_id\"],\n                \"text\": merged_text,\n                \"metadata\": buf_meta[\"metadata\"],\n            }\n            consolidated.append(new_chunk)\n            buf_txt = []\n            buf_meta = None\n\n        for ch in chunks:\n            if ch[\"metadata\"].get(\"block_type\") != \"paragraph\":\n                flush_paragraph_buffer()\n                consolidated.append(ch)\n                continue\n\n            if not buf_txt:\n                buf_txt.append(ch[\"text\"])\n                buf_meta = ch\n                continue\n\n            same_page = ch[\"metadata\"].get(\"page\") == buf_meta[\"metadata\"].get(\"page\")\n            same_heading = ch[\"metadata\"].get(\"heading_path\") == buf_meta[\"metadata\"].get(\"heading_path\")\n\n            prospective_len = self._token_len(\" \".join(buf_txt + [ch[\"text\"]]))\n            if same_page and same_heading and prospective_len <= self.max_tokens:\n                buf_txt.append(ch[\"text\"])\n            else:\n                flush_paragraph_buffer()\n                buf_txt.append(ch[\"text\"])\n     
           buf_meta = ch\n\n        flush_paragraph_buffer()\n\n        return consolidated\n\n    # Public API expected by IndexingPipeline --------------------------------\n    def chunk(self, text: str, document_id: str, document_metadata: Dict[str, Any] | None = None) -> List[Dict[str, Any]]:\n        return self.split_markdown(text, document_id=document_id, metadata=document_metadata or {})    "
  },
  {
    "path": "rag_system/ingestion/document_converter.py",
    "content": "from typing import List, Tuple, Dict, Any\nfrom docling.document_converter import DocumentConverter as DoclingConverter, PdfFormatOption\nfrom docling.datamodel.pipeline_options import PdfPipelineOptions, OcrMacOptions\nfrom docling.datamodel.base_models import InputFormat\nimport fitz  # PyMuPDF for quick text inspection\nimport os\n\nclass DocumentConverter:\n    \"\"\"\n    A class to convert various document formats to structured Markdown using the docling library.\n    Supports PDF, DOCX, HTML, and other formats.\n    \"\"\"\n    \n    # Mapping of file extensions to InputFormat\n    SUPPORTED_FORMATS = {\n        '.pdf': InputFormat.PDF,\n        '.docx': InputFormat.DOCX,\n        '.html': InputFormat.HTML,\n        '.htm': InputFormat.HTML,\n        '.md': InputFormat.MD,\n        '.txt': 'TXT',  # Special handling for plain text files\n    }\n    \n    def __init__(self):\n        \"\"\"Initializes the docling document converter with forced OCR enabled for macOS.\"\"\"\n        try:\n            # --- Converter WITHOUT OCR (fast path) ---\n            pipeline_no_ocr = PdfPipelineOptions()\n            pipeline_no_ocr.do_ocr = False\n            format_no_ocr = {\n                InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_no_ocr)\n            }\n            self.converter_no_ocr = DoclingConverter(format_options=format_no_ocr)\n\n            # --- Converter WITH OCR (fallback) ---\n            pipeline_ocr = PdfPipelineOptions()\n            pipeline_ocr.do_ocr = True\n            ocr_options = OcrMacOptions(force_full_page_ocr=True)\n            pipeline_ocr.ocr_options = ocr_options\n            format_ocr = {\n                InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_ocr)\n            }\n            self.converter_ocr = DoclingConverter(format_options=format_ocr)\n            \n            self.converter_general = DoclingConverter()\n\n            print(\"docling DocumentConverter(s) initialized (OCR + 
no-OCR + general).\")\n        except Exception as e:\n            print(f\"Error initializing docling DocumentConverter(s): {e}\")\n            self.converter_no_ocr = None\n            self.converter_ocr = None\n            self.converter_general = None\n\n    def convert_to_markdown(self, file_path: str) -> List[Tuple[str, Dict[str, Any]]]:\n        \"\"\"\n        Converts a document to Markdown, preserving layout and tables.\n        Returns a list of (markdown, metadata) tuples; docling conversions append the\n        DoclingDocument as a third element. Returns an empty list on failure.\n        Supports PDF, DOCX, HTML, Markdown, and plain-text files.\n        \"\"\"\n        if not (self.converter_no_ocr and self.converter_ocr and self.converter_general):\n            print(\"docling converters not available. Skipping conversion.\")\n            return []\n        \n        file_ext = os.path.splitext(file_path)[1].lower()\n        if file_ext not in self.SUPPORTED_FORMATS:\n            print(f\"Unsupported file format: {file_ext}\")\n            return []\n        \n        input_format = self.SUPPORTED_FORMATS[file_ext]\n        \n        if input_format == InputFormat.PDF:\n            return self._convert_pdf_to_markdown(file_path)\n        elif input_format == 'TXT':\n            return self._convert_txt_to_markdown(file_path)\n        else:\n            return self._convert_general_to_markdown(file_path, input_format)\n    \n    def _convert_pdf_to_markdown(self, pdf_path: str) -> List[Tuple[str, Dict[str, Any]]]:\n        \"\"\"Convert PDF with OCR detection logic.\"\"\"\n        # Quick heuristic: if the PDF already contains a text layer, skip OCR for speed\n        def _pdf_has_text(path: str) -> bool:\n            try:\n                # Context manager closes the file handle even on early return\n                with fitz.open(path) as doc:\n                    for page in doc:\n                        if page.get_text(\"text\").strip():\n                            return True\n            except Exception:\n                pass\n            return False\n\n        use_ocr = not _pdf_has_text(pdf_path)\n        converter = self.converter_ocr if use_ocr else self.converter_no_ocr\n        ocr_msg = 
\"(OCR enabled)\" if use_ocr else \"(no OCR)\"\n\n        print(f\"Converting {pdf_path} to Markdown using docling {ocr_msg}...\")\n        return self._perform_conversion(pdf_path, converter, ocr_msg)\n    \n    def _convert_txt_to_markdown(self, file_path: str) -> List[Tuple[str, Dict[str, Any]]]:\n        \"\"\"Convert plain text files to markdown by reading content directly.\"\"\"\n        print(f\"Converting {file_path} (TXT) to Markdown...\")\n        try:\n            with open(file_path, 'r', encoding='utf-8') as f:\n                content = f.read()\n            \n            markdown_content = f\"```\\n{content}\\n```\"\n            metadata = {\"source\": file_path}\n            \n            print(f\"Successfully converted {file_path} (TXT) to Markdown.\")\n            return [(markdown_content, metadata)]\n        except Exception as e:\n            print(f\"Error processing TXT file {file_path}: {e}\")\n            return []\n    \n    def _convert_general_to_markdown(self, file_path: str, input_format: InputFormat) -> List[Tuple[str, Dict[str, Any]]]:\n        \"\"\"Convert non-PDF formats using general converter.\"\"\"\n        print(f\"Converting {file_path} ({input_format.name}) to Markdown using docling...\")\n        return self._perform_conversion(file_path, self.converter_general, f\"({input_format.name})\")\n    \n    def _perform_conversion(self, file_path: str, converter, format_msg: str) -> List[Tuple[str, Dict[str, Any]]]:\n        \"\"\"Perform the actual conversion using the specified converter.\"\"\"\n        pages_data = []\n        try:\n            result = converter.convert(file_path)\n            markdown_content = result.document.export_to_markdown()\n            \n            metadata = {\"source\": file_path}\n            # Return the *DoclingDocument* object as third tuple element so downstream\n            # chunkers that understand the element tree can use it.  
Legacy callers that\n            # expect only (markdown, metadata) can simply ignore the extra value.\n            pages_data.append((markdown_content, metadata, result.document))\n            print(f\"Successfully converted {file_path} with docling {format_msg}.\")\n            return pages_data\n        except Exception as e:\n            print(f\"Error processing {file_path} with docling: {e}\")\n            return []\n"
  },
  {
    "path": "rag_system/main.py",
    "content": "import os\nimport json\nimport sys\nimport argparse\nfrom dotenv import load_dotenv\n\n# Load environment variables from .env file\nload_dotenv()\n\n# The sys.path manipulation has been removed to prevent import conflicts.\n# This script should be run as a module from the project root, e.g.:\n# python -m rag_system.main api\n\nfrom rag_system.agent.loop import Agent\nfrom rag_system.utils.ollama_client import OllamaClient\n# Configuration is now defined in this file - no import needed\n\n# Advanced RAG System Configuration\n# ==================================\n# This file contains the MASTER configuration for all models used in the RAG system.\n# All components should reference these configurations to ensure consistency.\n\n# ============================================================================\n# 🎯 MASTER MODEL CONFIGURATION\n# ============================================================================\n# All model configurations are centralized here to prevent conflicts\n\n# LLM Backend Configuration\nLLM_BACKEND = os.getenv(\"LLM_BACKEND\", \"ollama\")\n\n# Ollama Models Configuration (for inference via Ollama)\nOLLAMA_CONFIG = {\n    \"host\": os.getenv(\"OLLAMA_HOST\", \"http://localhost:11434\"),\n    \"generation_model\": \"qwen3:8b\",  # Main text generation model\n    \"enrichment_model\": \"qwen3:0.6b\",  # Lightweight model for routing/enrichment\n}\n\nWATSONX_CONFIG = {\n    \"api_key\": os.getenv(\"WATSONX_API_KEY\", \"\"),\n    \"project_id\": os.getenv(\"WATSONX_PROJECT_ID\", \"\"),\n    \"url\": os.getenv(\"WATSONX_URL\", \"https://us-south.ml.cloud.ibm.com\"),\n    \"generation_model\": os.getenv(\"WATSONX_GENERATION_MODEL\", \"ibm/granite-13b-chat-v2\"),\n    \"enrichment_model\": os.getenv(\"WATSONX_ENRICHMENT_MODEL\", \"ibm/granite-8b-japanese\"),  # Lightweight model\n}\n\n# External Model Configuration (HuggingFace models used directly)\nEXTERNAL_MODELS = {\n    \"embedding_model\": \"Qwen/Qwen3-Embedding-0.6B\",  # 
HuggingFace embedding model (1024 dims - fresh start)\n    \"reranker_model\": \"answerdotai/answerai-colbert-small-v1\",  # ColBERT reranker\n    \"vision_model\": \"Qwen/Qwen-VL-Chat\",  # Vision model for multimodal\n    \"fallback_reranker\": \"BAAI/bge-reranker-base\",  # Backup reranker\n}\n\n# ============================================================================\n# 🔧 PIPELINE CONFIGURATIONS\n# ============================================================================\n\nPIPELINE_CONFIGS = {\n    \"default\": {\n        \"description\": \"Production-ready pipeline with hybrid search, AI reranking, and verification\",\n        \"storage\": {\n            \"lancedb_uri\": \"./lancedb\",\n            \"text_table_name\": \"text_pages_v3\", \n            \"image_table_name\": \"image_pages_v3\",\n            \"bm25_path\": \"./index_store/bm25\",\n            \"graph_path\": \"./index_store/graph/knowledge_graph.gml\"\n        },\n        \"retrieval\": {\n            \"retriever\": \"multivector\",\n            \"search_type\": \"hybrid\",\n            \"late_chunking\": {\n                \"enabled\": True,\n                \"table_suffix\": \"_lc_v3\"\n        },\n            \"dense\": { \n                \"enabled\": True,\n                \"weight\": 0.7\n            },\n            \"bm25\": { \n                \"enabled\": True,\n                \"index_name\": \"rag_bm25_index\"\n            },\n            \"graph\": { \n                \"enabled\": False,\n                \"graph_path\": \"./index_store/graph/knowledge_graph.gml\"\n            }\n        },\n        # 🎯 EMBEDDING MODEL: Uses HuggingFace Qwen model directly\n        \"embedding_model_name\": EXTERNAL_MODELS[\"embedding_model\"],\n        # 🎯 VISION MODEL: For multimodal capabilities  \n        \"vision_model_name\": EXTERNAL_MODELS[\"vision_model\"],\n        # 🎯 RERANKER: AI-powered reranking with ColBERT\n        \"reranker\": {\n            \"enabled\": True, \n            
\"type\": \"ai\",\n            \"strategy\": \"rerankers-lib\",\n            \"model_name\": EXTERNAL_MODELS[\"reranker_model\"],\n            \"top_k\": 10\n        },\n        \"query_decomposition\": {\n            \"enabled\": True,\n            \"max_sub_queries\": 3,\n            \"compose_from_sub_answers\": True\n        },\n        \"verification\": {\"enabled\": True},\n        \"retrieval_k\": 20,\n        \"context_window_size\": 0,\n        \"semantic_cache_threshold\": 0.98,\n        \"cache_scope\": \"global\",\n        # 🔧 Contextual enrichment configuration\n        \"contextual_enricher\": {\n            \"enabled\": True,\n            \"window_size\": 1\n        },\n        # 🔧 Indexing configuration\n        \"indexing\": {\n            \"embedding_batch_size\": 50,\n            \"enrichment_batch_size\": 10,\n            \"enable_progress_tracking\": True\n        }\n    },\n    \"fast\": {\n        \"description\": \"Speed-optimized pipeline with minimal overhead\",\n        \"storage\": {\n            \"lancedb_uri\": \"./lancedb\",\n            \"text_table_name\": \"text_pages_v3\",\n            \"image_table_name\": \"image_pages_v3\", \n            \"bm25_path\": \"./index_store/bm25\"\n        },\n        \"retrieval\": {\n            \"retriever\": \"multivector\",\n            \"search_type\": \"vector_only\",\n            \"late_chunking\": {\"enabled\": False},\n            \"dense\": {\"enabled\": True}\n        },\n        \"embedding_model_name\": EXTERNAL_MODELS[\"embedding_model\"],\n        \"reranker\": {\"enabled\": False},\n        \"query_decomposition\": {\"enabled\": False},\n        \"verification\": {\"enabled\": False},\n        \"retrieval_k\": 10,\n        \"context_window_size\": 0,\n        # 🔧 Contextual enrichment (disabled for speed)\n        \"contextual_enricher\": {\n            \"enabled\": False,\n            \"window_size\": 1\n        },\n        # 🔧 Indexing configuration\n        \"indexing\": {\n       
     \"embedding_batch_size\": 100,\n            \"enrichment_batch_size\": 50,\n            \"enable_progress_tracking\": False\n        }\n    },\n    \"bm25\": {\n        \"enabled\": True,\n        \"index_name\": \"rag_bm25_index\"\n    },\n    \"graph_rag\": {\n        \"enabled\": False, # Keep disabled for now unless specified\n    }\n}\n\n# ============================================================================\n# 🏭 FACTORY FUNCTIONS\n# ============================================================================\n\ndef get_agent(mode: str = \"default\") -> Agent:\n    \"\"\"\n    Factory function to get an instance of the RAG agent based on the specified mode.\n    \n    Args:\n        mode: Configuration mode (\"default\", \"fast\")\n        \n    Returns:\n        Configured Agent instance\n    \"\"\"\n    load_dotenv()\n    \n    # Initialize the appropriate LLM client based on backend configuration\n    if LLM_BACKEND.lower() == \"watsonx\":\n        from rag_system.utils.watsonx_client import WatsonXClient\n        \n        if not WATSONX_CONFIG[\"api_key\"] or not WATSONX_CONFIG[\"project_id\"]:\n            raise ValueError(\n                \"Watson X configuration incomplete. 
Please set WATSONX_API_KEY and WATSONX_PROJECT_ID \"\n                \"environment variables.\"\n            )\n        \n        llm_client = WatsonXClient(\n            api_key=WATSONX_CONFIG[\"api_key\"],\n            project_id=WATSONX_CONFIG[\"project_id\"],\n            url=WATSONX_CONFIG[\"url\"]\n        )\n        llm_config = WATSONX_CONFIG\n        print(f\"🔧 Using Watson X backend with granite models\")\n    else:\n        llm_client = OllamaClient(host=OLLAMA_CONFIG[\"host\"])\n        llm_config = OLLAMA_CONFIG\n        print(f\"🔧 Using Ollama backend\")\n    \n    # Get the configuration for the specified mode\n    config = PIPELINE_CONFIGS.get(mode, PIPELINE_CONFIGS['default'])\n    \n    agent = Agent(\n        pipeline_configs=config, \n        llm_client=llm_client, \n        ollama_config=llm_config\n    )\n    return agent\n\ndef validate_model_config():\n    \"\"\"\n    Validates the model configuration for consistency and availability.\n    \n    Raises:\n        ValueError: If configuration conflicts are detected\n    \"\"\"\n    print(\"🔍 Validating model configuration...\")\n    \n    # Check for embedding model consistency\n    default_embedding = PIPELINE_CONFIGS[\"default\"][\"embedding_model_name\"]\n    external_embedding = EXTERNAL_MODELS[\"embedding_model\"]\n    \n    if default_embedding != external_embedding:\n        raise ValueError(f\"Embedding model mismatch: {default_embedding} != {external_embedding}\")\n    \n    # Check reranker configuration\n    default_reranker = PIPELINE_CONFIGS[\"default\"][\"reranker\"][\"model_name\"]\n    external_reranker = EXTERNAL_MODELS[\"reranker_model\"]\n    \n    if default_reranker != external_reranker:\n        raise ValueError(f\"Reranker model mismatch: {default_reranker} != {external_reranker}\")\n    \n    print(\"✅ Model configuration validation passed!\")\n    \n    return True\n\n# ============================================================================\n# 🚀 UTILITY FUNCTIONS 
 \n# ============================================================================\n\ndef run_indexing(docs_path: str, config_mode: str = \"default\"):\n    \"\"\"Runs the indexing pipeline for the specified documents.\"\"\"\n    print(f\"📚 Starting indexing for documents in: {docs_path}\")\n    validate_model_config()\n    \n    # Local import to avoid circular dependencies\n    from rag_system.pipelines.indexing_pipeline import IndexingPipeline\n    \n    # Build the indexing pipeline for the selected configuration.\n    # IndexingPipeline also needs an LLM client and its model configuration.\n    ollama_client = OllamaClient(host=OLLAMA_CONFIG[\"host\"])\n    indexing_pipeline = IndexingPipeline(PIPELINE_CONFIGS[config_mode], ollama_client, OLLAMA_CONFIG)\n    \n    # Find all PDF files in the directory\n    pdf_files = [os.path.join(docs_path, f) for f in os.listdir(docs_path) if f.endswith(\".pdf\")]\n    \n    if not pdf_files:\n        print(\"No PDF files found to index.\")\n        return\n\n    # Process all documents through the pipeline\n    indexing_pipeline.run(pdf_files)\n    print(\"✅ Indexing complete.\")\n\ndef run_chat(query: str):\n    \"\"\"\n    Runs the agentic RAG pipeline for a given query.\n    Returns the result as a JSON string.\n    \"\"\"\n    try:\n        validate_model_config()\n        ollama_client = OllamaClient(OLLAMA_CONFIG[\"host\"])\n    except ConnectionError as e:\n        print(e)\n        return json.dumps({\"error\": str(e)}, indent=2)\n    except ValueError as e:\n        print(f\"Configuration Error: {e}\")\n        return json.dumps({\"error\": f\"Configuration Error: {e}\"}, indent=2)\n\n    agent = Agent(PIPELINE_CONFIGS['default'], ollama_client, OLLAMA_CONFIG)\n    result = agent.run(query)\n    return json.dumps(result, indent=2, ensure_ascii=False)\n\ndef show_graph():\n    \"\"\"\n    Loads and displays the knowledge graph.\n    \"\"\"\n    import networkx as nx\n    import matplotlib.pyplot as plt\n\n    # The graph path lives under the default pipeline's storage settings\n    graph_path = PIPELINE_CONFIGS[\"default\"][\"storage\"][\"graph_path\"]\n    if not os.path.exists(graph_path):\n        print(\"Knowledge graph not found. 
Please run the 'index' command first.\")\n        return\n\n    G = nx.read_gml(graph_path)\n    print(\"--- Knowledge Graph ---\")\n    print(\"Nodes:\", G.nodes(data=True))\n    print(\"Edges:\", G.edges(data=True))\n    print(\"---------------------\")\n\n    # Optional: Visualize the graph\n    try:\n        pos = nx.spring_layout(G)\n        nx.draw(G, pos, with_labels=True, node_size=2000, node_color=\"skyblue\", font_size=10, font_weight=\"bold\")\n        edge_labels = nx.get_edge_attributes(G, 'label')\n        nx.draw_networkx_edge_labels(G, pos, edge_labels=edge_labels)\n        plt.title(\"Knowledge Graph Visualization\")\n        plt.show()\n    except Exception as e:\n        print(f\"\\nCould not visualize the graph. Matplotlib might not be installed or configured for your environment.\")\n        print(f\"Error: {e}\")\n\ndef run_api_server():\n    \"\"\"Starts the advanced RAG API server.\"\"\"\n    from rag_system.api_server import start_server\n    start_server()\n\ndef main():\n    if len(sys.argv) < 2:\n        print(\"Usage: python main.py [index|chat|show_graph|api] [query]\")\n        return\n\n    command = sys.argv[1]\n    if command == \"index\":\n        # Allow passing file paths from the command line\n        files = sys.argv[2:] if len(sys.argv) > 2 else None\n        run_indexing(files)\n    elif command == \"chat\":\n        if len(sys.argv) < 3:\n            print(\"Usage: python main.py chat <query>\")\n            return\n        query = \" \".join(sys.argv[2:])\n        # 🆕 Print the result for command-line usage\n        print(run_chat(query))\n    elif command == \"show_graph\":\n        show_graph()\n    elif command == \"api\":\n        run_api_server()\n    else:\n        print(f\"Unknown command: {command}\")\n\nif __name__ == \"__main__\":\n    # This allows running the script from the command line to index documents.\n    parser = argparse.ArgumentParser(description=\"Main entry point for the RAG system.\")\n    
parser.add_argument(\n        '--index',\n        type=str,\n        help='Path to the directory containing documents to index.'\n    )\n    parser.add_argument(\n        '--config',\n        type=str,\n        default='default',\n        help='The configuration profile to use (e.g., \"default\", \"fast\").'\n    )\n\n    args = parser.parse_args()\n\n    # Load environment variables\n    load_dotenv()\n\n    if args.index:\n        run_indexing(args.index, args.config)\n    else:\n        # This is where you might start a server or interactive session\n        print(\"No action specified. Use --index to process documents.\")\n        # Example of how to get an agent instance\n        # agent = get_agent(args.config)\n        # print(f\"Agent loaded with '{args.config}' config.\")\n"
  },
  {
    "path": "rag_system/pipelines/__init__.py",
    "content": ""
  },
  {
    "path": "rag_system/pipelines/indexing_pipeline.py",
    "content": "from typing import List, Dict, Any\nimport os\nimport networkx as nx\nfrom rag_system.ingestion.document_converter import DocumentConverter\nfrom rag_system.ingestion.chunking import MarkdownRecursiveChunker\nfrom rag_system.indexing.representations import EmbeddingGenerator, select_embedder\nfrom rag_system.indexing.embedders import LanceDBManager, VectorIndexer\nfrom rag_system.indexing.graph_extractor import GraphExtractor\nfrom rag_system.utils.ollama_client import OllamaClient\nfrom rag_system.indexing.contextualizer import ContextualEnricher\nfrom rag_system.indexing.overview_builder import OverviewBuilder\n\nclass IndexingPipeline:\n    def __init__(self, config: Dict[str, Any], ollama_client: OllamaClient, ollama_config: Dict[str, str]):\n        self.config = config\n        self.llm_client = ollama_client\n        self.ollama_config = ollama_config\n        self.document_converter = DocumentConverter()\n        # Chunker selection: docling (token-based) or legacy (character-based)\n        chunker_mode = config.get(\"chunker_mode\", \"docling\")\n        \n        # 🔧 Get chunking configuration from frontend parameters\n        chunking_config = config.get(\"chunking\", {})\n        chunk_size = chunking_config.get(\"chunk_size\", config.get(\"chunk_size\", 1500))\n        chunk_overlap = chunking_config.get(\"chunk_overlap\", config.get(\"chunk_overlap\", 200))\n        \n        print(f\"🔧 CHUNKING CONFIG: Size: {chunk_size}, Overlap: {chunk_overlap}, Mode: {chunker_mode}\")\n        \n        if chunker_mode == \"docling\":\n            try:\n                from rag_system.ingestion.docling_chunker import DoclingChunker\n                self.chunker = DoclingChunker(\n                    max_tokens=config.get(\"max_tokens\", chunk_size),\n                    overlap=config.get(\"overlap_sentences\", 1),\n                    tokenizer_model=config.get(\"embedding_model_name\", \"qwen3-embedding-0.6b\"),\n                )\n              
  print(\"🪄 Using DoclingChunker for high-recall sentence packing.\")\n            except Exception as e:\n                print(f\"⚠️  Failed to initialise DoclingChunker: {e}. Falling back to legacy chunker.\")\n                self.chunker = MarkdownRecursiveChunker(\n                    max_chunk_size=chunk_size,\n                    min_chunk_size=min(chunk_overlap, chunk_size // 4),  # Sensible minimum\n                    tokenizer_model=config.get(\"embedding_model_name\", \"Qwen/Qwen3-Embedding-0.6B\")\n                )\n        else:\n            self.chunker = MarkdownRecursiveChunker(\n                max_chunk_size=chunk_size,\n                min_chunk_size=min(chunk_overlap, chunk_size // 4),  # Sensible minimum\n                tokenizer_model=config.get(\"embedding_model_name\", \"Qwen/Qwen3-Embedding-0.6B\")\n            )\n\n        retriever_configs = self.config.get(\"retrievers\") or self.config.get(\"retrieval\", {})\n        storage_config = self.config[\"storage\"]\n        \n        # Get batch processing configuration\n        indexing_config = self.config.get(\"indexing\", {})\n        self.embedding_batch_size = indexing_config.get(\"embedding_batch_size\", 50)\n        self.enrichment_batch_size = indexing_config.get(\"enrichment_batch_size\", 10)\n        self.enable_progress_tracking = indexing_config.get(\"enable_progress_tracking\", True)\n\n        # Treat dense retrieval as enabled by default unless explicitly disabled\n        dense_cfg = retriever_configs.setdefault(\"dense\", {})\n        dense_cfg.setdefault(\"enabled\", True)\n\n        if dense_cfg.get(\"enabled\"):\n            # Accept modern keys: db_path or lancedb_path; fall back to legacy lancedb_uri\n            db_path = (\n                storage_config.get(\"db_path\")\n                or storage_config.get(\"lancedb_path\")\n                or storage_config.get(\"lancedb_uri\")\n            )\n            if not db_path:\n                raise KeyError(\n       
             \"Storage config must include 'db_path', 'lancedb_path', or 'lancedb_uri' for LanceDB.\"\n                )\n            self.lancedb_manager = LanceDBManager(db_path=db_path)\n            self.vector_indexer = VectorIndexer(self.lancedb_manager)\n            embedding_model = select_embedder(\n                self.config.get(\"embedding_model_name\", \"BAAI/bge-small-en-v1.5\"),\n                self.ollama_config.get(\"host\") if isinstance(self.ollama_config, dict) else None,\n            )\n            self.embedding_generator = EmbeddingGenerator(\n                embedding_model=embedding_model, \n                batch_size=self.embedding_batch_size\n            )\n\n        if retriever_configs.get(\"graph\", {}).get(\"enabled\"):\n            self.graph_extractor = GraphExtractor(\n                llm_client=self.llm_client,\n                llm_model=self.ollama_config[\"generation_model\"]\n            )\n\n        if self.config.get(\"contextual_enricher\", {}).get(\"enabled\"):\n            # 🔧 Use frontend enrich_model parameter if provided\n            enrichment_model = (\n                self.config.get(\"enrich_model\") or  # Frontend parameter\n                self.config.get(\"enrichment_model_name\") or  # Alternative config key\n                self.ollama_config.get(\"enrichment_model\") or  # Default from ollama config\n                self.ollama_config[\"generation_model\"]  # Final fallback\n            )\n            print(f\"🔧 ENRICHMENT MODEL: Using '{enrichment_model}' for contextual enrichment\")\n            \n            self.contextual_enricher = ContextualEnricher(\n                llm_client=self.llm_client,\n                llm_model=enrichment_model,\n                batch_size=self.enrichment_batch_size\n            )\n\n        # Overview builder always enabled for triage routing\n        ov_path = self.config.get(\"overview_path\")\n        self.overview_builder = OverviewBuilder(\n            
llm_client=self.llm_client,\n            model=self.config.get(\"overview_model_name\", self.ollama_config.get(\"enrichment_model\", \"qwen3:0.6b\")),\n            first_n_chunks=self.config.get(\"overview_first_n_chunks\", 5),\n            out_path=ov_path if ov_path else None,\n        )\n\n        # ------------------------------------------------------------------\n        # Late-Chunk encoder initialisation (optional)\n        # ------------------------------------------------------------------\n        self.latechunk_enabled = retriever_configs.get(\"latechunk\", {}).get(\"enabled\", False)\n        if self.latechunk_enabled:\n            try:\n                from rag_system.indexing.latechunk import LateChunkEncoder\n                self.latechunk_cfg = retriever_configs[\"latechunk\"]\n                self.latechunk_encoder = LateChunkEncoder(model_name=self.config.get(\"embedding_model_name\", \"qwen3-embedding-0.6b\"))\n            except Exception as e:\n                print(f\"⚠️  Failed to initialise LateChunkEncoder: {e}. Disabling latechunk retrieval.\")\n                self.latechunk_enabled = False\n\n    def run(self, file_paths: List[str] | None = None, *, documents: List[str] | None = None):\n        \"\"\"\n        Processes and indexes documents based on the pipeline's configuration.\n        Accepts legacy keyword *documents* as an alias for *file_paths* so that\n        older callers (backend/index builder) keep working.\n        \"\"\"\n        # Back-compat shim ---------------------------------------------------\n        if file_paths is None and documents is not None:\n            file_paths = documents\n        if file_paths is None:\n            raise TypeError(\"IndexingPipeline.run() expects 'file_paths' (or alias 'documents') argument\")\n\n        print(f\"--- Starting indexing process for {len(file_paths)} files. 
---\")\n        \n        # Import progress tracking utilities\n        from rag_system.utils.batch_processor import timer, ProgressTracker, estimate_memory_usage\n        \n        with timer(\"Complete Indexing Pipeline\"):\n            # Step 1: Document Processing and Chunking\n            all_chunks = []\n            doc_chunks_map = {}\n            with timer(\"Document Processing & Chunking\"):\n                file_tracker = ProgressTracker(len(file_paths), \"Document Processing\")\n                \n                for file_path in file_paths:\n                    try:\n                        document_id = os.path.basename(file_path)\n                        print(f\"Processing: {document_id}\")\n                        \n                        pages_data = self.document_converter.convert_to_markdown(file_path)\n                        file_chunks = []\n                        \n                        for tpl in pages_data:\n                            if len(tpl) == 3:\n                                markdown_text, metadata, doc_obj = tpl\n                                if hasattr(self.chunker, \"chunk_document\"):\n                                    chunks = self.chunker.chunk_document(doc_obj, document_id=document_id, metadata=metadata)\n                                else:\n                                    chunks = self.chunker.chunk(markdown_text, document_id, metadata)\n                            else:\n                                markdown_text, metadata = tpl\n                                chunks = self.chunker.chunk(markdown_text, document_id, metadata)\n                            file_chunks.extend(chunks)\n                        \n                        # Add a sequential chunk_index to each chunk within the document\n                        for i, chunk in enumerate(file_chunks):\n                            if 'metadata' not in chunk:\n                                chunk['metadata'] = {}\n                            
chunk['metadata']['chunk_index'] = i\n                        \n                        # Build and persist document overview (non-blocking errors)\n                        try:\n                            self.overview_builder.build_and_store(document_id, file_chunks)\n                        except Exception as e:\n                            print(f\"  ⚠️  Failed to create overview for {document_id}: {e}\")\n                        \n                        all_chunks.extend(file_chunks)\n                        doc_chunks_map[document_id] = file_chunks  # save for late-chunk step\n                        print(f\"  Generated {len(file_chunks)} chunks from {document_id}\")\n                        file_tracker.update(1)\n                        \n                    except Exception as e:\n                        print(f\"  ❌ Error processing {file_path}: {e}\")\n                        file_tracker.update(1, errors=1)\n                        continue\n                \n                file_tracker.finish()\n\n            if not all_chunks:\n                print(\"No text chunks were generated. 
Skipping indexing.\")\n                return\n\n            print(f\"\\n✅ Generated {len(all_chunks)} text chunks total.\")\n            memory_mb = estimate_memory_usage(all_chunks)\n            print(f\"📊 Estimated memory usage: {memory_mb:.1f}MB\")\n\n            retriever_configs = self.config.get(\"retrievers\") or self.config.get(\"retrieval\", {})\n\n            # Step 3: Optional Contextual Enrichment (before indexing for consistency)\n            enricher_config = self.config.get(\"contextual_enricher\", {})\n            enricher_enabled = enricher_config.get(\"enabled\", False)\n            \n            print(f\"\\n🔍 CONTEXTUAL ENRICHMENT DEBUG:\")\n            print(f\"   Config present: {bool(enricher_config)}\")\n            print(f\"   Enabled: {enricher_enabled}\")\n            print(f\"   Has enricher object: {hasattr(self, 'contextual_enricher')}\")\n            \n            if hasattr(self, 'contextual_enricher') and enricher_enabled:\n                with timer(\"Contextual Enrichment\"):\n                    window_size = enricher_config.get(\"window_size\", 1)\n                    print(f\"\\n🚀 CONTEXTUAL ENRICHMENT ACTIVE!\")\n                    print(f\"   Window size: {window_size}\")\n                    print(f\"   Model: {self.contextual_enricher.llm_model}\")\n                    print(f\"   Batch size: {self.contextual_enricher.batch_size}\")\n                    print(f\"   Processing {len(all_chunks)} chunks...\")\n                    \n                    # Show before/after example\n                    if all_chunks:\n                        print(f\"   Example BEFORE: '{all_chunks[0]['text'][:100]}...'\")\n                    \n                    # This modifies the 'text' field in each chunk dictionary\n                    all_chunks = self.contextual_enricher.enrich_chunks(all_chunks, window_size=window_size)\n                    \n                    if all_chunks:\n                        print(f\"   Example AFTER: 
'{all_chunks[0]['text'][:100]}...'\")\n                    \n                    print(f\"✅ Enriched {len(all_chunks)} chunks with context for indexing.\")\n            else:\n                print(f\"⚠️  CONTEXTUAL ENRICHMENT SKIPPED:\")\n                if not hasattr(self, 'contextual_enricher'):\n                    print(f\"   Reason: No enricher object (config enabled={enricher_enabled})\")\n                elif not enricher_enabled:\n                    print(f\"   Reason: Disabled in config\")\n                print(f\"   Chunks will be indexed without contextual enrichment.\")\n\n            # Step 4: Embed the (optionally enriched) chunks and index them in LanceDB,\n            # then build the FTS index from the same text so it stays consistent with the vector index\n            if hasattr(self, 'vector_indexer') and hasattr(self, 'embedding_generator'):\n                with timer(\"Vector Embedding & Indexing\"):\n                    table_name = self.config[\"storage\"].get(\"text_table_name\") or retriever_configs.get(\"dense\", {}).get(\"lancedb_table_name\", \"default_text_table\")\n                    print(f\"\\n--- Generating embeddings with {self.config.get('embedding_model_name')} ---\")\n                    \n                    embeddings = self.embedding_generator.generate(all_chunks)\n                    \n                    print(f\"\\n--- Indexing {len(embeddings)} vectors into LanceDB table: {table_name} ---\")\n                    self.vector_indexer.index(table_name, all_chunks, embeddings)\n                    print(\"✅ Vector embeddings indexed successfully\")\n\n                    # Create FTS index on the 'text' field after adding data\n                    print(f\"\\n--- Ensuring Full-Text Search (FTS) index on table '{table_name}' ---\")\n                    try:\n                        tbl = self.lancedb_manager.get_table(table_name)\n                        # LanceDB's default index name is \"text_idx\" while older\n                        # revisions of this pipeline used our own name \"fts_text\".\n                 
       # Guard against both so we don't attempt to create a     \n                        # duplicate index and trigger a LanceError.\n                        existing_indices = [idx.name for idx in tbl.list_indices()]\n                        if not any(name in existing_indices for name in (\"text_idx\", \"fts_text\")):\n                            # Use LanceDB default index naming (\"text_idx\")\n                            tbl.create_fts_index(\n                                \"text\",\n                                use_tantivy=False,\n                                replace=False,\n                            )\n                            print(\"✅ FTS index created successfully (using Lance native FTS).\")\n                        else:\n                            print(\"ℹ️  FTS index already exists – skipped creation.\")\n                    except Exception as e:\n                        print(f\"❌ Failed to create/verify FTS index: {e}\")\n\n                    # ---------------------------------------------------\n                    # Late-Chunk Embedding + Indexing (optional)\n                    # ---------------------------------------------------\n                    if self.latechunk_enabled:\n                        with timer(\"Late-Chunk Embedding & Indexing\"):\n                            lc_table_name = self.latechunk_cfg.get(\"lancedb_table_name\", f\"{table_name}_lc\")\n                            print(f\"\\n--- Generating late-chunk embeddings (table={lc_table_name}) ---\")\n\n                            total_lc_vecs = 0\n                            for doc_id, doc_chunks in doc_chunks_map.items():\n                                # Build full text and span list\n                                full_text_parts = []\n                                spans = []\n                                current_pos = 0\n                                for ch in doc_chunks:\n                                    ch_text = ch[\"text\"]\n             
                       full_text_parts.append(ch_text)\n                                    start = current_pos\n                                    end = start + len(ch_text)\n                                    spans.append((start, end))\n                                    current_pos = end + 1  # +1 for newline to join later\n                                full_doc = \"\\n\".join(full_text_parts)\n\n                                try:\n                                    lc_vecs = self.latechunk_encoder.encode(full_doc, spans)\n                                except Exception as e:\n                                    print(f\"⚠️  LateChunk encode failed for {doc_id}: {e}\")\n                                    continue\n\n                                if len(doc_chunks) == 0 or len(lc_vecs) == 0:\n                                    # Nothing to index for this document\n                                    continue\n                                if len(lc_vecs) != len(doc_chunks):\n                                    print(f\"⚠️  Mismatch LC vecs ({len(lc_vecs)}) vs chunks ({len(doc_chunks)}) for {doc_id}. 
Skipping.\")\n                                    continue\n\n                                self.vector_indexer.index(lc_table_name, doc_chunks, lc_vecs)\n                                total_lc_vecs += len(lc_vecs)\n\n                            print(f\"✅ Late-chunk vectors indexed: {total_lc_vecs}\")\n                \n            # Step 6: Knowledge Graph Extraction (Optional)\n            if hasattr(self, 'graph_extractor'):\n                with timer(\"Knowledge Graph Extraction\"):\n                    graph_path = retriever_configs.get(\"graph\", {}).get(\"graph_path\", \"./index_store/graph/default_graph.gml\")\n                    print(f\"\\n--- Building and saving knowledge graph to: {graph_path} ---\")\n                    \n                    graph_data = self.graph_extractor.extract(all_chunks)\n                    G = nx.DiGraph()\n                    for entity in graph_data['entities']:\n                        G.add_node(entity['id'], type=entity.get('type', 'Unknown'), properties=entity.get('properties', {}))\n                    for rel in graph_data['relationships']:\n                        G.add_edge(rel['source'], rel['target'], label=rel['label'])\n                    \n                    os.makedirs(os.path.dirname(graph_path), exist_ok=True)\n                    nx.write_gml(G, graph_path)\n                    print(f\"✅ Knowledge graph saved successfully.\")\n                    \n        print(\"\\n--- ✅ Indexing Complete ---\")\n        self._print_final_statistics(len(file_paths), len(all_chunks))\n    \n    def _print_final_statistics(self, num_files: int, num_chunks: int):\n        \"\"\"Print final indexing statistics\"\"\"\n        print(f\"\\n📈 Final Statistics:\")\n        print(f\"  Files processed: {num_files}\")\n        print(f\"  Chunks generated: {num_chunks}\")\n        print(f\"  Average chunks per file: {num_chunks/num_files:.1f}\")\n        \n        # Component status\n        components = []\n        if 
hasattr(self, 'contextual_enricher'):\n            components.append(\"✅ Contextual Enrichment\")\n        if hasattr(self, 'vector_indexer'):\n            components.append(\"✅ Vector & FTS Index\")\n        if hasattr(self, 'graph_extractor'):\n            components.append(\"✅ Knowledge Graph\")\n            \n        print(f\"  Components: {', '.join(components)}\")\n        print(f\"  Batch sizes: Embeddings={self.embedding_batch_size}, Enrichment={self.enrichment_batch_size}\")\n"
  },
  {
    "path": "rag_system/pipelines/retrieval_pipeline.py",
    "content": "import pymupdf\nfrom typing import List, Dict, Any, Tuple, Optional\nfrom PIL import Image\nimport concurrent.futures\nimport time\nimport json\nimport lancedb\nimport logging\nimport math\nimport numpy as np\nfrom threading import Lock\n\nfrom rag_system.utils.ollama_client import OllamaClient\nfrom rag_system.retrieval.retrievers import MultiVectorRetriever, GraphRetriever\nfrom rag_system.indexing.multimodal import LocalVisionModel\nfrom rag_system.indexing.representations import select_embedder\nfrom rag_system.indexing.embedders import LanceDBManager\nfrom rag_system.rerankers.reranker import QwenReranker\nfrom rag_system.rerankers.sentence_pruner import SentencePruner\n# from rag_system.indexing.chunk_store import ChunkStore\n\nimport os\nfrom PIL import Image\n\n# ---------------------------------------------------------------------------\n# Thread-safety helpers\n# ---------------------------------------------------------------------------\n\n# 1. ColBERT (via `rerankers` lib) is not thread-safe.  We protect the actual\n#    `.rank()` call with `_rerank_lock`.\n_rerank_lock: Lock = Lock()\n\n# 2. Loading a large cross-encoder or ColBERT model can easily take >1 GB of\n#    RAM.  When multiple sub-queries are processed in parallel they may try to\n#    instantiate the reranker simultaneously, which results in PyTorch meta\n#    tensor errors.  
We therefore guard the *initialisation* with its own\n#    lock so only one thread carries out the heavy `from_pretrained()` call.\n_ai_reranker_init_lock: Lock = Lock()\n\n# Lock to serialise first-time Provence model load\n_sentence_pruner_lock: Lock = Lock()\n\nclass RetrievalPipeline:\n    \"\"\"\n    Orchestrates the state-of-the-art multimodal RAG pipeline.\n    \"\"\"\n    def __init__(self, config: Dict[str, Any], ollama_client: OllamaClient, ollama_config: Dict[str, Any]):\n        self.config = config\n        self.ollama_config = ollama_config\n        self.ollama_client = ollama_client\n        \n        # Support both legacy \"retrievers\" key and newer \"retrieval\" key\n        self.retriever_configs = self.config.get(\"retrievers\") or self.config.get(\"retrieval\", {})\n        self.storage_config = self.config[\"storage\"]\n        \n        # Defer initialization to just-in-time methods\n        self.db_manager = None\n        self.text_embedder = None\n        self.dense_retriever = None\n        self.bm25_retriever = None\n        # Use a private attribute to avoid clashing with the public property\n        self._graph_retriever = None\n        self.reranker = None\n        self.ai_reranker = None\n\n    def _get_db_manager(self):\n        if self.db_manager is None:\n            # Accept either \"db_path\" (preferred) or legacy \"lancedb_uri\"\n            db_path = self.storage_config.get(\"db_path\") or self.storage_config.get(\"lancedb_uri\")\n            if not db_path:\n                raise ValueError(\"Storage config must contain 'db_path' or 'lancedb_uri'.\")\n            self.db_manager = LanceDBManager(db_path=db_path)\n        return self.db_manager\n\n    def _get_text_embedder(self):\n        if self.text_embedder is None:\n            from rag_system.indexing.representations import select_embedder\n            self.text_embedder = select_embedder(\n                self.config.get(\"embedding_model_name\", 
\"BAAI/bge-small-en-v1.5\"),\n                self.ollama_config.get(\"host\") if isinstance(self.ollama_config, dict) else None,\n            )\n        return self.text_embedder\n\n    def _get_dense_retriever(self):\n        \"\"\"Ensure a dense MultiVectorRetriever is always available unless explicitly disabled.\"\"\"\n        if self.dense_retriever is None:\n            # If the config explicitly sets dense.enabled to False, respect it\n            if self.retriever_configs.get(\"dense\", {}).get(\"enabled\", True) is False:\n                return None\n\n            try:\n                db_manager = self._get_db_manager()\n                text_embedder = self._get_text_embedder()\n                fusion_cfg = self.config.get(\"fusion\", {})\n                self.dense_retriever = MultiVectorRetriever(\n                    db_manager,\n                    text_embedder,\n                    vision_model=None,\n                    fusion_config=fusion_cfg,\n                )\n            except Exception as e:\n                print(f\"❌ Failed to initialise dense retriever: {e}\")\n                self.dense_retriever = None\n        return self.dense_retriever\n\n    def _get_bm25_retriever(self):\n        if self.bm25_retriever is None and self.retriever_configs.get(\"bm25\", {}).get(\"enabled\"):\n            try:\n                print(f\"🔧 Lazily initializing BM25 retriever...\")\n                self.bm25_retriever = BM25Retriever(\n                    index_path=self.storage_config[\"bm25_path\"],\n                    index_name=self.retriever_configs[\"bm25\"][\"index_name\"]\n                )\n                print(\"✅ BM25 retriever initialized successfully\")\n            except Exception as e:\n                print(f\"❌ Failed to initialize BM25 retriever on demand: {e}\")\n                # Keep it None so we don't try again\n        return self.bm25_retriever\n\n    def _get_graph_retriever(self):\n        if self._graph_retriever is None 
and self.retriever_configs.get(\"graph\", {}).get(\"enabled\"):\n            self._graph_retriever = GraphRetriever(graph_path=self.storage_config[\"graph_path\"])\n        return self._graph_retriever\n\n    def _get_reranker(self):\n        \"\"\"Initializes the reranker for hybrid search score fusion.\"\"\"\n        reranker_config = self.config.get(\"reranker\", {})\n        # This is for the LanceDB internal reranker, not the AI one.\n        if self.reranker is None and reranker_config.get(\"type\") == \"linear_combination\":\n            rerank_weight = reranker_config.get(\"weight\", 0.5) \n            self.reranker = lancedb.rerankers.LinearCombinationReranker(weight=rerank_weight)\n            print(f\"✅ Initialized LinearCombinationReranker with weight {rerank_weight}\")\n        return self.reranker\n\n    def _get_ai_reranker(self):\n        \"\"\"Initializes a dedicated AI-based reranker.\"\"\"\n        reranker_config = self.config.get(\"reranker\", {})\n        if self.ai_reranker is None and reranker_config.get(\"enabled\"):\n            # Serialise first-time initialisation so only one thread attempts\n            # to load the (very large) model.  
Other threads will wait and use\n            # the instance once ready, preventing the meta-tensor crash.\n            with _ai_reranker_init_lock:\n                # Another thread may have completed init while we waited\n                if self.ai_reranker is None:\n                    try:\n                        model_name = reranker_config.get(\"model_name\")\n                        strategy = reranker_config.get(\"strategy\", \"qwen\")\n\n                        if strategy == \"rerankers-lib\":\n                            print(f\"🔧 Initialising Answer.AI ColBERT reranker ({model_name}) via rerankers lib…\")\n                            from rerankers import Reranker\n                            self.ai_reranker = Reranker(model_name, model_type=\"colbert\")\n                        else:\n                            print(f\"🔧 Lazily initializing Qwen reranker ({model_name})…\")\n                            self.ai_reranker = QwenReranker(model_name=model_name)\n\n                        print(\"✅ AI reranker initialized successfully.\")\n                    except Exception as e:\n                        # Leave as None so the pipeline can proceed without reranking\n                        print(f\"❌ Failed to initialize AI reranker: {e}\")\n        return self.ai_reranker\n\n    def _get_sentence_pruner(self):\n        if getattr(self, \"_sentence_pruner\", None) is None:\n            with _sentence_pruner_lock:\n                if getattr(self, \"_sentence_pruner\", None) is None:\n                    self._sentence_pruner = SentencePruner()\n        return self._sentence_pruner\n\n    def _get_surrounding_chunks_lancedb(self, chunk: Dict[str, Any], window_size: int) -> List[Dict[str, Any]]:\n        \"\"\"\n        Retrieves a window of chunks around a central chunk using LanceDB.\n        \"\"\"\n        db_manager = self._get_db_manager()\n        if not db_manager:\n            return [chunk]\n\n        # Extract identifiers needed for the query\n 
       document_id = chunk.get(\"document_id\")\n        chunk_index = chunk.get(\"chunk_index\")\n\n        # If essential identifiers are missing, return the chunk itself\n        if document_id is None or chunk_index is None or chunk_index == -1:\n            return [chunk]\n\n        table_name = self.config[\"storage\"][\"text_table_name\"]\n        try:\n            tbl = db_manager.get_table(table_name)\n        except Exception:\n            # If the table can't be opened, we can't get surrounding chunks\n            return [chunk]\n\n        # Define the window for the search\n        start_index = max(0, chunk_index - window_size)\n        end_index = chunk_index + window_size\n        \n        # Construct the SQL filter for an efficient metadata-based search\n        sql_filter = f\"document_id = '{document_id}' AND chunk_index >= {start_index} AND chunk_index <= {end_index}\"\n        \n        try:\n            # Execute a filter-only search, which is very fast on indexed metadata\n            results = tbl.search().where(sql_filter).to_list()\n            \n            # The results must be sorted by chunk_index to maintain logical order\n            results.sort(key=lambda c: c['chunk_index'])\n\n            # The 'metadata' field is a JSON string and needs to be parsed\n            for res in results:\n                if isinstance(res.get('metadata'), str):\n                    try:\n                        res['metadata'] = json.loads(res['metadata'])\n                    except json.JSONDecodeError:\n                        res['metadata'] = {} # Handle corrupted metadata gracefully\n            return results\n        except Exception:\n            # If the query fails for any reason, fall back to the single chunk\n            return [chunk]\n\n    def _synthesize_final_answer(self, query: str, facts: str, *, event_callback=None) -> str:\n        \"\"\"Uses a text LLM to synthesize a final answer from extracted facts.\"\"\"\n        prompt = 
f\"\"\"\nYou are an AI assistant specialised in answering questions from retrieved context.\n\nContext you receive\n• VERIFIED FACTS – text snippets retrieved from the user's documents. Some may be irrelevant noise.  \n• ORIGINAL QUESTION – the user's actual query.\n\nInstructions\n1. Evaluate each snippet for relevance to the ORIGINAL QUESTION; ignore those that do not help answer it.  \n2. Synthesise an answer **using only information from the relevant snippets**.  \n3. If snippets contradict one another, mention the contradiction explicitly.  \n4. If the snippets do not contain the needed information, reply exactly with:  \n   \"I could not find that information in the provided documents.\"  \n5. Provide a thorough, well-structured answer. Use paragraphs or bullet points where helpful, and include any relevant numbers/names exactly as they appear. There is **no strict sentence limit**, but aim for clarity over brevity.  \n6. Do **not** introduce external knowledge unless step 4 applies; in that case you may add a clearly-labelled \"General knowledge\" sentence after the required statement.\n\nOutput format\nAnswer:\n<your answer here>\n\n–––––  Retrieved Snippets  –––––\n{facts}\n––––––––––––––––––––––––––––––\n\nORIGINAL QUESTION: \"{query}\"\n\"\"\"\n        # Stream the answer token-by-token so the caller can forward them as SSE\n        answer_parts: list[str] = []\n        for tok in self.ollama_client.stream_completion(\n            model=self.ollama_config[\"generation_model\"],\n            prompt=prompt,\n        ):\n            answer_parts.append(tok)\n            if event_callback:\n                event_callback(\"token\", {\"text\": tok})\n\n        return \"\".join(answer_parts)\n\n    def run(self, query: str, table_name: str = None, window_size_override: Optional[int] = None, event_callback=None) -> Dict[str, Any]:\n        start_time = time.time()\n        retrieval_k = self.config.get(\"retrieval_k\", 10)\n\n        logger = 
logging.getLogger(__name__)\n        logger.debug(\"--- Running Hybrid Search for query '%s' (table=%s) ---\", query, table_name or self.storage_config.get(\"text_table_name\"))\n        \n        # If a custom table_name is provided, propagate it to storage config so helper methods use it\n        if table_name:\n            self.storage_config[\"text_table_name\"] = table_name\n\n        if event_callback:\n            event_callback(\"retrieval_started\", {})\n        # Unified retrieval using the refactored MultiVectorRetriever\n        dense_retriever = self._get_dense_retriever()\n        # Get the LanceDB reranker for initial score fusion\n        lancedb_reranker = self._get_reranker()\n        \n        retrieved_docs = []\n        if dense_retriever:\n            retrieved_docs = dense_retriever.retrieve(\n                text_query=query,\n                table_name=table_name or self.storage_config[\"text_table_name\"],\n                k=retrieval_k,\n                reranker=lancedb_reranker # Pass the reranker to enable hybrid search\n            )\n\n        # ---------------------------------------------------------------\n        # Late-Chunk retrieval (optional)\n        # ---------------------------------------------------------------\n        if self.retriever_configs.get(\"latechunk\", {}).get(\"enabled\"):\n            lc_table = self.retriever_configs[\"latechunk\"].get(\"lancedb_table_name\")\n            # Skip when the dense retriever is disabled or failed to initialise,\n            # instead of relying on the except block to swallow an AttributeError\n            if lc_table and dense_retriever:\n                try:\n                    lc_docs = dense_retriever.retrieve(\n                        text_query=query,\n                        table_name=lc_table,\n                        k=retrieval_k,\n                        reranker=lancedb_reranker,\n                    )\n                    retrieved_docs.extend(lc_docs)\n                except Exception as e:\n                    print(f\"⚠️  Late-chunk retrieval failed: {e}\")\n\n        if event_callback:\n            event_callback(\"retrieval_done\", 
{\"count\": len(retrieved_docs)})\n        \n        retrieval_time = time.time() - start_time\n        logger.debug(\"Retrieved %s chunks in %.2fs\", len(retrieved_docs), retrieval_time)\n\n        # -----------------------------------------------------------\n        #  LATE-CHUNK MERGING (merge ±1 sub-vector into central hit)\n        # -----------------------------------------------------------\n        if self.retriever_configs.get(\"latechunk\", {}).get(\"enabled\") and retrieved_docs:\n            merged_count = 0\n            for doc in retrieved_docs:\n                try:\n                    cid = doc.get(\"chunk_id\")\n                    meta = doc.get(\"metadata\", {})\n                    if meta.get(\"latechunk_merged\"):\n                        continue  # already processed\n                    doc_id = doc.get(\"document_id\")\n                    cidx = doc.get(\"chunk_index\")\n                    if doc_id is None or cidx is None or cidx == -1:\n                        continue\n                    # Fetch neighbouring late-chunks inside same document (±1)\n                    siblings = self._get_surrounding_chunks_lancedb(doc, window_size=1)\n                    # Keep only same document_id and ordered by chunk_index\n                    siblings = [s for s in siblings if s.get(\"document_id\") == doc_id]\n                    siblings.sort(key=lambda s: s.get(\"chunk_index\", 0))\n                    merged_text = \" \\n\".join(s.get(\"text\", \"\") for s in siblings)\n                    if merged_text:\n                        doc[\"text\"] = merged_text\n                        meta[\"latechunk_merged\"] = True\n                        merged_count += 1\n                except Exception as e:\n                    print(f\"⚠️  Late-chunk merge failed for chunk {doc.get('chunk_id')}: {e}\")\n            if merged_count:\n                print(f\"🪄 Late-chunk merging applied to {merged_count} retrieved chunks.\")\n\n        # --- AI 
Reranking Step ---\n        ai_reranker = self._get_ai_reranker()\n        if ai_reranker and retrieved_docs:\n            if event_callback:\n                event_callback(\"rerank_started\", {\"count\": len(retrieved_docs)})\n            print(f\"\\n--- Reranking top {len(retrieved_docs)} docs with AI model... ---\")\n            start_rerank_time = time.time()\n\n            rerank_cfg = self.config.get(\"reranker\", {})\n            top_k_cfg = rerank_cfg.get(\"top_k\")\n            top_percent = rerank_cfg.get(\"top_percent\")  # value in range 0–1\n\n            if top_percent is not None:\n                try:\n                    pct = float(top_percent)\n                    assert 0 < pct <= 1\n                    top_k = max(1, int(len(retrieved_docs) * pct))\n                except Exception:\n                    print(\"⚠️  Invalid top_percent value; falling back to top_k\")\n                    top_k = top_k_cfg or len(retrieved_docs)\n            else:\n                top_k = top_k_cfg or len(retrieved_docs)\n\n            strategy = self.config.get(\"reranker\", {}).get(\"strategy\", \"qwen\")\n\n            if strategy == \"rerankers-lib\":\n                texts = [d['text'] for d in retrieved_docs]\n                # ColBERT's Rust backend isn't Sync; serialise calls.\n                with _rerank_lock:\n                    ranked = ai_reranker.rank(query=query, docs=texts)\n                # ranked is RankedResults; convert to list of (score, idx)\n                try:\n                    pairs = [(r.score, r.document.doc_id) for r in ranked.results]\n                    if any(p[1] is None for p in pairs):\n                        pairs = [(r.score, i) for i, r in enumerate(ranked.results)]\n                except Exception:\n                    pairs = ranked\n                # Keep only top_k results if requested\n                if top_k is not None and len(pairs) > top_k:\n                    pairs = pairs[:top_k]\n                
reranked_docs = [retrieved_docs[idx] | {\"rerank_score\": score} for score, idx in pairs]\n            else:\n                try:\n                    reranked_docs = ai_reranker.rerank(query, retrieved_docs, top_k=top_k)\n                except TypeError:\n                    texts = [d['text'] for d in retrieved_docs]\n                    pairs = ai_reranker.rank(query, texts, top_k=top_k)\n                    reranked_docs = [retrieved_docs[idx] | {\"rerank_score\": score} for score, idx in pairs]\n\n            rerank_time = time.time() - start_rerank_time\n            print(f\"✅ Reranking completed in {rerank_time:.2f}s. Refined to {len(reranked_docs)} docs.\")\n            if event_callback:\n                event_callback(\"rerank_done\", {\"count\": len(reranked_docs)})\n        else:\n            # If no AI reranker, proceed with the initially retrieved docs\n            reranked_docs = retrieved_docs\n\n        window_size = self.config.get(\"context_window_size\", 1)\n        if window_size_override is not None:\n            window_size = window_size_override\n        if window_size > 0 and reranked_docs:\n            if event_callback:\n                event_callback(\"context_expand_started\", {\"count\": len(reranked_docs)})\n            print(f\"\\n--- Expanding context for {len(reranked_docs)} top documents (window size: {window_size})... 
---\")\n            expanded_chunks = {}\n            with concurrent.futures.ThreadPoolExecutor() as executor:\n                future_to_chunk = {executor.submit(self._get_surrounding_chunks_lancedb, chunk, window_size): chunk for chunk in reranked_docs}\n                for future in concurrent.futures.as_completed(future_to_chunk):\n                    try:\n                        seed_chunk = future_to_chunk[future]\n                        surrounding_chunks = future.result()\n                        for surrounding_chunk in surrounding_chunks:\n                            cid = surrounding_chunk['chunk_id']\n                            if cid not in expanded_chunks:\n                                # If this is the *central* chunk we already reranked, carry over its score\n                                if cid == seed_chunk.get('chunk_id') and 'rerank_score' in seed_chunk:\n                                    surrounding_chunk['rerank_score'] = seed_chunk['rerank_score']\n                                expanded_chunks[cid] = surrounding_chunk\n                    except Exception as e:\n                        print(f\"Error expanding context for a chunk: {e}\")\n\n            final_docs = list(expanded_chunks.values())\n            # Sort by reranker score if present, otherwise by raw score/distance\n            if any('rerank_score' in d for d in final_docs):\n                final_docs.sort(key=lambda c: c.get('rerank_score', -1), reverse=True)\n            elif any('_distance' in d for d in final_docs):\n                # For vector search smaller distance is better\n                final_docs.sort(key=lambda c: c.get('_distance', 1e9))\n            elif any('score' in d for d in final_docs):\n                final_docs.sort(key=lambda c: c.get('score', 0), reverse=True)\n            else:\n                # Fallback to document order\n                final_docs.sort(key=lambda c: (c.get('document_id', ''), c.get('chunk_index', 0)))\n\n            
print(f\"Expanded to {len(final_docs)} unique chunks for synthesis.\")\n            if event_callback:\n                event_callback(\"context_expand_done\", {\"count\": len(final_docs)})\n        else:\n            final_docs = reranked_docs\n\n        # Optionally hide non-reranked chunks: if any chunk carries a\n        # `rerank_score`, we assume the caller wants to focus on those.\n        if any('rerank_score' in d for d in final_docs):\n            final_docs = [d for d in final_docs if 'rerank_score' in d]\n\n        # ------------------------------------------------------------------\n        # Sentence-level pruning (Provence)\n        # ------------------------------------------------------------------\n        prov_cfg = self.config.get(\"provence\", {})\n        if prov_cfg.get(\"enabled\"):\n            if event_callback:\n                event_callback(\"prune_started\", {\"count\": len(final_docs)})\n            thresh = float(prov_cfg.get(\"threshold\", 0.1))\n            print(f\"\\n--- Provence pruning enabled (threshold={thresh}) ---\")\n            pruner = self._get_sentence_pruner()\n            final_docs = pruner.prune_documents(query, final_docs, threshold=thresh)\n            # Remove any chunks that were fully pruned (empty text)\n            final_docs = [d for d in final_docs if d.get('text', '').strip()]\n            if event_callback:\n                event_callback(\"prune_done\", {\"count\": len(final_docs)})\n\n        print(\"\\n--- Final Documents for Synthesis ---\")\n        if not final_docs:\n            print(\"No documents to synthesize.\")\n        else:\n            for i, doc in enumerate(final_docs):\n                print(f\"  [{i+1}] Chunk ID: {doc.get('chunk_id')}\")\n                print(f\"      Score: {doc.get('score', 'N/A')}\")\n                if 'rerank_score' in doc:\n                    print(f\"      Rerank Score: {doc.get('rerank_score'):.4f}\")\n                print(f\"      Text: 
\\\"{doc.get('text', '').strip()}\\\"\")\n        print(\"------------------------------------\")\n\n        if not final_docs:\n            return {\"answer\": \"I could not find an answer in the documents.\", \"source_documents\": []}\n        \n        # --- Sanitize docs for JSON serialization (no NaN/Inf types) ---\n        def _clean_val(v):\n            if isinstance(v, float) and (math.isnan(v) or math.isinf(v)):\n                return None\n            if isinstance(v, (np.floating,)):\n                try:\n                    f = float(v)\n                    if math.isnan(f) or math.isinf(f):\n                        return None\n                    return f\n                except Exception:\n                    return None\n            return v\n\n        for doc in final_docs:\n            # Remove heavy or internal-only fields before serialising\n            doc.pop(\"vector\", None)\n            doc.pop(\"_distance\", None)\n            # Clean numeric fields\n            for key in ['score', '_distance', 'rerank_score']:\n                if key in doc:\n                    doc[key] = _clean_val(doc[key])\n\n        context = \"\\n\\n\".join([doc['text'] for doc in final_docs])\n\n        # 👀 DEBUG: Show the exact context passed to the LLM after pruning\n        print(\"\\n=== Context passed to LLM (post-pruning) ===\")\n        if len(context) > 2000:\n            print(context[:2000] + \"…\\n[truncated] (total {} chars)\".format(len(context)))\n        else:\n            print(context)\n        print(\"=== End of context ===\\n\")\n\n        final_answer = self._synthesize_final_answer(query, context, event_callback=event_callback)\n        \n        return {\"answer\": final_answer, \"source_documents\": final_docs}\n\n    # ------------------------------------------------------------------\n    # Public utility\n    # ------------------------------------------------------------------\n    def list_document_titles(self, max_items: int = 25) -> 
List[str]:\n        \"\"\"Return up to *max_items* distinct document titles (or IDs).\n\n        This is used only for prompt-routing, so we favour robustness over\n        perfect recall. If anything goes wrong we return an empty list so\n        the caller can degrade gracefully.\n        \"\"\"\n        try:\n            tbl_name = self.storage_config.get(\"text_table_name\")\n            if not tbl_name:\n                return []\n\n            tbl = self._get_db_manager().get_table(tbl_name)\n\n            field_name = \"document_title\" if \"document_title\" in tbl.schema.names else \"document_id\"\n\n            # Use a cheap SQL filter to grab distinct values; fall back to a\n            # simple scan if the driver lacks DISTINCT support.\n            try:\n                sql = f\"SELECT DISTINCT {field_name} FROM tbl LIMIT {max_items}\"\n                rows = tbl.search().where(\"true\").sql(sql).to_list()  # type: ignore\n                titles = [r[field_name] for r in rows if r.get(field_name)]\n            except Exception:\n                # Fallback: scan first N rows\n                rows = tbl.search().select(field_name).limit(max_items * 4).to_list()\n                seen = set()\n                titles = []\n                for r in rows:\n                    val = r.get(field_name)\n                    if val and val not in seen:\n                        titles.append(val)\n                        seen.add(val)\n                        if len(titles) >= max_items:\n                            break\n\n            # Ensure we don't exceed max_items\n            return titles[:max_items]\n        except Exception:\n            # Any issues (missing table, bad schema, etc.) 
–> just return []\n            return []\n\n    # -------------------- Public helper properties --------------------\n    @property\n    def retriever(self):\n        \"\"\"Lazily exposes the main (dense) retriever so external components\n        like the ReAct agent tools can call `.retrieve()` directly without\n        reaching into private helpers. If the retriever has not yet been\n        instantiated, it is created on first access via `_get_dense_retriever`.\"\"\"\n        return self._get_dense_retriever()\n\n    def update_embedding_model(self, model_name: str):\n        \"\"\"Switch embedding model at runtime and clear cached objects so they re-initialize.\"\"\"\n        if self.config.get(\"embedding_model_name\") == model_name:\n            return  # nothing to do\n        print(f\"🔧 RetrievalPipeline switching embedding model to '{model_name}' (was '{self.config.get('embedding_model_name')}')\")\n        self.config[\"embedding_model_name\"] = model_name\n        # Reset caches so new instances are built on demand\n        self.text_embedder = None\n        self.dense_retriever = None"
  },
  {
    "path": "rag_system/requirements.txt",
    "content": "colpali-engine\nPyMuPDF\nPillow\ntransformers==4.51.0\ntorch==2.4.1\ntorchvision==0.19.1\nlancedb\nrank_bm25\nfuzzywuzzy\npython-Levenshtein\ntorchaudio\ntransformers\nsentencepiece\naccelerate\ndocling\nocrmac\nibm-watsonx-ai>=1.3.39\n"
  },
  {
    "path": "rag_system/rerankers/__init__.py",
    "content": ""
  },
  {
    "path": "rag_system/rerankers/reranker.py",
    "content": "from transformers import AutoModelForSequenceClassification, AutoTokenizer\nimport torch\nfrom typing import List, Dict, Any\n\nclass QwenReranker:\n    \"\"\"\n    A reranker that uses a local Hugging Face transformer model.\n    \"\"\"\n    def __init__(self, model_name: str = \"BAAI/bge-reranker-base\"):\n        # Auto-select the best available device: CUDA > MPS > CPU\n        if torch.cuda.is_available():\n            self.device = \"cuda\"\n        elif getattr(torch.backends, \"mps\", None) and torch.backends.mps.is_available():\n            self.device = \"mps\"\n        else:\n            self.device = \"cpu\"\n        print(f\"Initializing BGE Reranker with model '{model_name}' on device '{self.device}'.\")\n        self.tokenizer = AutoTokenizer.from_pretrained(model_name)\n        self.model = AutoModelForSequenceClassification.from_pretrained(\n            model_name,\n            torch_dtype=torch.float16 if self.device != \"cpu\" else None,\n        ).to(self.device).eval()\n        \n        print(\"BGE Reranker loaded successfully.\")\n\n    def _format_instruction(self, query: str, doc: str):\n        instruction = 'Given a web search query, retrieve relevant passages that answer the query'\n        return f\"<Instruct>: {instruction}\\n<Query>: {query}\\n<Document>: {doc}\"\n\n    def rerank(self, query: str, documents: List[Dict[str, Any]], top_k: int = 5, *, early_exit: bool = True, margin: float = 0.4, min_scored: int = 8, batch_size: int = 8) -> List[Dict[str, Any]]:\n        \"\"\"\n        Reranks a list of documents based on their relevance to a query.\n\n        If *early_exit* is True the cross-encoder scores documents in mini-batches and\n        stops once the best-so-far score beats the worst-so-far by *margin* after at\n        least *min_scored* docs have been processed.  
This accelerates \"easy\" queries\n        where strong positives dominate.\n        \"\"\"\n        if not documents:\n            return []\n\n        # Sort by the upstream (hybrid) score so that the strongest candidates are evaluated first.\n        docs_sorted = sorted(documents, key=lambda d: d.get('score', 0.0), reverse=True)\n\n        scored_pairs: List[tuple[float, Dict[str, Any]]] = []\n\n        with torch.no_grad():\n            for start in range(0, len(docs_sorted), batch_size):\n                batch_docs = docs_sorted[start : start + batch_size]\n                batch_pairs = [[query, d['text']] for d in batch_docs]\n\n                inputs = self.tokenizer(\n                    batch_pairs,\n                    padding=True,\n                    truncation=True,\n                    return_tensors=\"pt\",\n                    max_length=512,\n                ).to(self.device)\n\n                logits = self.model(**inputs).logits.view(-1)\n                batch_scores = logits.float().cpu().tolist()\n\n                scored_pairs.extend(zip(batch_scores, batch_docs))\n\n                # --- Early-exit check ---\n                if early_exit and len(scored_pairs) >= min_scored:\n                    # Current best and worst among *already* scored docs\n                    best_score = max(scored_pairs, key=lambda x: x[0])[0]\n                    worst_score = min(scored_pairs, key=lambda x: x[0])[0]\n                    if best_score - worst_score >= margin:\n                        break\n\n        # Sort final set and attach scores\n        sorted_by_score = sorted(scored_pairs, key=lambda x: x[0], reverse=True)\n        reranked_docs: List[Dict[str, Any]] = []\n        for score, doc in sorted_by_score[:top_k]:\n            doc_with_score = doc.copy()\n            doc_with_score['rerank_score'] = score\n            reranked_docs.append(doc_with_score)\n\n        return reranked_docs\n\nif __name__ == '__main__':\n    # This test requires an 
internet connection to download the models.\n    try:\n        reranker = QwenReranker(model_name=\"BAAI/bge-reranker-base\")\n        \n        query = \"What is the capital of France?\"\n        documents = [\n            {'text': \"Paris is the capital of France.\", 'metadata': {'doc_id': 'a'}},\n            {'text': \"The Eiffel Tower is in Paris.\", 'metadata': {'doc_id': 'b'}},\n            {'text': \"France is a country in Europe.\", 'metadata': {'doc_id': 'c'}},\n        ]\n        \n        reranked_documents = reranker.rerank(query, documents)\n        \n        print(\"\\n--- Verification ---\")\n        print(f\"Query: {query}\")\n        print(\"Reranked documents:\")\n        for doc in reranked_documents:\n            print(f\"  - Score: {doc['rerank_score']:.4f}, Text: {doc['text']}\")\n\n    except Exception as e:\n        print(f\"\\nAn error occurred during the QwenReranker test: {e}\")\n        print(\"Please ensure you have an internet connection for model downloads.\")\n"
  },
  {
    "path": "rag_system/rerankers/sentence_pruner.py",
    "content": "from __future__ import annotations\n\n\"\"\"Sentence-level context pruning using the Provence model (ICLR 2025).\n\nThis lightweight helper wraps the HuggingFace model hosted at\n`naver/provence-reranker-debertav3-v1` and exposes a thread-safe\n`prune_documents()` method that converts a list of RAG chunks into their\npruned variants.\n\nThe module fails gracefully – if the model weights cannot be downloaded\n(or the `transformers` / `nltk` deps are missing) we simply return the\noriginal documents unchanged so the upstream pipeline continues\nunaffected.\n\"\"\"\n\nfrom threading import Lock\nfrom typing import List, Dict, Any\n\n\nclass SentencePruner:\n    \"\"\"Lightweight singleton wrapper around the Provence model.\"\"\"\n\n    _model = None  # shared across all instances\n    _init_lock: Lock = Lock()\n\n    def __init__(self, model_name: str = \"naver/provence-reranker-debertav3-v1\") -> None:\n        self.model_name = model_name\n        self._ensure_model()\n\n    # ---------------------------------------------------------------------\n    # Internal helpers\n    # ---------------------------------------------------------------------\n    def _ensure_model(self) -> None:\n        \"\"\"Lazily download and load the Provence model exactly once.\"\"\"\n        if SentencePruner._model is not None:\n            return\n\n        with SentencePruner._init_lock:\n            if SentencePruner._model is not None:\n                return  # another thread beat us\n            try:\n                from transformers import AutoModel  # local import to keep base deps light\n\n                print(\"🔧 Loading Provence sentence-pruning model …\")\n                SentencePruner._model = AutoModel.from_pretrained(\n                    self.model_name,\n                    trust_remote_code=True,\n                )\n                print(\"✅ Provence model loaded successfully.\")\n            except Exception as e:\n                # Any failure leaves 
the singleton as None so callers can skip pruning.\n                print(f\"❌ Failed to load Provence model: {e}. Context pruning will be skipped.\")\n                SentencePruner._model = None\n\n    # ------------------------------------------------------------------\n    # Public API\n    # ------------------------------------------------------------------\n    def prune_documents(\n        self,\n        question: str,\n        docs: List[Dict[str, Any]],\n        *,\n        threshold: float = 0.1,\n    ) -> List[Dict[str, Any]]:\n        \"\"\"Return *docs* with their `text` field pruned sentence-wise.\n\n        If the model could not be initialised we simply echo the input.\n        \"\"\"\n        if SentencePruner._model is None:\n            return docs  # model unavailable – no-op\n\n        # Batch texts for efficiency when >1 doc\n        texts = [d.get(\"text\", \"\") for d in docs]\n\n        try:\n            if len(texts) == 1:\n                # returns dict\n                outputs = [SentencePruner._model.process(question, texts[0], threshold=threshold)]\n            else:\n                # Batch call expects list[list[str]] with same outer length as questions list (1)\n                batched_out = SentencePruner._model.process(question, [texts], threshold=threshold)\n                # HF returns List[Dict] per question\n                outputs = batched_out[0] if isinstance(batched_out, list) else batched_out\n                if isinstance(outputs, dict):\n                    outputs = [outputs]\n                if len(outputs) != len(texts):\n                    print(\"⚠️ Provence batch size mismatch; falling back to per-doc loop\")\n                    raise ValueError\n\n            pruned: List[Dict[str, Any]] = []\n            for doc, out in zip(docs, outputs):\n                raw = out.get(\"pruned_context\", doc.get(\"text\", \"\")) if isinstance(out, dict) else doc.get(\"text\", \"\")\n                new_text = raw if 
isinstance(raw, str) else \" \".join(raw)  # HF model may return a list of sentences\n                pruned.append({**doc, \"text\": new_text})\n        except Exception as e:\n            print(f\"⚠️ Provence batch pruning failed ({e}); falling back to individual calls\")\n            pruned = []\n            for doc in docs:\n                text = doc.get(\"text\", \"\")\n                if not text:\n                    pruned.append(doc)\n                    continue\n                try:\n                    res = SentencePruner._model.process(question, text, threshold=threshold)\n                    raw = res.get(\"pruned_context\", text) if isinstance(res, dict) else text\n                    new_text = raw if isinstance(raw, str) else \" \".join(raw)\n                    pruned.append({**doc, \"text\": new_text})\n                except Exception as err:\n                    print(f\"⚠️ Provence pruning failed for chunk {doc.get('chunk_id')}: {err}\")\n                    pruned.append(doc)\n\n        return pruned "
  },
  {
    "path": "rag_system/retrieval/__init__.py",
    "content": ""
  },
  {
    "path": "rag_system/retrieval/query_transformer.py",
    "content": "from typing import List, Any, Dict\nimport json\nfrom rag_system.utils.ollama_client import OllamaClient\n\nclass QueryDecomposer:\n    def __init__(self, llm_client: OllamaClient, llm_model: str):\n        self.llm_client = llm_client\n        self.llm_model = llm_model\n\n    def decompose(self, query: str, chat_history: List[Dict[str, Any]] | None = None) -> List[str]:\n        \"\"\"Decompose *query* into standalone sub-queries.\n\n        Parameters\n        ----------\n        query : str\n            The latest user message.\n        chat_history : list[dict] | None\n            Recent conversation turns (each item should contain at least the original\n            user query under the key ``\"query\"``). Only the **last 5** turns are\n            included to keep the prompt short.\n        \"\"\"\n\n        # ---- Limit history to last 5 user turns and extract the queries ----\n        history_snippets: List[str] = []\n        if chat_history:\n            # Keep only the last 5 turns\n            recent_turns = chat_history[-5:]\n            # Extract user queries (fallback: full dict as string if key missing)\n            for turn in recent_turns:\n                history_snippets.append(str(turn.get(\"query\", turn)))\n\n        # Serialize chat_history for the prompt (single string)\n        chat_history_text = \" | \".join(history_snippets)\n\n        # ---- Build the new SYSTEM prompt with added legacy examples ----\n        system_prompt = \"\"\"\nYou are an expert at query decomposition for a Retrieval-Augmented Generation (RAG) system.\n\nReturn one RFC-8259-compliant JSON object and nothing else.\nSchema:\n{\n“requires_decomposition”: <bool>,\n“reasoning”:              <string>,  // ≤ 50 words\n“resolved_query”:         <string>,  // query after context resolution\n“sub_queries”:            <string[]> // 1–10 standalone items\n}\n\nThink step-by-step internally, but reveal only the concise reasoning.\n\n⸻\n\nContext Resolution  
(perform FIRST)\n\nYou will receive:\n\t•\tquery – the current user message\n\t•\tchat_history – the most recent user turns (may be empty)\n\nIf query contains pronouns, ellipsis, or shorthand that can be unambiguously linked to something in chat_history, rewrite it to a fully self-contained question and place the result in resolved_query.\nOtherwise, copy query into resolved_query unchanged.\n\n⸻\n\nWhen is decomposition REQUIRED?\n\t•\tMULTI-PART questions joined by “and”, “or”, “also”, list commas, etc.\n\t•\tCOMPARATIVE / SUPERLATIVE questions (two or more entities, e.g. “bigger, better, fastest”).\n\t•\tTEMPORAL / SEQUENTIAL questions (changes over time, event timelines).\n\t•\tENUMERATIONS (pros, cons, impacts).\n\t•\tENTITY-SET COMPARISONS (A, B, C revenue…).\n\nWhen is decomposition NOT REQUIRED?\n\t•\tA single, factual information need.\n\t•\tAmbiguous queries needing clarification rather than splitting.\n\n⸻\n\nOutput rules\n\t1.\tUse resolved_query—not the raw query—to decide on decomposition.\n\t2.\tIf requires_decomposition is false, sub_queries must contain exactly resolved_query.\n\t3.\tOtherwise, produce 2–10 self-contained questions; avoid pronouns and shared context.\n\n⸻\n\"\"\"\n\n        # ---- Append NEW examples provided by the user ----\n        new_examples = \"\"\"\n\nNormalise pronouns and references: turn “this paper” into the explicit title if it can be inferred, otherwise leave as-is.\nchat_history: “What is the email address of the computer vision consultants?”\nquery: “What is their revenue?”\n\n{\n  \"requires_decomposition\": false,\n  \"reasoning\": \"Pronoun resolved; single information need.\",\n  \"resolved_query\": \"What is the revenue of the computer vision consultants?\",\n  \"sub_queries\": [\n    \"What is the revenue of the computer vision consultants?\"\n  ]\n}\n\nContext resolution (single info need)\nchat_history: “What is the email address of the computer vision consultants?”\nquery: “What is the address?”\n\n{\n  
\"requires_decomposition\": false,\n  \"reasoning\": \"Pronoun resolved; single information need.\",\n  \"resolved_query\": \"What is the physical address of the computer vision consultants?\",\n  \"sub_queries\": [\n    \"What is the physical address of the computer vision consultants?\"\n  ]\n}\n\nContext resolution (single info need)\nchat_history: “ComputeX has a revenue of 100M?”\nquery: “Who is the CEO?”\n\n{\n  \"requires_decomposition\": false,\n  \"reasoning\": \"entities normalization.\",\n  \"resolved_query\": \"who is the CEO of ComputeX\",\n  \"sub_queries\": [\n    \"who is the CEO of ComputeX\"\n  ]\n}\n\nNo unique antecedent → leave unresolved\nchat_history: “Tell me about the paper.”\nquery: “What is the address?”\n\n{\n  \"requires_decomposition\": false,\n  \"reasoning\": \"Ambiguous reference; cannot resolve safely.\",\n  \"resolved_query\": \"What is the address?\",\n  \"sub_queries\": [\"What is the address?\"]\n}\n\nTemporal + Comparative\nchat_history: \"\"\nquery: “How did Nvidia’s 2024 revenue compare with 2023?”\n\n{\n  \"requires_decomposition\": true,\n  \"reasoning\": \"Needs revenue for two separate years before comparison.\",\n  \"resolved_query\": \"How did Nvidia’s 2024 revenue compare with 2023?\",\n  \"sub_queries\": [\n    \"What was Nvidia’s revenue in 2024?\",\n    \"What was Nvidia’s revenue in 2023?\"\n  ]\n}\n\nEnumeration (pros / cons / cost)\nchat_history: \"\"\nquery: “List the pros, cons, and estimated implementation cost of adopting a vector database.”\n\n{\n  \"requires_decomposition\": true,\n  \"reasoning\": \"Three distinct information needs: pros, cons, cost.\",\n  \"resolved_query\": \"List the pros, cons, and estimated implementation cost of adopting a vector database.\",\n  \"sub_queries\": [\n    \"What are the pros of adopting a vector database?\",\n    \"What are the cons of adopting a vector database?\",\n    \"What is the estimated implementation cost of adopting a vector database?\"\n  ]\n}\n\nEntity-set 
comparison (multiple companies)\nchat_history: \"\"\nquery: “How did Nvidia, AMD, and Intel perform in Q2 2025 in terms of revenue?”\n\n{\n  \"requires_decomposition\": true,\n  \"reasoning\": \"Need revenue for each of three entities before comparison.\",\n  \"resolved_query\": \"How did Nvidia, AMD, and Intel perform in Q2 2025 in terms of revenue?\",\n  \"sub_queries\": [\n    \"What was Nvidia's revenue in Q2 2025?\",\n    \"What was AMD's revenue in Q2 2025?\",\n    \"What was Intel's revenue in Q2 2025?\"\n  ]\n}\n\nMulti-part question (limitations + mitigations)\nchat_history: \"\"\nquery: “What are the limitations of GPT-4o and what are the recommended mitigations?”\n\n{\n  \"requires_decomposition\": true,\n  \"reasoning\": \"Two distinct pieces of information: limitations and mitigations.\",\n  \"resolved_query\": \"What are the limitations of GPT-4o and what are the recommended mitigations?\",\n  \"sub_queries\": [\n    \"What are the known limitations of GPT-4o?\",\n    \"What are the recommended mitigations for the limitations of GPT-4o?\"\n  ]\n}\n\"\"\"\n\n        # ---- Append legacy examples that already existed in the old prompt ----\n        legacy_examples_header = \"\"\"\n⸻\n\nAdditional legacy examples\n\"\"\"\n\n        legacy_examples_body = \"\"\"\n**Example 1: Multi-Part Query**\nQuery: \"What were the main findings of the aiconfig report and how do they compare to the results from the RAG paper?\"\nJSON Output:\n{\n  \"reasoning\": \"The query asks for two distinct pieces of information: the findings from one report and a comparison to another. 
This requires two separate retrieval steps.\",\n  \"sub_queries\": [\n    \"What were the main findings of the aiconfig report?\",\n    \"How do the findings of the aiconfig report compare to the results from the RAG paper?\"\n  ]\n}\n\n**Example 2: Simple Query**\nQuery: \"Summarize the contributions of the DeepSeek-V3 paper.\"\nJSON Output:\n{\n  \"reasoning\": \"This is a direct request for a summary of a single document and does not contain multiple parts.\",\n  \"sub_queries\": [\n    \"Summarize the contributions of the DeepSeek-V3 paper.\"\n  ]\n}\n\n**Example 3: Comparative Query**\nQuery: \"Did Microsoft or Google make more money last year?\"\nJSON Output:\n{\n  \"reasoning\": \"This is a comparative query that requires fetching the profit for each company before a comparison can be made.\",\n  \"sub_queries\": [\n    \"How much profit did Microsoft make last year?\",\n    \"How much profit did Google make last year?\"\n  ]\n}\n\n**Example 4: Comparative Query with different phrasing**\nQuery: \"Who has more siblings, Jamie or Sansa?\"\nJSON Output:\n{\n  \"reasoning\": \"This comparative query needs the sibling count for both individuals to be answered.\",\n  \"sub_queries\": [\n    \"How many siblings does Jamie have?\",\n    \"How many siblings does Sansa have?\"\n  ]\n}\n\"\"\"\n\n        full_prompt = (\n            system_prompt\n            + new_examples\n            # + legacy_examples_header\n            # + legacy_examples_body\n            + \"\"\"\n\n⸻\n\nNow process\n\nInput payload:\n\n\"\"\" + json.dumps({\"query\": query, \"chat_history\": chat_history_text}, indent=2) + \"\"\"\n\"\"\"\n        )\n\n        # ---- Call the LLM ----\n        response = self.llm_client.generate_completion(self.llm_model, full_prompt, format=\"json\")\n\n        response_text = response.get('response', '{}')\n        try:\n            # Handle potential markdown code blocks in the response\n            if response_text.strip().startswith(\"```json\"):\n       
         response_text = response_text.strip()[7:-3].strip()\n\n            data = json.loads(response_text)\n\n            sub_queries = data.get('sub_queries') or [query]\n            reasoning = data.get('reasoning', 'No reasoning provided.')\n\n            print(f\"Query Decomposition Reasoning: {reasoning}\")\n\n            # Fallback: ensure at least the resolved_query if sub_queries empty\n            if not sub_queries:\n                sub_queries = [data.get('resolved_query', query)]\n\n            # Deduplicate while preserving order\n            sub_queries = list(dict.fromkeys(sub_queries))\n\n            # Enforce 10 sub-query limit per new requirements\n            return sub_queries[:10]\n        except json.JSONDecodeError:\n            print(f\"Failed to decode JSON from query decomposer: {response_text}\")\n            return [query]\n\nclass HyDEGenerator:\n    def __init__(self, llm_client: OllamaClient, llm_model: str):\n        self.llm_client = llm_client\n        self.llm_model = llm_model\n\n    def generate(self, query: str) -> str:\n        prompt = f\"Generate a short, hypothetical document that answers the following question. The document should be dense with keywords and concepts related to the query.\\n\\nQuery: {query}\\n\\nHypothetical Document:\"\n        response = self.llm_client.generate_completion(self.llm_model, prompt)\n        return response.get('response', '')\n\nclass GraphQueryTranslator:\n    def __init__(self, llm_client: OllamaClient, llm_model: str):\n        self.llm_client = llm_client\n        self.llm_model = llm_model\n\n    def _generate_translation_prompt(self, query: str) -> str:\n        return f\"\"\"\nYou are an expert query planner. 
Convert the user's question into a structured JSON query for a knowledge graph.\nThe JSON should contain a 'start_node' (the known entity in the query) and an 'edge_label' (the relationship being asked about).\nThe graph has nodes (entities) and directed edges (relationships). For example, (Tim Cook) -[IS_CEO_OF]-> (Apple).\nReturn ONLY the JSON object.\n\nUser Question: \"{query}\"\n\nJSON Output:\n\"\"\"\n\n    def translate(self, query: str) -> Dict[str, Any]:\n        prompt = self._generate_translation_prompt(query)\n        response = self.llm_client.generate_completion(self.llm_model, prompt, format=\"json\")\n        try:\n            return json.loads(response.get('response', '{}'))\n        except json.JSONDecodeError:\n            return {}"
  },
  {
    "path": "rag_system/retrieval/retrievers.py",
    "content": "import lancedb\nimport pickle\nimport json\nfrom typing import List, Dict, Any\nimport numpy as np\nimport networkx as nx\nimport os\nfrom PIL import Image\nfrom transformers import CLIPProcessor, CLIPModel\nimport torch\nimport logging\nimport pandas as pd\nimport math\nimport concurrent.futures\nfrom functools import lru_cache\n\nfrom rag_system.indexing.embedders import LanceDBManager\nfrom rag_system.indexing.representations import QwenEmbedder\nfrom rag_system.indexing.multimodal import LocalVisionModel\nfrom rag_system.utils.logging_utils import log_retrieval_results\n\n# BM25Retriever is no longer needed.\n# class BM25Retriever: ...\n\nfrom fuzzywuzzy import process\n\nclass GraphRetriever:\n    def __init__(self, graph_path: str):\n        self.graph = nx.read_gml(graph_path)\n\n    def retrieve(self, query: str, k: int = 5, score_cutoff: int = 80) -> List[Dict[str, Any]]:\n        print(f\"\\n--- Performing Graph Retrieval for query: '{query}' ---\")\n        \n        query_parts = query.split()\n        entities = []\n        for part in query_parts:\n            match = process.extractOne(part, self.graph.nodes(), score_cutoff=score_cutoff)\n            if match and isinstance(match[0], str):\n                entities.append(match[0])\n        \n        retrieved_docs = []\n        for entity in set(entities):\n            for neighbor in self.graph.neighbors(entity):\n                retrieved_docs.append({\n                    'chunk_id': f\"graph_{entity}_{neighbor}\",\n                    'text': f\"Entity: {entity}, Neighbor: {neighbor}\",\n                    'score': 1.0,\n                    'metadata': {'source': 'graph'}\n                })\n        \n        print(f\"Retrieved {len(retrieved_docs)} documents from the graph.\")\n        return retrieved_docs[:k]\n\n# region === MultiVectorRetriever ===\nclass MultiVectorRetriever:\n    \"\"\"\n    Performs hybrid (vector + FTS) or vector-only retrieval.\n    \"\"\"\n    def 
__init__(self, db_manager: LanceDBManager, text_embedder: QwenEmbedder, vision_model: LocalVisionModel = None, *, fusion_config: Dict[str, Any] | None = None):\n        self.db_manager = db_manager\n        self.text_embedder = text_embedder\n        self.vision_model = vision_model\n        self.fusion_config = fusion_config or {\"method\": \"linear\", \"bm25_weight\": 0.5, \"vec_weight\": 0.5}\n\n        # Lightweight in-memory LRU cache for single-query embeddings (256 entries)\n        @lru_cache(maxsize=256)\n        def _embed_single(q: str):\n            return self.text_embedder.create_embeddings([q])[0]\n\n        self._embed_single = _embed_single\n\n    def retrieve(self, text_query: str, table_name: str, k: int, reranker=None) -> List[Dict[str, Any]]:\n        \"\"\"\n        Performs a search on a single LanceDB table.\n        If a reranker is provided, it performs a hybrid search.\n        Otherwise, it performs a standard vector search.\n        \"\"\"\n        print(f\"\\n--- Performing Retrieval for query: '{text_query}' on table '{table_name}' ---\")\n        \n        try:\n            if table_name is None:\n                table_name = \"default_text_table\"\n            tbl = self.db_manager.get_table(table_name)\n            \n            # Create / fetch cached text embedding for the query\n            text_query_embedding = self._embed_single(text_query)\n            \n            logger = logging.getLogger(__name__)\n\n            # Always perform hybrid lexical + vector search\n            logger.debug(\n                \"Running hybrid search on table '%s' (k=%s, have_reranker=%s)\",\n                table_name,\n                k,\n                bool(reranker),\n            )\n\n            if reranker:\n                logger.debug(\"Hybrid + reranker path not yet implemented with manual fusion; proceeding without extra reranker.\")\n\n            # Manual two-leg hybrid: take half from each modality\n            fts_k = k // 2\n    
        vec_k = k - fts_k\n\n            # Run FTS and vector search in parallel to cut latency\n            def _run_fts():\n                # Very short queries often underperform → add fuzzy wildcard\n                fts_query = text_query\n                if len(text_query.split()) == 1:\n                    fts_query = f\"{text_query}* OR {text_query}~\"\n                return (\n                     tbl.search(query=fts_query, query_type=\"fts\")\n                        .limit(fts_k)\n                        .to_df()\n                 )\n\n            def _run_vec():\n                if vec_k == 0:\n                    return None\n                return (\n                    tbl.search(text_query_embedding)\n                       .limit(vec_k * 2)  # fetch extra to allow for dedup\n                       .to_df()\n                )\n\n            with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:\n                fts_future = executor.submit(_run_fts)\n                vec_future = executor.submit(_run_vec)\n                fts_df = fts_future.result()\n                vec_df = vec_future.result()\n\n            if vec_df is not None:\n                combined = pd.concat([fts_df, vec_df])\n            else:\n                combined = fts_df\n\n            # Remove duplicates preserving first occurrence, then trim to k\n            dedup_subset = [\"_rowid\"] if \"_rowid\" in combined.columns else ([\"chunk_id\"] if \"chunk_id\" in combined.columns else None)\n            if dedup_subset:\n                combined = combined.drop_duplicates(subset=dedup_subset, keep=\"first\")\n            combined = combined.head(k)\n\n            results_df = combined\n            logger.debug(\n                \"Hybrid (fts=%s, vec=%s) → %s unique chunks\",\n                len(fts_df),\n                0 if vec_df is None else len(vec_df),\n                len(results_df),\n            )\n            \n            retrieved_docs = []\n            
for _, row in results_df.iterrows():\n                metadata = json.loads(row.get('metadata', '{}'))\n                # Add top-level fields back into metadata for consistency if they don't exist\n                metadata.setdefault('document_id', row.get('document_id'))\n                metadata.setdefault('chunk_index', row.get('chunk_index'))\n                \n                # Determine score (vector distance or FTS). Replace NaN with 0.0\n                raw_score = row.get('_distance') if '_distance' in row else row.get('score')\n                try:\n                    if raw_score is None or (isinstance(raw_score, float) and math.isnan(raw_score)):\n                        raw_score = 0.0\n                except Exception:\n                    raw_score = 0.0\n\n                combined_score = raw_score\n                # Optional linear-weight fusion if both FTS & vector scores exist\n                if '_distance' in row and 'score' in row:\n                    try:\n                        bm25 = row.get('score', 0.0)\n                        vec_sim = 1.0 / (1.0 + row.get('_distance', 1.0))  # convert distance to similarity\n                        w_bm25 = float(self.fusion_config.get('bm25_weight', 0.5))\n                        w_vec = float(self.fusion_config.get('vec_weight', 0.5))\n                        combined_score = w_bm25 * bm25 + w_vec * vec_sim\n                    except Exception:\n                        pass\n\n                retrieved_docs.append({\n                    'chunk_id': row.get('chunk_id'),\n                    'text': metadata.get('original_text', row.get('text')),\n                    'score': combined_score,\n                    'bm25': row.get('score'),\n                    '_distance': row.get('_distance'),\n                    'document_id': row.get('document_id'),\n                    'chunk_index': row.get('chunk_index'),\n                    'metadata': metadata\n                })\n\n            
logger.debug(\"Hybrid search returned %s results\", len(retrieved_docs))\n            log_retrieval_results(retrieved_docs, k)\n            print(f\"Retrieved {len(retrieved_docs)} documents.\")\n            return retrieved_docs\n        \n        except Exception as e:\n            print(f\"Could not search table '{table_name}': {e}\")\n            return []\n# endregion\n\nif __name__ == '__main__':\n    print(\"retrievers.py updated for LanceDB FTS Hybrid Search.\")\n"
  },
  {
    "path": "rag_system/utils/batch_processor.py",
    "content": "import time\nimport logging\nfrom typing import List, Dict, Any, Callable, Optional, Iterator\nfrom contextlib import contextmanager\nimport gc\n\n# Set up logging\nlogging.basicConfig(level=logging.INFO)\nlogger = logging.getLogger(__name__)\n\n@contextmanager\ndef timer(operation_name: str):\n    \"\"\"Context manager to time operations\"\"\"\n    start = time.time()\n    try:\n        yield\n    finally:\n        duration = time.time() - start\n        logger.info(f\"{operation_name} completed in {duration:.2f}s\")\n\nclass ProgressTracker:\n    \"\"\"Tracks progress and performance metrics for batch operations\"\"\"\n    \n    def __init__(self, total_items: int, operation_name: str = \"Processing\"):\n        self.total_items = total_items\n        self.operation_name = operation_name\n        self.processed_items = 0\n        self.errors_encountered = 0\n        self.start_time = time.time()\n        self.last_report_time = time.time()\n        self.report_interval = 10  # Report every 10 seconds\n        \n    def update(self, items_processed: int, errors: int = 0):\n        \"\"\"Update progress with number of items processed\"\"\"\n        self.processed_items += items_processed\n        self.errors_encountered += errors\n        \n        current_time = time.time()\n        if current_time - self.last_report_time >= self.report_interval:\n            self._report_progress()\n            self.last_report_time = current_time\n            \n    def _report_progress(self):\n        \"\"\"Report current progress\"\"\"\n        elapsed = time.time() - self.start_time\n        if elapsed > 0:\n            rate = self.processed_items / elapsed\n            remaining = self.total_items - self.processed_items\n            eta = remaining / rate if rate > 0 else 0\n            \n            progress_pct = (self.processed_items / self.total_items) * 100\n            \n            logger.info(\n                f\"{self.operation_name}: 
{self.processed_items}/{self.total_items} \"\n                f\"({progress_pct:.1f}%) - {rate:.2f} items/sec - \"\n                f\"ETA: {eta/60:.1f}min - Errors: {self.errors_encountered}\"\n            )\n            \n    def finish(self):\n        \"\"\"Report final statistics\"\"\"\n        elapsed = time.time() - self.start_time\n        rate = self.processed_items / elapsed if elapsed > 0 else 0\n        \n        logger.info(\n            f\"{self.operation_name} completed: {self.processed_items}/{self.total_items} items \"\n            f\"in {elapsed:.2f}s ({rate:.2f} items/sec) - {self.errors_encountered} errors\"\n        )\n\nclass BatchProcessor:\n    \"\"\"Generic batch processor with progress tracking and error handling\"\"\"\n    \n    def __init__(self, batch_size: int = 50, enable_gc: bool = True):\n        self.batch_size = batch_size\n        self.enable_gc = enable_gc\n        \n    def process_in_batches(\n        self,\n        items: List[Any],\n        process_func: Callable,\n        operation_name: str = \"Processing\",\n        **kwargs\n    ) -> List[Any]:\n        \"\"\"\n        Process items in batches with progress tracking\n        \n        Args:\n            items: List of items to process\n            process_func: Function to process each batch\n            operation_name: Name for progress reporting\n            **kwargs: Additional arguments passed to process_func\n            \n        Returns:\n            List of results from all batches\n        \"\"\"\n        if not items:\n            logger.info(f\"{operation_name}: No items to process\")\n            return []\n            \n        tracker = ProgressTracker(len(items), operation_name)\n        results = []\n        \n        logger.info(f\"Starting {operation_name} for {len(items)} items in batches of {self.batch_size}\")\n        \n        with timer(f\"{operation_name} (total)\"):\n            for i in range(0, len(items), self.batch_size):\n                
batch = items[i:i + self.batch_size]\n                batch_num = i // self.batch_size + 1\n                total_batches = (len(items) + self.batch_size - 1) // self.batch_size\n                \n                try:\n                    with timer(f\"Batch {batch_num}/{total_batches}\"):\n                        batch_results = process_func(batch, **kwargs)\n                        results.extend(batch_results)\n                        \n                    tracker.update(len(batch))\n                    \n                except Exception as e:\n                    logger.error(f\"Error in batch {batch_num}: {e}\")\n                    tracker.update(len(batch), errors=len(batch))\n                    # Continue processing other batches\n                    continue\n                \n                # Optional garbage collection to manage memory\n                if self.enable_gc and batch_num % 5 == 0:\n                    gc.collect()\n                    \n        tracker.finish()\n        return results\n        \n    def batch_iterator(self, items: List[Any]) -> Iterator[List[Any]]:\n        \"\"\"Generate batches as an iterator for memory-efficient processing\"\"\"\n        for i in range(0, len(items), self.batch_size):\n            yield items[i:i + self.batch_size]\n\nclass StreamingProcessor:\n    \"\"\"Process items one at a time with minimal memory usage\"\"\"\n    \n    def __init__(self, enable_gc_interval: int = 100):\n        self.enable_gc_interval = enable_gc_interval\n        \n    def process_streaming(\n        self,\n        items: List[Any],\n        process_func: Callable,\n        operation_name: str = \"Streaming Processing\",\n        **kwargs\n    ) -> List[Any]:\n        \"\"\"\n        Process items one at a time with minimal memory footprint\n        \n        Args:\n            items: List of items to process\n            process_func: Function to process each item\n            operation_name: Name for progress reporting\n         
   **kwargs: Additional arguments passed to process_func\n            \n        Returns:\n            List of results\n        \"\"\"\n        if not items:\n            logger.info(f\"{operation_name}: No items to process\")\n            return []\n            \n        tracker = ProgressTracker(len(items), operation_name)\n        results = []\n        \n        logger.info(f\"Starting {operation_name} for {len(items)} items (streaming)\")\n        \n        with timer(f\"{operation_name} (streaming)\"):\n            for i, item in enumerate(items):\n                try:\n                    result = process_func(item, **kwargs)\n                    results.append(result)\n                    tracker.update(1)\n                    \n                except Exception as e:\n                    logger.error(f\"Error processing item {i}: {e}\")\n                    tracker.update(1, errors=1)\n                    continue\n                    \n                # Periodic garbage collection\n                if self.enable_gc_interval and (i + 1) % self.enable_gc_interval == 0:\n                    gc.collect()\n                    \n        tracker.finish()\n        return results\n\n# Utility functions for common batch operations\ndef batch_chunks_by_document(chunks: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:\n    \"\"\"Group chunks by document_id for document-level batch processing\"\"\"\n    document_batches = {}\n    for chunk in chunks:\n        doc_id = chunk.get('metadata', {}).get('document_id', 'unknown')\n        if doc_id not in document_batches:\n            document_batches[doc_id] = []\n        document_batches[doc_id].append(chunk)\n    return document_batches\n\ndef estimate_memory_usage(chunks: List[Dict[str, Any]]) -> float:\n    \"\"\"Estimate memory usage of chunks in MB\"\"\"\n    if not chunks:\n        return 0.0\n        \n    # Rough estimate: average text length * number of chunks * 2 (for overhead)\n    avg_text_length = 
sum(len(chunk.get('text', '')) for chunk in chunks[:min(10, len(chunks))]) / min(10, len(chunks))\n    estimated_bytes = avg_text_length * len(chunks) * 2\n    return estimated_bytes / (1024 * 1024)  # Convert to MB\n\nif __name__ == '__main__':\n    # Test the batch processor\n    def dummy_process_func(batch):\n        time.sleep(0.1)  # Simulate processing time\n        return [f\"processed_{item}\" for item in batch]\n    \n    test_items = list(range(100))\n    processor = BatchProcessor(batch_size=10)\n    results = processor.process_in_batches(\n        test_items, \n        dummy_process_func, \n        \"Test Processing\"\n    )\n    \n    print(f\"Processed {len(results)} items\") "
  },
  {
    "path": "rag_system/utils/logging_utils.py",
    "content": "import logging\nfrom typing import List, Dict\nfrom textwrap import shorten\n\nlogger = logging.getLogger(\"rag-system\")\n\n# Global log format – only set if user has not configured logging\nif not logger.handlers:\n    logging.basicConfig(\n        level=logging.INFO,\n        format=\"%(asctime)s | %(levelname)-7s | %(name)s | %(message)s\",\n    )\n\n\ndef log_query(query: str, sub_queries: List[str] | None = None) -> None:\n    \"\"\"Emit a nicely-formatted block describing the incoming query and any\n    decomposition.\"\"\"\n    border = \"=\" * 60\n    logger.info(\"\\n%s\\nUSER QUERY: %s\", border, query)\n    if sub_queries:\n        for i, q in enumerate(sub_queries, 1):\n            logger.info(\"  sub-%d → %s\", i, q)\n    logger.info(\"%s\", border)\n\n\ndef log_retrieval_results(results: List[Dict], k: int) -> None:\n    \"\"\"Show chunk_id, truncated text and score for the first *k* rows.\"\"\"\n    if not results:\n        logger.info(\"Retrieval returned 0 documents.\")\n        return\n    logger.info(\"Top %d results:\", min(k, len(results)))\n    header = f\"{'chunk_id':<14} {'score':<7} preview\"\n    logger.info(header)\n    logger.info(\"-\" * len(header))\n    for row in results[:k]:\n        preview = shorten(row.get(\"text\", \"\"), width=60, placeholder=\"…\")\n        logger.info(\"%s %-7.3f %s\", str(row.get(\"chunk_id\"))[:12], row.get(\"score\", 0.0), preview) "
  },
  {
    "path": "rag_system/utils/ollama_client.py",
    "content": "import requests\nimport json\nfrom typing import List, Dict, Any\nimport base64\nfrom io import BytesIO\nfrom PIL import Image\nimport httpx, asyncio\n\nclass OllamaClient:\n    \"\"\"\n    An enhanced client for Ollama that now handles image data for VLM models.\n    \"\"\"\n    def __init__(self, host: str = \"http://localhost:11434\"):\n        self.host = host\n        self.api_url = f\"{host}/api\"\n        # (Connection check remains the same)\n\n    def _image_to_base64(self, image: Image.Image) -> str:\n        \"\"\"Converts a Pillow Image to a base64 string.\"\"\"\n        buffered = BytesIO()\n        image.save(buffered, format=\"PNG\")\n        return base64.b64encode(buffered.getvalue()).decode('utf-8')\n\n    def generate_embedding(self, model: str, text: str) -> List[float]:\n        try:\n            response = requests.post(\n                f\"{self.api_url}/embeddings\",\n                json={\"model\": model, \"prompt\": text}\n            )\n            response.raise_for_status()\n            return response.json().get(\"embedding\", [])\n        except requests.exceptions.RequestException as e:\n            print(f\"Error generating embedding: {e}\")\n            return []\n\n    def generate_completion(\n        self,\n        model: str,\n        prompt: str,\n        *,\n        format: str = \"\",\n        images: List[Image.Image] | None = None,\n        enable_thinking: bool | None = None,\n    ) -> Dict[str, Any]:\n        \"\"\"\n        Generates a completion, now with optional support for images.\n\n        Args:\n            model: The name of the generation model (e.g., 'llava', 'qwen-vl').\n            prompt: The text prompt for the model.\n            format: The format for the response, e.g., \"json\".\n            images: A list of Pillow Image objects to send to the VLM.\n            enable_thinking: Optional flag to disable chain-of-thought for Qwen models.\n        \"\"\"\n        try:\n            
payload = {\n                \"model\": model,\n                \"prompt\": prompt,\n                \"stream\": False\n            }\n            if format:\n                payload[\"format\"] = format\n            \n            if images:\n                payload[\"images\"] = [self._image_to_base64(img) for img in images]\n\n            # Optional: disable thinking mode for Qwen3 / DeepSeek models\n            if enable_thinking is not None:\n                payload[\"chat_template_kwargs\"] = {\"enable_thinking\": enable_thinking}\n\n            response = requests.post(\n                f\"{self.api_url}/generate\",\n                json=payload\n            )\n            response.raise_for_status()\n            response_lines = response.text.strip().split('\\n')\n            final_response = json.loads(response_lines[-1])\n            return final_response\n\n        except requests.exceptions.RequestException as e:\n            print(f\"Error generating completion: {e}\")\n            return {}\n\n    # -------------------------------------------------------------\n    # Async variant – uses httpx so the caller can await multiple\n    # LLM calls concurrently (triage, verification, etc.).\n    # -------------------------------------------------------------\n    async def generate_completion_async(\n        self,\n        model: str,\n        prompt: str,\n        *,\n        format: str = \"\",\n        images: List[Image.Image] | None = None,\n        enable_thinking: bool | None = None,\n        timeout: int = 60,\n    ) -> Dict[str, Any]:\n        \"\"\"Asynchronous version of generate_completion using httpx.\"\"\"\n\n        payload = {\"model\": model, \"prompt\": prompt, \"stream\": False}\n        if format:\n            payload[\"format\"] = format\n        if images:\n            payload[\"images\"] = [self._image_to_base64(img) for img in images]\n\n        if enable_thinking is not None:\n            payload[\"chat_template_kwargs\"] = 
{\"enable_thinking\": enable_thinking}\n\n        try:\n            async with httpx.AsyncClient(timeout=timeout) as client:\n                resp = await client.post(f\"{self.api_url}/generate\", json=payload)\n                resp.raise_for_status()\n                return json.loads(resp.text.strip().split(\"\\n\")[-1])\n        except (httpx.HTTPError, asyncio.CancelledError) as e:\n            print(f\"Async Ollama completion error: {e}\")\n            return {}\n\n    # -------------------------------------------------------------\n    # Streaming variant – yields token chunks in real time\n    # -------------------------------------------------------------\n    def stream_completion(\n        self,\n        model: str,\n        prompt: str,\n        *,\n        images: List[Image.Image] | None = None,\n        enable_thinking: bool | None = None,\n    ):\n        \"\"\"Generator that yields partial *response* strings as they arrive.\n\n        Example:\n\n            for tok in client.stream_completion(\"qwen2\", \"Hello\"):\n                print(tok, end=\"\", flush=True)\n        \"\"\"\n        payload: Dict[str, Any] = {\"model\": model, \"prompt\": prompt, \"stream\": True}\n        if images:\n            payload[\"images\"] = [self._image_to_base64(img) for img in images]\n        if enable_thinking is not None:\n            payload[\"chat_template_kwargs\"] = {\"enable_thinking\": enable_thinking}\n\n        with requests.post(f\"{self.api_url}/generate\", json=payload, stream=True) as resp:\n            resp.raise_for_status()\n            for raw_line in resp.iter_lines():\n                if not raw_line:\n                    # Keep-alive newline\n                    continue\n                try:\n                    data = json.loads(raw_line.decode())\n                except json.JSONDecodeError:\n                    continue\n                # The Ollama streaming API sends objects like {\"response\":\"Hi\",\"done\":false}\n                
chunk = data.get(\"response\", \"\")\n                if chunk:\n                    yield chunk\n                if data.get(\"done\"):\n                    break\n\nif __name__ == '__main__':\n    # This test now requires a VLM model like 'llava' or 'qwen-vl' to be pulled.\n    print(\"Ollama client updated for multimodal (VLM) support.\")\n    try:\n        client = OllamaClient()\n        # Create a dummy black image for testing\n        dummy_image = Image.new('RGB', (100, 100), 'black')\n        \n        # Test VLM completion\n        vlm_response = client.generate_completion(\n            model=\"llava\", # Make sure you have run 'ollama pull llava'\n            prompt=\"What color is this image?\",\n            images=[dummy_image]\n        )\n        \n        if vlm_response and 'response' in vlm_response:\n            print(\"\\n--- VLM Test Response ---\")\n            print(vlm_response['response'])\n        else:\n            print(\"\\nFailed to get VLM response. Is 'llava' model pulled and running?\")\n\n    except Exception as e:\n        print(f\"An error occurred: {e}\")"
  },
  {
    "path": "rag_system/utils/validate_model_config.py",
    "content": "#!/usr/bin/env python3\n\"\"\"\nModel Configuration Validation Script\n=====================================\n\nThis script validates the consolidated model configuration system to ensure:\n1. No configuration conflicts exist\n2. All model names are consistent across components\n3. Models are accessible and properly configured\n4. The configuration validation system works correctly\n\nRun this after making configuration changes to catch issues early.\n\"\"\"\n\nimport sys\nimport os\n# Add parent directories to path for imports\nsys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))\n\nfrom rag_system.main import (\n    PIPELINE_CONFIGS, \n    OLLAMA_CONFIG, \n    EXTERNAL_MODELS,\n    validate_model_config\n)\n\ndef print_header(title: str):\n    \"\"\"Print a formatted header.\"\"\"\n    print(f\"\\n{'='*60}\")\n    print(f\"🔍 {title}\")\n    print(f\"{'='*60}\")\n\ndef print_section(title: str):\n    \"\"\"Print a formatted section header.\"\"\" \n    print(f\"\\n{'─'*40}\")\n    print(f\"📋 {title}\")\n    print(f\"{'─'*40}\")\n\ndef validate_configuration_consistency():\n    \"\"\"Validate that all configurations are consistent.\"\"\"\n    print_header(\"CONFIGURATION CONSISTENCY VALIDATION\")\n    \n    errors = []\n    \n    # 1. 
Check embedding model consistency\n    print_section(\"Embedding Model Consistency\")\n    default_embedding = PIPELINE_CONFIGS[\"default\"][\"embedding_model_name\"]\n    external_embedding = EXTERNAL_MODELS[\"embedding_model\"]\n    fast_embedding = PIPELINE_CONFIGS[\"fast\"][\"embedding_model_name\"]\n    \n    print(f\"Default Config: {default_embedding}\")\n    print(f\"External Models: {external_embedding}\")\n    print(f\"Fast Config: {fast_embedding}\")\n    \n    # Check both pairs independently so that a default/external mismatch\n    # does not hide a default/fast mismatch.\n    embedding_errors = []\n    if default_embedding != external_embedding:\n        embedding_errors.append(f\"❌ Embedding model mismatch: default={default_embedding}, external={external_embedding}\")\n    if default_embedding != fast_embedding:\n        embedding_errors.append(f\"❌ Embedding model mismatch: default={default_embedding}, fast={fast_embedding}\")\n    if embedding_errors:\n        errors.extend(embedding_errors)\n    else:\n        print(\"✅ Embedding models are consistent\")\n    \n    # 2. Check reranker model consistency\n    print_section(\"Reranker Model Consistency\")\n    default_reranker = PIPELINE_CONFIGS[\"default\"][\"reranker\"][\"model_name\"]\n    external_reranker = EXTERNAL_MODELS[\"reranker_model\"]\n    \n    print(f\"Default Config: {default_reranker}\")\n    print(f\"External Models: {external_reranker}\")\n    \n    if default_reranker != external_reranker:\n        errors.append(f\"❌ Reranker model mismatch: default={default_reranker}, external={external_reranker}\")\n    else:\n        print(\"✅ Reranker models are consistent\")\n    \n    # 3. 
Check vision model consistency\n    print_section(\"Vision Model Consistency\")\n    default_vision = PIPELINE_CONFIGS[\"default\"][\"vision_model_name\"]\n    external_vision = EXTERNAL_MODELS[\"vision_model\"]\n    \n    print(f\"Default Config: {default_vision}\")\n    print(f\"External Models: {external_vision}\")\n    \n    if default_vision != external_vision:\n        errors.append(f\"❌ Vision model mismatch: default={default_vision}, external={external_vision}\")\n    else:\n        print(\"✅ Vision models are consistent\")\n    \n    return errors\n\ndef print_model_usage_map():\n    \"\"\"Print a comprehensive map of which models are used where.\"\"\"\n    print_header(\"MODEL USAGE MAP\")\n    \n    print_section(\"🤖 Ollama Models (Local Inference)\")\n    for model_type, model_name in OLLAMA_CONFIG.items():\n        if model_type != \"host\":\n            print(f\"  {model_type.replace('_', ' ').title()}: {model_name}\")\n    \n    print_section(\"🔗 External Models (HuggingFace/Direct)\")\n    for model_type, model_name in EXTERNAL_MODELS.items():\n        print(f\"  {model_type.replace('_', ' ').title()}: {model_name}\")\n    \n    print_section(\"📍 Model Usage by Component\")\n    usage_map = {\n        \"🔤 Text Embedding\": {\n            \"Model\": EXTERNAL_MODELS[\"embedding_model\"],\n            \"Used In\": [\"Retrieval Pipeline\", \"Semantic Cache\", \"Dense Retrieval\", \"Late Chunking\"],\n            \"Component\": \"QwenEmbedder (representations.py)\"\n        },\n        \"🧠 Text Generation\": {\n            \"Model\": OLLAMA_CONFIG[\"generation_model\"],\n            \"Used In\": [\"Agent Loop\", \"Answer Synthesis\", \"Query Decomposition\", \"Verification\"],\n            \"Component\": \"OllamaClient\"\n        },\n        \"🚀 Enrichment/Routing\": {\n            \"Model\": OLLAMA_CONFIG[\"enrichment_model\"],\n            \"Used In\": [\"Query Routing\", \"Document Overview Analysis\"],\n            \"Component\": \"Agent Loop 
(_route_via_overviews)\"\n        },\n        \"🔀 Reranking\": {\n            \"Model\": EXTERNAL_MODELS[\"reranker_model\"],\n            \"Used In\": [\"Hybrid Search\", \"Document Reranking\", \"AI Reranker\"],\n            \"Component\": \"ColBERT (rerankers-lib) or QwenReranker\"\n        },\n        \"👁️ Vision\": {\n            \"Model\": EXTERNAL_MODELS[\"vision_model\"],\n            \"Used In\": [\"Multimodal Processing\", \"Image Embeddings\"],\n            \"Component\": \"Vision Pipeline (when enabled)\"\n        }\n    }\n    \n    for model_name, details in usage_map.items():\n        print(f\"\\n{model_name}\")\n        print(f\"  Model: {details['Model']}\")\n        print(f\"  Component: {details['Component']}\")\n        print(f\"  Used In: {', '.join(details['Used In'])}\")\n\ndef test_validation_function():\n    \"\"\"Test the built-in validation function.\n\n    Returns True only if validate_model_config() runs without raising and\n    returns a truthy value.\n    \"\"\"\n    print_header(\"VALIDATION FUNCTION TEST\")\n    \n    try:\n        result = validate_model_config()\n        if result:\n            print(\"✅ validate_model_config() passed successfully!\")\n        else:\n            print(\"❌ validate_model_config() returned False\")\n            # Report the failure to the caller so the summary does not claim success\n            return False\n    except Exception as e:\n        print(f\"❌ validate_model_config() failed with error: {e}\")\n        return False\n    \n    return True\n\ndef check_pipeline_configurations():\n    \"\"\"Check all pipeline configurations for completeness.\"\"\"\n    print_header(\"PIPELINE CONFIGURATION COMPLETENESS\")\n    \n    required_keys = {\n        \"default\": [\"storage\", \"retrieval\", \"embedding_model_name\", \"reranker\"],\n        \"fast\": [\"storage\", \"retrieval\", \"embedding_model_name\"]\n    }\n    \n    errors = []\n    \n    for config_name, required in required_keys.items():\n        print_section(f\"{config_name.title()} Configuration\")\n        config = PIPELINE_CONFIGS.get(config_name, {})\n        \n        for key in required:\n            if key in config:\n                print(f\"  ✅ 
{key}: {type(config[key]).__name__}\")\n            else:\n                error_msg = f\"❌ Missing required key '{key}' in {config_name} config\"\n                errors.append(error_msg)  \n                print(f\"  {error_msg}\")\n    \n    return errors\n\ndef main():\n    \"\"\"Run all validation checks.\"\"\"\n    print(\"🚀 Starting Model Configuration Validation\")\n    print(f\"Python Path: {sys.path[0]}\")\n    \n    all_errors = []\n    \n    # Run all validation checks\n    all_errors.extend(validate_configuration_consistency())\n    all_errors.extend(check_pipeline_configurations())\n    \n    # Print model usage map\n    print_model_usage_map()\n    \n    # Test validation function\n    validation_passed = test_validation_function()\n    \n    # Final summary\n    print_header(\"VALIDATION SUMMARY\")\n    \n    if all_errors:\n        print(\"❌ VALIDATION FAILED - Issues Found:\")\n        for error in all_errors:\n            print(f\"  {error}\")\n        return 1\n    elif not validation_passed:\n        print(\"❌ VALIDATION FAILED - validate_model_config() function failed\")\n        return 1\n    else:\n        print(\"✅ ALL VALIDATIONS PASSED!\")\n        print(\"\\n🎉 Your model configuration is consistent and properly structured!\")\n        print(\"\\n📋 Summary:\")\n        print(f\"   • Embedding Model: {EXTERNAL_MODELS['embedding_model']}\")\n        print(f\"   • Generation Model: {OLLAMA_CONFIG['generation_model']}\")\n        print(f\"   • Enrichment Model: {OLLAMA_CONFIG['enrichment_model']}\")\n        print(f\"   • Reranker Model: {EXTERNAL_MODELS['reranker_model']}\")\n        print(f\"   • Vision Model: {EXTERNAL_MODELS['vision_model']}\")\n        return 0\n\nif __name__ == \"__main__\":\n    sys.exit(main()) "
  },
  {
    "path": "rag_system/utils/watsonx_client.py",
    "content": "import json\nfrom typing import List, Dict, Any, Optional\nimport base64\nfrom io import BytesIO\nfrom PIL import Image\n\n\nclass WatsonXClient:\n    \"\"\"\n    A client for IBM Watson X AI that provides similar interface to OllamaClient\n    for seamless integration with the RAG system.\n    \"\"\"\n    def __init__(\n        self,\n        api_key: str,\n        project_id: str,\n        url: str = \"https://us-south.ml.cloud.ibm.com\",\n    ):\n        \"\"\"\n        Initialize the Watson X client.\n        \n        Args:\n            api_key: IBM Cloud API key for authentication\n            project_id: Watson X project ID\n            url: Watson X service URL (default: us-south region)\n        \"\"\"\n        self.api_key = api_key\n        self.project_id = project_id\n        self.url = url\n        \n        try:\n            from ibm_watsonx_ai import APIClient\n            from ibm_watsonx_ai import Credentials\n            from ibm_watsonx_ai.foundation_models import ModelInference\n            from ibm_watsonx_ai.foundation_models.schema import TextGenParameters\n        except ImportError:\n            raise ImportError(\n                \"ibm-watsonx-ai package is required. 
\"\n                \"Install it with: pip install ibm-watsonx-ai\"\n            )\n        \n        self._APIClient = APIClient\n        self._Credentials = Credentials\n        self._ModelInference = ModelInference\n        self._TextGenParameters = TextGenParameters\n        \n        self.credentials = self._Credentials(\n            api_key=self.api_key,\n            url=self.url\n        )\n        \n        self.client = self._APIClient(self.credentials)\n        self.client.set.default_project(self.project_id)\n\n    def _image_to_base64(self, image: Image.Image) -> str:\n        \"\"\"Converts a Pillow Image to a base64 string.\"\"\"\n        buffered = BytesIO()\n        image.save(buffered, format=\"PNG\")\n        return base64.b64encode(buffered.getvalue()).decode('utf-8')\n\n    def generate_embedding(self, model: str, text: str) -> List[float]:\n        \"\"\"\n        Generate embeddings using Watson X embedding models.\n        Note: This requires using Watson X embedding models through the embeddings API.\n        \"\"\"\n        try:\n            from ibm_watsonx_ai.foundation_models import Embeddings\n            \n            embedding_model = Embeddings(\n                model_id=model,\n                credentials=self.credentials,\n                project_id=self.project_id\n            )\n            \n            result = embedding_model.embed_query(text)\n            return result if isinstance(result, list) else []\n            \n        except Exception as e:\n            print(f\"Error generating embedding: {e}\")\n            return []\n\n    def generate_completion(\n        self,\n        model: str,\n        prompt: str,\n        *,\n        format: str = \"\",\n        images: Optional[List[Image.Image]] = None,\n        enable_thinking: Optional[bool] = None,\n        **kwargs\n    ) -> Dict[str, Any]:\n        \"\"\"\n        Generates a completion using Watson X foundation models.\n        \n        Args:\n            model: 
The name/ID of the Watson X model (e.g., 'ibm/granite-13b-chat-v2')\n            prompt: The text prompt for the model\n            format: The format for the response (e.g., \"json\"); accepted for\n                interface compatibility but currently ignored by this client\n            images: List of Pillow Image objects; currently not forwarded to the\n                model (a warning is printed and only the text prompt is sent)\n            enable_thinking: Optional flag (not used in Watson X, kept for compatibility)\n            **kwargs: Additional parameters for text generation\n                (max_tokens, temperature, top_p, top_k)\n        \n        Returns:\n            Dictionary with response in Ollama-compatible format\n        \"\"\"\n        try:\n            gen_params = {}\n            \n            if kwargs.get('max_tokens'):\n                gen_params['max_new_tokens'] = kwargs['max_tokens']\n            # Compare against None so an explicit temperature of 0.0 is not dropped\n            if kwargs.get('temperature') is not None:\n                gen_params['temperature'] = kwargs['temperature']\n            if kwargs.get('top_p'):\n                gen_params['top_p'] = kwargs['top_p']\n            if kwargs.get('top_k'):\n                gen_params['top_k'] = kwargs['top_k']\n            \n            parameters = self._TextGenParameters(**gen_params) if gen_params else None\n            \n            model_inference = self._ModelInference(\n                model_id=model,\n                credentials=self.credentials,\n                project_id=self.project_id,\n                params=parameters\n            )\n            \n            if images:\n                # Images are not sent to the model; only the text prompt is used\n                print(\"Warning: images are not forwarded to Watson X by this client; sending the text prompt only\")\n            result = model_inference.generate(prompt=prompt)\n            \n            generated_text = \"\"\n            if isinstance(result, dict):\n                generated_text = result.get('results', [{}])[0].get('generated_text', '')\n            else:\n                generated_text = str(result)\n            \n            return {\n                'response': generated_text,\n                'model': model,\n                'done': True\n       
     }\n            \n        except Exception as e:\n            print(f\"Error generating completion: {e}\")\n            return {'response': '', 'error': str(e)}\n\n    async def generate_completion_async(\n        self,\n        model: str,\n        prompt: str,\n        *,\n        format: str = \"\",\n        images: Optional[List[Image.Image]] = None,\n        enable_thinking: Optional[bool] = None,\n        timeout: int = 60,\n        **kwargs\n    ) -> Dict[str, Any]:\n        \"\"\"\n        Asynchronous version of generate_completion.\n        \n        Note: IBM Watson X SDK may not have native async support,\n        so this is a wrapper that runs the sync version in the default\n        thread pool executor. The timeout argument is accepted for\n        interface compatibility and is not enforced here.\n        \"\"\"\n        import asyncio\n        \n        # Inside a coroutine a loop is always running, so use get_running_loop()\n        loop = asyncio.get_running_loop()\n        return await loop.run_in_executor(\n            None,\n            lambda: self.generate_completion(\n                model, prompt, format=format, images=images,\n                enable_thinking=enable_thinking, **kwargs\n            )\n        )\n\n    def stream_completion(\n        self,\n        model: str,\n        prompt: str,\n        *,\n        images: Optional[List[Image.Image]] = None,\n        enable_thinking: Optional[bool] = None,\n        **kwargs\n    ):\n        \"\"\"\n        Generator that yields partial response strings as they arrive.\n        \n        Note: Watson X streaming support depends on the SDK version and model.\n        \"\"\"\n        try:\n            gen_params = {}\n            if kwargs.get('max_tokens'):\n                gen_params['max_new_tokens'] = kwargs['max_tokens']\n            # Compare against None so an explicit temperature of 0.0 is not dropped\n            if kwargs.get('temperature') is not None:\n                gen_params['temperature'] = kwargs['temperature']\n                \n            parameters = self._TextGenParameters(**gen_params) if gen_params else None\n            \n            model_inference = self._ModelInference(\n                model_id=model,\n                credentials=self.credentials,\n                
project_id=self.project_id,\n                params=parameters\n            )\n            \n            try:\n                for chunk in model_inference.generate_text_stream(prompt=prompt):\n                    if chunk:\n                        yield chunk\n            except AttributeError:\n                result = model_inference.generate(prompt=prompt)\n                generated_text = \"\"\n                if isinstance(result, dict):\n                    generated_text = result.get('results', [{}])[0].get('generated_text', '')\n                else:\n                    generated_text = str(result)\n                yield generated_text\n                \n        except Exception as e:\n            print(f\"Error in stream_completion: {e}\")\n            yield \"\"\n\n\nif __name__ == '__main__':\n    print(\"Watson X Client for IBM watsonx.ai integration\")\n    print(\"This client provides Ollama-compatible interface for Watson X granite models\")\n    print(\"\\nTo use this client, you need:\")\n    print(\"1. IBM Cloud API key\")\n    print(\"2. Watson X project ID\")\n    print(\"3. ibm-watsonx-ai package installed\")\n    print(\"\\nExample usage:\")\n    print(\"\"\"\n    from rag_system.utils.watsonx_client import WatsonXClient\n    \n    client = WatsonXClient(\n        api_key=\"your-api-key\",\n        project_id=\"your-project-id\"\n    )\n    \n    response = client.generate_completion(\n        model=\"ibm/granite-13b-chat-v2\",\n        prompt=\"What is AI?\"\n    )\n    print(response['response'])\n    \"\"\")\n"
  },
  {
    "path": "requirements-docker.txt",
    "content": "requests\npython-dotenv\nPyPDF2\ncolpali-engine\nPyMuPDF\nPillow\ntransformers==4.51.0\ntorch==2.4.1\ntorchvision==0.19.1\nlancedb\nrank_bm25\nfuzzywuzzy\npython-Levenshtein\ntorchaudio\nsentencepiece\naccelerate\ndocling\ncachetools\nnumpy\nnetworkx\nmatplotlib\npsutil\nhttpx\nscikit-learn\npandas\nsentence_transformers\nrerankers\nnltk\n# Standard library modules (no need to install)\n# asyncio, logging, json, os, sys, typing, threading, itertools, math, re\n# ocrmac - removed for Docker compatibility (macOS-specific) \n"
  },
  {
    "path": "requirements.txt",
    "content": "requests\npython-dotenv\nPyPDF2\ncolpali-engine\nrequests\npython-dotenv\nPyPDF2\ncolpali-engine\nPyMuPDF\nPillow\ntransformers==4.51.0\ntorch==2.4.1\ntorchvision==0.19.1\nlancedb\nrank_bm25\nfuzzywuzzy\npython-Levenshtein\ntorchaudio\nsentencepiece\naccelerate\ndocling\ncachetools\nnumpy\nnetworkx\nmatplotlib\npsutil\nhttpx\nscikit-learn\npandas\nsentence_transformers\nrerankers\nnltk\n"
  },
  {
    "path": "run_system.py",
    "content": "#!/usr/bin/env python3\n\"\"\"\nRAG System Unified Launcher\n===========================\n\nA comprehensive launcher that starts all RAG system components:\n- Ollama server\n- RAG API server (port 8001)\n- Backend server (port 8000)  \n- Frontend server (port 3000)\n\nFeatures:\n- Single command startup\n- Real-time log aggregation\n- Process health monitoring\n- Graceful shutdown\n- Production-ready deployment support\n\nUsage:\n    python run_system.py [--mode dev|prod] [--logs-only] [--no-frontend]\n\"\"\"\n\nimport subprocess\nimport threading\nimport time\nimport signal\nimport sys\nimport os\nimport argparse\nimport json\nimport requests\nfrom pathlib import Path\nfrom datetime import datetime\nfrom typing import Dict, List, Optional, TextIO\nimport logging\nfrom dataclasses import dataclass\nimport psutil\n\n@dataclass\nclass ServiceConfig:\n    name: str\n    command: List[str]\n    port: int\n    cwd: Optional[str] = None\n    env: Optional[Dict[str, str]] = None\n    health_check_path: str = \"/health\"\n    startup_delay: int = 2\n    required: bool = True\n\nclass ColoredFormatter(logging.Formatter):\n    \"\"\"Custom formatter with colors for different log levels and services.\"\"\"\n    \n    COLORS = {\n        'DEBUG': '\\033[36m',     # Cyan\n        'INFO': '\\033[32m',      # Green\n        'WARNING': '\\033[33m',   # Yellow\n        'ERROR': '\\033[31m',     # Red\n        'CRITICAL': '\\033[35m',  # Magenta\n    }\n    \n    SERVICE_COLORS = {\n        'ollama': '\\033[94m',     # Blue\n        'rag-api': '\\033[95m',    # Magenta\n        'backend': '\\033[96m',    # Cyan\n        'frontend': '\\033[93m',   # Yellow\n        'system': '\\033[92m',     # Green\n    }\n    \n    RESET = '\\033[0m'\n    \n    def format(self, record):\n        # Add service-specific coloring\n        service_name = getattr(record, 'service', 'system')\n        service_color = self.SERVICE_COLORS.get(service_name, self.COLORS.get(record.levelname, 
''))\n        \n        # Format timestamp\n        timestamp = datetime.fromtimestamp(record.created).strftime('%H:%M:%S')\n        \n        # Create colored log line\n        colored_service = f\"{service_color}[{service_name.upper()}]{self.RESET}\"\n        colored_level = f\"{self.COLORS.get(record.levelname, '')}{record.levelname}{self.RESET}\"\n        \n        return f\"{timestamp} {colored_service} {colored_level}: {record.getMessage()}\"\n\nclass ServiceManager:\n    \"\"\"Manages multiple system services with logging and health monitoring.\"\"\"\n    \n    def __init__(self, mode: str = \"dev\", logs_dir: str = \"logs\"):\n        self.mode = mode\n        self.logs_dir = Path(logs_dir)\n        self.logs_dir.mkdir(exist_ok=True)\n        \n        self.processes: Dict[str, subprocess.Popen] = {}\n        self.log_threads: Dict[str, threading.Thread] = {}\n        self.running = False\n        \n        # Setup logging\n        self.setup_logging()\n        \n        # Service configurations\n        self.services = self._get_service_configs()\n        \n        # Register signal handlers for graceful shutdown\n        signal.signal(signal.SIGINT, self._signal_handler)\n        signal.signal(signal.SIGTERM, self._signal_handler)\n    \n    def setup_logging(self):\n        \"\"\"Setup centralized logging with colors.\"\"\"\n        # Create main logger\n        self.logger = logging.getLogger('system')\n        self.logger.setLevel(logging.INFO)\n        \n        # Console handler with colors\n        console_handler = logging.StreamHandler(sys.stdout)\n        console_handler.setFormatter(ColoredFormatter())\n        self.logger.addHandler(console_handler)\n        \n        # File handler for system logs\n        file_handler = logging.FileHandler(self.logs_dir / 'system.log')\n        file_handler.setFormatter(logging.Formatter(\n            '%(asctime)s [%(levelname)s] %(message)s'\n        ))\n        self.logger.addHandler(file_handler)\n    \n   
 def _get_service_configs(self) -> Dict[str, ServiceConfig]:\n        \"\"\"Define service configurations based on mode.\"\"\"\n        base_configs = {\n            'ollama': ServiceConfig(\n                name='ollama',\n                command=['ollama', 'serve'],\n                port=11434,\n                startup_delay=5,\n                required=True\n            ),\n            'rag-api': ServiceConfig(\n                name='rag-api',\n                command=[sys.executable, '-m', 'rag_system.api_server'],\n                port=8001,\n                startup_delay=3,\n                required=True\n            ),\n            'backend': ServiceConfig(\n                name='backend',\n                command=[sys.executable, 'backend/server.py'],\n                port=8000,\n                startup_delay=2,\n                required=True\n            ),\n            'frontend': ServiceConfig(\n                name='frontend',\n                command=['npm', 'run', 'dev' if self.mode == 'dev' else 'start'],\n                port=3000,\n                startup_delay=5,\n                required=False  # Optional in case Node.js not available\n            )\n        }\n        \n        # Production mode adjustments\n        if self.mode == 'prod':\n            # Use production build for frontend\n            base_configs['frontend'].command = ['npm', 'run', 'start']\n            # Add production environment variables\n            base_configs['rag-api'].env = {'NODE_ENV': 'production'}\n            base_configs['backend'].env = {'NODE_ENV': 'production'}\n        \n        return base_configs\n    \n    def _signal_handler(self, signum, frame):\n        \"\"\"Handle shutdown signals gracefully.\"\"\"\n        self.logger.info(f\"Received signal {signum}, shutting down...\")\n        self.shutdown()\n        sys.exit(0)\n    \n    def is_port_in_use(self, port: int) -> bool:\n        \"\"\"Check if a port is already in use.\"\"\"\n        try:\n          
  for conn in psutil.net_connections():\n                if conn.laddr.port == port and conn.status == 'LISTEN':\n                    return True\n            return False\n        except (psutil.AccessDenied, AttributeError):\n            # Fallback method\n            import socket\n            with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:\n                return s.connect_ex(('localhost', port)) == 0\n    \n    def check_prerequisites(self) -> bool:\n        \"\"\"Check if all required tools are available.\"\"\"\n        self.logger.info(\"🔍 Checking prerequisites...\")\n        \n        missing_tools = []\n        \n        # Check Ollama\n        if not self._command_exists('ollama'):\n            missing_tools.append('ollama (https://ollama.ai)')\n        \n        # Check Python\n        if not self._command_exists('python') and not self._command_exists('python3'):\n            missing_tools.append('python')\n        \n        # Check Node.js (optional)\n        if not self._command_exists('npm'):\n            self.logger.warning(\"⚠️  npm not found - frontend will be disabled\")\n            self.services['frontend'].required = False\n        \n        if missing_tools:\n            self.logger.error(f\"❌ Missing required tools: {', '.join(missing_tools)}\")\n            return False\n        \n        self.logger.info(\"✅ All prerequisites satisfied\")\n        return True\n    \n    def _command_exists(self, command: str) -> bool:\n        \"\"\"Check if a command exists in PATH.\"\"\"\n        try:\n            subprocess.run([command, '--version'], \n                         capture_output=True, check=True, timeout=5)\n            return True\n        except (subprocess.CalledProcessError, subprocess.TimeoutExpired, FileNotFoundError):\n            return False\n    \n    def ensure_models(self):\n        \"\"\"Ensure required Ollama models are available.\"\"\"\n        self.logger.info(\"📥 Checking required models...\")\n        \n      
  required_models = ['qwen3:8b', 'qwen3:0.6b']\n        \n        try:\n            # Get list of installed models\n            result = subprocess.run(['ollama', 'list'], \n                                  capture_output=True, text=True, timeout=10)\n            installed_models = result.stdout\n            \n            for model in required_models:\n                if model not in installed_models:\n                    self.logger.info(f\"📥 Pulling {model}...\")\n                    subprocess.run(['ollama', 'pull', model], \n                                 check=True, timeout=300)  # 5 min timeout\n                    self.logger.info(f\"✅ {model} ready\")\n                else:\n                    self.logger.info(f\"✅ {model} already available\")\n                    \n        except subprocess.TimeoutExpired:\n            self.logger.warning(\"⚠️  Model check timed out - continuing anyway\")\n        except subprocess.CalledProcessError as e:\n            self.logger.warning(f\"⚠️  Could not check/pull models: {e}\")\n    \n    def start_service(self, service_name: str, config: ServiceConfig) -> bool:\n        \"\"\"Start a single service.\"\"\"\n        if service_name in self.processes:\n            self.logger.warning(f\"⚠️  {service_name} already running\")\n            return True\n        \n        # Check if port is in use\n        if self.is_port_in_use(config.port):\n            self.logger.warning(f\"⚠️  Port {config.port} already in use, skipping {service_name}\")\n            return not config.required\n        \n        self.logger.info(f\"🔄 Starting {service_name} on port {config.port}...\")\n        \n        try:\n            # Setup environment\n            env = os.environ.copy()\n            if config.env:\n                env.update(config.env)\n            \n            # Start process\n            process = subprocess.Popen(\n                config.command,\n                cwd=config.cwd,\n                env=env,\n                
stdout=subprocess.PIPE,\n                stderr=subprocess.STDOUT,\n                text=True,\n                bufsize=1,\n                universal_newlines=True\n            )\n            \n            self.processes[service_name] = process\n            \n            # Start log monitoring thread\n            log_thread = threading.Thread(\n                target=self._monitor_service_logs,\n                args=(service_name, process),\n                daemon=True\n            )\n            log_thread.start()\n            self.log_threads[service_name] = log_thread\n            \n            # Wait for startup\n            time.sleep(config.startup_delay)\n            \n            # Check if process is still running\n            if process.poll() is None:\n                self.logger.info(f\"✅ {service_name} started successfully (PID: {process.pid})\")\n                return True\n            else:\n                self.logger.error(f\"❌ {service_name} failed to start\")\n                return False\n                \n        except Exception as e:\n            self.logger.error(f\"❌ Failed to start {service_name}: {e}\")\n            return False\n    \n    def _monitor_service_logs(self, service_name: str, process: subprocess.Popen):\n        \"\"\"Monitor service logs and forward to main logger.\"\"\"\n        service_logger = logging.getLogger(service_name)\n        service_logger.setLevel(logging.INFO)\n        \n        # Add file handler for this service\n        file_handler = logging.FileHandler(self.logs_dir / f'{service_name}.log')\n        file_handler.setFormatter(logging.Formatter('%(asctime)s %(message)s'))\n        service_logger.addHandler(file_handler)\n        \n        try:\n            for line in iter(process.stdout.readline, ''):\n                if line.strip():\n                    # Create log record with service context\n                    record = logging.LogRecord(\n                        name=service_name,\n                  
      level=logging.INFO,\n                        pathname='',\n                        lineno=0,\n                        msg=line.strip(),\n                        args=(),\n                        exc_info=None\n                    )\n                    record.service = service_name\n                    \n                    # Log to both service file and main console\n                    service_logger.handle(record)\n                    self.logger.handle(record)\n                    \n        except Exception as e:\n            self.logger.error(f\"Error monitoring {service_name} logs: {e}\")\n    \n    def health_check(self, service_name: str, config: ServiceConfig) -> bool:\n        \"\"\"Perform health check on a service.\"\"\"\n        try:\n            url = f\"http://localhost:{config.port}{config.health_check_path}\"\n            response = requests.get(url, timeout=5)\n            return response.status_code == 200\n        except:\n            return False\n    \n    def start_all(self, skip_frontend: bool = False) -> bool:\n        \"\"\"Start all services in order.\"\"\"\n        self.logger.info(\"🚀 Starting RAG System Components...\")\n        \n        if not self.check_prerequisites():\n            return False\n        \n        self.running = True\n        failed_services = []\n        \n        # Start services in dependency order\n        service_order = ['ollama', 'rag-api', 'backend']\n        if not skip_frontend and 'frontend' in self.services:\n            service_order.append('frontend')\n        \n        for service_name in service_order:\n            if service_name not in self.services:\n                continue\n                \n            config = self.services[service_name]\n            \n            # Special handling for Ollama\n            if service_name == 'ollama':\n                if not self._start_ollama():\n                    if config.required:\n                        failed_services.append(service_name)\n      
                  continue\n                    else:\n                        self.logger.warning(f\"⚠️  Skipping optional service: {service_name}\")\n                        continue\n            else:\n                if not self.start_service(service_name, config):\n                    if config.required:\n                        failed_services.append(service_name)\n                    else:\n                        self.logger.warning(f\"⚠️  Skipping optional service: {service_name}\")\n        \n        if failed_services:\n            self.logger.error(f\"❌ Failed to start required services: {', '.join(failed_services)}\")\n            return False\n        \n        # Print status summary\n        self._print_status_summary()\n        return True\n    \n    def _start_ollama(self) -> bool:\n        \"\"\"Special handling for Ollama startup.\"\"\"\n        # Check if Ollama is already running\n        if self.is_port_in_use(11434):\n            self.logger.info(\"✅ Ollama already running\")\n            self.ensure_models()\n            return True\n        \n        # Start Ollama\n        if self.start_service('ollama', self.services['ollama']):\n            self.ensure_models()\n            return True\n        \n        return False\n    \n    def _print_status_summary(self):\n        \"\"\"Print system status summary.\"\"\"\n        self.logger.info(\"\")\n        self.logger.info(\"🎉 RAG System Started!\")\n        self.logger.info(\"📊 Services Status:\")\n        \n        for service_name, config in self.services.items():\n            if service_name in self.processes or self.is_port_in_use(config.port):\n                status = \"✅ Running\"\n                url = f\"http://localhost:{config.port}\"\n                self.logger.info(f\"   • {service_name.capitalize():<10}: {status:<10} {url}\")\n            else:\n                self.logger.info(f\"   • {service_name.capitalize():<10}: ❌ Stopped\")\n        \n        self.logger.info(\"\")\n      
  self.logger.info(\"🌐 Access your RAG system at: http://localhost:3000\")\n        self.logger.info(\"\")\n        self.logger.info(\"📋 Useful commands:\")\n        self.logger.info(\"   • Stop system:  Ctrl+C\")\n        self.logger.info(\"   • Check logs:   tail -f logs/*.log\")\n        self.logger.info(\"   • Health check: python run_system.py --health\")\n    \n    def shutdown(self):\n        \"\"\"Gracefully shutdown all services.\"\"\"\n        if not self.running:\n            return\n        \n        self.logger.info(\"🛑 Shutting down RAG system...\")\n        self.running = False\n        \n        # Stop services in reverse order\n        for service_name in reversed(list(self.processes.keys())):\n            self._stop_service(service_name)\n        \n        self.logger.info(\"✅ All services stopped\")\n    \n    def _stop_service(self, service_name: str):\n        \"\"\"Stop a single service.\"\"\"\n        if service_name not in self.processes:\n            return\n        \n        process = self.processes[service_name]\n        self.logger.info(f\"🔄 Stopping {service_name}...\")\n        \n        try:\n            # Try graceful shutdown first\n            process.terminate()\n            \n            # Wait up to 10 seconds for graceful shutdown\n            try:\n                process.wait(timeout=10)\n            except subprocess.TimeoutExpired:\n                # Force kill if graceful shutdown fails\n                process.kill()\n                process.wait()\n            \n            self.logger.info(f\"✅ {service_name} stopped\")\n            \n        except Exception as e:\n            self.logger.error(f\"❌ Error stopping {service_name}: {e}\")\n        finally:\n            del self.processes[service_name]\n    \n    def monitor(self):\n        \"\"\"Monitor running services and restart if needed.\"\"\"\n        self.logger.info(\"👁️  Monitoring services... 
(Press Ctrl+C to stop)\")\n        \n        try:\n            while self.running:\n                time.sleep(30)  # Check every 30 seconds\n                \n                for service_name, process in list(self.processes.items()):\n                    if process.poll() is not None:\n                        self.logger.warning(f\"⚠️  {service_name} has stopped unexpectedly\")\n                        \n                        # Restart the service\n                        config = self.services[service_name]\n                        if config.required:\n                            self.logger.info(f\"🔄 Restarting {service_name}...\")\n                            del self.processes[service_name]\n                            self.start_service(service_name, config)\n                        \n        except KeyboardInterrupt:\n            self.logger.info(\"Monitoring stopped by user\")\n\ndef main():\n    \"\"\"Main entry point.\"\"\"\n    parser = argparse.ArgumentParser(description='RAG System Unified Launcher')\n    parser.add_argument('--mode', choices=['dev', 'prod'], default='dev',\n                       help='Run mode (default: dev)')\n    parser.add_argument('--logs-only', action='store_true',\n                       help='Only show aggregated logs from running services')\n    parser.add_argument('--no-frontend', action='store_true',\n                       help='Skip frontend startup')\n    parser.add_argument('--health', action='store_true',\n                       help='Check health of running services')\n    parser.add_argument('--stop', action='store_true',\n                       help='Stop all running services')\n    \n    args = parser.parse_args()\n    \n    # Create service manager\n    manager = ServiceManager(mode=args.mode)\n    \n    try:\n        if args.health:\n            # Health check mode\n            manager._print_status_summary()\n            return\n        \n        if args.stop:\n            # Stop mode - kill any running 
processes\n            manager.logger.info(\"🛑 Stopping all RAG system processes...\")\n            # Implementation for stopping would go here\n            return\n        \n        if args.logs_only:\n            # Logs only mode - just tail existing logs\n            manager.logger.info(\"📋 Showing aggregated logs... (Press Ctrl+C to stop)\")\n            manager.monitor()\n            return\n        \n        # Normal startup mode\n        if manager.start_all(skip_frontend=args.no_frontend):\n            manager.monitor()\n        else:\n            manager.logger.error(\"❌ System startup failed\")\n            sys.exit(1)\n            \n    except KeyboardInterrupt:\n        manager.logger.info(\"Received interrupt signal\")\n    finally:\n        manager.shutdown()\n\nif __name__ == \"__main__\":\n    main() "
  },
  {
    "path": "setup_rag_system.sh",
    "content": "#!/bin/bash\n# setup_rag_system.sh - Complete RAG System Setup Script\n# This script handles Docker installation, system setup, and initial configuration\n\nset -e\n\n# Colors for output\nRED='\\033[0;31m'\nGREEN='\\033[0;32m'\nYELLOW='\\033[1;33m'\nBLUE='\\033[0;34m'\nNC='\\033[0m' # No Color\n\n# Logging function\nlog() {\n    echo -e \"${GREEN}[$(date +'%Y-%m-%d %H:%M:%S')] $1${NC}\"\n}\n\nwarn() {\n    echo -e \"${YELLOW}[$(date +'%Y-%m-%d %H:%M:%S')] WARNING: $1${NC}\"\n}\n\nerror() {\n    echo -e \"${RED}[$(date +'%Y-%m-%d %H:%M:%S')] ERROR: $1${NC}\"\n}\n\ninfo() {\n    echo -e \"${BLUE}[$(date +'%Y-%m-%d %H:%M:%S')] INFO: $1${NC}\"\n}\n\n# Check if running as root\nif [[ $EUID -eq 0 ]]; then\n    error \"This script should not be run as root (except for package installation steps)\"\n    exit 1\nfi\n\necho \"================================================================\"\necho \"🚀 RAG System Complete Setup Script\"\necho \"================================================================\"\necho \"\"\n\n# Step 1: System Requirements Check\nlog \"Step 1: Checking system requirements...\"\n\n# Check OS\nif [[ \"$OSTYPE\" == \"darwin\"* ]]; then\n    OS=\"macos\"\n    info \"Detected macOS\"\nelif [[ -f /etc/os-release ]]; then\n    . /etc/os-release\n    OS=$ID\n    info \"Detected Linux: $OS\"\nelse\n    error \"Unsupported operating system\"\n    exit 1\nfi\n\n# Check available memory\nMEMORY_GB=$(free -g 2>/dev/null | grep '^Mem:' | awk '{print $2}' || sysctl -n hw.memsize 2>/dev/null | awk '{print int($1/1024/1024/1024)}' || echo \"unknown\")\nif [[ \"$MEMORY_GB\" != \"unknown\" && \"$MEMORY_GB\" -lt 8 ]]; then\n    warn \"System has ${MEMORY_GB}GB RAM. Recommended: 16GB+ for optimal performance\"\nelse\n    info \"Memory check passed: ${MEMORY_GB}GB RAM\"\nfi\n\n# Check available disk space\nDISK_GB=$(df -BG . 
| tail -1 | awk '{print $4}' | sed 's/G//' || echo \"unknown\")\nif [[ \"$DISK_GB\" != \"unknown\" && \"$DISK_GB\" -lt 50 ]]; then\n    warn \"Available disk space: ${DISK_GB}GB. Recommended: 50GB+ free space\"\nelse\n    info \"Disk space check passed: ${DISK_GB}GB available\"\nfi\n\n# Step 2: Install Dependencies\nlog \"Step 2: Installing system dependencies...\"\n\n# Install Git if not present\nif ! command -v git &> /dev/null; then\n    info \"Installing Git...\"\n    case $OS in\n        \"macos\")\n            if command -v brew &> /dev/null; then\n                brew install git\n            else\n                error \"Git not found. Please install Git first or install Homebrew\"\n                exit 1\n            fi\n            ;;\n        \"ubuntu\"|\"debian\")\n            sudo apt-get update\n            sudo apt-get install -y git\n            ;;\n        \"centos\"|\"rhel\"|\"fedora\")\n            if command -v dnf &> /dev/null; then\n                sudo dnf install -y git\n            else\n                sudo yum install -y git\n            fi\n            ;;\n    esac\nelse\n    info \"Git is already installed: $(git --version)\"\nfi\n\n# Install curl if not present\nif ! command -v curl &> /dev/null; then\n    info \"Installing curl...\"\n    case $OS in\n        \"macos\")\n            # curl is usually pre-installed on macOS\n            ;;\n        \"ubuntu\"|\"debian\")\n            sudo apt-get install -y curl\n            ;;\n        \"centos\"|\"rhel\"|\"fedora\")\n            if command -v dnf &> /dev/null; then\n                sudo dnf install -y curl\n            else\n                sudo yum install -y curl\n            fi\n            ;;\n    esac\nelse\n    info \"curl is already installed\"\nfi\n\n# Step 3: Install Docker\nlog \"Step 3: Installing Docker...\"\n\nif command -v docker &> /dev/null; then\n    info \"Docker is already installed: $(docker --version)\"\nelse\n    info \"Docker not found. 
Installing Docker...\"\n    \n    case $OS in\n        \"macos\")\n            # Check if Homebrew is installed\n            if ! command -v brew &> /dev/null; then\n                info \"Installing Homebrew...\"\n                /bin/bash -c \"$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)\"\n            fi\n            \n            # Install Docker Desktop\n            info \"Installing Docker Desktop...\"\n            brew install --cask docker\n            \n            warn \"Docker Desktop installed. Please:\"\n            warn \"1. Start Docker Desktop from Applications\"\n            warn \"2. Wait for Docker to start completely\"\n            warn \"3. Run this script again\"\n            exit 0\n            ;;\n            \n        \"ubuntu\"|\"debian\")\n            # Update package index\n            sudo apt-get update\n            \n            # Install dependencies\n            sudo apt-get install -y \\\n                ca-certificates \\\n                curl \\\n                gnupg \\\n                lsb-release\n            \n            # Add Docker's official GPG key\n            sudo mkdir -p /etc/apt/keyrings\n            curl -fsSL https://download.docker.com/linux/$OS/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg\n            \n            # Set up repository\n            echo \\\n              \"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/$OS \\\n              $(lsb_release -cs) stable\" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null\n            \n            # Install Docker Engine\n            sudo apt-get update\n            sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin\n            \n            # Add user to docker group\n            sudo usermod -aG docker $USER\n            \n            # Start Docker service\n            sudo systemctl enable docker\n            
sudo systemctl start docker\n            \n            info \"Docker installed successfully!\"\n            warn \"Please log out and log back in for group changes to take effect, then run this script again\"\n            warn \"Or run: newgrp docker && $0\"\n            exit 0\n            ;;\n            \n        \"centos\"|\"rhel\"|\"fedora\")\n            # Install required packages\n            if command -v dnf &> /dev/null; then\n                sudo dnf install -y yum-utils\n                sudo dnf config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo\n                sudo dnf install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin\n            else\n                sudo yum install -y yum-utils\n                sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo\n                sudo yum install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin\n            fi\n            \n            # Add user to docker group\n            sudo usermod -aG docker $USER\n            \n            # Start Docker service\n            sudo systemctl enable docker\n            sudo systemctl start docker\n            \n            info \"Docker installed successfully!\"\n            warn \"Please log out and log back in for group changes to take effect, then run this script again\"\n            exit 0\n            ;;\n    esac\nfi\n\n# Verify Docker is working\nif ! docker --version &> /dev/null; then\n    error \"Docker is not working properly. Please check Docker installation\"\n    exit 1\nfi\n\nif ! docker compose version &> /dev/null; then\n    error \"Docker Compose is not working properly. Please check Docker Compose installation\"\n    exit 1\nfi\n\ninfo \"Docker verification passed: $(docker --version)\"\ninfo \"Docker Compose verification passed: $(docker compose version)\"\n\n# Test Docker daemon\nif ! 
docker ps &> /dev/null; then\n    error \"Cannot connect to Docker daemon. Please ensure Docker is running\"\n    exit 1\nfi\n\n# Step 4: Setup RAG System\nlog \"Step 4: Setting up RAG System...\"\n\n# Create project directory structure\ninfo \"Creating directory structure...\"\nmkdir -p {lancedb,shared_uploads,logs,ollama_data}\nmkdir -p index_store/{overviews,bm25,graph}\nmkdir -p backups\n\n# Set proper permissions\nchmod 755 {lancedb,shared_uploads,logs,ollama_data}\nchmod 755 index_store/{overviews,bm25,graph}\nchmod 755 backups\n\n# Create environment file\nif [[ ! -f \".env\" ]]; then\n    info \"Creating environment configuration...\"\n    cat > .env << 'EOF'\n# System Configuration\nNODE_ENV=production\nLOG_LEVEL=info\nDEBUG=false\n\n# Service URLs\nFRONTEND_URL=http://localhost:3000\nBACKEND_URL=http://localhost:8000\nRAG_API_URL=http://localhost:8001\nOLLAMA_URL=http://localhost:11434\n\n# Database Configuration\nDATABASE_PATH=./backend/chat_data.db\nLANCEDB_PATH=./lancedb\nUPLOADS_PATH=./shared_uploads\nINDEX_STORE_PATH=./index_store\n\n# Model Configuration\nDEFAULT_EMBEDDING_MODEL=sentence-transformers/all-mpnet-base-v2\n# Default model names - updated to current versions\nDEFAULT_GENERATION_MODEL=qwen3:8b\nDEFAULT_RERANKER_MODEL=answerdotai/answerai-colbert-small-v1\nDEFAULT_ENRICHMENT_MODEL=qwen3:0.6b\n\n# Performance Configuration\nMAX_CONCURRENT_REQUESTS=5\nREQUEST_TIMEOUT=300\nEMBEDDING_BATCH_SIZE=32\nMAX_CONTEXT_LENGTH=4096\n\n# Security Configuration\nCORS_ORIGINS=http://localhost:3000\nAPI_KEY_REQUIRED=false\nRATE_LIMIT_REQUESTS=100\nRATE_LIMIT_WINDOW=60\n\n# Storage Configuration\nMAX_FILE_SIZE=50MB\nMAX_UPLOAD_FILES=10\nCLEANUP_INTERVAL=3600\nBACKUP_RETENTION_DAYS=30\nEOF\n    info \"Environment file created: .env\"\nelse\n    info \"Environment file already exists: .env\"\nfi\n\n# Step 5: Build and Start Services\nlog \"Step 5: Building and starting services...\"\n\ninfo \"Building Docker containers (this may take 10-15 
minutes)...\"\ndocker compose build --no-cache\n\ninfo \"Starting services...\"\ndocker compose up -d\n\n# Wait for services to start\ninfo \"Waiting for services to initialize...\"\nsleep 30\n\n# Check service status\ninfo \"Checking service status...\"\ndocker compose ps\n\n# Step 6: Install AI Models\nlog \"Step 6: Installing AI models...\"\n\n# Wait for Ollama to be ready\ninfo \"Waiting for Ollama to be ready...\"\nmax_attempts=30\nattempt=0\nwhile ! docker compose exec ollama ollama list &> /dev/null; do\n    if [ \"$attempt\" -ge \"$max_attempts\" ]; then\n        error \"Ollama failed to start after $max_attempts attempts\"\n        exit 1\n    fi\n    info \"Waiting for Ollama... (attempt $((attempt+1))/$max_attempts)\"\n    sleep 10\n    # Plain assignment: ((attempt++)) returns a non-zero status when attempt is 0,\n    # which would abort the script if 'set -e' is in effect\n    attempt=$((attempt + 1))\ndone\n\n# Download Ollama models\ninfo \"Downloading required Ollama models...\"\ndocker compose exec ollama ollama pull qwen3:8b\ndocker compose exec ollama ollama pull qwen3:0.6b\n\ninfo \"Verifying model installation...\"\ndocker compose exec ollama ollama list\n\n# Step 7: System Verification\nlog \"Step 7: Verifying system installation...\"\n\n# Check service health\ninfo \"Checking service health...\"\nservices=(\"frontend:3000\" \"backend:8000\" \"rag-api:8001\" \"ollama:11434\")\nfor service in \"${services[@]}\"; do\n    name=\"${service%:*}\"\n    port=\"${service#*:}\"\n    \n    if curl -s -f \"http://localhost:$port\" &> /dev/null || curl -s -f \"http://localhost:$port/health\" &> /dev/null || curl -s -f \"http://localhost:$port/api/tags\" &> /dev/null || curl -s -f \"http://localhost:$port/models\" &> /dev/null; then\n        info \"✅ $name service is healthy\"\n    else\n        warn \"⚠️ $name service may not be ready yet\"\n    fi\ndone\n\n# Step 8: Create Helper Scripts\nlog \"Step 8: Creating helper scripts...\"\n\n# Create start script\ncat > start_rag_system.sh << 'EOF'\n#!/bin/bash\n# Start RAG System\necho \"Starting RAG System...\"\ndocker compose up -d\necho \"RAG System started. 
Access at: http://localhost:3000\"\nEOF\nchmod +x start_rag_system.sh\n\n# Create stop script\ncat > stop_rag_system.sh << 'EOF'\n#!/bin/bash\n# Stop RAG System\necho \"Stopping RAG System...\"\ndocker compose down\necho \"RAG System stopped.\"\nEOF\nchmod +x stop_rag_system.sh\n\n# Create status script\ncat > status_rag_system.sh << 'EOF'\n#!/bin/bash\n# Check RAG System Status\necho \"=== RAG System Status ===\"\ndocker compose ps\necho \"\"\necho \"=== Service Health ===\"\ncurl -s -f http://localhost:3000 && echo \"✅ Frontend: OK\" || echo \"❌ Frontend: FAIL\"\ncurl -s -f http://localhost:8000/health && echo \"✅ Backend: OK\" || echo \"❌ Backend: FAIL\"\ncurl -s -f http://localhost:8001/models && echo \"✅ RAG API: OK\" || echo \"❌ RAG API: FAIL\"\ncurl -s -f http://localhost:11434/api/tags && echo \"✅ Ollama: OK\" || echo \"❌ Ollama: FAIL\"\nEOF\nchmod +x status_rag_system.sh\n\n# Create backup script\ncat > backup_rag_system.sh << 'EOF'\n#!/bin/bash\n# Backup RAG System Data\nBACKUP_DIR=\"./backups/$(date +%Y%m%d_%H%M%S)\"\nmkdir -p \"$BACKUP_DIR\"\n\necho \"Creating backup in $BACKUP_DIR...\"\n\n# Stop services\ndocker compose down\n\n# Backup data\ncp -r ./backend/chat_data.db \"$BACKUP_DIR/\" 2>/dev/null || true\ncp -r ./lancedb \"$BACKUP_DIR/\" 2>/dev/null || true\ncp -r ./shared_uploads \"$BACKUP_DIR/\" 2>/dev/null || true\ncp -r ./index_store \"$BACKUP_DIR/\" 2>/dev/null || true\n\n# Backup configuration\ncp .env \"$BACKUP_DIR/\"\ncp docker-compose.yml \"$BACKUP_DIR/\"\n\n# Restart services\ndocker compose up -d\n\necho \"Backup completed: $BACKUP_DIR\"\nEOF\nchmod +x backup_rag_system.sh\n\n# Create update script\ncat > update_rag_system.sh << 'EOF'\n#!/bin/bash\n# Update RAG System\necho \"Updating RAG System...\"\n\n# Backup first\n./backup_rag_system.sh\n\n# Pull latest changes\ngit pull origin main\n\n# Rebuild containers\ndocker compose build --no-cache\n\n# Restart services\ndocker compose up -d\n\necho \"Update completed!\"\nEOF\nchmod +x 
update_rag_system.sh\n\ninfo \"Helper scripts created:\"\ninfo \"  - start_rag_system.sh: Start the system\"\ninfo \"  - stop_rag_system.sh: Stop the system\"\ninfo \"  - status_rag_system.sh: Check system status\"\ninfo \"  - backup_rag_system.sh: Backup system data\"\ninfo \"  - update_rag_system.sh: Update the system\"\n\n# Step 9: Final Setup\nlog \"Step 9: Final setup and verification...\"\n\n# Create initial database if it doesn't exist\nif [[ ! -f \"./backend/chat_data.db\" ]]; then\n    info \"Creating initial database...\"\n    docker compose exec backend python -c \"\nimport sqlite3\nconn = sqlite3.connect('/app/backend/chat_data.db')\nconn.execute('CREATE TABLE IF NOT EXISTS sessions (id TEXT PRIMARY KEY, title TEXT, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP)')\nconn.execute('CREATE TABLE IF NOT EXISTS messages (id INTEGER PRIMARY KEY, session_id TEXT, content TEXT, role TEXT, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP)')\nconn.execute('CREATE TABLE IF NOT EXISTS indexes (id TEXT PRIMARY KEY, name TEXT, metadata TEXT, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP)')\nconn.execute('CREATE TABLE IF NOT EXISTS session_indexes (session_id TEXT, index_id TEXT, PRIMARY KEY (session_id, index_id))')\nconn.commit()\nconn.close()\nprint('Database initialized')\n\" 2>/dev/null || warn \"Database initialization may have failed\"\nfi\n\n# Final health check\ninfo \"Performing final health check...\"\nsleep 10\n./status_rag_system.sh\n\necho \"\"\necho \"================================================================\"\necho \"🎉 RAG System Setup Complete!\"\necho \"================================================================\"\necho \"\"\necho \"✅ System Status:\"\necho \"   - Frontend: http://localhost:3000\"\necho \"   - Backend API: http://localhost:8000\"\necho \"   - RAG API: http://localhost:8001\"\necho \"   - Ollama: http://localhost:11434\"\necho \"\"\necho \"📚 Documentation:\"\necho \"   - System Overview: 
Documentation/system_overview.md\"\necho \"   - Deployment Guide: Documentation/deployment_guide.md\"\necho \"   - Docker Usage: Documentation/docker_usage.md\"\necho \"   - Installation Guide: Documentation/installation_guide.md\"\necho \"\"\necho \"🔧 Helper Scripts:\"\necho \"   - Start system: ./start_rag_system.sh\"\necho \"   - Stop system: ./stop_rag_system.sh\"\necho \"   - Check status: ./status_rag_system.sh\"\necho \"   - Backup data: ./backup_rag_system.sh\"\necho \"   - Update system: ./update_rag_system.sh\"\necho \"\"\necho \"🚀 Next Steps:\"\necho \"   1. Open http://localhost:3000 in your browser\"\necho \"   2. Create a new chat session\"\necho \"   3. Upload some PDF documents\"\necho \"   4. Start asking questions about your documents!\"\necho \"\"\necho \"📋 System Information:\"\necho \"   - OS: $OS\"\necho \"   - Memory: ${MEMORY_GB}GB\"\necho \"   - Disk Space: ${DISK_GB}GB available\"\necho \"   - Docker: $(docker --version)\"\necho \"   - Docker Compose: $(docker compose version)\"\necho \"\"\necho \"For support and troubleshooting, check the documentation in the\"\necho \"Documentation/ folder or run ./status_rag_system.sh to check system health.\"\necho \"\" "
  },
  {
    "path": "simple_create_index.sh",
    "content": "#!/bin/bash\n\n# Simple Index Creation Script for LocalGPT RAG System\n# Usage: ./simple_create_index.sh \"Index Name\" \"path/to/document.pdf\" [additional_files...]\n\nset -e  # Exit on any error\n\n# Colors for output\nRED='\\033[0;31m'\nGREEN='\\033[0;32m'\nYELLOW='\\033[1;33m'\nBLUE='\\033[0;34m'\nNC='\\033[0m' # No Color\n\n# Function to print colored output\nprint_status() {\n    echo -e \"${BLUE}[INFO]${NC} $1\"\n}\n\nprint_success() {\n    echo -e \"${GREEN}[SUCCESS]${NC} $1\"\n}\n\nprint_warning() {\n    echo -e \"${YELLOW}[WARNING]${NC} $1\"\n}\n\nprint_error() {\n    echo -e \"${RED}[ERROR]${NC} $1\"\n}\n\n# Function to check if a command exists\ncommand_exists() {\n    command -v \"$1\" >/dev/null 2>&1\n}\n\n# Function to check prerequisites\ncheck_prerequisites() {\n    print_status \"Checking prerequisites...\"\n    \n    # Check Python\n    if ! command_exists python3; then\n        print_error \"Python 3 is required but not installed.\"\n        exit 1\n    fi\n    \n    # Check if we're in the right directory\n    if [ ! -f \"run_system.py\" ] || [ ! -d \"rag_system\" ]; then\n        print_error \"This script must be run from the LocalGPT project root directory.\"\n        exit 1\n    fi\n    \n    # Check if Ollama is running\n    if ! curl -s http://localhost:11434/api/tags >/dev/null 2>&1; then\n        print_error \"Ollama is not running. 
Please start Ollama first:\"\n        echo \"  ollama serve\"\n        exit 1\n    fi\n    \n    print_success \"Prerequisites check passed\"\n}\n\n# Function to validate documents\n# Status messages go to stderr so that stdout carries only the list of\n# valid paths, which the caller captures with command substitution.\nvalidate_documents() {\n    local documents=(\"$@\")\n    local valid_docs=()\n    \n    print_status \"Validating documents...\" >&2\n    \n    for doc in \"${documents[@]}\"; do\n        if [ -f \"$doc\" ]; then\n            # Check file extension\n            case \"${doc##*.}\" in\n                pdf|txt|docx|md|html|htm)\n                    valid_docs+=(\"$doc\")\n                    print_status \"✓ Valid document: $doc\" >&2\n                    ;;\n                *)\n                    print_warning \"Unsupported file type: $doc (skipping)\" >&2\n                    ;;\n            esac\n        else\n            print_warning \"File not found: $doc (skipping)\" >&2\n        fi\n    done\n    \n    if [ ${#valid_docs[@]} -eq 0 ]; then\n        print_error \"No valid documents found.\" >&2\n        exit 1\n    fi\n    \n    echo \"${valid_docs[@]}\"\n}\n\n# Function to create index using Python\ncreate_index() {\n    local index_name=\"$1\"\n    shift\n    local documents=(\"$@\")\n    \n    print_status \"Creating index: $index_name\"\n    print_status \"Documents: ${documents[*]}\"\n    \n    # Create a temporary Python script to create the index\n    cat > /tmp/create_index_temp.py << EOF\n#!/usr/bin/env python3\nimport sys\nimport os\nimport json\nsys.path.insert(0, os.getcwd())\n\nfrom rag_system.main import PIPELINE_CONFIGS\nfrom rag_system.pipelines.indexing_pipeline import IndexingPipeline\nfrom rag_system.utils.ollama_client import OllamaClient\nfrom backend.database import ChatDatabase\nimport uuid\n\ndef create_index_simple():\n    try:\n        # Initialize database\n        db = ChatDatabase()\n        \n        # Create index record\n        index_id = db.create_index(\n            name=\"$index_name\",\n            description=\"Created with simple_create_index.sh\",\n            
metadata={\n                \"chunk_size\": 512,\n                \"chunk_overlap\": 64,\n                \"enable_enrich\": True,\n                \"enable_latechunk\": True,\n                \"retrieval_mode\": \"hybrid\",\n                \"created_by\": \"simple_create_index.sh\"\n            }\n        )\n        \n        # Add documents to index\n        # Document paths are passed as command-line arguments by the shell wrapper\n        documents = sys.argv[1:]\n        for doc_path in documents:\n            if doc_path.strip():  # Skip empty strings\n                filename = os.path.basename(doc_path.strip())\n                db.add_document_to_index(index_id, filename, os.path.abspath(doc_path.strip()))\n        \n        # Initialize pipeline\n        config = PIPELINE_CONFIGS.get(\"default\", {})\n        ollama_client = OllamaClient()\n        ollama_config = {\n            \"generation_model\": \"qwen3:0.6b\",\n            \"embedding_model\": \"qwen3:0.6b\"\n        }\n        \n        pipeline = IndexingPipeline(config, ollama_client, ollama_config)\n        \n        # Process documents\n        valid_docs = [doc.strip() for doc in documents if doc.strip() and os.path.exists(doc.strip())]\n        if valid_docs:\n            pipeline.process_documents(valid_docs)\n        \n        # The index name is substituted by the shell when this file is generated\n        print(\"✅ Index '$index_name' created successfully!\")\n        print(f\"Index ID: {index_id}\")\n        print(f\"Processed {len(valid_docs)} documents\")\n        \n        return index_id\n        \n    except Exception as e:\n        print(f\"❌ Error creating index: {e}\")\n        import traceback\n        traceback.print_exc()\n        return None\n\nif __name__ == \"__main__\":\n    # Exit non-zero on failure so the calling shell script can detect it\n    sys.exit(0 if create_index_simple() is not None else 1)\nEOF\n\n    # Run the Python script, passing each document path as a separate argument\n    local status=0\n    python3 /tmp/create_index_temp.py \"${documents[@]}\" || status=$?\n    \n    # Clean up\n    rm -f /tmp/create_index_temp.py\n    \n    return \"$status\"\n}\n\n# Function to show usage\nshow_usage() {\n    echo \"Usage: $0 \\\"Index Name\\\" \\\"path/to/document.pdf\\\" [additional_files...]\"\n    echo \"\"\n    echo 
\"Examples:\"\n    echo \"  $0 \\\"My Documents\\\" \\\"document.pdf\\\"\"\n    echo \"  $0 \\\"Research Papers\\\" \\\"paper1.pdf\\\" \\\"paper2.pdf\\\" \\\"notes.txt\\\"\"\n    echo \"  $0 \\\"Invoice Collection\\\" ./invoices/*.pdf\"\n    echo \"\"\n    echo \"Supported file types: PDF, TXT, DOCX, MD, HTML\"\n}\n\n# Main script\nmain() {\n    # Check arguments\n    if [ $# -lt 2 ]; then\n        print_error \"Insufficient arguments provided.\"\n        show_usage\n        exit 1\n    fi\n    \n    local index_name=\"$1\"\n    shift\n    local documents=(\"$@\")\n    \n    # Check prerequisites\n    check_prerequisites\n    \n    # Validate documents\n    local valid_documents\n    valid_documents=($(validate_documents \"${documents[@]}\"))\n    \n    if [ ${#valid_documents[@]} -eq 0 ]; then\n        print_error \"No valid documents to process.\"\n        exit 1\n    fi\n    \n    # Create the index\n    print_status \"Starting index creation process...\"\n    create_index \"$index_name\" \"${valid_documents[@]}\"\n    \n    print_success \"Index creation completed!\"\n    print_status \"You can now use the index in the LocalGPT interface.\"\n}\n\n# Run main function with all arguments\nmain \"$@\"  "
  },
  {
    "path": "src/app/globals.css",
    "content": "@import \"tailwindcss\";\n@import \"tw-animate-css\";\n\n@custom-variant dark (&:is(.dark *));\n\n@theme inline {\n  --color-background: var(--background);\n  --color-foreground: var(--foreground);\n  --font-sans: var(--font-geist-sans);\n  --font-mono: var(--font-geist-mono);\n  --color-sidebar-ring: var(--sidebar-ring);\n  --color-sidebar-border: var(--sidebar-border);\n  --color-sidebar-accent-foreground: var(--sidebar-accent-foreground);\n  --color-sidebar-accent: var(--sidebar-accent);\n  --color-sidebar-primary-foreground: var(--sidebar-primary-foreground);\n  --color-sidebar-primary: var(--sidebar-primary);\n  --color-sidebar-foreground: var(--sidebar-foreground);\n  --color-sidebar: var(--sidebar);\n  --color-chart-5: var(--chart-5);\n  --color-chart-4: var(--chart-4);\n  --color-chart-3: var(--chart-3);\n  --color-chart-2: var(--chart-2);\n  --color-chart-1: var(--chart-1);\n  --color-ring: var(--ring);\n  --color-input: var(--input);\n  --color-border: var(--border);\n  --color-destructive: var(--destructive);\n  --color-accent-foreground: var(--accent-foreground);\n  --color-accent: var(--accent);\n  --color-muted-foreground: var(--muted-foreground);\n  --color-muted: var(--muted);\n  --color-secondary-foreground: var(--secondary-foreground);\n  --color-secondary: var(--secondary);\n  --color-primary-foreground: var(--primary-foreground);\n  --color-primary: var(--primary);\n  --color-popover-foreground: var(--popover-foreground);\n  --color-popover: var(--popover);\n  --color-card-foreground: var(--card-foreground);\n  --color-card: var(--card);\n  --radius-sm: calc(var(--radius) - 4px);\n  --radius-md: calc(var(--radius) - 2px);\n  --radius-lg: var(--radius);\n  --radius-xl: calc(var(--radius) + 4px);\n}\n\n:root {\n  --radius: 0.625rem;\n  --background: oklch(1 0 0);\n  --foreground: oklch(0.145 0 0);\n  --card: oklch(1 0 0);\n  --card-foreground: oklch(0.145 0 0);\n  --popover: oklch(1 0 0);\n  --popover-foreground: oklch(0.145 0 
0);\n  --primary: oklch(0.205 0 0);\n  --primary-foreground: oklch(0.985 0 0);\n  --secondary: oklch(0.97 0 0);\n  --secondary-foreground: oklch(0.205 0 0);\n  --muted: oklch(0.97 0 0);\n  --muted-foreground: oklch(0.556 0 0);\n  --accent: oklch(0.97 0 0);\n  --accent-foreground: oklch(0.205 0 0);\n  --destructive: oklch(0.577 0.245 27.325);\n  --border: oklch(0.922 0 0);\n  --input: oklch(0.922 0 0);\n  --ring: oklch(0.708 0 0);\n  --chart-1: oklch(0.646 0.222 41.116);\n  --chart-2: oklch(0.6 0.118 184.704);\n  --chart-3: oklch(0.398 0.07 227.392);\n  --chart-4: oklch(0.828 0.189 84.429);\n  --chart-5: oklch(0.769 0.188 70.08);\n  --sidebar: oklch(0.985 0 0);\n  --sidebar-foreground: oklch(0.145 0 0);\n  --sidebar-primary: oklch(0.205 0 0);\n  --sidebar-primary-foreground: oklch(0.985 0 0);\n  --sidebar-accent: oklch(0.97 0 0);\n  --sidebar-accent-foreground: oklch(0.205 0 0);\n  --sidebar-border: oklch(0.922 0 0);\n  --sidebar-ring: oklch(0.708 0 0);\n}\n\n.dark {\n  --background: oklch(0.145 0 0);\n  --foreground: oklch(0.985 0 0);\n  --card: oklch(0.205 0 0);\n  --card-foreground: oklch(0.985 0 0);\n  --popover: oklch(0.205 0 0);\n  --popover-foreground: oklch(0.985 0 0);\n  --primary: oklch(0.922 0 0);\n  --primary-foreground: oklch(0.205 0 0);\n  --secondary: oklch(0.269 0 0);\n  --secondary-foreground: oklch(0.985 0 0);\n  --muted: oklch(0.269 0 0);\n  --muted-foreground: oklch(0.708 0 0);\n  --accent: oklch(0.269 0 0);\n  --accent-foreground: oklch(0.985 0 0);\n  --destructive: oklch(0.704 0.191 22.216);\n  --border: oklch(1 0 0 / 10%);\n  --input: oklch(1 0 0 / 15%);\n  --ring: oklch(0.556 0 0);\n  --chart-1: oklch(0.488 0.243 264.376);\n  --chart-2: oklch(0.696 0.17 162.48);\n  --chart-3: oklch(0.769 0.188 70.08);\n  --chart-4: oklch(0.627 0.265 303.9);\n  --chart-5: oklch(0.645 0.246 16.439);\n  --sidebar: oklch(0.205 0 0);\n  --sidebar-foreground: oklch(0.985 0 0);\n  --sidebar-primary: oklch(0.488 0.243 264.376);\n  --sidebar-primary-foreground: 
oklch(0.985 0 0);\n  --sidebar-accent: oklch(0.269 0 0);\n  --sidebar-accent-foreground: oklch(0.985 0 0);\n  --sidebar-border: oklch(1 0 0 / 10%);\n  --sidebar-ring: oklch(0.556 0 0);\n}\n\n@layer base {\n  * {\n    @apply border-border outline-ring/50;\n  }\n  html {\n    @apply bg-black overflow-x-hidden overflow-y-hidden;\n    font-size: 17px;\n  }\n  body {\n    @apply bg-black text-white overflow-x-hidden;\n  }\n}\n\n/* Style for <think> tokens */\n.thinking-block summary::-webkit-details-marker {\n  display: none;\n}\n.thinking-block summary::after {\n  content: \"▸\";\n  display: inline-block;\n  margin-left: 4px;\n  transform-origin: center;\n  transition: transform 0.15s ease-out;\n}\n.thinking-block[open] summary::after {\n  transform: rotate(90deg);\n}\n.thinking-block summary {\n  outline: none;\n}\n\n.thinking-block div {\n  color: #9ca3af;\n  font-style: italic;\n}\n"
  },
  {
    "path": "src/app/layout.tsx",
    "content": "import type { Metadata } from \"next\";\nimport { Geist, Geist_Mono } from \"next/font/google\";\nimport \"./globals.css\";\n\nconst geistSans = Geist({\n  variable: \"--font-geist-sans\",\n  subsets: [\"latin\"],\n});\n\nconst geistMono = Geist_Mono({\n  variable: \"--font-geist-mono\",\n  subsets: [\"latin\"],\n});\n\nexport const metadata: Metadata = {\n  title: \"Create Next App\",\n  description: \"Generated by create next app\",\n};\n\nexport default function RootLayout({\n  children,\n}: Readonly<{\n  children: React.ReactNode;\n}>) {\n  return (\n    <html lang=\"en\" className=\"bg-black\">\n      <body\n        className={`${geistSans.variable} ${geistMono.variable} antialiased h-screen overflow-hidden flex flex-col`}\n      >\n        {children}\n      </body>\n    </html>\n  );\n}\n"
  },
  {
    "path": "src/app/page.tsx",
    "content": "import { Demo } from \"@/components/demo\";\n\nexport default function Home() {\n  return (\n    <main className=\"flex flex-col flex-1 min-h-0\">\n      <Demo />\n    </main>\n  );\n}\n"
  },
  {
    "path": "src/components/IndexForm.tsx",
    "content": "\"use client\";\nimport { useState } from 'react';\nimport { GlassInput } from '@/components/ui/GlassInput';\nimport { GlassToggle } from '@/components/ui/GlassToggle';\nimport { AccordionGroup } from '@/components/ui/AccordionGroup';\nimport { ModelSelect } from '@/components/ModelSelect';\nimport { chatAPI, ChatSession } from '@/lib/api';\nimport { InfoTooltip } from '@/components/ui/InfoTooltip';\n\ninterface Props {\n  onClose: () => void;\n  onIndexed?: (session: ChatSession) => void;\n}\n\nexport function IndexForm({ onClose, onIndexed }: Props) {\n  const [files, setFiles] = useState<FileList | null>(null);\n  const [indexName, setIndexName] = useState('');\n  const [chunkSize, setChunkSize] = useState(512);\n  const [chunkOverlap, setChunkOverlap] = useState(64);\n  const [windowSize, setWindowSize] = useState(5);\n  const [enableEnrich, setEnableEnrich] = useState(true);\n  const [retrievalMode, setRetrievalMode] = useState<'hybrid' | 'vector' | 'fts'>('hybrid');\n  const [embeddingModel, setEmbeddingModel] = useState<string>();\n  const DEFAULT_LLM = 'qwen3:0.6b';\n  const [enrichModel, setEnrichModel] = useState<string>(DEFAULT_LLM);\n  const [overviewModel, setOverviewModel] = useState<string>(DEFAULT_LLM);\n  const [batchSizeEmbed, setBatchSizeEmbed] = useState(64);\n  const [batchSizeEnrich, setBatchSizeEnrich] = useState(64);\n  const [loading, setLoading] = useState(false);\n  const [enableLateChunk, setEnableLateChunk] = useState(false);\n  const [enableDoclingChunk, setEnableDoclingChunk] = useState(true);\n\n  const handleSubmit = async () => {\n    if (!files) return;\n    setLoading(true);\n    try {\n      // 1. create index record\n      const { index_id } = await chatAPI.createIndex(indexName);\n\n      // 2. upload files to index\n      await chatAPI.uploadFilesToIndex(index_id, Array.from(files));\n\n      // 3. 
build index (run pipeline) with ALL OPTIONS\n      await chatAPI.buildIndex(index_id, { \n        latechunk: enableLateChunk, \n        doclingChunk: enableDoclingChunk,\n        chunkSize: chunkSize,\n        chunkOverlap: chunkOverlap,\n        retrievalMode: retrievalMode==='fts' ? 'bm25' : retrievalMode,\n        windowSize: windowSize,\n        enableEnrich: enableEnrich,\n        embeddingModel: embeddingModel,\n        enrichModel: enrichModel,\n        overviewModel: overviewModel,\n        batchSizeEmbed: batchSizeEmbed,\n        batchSizeEnrich: batchSizeEnrich\n      });\n\n      // 4. create chat session and link index\n      const session = await chatAPI.createSession(indexName);\n      await chatAPI.linkIndexToSession(session.id, index_id);\n\n      // 5. callback\n      if (onIndexed) onIndexed(session);\n    } catch (e) {\n      console.error('Indexing failed', e);\n      setLoading(false);\n      alert('Indexing failed. See console for details.');\n    }\n  };\n\n  return (\n    <div className=\"relative bg-white/5 backdrop-blur rounded-xl p-6 w-[640px] text-white space-y-6\">\n      {/* Loading overlay */}\n      {loading && (\n        <div className=\"absolute inset-0 bg-black/60 backdrop-blur-sm flex flex-col items-center justify-center rounded-xl z-20\">\n          <div className=\"w-10 h-10 border-4 border-white/30 border-t-transparent rounded-full animate-spin\"></div>\n          <p className=\"mt-4 text-sm text-gray-200\">Indexing… this may take a moment</p>\n        </div>\n      )}\n\n      <h2 className=\"text-lg font-semibold\">Create new index</h2>\n\n      {/* Index name */}\n      <div>\n        <label className=\"block text-xs uppercase tracking-wide text-gray-300 mb-1\">Index name</label>\n        <GlassInput placeholder=\"My project docs\" value={indexName} onChange={(e)=>setIndexName(e.target.value)} />\n      </div>\n\n      {/* Upload & defaults */}\n      <div className=\"space-y-4\">\n        <div>\n          <label 
className=\"block text-xs uppercase tracking-wide text-gray-300 mb-1\">Documents</label>\n          <label\n            htmlFor=\"file-upload\"\n            className=\"flex flex-col items-center justify-center w-full h-32 border border-dashed border-white/20 rounded cursor-pointer hover:border-white/40 transition\"\n            onDragOver={(e)=>e.preventDefault()}\n            onDrop={(e)=>{e.preventDefault(); if(e.dataTransfer.files) setFiles(e.dataTransfer.files)}}\n          >\n            <svg width=\"32\" height=\"32\" viewBox=\"0 0 24 24\" fill=\"none\" stroke=\"currentColor\" strokeWidth=\"1.5\" strokeLinecap=\"round\" strokeLinejoin=\"round\" className=\"mb-2 text-white/80\"><path d=\"M4 16v2a2 2 0 0 0 2 2h12a2 2 0 0 0 2-2v-2\"/><polyline points=\"7 10 12 5 17 10\"/><line x1=\"12\" y1=\"5\" x2=\"12\" y2=\"16\"/></svg>\n            <span className=\"text-xs text-gray-400\">Drag & Drop documents here or click to browse</span>\n            <input id=\"file-upload\" type=\"file\" accept=\"application/pdf,.docx,.doc,.html,.htm,.md,.txt\" multiple className=\"hidden\" onChange={(e)=>setFiles(e.target.files)} />\n          </label>\n          {files && <p className=\"mt-1 text-xs text-green-400\">{files.length} file(s) selected</p>}\n        </div>\n\n        {/* Retrieval mode & Late-chunk toggle */}\n        <div>\n          <label className=\"flex items-center gap-1 text-xs uppercase tracking-wide text-gray-300 mb-1\">Retrieval mode <InfoTooltip text=\"Choose how chunks are found. Hybrid combines full-text search with vectors; FTS uses textual matching only; Vector relies purely on dense similarity.\" /></label>\n          <div className=\"flex gap-3\">\n            {(['hybrid','vector','fts'] as const).map((m)=>(\n              <button key={m} onClick={()=>setRetrievalMode(m)} className={`px-3 py-1 rounded text-xs font-sans ${retrievalMode===m?'bg-white/20':'bg-white/10 hover:bg-white/20'}`}>{m==='fts' ? 
'FTS' : m}</button>\n            ))}\n          </div>\n          <div className=\"grid grid-cols-2 gap-4 mt-3\">\n            <div className=\"flex items-center gap-2\">\n              <span className=\"text-xs text-gray-400\">Late-chunk vectors <InfoTooltip text=\"Split chunks into sub-vectors to improve recall, then merge them back after retrieval.\" size={12} /></span>\n              <GlassToggle checked={enableLateChunk} onChange={setEnableLateChunk} />\n            </div>\n            <div className=\"flex items-center gap-2\">\n              <span className=\"text-xs text-gray-400\">High-recall chunking <InfoTooltip text=\"Advanced sentence-level packing with Docling features for maximum recall. Both modes use token-based sizing.\" size={12} /></span>\n              <GlassToggle checked={enableDoclingChunk} onChange={setEnableDoclingChunk} />\n            </div>\n          </div>\n          <div className=\"grid grid-cols-2 gap-4 mt-4\">\n            <div>\n              <label className=\"flex items-center gap-1 text-xs mb-1 text-gray-400\">Chunk size <InfoTooltip text=\"Maximum token length for each chunk. 
Both legacy and high-recall modes now use token-based sizing.\" size={12} /></label>\n              <GlassInput type=\"number\" value={chunkSize} onChange={(e) => setChunkSize(parseInt(e.target.value))} />\n            </div>\n            <div>\n              <label className=\"flex items-center gap-1 text-xs mb-1 text-gray-400\">Chunk overlap <InfoTooltip text=\"Tokens reused between adjacent chunks to preserve context.\" size={12} /></label>\n              <GlassInput\n                type=\"number\"\n                value={chunkOverlap}\n                onChange={(e) => setChunkOverlap(parseInt(e.target.value))}\n              />\n            </div>\n          </div>\n\n          {/* Embedding & Overview models */}\n          <div className=\"grid grid-cols-2 gap-4 mt-4\">\n            <div>\n              <label className=\"flex items-center gap-1 text-xs mb-1 text-gray-400\">Embedding model <InfoTooltip text=\"Model used to generate dense vectors stored in the index.\" size={12} /></label>\n              <ModelSelect \n                value={embeddingModel} \n                onChange={setEmbeddingModel}\n                type=\"embedding\"\n                placeholder=\"Select embedding model\"\n              />\n            </div>\n            <div>\n              <label className=\"flex items-center gap-1 text-xs mb-1 text-gray-400\">Overview LLM <InfoTooltip text=\"LLM that writes the short overview paragraph per document.\" size={12} /></label>\n              <ModelSelect \n                value={overviewModel}\n                onChange={setOverviewModel}\n                type=\"generation\"\n                placeholder=\"Select overview LLM\"\n              />\n            </div>\n          </div>\n        </div>\n\n        {/* Contextual retrieval section */}\n        <AccordionGroup title={<><span>Contextual Retrieval</span> <InfoTooltip text=\"Adds neighbour chunks into each original chunk then enriches with LLM – improves semantic continuity but 
increases indexing latency.\" /></>}>\n          <div className=\"flex items-center gap-3\">\n            <span className=\"text-xs text-gray-400\">Enable</span>\n            <GlassToggle checked={enableEnrich} onChange={setEnableEnrich} />\n          </div>\n          <div className=\"grid grid-cols-2 gap-4 mt-3\">\n            <div>\n              <label className=\"flex items-center gap-1 text-xs mb-1 text-gray-400\">Context window <InfoTooltip text=\"Number of neighbour chunks included when enriching context.\" size={12} /></label>\n              <GlassInput type=\"number\" value={windowSize} onChange={(e)=>setWindowSize(parseInt(e.target.value))} />\n            </div>\n            <div>\n              <label className=\"block text-xs mb-1 text-gray-400\">Retrieval LLM</label>\n              <ModelSelect \n                value={enrichModel}\n                onChange={setEnrichModel}\n                type=\"generation\"\n                placeholder=\"Select retrieval LLM\"\n              />\n            </div>\n          </div>\n        </AccordionGroup>\n      </div>\n\n      {/* Advanced */}\n      <AccordionGroup title={<><span>Batch Size</span> <InfoTooltip text=\"Control the number of chunks processed per batch. 
Larger values speed up indexing but require more memory.\" /></>}>\n        <div className=\"grid grid-cols-2 gap-4\">\n          <div>\n            <label className=\"flex items-center gap-1 text-xs mb-1 text-gray-400\">Embedding batch size <InfoTooltip text=\"Chunks processed per batch when producing embeddings.\" size={12} /></label>\n            <GlassInput\n              type=\"number\"\n              value={batchSizeEmbed}\n              onChange={(e) => setBatchSizeEmbed(parseInt(e.target.value))}\n            />\n          </div>\n          <div>\n            <label className=\"flex items-center gap-1 text-xs mb-1 text-gray-400\">Context retrieval batch size <InfoTooltip text=\"Chunks sent per request during contextual enrichment.\" size={12} /></label>\n            <GlassInput\n              type=\"number\"\n              value={batchSizeEnrich}\n              onChange={(e) => setBatchSizeEnrich(parseInt(e.target.value))}\n            />\n          </div>\n        </div>\n      </AccordionGroup>\n\n      <div className=\"flex justify-end gap-3 pt-4 border-t border-white/10\">\n        <button onClick={onClose} className=\"px-4 py-2 bg-gray-700 rounded hover:bg-gray-600 text-sm\">\n          Cancel\n        </button>\n        <button\n          disabled={loading || !files || !indexName.trim()}\n          onClick={handleSubmit}\n          className=\"px-4 py-2 bg-green-600 rounded disabled:opacity-40 text-sm\"\n        >\n          {loading ? 'Indexing…' : 'Start indexing'}\n        </button>\n      </div>\n    </div>\n  );\n}                        "
  },
  {
    "path": "src/components/IndexPicker.tsx",
    "content": "import { useEffect, useState } from 'react';\nimport { chatAPI } from '@/lib/api';\n\ninterface Props {\n  onSelect: (indexId: string) => void;\n  onClose: () => void;\n}\n\nexport default function IndexPicker({ onSelect, onClose }: Props) {\n  const [indexes, setIndexes] = useState<any[]>([]);\n  const [loading, setLoading] = useState(true);\n  const [error, setError] = useState<string | null>(null);\n  const [search, setSearch] = useState('');\n\n  const [menuOpenId, setMenuOpenId] = useState<string | null>(null);\n\n  useEffect(() => {\n    (async () => {\n      try {\n        const data = await chatAPI.listIndexes();\n        setIndexes(data.indexes);\n      } catch (e: any) {\n        setError(e.message || 'Failed to load indexes');\n      } finally {\n        setLoading(false);\n      }\n    })();\n  }, []);\n\n  const filtered = indexes.filter(i => i.name.toLowerCase().includes(search.toLowerCase()));\n\n  async function handleDelete(idxId: string, name: string) {\n    if (!confirm(`Delete index \"${name}\"? 
This cannot be undone.`)) return;\n    try {\n      await chatAPI.deleteIndex(idxId);\n      setIndexes(prev => prev.filter(i => i.id!==idxId));\n      setMenuOpenId(null);\n    } catch (e:any){\n      alert(e.message || 'Failed to delete index');\n    }\n  }\n\n  useEffect(() => {\n    function handleOutside(e: MouseEvent) {\n      if ((e.target as Element).closest('.index-row-menu') === null) {\n        setMenuOpenId(null);\n      }\n    }\n    if (menuOpenId) {\n      document.addEventListener('click', handleOutside);\n    }\n    return () => document.removeEventListener('click', handleOutside);\n  }, [menuOpenId]);\n\n  return (\n    <div className=\"fixed inset-0 bg-black/60 backdrop-blur-sm flex items-center justify-center z-50 p-4\">\n      <div className=\"bg-white/5 backdrop-blur rounded-xl w-full max-w-xl max-h-full overflow-y-auto p-6 text-white space-y-6\">\n        <h2 className=\"text-lg font-semibold\">Select an index</h2>\n        <input value={search} onChange={e=>setSearch(e.target.value)} placeholder=\"Search…\" className=\"w-full px-3 py-2 rounded bg-black/30 border border-white/20 focus:outline-none\" />\n        {loading && <p className=\"text-sm text-gray-300\">Loading…</p>}\n        {error && <p className=\"text-sm text-red-400\">{error}</p>}\n        {!loading && !error && (\n          <ul className=\"space-y-2\">\n            {filtered.map(idx => (\n              <li key={idx.id}>\n                <div className=\"relative group\">\n                  <button onClick={()=>onSelect(idx.id)} className=\"w-full px-4 py-3 bg-white/10 hover:bg-white/20 rounded transition flex justify-between items-center pr-10\">\n                    <span className=\"font-medium truncate max-w-[60%]\">{idx.name}</span>\n                    <span className=\"text-xs text-gray-400\">{idx.documents?.length || 0} files</span>\n                  </button>\n\n                  <button onClick={(e)=>{e.stopPropagation(); 
setMenuOpenId(menuOpenId===idx.id?null:idx.id);}} title=\"More actions\" className=\"absolute right-4 top-1/2 -translate-y-1/2 opacity-0 group-hover:opacity-100 text-gray-400 hover:text-white transition text-lg leading-none font-bold\">\n                    …\n                  </button>\n\n                  {menuOpenId===idx.id && (\n                    <div className=\"index-row-menu absolute right-0 top-full mt-1 bg-black/80 backdrop-blur border border-white/10 rounded shadow-lg py-1 w-32 text-sm z-50\">\n                      <button onClick={()=>{onSelect(idx.id); setMenuOpenId(null);}} className=\"block w-full text-left px-4 py-2 hover:bg-white/10\">Open</button>\n                      <button onClick={()=>handleDelete(idx.id, idx.name)} className=\"block w-full text-left px-4 py-2 hover:bg-white/10 text-red-400 hover:text-red-500\">Delete</button>\n                    </div>\n                  )}\n                </div>\n              </li>\n            ))}\n            {filtered.length===0 && <li className=\"text-sm text-gray-400\">No indexes found.</li>}\n          </ul>\n        )}\n        <div className=\"pt-4 border-t border-white/10 flex justify-end\">\n          <button onClick={onClose} className=\"px-4 py-2 bg-gray-700 rounded hover:bg-gray-600 text-sm\">Close</button>\n        </div>\n      </div>\n    </div>\n  );\n} "
  },
  {
    "path": "src/components/IndexWizard.tsx",
    "content": "\"use client\";\nimport { useState } from 'react';\nimport { ModelSelect } from '@/components/ModelSelect';\n\ninterface Props {\n  onClose: () => void;\n}\n\nexport function IndexWizard({ onClose }: Props) {\n  const [files, setFiles] = useState<FileList | null>(null);\n  const [chunkSize, setChunkSize] = useState(512);\n  const [chunkOverlap, setChunkOverlap] = useState(64);\n  const [embeddingModel, setEmbeddingModel] = useState<string>();\n  // TODO: more params\n\n  const handleFile = (e: React.ChangeEvent<HTMLInputElement>) => {\n    setFiles(e.target.files);\n  };\n\n  return (\n    <div className=\"fixed inset-0 bg-black/60 backdrop-blur flex items-center justify-center z-50\">\n      <div className=\"bg-gray-900 w-[600px] max-h-[90vh] overflow-auto rounded-xl p-6 text-white space-y-6\">\n        <h2 className=\"text-lg font-semibold\">Create new index</h2>\n\n        <div className=\"space-y-4\">\n          <div>\n            <label className=\"block text-sm mb-1\">Document files</label>\n            <input type=\"file\" accept=\"application/pdf,.docx,.doc,.html,.htm,.md,.txt\" multiple onChange={handleFile} className=\"text-sm\" />\n          </div>\n\n          <div className=\"grid grid-cols-2 gap-4\">\n            <div>\n              <label className=\"block text-sm mb-1\">Chunk size</label>\n              <input\n                type=\"number\"\n                value={chunkSize}\n                onChange={(e) => setChunkSize(parseInt(e.target.value))}\n                className=\"w-full bg-gray-800 rounded px-2 py-1\"\n              />\n            </div>\n            <div>\n              <label className=\"block text-sm mb-1\">Chunk overlap</label>\n              <input\n                type=\"number\"\n                value={chunkOverlap}\n                onChange={(e) => setChunkOverlap(parseInt(e.target.value))}\n                className=\"w-full bg-gray-800 rounded px-2 py-1\"\n              />\n            </div>\n          
</div>\n\n          <div>\n            <label className=\"block text-sm mb-1\">Embedding model</label>\n            <ModelSelect type=\"embedding\" value={embeddingModel} onChange={setEmbeddingModel} />\n          </div>\n        </div>\n\n        <div className=\"flex justify-end gap-3 pt-4 border-t border-white/10\">\n          <button onClick={onClose} className=\"px-4 py-2 bg-gray-700 rounded hover:bg-gray-600 text-sm\">\n            Cancel\n          </button>\n          <button\n            disabled={!files || !embeddingModel}\n            className=\"px-4 py-2 bg-green-600 rounded disabled:opacity-40 text-sm\"\n          >\n            Start indexing\n          </button>\n        </div>\n      </div>\n    </div>\n  );\n}    "
  },
  {
    "path": "src/components/LandingMenu.tsx",
    "content": "\"use client\";\n\nimport React from 'react';\n\ninterface Props {\n  onSelect: (mode: 'INDEX' | 'CHAT_EXISTING' | 'QUICK_CHAT') => void;\n}\n\nexport function LandingMenu({ onSelect }: Props) {\n  const Tile = ({ label, mode, icon }: { label: string; mode: Props[\"onSelect\"] extends (m: infer U)=>void ? U: never; icon: React.ReactNode;}) => (\n    <button\n      onClick={() => onSelect(mode)}\n      className=\"w-56 h-44 rounded-xl bg-white/5 backdrop-blur border border-white/10 hover:border-white/30 text-white flex flex-col items-center justify-center gap-2 transition\"\n    >\n      {icon}\n      <span className=\"text-sm font-medium\">{label}</span>\n    </button>\n  );\n\n  const FileIcon = (\n    <svg width=\"32\" height=\"32\" viewBox=\"0 0 24 24\" fill=\"none\" stroke=\"currentColor\" strokeWidth=\"1.5\" strokeLinecap=\"round\" strokeLinejoin=\"round\">\n      <path d=\"M14 2H6a2 2 0 0 0-2 2v16a2 2 0 0 0 2 2h12a2 2 0 0 0 2-2V8z\" />\n      <polyline points=\"14 2 14 8 20 8\" />\n    </svg>\n  );\n\n  const DbIcon = (\n    <svg width=\"32\" height=\"32\" viewBox=\"0 0 24 24\" fill=\"none\" stroke=\"currentColor\" strokeWidth=\"1.5\" strokeLinecap=\"round\" strokeLinejoin=\"round\">\n      <ellipse cx=\"12\" cy=\"5\" rx=\"9\" ry=\"3\" />\n      <path d=\"M3 5v6c0 1.7 4 3 9 3s9-1.3 9-3V5\" />\n      <path d=\"M3 11v6c0 1.7 4 3 9 3s9-1.3 9-3v-6\" />\n    </svg>\n  );\n\n  const ChatIcon = (\n    <svg width=\"32\" height=\"32\" viewBox=\"0 0 24 24\" fill=\"none\" stroke=\"currentColor\" strokeWidth=\"1.5\" strokeLinecap=\"round\" strokeLinejoin=\"round\">\n      <path d=\"M21 15a2 2 0 0 1-2 2H7l-4 4V5a2 2 0 0 1 2-2h14a2 2 0 0 1 2 2z\" />\n    </svg>\n  );\n\n  return (\n    <div className=\"flex gap-8\">\n      <Tile label=\"Create new index\" mode={\"INDEX\"} icon={FileIcon} />\n      <Tile label=\"Chat with index\" mode={\"CHAT_EXISTING\"} icon={DbIcon} />\n      <Tile label=\"LLM Chat\" mode={\"QUICK_CHAT\"} icon={ChatIcon} />\n    </div>\n  
);\n} "
  },
  {
    "path": "src/components/Markdown.tsx",
    "content": "// eslint-disable-next-line @typescript-eslint/ban-ts-comment\n// @ts-nocheck\n'use client'\n\nimport dynamic from 'next/dynamic'\nimport React, { useMemo } from 'react'\nimport remarkGfm from 'remark-gfm'\n\n// Dynamically import react-markdown to avoid SSR issues\nconst ReactMarkdown: any = dynamic(() => import('react-markdown') as any, { ssr: false })\n\ninterface MarkdownProps {\n  text: string\n  className?: string\n}\n\nexport default function Markdown({ text, className = '' }: MarkdownProps) {\n  const plugins = useMemo(() => [remarkGfm], [])\n  return (\n    <div className={`prose prose-invert max-w-none ${className}`}>\n      {/* @ts-ignore – react-markdown type doesn't recognise remarkPlugins array */}\n    <ReactMarkdown\n        remarkPlugins={plugins}\n        components={{\n          a: ({ node, ...props }) => (\n            <a {...props} target=\"_blank\" rel=\"noopener noreferrer\" />\n          ),\n        }}\n    >\n      {text}\n    </ReactMarkdown>\n    </div>\n  )\n} "
  },
  {
    "path": "src/components/ModelSelect.tsx",
    "content": "import { useEffect, useState } from 'react';\nimport { chatAPI, ModelsResponse } from '@/lib/api';\n\ninterface Props {\n  value: string | undefined;\n  onChange: (v: string) => void;\n  type: 'generation' | 'embedding';\n  className?: string;\n  placeholder?: string;\n}\n\nexport function ModelSelect({ value, onChange, type, className, placeholder }: Props) {\n  const [models, setModels] = useState<string[]>([]);\n  const [loading, setLoading] = useState(true);\n  const [error, setError] = useState<string | null>(null);\n\n  useEffect(() => {\n    let mounted = true;\n    chatAPI\n      .getModels()\n      .then((res: ModelsResponse) => {\n        if (!mounted) return;\n        const list = type === 'generation' ? res.generation_models : res.embedding_models;\n        setModels(list);\n        // Auto-select default qwen3:0.6b if available and not chosen yet\n        if(!value && list.includes('qwen3:0.6b')){\n          onChange('qwen3:0.6b');\n        }\n        setLoading(false);\n      })\n      .catch((e) => {\n        if (!mounted) return;\n        setError(String(e));\n        setLoading(false);\n      });\n    return () => {\n      mounted = false;\n    };\n  }, [type]);\n\n  if (loading) {\n    return (\n      <select className={className} disabled>\n        <option>Loading…</option>\n      </select>\n    );\n  }\n  if (error || models.length === 0) {\n    return (\n      <select className={className} disabled>\n        <option>No models</option>\n      </select>\n    );\n  }\n\n  return (\n    <select\n      className={`w-full px-3 py-2 bg-gray-700 border border-gray-600 rounded-lg text-white text-sm focus:outline-none focus:ring-2 focus:ring-blue-500 focus:border-transparent ${className || ''}`}\n      value={value || ''}\n      onChange={(e) => onChange(e.target.value)}\n    >\n      <option value=\"\" disabled>\n        {placeholder || `Select ${type === 'generation' ? 
'LLM' : 'embed model'}`}\n      </option>\n      {models.map((m) => (\n        <option key={m} value={m}>\n          {m}\n        </option>\n      ))}\n    </select>\n  );\n} "
  },
  {
    "path": "src/components/SessionIndexInfo.tsx",
    "content": "import { useEffect, useState } from 'react';\nimport { chatAPI, ChatSession } from '@/lib/api';\n\ninterface Props {\n  sessionId: string;\n  onClose: () => void;\n}\n\nexport default function SessionIndexInfo({ sessionId, onClose }: Props) {\n  const [files, setFiles] = useState<string[]>([]);\n  const [indexMeta, setIndexMeta] = useState<any | null>(null);\n  const [session, setSession] = useState<ChatSession | null>(null);\n  const [loading, setLoading] = useState(true);\n  const [error, setError] = useState<string | null>(null);\n\n  useEffect(() => {\n    (async () => {\n      try {\n        const data = await chatAPI.getSessionIndexes(sessionId);\n        const first = data.indexes[0];\n        if(first){\n          setSession(first.session??{...first, title:first.name, model_used:first.model_used||''});\n          setFiles(first.documents?.map((d:any)=>d.filename) || []);\n          setIndexMeta(first.metadata || {});\n        } else {\n          setError('No indexes linked to this chat');\n        }\n      } catch (e:any){ setError(e.message||'Failed to load'); }\n      finally{ setLoading(false);}\n    })();\n  }, [sessionId]);\n\n  const hasMetadata = indexMeta && Object.keys(indexMeta).length > 0;\n  const isInferredMetadata = indexMeta?.metadata_source === 'lancedb_inspection';\n  const indexStatus = indexMeta?.status;\n\n  const getStatusMessage = () => {\n    if (!hasMetadata) {\n      return {\n        type: 'warning',\n        title: '⚠️ No Configuration Data',\n        message: 'This index was created before metadata tracking was implemented. 
Configuration details are not available.'\n      };\n    }\n    \n    if (indexStatus === 'incomplete') {\n      return {\n        type: 'error',\n        title: '❌ Index Incomplete',\n        message: indexMeta.issue || 'The index appears to be incomplete or was never properly built.'\n      };\n    }\n    \n    if (indexStatus === 'empty') {\n      return {\n        type: 'error',\n        title: '❌ Index Empty',\n        message: 'The vector table exists but contains no data. The index may need to be rebuilt.'\n      };\n    }\n    \n    if (indexStatus === 'legacy') {\n      return {\n        type: 'warning',\n        title: '⚠️ Legacy Index',\n        message: indexMeta.issue || 'This index was created before metadata tracking was implemented. Configuration details are not available.'\n      };\n    }\n    \n    if (isInferredMetadata) {\n      return {\n        type: 'info',\n        title: '🔍 Metadata Inferred',\n        message: 'This metadata was inferred from the vector database structure. Some configuration details may be incomplete.'\n      };\n    }\n    \n    if (indexStatus === 'functional') {\n      // Check if we have complete configuration metadata\n      const hasCompleteConfig = indexMeta.chunk_size && \n                               indexMeta.chunk_overlap !== undefined &&\n                               indexMeta.retrieval_mode &&\n                               indexMeta.embedding_model;\n      \n      // Only show limited message if we truly have limited data\n      if (indexMeta.inspection_limitation && !hasCompleteConfig) {\n        return {\n          type: 'info',\n          title: '🔍 Limited Configuration Data',\n          message: 'This index is functional but detailed configuration inspection requires direct RAG system access. 
Basic information is shown below.'\n        };\n      }\n      \n      // Don't show any status message for functional indexes with complete metadata\n      return null;\n    }\n    \n    return null;\n  };\n\n  const statusMessage = getStatusMessage();\n\n  return (\n    <div className=\"fixed inset-0 flex items-center justify-center bg-black/60 backdrop-blur-sm z-50 p-4\">\n      <div className=\"relative bg-white/5 backdrop-blur rounded-xl p-8 w-full max-w-2xl text-white space-y-6 overflow-y-auto max-h-full\">\n        <h2 className=\"text-lg font-semibold\">Index details</h2>\n\n        {loading && <p className=\"text-sm text-gray-300\">Loading…</p>}\n        {error && <p className=\"text-sm text-red-400\">{error}</p>}\n\n        {(!loading && !error) && (\n          <>\n            <div>\n              <span className=\"block text-xs uppercase tracking-wide text-gray-300 mb-1\">Name</span>\n              <p className=\"text-sm\">{session?.title}</p>\n            </div>\n\n            {statusMessage && (\n              <div className={`rounded-lg p-4 ${\n                statusMessage.type === 'error' ? 'bg-red-900/20 border border-red-600/30' :\n                statusMessage.type === 'warning' ? 'bg-yellow-900/20 border border-yellow-600/30' :\n                'bg-blue-900/20 border border-blue-600/30'\n              }`}>\n                <p className={`text-sm font-medium mb-1 ${\n                  statusMessage.type === 'error' ? 'text-red-200' :\n                  statusMessage.type === 'warning' ? 'text-yellow-200' :\n                  'text-blue-200'\n                }`}>\n                  {statusMessage.title}\n                </p>\n                <p className={`text-sm ${\n                  statusMessage.type === 'error' ? 'text-red-300' :\n                  statusMessage.type === 'warning' ? 
'text-yellow-300' :\n                  'text-blue-300'\n                }`}>\n                  {statusMessage.message}\n                </p>\n              </div>\n            )}\n\n            {hasMetadata && (indexStatus === 'functional' || indexStatus === 'created' || !indexStatus) && (\n              <>\n                {/* Basic Information */}\n                <div className=\"grid grid-cols-2 gap-4\">\n                  {(indexMeta.embedding_model || indexMeta.embedding_model_inferred) && (\n                    <div>\n                      <span className=\"block text-xs uppercase tracking-wide text-gray-300 mb-1\">Embedding model</span>\n                      <p className=\"text-sm break-words\">\n                        {indexMeta.embedding_model || indexMeta.embedding_model_inferred}\n                        {indexMeta.embedding_model_inferred && <span className=\"text-gray-400\"> (inferred)</span>}\n                      </p>\n                    </div>\n                  )}\n                  {(indexMeta.retrieval_mode || indexMeta.retrieval_mode_inferred) && (\n                    <div>\n                      <span className=\"block text-xs uppercase tracking-wide text-gray-300 mb-1\">Retrieval mode</span>\n                      <p className=\"text-sm capitalize\">\n                        {indexMeta.retrieval_mode || indexMeta.retrieval_mode_inferred}\n                        {indexMeta.retrieval_mode_inferred && <span className=\"text-gray-400\"> (inferred)</span>}\n                      </p>\n                    </div>\n                  )}\n                  {indexMeta.vector_dimensions && (\n                    <div>\n                      <span className=\"block text-xs uppercase tracking-wide text-gray-300 mb-1\">Vector dimensions</span>\n                      <p className=\"text-sm\">{indexMeta.vector_dimensions}</p>\n                    </div>\n                  )}\n                  {indexMeta.total_chunks > 0 && (\n                    <div>\n   
                   <span className=\"block text-xs uppercase tracking-wide text-gray-300 mb-1\">Total chunks</span>\n                      <p className=\"text-sm\">{indexMeta.total_chunks.toLocaleString()}</p>\n                    </div>\n                  )}\n                </div>\n\n                {/* Chunk Configuration */}\n                <div className=\"grid grid-cols-2 gap-4\">\n                  {(typeof indexMeta.chunk_size==='number' || indexMeta.chunk_size_inferred) && (\n                    <div>\n                      <span className=\"block text-xs uppercase tracking-wide text-gray-300 mb-1\">Chunk size</span>\n                      <p className=\"text-sm\">\n                        {typeof indexMeta.chunk_size==='number' ? `${indexMeta.chunk_size} tokens` : indexMeta.chunk_size_inferred}\n                        {indexMeta.chunk_size_inferred && <span className=\"text-gray-400\"> (estimated)</span>}\n                      </p>\n                    </div>\n                  )}\n                  {typeof indexMeta.chunk_overlap==='number' && (\n                    <div>\n                      <span className=\"block text-xs uppercase tracking-wide text-gray-300 mb-1\">Chunk overlap</span>\n                      <p className=\"text-sm\">{indexMeta.chunk_overlap} tokens</p>\n                    </div>\n                  )}\n                </div>\n\n                {/* Context and Features */}\n                <div className=\"grid grid-cols-2 gap-4\">\n                  {typeof indexMeta.window_size==='number' && (\n                    <div>\n                      <span className=\"block text-xs uppercase tracking-wide text-gray-300 mb-1\">Context window</span>\n                      <p className=\"text-sm\">{indexMeta.window_size}</p>\n                    </div>\n                  )}\n                  {typeof indexMeta.enable_enrich==='boolean' && (\n                    <div>\n                      <span className=\"block text-xs uppercase 
tracking-wide text-gray-300 mb-1\">Contextual enrichment</span>\n                      <p className=\"text-sm\">{indexMeta.enable_enrich ? '✅ Enabled' : '❌ Disabled'}</p>\n                    </div>\n                  )}\n                  {indexMeta.has_contextual_enrichment && (\n                    <div>\n                      <span className=\"block text-xs uppercase tracking-wide text-gray-300 mb-1\">Contextual enrichment</span>\n                      <p className=\"text-sm\">🔍 Detected</p>\n                    </div>\n                  )}\n                </div>\n\n                {/* Advanced features */}\n                <div className=\"grid grid-cols-2 gap-4\">\n                  {typeof indexMeta.latechunk==='boolean' && (\n                    <div>\n                      <span className=\"block text-xs uppercase tracking-wide text-gray-300 mb-1\">Late-chunk vectors</span>\n                      <p className=\"text-sm\">{indexMeta.latechunk ? '✅ Enabled' : '❌ Disabled'}</p>\n                    </div>\n                  )}\n                  {typeof indexMeta.docling_chunk==='boolean' && (\n                    <div>\n                      <span className=\"block text-xs uppercase tracking-wide text-gray-300 mb-1\">High-recall chunking</span>\n                      <p className=\"text-sm\">{indexMeta.docling_chunk ? 
'✅ Enabled' : '❌ Disabled'}</p>\n                    </div>\n                  )}\n                  {indexMeta.has_fts_index && (\n                    <div>\n                      <span className=\"block text-xs uppercase tracking-wide text-gray-300 mb-1\">Full-text search</span>\n                      <p className=\"text-sm\">🔍 Available</p>\n                    </div>\n                  )}\n                  {indexMeta.has_document_structure && (\n                    <div>\n                      <span className=\"block text-xs uppercase tracking-wide text-gray-300 mb-1\">Document structure</span>\n                      <p className=\"text-sm\">🔍 Organized</p>\n                    </div>\n                  )}\n                </div>\n\n                {/* LLM Models section */}\n                {(indexMeta.enrich_model || indexMeta.overview_model) && (\n                  <>\n                    <div className=\"border-t border-white/10 pt-4\">\n                      <h3 className=\"text-sm font-medium text-gray-300 mb-3\">LLM Models</h3>\n                      <div className=\"grid grid-cols-2 gap-4\">\n                        {indexMeta.enrich_model && (\n                          <div>\n                            <span className=\"block text-xs uppercase tracking-wide text-gray-300 mb-1\">Enrichment LLM</span>\n                            <p className=\"text-sm break-words\">{indexMeta.enrich_model}</p>\n                          </div>\n                        )}\n                        {indexMeta.overview_model && (\n                          <div>\n                            <span className=\"block text-xs uppercase tracking-wide text-gray-300 mb-1\">Overview LLM</span>\n                            <p className=\"text-sm break-words\">{indexMeta.overview_model}</p>\n                          </div>\n                        )}\n                      </div>\n                    </div>\n                  </>\n                )}\n\n                {/* Batch 
sizes section */}\n                {(typeof indexMeta.batch_size_embed==='number' || typeof indexMeta.batch_size_enrich==='number') && (\n                  <>\n                    <div className=\"border-t border-white/10 pt-4\">\n                      <h3 className=\"text-sm font-medium text-gray-300 mb-3\">Batch Configuration</h3>\n                      <div className=\"grid grid-cols-2 gap-4\">\n                        {typeof indexMeta.batch_size_embed==='number' && (\n                          <div>\n                            <span className=\"block text-xs uppercase tracking-wide text-gray-300 mb-1\">Embedding batch size</span>\n                            <p className=\"text-sm\">{indexMeta.batch_size_embed}</p>\n                          </div>\n                        )}\n                        {typeof indexMeta.batch_size_enrich==='number' && (\n                          <div>\n                            <span className=\"block text-xs uppercase tracking-wide text-gray-300 mb-1\">Enrichment batch size</span>\n                            <p className=\"text-sm\">{indexMeta.batch_size_enrich}</p>\n                          </div>\n                        )}\n                      </div>\n                    </div>\n                  </>\n                )}\n\n                {/* Metadata info */}\n                {isInferredMetadata && indexMeta.metadata_inferred_at && (\n                  <div className=\"border-t border-white/10 pt-4\">\n                    <h3 className=\"text-sm font-medium text-gray-300 mb-3\">Metadata Information</h3>\n                    <div className=\"text-xs text-gray-400 space-y-1\">\n                      <p>Inferred at: {new Date(indexMeta.metadata_inferred_at).toLocaleString()}</p>\n                      <p>Source: LanceDB table inspection</p>\n                      {indexMeta.sample_chunk_length && (\n                        <p>Sample chunk length: {indexMeta.sample_chunk_length} characters</p>\n                      
)}\n                    </div>\n                  </div>\n                )}\n              </>\n            )}\n\n            {/* Legacy index information */}\n            {hasMetadata && indexStatus === 'legacy' && (\n              <>\n                <div className=\"grid grid-cols-2 gap-4\">\n                  {typeof indexMeta.documents_count === 'number' && (\n                    <div>\n                      <span className=\"block text-xs uppercase tracking-wide text-gray-300 mb-1\">Documents</span>\n                      <p className=\"text-sm\">{indexMeta.documents_count}</p>\n                    </div>\n                  )}\n                  {indexMeta.created_at && (\n                    <div>\n                      <span className=\"block text-xs uppercase tracking-wide text-gray-300 mb-1\">Created</span>\n                      <p className=\"text-sm\">{new Date(indexMeta.created_at).toLocaleDateString()}</p>\n                    </div>\n                  )}\n                  {indexMeta.vector_table_name && (\n                    <div>\n                      <span className=\"block text-xs uppercase tracking-wide text-gray-300 mb-1\">Vector table</span>\n                      <p className=\"text-sm text-gray-400 text-xs break-all\">{indexMeta.vector_table_name}</p>\n                    </div>\n                  )}\n                </div>\n                \n                {indexMeta.note && (\n                  <div className=\"border-t border-white/10 pt-4\">\n                    <h3 className=\"text-sm font-medium text-gray-300 mb-3\">Technical Note</h3>\n                    <p className=\"text-xs text-gray-400\">{indexMeta.note}</p>\n                  </div>\n                )}\n              </>\n            )}\n\n            {/* Debug info for incomplete indexes */}\n            {indexStatus === 'incomplete' && indexMeta.available_tables && (\n              <div className=\"border-t border-white/10 pt-4\">\n                <h3 className=\"text-sm 
font-medium text-gray-300 mb-3\">Debug Information</h3>\n                <div className=\"text-xs text-gray-400 space-y-1\">\n                  <p>Expected table: {indexMeta.vector_table_expected}</p>\n                  <p>Available tables: {indexMeta.available_tables.join(', ') || 'None'}</p>\n                </div>\n              </div>\n            )}\n\n            <div className=\"border-t border-white/10 pt-4\">\n              <span className=\"block text-xs uppercase tracking-wide text-gray-300 mb-1\">Files ({files.length})</span>\n              <ul className=\"list-disc list-inside space-y-1 text-sm max-h-32 overflow-y-auto\">\n                {files.map((f) => (\n                  <li key={f}>{f}</li>\n                ))}\n              </ul>\n            </div>\n          </>\n        )}\n\n        <div className=\"flex justify-end pt-4 border-t border-white/10\">\n          <button onClick={onClose} className=\"px-4 py-2 bg-gray-700 rounded hover:bg-gray-600 text-sm\">Close</button>\n        </div>\n      </div>\n    </div>\n  );\n} "
  },
  {
    "path": "src/components/demo.tsx",
    "content": "\"use client\";\n\nimport { useState, useEffect } from \"react\"\nimport { LocalGPTChat } from \"@/components/ui/localgpt-chat\"\nimport { SessionSidebar } from \"@/components/ui/session-sidebar\"\nimport { SessionChat } from '@/components/ui/session-chat'\nimport { chatAPI, ChatSession } from \"@/lib/api\"\nimport { LandingMenu } from \"@/components/LandingMenu\";\nimport { IndexForm } from \"@/components/IndexForm\";\nimport SessionIndexInfo from \"@/components/SessionIndexInfo\";\nimport IndexPicker from \"@/components/IndexPicker\";\nimport { QuickChat } from '@/components/ui/quick-chat'\n\nexport function Demo() {\n    const [currentSessionId, setCurrentSessionId] = useState<string | undefined>()\n    const [currentSession, setCurrentSession] = useState<ChatSession | null>(null)\n    const [showConversation, setShowConversation] = useState(false)\n    const [backendStatus, setBackendStatus] = useState<'checking' | 'connected' | 'error'>('checking')\n    const [sidebarRef, setSidebarRef] = useState<{ refreshSessions: () => Promise<void> } | null>(null)\n    const [homeMode, setHomeMode] = useState<'HOME' | 'INDEX' | 'CHAT_EXISTING' | 'QUICK_CHAT'>('HOME')\n    const [showIndexInfo, setShowIndexInfo] = useState(false)\n    const [showIndexPicker, setShowIndexPicker] = useState(false)\n    const [sidebarOpen, setSidebarOpen] = useState(true)\n\n    console.log('Demo component rendering...')\n\n    useEffect(() => {\n        console.log('Demo component mounted')\n        checkBackendHealth()\n    }, [])\n\n    const checkBackendHealth = async () => {\n        try {\n            const health = await chatAPI.checkHealth()\n            setBackendStatus('connected')\n            console.log('Backend connected:', health)\n        } catch (error) {\n            console.error('Backend health check failed:', error)\n            setBackendStatus('error')\n        }\n    }\n\n    const handleSessionSelect = (sessionId: string) => {\n        
setCurrentSessionId(sessionId)\n        setShowConversation(true)\n        setHomeMode('CHAT_EXISTING') // Ensure we're in the right mode to show SessionChat\n    }\n\n    const handleNewSession = () => {\n        // Reset state and return to landing page so user can choose chat type\n        setCurrentSessionId(undefined)\n        setCurrentSession(null)\n        setShowConversation(false)  // Hide conversation view & sidebar\n        setHomeMode('HOME')         // Show landing selector (Create index / Chat with index / LLM Chat)\n    }\n\n    const handleSessionChange = async (session: ChatSession) => {\n        setCurrentSession(session)\n\n        // Update the current session ID if it changed (e.g., brand-new session)\n        if (session.id !== currentSessionId) {\n            setCurrentSessionId(session.id)\n        }\n\n        // Always refresh the sidebar so that updated titles / message counts are displayed\n        if (sidebarRef) {\n            await sidebarRef.refreshSessions()\n        }\n    }\n\n    const handleSessionDelete = (deletedSessionId: string) => {\n        if (currentSessionId === deletedSessionId) {\n            // Stay in conversation mode but show empty state\n            setCurrentSessionId(undefined)\n            setCurrentSession(null)\n        }\n    }\n\n    const handleStartConversation = () => {\n        if (backendStatus === 'connected') {\n            // Just show empty state, don't create session yet\n            handleNewSession()\n        } else {\n            setShowConversation(true)\n        }\n    }\n\n    return (\n        <div className=\"flex h-full w-full flex-col bg-black\">\n            {/* Top App Bar */}\n            <header className=\"h-12 relative flex items-center justify-center border-b border-gray-800 flex-shrink-0\">\n                <button onClick={()=>setSidebarOpen(o=>!o)} className=\"absolute left-4 p-1 rounded hover:bg-gray-800 text-gray-200 focus:outline-none\" title=\"Toggle sidebar\">\n  
                  {sidebarOpen ? <span className=\"text-xl leading-none\">◀</span> : <span className=\"text-xl leading-none\">▶</span>}\n                </button>\n                {homeMode !== 'HOME' && (\n                    <h1 className=\"text-lg font-semibold text-white\">localGPT</h1>\n                )}\n            </header>\n            {/* Main content row */}\n            <div className=\"flex flex-1 flex-row min-h-0\">\n                {/* Session Sidebar */}\n                {sidebarOpen && showConversation && (homeMode === 'CHAT_EXISTING' || homeMode === 'QUICK_CHAT') && (\n                    <SessionSidebar\n                        currentSessionId={currentSessionId}\n                        onSessionSelect={handleSessionSelect}\n                        onNewSession={handleNewSession}\n                        onSessionDelete={handleSessionDelete}\n                        onSessionCreated={setSidebarRef}\n                    />\n                )}\n                \n                <main className=\"flex flex-1 flex-col transition-all duration-200 bg-black min-h-0 overflow-hidden\">\n                    {homeMode === 'HOME' ? 
(\n                        <div className=\"flex items-center justify-center h-full\">\n                            <div className=\"space-y-8\">\n                                <div className=\"text-center space-y-2\">\n                                    <h1 className=\"text-4xl font-bold text-white\">LocalGPT</h1>\n                                    <p className=\"text-lg text-gray-400\">What can I help you find today?</p>\n                                </div>\n\n                                <LandingMenu onSelect={(m)=>{\n                                    if(m==='CHAT_EXISTING'){ setShowIndexPicker(true); return; }\n                                    if(m==='QUICK_CHAT'){\n                                        setHomeMode('QUICK_CHAT');\n                                        setShowConversation(true);\n                                        return;\n                                    }\n                                    setHomeMode('INDEX');\n                                }} />\n                                <div className=\"flex flex-col items-center gap-3 mt-12\">\n                                    <div className=\"flex items-center gap-2 text-sm\">\n                                        {backendStatus === 'checking' && (\n                                            <div className=\"flex items-center gap-2 text-gray-400\">\n                                                <div className=\"w-2 h-2 bg-yellow-500 rounded-full animate-pulse\"></div>\n                                                Connecting to backend...\n                                            </div>\n                                        )}\n                                        {backendStatus === 'connected' && (\n                                            <div className=\"flex items-center gap-2 text-green-400\">\n                                                <div className=\"w-2 h-2 bg-green-500 rounded-full\"></div>\n                                       
         Backend connected • Session-based chat ready\n                                            </div>\n                                        )}\n                                        {backendStatus === 'error' && (\n                                            <div className=\"flex items-center gap-2 text-red-400\">\n                                                <div className=\"w-2 h-2 bg-red-500 rounded-full\"></div>\n                                                Backend offline • Start backend server to enable chat\n                                            </div>\n                                        )}\n                                    </div>\n                                </div>\n                            </div>\n                        </div>\n                    ) : homeMode==='CHAT_EXISTING' ? (\n                        <SessionChat\n                            sessionId={currentSessionId}\n                            onSessionChange={handleSessionChange}\n                            className=\"flex-1\"\n                        />\n                    ) : homeMode==='QUICK_CHAT' ? 
(\n                        <QuickChat sessionId={currentSessionId} onSessionChange={handleSessionChange} className=\"flex-1\" />\n                    ) : null}\n                </main>\n\n                {homeMode==='INDEX' && (\n                  <div className=\"fixed inset-0 flex items-center justify-center bg-black/50 backdrop-blur-sm z-50\">\n                    <IndexForm onClose={()=>setHomeMode('HOME')} onIndexed={(s)=>{setHomeMode('CHAT_EXISTING'); handleSessionSelect(s.id);}} />\n                  </div>\n                )}\n\n                {showIndexInfo && currentSessionId && (\n                  <SessionIndexInfo sessionId={currentSessionId} onClose={()=>setShowIndexInfo(false)} />\n                )}\n\n                {showIndexPicker && (\n                  <IndexPicker onClose={()=>setShowIndexPicker(false)} onSelect={async (idxId)=>{\n                    try {\n                      // Create a session, link the selected index, then open the chat\n                      const session = await chatAPI.createSession()\n                      await chatAPI.linkIndexToSession(session.id, idxId)\n                      setShowIndexPicker(false)\n                      setHomeMode('CHAT_EXISTING')\n                      handleSessionSelect(session.id)\n                    } catch (error) {\n                      // Leave the picker open so the user can retry\n                      console.error('Failed to open chat for the selected index:', error)\n                    }\n                  }} />\n                )}\n            </div>\n        </div>\n    );\n} "
  },
  {
    "path": "src/components/ui/AccordionGroup.tsx",
    "content": "\"use client\";\nimport React from 'react';\n\ninterface Props {\n  title: React.ReactNode;\n  children: React.ReactNode;\n  defaultOpen?: boolean;\n}\n\nexport function AccordionGroup({ title, children, defaultOpen }: Props) {\n  return (\n    <details open={defaultOpen} className=\"border-t border-white/10 py-4 group\">\n      <summary className=\"cursor-pointer select-none list-none text-xs uppercase tracking-wide text-gray-400 mb-3 flex items-center gap-2\">\n        {title}\n        <svg\n          className=\"w-3 h-3 text-gray-400 ml-auto transition-transform group-open:rotate-90\"\n          viewBox=\"0 0 20 20\"\n          fill=\"none\"\n          stroke=\"currentColor\"\n          strokeWidth=\"2\"\n        >\n          <path d=\"M6 6l6 4-6 4V6z\" />\n        </svg>\n      </summary>\n      <div className=\"space-y-4 pl-1\">{children}</div>\n    </details>\n  );\n} "
  },
  {
    "path": "src/components/ui/GlassInput.tsx",
    "content": "\"use client\";\nimport React, { InputHTMLAttributes } from 'react';\n\nexport function GlassInput(props: InputHTMLAttributes<HTMLInputElement>) {\n  return (\n    <input\n      {...props}\n      className={`w-full rounded bg-white/5 hover:bg-white/10 focus:bg-white/10 px-2 py-1 text-sm font-sans text-white outline-none focus:ring-2 focus:ring-white/20 transition ${props.className || ''}`}\n    />\n  );\n} "
  },
  {
    "path": "src/components/ui/GlassSelect.tsx",
    "content": "\"use client\";\nimport React, { SelectHTMLAttributes } from 'react';\n\nexport function GlassSelect(props: SelectHTMLAttributes<HTMLSelectElement>) {\n  return (\n    <select\n      {...props}\n      className={`w-full rounded bg-white/5 hover:bg-white/10 focus:bg-white/10 px-2 py-1 text-sm font-sans text-white outline-none focus:ring-2 focus:ring-white/20 transition ${props.className || ''}`}\n    >\n      {props.children}\n    </select>\n  );\n} "
  },
  {
    "path": "src/components/ui/GlassToggle.tsx",
    "content": "\"use client\";\nimport React from 'react';\n\ninterface Props {\n  checked: boolean;\n  onChange: (v: boolean) => void;\n}\n\nexport function GlassToggle({ checked, onChange }: Props) {\n  return (\n    <button\n      onClick={() => onChange(!checked)}\n      className={`w-10 h-5 rounded-full transition relative ${checked ? 'bg-green-500/70' : 'bg-white/20'} font-sans`}\n    >\n      <span\n        className={`absolute top-0.5 left-0.5 w-4 h-4 rounded-full bg-white transition-transform ${checked ? 'translate-x-5' : ''}`}\n      />\n    </button>\n  );\n} "
  },
  {
    "path": "src/components/ui/InfoTooltip.tsx",
    "content": "import { useState } from \"react\";\nimport { Info } from \"lucide-react\";\n\ninterface Props {\n  text: string;\n  className?: string;\n  size?: number;\n}\n\n// A lightweight hover / focus tooltip used next to form labels.\n// It shows a small Info icon; on hover (or focus) a dark glassy popover appears.\nexport function InfoTooltip({ text, className = \"\", size = 14 }: Props) {\n  const [open, setOpen] = useState(false);\n  return (\n    <span\n      className={`relative inline-block align-middle ${className}`}\n      onMouseEnter={() => setOpen(true)}\n      onMouseLeave={() => setOpen(false)}\n      onFocus={() => setOpen(true)}\n      onBlur={() => setOpen(false)}\n      tabIndex={0}\n    >\n      <Info size={size} className=\"text-gray-400 hover:text-white cursor-pointer\" />\n      {open && (\n        <div className=\"absolute left-1/2 -translate-x-1/2 top-full mt-2 w-56 bg-black/80 backdrop-blur-sm text-gray-200 text-xs px-3 py-2 rounded shadow-lg z-50 normal-case whitespace-normal break-words\">\n          {text}\n        </div>\n      )}\n    </span>\n  );\n} "
  },
  {
    "path": "src/components/ui/avatar.tsx",
    "content": "\"use client\"\n\nimport * as React from \"react\"\nimport * as AvatarPrimitive from \"@radix-ui/react-avatar\"\n\nimport { cn } from \"@/lib/utils\"\n\nfunction Avatar({\n  className,\n  ...props\n}: React.ComponentProps<typeof AvatarPrimitive.Root>) {\n  return (\n    <AvatarPrimitive.Root\n      data-slot=\"avatar\"\n      className={cn(\n        \"relative flex size-8 shrink-0 overflow-hidden rounded-full\",\n        className\n      )}\n      {...props}\n    />\n  )\n}\n\nfunction AvatarImage({\n  className,\n  ...props\n}: React.ComponentProps<typeof AvatarPrimitive.Image>) {\n  return (\n    <AvatarPrimitive.Image\n      data-slot=\"avatar-image\"\n      className={cn(\"aspect-square size-full\", className)}\n      {...props}\n    />\n  )\n}\n\nfunction AvatarFallback({\n  className,\n  ...props\n}: React.ComponentProps<typeof AvatarPrimitive.Fallback>) {\n  return (\n    <AvatarPrimitive.Fallback\n      data-slot=\"avatar-fallback\"\n      className={cn(\n        \"bg-muted flex size-full items-center justify-center rounded-full text-black\",\n        className\n      )}\n      {...props}\n    />\n  )\n}\n\nexport { Avatar, AvatarImage, AvatarFallback }\n"
  },
  {
    "path": "src/components/ui/badge.tsx",
    "content": "import * as React from \"react\"\nimport { Slot } from \"@radix-ui/react-slot\"\nimport { cva, type VariantProps } from \"class-variance-authority\"\n\nimport { cn } from \"@/lib/utils\"\n\nconst badgeVariants = cva(\n  \"inline-flex items-center justify-center rounded-md border px-2 py-0.5 text-xs font-medium w-fit whitespace-nowrap shrink-0 [&>svg]:size-3 gap-1 [&>svg]:pointer-events-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive transition-[color,box-shadow] overflow-hidden\",\n  {\n    variants: {\n      variant: {\n        default:\n          \"border-transparent bg-primary text-primary-foreground [a&]:hover:bg-primary/90\",\n        secondary:\n          \"border-transparent bg-secondary text-secondary-foreground [a&]:hover:bg-secondary/90\",\n        destructive:\n          \"border-transparent bg-destructive text-white [a&]:hover:bg-destructive/90 focus-visible:ring-destructive/20 dark:focus-visible:ring-destructive/40 dark:bg-destructive/60\",\n        outline:\n          \"text-foreground [a&]:hover:bg-accent [a&]:hover:text-accent-foreground\",\n      },\n    },\n    defaultVariants: {\n      variant: \"default\",\n    },\n  }\n)\n\nfunction Badge({\n  className,\n  variant,\n  asChild = false,\n  ...props\n}: React.ComponentProps<\"span\"> &\n  VariantProps<typeof badgeVariants> & { asChild?: boolean }) {\n  const Comp = asChild ? Slot : \"span\"\n\n  return (\n    <Comp\n      data-slot=\"badge\"\n      className={cn(badgeVariants({ variant }), className)}\n      {...props}\n    />\n  )\n}\n\nexport { Badge, badgeVariants }\n"
  },
  {
    "path": "src/components/ui/button.tsx",
    "content": "import * as React from \"react\"\nimport { Slot } from \"@radix-ui/react-slot\"\nimport { cva, type VariantProps } from \"class-variance-authority\"\n\nimport { cn } from \"@/lib/utils\"\n\nconst buttonVariants = cva(\n  \"inline-flex items-center justify-center gap-2 whitespace-nowrap rounded-md text-sm font-medium transition-all disabled:pointer-events-none disabled:opacity-50 [&_svg]:pointer-events-none [&_svg:not([class*='size-'])]:size-4 shrink-0 [&_svg]:shrink-0 outline-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive\",\n  {\n    variants: {\n      variant: {\n        default:\n          \"bg-primary text-primary-foreground shadow-xs hover:bg-primary/90\",\n        destructive:\n          \"bg-destructive text-white shadow-xs hover:bg-destructive/90 focus-visible:ring-destructive/20 dark:focus-visible:ring-destructive/40 dark:bg-destructive/60\",\n        outline:\n          \"border bg-background shadow-xs hover:bg-accent hover:text-accent-foreground dark:bg-input/30 dark:border-input dark:hover:bg-input/50\",\n        secondary:\n          \"bg-secondary text-secondary-foreground shadow-xs hover:bg-secondary/80\",\n        ghost:\n          \"hover:bg-accent hover:text-accent-foreground dark:hover:bg-accent/50\",\n        link: \"text-primary underline-offset-4 hover:underline\",\n      },\n      size: {\n        default: \"h-9 px-4 py-2 has-[>svg]:px-3\",\n        sm: \"h-8 rounded-md gap-1.5 px-3 has-[>svg]:px-2.5\",\n        lg: \"h-10 rounded-md px-6 has-[>svg]:px-4\",\n        icon: \"size-9\",\n      },\n    },\n    defaultVariants: {\n      variant: \"default\",\n      size: \"default\",\n    },\n  }\n)\n\nfunction Button({\n  className,\n  variant,\n  size,\n  asChild = false,\n  ...props\n}: React.ComponentProps<\"button\"> &\n  VariantProps<typeof buttonVariants> & {\n    asChild?: boolean\n  }) 
{\n  const Comp = asChild ? Slot : \"button\"\n\n  return (\n    <Comp\n      data-slot=\"button\"\n      className={cn(buttonVariants({ variant, size, className }))}\n      {...props}\n    />\n  )\n}\n\nexport { Button, buttonVariants }\n"
  },
  {
    "path": "src/components/ui/chat-bubble-demo.tsx",
    "content": "\"use client\"\n\nimport {\n  ChatBubble,\n  ChatBubbleAvatar,\n  ChatBubbleMessage\n} from \"@/components/ui/chat-bubble\"\nimport { Copy, RefreshCcw } from \"lucide-react\"\n\nconst messages = [\n  {\n    id: 1,\n    message: \"Help me with my essay.\",\n    sender: \"user\",\n  },\n  {\n    id: 2,\n    message: \"I can help you with that. What do you need help with?\",\n    sender: \"bot\",\n  },\n]\n\nconst actionIcons = [\n  { icon: Copy, type: \"Copy\" },\n  { icon: RefreshCcw, type: \"Regenerate\" },\n]\n\nexport function ChatBubbleVariants() {\n  return (\n    <div className=\"max-w-md space-y-4 p-4\">\n      <ChatBubble variant=\"sent\">\n        <ChatBubbleAvatar fallback=\"US\" src=\"https://images.unsplash.com/photo-1534528741775-53994a69daeb?w=64&h=64&q=80&crop=faces&fit=crop\" />\n        <ChatBubbleMessage variant=\"sent\">\n          I have a question about the library.\n        </ChatBubbleMessage>\n      </ChatBubble>\n\n      <ChatBubble variant=\"received\">\n        <ChatBubbleAvatar fallback=\"AI\" src=\"https://images.unsplash.com/photo-1677442136019-21780ecad995?w=64&h=64&q=80&crop=faces&fit=crop\"  />\n        <ChatBubbleMessage>\n          Sure, I&apos;d be happy to help!\n        </ChatBubbleMessage>\n      </ChatBubble>\n    </div>\n  )\n}\n\nexport function ChatBubbleAiLayout() {\n  return (\n    <div className=\"max-w-md divide-y\">\n      {messages.map((message, index) => {\n        const variant = message.sender === \"user\" ? \"sent\" : \"received\"\n        return (\n          <div key={message.id} className=\"py-6 first:pt-0 last:pb-0\">\n            <div className=\"flex gap-3\">\n              <ChatBubbleAvatar \n                src={variant === \"sent\" \n                  ? 
\"https://images.unsplash.com/photo-1534528741775-53994a69daeb?w=64&h=64&q=80&crop=faces&fit=crop\"\n                  : \"https://images.unsplash.com/photo-1677442136019-21780ecad995?w=64&h=64&q=80&crop=faces&fit=crop\"\n                }\n                fallback={variant === \"sent\" ? \"US\" : \"L\"} \n              />\n              <div className=\"flex-1\">\n                {message.message}\n                {message.sender === \"bot\" && (\n                  <div className=\"flex gap-2 mt-2\">\n                    {actionIcons.map(({ icon: Icon, type }) => (\n                      <button\n                        key={type}\n                        onClick={() => console.log(`Action ${type} clicked for message ${index}`)}\n                        className=\"p-1 hover:bg-muted rounded-md transition-colors\"\n                      >\n                        <Icon className=\"size-3\" />\n                      </button>\n                    ))}\n                  </div>\n                )}\n              </div>\n            </div>\n          </div>\n        )\n      })}\n    </div>\n  )\n}\n\nexport function ChatBubbleStates() {\n  return (\n    <div className=\"max-w-md space-y-4 p-4\">\n      <ChatBubble variant=\"received\">\n        <ChatBubbleAvatar fallback=\"L\" />\n        <ChatBubbleMessage isLoading />\n      </ChatBubble>\n\n      <ChatBubble variant=\"received\">\n        <ChatBubbleAvatar fallback=\"L\" />\n        <ChatBubbleMessage className=\"bg-destructive/10 text-destructive\">\n          Error processing request\n        </ChatBubbleMessage>\n      </ChatBubble>\n    </div>\n  )\n} "
  },
  {
    "path": "src/components/ui/chat-bubble.tsx",
    "content": "\"use client\"\n\nimport * as React from \"react\"\nimport { cn } from \"@/lib/utils\"\nimport { Avatar, AvatarFallback, AvatarImage } from \"@/components/ui/avatar\"\nimport { Button } from \"@/components/ui/button\"\nimport { MessageLoading } from \"@/components/ui/message-loading\";\n\ninterface ChatBubbleProps {\n  variant?: \"sent\" | \"received\"\n  layout?: \"default\" | \"ai\"\n  className?: string\n  children: React.ReactNode\n}\n\nexport function ChatBubble({\n  variant = \"received\",\n  layout = \"default\", // eslint-disable-line @typescript-eslint/no-unused-vars\n  className,\n  children,\n}: ChatBubbleProps) {\n  return (\n    <div\n      className={cn(\n        \"flex items-start gap-2 mb-4\",\n        variant === \"sent\" && \"flex-row-reverse\",\n        className,\n      )}\n    >\n      {children}\n    </div>\n  )\n}\n\ninterface ChatBubbleMessageProps {\n  variant?: \"sent\" | \"received\"\n  isLoading?: boolean\n  className?: string\n  children?: React.ReactNode\n}\n\nexport function ChatBubbleMessage({\n  variant = \"received\",\n  isLoading,\n  className,\n  children,\n}: ChatBubbleMessageProps) {\n  return (\n    <div\n      className={cn(\n        \"rounded-lg p-3\",\n        variant === \"sent\" ? \"bg-primary text-primary-foreground\" : \"bg-muted\",\n        className\n      )}\n    >\n      {isLoading ? 
(\n        <div className=\"flex items-center space-x-2\">\n          <MessageLoading />\n        </div>\n      ) : (\n        children\n      )}\n    </div>\n  )\n}\n\ninterface ChatBubbleAvatarProps {\n  src?: string\n  fallback?: string\n  className?: string\n}\n\nexport function ChatBubbleAvatar({\n  src,\n  fallback = \"AI\",\n  className,\n}: ChatBubbleAvatarProps) {\n  return (\n    <Avatar className={cn(\"h-8 w-8\", className)}>\n      {src && <AvatarImage src={src} />}\n      <AvatarFallback>{fallback}</AvatarFallback>\n    </Avatar>\n  )\n}\n\ninterface ChatBubbleActionProps {\n  icon?: React.ReactNode\n  onClick?: () => void\n  className?: string\n}\n\nexport function ChatBubbleAction({\n  icon,\n  onClick,\n  className,\n}: ChatBubbleActionProps) {\n  return (\n    <Button\n      variant=\"ghost\"\n      size=\"icon\"\n      className={cn(\"h-6 w-6\", className)}\n      onClick={onClick}\n    >\n      {icon}\n    </Button>\n  )\n}\n\nexport function ChatBubbleActionWrapper({\n  className,\n  children,\n}: {\n  className?: string\n  children: React.ReactNode\n}) {\n  return (\n    <div className={cn(\"flex items-center gap-1 mt-2\", className)}>\n      {children}\n    </div>\n  )\n} "
  },
  {
    "path": "src/components/ui/chat-input.tsx",
    "content": "\"use client\"\n\nimport * as React from \"react\"\nimport { useState, useRef } from \"react\"\nimport { ArrowUp, Settings as SettingsIcon, Plus, X, FileText } from \"lucide-react\"\nimport { Button } from \"@/components/ui/button\"\nimport { AttachedFile } from \"@/lib/types\"\n\ninterface ChatInputProps {\n  onSendMessage: (message: string, attachedFiles?: AttachedFile[]) => Promise<void>\n  disabled?: boolean\n  placeholder?: string\n  className?: string\n  onOpenSettings?: () => void\n  onAddIndex?: () => void\n  leftExtras?: React.ReactNode\n}\n\nexport function ChatInput({ \n  onSendMessage, \n  disabled = false,\n  placeholder = \"Message localGPT...\",\n  className = \"\",\n  onOpenSettings,\n  onAddIndex,\n  leftExtras\n}: ChatInputProps) {\n  const [message, setMessage] = useState(\"\")\n  const [attachedFiles, setAttachedFiles] = useState<AttachedFile[]>([])\n  const [isLoading, setIsLoading] = useState(false)\n  const textareaRef = useRef<HTMLTextAreaElement>(null)\n  const fileInputRef = useRef<HTMLInputElement>(null)\n\n  const handleSubmit = async (e: React.FormEvent) => {\n    e.preventDefault()\n    if ((!message.trim() && attachedFiles.length === 0) || disabled || isLoading) return\n\n    const messageToSend = message.trim()\n    const filesToSend = [...attachedFiles]\n    setMessage(\"\")\n    setAttachedFiles([])\n    setIsLoading(true)\n\n    try {\n      await onSendMessage(messageToSend, filesToSend)\n    } catch (error) {\n      console.error(\"Failed to send message:\", error)\n      // Restore message and files on error\n      setMessage(messageToSend)\n      setAttachedFiles(filesToSend)\n    } finally {\n      setIsLoading(false)\n    }\n  }\n\n  const handleKeyDown = (e: React.KeyboardEvent) => {\n    if (e.key === 'Enter' && !e.shiftKey) {\n      e.preventDefault()\n      handleSubmit(e as unknown as React.FormEvent)\n    }\n  }\n\n  const handleInput = (e: React.ChangeEvent<HTMLTextAreaElement>) => {\n    
setMessage(e.target.value)\n    \n    // Auto-resize textarea\n    const textarea = textareaRef.current\n    if (textarea) {\n      textarea.style.height = 'auto'\n      textarea.style.height = Math.min(textarea.scrollHeight, 120) + 'px'\n    }\n  }\n\n  const handleFileAttach = () => {\n    fileInputRef.current?.click()\n  }\n\n  const handleFileChange = (e: React.ChangeEvent<HTMLInputElement>) => {\n    const files = e.target.files\n    if (!files) return\n\n    const newFiles: AttachedFile[] = []\n    for (let i = 0; i < files.length; i++) {\n      const file = files[i]\n      console.log('🔧 Frontend: File selected:', {\n        name: file.name,\n        size: file.size,\n        type: file.type,\n        lastModified: file.lastModified\n      });\n      \n      if (file.type === 'application/pdf' || \n          file.type === 'application/vnd.openxmlformats-officedocument.wordprocessingml.document' ||\n          file.type === 'application/msword' ||\n          file.type === 'text/html' ||\n          file.type === 'text/markdown' ||\n          file.type === 'text/plain' ||\n          file.name.toLowerCase().endsWith('.pdf') ||\n          file.name.toLowerCase().endsWith('.docx') ||\n          file.name.toLowerCase().endsWith('.doc') ||\n          file.name.toLowerCase().endsWith('.html') ||\n          file.name.toLowerCase().endsWith('.htm') ||\n          file.name.toLowerCase().endsWith('.md') ||\n          file.name.toLowerCase().endsWith('.txt')) {\n        newFiles.push({\n          id: crypto.randomUUID(),\n          name: file.name,\n          size: file.size,\n          type: file.type,\n          file: file,\n        })\n      } else {\n        console.log('🔧 Frontend: File rejected - unsupported format:', file.type);\n      }\n    }\n\n    setAttachedFiles(prev => [...prev, ...newFiles])\n    \n    // Reset the input\n    if (fileInputRef.current) {\n      fileInputRef.current.value = ''\n    }\n  }\n\n  const removeFile = (fileId: string) => {\n    
setAttachedFiles(prev => prev.filter(f => f.id !== fileId))\n  }\n\n  const formatFileSize = (bytes: number) => {\n    if (bytes === 0) return '0 Bytes'\n    const k = 1024\n    const sizes = ['Bytes', 'KB', 'MB', 'GB']\n    const i = Math.floor(Math.log(bytes) / Math.log(k))\n    return parseFloat((bytes / Math.pow(k, i)).toFixed(2)) + ' ' + sizes[i]\n  }\n\n  return (\n    <div className={`border-t border-white/10 bg-black/60 backdrop-blur-sm p-4 ${className}`}>\n      <form onSubmit={handleSubmit} className=\"max-w-4xl mx-auto\">\n        {/* Attached Files Display */}\n        {attachedFiles.length > 0 && (\n          <div className=\"mb-3 space-y-2\">\n            <div className=\"text-sm text-gray-400 font-medium\">Attached Files:</div>\n            <div className=\"space-y-2\">\n              {attachedFiles.map((file) => (\n                <div key={file.id} className=\"flex items-center gap-3 bg-gray-800 rounded-lg p-3\">\n                  <FileText className=\"w-5 h-5 text-red-400\" />\n                  <div className=\"flex-1 min-w-0\">\n                    <div className=\"text-sm text-white truncate\">{file.name}</div>\n                    <div className=\"text-xs text-gray-400\">{formatFileSize(file.size)}</div>\n                  </div>\n                  <button\n                    type=\"button\"\n                    onClick={() => removeFile(file.id)}\n                    className=\"p-1 hover:bg-gray-700 rounded transition-colors\"\n                  >\n                    <X className=\"w-4 h-4 text-gray-400 hover:text-white\" />\n                  </button>\n                </div>\n              ))}\n            </div>\n          </div>\n        )}\n\n        <div className=\"bg-white/5 backdrop-blur border border-white/10 rounded-2xl px-5 pt-4 pb-3 space-y-2\">\n          {/* Hidden file input (kept for future use) */}\n          <input ref={fileInputRef} type=\"file\" accept=\".pdf,.docx,.doc,.html,.htm,.md,.txt\" multiple 
onChange={handleFileChange} className=\"hidden\" />\n\n          {/* Textarea */}\n          <textarea\n            ref={textareaRef}\n            value={message}\n            onChange={handleInput}\n            onKeyDown={handleKeyDown}\n            placeholder={attachedFiles.length > 0 ? \"Ask questions about your attached files...\" : placeholder}\n            disabled={disabled || isLoading}\n            rows={1}\n            className=\"w-full bg-transparent border-none text-white placeholder-gray-400 resize-none overflow-y-hidden focus:outline-none focus:ring-0 disabled:opacity-50 disabled:cursor-not-allowed text-base\"\n            style={{ maxHeight: '120px', minHeight: '44px' }}\n          />\n\n          {/* Action row */}\n          <div className=\"mt-1 flex items-center justify-between\">\n            <div className=\"flex items-center gap-4\">\n              <button\n                type=\"button\"\n                onClick={()=>onOpenSettings && onOpenSettings()}\n                disabled={disabled || isLoading}\n                className=\"flex items-center gap-1 p-2 text-gray-400 hover:text-white hover:bg-gray-800 rounded-full transition-colors disabled:opacity-50 disabled:cursor-not-allowed\"\n                title=\"Chat settings\"\n              >\n                <SettingsIcon className=\"w-5 h-5\" />\n                <span className=\"text-xs hidden sm:inline\">Settings</span>\n              </button>\n              {leftExtras}\n            </div>\n            <Button\n              type=\"submit\"\n              size=\"sm\"\n              disabled={(!message.trim() && attachedFiles.length === 0) || disabled || isLoading}\n              className=\"w-8 h-8 p-0 rounded-full bg-white hover:bg-gray-100 text-black disabled:bg-gray-600 disabled:text-gray-400\"\n            >\n              {isLoading ? 
(\n                <div className=\"w-4 h-4 border-2 border-gray-400 border-t-transparent rounded-full animate-spin\" />\n              ) : (\n                <ArrowUp className=\"w-4 h-4\" />\n              )}\n            </Button>\n          </div>\n        </div>\n      </form>\n    </div>\n  )\n}    "
  },
  {
    "path": "src/components/ui/chat-settings-modal.tsx",
    "content": "\"use client\";\n\nimport { GlassToggle } from '@/components/ui/GlassToggle';\nimport { InfoTooltip } from '@/components/ui/InfoTooltip';\n\nexport interface ToggleOption {\n  type: 'toggle';\n  label: string;\n  checked: boolean;\n  setter: (v: boolean) => void;\n}\n\nexport interface SliderOption {\n  type: 'slider';\n  label: string;\n  value: number;\n  setter: (v: number) => void;\n  min: number;\n  max: number;\n  step?: number;\n  unit?: string;\n}\n\nexport interface DropdownOption {\n  type: 'dropdown';\n  label: string;\n  value: string;\n  setter: (v: string) => void;\n  options: { value: string; label: string }[];\n}\n\nexport type SettingOption = ToggleOption | SliderOption | DropdownOption;\n\ninterface Props {\n  options: SettingOption[];\n  onClose: () => void;\n}\n\nconst optionHelp: Record<string,string> = {\n  'Query decomposition':'Breaks a complex question into sub-queries to improve recall (adds latency).',\n  'Compose sub-answers':'Merges answers from decomposed sub-queries into a single response.',\n  'Pruning':'Removes sentences deemed irrelevant by a lightweight model before synthesis.',\n  'RAG (no-triage)':'Force retrieval on every query; disables index-selection triage.',\n  'Verify answer':'Runs an extra LLM pass to self-critique the draft answer.',\n  'Streaming':'Send tokens to the UI as they are generated.',\n  'AI reranker':'Re-orders retrieved chunks with a cross-encoder (higher quality, more latency).',\n  'Expand context window':'Adds neighbour chunks around each top chunk to provide more context.',\n  'Context window size':'How many neighbour chunks to include on each side.',\n  'Retrieval chunks':'Number of chunks fetched before reranking.',\n  'LLM':'Select which model generates the final answer.',\n  'Search type':'Choose retrieval strategy (Hybrid recommended).',\n  'Reranker top chunks':'Limit how many chunks are re-ranked to speed up processing.'\n};\n\nexport function ChatSettingsModal({ options, onClose 
}: Props) {\n  const renderOption = (opt: SettingOption) => {\n    switch (opt.type) {\n      case 'toggle':\n        return (\n          <div key={opt.label} className=\"flex items-center justify-between\">\n            <span className=\"text-sm text-gray-300 flex items-center gap-1 whitespace-nowrap\">\n              {displayName(opt.label)}\n              {optionHelp[displayName(opt.label)] && <InfoTooltip text={optionHelp[displayName(opt.label)]} size={12} />}\n            </span>\n            <GlassToggle checked={opt.checked} onChange={opt.setter} />\n          </div>\n        );\n      \n      case 'slider':\n        return (\n          <div key={opt.label} className=\"space-y-2\">\n            <div className=\"flex items-center justify-between\">\n              <span className=\"text-sm text-gray-300 flex items-center gap-1\">{displayName(opt.label)}{optionHelp[displayName(opt.label)] && <InfoTooltip text={optionHelp[displayName(opt.label)]} size={12} />}</span>\n              <span className=\"text-sm text-gray-400\">\n                {opt.value}{opt.unit || ''}\n              </span>\n            </div>\n            <input\n              type=\"range\"\n              min={opt.min}\n              max={opt.max}\n              step={opt.step || 1}\n              value={opt.value}\n              onChange={(e) => opt.setter(Number(e.target.value))}\n              className=\"w-full h-2 bg-gray-700 rounded-lg appearance-none cursor-pointer slider\"\n              style={{\n                background: `linear-gradient(to right, #3b82f6 0%, #3b82f6 ${((opt.value - opt.min) / (opt.max - opt.min)) * 100}%, #374151 ${((opt.value - opt.min) / (opt.max - opt.min)) * 100}%, #374151 100%)`\n              }}\n            />\n            <div className=\"flex justify-between text-xs text-gray-500\">\n              <span>{opt.min}{opt.unit || ''}</span>\n              <span>{opt.max}{opt.unit || ''}</span>\n            </div>\n          </div>\n        );\n      \n      
case 'dropdown':\n        return (\n          <div key={opt.label} className=\"space-y-2\">\n            <span className=\"text-sm text-gray-300 flex items-center gap-1\">{displayName(opt.label)}{optionHelp[displayName(opt.label)] && <InfoTooltip text={optionHelp[displayName(opt.label)]} size={12} />}</span>\n            <select\n              value={opt.value}\n              onChange={(e) => opt.setter(e.target.value)}\n              className=\"w-full px-3 py-2 bg-gray-700 border border-gray-600 rounded-lg text-white text-sm focus:outline-none focus:ring-2 focus:ring-blue-500 focus:border-transparent\"\n            >\n              {opt.options.map((option) => (\n                <option key={option.value} value={option.value}>\n                  {option.label}\n                </option>\n              ))}\n            </select>\n          </div>\n        );\n      \n      default:\n        return null;\n    }\n  };\n\n  const gridToggleLabels: string[] = [\n    'Query decomposition',\n    'Compose sub-answers',\n    'Prune irrelevant sentences',\n    'Always search documents', // will be displayed as RAG (no-triage)\n    'Verify answer',\n    'Stream phases',\n  ];\n\n  const retrievalGridLabels = ['LLM model','Search type'];\n\n  const displayName = (label: string) => {\n    if (label === 'Always search documents') return 'RAG (no-triage)';\n    if (label === 'LLM model') return 'LLM';\n    if (label === 'Prune irrelevant sentences') return 'Pruning';\n    if (label === 'Stream phases') return 'Streaming';\n    return label;\n  };\n\n  const renderOptionOrdered = (label: string) => {\n    const opt = options.find(o => o.label === label);\n    if (!opt) return null;\n    // Clone option with display label override\n    const clone = { ...opt, label: displayName(label) } as SettingOption;\n    return renderOption(clone);\n  };\n\n  return (\n    <div className=\"fixed inset-0 bg-black/60 backdrop-blur-sm flex items-center justify-center z-50 p-4\">\n      <div 
className=\"bg-white/5 backdrop-blur rounded-xl w-full max-w-xl max-h-full overflow-y-auto p-6 text-white space-y-6\">\n        <h2 className=\"text-lg font-semibold mb-6\">Chat Settings</h2>\n\n        <div className=\"space-y-6\">\n          {/* High-level Settings */}\n          <div>\n            <h3 className=\"text-md font-medium text-gray-200 mb-4 flex items-center gap-1\">General Settings <InfoTooltip text=\"High-level toggles that affect how the assistant thinks and whether it always performs RAG.\" /></h3>\n            {/* Two-column grid for key toggles */}\n            <div className=\"grid grid-cols-2 gap-4 mb-4\">\n              {gridToggleLabels.map(renderOptionOrdered)}\n            </div>\n            {/* No additional general options after grid */}\n          </div>\n\n          {/* Retrieval Settings */}\n          <div>\n            <h3 className=\"text-md font-medium text-gray-200 mb-4 flex items-center gap-1\">Retrieval Settings <InfoTooltip text=\"Configure which LLM answers and how the system searches your indexes.\" /></h3>\n            {/* LLM + Search type grid */}\n            {(() => {\n              const arr: SettingOption[] = retrievalGridLabels\n                .map(lbl => {\n                  const opt = options.find(o=>o.label===lbl);\n                  return opt ? 
({...opt, label: displayName(lbl) } as SettingOption) : undefined;\n                })\n                .filter((o): o is SettingOption => !!o);\n              return <div className=\"grid grid-cols-2 gap-4 mb-4\">{arr.map(renderOption)}</div>;\n            })()}\n            {/* Sliders */}\n            <div className=\"space-y-4\">\n              {options.filter(opt => ['Retrieval chunks'].includes(opt.label)).map(renderOption)}\n            </div>\n          </div>\n\n          {/* Reranking Settings */}\n          <div>\n            <h3 className=\"text-md font-medium text-gray-200 mb-4 flex items-center gap-1\">Reranking & Context <InfoTooltip text=\"Controls post-retrieval reordering, context window expansion and pruning (may add latency).\" /></h3>\n            <div className=\"space-y-4\">\n              {options.filter(opt => \n                ['AI reranker', 'Reranker top chunks', 'Expand context window', 'Context window size'].includes(opt.label)\n              ).map(renderOption)}\n            </div>\n          </div>\n        </div>\n\n        <div className=\"flex justify-end pt-6 border-t border-white/10 mt-6\">\n          <button\n            onClick={onClose}\n            className=\"px-4 py-2 bg-gray-700 rounded hover:bg-gray-600 text-sm\"\n          >\n            Close\n          </button>\n        </div>\n      </div>\n    </div>\n  );\n} "
  },
  {
    "path": "src/components/ui/conversation-page.tsx",
    "content": "\"use client\"\n\nimport * as React from \"react\"\nimport { useRef, useEffect, useState } from \"react\"\nimport {\n  ChatBubbleAvatar,\n} from \"@/components/ui/chat-bubble\"\nimport { Copy, RefreshCcw, ThumbsUp, ThumbsDown, Volume2, MoreHorizontal, ChevronDown, Loader2, CheckCircle, XOctagon } from \"lucide-react\"\nimport { ScrollArea } from \"@/components/ui/scroll-area\"\nimport { ChatMessage } from \"@/lib/api\"\nimport { cn } from \"@/lib/utils\"\nimport Markdown from \"@/components/Markdown\"\nimport { normalizeWhitespace } from \"@/utils/textNormalization\"\n\ninterface ConversationPageProps {\n  messages: ChatMessage[]\n  isLoading?: boolean\n  className?: string\n  onAction?: (action: string, messageId: string, messageContent: string) => void\n}\n\nconst actionIcons = [\n  { icon: Copy, type: \"Copy\", action: \"copy\" },\n  { icon: ThumbsUp, type: \"Like\", action: \"like\" },\n  { icon: ThumbsDown, type: \"Dislike\", action: \"dislike\" },\n  { icon: Volume2, type: \"Speak\", action: \"speak\" },\n  { icon: RefreshCcw, type: \"Regenerate\", action: \"regenerate\" },\n  { icon: MoreHorizontal, type: \"More\", action: \"more\" },\n]\n\n// Citation block toggle component\nfunction Citation({doc, idx}: {doc:any, idx:number}){\n  const [open,setOpen]=React.useState(false);\n  const preview = (doc.text||'').replace(/\\s+/g,' ').trim().slice(0,160) + ((doc.text||'').length>160?'…':'');\n  return (\n    <div onClick={()=>setOpen(!open)} className=\"text-xs text-gray-300 bg-gray-900/60 rounded p-2 cursor-pointer hover:bg-gray-800 transition\">\n      <span className=\"font-semibold mr-1\">[{idx+1}]</span>{open?doc.text:preview}\n    </div>\n  );\n}\n\n// NEW: Expandable list of citations per assistant message\nfunction CitationsBlock({docs}:{docs:any[]}){\n  const scored = docs.filter(d => d.rerank_score || d.score || d._distance)\n  scored.sort((a, b) => (b.rerank_score ?? b.score ?? 1/b._distance) - (a.rerank_score ?? a.score ?? 
1/a._distance))\n  const [expanded, setExpanded] = useState(false);\n\n  if (scored.length === 0) return null;\n\n  const visibleDocs = expanded ? scored : scored.slice(0, 5);\n\n  return (\n    <div className=\"mt-2 text-xs text-gray-400\">\n      <p className=\"font-semibold mb-1\">Sources:</p>\n      <div className=\"grid grid-cols-1 gap-2\">\n        {visibleDocs.map((doc, i) => <Citation key={doc.chunk_id || i} doc={doc} idx={i} />)}\n      </div>\n      {scored.length > 5 && (\n        <button \n          onClick={() => setExpanded(!expanded)} \n          className=\"text-blue-400 hover:text-blue-300 mt-2 text-xs\"\n        >\n          {expanded ? 'Show less' : `Show ${scored.length-5} more`}\n        </button>\n      )}\n    </div>\n  );\n}\n\nfunction StepIcon({ status }: { status: 'pending' | 'active' | 'done' | 'error' }) {\n  switch (status) {\n    case 'pending':\n      return <MoreHorizontal className=\"w-4 h-4 text-neutral-600\" />\n    case 'active':\n      return <Loader2 className=\"w-4 h-4 text-blue-400 animate-spin\" />\n    case 'done':\n      return <CheckCircle className=\"w-4 h-4 text-green-400\" />\n    case 'error':\n      return <XOctagon className=\"w-4 h-4 text-red-400\" />\n    default:\n      return null\n  }\n}\n\nconst statusBorder: Record<string, string> = {\n  pending: 'border-neutral-800',\n  active: 'border-blue-400 animate-pulse',\n  done: 'border-green-400',\n  error: 'border-red-400'\n}\n\n// Component to handle <think> tokens and render them in a collapsible block\nfunction ThinkingText({ text }: { text: string }) {\n  const regex = /<think>([\\s\\S]*?)<\\/think>/g;\n  const thinkSegments: string[] = [];\n  const visibleText = text.replace(regex, (_, p1) => {\n    thinkSegments.push(p1.trim());\n    return \"\"; // remove thinking content from main text\n  });\n\n  return (\n    <>\n      {thinkSegments.length > 0 && (\n        <details className=\"thinking-block inline-block align-baseline mr-2\" open={false}>\n          
<summary className=\"cursor-pointer text-xs text-gray-400 uppercase select-none\">Thinking</summary>\n          <div className=\"mt-1 space-y-1 text-xs text-gray-400 italic\">\n            {thinkSegments.map((seg, idx) => (\n              <div key={idx}>{seg}</div>\n            ))}\n          </div>\n        </details>\n      )}\n      {visibleText.trim() && (\n        <Markdown text={normalizeWhitespace(visibleText)} className=\"whitespace-pre-wrap\" />\n      )}\n    </>\n  );\n}\n\nfunction StructuredMessageBlock({ content }: { content: Array<Record<string, any>> | { steps: any[] } }) {\n  const steps: any[] = Array.isArray(content) ? content : (content as any).steps;\n  // Determine if sub-query answers are present\n  const hasSubAnswers = steps.some((s: any) => s.key === 'answer' && Array.isArray(s.details) && s.details.length > 0);\n  // Compute the last index that has started (status !== 'pending') so we only\n  // render steps that are in progress or completed. This avoids showing the\n  // whole plan upfront and reveals each stage sequentially.\n  const lastRevealedIdx = (() => {\n    for (let i = steps.length - 1; i >= 0; i--) {\n      if (steps[i].status && steps[i].status !== 'pending') {\n        return i;\n      }\n    }\n    return -1; // nothing started yet\n  })();\n\n  const visibleSteps = lastRevealedIdx >= 0 ? 
steps.slice(0, lastRevealedIdx + 1) : [];\n\n  return (\n    <div className=\"flex flex-col\">\n      {visibleSteps.map((step: any, index: number) => {\n        if (step.key && step.label) {\n          const borderCls = statusBorder[step.status] || statusBorder['pending']\n          const statusClass = `timeline-card card my-1 py-2 pl-3 pr-2 bg-[#0d0d0d] rounded border-l-2 ${borderCls}`\n          \n          return (\n            <div key={step.key} className={statusClass}>\n              <div className=\"flex items-center gap-2 mb-1\">\n                <StepIcon status={step.status} />\n                <span className=\"text-sm font-medium text-neutral-100\">{step.label}</span>\n              </div>\n              {/* Details for each step */}\n              {step.key === 'final' && step.details && typeof step.details === 'object' && !Array.isArray(step.details) ? (\n                <div className=\"space-y-3\">\n                  <div className=\"whitespace-pre-wrap text-gray-100\">\n                    <ThinkingText text={normalizeWhitespace(step.details.answer)} />\n                  </div>\n                  {!hasSubAnswers && step.details.source_documents && step.details.source_documents.length > 0 && (\n                    <CitationsBlock docs={step.details.source_documents} />\n                  )}\n                </div>\n              ) : step.key === 'final' && step.details && typeof step.details === 'string' ? (\n                <div className=\"whitespace-pre-wrap text-gray-100\">\n                  <ThinkingText text={normalizeWhitespace(step.details)} />\n                </div>\n              ) : Array.isArray(step.details) ? (\n                step.key === 'decompose' && step.details.every((d: any)=> typeof d === 'string') ? 
(\n                  // Render list of sub-query strings\n                  <ul className=\"list-disc list-inside space-y-1 text-neutral-200\">\n                    {step.details.map((q: string, idx:number)=>(\n                      <li key={idx}>{q}</li>\n                    ))}\n                  </ul>\n                ) : (\n                  // Handle array of sub-answers\n                  <div className=\"space-y-2\">\n                    {step.details.map((detail: any, idx: number) => (\n                      <div key={idx} className=\"border-l-2 border-blue-400 pl-2\">\n                        <div className=\"font-semibold\">{detail.question}</div>\n                        <div><ThinkingText text={normalizeWhitespace(detail.answer)} /></div>\n                        {detail.source_documents && detail.source_documents.length > 0 && (\n                          <CitationsBlock docs={detail.source_documents} />\n                        )}\n                      </div>\n                    ))}\n                  </div>\n                )\n              ) : (\n                // Handle string details\n                <ThinkingText text={normalizeWhitespace(step.details as string)} />\n              )}\n            </div>\n          );\n        }\n        return null;\n      })}\n    </div>\n  );\n}\n\nexport function ConversationPage({ \n  messages, \n  isLoading = false,\n  className = \"\",\n  onAction\n}: ConversationPageProps) {\n  const scrollAreaRef = useRef<HTMLDivElement>(null)\n  const messagesEndRef = useRef<HTMLDivElement>(null)\n  const [showScrollButton, setShowScrollButton] = useState(false)\n  const [isUserNearBottom,setIsUserNearBottom]=useState(true)\n\n  // Track if user is near bottom so we don't interrupt manual scrolling\n  useEffect(() => {\n    if(isUserNearBottom){\n    scrollToBottom()\n    }\n  }, [messages, isLoading])\n\n  // Monitor scroll position to show/hide scroll button\n  useEffect(() => {\n    const scrollContainer = 
scrollAreaRef.current?.querySelector('[data-radix-scroll-area-viewport]')\n    if (!scrollContainer) return\n\n    const handleScroll = () => {\n      const { scrollTop, scrollHeight, clientHeight } = scrollContainer\n      const isNearBottom = scrollHeight - scrollTop - clientHeight < 100\n      setShowScrollButton(!isNearBottom)\n      setIsUserNearBottom(isNearBottom)\n    }\n\n    scrollContainer.addEventListener('scroll', handleScroll)\n    handleScroll() // Check initial state\n\n    return () => scrollContainer.removeEventListener('scroll', handleScroll)\n  }, [])\n\n  const scrollToBottom = () => {\n    // Try multiple methods to ensure scrolling works\n    if (messagesEndRef.current) {\n      messagesEndRef.current.scrollIntoView({ behavior: 'smooth' })\n    }\n    \n    // Fallback: scroll the container directly\n    setTimeout(() => {\n      if (scrollAreaRef.current) {\n        const scrollContainer = scrollAreaRef.current.querySelector('[data-radix-scroll-area-viewport]') || scrollAreaRef.current\n        if (scrollContainer) {\n          scrollContainer.scrollTop = scrollContainer.scrollHeight\n        }\n      }\n    }, 100)\n  }\n\n  const handleAction = (action: string, messageId: string, messageContent: string) => {\n    if (onAction) {\n      // For structured messages, we'll just join the text parts for copy/paste\n      let contentToPass: string;\n      if (typeof messageContent === 'string') {\n        contentToPass = messageContent;\n      } else if (Array.isArray(messageContent)) {\n        contentToPass = (messageContent as any[]).map((s: any) => s.text || s.answer || '').join('\\n');\n      } else if (messageContent && typeof messageContent === 'object' && Array.isArray((messageContent as any).steps)) {\n        // For {steps: Step[]} structure\n        contentToPass = (messageContent as any).steps.map((s: any) => s.label + (s.details ? (typeof s.details === 'string' ? 
(': ' + s.details) : '') : '')).join('\\n');\n      } else {\n        contentToPass = '';\n      }\n      onAction(action, messageId, contentToPass)\n      return\n    }\n    \n    console.log(`Action ${action} clicked for message ${messageId}`)\n    // Handle different actions here\n    switch (action) {\n      case 'copy':\n        navigator.clipboard.writeText(messageContent)\n        break\n      case 'regenerate':\n        // Regenerate AI response\n        break\n      case 'like':\n        // Add like reaction\n        break\n      case 'dislike':\n        // Add dislike reaction\n        break\n      case 'speak':\n        // Text to speech\n        break\n      case 'more':\n        // Show more options\n        break\n    }\n  }\n\n  return (\n    <div className={`flex flex-col h-full bg-black relative overflow-hidden ${className}`}>\n      <ScrollArea ref={scrollAreaRef} className=\"flex-1 h-full px-4 pt-4 pb-6 min-h-0\">\n        <div className=\"max-w-4xl mx-auto space-y-6\">\n          {messages.map((message) => {\n            const isUser = message.sender === \"user\"\n            \n            return (\n              <div key={message.id} className=\"w-full group\">\n                <div className={`flex gap-3 ${isUser ? 'justify-end' : 'justify-start'}`}>\n                  {!isUser && (\n                    <ChatBubbleAvatar \n                      fallback=\"AI\" \n                      className=\"mt-1 flex-shrink-0 text-black\"\n                    />\n                  )}\n                  \n                  <div className={`flex flex-col space-y-2 ${isUser ? 'items-end' : 'items-start'} max-w-full md:max-w-3xl`}>\n                    <div\n                      className={`rounded-2xl px-5 py-4 ${\n                        isUser \n                          ? \"bg-white text-black\" \n                          : \"bg-gray-800 text-gray-100\"\n                      }`}\n                    >\n                      {message.isLoading ? 
(\n                        <div className=\"flex items-center space-x-2\">\n                          <div className=\"flex space-x-1\">\n                            <div className=\"w-2 h-2 bg-gray-400 rounded-full animate-bounce\"></div>\n                            <div className=\"w-2 h-2 bg-gray-400 rounded-full animate-bounce\" style={{animationDelay: '0.1s'}}></div>\n                            <div className=\"w-2 h-2 bg-gray-400 rounded-full animate-bounce\" style={{animationDelay: '0.2s'}}></div>\n                          </div>\n                        </div>\n                      ) : (\n                        <div className=\"whitespace-pre-wrap text-base leading-relaxed\">\n                          {typeof message.content === 'string' \n                              ? <ThinkingText text={normalizeWhitespace(message.content)} />\n                              : <StructuredMessageBlock content={message.content} />\n                          }\n                        </div>\n                      )}\n                    </div>\n                    \n                    {!isUser && !message.isLoading && (\n                      <div className=\"flex items-center gap-1 opacity-0 group-hover:opacity-100 transition-opacity duration-200\">\n                        {actionIcons.map(({ icon: Icon, type, action }) => (\n                          <button\n                            key={action}\n                            onClick={() => {\n                              const content = typeof message.content === 'string' ? 
message.content : (message.content as any[]).map(s => s.text || s.answer).join('\\n');\n                              handleAction(action, message.id, content)\n                            }}\n                            className=\"p-1.5 hover:bg-gray-700 rounded-md transition-colors text-gray-400 hover:text-gray-200\"\n                            title={type}\n                          >\n                            <Icon className=\"w-3.5 h-3.5\" />\n                          </button>\n                        ))}\n                      </div>\n                    )}\n\n                    {/* Global citations only for plain-string messages */}\n                    {(!isUser &&\n                      !message.isLoading &&\n                      typeof message.content === 'string' &&\n                      Array.isArray((message as any).metadata?.source_documents) &&\n                      (message as any).metadata.source_documents.length > 0) && (\n                        <CitationsBlock docs={(message as any).metadata.source_documents} />\n                    )}\n                  </div>\n\n                  {isUser && (\n                    <ChatBubbleAvatar \n                      className=\"mt-1 flex-shrink-0 text-black\"\n                      src=\"https://i.pravatar.cc/40?u=user\"\n                      fallback=\"User\"\n                    />\n                  )}\n                </div>\n              </div>\n            )\n          })}\n          \n          {/* Loading indicator for new message */}\n          {isLoading && (\n            <div className=\"w-full group\">\n              <div className=\"flex gap-3 justify-start\">\n                <ChatBubbleAvatar fallback=\"AI\" className=\"mt-1 flex-shrink-0 text-black\" />\n                <div className=\"flex flex-col space-y-2 items-start max-w-[80%]\">\n                  <div className=\"rounded-2xl px-4 py-3 bg-gray-800 text-gray-100\">\n                    <div className=\"flex 
items-center space-x-2\">\n                      <div className=\"flex space-x-1\">\n                        <div className=\"w-2 h-2 bg-gray-400 rounded-full animate-bounce\"></div>\n                        <div className=\"w-2 h-2 bg-gray-400 rounded-full animate-bounce\" style={{animationDelay: '0.1s'}}></div>\n                        <div className=\"w-2 h-2 bg-gray-400 rounded-full animate-bounce\" style={{animationDelay: '0.2s'}}></div>\n                      </div>\n                    </div>\n                  </div>\n                </div>\n              </div>\n            </div>\n                      )}\n          \n          {/* Invisible element to scroll to */}\n          <div ref={messagesEndRef} />\n        </div>\n      </ScrollArea>\n      \n      {/* Scroll to bottom button - only show when not at bottom */}\n      {showScrollButton && (\n        <div className=\"absolute bottom-20 left-1/2 transform -translate-x-1/2 z-10\">\n          <button\n            onClick={scrollToBottom}\n            className=\"p-2 bg-gray-800 border border-gray-700 rounded-full hover:bg-gray-700 transition-all duration-200 shadow-lg group animate-in fade-in slide-in-from-bottom-2\"\n            title=\"Scroll to bottom\"\n          >\n            <ChevronDown className=\"w-4 h-4 text-gray-400 group-hover:text-gray-200 transition-colors\" />\n          </button>\n        </div>\n      )}\n    </div>\n  )\n}  "
  },
  {
    "path": "src/components/ui/dropdown-menu.tsx",
    "content": "\"use client\"\n\nimport * as React from \"react\"\nimport * as DropdownMenuPrimitive from \"@radix-ui/react-dropdown-menu\"\nimport { CheckIcon, ChevronRightIcon, CircleIcon } from \"lucide-react\"\n\nimport { cn } from \"@/lib/utils\"\n\nfunction DropdownMenu({\n  ...props\n}: React.ComponentProps<typeof DropdownMenuPrimitive.Root>) {\n  return <DropdownMenuPrimitive.Root data-slot=\"dropdown-menu\" {...props} />\n}\n\nfunction DropdownMenuPortal({\n  ...props\n}: React.ComponentProps<typeof DropdownMenuPrimitive.Portal>) {\n  return (\n    <DropdownMenuPrimitive.Portal data-slot=\"dropdown-menu-portal\" {...props} />\n  )\n}\n\nfunction DropdownMenuTrigger({\n  ...props\n}: React.ComponentProps<typeof DropdownMenuPrimitive.Trigger>) {\n  return (\n    <DropdownMenuPrimitive.Trigger\n      data-slot=\"dropdown-menu-trigger\"\n      {...props}\n    />\n  )\n}\n\nfunction DropdownMenuContent({\n  className,\n  sideOffset = 4,\n  ...props\n}: React.ComponentProps<typeof DropdownMenuPrimitive.Content>) {\n  return (\n    <DropdownMenuPrimitive.Portal>\n      <DropdownMenuPrimitive.Content\n        data-slot=\"dropdown-menu-content\"\n        sideOffset={sideOffset}\n        className={cn(\n          \"bg-popover text-popover-foreground data-[state=open]:animate-in data-[state=closed]:animate-out data-[state=closed]:fade-out-0 data-[state=open]:fade-in-0 data-[state=closed]:zoom-out-95 data-[state=open]:zoom-in-95 data-[side=bottom]:slide-in-from-top-2 data-[side=left]:slide-in-from-right-2 data-[side=right]:slide-in-from-left-2 data-[side=top]:slide-in-from-bottom-2 z-50 max-h-(--radix-dropdown-menu-content-available-height) min-w-[8rem] origin-(--radix-dropdown-menu-content-transform-origin) overflow-x-hidden overflow-y-auto rounded-md border p-1 shadow-md\",\n          className\n        )}\n        {...props}\n      />\n    </DropdownMenuPrimitive.Portal>\n  )\n}\n\nfunction DropdownMenuGroup({\n  ...props\n}: React.ComponentProps<typeof 
DropdownMenuPrimitive.Group>) {\n  return (\n    <DropdownMenuPrimitive.Group data-slot=\"dropdown-menu-group\" {...props} />\n  )\n}\n\nfunction DropdownMenuItem({\n  className,\n  inset,\n  variant = \"default\",\n  ...props\n}: React.ComponentProps<typeof DropdownMenuPrimitive.Item> & {\n  inset?: boolean\n  variant?: \"default\" | \"destructive\"\n}) {\n  return (\n    <DropdownMenuPrimitive.Item\n      data-slot=\"dropdown-menu-item\"\n      data-inset={inset}\n      data-variant={variant}\n      className={cn(\n        \"focus:bg-accent focus:text-accent-foreground data-[variant=destructive]:text-destructive data-[variant=destructive]:focus:bg-destructive/10 dark:data-[variant=destructive]:focus:bg-destructive/20 data-[variant=destructive]:focus:text-destructive data-[variant=destructive]:*:[svg]:!text-destructive [&_svg:not([class*='text-'])]:text-muted-foreground relative flex cursor-default items-center gap-2 rounded-sm px-2 py-1.5 text-sm outline-hidden select-none data-[disabled]:pointer-events-none data-[disabled]:opacity-50 data-[inset]:pl-8 [&_svg]:pointer-events-none [&_svg]:shrink-0 [&_svg:not([class*='size-'])]:size-4\",\n        className\n      )}\n      {...props}\n    />\n  )\n}\n\nfunction DropdownMenuCheckboxItem({\n  className,\n  children,\n  checked,\n  ...props\n}: React.ComponentProps<typeof DropdownMenuPrimitive.CheckboxItem>) {\n  return (\n    <DropdownMenuPrimitive.CheckboxItem\n      data-slot=\"dropdown-menu-checkbox-item\"\n      className={cn(\n        \"focus:bg-accent focus:text-accent-foreground relative flex cursor-default items-center gap-2 rounded-sm py-1.5 pr-2 pl-8 text-sm outline-hidden select-none data-[disabled]:pointer-events-none data-[disabled]:opacity-50 [&_svg]:pointer-events-none [&_svg]:shrink-0 [&_svg:not([class*='size-'])]:size-4\",\n        className\n      )}\n      checked={checked}\n      {...props}\n    >\n      <span className=\"pointer-events-none absolute left-2 flex size-3.5 items-center 
justify-center\">\n        <DropdownMenuPrimitive.ItemIndicator>\n          <CheckIcon className=\"size-4\" />\n        </DropdownMenuPrimitive.ItemIndicator>\n      </span>\n      {children}\n    </DropdownMenuPrimitive.CheckboxItem>\n  )\n}\n\nfunction DropdownMenuRadioGroup({\n  ...props\n}: React.ComponentProps<typeof DropdownMenuPrimitive.RadioGroup>) {\n  return (\n    <DropdownMenuPrimitive.RadioGroup\n      data-slot=\"dropdown-menu-radio-group\"\n      {...props}\n    />\n  )\n}\n\nfunction DropdownMenuRadioItem({\n  className,\n  children,\n  ...props\n}: React.ComponentProps<typeof DropdownMenuPrimitive.RadioItem>) {\n  return (\n    <DropdownMenuPrimitive.RadioItem\n      data-slot=\"dropdown-menu-radio-item\"\n      className={cn(\n        \"focus:bg-accent focus:text-accent-foreground relative flex cursor-default items-center gap-2 rounded-sm py-1.5 pr-2 pl-8 text-sm outline-hidden select-none data-[disabled]:pointer-events-none data-[disabled]:opacity-50 [&_svg]:pointer-events-none [&_svg]:shrink-0 [&_svg:not([class*='size-'])]:size-4\",\n        className\n      )}\n      {...props}\n    >\n      <span className=\"pointer-events-none absolute left-2 flex size-3.5 items-center justify-center\">\n        <DropdownMenuPrimitive.ItemIndicator>\n          <CircleIcon className=\"size-2 fill-current\" />\n        </DropdownMenuPrimitive.ItemIndicator>\n      </span>\n      {children}\n    </DropdownMenuPrimitive.RadioItem>\n  )\n}\n\nfunction DropdownMenuLabel({\n  className,\n  inset,\n  ...props\n}: React.ComponentProps<typeof DropdownMenuPrimitive.Label> & {\n  inset?: boolean\n}) {\n  return (\n    <DropdownMenuPrimitive.Label\n      data-slot=\"dropdown-menu-label\"\n      data-inset={inset}\n      className={cn(\n        \"px-2 py-1.5 text-sm font-medium data-[inset]:pl-8\",\n        className\n      )}\n      {...props}\n    />\n  )\n}\n\nfunction DropdownMenuSeparator({\n  className,\n  ...props\n}: React.ComponentProps<typeof 
DropdownMenuPrimitive.Separator>) {\n  return (\n    <DropdownMenuPrimitive.Separator\n      data-slot=\"dropdown-menu-separator\"\n      className={cn(\"bg-border -mx-1 my-1 h-px\", className)}\n      {...props}\n    />\n  )\n}\n\nfunction DropdownMenuShortcut({\n  className,\n  ...props\n}: React.ComponentProps<\"span\">) {\n  return (\n    <span\n      data-slot=\"dropdown-menu-shortcut\"\n      className={cn(\n        \"text-muted-foreground ml-auto text-xs tracking-widest\",\n        className\n      )}\n      {...props}\n    />\n  )\n}\n\nfunction DropdownMenuSub({\n  ...props\n}: React.ComponentProps<typeof DropdownMenuPrimitive.Sub>) {\n  return <DropdownMenuPrimitive.Sub data-slot=\"dropdown-menu-sub\" {...props} />\n}\n\nfunction DropdownMenuSubTrigger({\n  className,\n  inset,\n  children,\n  ...props\n}: React.ComponentProps<typeof DropdownMenuPrimitive.SubTrigger> & {\n  inset?: boolean\n}) {\n  return (\n    <DropdownMenuPrimitive.SubTrigger\n      data-slot=\"dropdown-menu-sub-trigger\"\n      data-inset={inset}\n      className={cn(\n        \"focus:bg-accent focus:text-accent-foreground data-[state=open]:bg-accent data-[state=open]:text-accent-foreground flex cursor-default items-center rounded-sm px-2 py-1.5 text-sm outline-hidden select-none data-[inset]:pl-8\",\n        className\n      )}\n      {...props}\n    >\n      {children}\n      <ChevronRightIcon className=\"ml-auto size-4\" />\n    </DropdownMenuPrimitive.SubTrigger>\n  )\n}\n\nfunction DropdownMenuSubContent({\n  className,\n  ...props\n}: React.ComponentProps<typeof DropdownMenuPrimitive.SubContent>) {\n  return (\n    <DropdownMenuPrimitive.SubContent\n      data-slot=\"dropdown-menu-sub-content\"\n      className={cn(\n        \"bg-popover text-popover-foreground data-[state=open]:animate-in data-[state=closed]:animate-out data-[state=closed]:fade-out-0 data-[state=open]:fade-in-0 data-[state=closed]:zoom-out-95 data-[state=open]:zoom-in-95 data-[side=bottom]:slide-in-from-top-2 
data-[side=left]:slide-in-from-right-2 data-[side=right]:slide-in-from-left-2 data-[side=top]:slide-in-from-bottom-2 z-50 min-w-[8rem] origin-(--radix-dropdown-menu-content-transform-origin) overflow-hidden rounded-md border p-1 shadow-lg\",\n        className\n      )}\n      {...props}\n    />\n  )\n}\n\nexport {\n  DropdownMenu,\n  DropdownMenuPortal,\n  DropdownMenuTrigger,\n  DropdownMenuContent,\n  DropdownMenuGroup,\n  DropdownMenuLabel,\n  DropdownMenuItem,\n  DropdownMenuCheckboxItem,\n  DropdownMenuRadioGroup,\n  DropdownMenuRadioItem,\n  DropdownMenuSeparator,\n  DropdownMenuShortcut,\n  DropdownMenuSub,\n  DropdownMenuSubTrigger,\n  DropdownMenuSubContent,\n}\n"
  },
  {
    "path": "src/components/ui/empty-chat-state.tsx",
    "content": "\"use client\";\n\nimport { useEffect, useRef, useCallback } from \"react\";\nimport { useState } from \"react\";\nimport { Textarea } from \"@/components/ui/textarea\";\nimport { cn } from \"@/lib/utils\";\nimport {\n    ArrowUpIcon,\n    Paperclip,\n    PlusIcon,\n    X,\n    FileText,\n} from \"lucide-react\";\nimport { AttachedFile } from \"@/lib/types\";\n\ninterface UseAutoResizeTextareaProps {\n    minHeight: number;\n    maxHeight?: number;\n}\n\nfunction useAutoResizeTextarea({\n    minHeight,\n    maxHeight,\n}: UseAutoResizeTextareaProps) {\n    const textareaRef = useRef<HTMLTextAreaElement>(null);\n\n    const adjustHeight = useCallback(\n        (reset?: boolean) => {\n            const textarea = textareaRef.current;\n            if (!textarea) return;\n\n            if (reset) {\n                textarea.style.height = `${minHeight}px`;\n                return;\n            }\n\n            // Temporarily shrink to get the right scrollHeight\n            textarea.style.height = `${minHeight}px`;\n\n            // Calculate new height\n            const newHeight = Math.max(\n                minHeight,\n                Math.min(\n                    textarea.scrollHeight,\n                    maxHeight ?? 
Number.POSITIVE_INFINITY\n                )\n            );\n\n            textarea.style.height = `${newHeight}px`;\n        },\n        [minHeight, maxHeight]\n    );\n\n    useEffect(() => {\n        // Set initial height\n        const textarea = textareaRef.current;\n        if (textarea) {\n            textarea.style.height = `${minHeight}px`;\n        }\n    }, [minHeight]);\n\n    // Adjust height on window resize\n    useEffect(() => {\n        const handleResize = () => adjustHeight();\n        window.addEventListener(\"resize\", handleResize);\n        return () => window.removeEventListener(\"resize\", handleResize);\n    }, [adjustHeight]);\n\n    return { textareaRef, adjustHeight };\n}\n\ninterface EmptyChatStateProps {\n    onSendMessage: (message: string, attachedFiles?: AttachedFile[]) => void;\n    disabled?: boolean;\n    placeholder?: string;\n}\n\nexport function EmptyChatState({ \n    onSendMessage, \n    disabled = false, \n    placeholder = \"Ask localgpt a question...\" \n}: EmptyChatStateProps) {\n    const [value, setValue] = useState(\"\");\n    const [attachedFiles, setAttachedFiles] = useState<AttachedFile[]>([]);\n    const fileInputRef = useRef<HTMLInputElement>(null);\n    const { textareaRef, adjustHeight } = useAutoResizeTextarea({\n        minHeight: 60,\n        maxHeight: 200,\n    });\n\n    const handleSend = () => {\n        if ((value.trim() || attachedFiles.length > 0) && !disabled) {\n            onSendMessage(value.trim(), attachedFiles);\n            setValue(\"\");\n            setAttachedFiles([]);\n            adjustHeight(true);\n        }\n    };\n\n    const handleKeyDown = (e: React.KeyboardEvent<HTMLTextAreaElement>) => {\n        if (e.key === \"Enter\" && !e.shiftKey) {\n            e.preventDefault();\n            handleSend();\n        }\n    };\n\n    const handleFileAttach = () => {\n        fileInputRef.current?.click();\n    };\n\n    const handleFileChange = (e: React.ChangeEvent<HTMLInputElement>) => 
{\n        const files = e.target.files;\n        if (!files) return;\n\n        const newFiles: AttachedFile[] = [];\n        for (let i = 0; i < files.length; i++) {\n            const file = files[i];\n            if (file.type === 'application/pdf' || \n                file.type === 'application/vnd.openxmlformats-officedocument.wordprocessingml.document' ||\n                file.type === 'application/msword' ||\n                file.type === 'text/html' ||\n                file.type === 'text/markdown' ||\n                file.type === 'text/plain' ||\n                file.name.toLowerCase().endsWith('.pdf') ||\n                file.name.toLowerCase().endsWith('.docx') ||\n                file.name.toLowerCase().endsWith('.doc') ||\n                file.name.toLowerCase().endsWith('.html') ||\n                file.name.toLowerCase().endsWith('.htm') ||\n                file.name.toLowerCase().endsWith('.md') ||\n                file.name.toLowerCase().endsWith('.txt')) {\n                newFiles.push({\n                    id: crypto.randomUUID(),\n                    name: file.name,\n                    size: file.size,\n                    type: file.type,\n                    file: file,\n                });\n            }\n        }\n\n        setAttachedFiles(prev => [...prev, ...newFiles]);\n        \n        // Reset the input\n        if (fileInputRef.current) {\n            fileInputRef.current.value = '';\n        }\n\n        // --- NEW: Immediately trigger upload when files are selected ---\n        if (newFiles.length > 0) {\n            onSendMessage(\"\", newFiles);\n            // Clear the local attachment state as the parent now handles it\n            setAttachedFiles([]); \n        }\n    };\n\n    const removeFile = (fileId: string) => {\n        setAttachedFiles(prev => prev.filter(f => f.id !== fileId));\n    };\n\n    const formatFileSize = (bytes: number) => {\n        if (bytes === 0) return '0 Bytes';\n        const k = 1024;\n     
   const sizes = ['Bytes', 'KB', 'MB', 'GB'];\n        const i = Math.floor(Math.log(bytes) / Math.log(k));\n        return parseFloat((bytes / Math.pow(k, i)).toFixed(2)) + ' ' + sizes[i];\n    };\n\n    return (\n        <div className=\"flex flex-col items-center justify-center h-full w-full max-w-4xl mx-auto p-4 space-y-8\">\n            <h1 className=\"text-4xl font-bold text-white\">\n                What can I help you find?\n            </h1>\n\n            <div className=\"w-full\">\n                {/* Attached Files Display */}\n                {attachedFiles.length > 0 && (\n                    <div className=\"mb-4 space-y-2\">\n                        <div className=\"text-sm text-gray-400 font-medium\">Attached Files:</div>\n                        <div className=\"space-y-2\">\n                            {attachedFiles.map((file) => (\n                                <div key={file.id} className=\"flex items-center gap-3 bg-gray-800 rounded-lg p-3\">\n                                    <FileText className=\"w-5 h-5 text-red-400\" />\n                                    <div className=\"flex-1 min-w-0\">\n                                        <div className=\"text-sm text-white truncate\">{file.name}</div>\n                                        <div className=\"text-xs text-gray-400\">{formatFileSize(file.size)}</div>\n                                    </div>\n                                    {/* The remove button is commented out as the parent will manage the state now */}\n                                    {/* <button\n                                        onClick={() => removeFile(file.id)}\n                                        className=\"p-1 hover:bg-gray-700 rounded transition-colors\"\n                                    >\n                                        <X className=\"w-4 h-4 text-gray-400 hover:text-white\" />\n                                    </button> */}\n                                </div>\n               
             ))}\n                        </div>\n                    </div>\n                )}\n\n                <div className=\"relative bg-neutral-900 rounded-xl border border-neutral-800\">\n                    <div className=\"overflow-y-auto\">\n                        <Textarea\n                            ref={textareaRef}\n                            value={value}\n                            onChange={(e) => {\n                                setValue(e.target.value);\n                                adjustHeight();\n                            }}\n                            onKeyDown={handleKeyDown}\n                            placeholder={attachedFiles.length > 0 ? \"Ask questions about your attached files...\" : placeholder}\n                            disabled={disabled}\n                            className={cn(\n                                \"w-full px-4 py-3\",\n                                \"resize-none\",\n                                \"bg-transparent\",\n                                \"border-none\",\n                                \"text-white text-sm\",\n                                \"focus:outline-none\",\n                                \"focus-visible:ring-0 focus-visible:ring-offset-0\",\n                                \"placeholder:text-neutral-500 placeholder:text-sm\",\n                                \"min-h-[60px]\",\n                                disabled && \"opacity-50 cursor-not-allowed\"\n                            )}\n                            style={{\n                                overflow: \"hidden\",\n                            }}\n                        />\n                    </div>\n\n                    {/* Hidden file input */}\n                    <input\n                        ref={fileInputRef}\n                        type=\"file\"\n                        accept=\".pdf,.docx,.doc,.html,.htm,.md,.txt\"\n                        multiple\n                        
onChange={handleFileChange}\n                        className=\"hidden\"\n                    />\n\n                    <div className=\"flex items-center justify-between p-3\">\n                        <div className=\"flex items-center gap-2\">\n                            <button\n                                type=\"button\"\n                                onClick={handleFileAttach}\n                                disabled={disabled}\n                                className=\"group p-2 hover:bg-neutral-800 rounded-lg transition-colors flex items-center gap-1 disabled:opacity-50 disabled:cursor-not-allowed\"\n                                title=\"Attach files\"\n                            >\n                                <Paperclip className=\"w-4 h-4 text-white\" />\n                                <span className=\"text-xs text-zinc-400 hidden group-hover:inline transition-opacity\">\n                                    Attach files\n                                </span>\n                            </button>\n                        </div>\n                        <div className=\"flex items-center gap-2\">\n                            <button\n                                type=\"button\"\n                                disabled={disabled}\n                                className=\"px-2 py-1 rounded-lg text-sm text-zinc-400 transition-colors border border-dashed border-zinc-700 hover:border-zinc-600 hover:bg-zinc-800 flex items-center justify-between gap-1 disabled:opacity-50 disabled:cursor-not-allowed\"\n                            >\n                                <PlusIcon className=\"w-4 h-4\" />\n                                Project\n                            </button>\n                            <button\n                                type=\"button\"\n                                onClick={handleSend}\n                                disabled={disabled || (!value.trim() && attachedFiles.length === 0)}\n                   
             className={cn(\n                                    \"px-1.5 py-1.5 rounded-lg text-sm transition-colors border border-zinc-700 hover:border-zinc-600 hover:bg-zinc-800 flex items-center justify-between gap-1\",\n                                    (value.trim() || attachedFiles.length > 0) && !disabled\n                                        ? \"bg-white text-black hover:bg-gray-200\"\n                                        : \"text-zinc-400\",\n                                    \"disabled:opacity-50 disabled:cursor-not-allowed\"\n                                )}\n                            >\n                                <ArrowUpIcon\n                                    className={cn(\n                                        \"w-4 h-4\",\n                                        (value.trim() || attachedFiles.length > 0) && !disabled\n                                            ? \"text-black\"\n                                            : \"text-zinc-400\"\n                                    )}\n                                />\n                                <span className=\"sr-only\">Send</span>\n                            </button>\n                        </div>\n                    </div>\n                </div>\n            </div>\n        </div>\n    );\n}    "
  },
  {
    "path": "src/components/ui/localgpt-chat.tsx",
    "content": "\"use client\";\n\nimport { useEffect, useRef, useCallback } from \"react\";\nimport { useState } from \"react\";\nimport { Textarea } from \"@/components/ui/textarea\";\nimport { cn } from \"@/lib/utils\";\nimport {\n    ArrowUpIcon,\n    Paperclip,\n    PlusIcon,\n} from \"lucide-react\";\n\ninterface UseAutoResizeTextareaProps {\n    minHeight: number;\n    maxHeight?: number;\n}\n\nfunction useAutoResizeTextarea({\n    minHeight,\n    maxHeight,\n}: UseAutoResizeTextareaProps) {\n    const textareaRef = useRef<HTMLTextAreaElement>(null);\n\n    const adjustHeight = useCallback(\n        (reset?: boolean) => {\n            const textarea = textareaRef.current;\n            if (!textarea) return;\n\n            if (reset) {\n                textarea.style.height = `${minHeight}px`;\n                return;\n            }\n\n            // Temporarily shrink to get the right scrollHeight\n            textarea.style.height = `${minHeight}px`;\n\n            // Calculate new height\n            const newHeight = Math.max(\n                minHeight,\n                Math.min(\n                    textarea.scrollHeight,\n                    maxHeight ?? 
Number.POSITIVE_INFINITY\n                )\n            );\n\n            textarea.style.height = `${newHeight}px`;\n        },\n        [minHeight, maxHeight]\n    );\n\n    useEffect(() => {\n        // Set initial height\n        const textarea = textareaRef.current;\n        if (textarea) {\n            textarea.style.height = `${minHeight}px`;\n        }\n    }, [minHeight]);\n\n    // Adjust height on window resize\n    useEffect(() => {\n        const handleResize = () => adjustHeight();\n        window.addEventListener(\"resize\", handleResize);\n        return () => window.removeEventListener(\"resize\", handleResize);\n    }, [adjustHeight]);\n\n    return { textareaRef, adjustHeight };\n}\n\nexport function LocalGPTChat() {\n    const [value, setValue] = useState(\"\");\n    const { textareaRef, adjustHeight } = useAutoResizeTextarea({\n        minHeight: 60,\n        maxHeight: 200,\n    });\n\n    const handleKeyDown = (e: React.KeyboardEvent<HTMLTextAreaElement>) => {\n        if (e.key === \"Enter\" && !e.shiftKey) {\n            e.preventDefault();\n            if (value.trim()) {\n                setValue(\"\");\n                adjustHeight(true);\n            }\n        }\n    };\n\n    return (\n        <div className=\"flex flex-col items-center w-full max-w-4xl mx-auto p-4 space-y-8\">\n            <h1 className=\"text-4xl font-bold text-white\">\n                What can I help you find?\n            </h1>\n\n            <div className=\"w-full\">\n                <div className=\"relative bg-neutral-900 rounded-xl border border-neutral-800\">\n                    <div className=\"overflow-y-auto\">\n                        <Textarea\n                            ref={textareaRef}\n                            value={value}\n                            onChange={(e) => {\n                                setValue(e.target.value);\n                                adjustHeight();\n                            }}\n                            
onKeyDown={handleKeyDown}\n                            placeholder=\"Ask localgpt a question...\"\n                            className={cn(\n                                \"w-full px-4 py-3\",\n                                \"resize-none\",\n                                \"bg-transparent\",\n                                \"border-none\",\n                                \"text-white text-sm\",\n                                \"focus:outline-none\",\n                                \"focus-visible:ring-0 focus-visible:ring-offset-0\",\n                                \"placeholder:text-neutral-500 placeholder:text-sm\",\n                                \"min-h-[60px]\"\n                            )}\n                            style={{\n                                overflow: \"hidden\",\n                            }}\n                        />\n                    </div>\n\n                    <div className=\"flex items-center justify-between p-3\">\n                        <div className=\"flex items-center gap-2\">\n                            <button\n                                type=\"button\"\n                                className=\"group p-2 hover:bg-neutral-800 rounded-lg transition-colors flex items-center gap-1\"\n                            >\n                                <Paperclip className=\"w-4 h-4 text-white\" />\n                                <span className=\"text-xs text-zinc-400 hidden group-hover:inline transition-opacity\">\n                                    Attach\n                                </span>\n                            </button>\n                        </div>\n                        <div className=\"flex items-center gap-2\">\n                            <button\n                                type=\"button\"\n                                className=\"px-2 py-1 rounded-lg text-sm text-zinc-400 transition-colors border border-dashed border-zinc-700 hover:border-zinc-600 hover:bg-zinc-800 flex 
items-center justify-between gap-1\"\n                            >\n                                <PlusIcon className=\"w-4 h-4\" />\n                                Project\n                            </button>\n                            <button\n                                type=\"button\"\n                                className={cn(\n                                    \"px-1.5 py-1.5 rounded-lg text-sm transition-colors border border-zinc-700 hover:border-zinc-600 hover:bg-zinc-800 flex items-center justify-between gap-1\",\n                                    value.trim()\n                                        ? \"bg-white text-black\"\n                                        : \"text-zinc-400\"\n                                )}\n                            >\n                                <ArrowUpIcon\n                                    className={cn(\n                                        \"w-4 h-4\",\n                                        value.trim()\n                                            ? \"text-black\"\n                                            : \"text-zinc-400\"\n                                    )}\n                                />\n                                <span className=\"sr-only\">Send</span>\n                            </button>\n                        </div>\n                    </div>\n                </div>\n\n\n            </div>\n        </div>\n    );\n}\n\n "
  },
  {
    "path": "src/components/ui/message-loading.tsx",
    "content": "\"use client\"\n\nfunction MessageLoading() {\n  return (\n    <svg\n      width=\"24\"\n      height=\"24\"\n      viewBox=\"0 0 24 24\"\n      xmlns=\"http://www.w3.org/2000/svg\"\n      className=\"text-foreground\"\n    >\n      <circle cx=\"4\" cy=\"12\" r=\"2\" fill=\"currentColor\">\n        <animate\n          id=\"spinner_qFRN\"\n          begin=\"0;spinner_OcgL.end+0.25s\"\n          attributeName=\"cy\"\n          calcMode=\"spline\"\n          dur=\"0.6s\"\n          values=\"12;6;12\"\n          keySplines=\".33,.66,.66,1;.33,0,.66,.33\"\n        />\n      </circle>\n      <circle cx=\"12\" cy=\"12\" r=\"2\" fill=\"currentColor\">\n        <animate\n          begin=\"spinner_qFRN.begin+0.1s\"\n          attributeName=\"cy\"\n          calcMode=\"spline\"\n          dur=\"0.6s\"\n          values=\"12;6;12\"\n          keySplines=\".33,.66,.66,1;.33,0,.66,.33\"\n        />\n      </circle>\n      <circle cx=\"20\" cy=\"12\" r=\"2\" fill=\"currentColor\">\n        <animate\n          id=\"spinner_OcgL\"\n          begin=\"spinner_qFRN.begin+0.2s\"\n          attributeName=\"cy\"\n          calcMode=\"spline\"\n          dur=\"0.6s\"\n          values=\"12;6;12\"\n          keySplines=\".33,.66,.66,1;.33,0,.66,.33\"\n        />\n      </circle>\n    </svg>\n  );\n}\n\nexport { MessageLoading }; "
  },
  {
    "path": "src/components/ui/quick-chat.tsx",
    "content": "\"use client\";\n\nimport React, { useState, useEffect } from 'react';\nimport { ChatInput } from '@/components/ui/chat-input';\nimport { chatAPI, ChatMessage } from '@/lib/api';\nimport { ConversationPage } from '@/components/ui/conversation-page';\nimport { ChatSettingsModal } from '@/components/ui/chat-settings-modal';\n\ninterface QuickChatProps {\n  sessionId?: string;\n  onSessionChange?: (s: any) => void;\n  className?: string;\n}\n\nexport function QuickChat({ sessionId: externalSessionId, onSessionChange, className=\"\" }: QuickChatProps) {\n  const [messages, setMessages] = useState<ChatMessage[]>([]);\n  const [isLoading, setIsLoading] = useState(false);\n  const [sessionId, setSessionId] = useState<string | undefined>(externalSessionId);\n  const [generationModels, setGenerationModels] = useState<string[]>([]);\n  const [selectedModel, setSelectedModel] = useState<string>('');\n  const [showSettings, setShowSettings] = useState(false);\n  const api = chatAPI;\n\n  // 🔄 Sync prop -> state: when sidebar selects a different session, update local session and reset chat window\n  useEffect(() => {\n    if (externalSessionId && externalSessionId !== sessionId) {\n      setSessionId(externalSessionId);\n      // Fetch existing messages for the selected session\n      (async () => {\n        try {\n          const data = await api.getSession(externalSessionId);\n          // Convert DB messages to ChatMessage format expected by UI helper\n          const msgs: ChatMessage[] = data.messages.map((m: any) => api.convertDbMessage(m));\n          setMessages(msgs);\n        } catch (err) {\n          console.error('Failed to load messages for session', err);\n          setMessages([]);\n        }\n      })();\n    }\n    // eslint-disable-next-line react-hooks/exhaustive-deps\n  }, [externalSessionId]);\n\n  // Fetch available models\n  useEffect(()=>{\n    (async()=>{\n      try{\n        const resp = await api.getModels();\n        
setGenerationModels(resp.generation_models||[]);\n        if(resp.generation_models && resp.generation_models.length>0){\n          const def = resp.generation_models.find((m:string)=>m==='qwen3:8b');\n          setSelectedModel(def || resp.generation_models[0]);\n        }\n      }catch(e){console.warn('Failed to load models',e);}\n    })();\n  },[api]);\n\n  const sendMessage = async (content: string, _files?: any) => {\n    if (!content.trim()) return;\n\n    const userMsg: ChatMessage = {\n      id: crypto.randomUUID(),\n      content,\n      sender: 'user',\n      timestamp: new Date().toISOString(),\n    };\n    setMessages((prev) => [...prev, userMsg]);\n\n    setIsLoading(true);\n\n    // Ensure we have a backend session to preserve history on the agent side\n    let activeSessionId = sessionId;\n    if (!activeSessionId) {\n      try {\n        const newSess = await api.createSession('Quick Chat');\n        activeSessionId = newSess.id;\n        setSessionId(activeSessionId);\n        if(onSessionChange){ \n          onSessionChange(newSess); \n        }\n      } catch (err) {\n        console.error('Failed to create quick-chat session', err);\n      }\n    }\n\n    try {\n      const history = api.messagesToHistory(messages);\n      const resp = await api.sendMessage({ message: content, conversation_history: history, model: selectedModel });\n\n      const assistantMsg: ChatMessage = {\n        id: crypto.randomUUID(),\n        content: resp.response,\n        sender: 'assistant',\n        timestamp: new Date().toISOString(),\n      };\n      setMessages((prev) => [...prev, assistantMsg]);\n    } catch (err) {\n      console.error('Quick chat failed', err);\n    } finally {\n      setIsLoading(false);\n    }\n\n    // if session existed externally and callback provided, still sync id\n    if(onSessionChange && activeSessionId && activeSessionId!==externalSessionId){\n      // no additional action; already sent on creation\n    }\n  };\n\n  const showEmptyState = 
messages.length === 0 && !isLoading\n\n  return (\n    <div className={`flex flex-col h-full ${className}`}>\n      {showEmptyState ? (\n        <div className=\"flex-1 flex flex-col items-center justify-center gap-6\">\n          <div className=\"text-center text-2xl font-semibold text-gray-300 select-none\">What can I help you find today?</div>\n          <div className=\"w-full max-w-2xl px-4\">\n            <ChatInput onSendMessage={sendMessage} disabled={isLoading} placeholder=\"Ask anything…\" onOpenSettings={()=>setShowSettings(true)} />\n          </div>\n        </div>\n      ) : (\n        <>\n          <ConversationPage messages={messages} isLoading={isLoading} className=\"flex-1 overflow-y-auto\" />\n          <div className=\"flex-shrink-0\">\n            <ChatInput onSendMessage={sendMessage} disabled={isLoading} placeholder=\"Ask anything…\" onOpenSettings={()=>setShowSettings(true)} />\n          </div>\n        </>\n      )}\n      {showSettings && (\n        <ChatSettingsModal\n          onClose={()=>setShowSettings(false)}\n          options={[\n            { type:'dropdown', label:'LLM model', value:selectedModel, setter:setSelectedModel, options:generationModels.map(m=>({value:m,label:m})) }\n          ]}\n        />\n      )}\n    </div>\n  );\n} "
  },
  {
    "path": "src/components/ui/scroll-area.tsx",
    "content": "\"use client\"\n\nimport * as React from \"react\"\nimport * as ScrollAreaPrimitive from \"@radix-ui/react-scroll-area\"\n\nimport { cn } from \"@/lib/utils\"\n\nfunction ScrollArea({\n  className,\n  children,\n  ...props\n}: React.ComponentProps<typeof ScrollAreaPrimitive.Root>) {\n  return (\n    <ScrollAreaPrimitive.Root\n      data-slot=\"scroll-area\"\n      className={cn(\"relative h-full w-full\", className)}\n      {...props}\n    >\n      <ScrollAreaPrimitive.Viewport\n        data-slot=\"scroll-area-viewport\"\n        className=\"focus-visible:ring-ring/50 size-full rounded-[inherit] transition-[color,box-shadow] outline-none focus-visible:ring-[3px] focus-visible:outline-1\"\n      >\n        {children}\n      </ScrollAreaPrimitive.Viewport>\n      <ScrollBar />\n      <ScrollAreaPrimitive.Corner />\n    </ScrollAreaPrimitive.Root>\n  )\n}\n\nfunction ScrollBar({\n  className,\n  orientation = \"vertical\",\n  ...props\n}: React.ComponentProps<typeof ScrollAreaPrimitive.ScrollAreaScrollbar>) {\n  return (\n    <ScrollAreaPrimitive.ScrollAreaScrollbar\n      data-slot=\"scroll-area-scrollbar\"\n      orientation={orientation}\n      className={cn(\n        \"flex touch-none p-px transition-colors select-none\",\n        orientation === \"vertical\" &&\n          \"h-full w-2.5 border-l border-l-transparent\",\n        orientation === \"horizontal\" &&\n          \"h-2.5 flex-col border-t border-t-transparent\",\n        className\n      )}\n      {...props}\n    >\n      <ScrollAreaPrimitive.ScrollAreaThumb\n        data-slot=\"scroll-area-thumb\"\n        className=\"bg-border relative flex-1 rounded-full\"\n      />\n    </ScrollAreaPrimitive.ScrollAreaScrollbar>\n  )\n}\n\nexport { ScrollArea, ScrollBar }\n"
  },
  {
    "path": "src/components/ui/separator.tsx",
    "content": "\"use client\"\n\nimport * as React from \"react\"\nimport * as SeparatorPrimitive from \"@radix-ui/react-separator\"\n\nimport { cn } from \"@/lib/utils\"\n\nfunction Separator({\n  className,\n  orientation = \"horizontal\",\n  decorative = true,\n  ...props\n}: React.ComponentProps<typeof SeparatorPrimitive.Root>) {\n  return (\n    <SeparatorPrimitive.Root\n      data-slot=\"separator\"\n      decorative={decorative}\n      orientation={orientation}\n      className={cn(\n        \"bg-border shrink-0 data-[orientation=horizontal]:h-px data-[orientation=horizontal]:w-full data-[orientation=vertical]:h-full data-[orientation=vertical]:w-px\",\n        className\n      )}\n      {...props}\n    />\n  )\n}\n\nexport { Separator }\n"
  },
  {
    "path": "src/components/ui/session-chat.tsx",
    "content": "\"use client\"\n\nimport * as React from \"react\"\nimport { ConversationPage } from \"./conversation-page\"\nimport { ChatInput } from \"./chat-input\"\nimport { EmptyChatState } from \"./empty-chat-state\"\nimport { ChatMessage, ChatSession, chatAPI, generateUUID } from \"@/lib/api\"\nimport { AttachedFile } from \"@/lib/types\"\nimport { useEffect, useState, forwardRef, useImperativeHandle, useCallback } from \"react\"\nimport { normalizeStreamingToken } from \"@/utils/textNormalization\"\nimport { Button } from \"./button\"\nimport type { Step } from '@/lib/api'\nimport { ChatSettingsModal } from '@/components/ui/chat-settings-modal'\nimport { IndexForm } from '@/components/IndexForm'\nimport SessionIndexInfo from '@/components/SessionIndexInfo'\nimport { Database } from 'lucide-react'\n\ninterface SessionChatProps {\n  sessionId?: string\n  onSessionChange?: (session: ChatSession) => void\n  onNewMessage?: (message: ChatMessage) => void\n  className?: string\n}\n\n// Export sendMessage function for parent components\nexport interface SessionChatRef {\n  sendMessage: (content: string, attachedFiles?: AttachedFile[]) => Promise<void>\n  currentSession: ChatSession | null\n}\n\n// Helper to shorten long titles\nconst truncate = (str: string, n: number = 18) => str.length > n ? 
str.slice(0, n) + '…' : str;\n\nexport const SessionChat = forwardRef<SessionChatRef, SessionChatProps>(({ \n  sessionId,\n  onSessionChange,\n  onNewMessage,\n  className = \"\"\n}, ref) => {\n  const [messages, setMessages] = useState<ChatMessage[]>([])\n  const [isLoading, setIsLoading] = useState(false)\n  const [currentSession, setCurrentSession] = useState<ChatSession | null>(null)\n  const [error, setError] = useState<string | null>(null)\n  const [uploadedFiles, setUploadedFiles] = useState<{filename: string, stored_path: string}[]>([])\n  const [isIndexed, setIsIndexed] = useState(false)\n  const [composeSubAnswers, setComposeSubAnswers] = useState<boolean>(true)\n  const [enableDecompose, setEnableDecompose] = useState<boolean>(true)\n  const [enableAiRerank, setEnableAiRerank] = useState<boolean>(true)\n  const [enableContextExpand, setEnableContextExpand] = useState<boolean>(true)\n  const [enableStream, setEnableStream] = useState<boolean>(true)\n  const [enableVerify, setEnableVerify] = useState<boolean>(true)\n  // Force RAG toggle\n  const [forceDocs, setForceDocs] = useState<boolean>(false)\n  // Provence pruning toggle\n  const [provencePrune, setProvencePrune] = useState<boolean>(false)\n  \n  // ✨ NEW RETRIEVAL PARAMETERS\n  const [retrievalK, setRetrievalK] = useState<number>(20)\n  const [contextWindowSize, setContextWindowSize] = useState<number>(1)\n  const [rerankerTopK, setRerankerTopK] = useState<number>(10)\n  const [searchType, setSearchType] = useState<string>('hybrid')\n  const [generationModels,setGenerationModels]=useState<string[]>([])\n  const [selectedModel,setSelectedModel]=useState<string>('qwen3:8b')\n  const [currentIndexId, setCurrentIndexId] = useState<string | null>(null)\n  const [currentIndexName, setCurrentIndexName] = useState<string | null>(null)\n  const [showSettings, setShowSettings] = useState(false)\n  const [showIndexForm, setShowIndexForm] = useState(false)\n  const [showIndexInfo, setShowIndexInfo] = 
useState(false)\n  \n  const apiService = chatAPI\n\n  // Define loadSession with useCallback before useEffect\n  const loadSession = useCallback(async (id: string) => {\n    try {\n      setError(null)\n      const { session, messages: sessionMessages } = await apiService.getSession(id)\n      \n      const convertedMessages = sessionMessages.map((msg: unknown) => apiService.convertDbMessage(msg as Record<string, unknown>))\n      setMessages(convertedMessages)\n      setCurrentSession(session)\n      \n      if (onSessionChange) {\n        onSessionChange(session)\n      }\n\n      // Fetch linked indexes to know table name for streaming\n      try {\n        const idxResp = await apiService.getSessionIndexes(id)\n        if (idxResp.indexes && idxResp.indexes.length > 0) {\n          const lastIdxObj = idxResp.indexes[idxResp.indexes.length - 1] as any\n          const idxId = (lastIdxObj.index_id ?? lastIdxObj.id) as string\n          setCurrentIndexId(idxId ?? null)\n          setCurrentIndexName(lastIdxObj.name ?? lastIdxObj.title ?? 
idxId.slice(0,8))\n        }\n      } catch {}\n    } catch (error) {\n      console.error('Failed to load session:', error)\n      setError('Failed to load session')\n    }\n  }, [apiService, onSessionChange])\n\n  // Load session when sessionId changes\n  useEffect(() => {\n    if (sessionId) {\n      // Only load session if we don't already have the current session\n      // This prevents overriding messages when a new session is created\n      if (!currentSession || currentSession.id !== sessionId) {\n        loadSession(sessionId)\n      }\n    } else {\n      // Clear messages if no session\n      setMessages([])\n      setCurrentSession(null)\n    }\n  }, [sessionId, currentSession, loadSession]) // Added missing dependencies\n\n  // Fetch available models on mount\n  useEffect(()=>{\n    (async()=>{\n      try{\n        const resp=await apiService.getModels();\n        setGenerationModels(resp.generation_models||[])\n        if(resp.generation_models&&resp.generation_models.length>0){\n          const def = resp.generation_models.find((m:string)=>m==='qwen3:8b');\n          setSelectedModel(def || resp.generation_models[0])\n        }\n      }catch(e){console.warn('Failed to load models',e)}\n    })()\n  },[apiService])\n\n  const sendMessage = async (content: string, attachedFiles?: AttachedFile[]) => {\n    // --- Guard Clauses ---\n    // If files are being indexed, do nothing.\n    if (uploadedFiles.length > 0 && !isIndexed) {\n      console.warn(\"sendMessage called while waiting for indexing. 
Action blocked.\");\n      return;\n    }\n    // If no content and no files, do nothing.\n    if (!content.trim() && (!attachedFiles || attachedFiles.length === 0)) return;\n\n    try {\n      setError(null)\n      \n      let activeSessionId = sessionId\n      if (!activeSessionId) {\n        try {\n          const newSession = await apiService.createSession()\n          activeSessionId = newSession.id\n          setCurrentSession(newSession)\n          if (onSessionChange) {\n            onSessionChange(newSession)\n          }\n        } catch (error) {\n          console.error('Failed to create session:', error)\n          setError('Failed to create session')\n          return\n        }\n      }\n\n      // --- Action Router: Decide if this is an upload or a chat message ---\n      \n      // A) UPLOAD ACTION: If files are attached, this action's priority is to upload. Ignore any text content.\n      if (attachedFiles && attachedFiles.length > 0) {\n        setIsLoading(true)\n        try {\n          const files = attachedFiles.map(af => af.file)\n          const uploadResult = await apiService.uploadFiles(activeSessionId, files)\n          console.log('✅ Files uploaded successfully:', uploadResult)\n          \n          setUploadedFiles(uploadResult.uploaded_files)\n          setIsIndexed(false)\n\n          const uploadMessage = apiService.createMessage(\n            `📎 Uploaded ${uploadResult.uploaded_files.length} file(s): ${uploadResult.uploaded_files.map(f => f.filename).join(', ')}. Please click 'Index Documents' to chat with them.`,\n            'assistant'\n          )\n          setMessages(prev => [...prev, uploadMessage])\n        } catch (error) {\n          console.error('❌ Failed to upload files:', error)\n          const errorMessage = apiService.createMessage('❌ Failed to upload files. 
Please try again.', 'assistant')\n          setMessages(prev => [...prev, errorMessage])\n        } finally {\n          setIsLoading(false)\n        }\n        return; // End the function here.\n      }\n\n      // B) CHAT ACTION: If no files, it's a standard chat message.\n      if (!content.trim()) return;\n\n      const userMessage = apiService.createMessage(content, 'user')\n      setMessages(prev => [...prev, userMessage])\n      if (onNewMessage) onNewMessage(userMessage)\n\n      setIsLoading(true)\n\n      // Ensure we know the index id for table_name; fetch if missing\n      let idxId = currentIndexId;\n      if (!idxId) {\n        try {\n          const idxResp = await apiService.getSessionIndexes(activeSessionId as string);\n          if (idxResp.indexes && idxResp.indexes.length > 0) {\n            const lastIdxObj = idxResp.indexes[idxResp.indexes.length - 1] as any;\n            idxId = (lastIdxObj.index_id ?? lastIdxObj.id) as string;\n            setCurrentIndexId(idxId ?? null);\n            setCurrentIndexName(lastIdxObj.name ?? lastIdxObj.title ?? 
idxId.slice(0,8));\n          }\n        } catch {}\n      }\n\n      if (enableStream) {\n        // Stepwise progress structure\n        const steps: Step[] = [\n          { key: 'analyze', label: 'Analyzing user question', status: 'pending' as const, details: '' },\n          { key: 'decompose', label: 'Generating sub-queries', status: 'pending' as const, details: '' },\n          { key: 'retrieval', label: 'Retrieving context', status: 'pending' as const, details: '' },\n          { key: 'rerank', label: 'Reranking results', status: 'pending' as const, details: '' },\n          { key: 'expand', label: 'Expanding context window', status: 'pending' as const, details: '' },\n          { key: 'answer', label: 'Answering sub-queries', status: 'pending' as const, details: [] },\n          { key: 'synthesize', label: 'Putting everything together', status: 'pending' as const, details: '' },\n          { key: 'final', label: 'Final answer', status: 'pending' as const, details: '' },\n        ];\n        const placeholder: ChatMessage = {\n          id: generateUUID(),\n          content: { steps },\n          sender: 'assistant',\n          timestamp: new Date().toISOString(),\n          isLoading: false,\n          metadata: { message_type: 'in_progress' }\n        }\n        setMessages(prev => {\n          const withoutLoaders = prev.filter(m => m.metadata?.message_type !== 'in_progress' && !m.isLoading)\n          return [...withoutLoaders, placeholder]\n        })\n        // keep global isLoading true so input disabled until completion\n\n        await apiService.streamSessionMessage(\n          {\n            query: content,\n            session_id: activeSessionId,\n            table_name: idxId ? 
`text_pages_${idxId}` : undefined,\n            composeSubAnswers,\n            decompose: enableDecompose,\n            aiRerank: enableAiRerank,\n            contextExpand: enableContextExpand,\n            verify: enableVerify,\n            model: selectedModel,\n            // ✨ NEW RETRIEVAL PARAMETERS\n            retrievalK,\n            contextWindowSize,\n            rerankerTopK,\n            searchType,\n            forceRag: forceDocs,\n            provencePrune,\n          },\n          (evt) => {\n            console.log('STREAM EVENT:', evt.type, evt.data); // Debug log for SSE events\n            setMessages(prev => prev.map(m => {\n              if (m.id !== placeholder.id) return m;\n              const steps = [...(m.content as any).steps];\n              if (evt.type === 'analyze') {\n                steps[0].status = 'active';\n                steps[0].details = 'Analyzing your question...';\n                return { ...m, content: { steps } };\n              }\n              if (evt.type === 'decomposition') {\n                steps[0].status = 'done';\n                steps[1].status = 'active';\n                steps[1].details = (evt.data.sub_queries || []);\n                return { ...m, content: { steps } };\n              }\n              if (evt.type === 'retrieval_started') {\n                steps[1].status = 'done';\n                steps[2].status = 'active';\n                steps[2].details = 'Retrieving relevant documents...';\n                return { ...m, content: { steps } };\n              }\n              if (evt.type === 'retrieval_done') {\n                const ridx = steps.findIndex(s => s.key === 'retrieval');\n                if (ridx !== -1) {\n                  steps[ridx].status = 'done';\n                  steps[ridx].details = 'Retrieval complete.';\n                }\n                const rrxIdx = steps.findIndex(s => s.key === 'rerank');\n                if (rrxIdx !== -1) {\n                  
steps[rrxIdx].status = 'active';\n                  steps[rrxIdx].details = 'Reranking results...';\n                }\n                return { ...m, content: { steps } };\n              }\n              if (evt.type === 'rerank_started') {\n                const rrxIdx = steps.findIndex(s => s.key === 'rerank');\n                if (rrxIdx !== -1) {\n                  steps[rrxIdx].status = 'active';\n                  steps[rrxIdx].details = 'Reranking results...';\n                }\n                return { ...m, content: { steps } };\n              }\n              if (evt.type === 'rerank_done') {\n                const rrxIdx = steps.findIndex(s => s.key === 'rerank');\n                if (rrxIdx !== -1) {\n                  steps[rrxIdx].status = 'done';\n                  steps[rrxIdx].details = 'Reranking complete.';\n                }\n                return { ...m, content: { steps } };\n              }\n              if (evt.type === 'context_expand_started') {\n                const eidx = steps.findIndex(s => s.key === 'expand');\n                if (eidx !== -1) {\n                  steps[eidx].status = 'active';\n                  steps[eidx].details = 'Expanding context window...';\n                }\n                return { ...m, content: { steps } };\n              }\n              if (evt.type === 'context_expand_done') {\n                const eidx = steps.findIndex(s => s.key === 'expand');\n                if (eidx !== -1) {\n                  steps[eidx].status = 'done';\n                  steps[eidx].details = 'Context expansion complete.';\n                }\n                // Activate answering sub-queries stage to show spinner while we wait\n                const ansIdx = steps.findIndex(s => s.key === 'answer');\n                if (ansIdx !== -1 && steps[ansIdx].status === 'pending') {\n                  steps[ansIdx].status = 'active';\n                  steps[ansIdx].details = 'Answering sub-queries...';\n                }\n      
          return { ...m, content: { steps } };\n              }\n              if (evt.type === 'sub_query_result') {\n                steps[5].status = 'active';\n                const existing = Array.isArray(steps[5].details) ? steps[5].details : [];\n                if (!existing.some((d: any) => d.question === evt.data.query)) {\n                  steps[5].details = [...existing, {\n                    question: evt.data.query,\n                    answer: evt.data.answer,\n                    source_documents: evt.data.source_documents || []\n                  }];\n                } else {\n                  steps[5].details = existing; // no change if duplicate\n                }\n                return { ...m, content: { steps } };\n              }\n              if (evt.type === 'final_answer' || evt.type === 'single_query_result') {\n                steps[5].status = 'done';\n                steps[6].status = 'active';\n                steps[6].details = 'Synthesizing final answer...';\n                if (isLoading) setIsLoading(false);\n                return { ...m, content: { steps } };\n              }\n              if (evt.type === 'token') {\n                // Determine final step index dynamically (7 for RAG, 0 for direct)\n                const finalIdx = steps.findIndex(s => s.key === 'final' || s.key === 'direct');\n                if (finalIdx === -1) return m;\n                if (steps[finalIdx].key !== 'direct') {\n                  steps[6].status = 'done';\n                  steps[7].status = 'active';\n                } else {\n                  steps[0].status = 'active';\n                }\n                let current = '' as string;\n                const detHolder = steps[finalIdx].details;\n                if (detHolder && typeof detHolder === 'object' && !Array.isArray(detHolder)) {\n                  current = (detHolder as any).answer || '';\n                } else if (typeof detHolder === 'string') {\n                  current 
= detHolder;\n                }\n                const tok: string = (evt.data.text || '') as string;\n                if (!tok.trim()) {\n                  return m; // skip empty/whitespace-only chunks\n                }\n                let updated = current.endsWith(tok) ? current : current + tok;\n                updated = normalizeStreamingToken('', updated);\n                if (steps[finalIdx].key === 'direct') {\n                  steps[0].details = updated;\n                } else {\n                  steps[7].details = { answer: updated, source_documents: [] };\n                }\n                steps[finalIdx].details = updated;\n                // Mark \"Putting everything together\" step as done once tokens start\n                const synthIdx = steps.findIndex(s => s.key === 'synthesize');\n                if (synthIdx !== -1 && steps[synthIdx].status !== 'done') {\n                  steps[synthIdx].status = 'done';\n                }\n                if (isLoading) setIsLoading(false);\n                return { ...m, content: { steps } };\n              }\n              if (evt.type === 'sub_query_token') {\n                const idx = evt.data.index as number;\n                const tok: string = evt.data.text || '';\n                if (!tok.trim()) return m;\n                steps[5].status = 'active';\n                let detailsArr: any[] = Array.isArray(steps[5].details) ? 
steps[5].details as any[] : [];\n                while (detailsArr.length <= idx) {\n                  detailsArr.push({ question: evt.data.question || `Sub-query ${idx+1}`, answer: '' });\n                }\n                const curAns: string = detailsArr[idx].answer || '';\n                if (!curAns.endsWith(tok)) {\n                  let updatedAnswer = curAns + tok;\n                  updatedAnswer = normalizeStreamingToken('', updatedAnswer);\n                  detailsArr[idx].answer = updatedAnswer;\n                }\n                steps[5].details = detailsArr;\n                if (isLoading) setIsLoading(false);\n                return { ...m, content: { steps } };\n              }\n              if (evt.type === 'complete') {\n                const finalIdx = steps.findIndex(s => s.key === 'final' || s.key === 'direct');\n                if (finalIdx === -1) return m;\n                steps[finalIdx].status = 'done';\n\n                if (steps[finalIdx].key === 'direct') {\n                  // Direct answer: details is plain string\n                  steps[finalIdx].details = evt.data.answer;\n                } else {\n                  steps[finalIdx].details = {\n                    answer: evt.data.answer,\n                    source_documents: evt.data.source_documents || []\n                  };\n                }\n\n                setIsLoading(false);\n                // Make sure any lingering steps are marked done\n                steps.forEach(s => {\n                  if (s.status !== 'done') s.status = 'done';\n                });\n                \n                // 🔄 REFRESH SESSION: After completion, refresh session data to get updated title\n                if (activeSessionId) {\n                  // Always refresh session data so updated title & message count are reflected in the UI\n                  setTimeout(async () => {\n                    try {\n                      const { session } = await 
apiService.getSession(activeSessionId as string);\n                      setCurrentSession(session);\n                      if (onSessionChange) {\n                        onSessionChange(session);\n                      }\n                    } catch (error) {\n                      console.error('Failed to refresh session after completion:', error);\n                    }\n                  }, 100); // Small delay to ensure backend has processed the title update\n                }\n                \n                return { ...m, content: { steps }, metadata: { message_type: 'complete' } };\n              }\n              if (evt.type === 'direct_answer') {\n                const stepsDir: Step[] = [\n                  { key: 'direct', label: 'Answering directly', status: 'active' as const, details: '' }\n                ];\n                return { ...m, content: { steps: stepsDir } };\n              }\n              return m;\n            }));\n          }\n        )\n      } else {\n        const response = await apiService.sendSessionMessage(activeSessionId, content, { \n          composeSubAnswers, \n          decompose: enableDecompose, \n          aiRerank: enableAiRerank, \n          contextExpand: enableContextExpand, \n          verify: enableVerify,\n          model: selectedModel,\n          // ✨ NEW RETRIEVAL PARAMETERS\n          retrievalK,\n          contextWindowSize,\n          rerankerTopK,\n          searchType,\n          forceRag: forceDocs,\n          provencePrune,\n        })\n      \n      const aiMessage: ChatMessage = {\n        id: response.ai_message_id || generateUUID(),\n        content: response.response,\n        sender: 'assistant',\n        timestamp: new Date().toISOString(),\n          metadata: { \n            message_type: 'sub_answer',\n            source_documents: (response as any).source_documents || [] \n          }\n      }\n      setMessages(prev => [...prev, aiMessage])\n      \n        if ((response as 
any).session) {\n          const sess = (response as any).session as ChatSession\n          setCurrentSession(sess)\n          if (onSessionChange) onSessionChange(sess)\n        }\n        if (onNewMessage) onNewMessage(aiMessage)\n      }\n\n    } catch (error) {\n      console.error('Failed to send message:', error)\n      setError('Failed to send message')\n    } finally {\n      setIsLoading(false)\n    }\n  }\n\n  const handleIndexDocuments = async () => {\n    if (!currentSession) return;\n\n    setIsLoading(true);\n    setError(null);\n    try {\n      const result = await apiService.indexDocuments(currentSession.id);\n      console.log('✅ Indexing complete:', result);\n\n      const indexMessage = apiService.createMessage(\n        `✅ ${result.message}`,\n        'assistant'\n      );\n      setMessages(prev => [...prev, indexMessage]);\n      setIsIndexed(true);\n      setUploadedFiles([]); // Clear uploaded files after indexing\n\n    } catch (error) {\n      console.error('❌ Failed to index documents:', error);\n      const errorMessage = apiService.createMessage(\n        '❌ Failed to index documents. Please try again.',\n        'assistant'\n      );\n      setMessages(prev => [...prev, errorMessage]);\n    } finally {\n      setIsLoading(false);\n    }\n  }\n\n  // Expose functions to parent component\n  useImperativeHandle(ref, () => ({\n    sendMessage,\n    currentSession\n  }))\n\n  const handleAction = async (action: string, messageId: string, messageContent: string | Record<string, any>[] | { steps: Step[] }) => {\n    console.log(`Action ${action} on message ${messageId}`)\n    \n    switch (action) {\n      case 'copy':\n        await navigator.clipboard.writeText(typeof messageContent === 'string' ? 
messageContent : JSON.stringify(messageContent, null, 2))\n        break\n      case 'regenerate': {\n        // Find the user message before this AI message and resend it\n        // (braces scope the const declarations to this case)\n        const messageIndex = messages.findIndex(m => m.id === messageId)\n        if (messageIndex > 0 && messages[messageIndex].sender === 'assistant') {\n          const userMessage = messages[messageIndex - 1]\n          if (userMessage.sender === 'user') {\n            // Remove the AI message and resend the user message\n            setMessages(prev => prev.filter(m => m.id !== messageId))\n            await sendMessage(userMessage.content as string)\n          }\n        }\n        break\n      }\n      default:\n        // Handle other actions\n        break\n    }\n  }\n\n  const showEmptyState = (!sessionId || messages.length === 0) && !isLoading\n\n  return (\n    <div className={`flex flex-col h-full ${className}`}>\n      {error && (\n        <div className=\"bg-red-900 text-red-200 px-4 py-2 text-sm flex-shrink-0\">\n          {error}\n        </div>\n      )}\n      \n      {showEmptyState ? (\n        <div className=\"flex-1 flex flex-col items-center justify-center gap-6 min-h-0\">\n          <div className=\"text-center text-2xl font-semibold text-gray-300 select-none\">What can I help you find today?</div>\n          <div className=\"w-full max-w-2xl px-4\">\n            <ChatInput\n              onSendMessage={sendMessage}\n              disabled={isLoading}\n              placeholder=\"Ask anything\"\n              onOpenSettings={()=>setShowSettings(true)}\n              onAddIndex={()=>setShowIndexForm(true)}\n              leftExtras={currentIndexId && currentIndexName ? 
(\n                <button\n                  type=\"button\"\n                  onClick={()=>setShowIndexInfo(true)}\n                  title=\"View index info\"\n                  className=\"flex items-center gap-1 p-2 text-gray-400 hover:text-white hover:bg-gray-800 rounded-full transition-colors\"\n                >\n                  <Database className=\"w-5 h-5\" />\n                  <span className=\"text-xs hidden sm:inline\">{truncate(currentIndexName,12)}</span>\n                </button>\n              ) : undefined}\n            />\n          </div>\n        </div>\n      ) : (\n        <>\n          <ConversationPage \n            messages={messages}\n            isLoading={isLoading}\n            onAction={handleAction}\n            className=\"flex-1 overflow-y-auto\"\n          />\n\n          {/* Bottom input when chat active */}\n          <div className=\"flex-shrink-0\">\n            {uploadedFiles.length > 0 && !isIndexed && (\n              <div className=\"p-2 text-center bg-yellow-100 dark:bg-yellow-900 border-t border-b border-gray-200 dark:border-gray-700\">\n                <Button onClick={handleIndexDocuments} disabled={isLoading}>\n                  {isLoading ? 'Indexing...' : 'Index Documents to Enable Chat'}\n                </Button>\n              </div>\n            )}\n            <ChatInput\n              onSendMessage={sendMessage}\n              disabled={isLoading || (uploadedFiles.length > 0 && !isIndexed)}\n              placeholder=\"Message localGPT...\"\n              onOpenSettings={()=>setShowSettings(true)}\n              onAddIndex={()=>setShowIndexForm(true)}\n              leftExtras={currentIndexId && currentIndexName ? 
(\n                <button\n                  type=\"button\"\n                  onClick={()=>setShowIndexInfo(true)}\n                  title=\"View index info\"\n                  className=\"flex items-center gap-1 p-2 text-gray-400 hover:text-white hover:bg-gray-800 rounded-full transition-colors\"\n                >\n                  <Database className=\"w-5 h-5\" />\n                  <span className=\"text-xs hidden sm:inline\">{truncate(currentIndexName,12)}</span>\n                </button>\n              ) : undefined}\n            />\n          </div>\n        </>\n      )}\n\n      {showSettings && (\n        <ChatSettingsModal\n          onClose={()=>setShowSettings(false)}\n          options={[\n            // General Settings\n            {type: 'toggle', label:'Query decomposition', checked: enableDecompose, setter: setEnableDecompose},\n            {type: 'toggle', label:'Compose sub-answers', checked: composeSubAnswers, setter: setComposeSubAnswers},\n            {type: 'toggle', label:'Verify answer', checked: enableVerify, setter: setEnableVerify},\n            {type: 'toggle', label:'Stream phases', checked: enableStream, setter: setEnableStream},\n            \n            // Retrieval Settings\n            {type: 'dropdown', label:'LLM model', value: selectedModel, setter: setSelectedModel, options: generationModels.map(m=>({value:m,label:m}))},\n            {type: 'dropdown', label:'Search type', value: searchType, setter: setSearchType, options: [\n              {value: 'hybrid', label: 'Hybrid (Vector + FTS)'},\n              {value: 'vector_only', label: 'Vector Only'},\n              {value: 'bm25_only', label: 'FTS Only'}\n            ]},\n            {type: 'slider', label:'Retrieval chunks', value: retrievalK, setter: setRetrievalK, min: 5, max: 50, unit: ' chunks'},\n            \n            // Reranking & Context\n            {type: 'toggle', label:'AI reranker', checked: enableAiRerank, setter: setEnableAiRerank},\n            
{type: 'slider', label:'Reranker top chunks', value: rerankerTopK, setter: setRerankerTopK, min: 3, max: 20, unit: ' chunks'},\n            {type: 'toggle', label:'Expand context window', checked: enableContextExpand, setter: setEnableContextExpand},\n            {type: 'slider', label:'Context window size', value: contextWindowSize, setter: setContextWindowSize, min: 0, max: 5, unit: ' chunks'},\n            {type: 'toggle', label:'Prune irrelevant sentences', checked: provencePrune, setter: setProvencePrune},\n            {type: 'toggle', label:'Always search documents', checked: forceDocs, setter: setForceDocs},\n          ]}\n        />\n      )}\n\n      {showIndexForm && (\n        <IndexForm\n          onClose={()=>setShowIndexForm(false)}\n          onIndexed={(s)=>{\n            setShowIndexForm(false);\n            setCurrentSession(s);\n            if(onSessionChange) onSessionChange(s);\n          }}\n        />\n      )}\n\n      {/* Index info modal */}\n      {showIndexInfo && currentSession && (\n        <SessionIndexInfo sessionId={currentSession.id} onClose={()=>setShowIndexInfo(false)} />\n      )}\n    </div>\n  )\n})\n\nSessionChat.displayName = \"SessionChat\"  "
  },
  {
    "path": "src/components/ui/session-sidebar.tsx",
    "content": "\"use client\"\n\nimport * as React from \"react\"\nimport { useState, useEffect } from \"react\"\nimport { Plus, MessageSquare, MoreVertical } from \"lucide-react\"\nimport { Button } from \"@/components/ui/button\"\nimport { ScrollArea } from \"@/components/ui/scroll-area\"\nimport { ChatSession, chatAPI } from \"@/lib/api\"\n\ninterface SessionSidebarRef {\n  refreshSessions: () => Promise<void>\n}\n\ninterface SessionSidebarProps {\n  currentSessionId?: string\n  onSessionSelect: (sessionId: string) => void\n  onNewSession: () => void\n  onSessionDelete?: (sessionId: string) => void\n  onSessionCreated?: (ref: SessionSidebarRef) => void\n  className?: string\n}\n\nexport function SessionSidebar({\n  currentSessionId,\n  onSessionSelect,\n  onNewSession,\n  onSessionDelete,\n  onSessionCreated,\n  className = \"\"\n}: SessionSidebarProps) {\n  const [sessions, setSessions] = useState<ChatSession[]>([])\n  const [isLoading, setIsLoading] = useState(true)\n  const [error, setError] = useState<string | null>(null)\n  const [menuOpenId, setMenuOpenId] = useState<string | null>(null)\n\n  const loadSessions = React.useCallback(async () => {\n    try {\n      setError(null)\n      const response = await chatAPI.getSessions()\n      setSessions(response.sessions)\n    } catch (error) {\n      console.error('Failed to load sessions:', error)\n      setError('Failed to load sessions')\n    } finally {\n      setIsLoading(false)\n    }\n  }, [])\n\n  // Load sessions on mount; declared after loadSessions so it can be listed as a dependency\n  useEffect(() => {\n    loadSessions()\n  }, [loadSessions])\n\n  const handleNewSession = () => {\n    // Don't create session immediately - just trigger empty state\n    onNewSession()\n  }\n\n  // Refresh sessions when a new session is created\n  const refreshSessions = React.useCallback(async () => {\n    await loadSessions()\n  }, [loadSessions])\n\n  // Expose refresh function to parent\n  React.useEffect(() => {\n    if (onSessionCreated) {\n      onSessionCreated({ refreshSessions })\n  
  }\n  }, [onSessionCreated, refreshSessions])\n\n  const handleDeleteSession = async (sessionId: string, event: React.MouseEvent) => {\n    event.stopPropagation() // Prevent session selection when clicking delete\n    \n    if (!confirm('Are you sure you want to delete this conversation? This action cannot be undone.')) {\n      return\n    }\n\n    try {\n      await chatAPI.deleteSession(sessionId)\n      setSessions(prev => prev.filter(s => s.id !== sessionId))\n      \n      // If the deleted session was currently selected, notify parent\n      if (currentSessionId === sessionId && onSessionDelete) {\n        onSessionDelete(sessionId)\n      }\n    } catch (error) {\n      console.error('Failed to delete session:', error)\n      setError('Failed to delete session')\n    }\n  }\n\n  const handleRenameSession = async (sessionId: string, event: React.MouseEvent) => {\n    event.stopPropagation();\n    const current = sessions.find(s => s.id === sessionId);\n    const newTitle = prompt('Enter new title', current?.title || '');\n    if (!newTitle || newTitle.trim() === '' || newTitle === current?.title) {\n      return;\n    }\n    try {\n      const result = await chatAPI.renameSession(sessionId, newTitle.trim());\n      // Update local state with new session data\n      setSessions(prev => prev.map(s => s.id === sessionId ? 
result.session : s));\n      // If this is the currently open session, notify parent to refresh\n      if (currentSessionId === sessionId && onSessionSelect) {\n        onSessionSelect(sessionId);\n      }\n      setMenuOpenId(null);\n    } catch (error) {\n      console.error('Failed to rename session:', error);\n      setError('Failed to rename session');\n    }\n  }\n\n  const formatDate = (dateString: string) => {\n    const date = new Date(dateString)\n    const now = new Date()\n    const diffInHours = (now.getTime() - date.getTime()) / (1000 * 60 * 60)\n    \n    if (diffInHours < 24) {\n      return date.toLocaleTimeString([], { hour: '2-digit', minute: '2-digit' })\n    } else if (diffInHours < 24 * 7) {\n      return date.toLocaleDateString([], { weekday: 'short' })\n    } else {\n      return date.toLocaleDateString([], { month: 'short', day: 'numeric' })\n    }\n  }\n\n  const truncateTitle = (title: string, maxLength: number = 25) => {\n    return title.length > maxLength ? title.substring(0, maxLength) + '...' 
: title\n  }\n\n  return (\n    <div className={`w-64 h-full min-h-0 bg-black border-r border-gray-800 flex flex-col ${className}`}>\n      {/* Header */}\n      <div className=\"p-4 border-b border-gray-800\">\n        <div className=\"flex items-center justify-between mb-3\">\n          <h2 className=\"text-lg font-semibold text-white\">Chats</h2>\n          <Button\n            onClick={handleNewSession}\n            size=\"sm\"\n            className=\"h-8 w-8 p-0 bg-gray-700 hover:bg-gray-600 text-white\"\n            title=\"New Chat\"\n          >\n            <Plus className=\"h-4 w-4\" />\n          </Button>\n        </div>\n      </div>\n\n      {/* Sessions List */}\n      <ScrollArea className=\"flex-1 min-h-0 overflow-y-auto\">\n        <div className=\"p-2\">\n          {error && (\n            <div className=\"mb-4 p-3 bg-red-900 text-red-200 text-sm rounded-lg\">\n              {error}\n              <Button\n                onClick={loadSessions}\n                size=\"sm\"\n                className=\"ml-2 h-6 px-2 text-xs bg-red-800 hover:bg-red-700\"\n              >\n                Retry\n              </Button>\n            </div>\n          )}\n\n          {isLoading ? (\n            <div className=\"space-y-2\">\n              {[...Array(5)].map((_, i) => (\n                <div key={i} className=\"h-12 bg-gray-900 rounded-lg animate-pulse\" />\n              ))}\n            </div>\n          ) : sessions.length === 0 ? 
(\n            <div className=\"text-center py-8 text-gray-400\">\n              <MessageSquare className=\"w-8 h-8 mx-auto mb-2 opacity-50\" />\n              <p className=\"text-sm\">No conversations yet</p>\n              <p className=\"text-xs mt-1\">Start a new chat to begin</p>\n            </div>\n          ) : (\n            <div className=\"space-y-px\">\n              {sessions.map((session) => (\n                <div\n                  key={session.id}\n                  className={`relative group pl-1 rounded transition-colors ${\n                    currentSessionId === session.id\n                      ? 'bg-gray-700/60 text-white border-l-2 border-white'\n                      : 'hover:bg-gray-800 text-gray-300'\n                  }`}\n                >\n                  <button\n                    onClick={() => onSessionSelect(session.id)}\n                    className=\"w-full pl-3 pr-8 py-2 text-left text-sm\"\n                  >\n                    <p className=\"truncate\">\n                      {truncateTitle(session.title)}\n                    </p>\n                  </button>\n                  \n                  {/* Overflow menu */}\n                  <div className=\"absolute right-2 top-2 index-row-menu\">\n                    <button onClick={(e)=>{e.stopPropagation(); setMenuOpenId(menuOpenId===session.id?null:session.id);}} className=\"p-1 text-gray-400 hover:text-white opacity-0 group-hover:opacity-100 transition\">\n                      <MoreVertical className=\"w-4 h-4\" />\n                    </button>\n                    {menuOpenId===session.id && (\n                      <div className=\"absolute right-0 top-full mt-1 bg-black/90 backdrop-blur border border-white/10 rounded shadow-lg py-1 w-32 text-sm z-50\">\n                        <button onClick={(e)=>{e.stopPropagation(); onSessionSelect(session.id); setMenuOpenId(null);}} className=\"block w-full text-left px-4 py-2 hover:bg-white/10\">Open</button>\n           
             <button onClick={(e)=>handleRenameSession(session.id,e)} className=\"block w-full text-left px-4 py-2 hover:bg-white/10\">Rename</button>\n                        <button onClick={(e)=>handleDeleteSession(session.id,e)} className=\"block w-full text-left px-4 py-2 hover:bg-white/10 text-red-400 hover:text-red-500\">Delete</button>\n                      </div>\n                    )}\n                  </div>\n                </div>\n              ))}\n            </div>\n          )}\n        </div>\n      </ScrollArea>\n\n      {/* Footer with stats */}\n      {sessions.length > 0 && (\n        <div className=\"p-4 border-t border-gray-800 text-xs text-gray-400 bg-black\">\n          <div className=\"flex justify-between\">\n            <span>{sessions.length} {sessions.length === 1 ? 'conversation' : 'conversations'}</span>\n            <span>\n              {sessions.reduce((sum, s) => sum + s.message_count, 0)} messages\n            </span>\n          </div>\n        </div>\n      )}\n    </div>\n  )\n} "
  },
  {
    "path": "src/components/ui/sidebar.tsx",
    "content": "\"use client\";\n\nimport { cn } from \"@/lib/utils\";\nimport { ScrollArea } from \"@/components/ui/scroll-area\";\nimport { motion } from \"framer-motion\";\nimport {\n  ChevronsUpDown,\n  LogOut,\n  MessagesSquare,\n  Plus,\n  Settings,\n  UserCircle,\n} from \"lucide-react\";\nimport { Avatar, AvatarFallback } from \"@/components/ui/avatar\"\nimport { useState } from \"react\";\nimport { Button } from \"@/components/ui/button\";\nimport {\n  DropdownMenu,\n  DropdownMenuContent,\n  DropdownMenuItem,\n  DropdownMenuSeparator,\n  DropdownMenuTrigger,\n} from \"@/components/ui/dropdown-menu\";\nimport { Separator } from \"@/components/ui/separator\";\n\nconst sidebarVariants = {\n  open: {\n    width: \"15rem\",\n  },\n  closed: {\n    width: \"3.05rem\",\n  },\n};\n\nconst contentVariants = {\n  open: { display: \"block\", opacity: 1 },\n  closed: { display: \"block\", opacity: 1 },\n};\n\nconst variants = {\n  open: {\n    x: 0,\n    opacity: 1,\n    transition: {\n      x: { stiffness: 1000, velocity: -100 },\n    },\n  },\n  closed: {\n    x: -20,\n    opacity: 0,\n    transition: {\n      x: { stiffness: 100 },\n    },\n  },\n};\n\nconst transitionProps = {\n  type: \"tween\",\n  ease: \"easeOut\",\n  duration: 0.2,\n  staggerChildren: 0.1,\n};\n\nconst staggerVariants = {\n  open: {\n    transition: { staggerChildren: 0.03, delayChildren: 0.02 },\n  },\n};\n\n// Mock chat sessions data\nconst chatSessions = [\n  { id: 1, title: \"React Component Help\", lastMessage: \"How to create a sidebar?\", timestamp: \"2 min ago\", isActive: true },\n  { id: 2, title: \"TypeScript Questions\", lastMessage: \"Interface vs Type\", timestamp: \"1 hour ago\", isActive: false },\n  { id: 3, title: \"Next.js Setup\", lastMessage: \"Setting up shadcn/ui\", timestamp: \"3 hours ago\", isActive: false },\n  { id: 4, title: \"Tailwind CSS\", lastMessage: \"Dark mode implementation\", timestamp: \"1 day ago\", isActive: false },\n  { id: 5, title: \"Database 
Design\", lastMessage: \"Schema optimization\", timestamp: \"2 days ago\", isActive: false },\n];\n\nexport function SessionNavBar() {\n  const [isCollapsed, setIsCollapsed] = useState(true);\n  \n  return (\n    <motion.div\n      className={cn(\n        \"sidebar fixed left-0 z-40 h-full shrink-0 border-r border-neutral-800\",\n      )}\n      initial={isCollapsed ? \"closed\" : \"open\"}\n      animate={isCollapsed ? \"closed\" : \"open\"}\n      variants={sidebarVariants}\n      transition={transitionProps}\n      onMouseEnter={() => setIsCollapsed(false)}\n      onMouseLeave={() => setIsCollapsed(true)}\n    >\n      <motion.div\n        className={`relative z-40 flex text-muted-foreground h-full shrink-0 flex-col bg-black transition-all`}\n        variants={contentVariants}\n      >\n        <motion.ul variants={staggerVariants} className=\"flex h-full flex-col\">\n          <div className=\"flex grow flex-col items-center\">\n            {/* Header */}\n            <div className=\"flex h-[54px] w-full shrink-0 border-b border-neutral-800 p-2\">\n              <div className=\"mt-[1.5px] flex w-full\">\n                <DropdownMenu modal={false}>\n                  <DropdownMenuTrigger className=\"w-full\" asChild>\n                    <Button\n                      variant=\"ghost\"\n                      size=\"sm\"\n                      className=\"flex w-fit items-center gap-2 px-2 text-white hover:bg-neutral-800\" \n                    >\n                      <Avatar className='rounded size-4'>\n                        <AvatarFallback className=\"bg-blue-600 text-white\">L</AvatarFallback>\n                      </Avatar>\n                      <motion.li\n                        variants={variants}\n                        className=\"flex w-fit items-center gap-2\"\n                      >\n                        {!isCollapsed && (\n                          <>\n                            <p className=\"text-sm font-medium text-white\">\n         
                     localGPT\n                            </p>\n                            <ChevronsUpDown className=\"h-4 w-4 text-neutral-400\" />\n                          </>\n                        )}\n                      </motion.li>\n                    </Button>\n                  </DropdownMenuTrigger>\n                  <DropdownMenuContent align=\"start\" className=\"bg-neutral-900 border-neutral-800\">\n                    <DropdownMenuItem className=\"flex items-center gap-2 text-white hover:bg-neutral-800\">\n                      <Settings className=\"h-4 w-4\" /> Preferences\n                    </DropdownMenuItem>\n                    <DropdownMenuItem className=\"flex items-center gap-2 text-white hover:bg-neutral-800\">\n                      <Plus className=\"h-4 w-4\" /> New Chat\n                    </DropdownMenuItem>\n                  </DropdownMenuContent>\n                </DropdownMenu>\n              </div>\n            </div>\n\n            {/* Chat Sessions */}\n            <div className=\"flex h-full w-full flex-col\">\n              <div className=\"flex grow flex-col gap-4\">\n                <ScrollArea className=\"h-16 grow p-2\">\n                  <div className={cn(\"flex w-full flex-col gap-1\")}>\n                    {/* New Chat Button */}\n                    <Button\n                      variant=\"ghost\"\n                      className=\"flex h-8 w-full flex-row items-center justify-start rounded-md px-2 py-1.5 text-white hover:bg-neutral-800 mb-2\"\n                    >\n                      <Plus className=\"h-4 w-4\" />\n                      <motion.span variants={variants} className=\"ml-2\">\n                        {!isCollapsed && (\n                          <p className=\"text-sm font-medium\">New Chat</p>\n                        )}\n                      </motion.span>\n                    </Button>\n                    \n                    <Separator className=\"w-full bg-neutral-800\" />\n       
             \n                    {/* Chat Sessions List */}\n                    {chatSessions.map((session) => (\n                      <div\n                        key={session.id}\n                        className={cn(\n                          \"flex h-auto w-full flex-col rounded-md px-2 py-2 transition hover:bg-neutral-800 cursor-pointer\",\n                          session.isActive && \"bg-neutral-800\"\n                        )}\n                      >\n                        <div className=\"flex items-center gap-2\">\n                          <MessagesSquare className=\"h-4 w-4 text-neutral-400 shrink-0\" />\n                          <motion.div variants={variants} className=\"flex-1 min-w-0\">\n                            {!isCollapsed && (\n                              <div className=\"flex flex-col gap-1\">\n                                <p className=\"text-sm font-medium text-white truncate\">\n                                  {session.title}\n                                </p>\n                                <p className=\"text-xs text-neutral-400 truncate\">\n                                  {session.lastMessage}\n                                </p>\n                                <p className=\"text-xs text-neutral-500\">\n                                  {session.timestamp}\n                                </p>\n                              </div>\n                            )}\n                          </motion.div>\n                        </div>\n                      </div>\n                    ))}\n                  </div>\n                </ScrollArea>\n              </div>\n              \n              {/* Footer */}\n              <div className=\"flex flex-col p-2 border-t border-neutral-800\">\n                <Button\n                  variant=\"ghost\"\n                  className=\"mt-auto flex h-8 w-full flex-row items-center rounded-md px-2 py-1.5 text-white hover:bg-neutral-800\"\n                >\n       
           <Settings className=\"h-4 w-4 shrink-0\" />\n                  <motion.span variants={variants}>\n                    {!isCollapsed && (\n                      <p className=\"ml-2 text-sm font-medium\">Settings</p>\n                    )}\n                  </motion.span>\n                </Button>\n                \n                <DropdownMenu modal={false}>\n                  <DropdownMenuTrigger className=\"w-full\">\n                    <div className=\"flex h-8 w-full flex-row items-center gap-2 rounded-md px-2 py-1.5 transition hover:bg-neutral-800\">\n                      <Avatar className=\"size-4\">\n                        <AvatarFallback className=\"bg-blue-600 text-white text-xs\">\n                          U\n                        </AvatarFallback>\n                      </Avatar>\n                      <motion.div\n                        variants={variants}\n                        className=\"flex w-full items-center gap-2\"\n                      >\n                        {!isCollapsed && (\n                          <>\n                            <p className=\"text-sm font-medium text-white\">User</p>\n                            <ChevronsUpDown className=\"ml-auto h-4 w-4 text-neutral-400\" />\n                          </>\n                        )}\n                      </motion.div>\n                    </div>\n                  </DropdownMenuTrigger>\n                  <DropdownMenuContent sideOffset={5} className=\"bg-neutral-900 border-neutral-800\">\n                    <div className=\"flex flex-row items-center gap-2 p-2\">\n                      <Avatar className=\"size-6\">\n                        <AvatarFallback className=\"bg-blue-600 text-white\">\n                          U\n                        </AvatarFallback>\n                      </Avatar>\n                      <div className=\"flex flex-col text-left\">\n                        <span className=\"text-sm font-medium text-white\">\n                  
        User\n                        </span>\n                        <span className=\"line-clamp-1 text-xs text-neutral-400\">\n                          user@example.com\n                        </span>\n                      </div>\n                    </div>\n                    <DropdownMenuSeparator className=\"bg-neutral-800\" />\n                    <DropdownMenuItem className=\"flex items-center gap-2 text-white hover:bg-neutral-800\">\n                      <UserCircle className=\"h-4 w-4\" /> Profile\n                    </DropdownMenuItem>\n                    <DropdownMenuItem className=\"flex items-center gap-2 text-white hover:bg-neutral-800\">\n                      <LogOut className=\"h-4 w-4\" /> Sign out\n                    </DropdownMenuItem>\n                  </DropdownMenuContent>\n                </DropdownMenu>\n              </div>\n            </div>\n          </div>\n        </motion.ul>\n      </motion.div>\n    </motion.div>\n  );\n} "
  },
  {
    "path": "src/components/ui/skeleton.tsx",
    "content": "import { cn } from \"@/lib/utils\"\n\nfunction Skeleton({ className, ...props }: React.ComponentProps<\"div\">) {\n  return (\n    <div\n      data-slot=\"skeleton\"\n      className={cn(\"bg-accent animate-pulse rounded-md\", className)}\n      {...props}\n    />\n  )\n}\n\nexport { Skeleton }\n"
  },
  {
    "path": "src/components/ui/textarea.tsx",
    "content": "import * as React from \"react\"\n\nimport { cn } from \"@/lib/utils\"\n\nfunction Textarea({ className, ...props }: React.ComponentProps<\"textarea\">) {\n  return (\n    <textarea\n      data-slot=\"textarea\"\n      className={cn(\n        \"border-input placeholder:text-muted-foreground focus-visible:border-ring focus-visible:ring-ring/50 aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive dark:bg-input/30 flex field-sizing-content min-h-16 w-full rounded-md border bg-transparent px-3 py-2 text-base shadow-xs transition-[color,box-shadow] outline-none focus-visible:ring-[3px] disabled:cursor-not-allowed disabled:opacity-50 md:text-sm\",\n        className\n      )}\n      {...props}\n    />\n  )\n}\n\nexport { Textarea }\n"
  },
  {
    "path": "src/lib/api.ts",
    "content": "const API_BASE_URL = 'http://localhost:8000';\n\n// 🆕 Simple UUID generator for client-side message IDs\nexport const generateUUID = () => {\n  if (typeof window !== 'undefined' && window.crypto && window.crypto.randomUUID) {\n    return window.crypto.randomUUID();\n  }\n  // Fallback for older browsers or non-secure contexts\n  return 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, (c) => {\n    const r = (Math.random() * 16) | 0;\n    const v = c === 'x' ? r : (r & 0x3) | 0x8;\n    return v.toString(16);\n  });\n};\n\nexport interface Step {\n  key: string;\n  label: string;\n  status: 'pending' | 'active' | 'done';\n  details: any;\n}\n\nexport interface ChatMessage {\n  id: string;\n  content: string | Array<Record<string, any>> | { steps: Step[] };\n  sender: 'user' | 'assistant';\n  timestamp: string;\n  isLoading?: boolean;\n  metadata?: Record<string, unknown>;\n}\n\nexport interface ChatSession {\n  id: string;\n  title: string;\n  created_at: string;\n  updated_at: string;\n  model_used: string;\n  message_count: number;\n}\n\nexport interface ChatRequest {\n  message: string;\n  model?: string;\n  conversation_history?: Array<{\n    role: 'user' | 'assistant';\n    content: string;\n  }>;\n}\n\nexport interface ChatResponse {\n  response: string;\n  model: string;\n  message_count: number;\n}\n\nexport interface HealthResponse {\n  status: string;\n  ollama_running: boolean;\n  available_models: string[];\n  database_stats?: {\n    total_sessions: number;\n    total_messages: number;\n    most_used_model: string | null;\n  };\n}\n\nexport interface ModelsResponse {\n  generation_models: string[];\n  embedding_models: string[];\n}\n\nexport interface SessionResponse {\n  sessions: ChatSession[];\n  total: number;\n}\n\nexport interface SessionChatResponse {\n  response: string;\n  session: ChatSession;\n  user_message_id: string;\n  ai_message_id: string;\n}\n\nclass ChatAPI {\n  async checkHealth(): Promise<HealthResponse> {\n    
try {\n      const response = await fetch(`${API_BASE_URL}/health`);\n      if (!response.ok) {\n        throw new Error(`Health check failed: ${response.status}`);\n      }\n      return await response.json();\n    } catch (error) {\n      console.error('Health check failed:', error);\n      throw error;\n    }\n  }\n\n  async sendMessage(request: ChatRequest): Promise<ChatResponse> {\n    try {\n      const response = await fetch(`${API_BASE_URL}/chat`, {\n        method: 'POST',\n        headers: {\n          'Content-Type': 'application/json',\n        },\n        body: JSON.stringify({\n          message: request.message,\n          model: request.model || 'llama3.2:latest',\n          conversation_history: request.conversation_history || [],\n        }),\n      });\n\n      if (!response.ok) {\n        const errorData = await response.json().catch(() => ({ error: 'Unknown error' }));\n        throw new Error(`Chat API error: ${errorData.error || response.statusText}`);\n      }\n\n      return await response.json();\n    } catch (error) {\n      console.error('Chat API failed:', error);\n      throw error;\n    }\n  }\n\n  // Convert ChatMessage array to conversation history format\n  messagesToHistory(messages: ChatMessage[]): Array<{ role: 'user' | 'assistant'; content: string }> {\n    return messages\n      .filter(msg => typeof msg.content === 'string' && msg.content.trim())\n      .map(msg => ({\n        role: msg.sender,\n        content: msg.content as string,\n      }));\n  }\n\n  // Session Management\n  async getSessions(): Promise<SessionResponse> {\n    try {\n      const response = await fetch(`${API_BASE_URL}/sessions`);\n      if (!response.ok) {\n        throw new Error(`Failed to get sessions: ${response.status}`);\n      }\n      return await response.json();\n    } catch (error) {\n      console.error('Get sessions failed:', error);\n      throw error;\n    }\n  }\n\n  async createSession(title: string = 'New Chat', model: string = 
'llama3.2:latest'): Promise<ChatSession> {\n    try {\n      const response = await fetch(`${API_BASE_URL}/sessions`, {\n        method: 'POST',\n        headers: {\n          'Content-Type': 'application/json',\n        },\n        body: JSON.stringify({ title, model }),\n      });\n\n      if (!response.ok) {\n        throw new Error(`Failed to create session: ${response.status}`);\n      }\n\n      const data = await response.json();\n      return data.session;\n    } catch (error) {\n      console.error('Create session failed:', error);\n      throw error;\n    }\n  }\n\n  async getSession(sessionId: string): Promise<{ session: ChatSession; messages: ChatMessage[] }> {\n    try {\n      const response = await fetch(`${API_BASE_URL}/sessions/${sessionId}`);\n      if (!response.ok) {\n        throw new Error(`Failed to get session: ${response.status}`);\n      }\n      return await response.json();\n    } catch (error) {\n      console.error('Get session failed:', error);\n      throw error;\n    }\n  }\n\n  async sendSessionMessage(\n    sessionId: string,\n    message: string,\n    opts: { \n      model?: string; \n      composeSubAnswers?: boolean; \n      decompose?: boolean; \n      aiRerank?: boolean; \n      contextExpand?: boolean; \n      verify?: boolean;\n      // ✨ NEW RETRIEVAL PARAMETERS\n      retrievalK?: number;\n      contextWindowSize?: number;\n      rerankerTopK?: number;\n      searchType?: string;\n      denseWeight?: number;\n      forceRag?: boolean;\n      provencePrune?: boolean;\n    } = {}\n  ): Promise<SessionChatResponse & { source_documents: any[] }> {\n    try {\n      const response = await fetch(`${API_BASE_URL}/sessions/${sessionId}/messages`, {\n        method: 'POST',\n        headers: {\n          'Content-Type': 'application/json',\n        },\n        body: JSON.stringify({\n          message,\n          ...(opts.model && { model: opts.model }),\n          ...(typeof opts.composeSubAnswers === 'boolean' && { 
compose_sub_answers: opts.composeSubAnswers }),\n          ...(typeof opts.decompose === 'boolean' && { query_decompose: opts.decompose }),\n          ...(typeof opts.aiRerank === 'boolean' && { ai_rerank: opts.aiRerank }),\n          ...(typeof opts.contextExpand === 'boolean' && { context_expand: opts.contextExpand }),\n          ...(typeof opts.verify === 'boolean' && { verify: opts.verify }),\n          // ✨ ADD NEW RETRIEVAL PARAMETERS\n          ...(typeof opts.retrievalK === 'number' && { retrieval_k: opts.retrievalK }),\n          ...(typeof opts.contextWindowSize === 'number' && { context_window_size: opts.contextWindowSize }),\n          ...(typeof opts.rerankerTopK === 'number' && { reranker_top_k: opts.rerankerTopK }),\n          ...(typeof opts.searchType === 'string' && { search_type: opts.searchType }),\n          ...(typeof opts.denseWeight === 'number' && { dense_weight: opts.denseWeight }),\n          ...(typeof opts.forceRag === 'boolean' && { force_rag: opts.forceRag }),\n          ...(typeof opts.provencePrune === 'boolean' && { provence_prune: opts.provencePrune }),\n        }),\n      });\n\n      if (!response.ok) {\n        const errorData = await response.json().catch(() => ({ error: 'Unknown error' }));\n        throw new Error(`Session chat error: ${errorData.error || response.statusText}`);\n      }\n\n      return await response.json();\n    } catch (error) {\n      console.error('Session chat failed:', error);\n      throw error;\n    }\n  }\n\n  async deleteSession(sessionId: string): Promise<{ message: string; deleted_session_id: string }> {\n    try {\n      const response = await fetch(`${API_BASE_URL}/sessions/${sessionId}`, {\n        method: 'DELETE',\n      });\n\n      if (!response.ok) {\n        const errorData = await response.json().catch(() => ({ error: 'Unknown error' }));\n        throw new Error(`Delete session error: ${errorData.error || response.statusText}`);\n      }\n\n      return await response.json();\n    } 
catch (error) {\n      console.error('Delete session failed:', error);\n      throw error;\n    }\n  }\n\n  async renameSession(sessionId: string, newTitle: string): Promise<{ message: string; session: ChatSession }> {\n    try {\n      const response = await fetch(`${API_BASE_URL}/sessions/${sessionId}/rename`, {\n        method: 'POST',\n        headers: {\n          'Content-Type': 'application/json',\n        },\n        body: JSON.stringify({ title: newTitle }),\n      });\n\n      if (!response.ok) {\n        const errorData = await response.json().catch(() => ({ error: 'Unknown error' }));\n        throw new Error(`Rename session error: ${errorData.error || response.statusText}`);\n      }\n\n      return await response.json();\n    } catch (error) {\n      console.error('Rename session failed:', error);\n      throw error;\n    }\n  }\n\n  async cleanupEmptySessions(): Promise<{ message: string; cleanup_count: number }> {\n    try {\n      const response = await fetch(`${API_BASE_URL}/sessions/cleanup`);\n\n      if (!response.ok) {\n        const errorData = await response.json().catch(() => ({ error: 'Unknown error' }));\n        throw new Error(`Cleanup sessions error: ${errorData.error || response.statusText}`);\n      }\n\n      return await response.json();\n    } catch (error) {\n      console.error('Cleanup sessions failed:', error);\n      throw error;\n    }\n  }\n\n  async uploadFiles(sessionId: string, files: File[]): Promise<{ \n    message: string; \n    uploaded_files: {filename: string, stored_path: string}[]; \n  }> {\n    try {\n      const formData = new FormData();\n      files.forEach((file) => {\n        formData.append('files', file, file.name);\n      });\n\n      const response = await fetch(`${API_BASE_URL}/sessions/${sessionId}/upload`, {\n        method: 'POST',\n        body: formData,\n      });\n\n      if (!response.ok) {\n        const errorData = await response.json().catch(() => ({ error: 'Upload failed' }));\n        
throw new Error(`Upload error: ${errorData.error || response.statusText}`);\n      }\n      return await response.json();\n    } catch (error) {\n      console.error('File upload failed:', error);\n      throw error;\n    }\n  }\n\n  async indexDocuments(sessionId: string): Promise<{ message: string }> {\n    try {\n      const response = await fetch(`${API_BASE_URL}/sessions/${sessionId}/index`, {\n        method: 'POST',\n        headers: {\n          'Content-Type': 'application/json',\n        },\n      });\n\n      if (!response.ok) {\n        const errorData = await response.json().catch(() => ({ error: 'Indexing failed' }));\n        throw new Error(`Indexing error: ${errorData.error || response.statusText}`);\n      }\n      return await response.json();\n    } catch (error) {\n      console.error('Indexing failed:', error);\n      throw error;\n    }\n  }\n\n  // Legacy upload function - can be removed if no longer needed\n  async uploadPDFs(sessionId: string, files: File[]): Promise<{ \n    message: string; \n    uploaded_files: any[]; \n    processing_results: any[];\n    session_documents: any[];\n    total_session_documents: number;\n  }> {\n    try {\n      // Test if files have content and show size info\n      let totalSize = 0;\n      for (const file of files) {\n        if (file.size === 0) {\n          throw new Error(`File ${file.name} is empty (0 bytes)`);\n        }\n        totalSize += file.size;\n        const sizeMB = (file.size / (1024 * 1024)).toFixed(2);\n        console.log(`📄 File ${file.name}: ${sizeMB}MB (${file.size} bytes), type: ${file.type}`);\n      }\n      \n      const totalSizeMB = (totalSize / (1024 * 1024)).toFixed(2);\n      console.log(`📄 Total upload size: ${totalSizeMB}MB`);\n      \n      if (totalSize > 50 * 1024 * 1024) { // 50MB limit\n        throw new Error(`Total file size ${totalSizeMB}MB exceeds 50MB limit`);\n      }\n      \n      const formData = new FormData();\n      \n      // Use a generic field name 
pattern 'file_<n>' (file_0, file_1, ...), one form field per file\n      let i = 0;\n      for (const file of files) {\n        formData.append(`file_${i}`, file, file.name);\n        i++;\n      }\n      \n      const response = await fetch(`${API_BASE_URL}/sessions/${sessionId}/upload`, {\n        method: 'POST',\n        body: formData,\n      });\n\n      if (!response.ok) {\n        const errorData = await response.json().catch(() => ({ error: 'Unknown error' }));\n        throw new Error(`Upload error: ${errorData.error || response.statusText}`);\n      }\n\n      return await response.json();\n    } catch (error) {\n      console.error('PDF upload failed:', error);\n      throw error;\n    }\n  }\n\n  // Convert database message format to ChatMessage format\n  convertDbMessage(dbMessage: Record<string, unknown>): ChatMessage {\n    return {\n      id: dbMessage.id as string,\n      content: dbMessage.content as string,\n      sender: dbMessage.sender as 'user' | 'assistant',\n      timestamp: dbMessage.timestamp as string,\n      metadata: dbMessage.metadata as Record<string, unknown> | undefined,\n    };\n  }\n\n  // Create a new ChatMessage with UUID (for loading states)\n  createMessage(\n    content: string, \n    sender: 'user' | 'assistant', \n    isLoading = false\n  ): ChatMessage {\n    return {\n      id: generateUUID(),\n      content,\n      sender,\n      timestamp: new Date().toISOString(),\n      isLoading,\n    };\n  }\n\n  // ---------------- Models ----------------\n  async getModels(): Promise<ModelsResponse> {\n    const resp = await fetch(`${API_BASE_URL}/models`);\n    if (!resp.ok) {\n      throw new Error(`Failed to fetch models list: ${resp.status}`);\n    }\n    return resp.json();\n  }\n\n  async getSessionDocuments(sessionId: string): Promise<{ files: string[]; file_count: number; session: ChatSession }> {\n    const resp = await fetch(`${API_BASE_URL}/sessions/${sessionId}/documents`);\n    if (!resp.ok) {\n      throw new Error(`Failed to fetch session 
documents: ${resp.status}`);\n    }\n    return resp.json();\n  }\n\n  // ---------- Index endpoints ----------\n\n  async createIndex(name: string, description?: string, metadata: Record<string, unknown> = {}): Promise<{ index_id: string }> {\n    const resp = await fetch(`${API_BASE_URL}/indexes`, {\n      method: 'POST',\n      headers: { 'Content-Type': 'application/json' },\n      body: JSON.stringify({ name, description, metadata }),\n    });\n    if (!resp.ok) {\n      const err = await resp.json().catch(() => ({}));\n      throw new Error(`Create index error: ${err.error || resp.statusText}`);\n    }\n    return resp.json();\n  }\n\n  async uploadFilesToIndex(indexId: string, files: File[]): Promise<{ message: string; uploaded_files: any[] }> {\n    const fd = new FormData();\n    files.forEach((f) => fd.append('files', f, f.name));\n    const resp = await fetch(`${API_BASE_URL}/indexes/${indexId}/upload`, { method: 'POST', body: fd });\n    if (!resp.ok) {\n      const err = await resp.json().catch(() => ({}));\n      throw new Error(`Upload to index error: ${err.error || resp.statusText}`);\n    }\n    return resp.json();\n  }\n\n  async buildIndex(indexId: string, opts: { \n    latechunk?: boolean; \n    doclingChunk?: boolean;\n    chunkSize?: number;\n    chunkOverlap?: number;\n    retrievalMode?: string;\n    windowSize?: number;\n    enableEnrich?: boolean;\n    embeddingModel?: string;\n    enrichModel?: string;\n    overviewModel?: string;\n    batchSizeEmbed?: number;\n    batchSizeEnrich?: number;\n  } = {}): Promise<{ message: string }> {\n    try {\n      const response = await fetch(`${API_BASE_URL}/indexes/${indexId}/build`, {\n        method: 'POST',\n        headers: {\n          'Content-Type': 'application/json',\n        },\n        body: JSON.stringify({ \n          latechunk: opts.latechunk ?? false,\n          doclingChunk: opts.doclingChunk ?? false,\n          chunkSize: opts.chunkSize ?? 
512,\n          chunkOverlap: opts.chunkOverlap ?? 64,\n          retrievalMode: opts.retrievalMode ?? 'hybrid',\n          windowSize: opts.windowSize ?? 2,\n          enableEnrich: opts.enableEnrich ?? true,\n          embeddingModel: opts.embeddingModel,\n          enrichModel: opts.enrichModel,\n          overviewModel: opts.overviewModel,\n          batchSizeEmbed: opts.batchSizeEmbed ?? 50,\n          batchSizeEnrich: opts.batchSizeEnrich ?? 25,\n        }),\n      });\n\n      if (!response.ok) {\n        const errorData = await response.json().catch(() => ({ error: 'Unknown error' }));\n        throw new Error(`Build index error: ${errorData.error || response.statusText}`);\n      }\n\n      return await response.json();\n    } catch (error) {\n      console.error('Build index failed:', error);\n      throw error;\n    }\n  }\n\n  async linkIndexToSession(sessionId: string, indexId: string): Promise<{ message: string }> {\n    const resp = await fetch(`${API_BASE_URL}/sessions/${sessionId}/indexes/${indexId}`, { method: 'POST' });\n    if (!resp.ok) {\n      const err = await resp.json().catch(() => ({}));\n      throw new Error(`Link index error: ${err.error || resp.statusText}`);\n    }\n    return resp.json();\n  }\n\n  async listIndexes(): Promise<{ indexes: any[]; total: number }> {\n    const resp = await fetch(`${API_BASE_URL}/indexes`);\n    if (!resp.ok) {\n      throw new Error(`Failed to list indexes: ${resp.status}`);\n    }\n    return resp.json();\n  }\n\n  async getSessionIndexes(sessionId: string): Promise<{ indexes: any[]; total: number }> {\n    const resp = await fetch(`${API_BASE_URL}/sessions/${sessionId}/indexes`);\n    if (!resp.ok) throw new Error(`Failed to get session indexes: ${resp.status}`);\n    return resp.json();\n  }\n\n  async deleteIndex(indexId: string): Promise<{ message: string }> {\n    const resp = await fetch(`${API_BASE_URL}/indexes/${indexId}`, {\n      method: 'DELETE',\n    });\n    if (!resp.ok) {\n      const 
data = await resp.json().catch(() => ({ error: 'Unknown error'}));\n      throw new Error(data.error || `Failed to delete index: ${resp.status}`);\n    }\n    return resp.json();\n  }\n\n  // -------------------- Streaming (SSE-over-fetch) --------------------\n  async streamSessionMessage(\n    params: {\n      query: string;\n      model?: string;\n      session_id?: string;\n      table_name?: string;\n      composeSubAnswers?: boolean;\n      decompose?: boolean;\n      aiRerank?: boolean;\n      contextExpand?: boolean;\n      verify?: boolean;\n      // ✨ NEW RETRIEVAL PARAMETERS\n      retrievalK?: number;\n      contextWindowSize?: number;\n      rerankerTopK?: number;\n      searchType?: string;\n      denseWeight?: number;\n      forceRag?: boolean;\n      provencePrune?: boolean;\n    },\n    onEvent: (event: { type: string; data: any }) => void,\n  ): Promise<void> {\n    const { query, model, session_id, table_name, composeSubAnswers, decompose, aiRerank, contextExpand, verify, retrievalK, contextWindowSize, rerankerTopK, searchType, denseWeight, forceRag, provencePrune } = params;\n\n    const payload: Record<string, unknown> = { query };\n    if (model) payload.model = model;\n    if (session_id) payload.session_id = session_id;\n    if (table_name) payload.table_name = table_name;\n    if (typeof composeSubAnswers === 'boolean') payload.compose_sub_answers = composeSubAnswers;\n    if (typeof decompose === 'boolean') payload.query_decompose = decompose;\n    if (typeof aiRerank === 'boolean') payload.ai_rerank = aiRerank;\n    if (typeof contextExpand === 'boolean') payload.context_expand = contextExpand;\n    if (typeof verify === 'boolean') payload.verify = verify;\n    // ✨ ADD NEW RETRIEVAL PARAMETERS TO PAYLOAD\n    if (typeof retrievalK === 'number') payload.retrieval_k = retrievalK;\n    if (typeof contextWindowSize === 'number') payload.context_window_size = contextWindowSize;\n    if (typeof rerankerTopK === 'number') payload.reranker_top_k 
= rerankerTopK;\n    if (typeof searchType === 'string') payload.search_type = searchType;\n    if (typeof denseWeight === 'number') payload.dense_weight = denseWeight;\n    if (typeof forceRag === 'boolean') payload.force_rag = forceRag;\n    if (typeof provencePrune === 'boolean') payload.provence_prune = provencePrune;\n\n    const resp = await fetch('http://localhost:8001/chat/stream', {\n      method: 'POST',\n      headers: { 'Content-Type': 'application/json' },\n      body: JSON.stringify(payload),\n    });\n\n    if (!resp.ok || !resp.body) {\n      throw new Error(`Stream request failed: ${resp.status}`);\n    }\n\n    const reader = resp.body.getReader();\n    const decoder = new TextDecoder();\n    let buffer = '';\n\n    let streamClosed = false;\n    while (!streamClosed) {\n      const { value, done } = await reader.read();\n      if (done) break;\n      buffer += decoder.decode(value, { stream: true });\n\n      const parts = buffer.split('\\n\\n');\n      buffer = parts.pop() || '';\n\n      for (const part of parts) {\n        const line = part.trim();\n        if (!line.startsWith('data:')) continue;\n        const jsonStr = line.replace(/^data:\\s*/, '');\n        try {\n          const evt = JSON.parse(jsonStr);\n          onEvent(evt);\n          if (evt.type === 'complete') {\n            // Gracefully close the stream so the caller unblocks\n            try { await reader.cancel(); } catch {}\n            streamClosed = true;\n            break;\n          }\n        } catch {\n          /* noop */\n        }\n      }\n    }\n  }\n}\n\nexport const chatAPI = new ChatAPI(); "
  },
  {
    "path": "src/lib/types.ts",
    "content": "export interface AttachedFile {\n  id: string;\n  name: string;\n  size: number;\n  type: string;\n  file: File;\n} "
  },
  {
    "path": "src/lib/utils.ts",
    "content": "import { clsx, type ClassValue } from \"clsx\"\nimport { twMerge } from \"tailwind-merge\"\n\nexport function cn(...inputs: ClassValue[]) {\n  return twMerge(clsx(inputs))\n}\n"
  },
  {
    "path": "src/test-upload.html",
    "content": "<!DOCTYPE html>\n<html>\n<head>\n    <title>Test File Upload</title>\n</head>\n<body>\n    <h1>Test File Upload</h1>\n    <form id=\"uploadForm\">\n        <input type=\"file\" id=\"fileInput\" accept=\".pdf,.docx,.doc,.html,.htm,.md,.txt\" />\n        <button type=\"submit\">Upload File</button>\n    </form>\n    \n    <div id=\"result\"></div>\n    \n    <script>\n        document.getElementById('uploadForm').addEventListener('submit', async (e) => {\n            e.preventDefault();\n            \n            const fileInput = document.getElementById('fileInput');\n            const resultEl = document.getElementById('result');\n            const file = fileInput.files[0];\n            \n            if (!file) {\n                alert('Please select a file');\n                return;\n            }\n            \n            console.log('Selected file:', {\n                name: file.name,\n                size: file.size,\n                type: file.type,\n                lastModified: file.lastModified\n            });\n            \n            const formData = new FormData();\n            formData.append('file_0', file);\n            \n            try {\n                // The session ID in this URL is hard-coded for local testing.\n                const response = await fetch('http://localhost:8000/sessions/4b545007-f13f-4bc8-be69-3f0633645e52/upload', {\n                    method: 'POST',\n                    body: formData\n                });\n                \n                const result = await response.json();\n                // Render through textContent so the response is never parsed as HTML.\n                const pre = document.createElement('pre');\n                pre.textContent = JSON.stringify(result, null, 2);\n                resultEl.replaceChildren(pre);\n                console.log('Upload result:', result);\n                \n            } catch (error) {\n                console.error('Upload failed:', error);\n                resultEl.textContent = 'Upload failed: ' + error.message;\n            }\n        });\n    </script>\n</body>\n</html>    "
  },
  {
    "path": "src/utils/textNormalization.ts",
    "content": "/**\n * Comprehensive text normalization utility for cleaning up excessive whitespace\n * in streaming markdown responses to prevent large visual gaps in the UI.\n */\n\nexport function normalizeWhitespace(text: string): string {\n  if (!text || typeof text !== 'string') {\n    return '';\n  }\n\n  // Collapse runs of three or more newlines into a single blank line.\n  text = text.replace(/\\n{3,}/g, '\\n\\n');\n  \n  // Strip trailing spaces and tabs from every line.\n  text = text.replace(/[ \\t]+$/gm, '');\n  \n  // Collapse runs of three or more spaces or tabs into a single space.\n  text = text.replace(/[ \\t]{3,}/g, ' ');\n  \n  // Collapse three line breaks separated only by spaces or tabs into a single blank line.\n  text = text.replace(/[ \\t]*\\n[ \\t]*\\n[ \\t]*\\n/g, '\\n\\n');\n  \n  // Remove any spaces or tabs still left before a newline.\n  text = text.replace(/[ \\t]+\\n/g, '\\n');\n  \n  // Trim leading and trailing whitespace from the whole text.\n  text = text.trim();\n  \n  return text;\n}\n\n/**\n * Specialized normalization for streaming tokens to prevent accumulation\n * of excessive whitespace during real-time text generation.\n *\n * The combined text is re-normalized and trimmed on every call, so whitespace\n * at the very end of the accumulated text is dropped.\n */\nexport function normalizeStreamingToken(currentText: string, newToken: string): string {\n  if (!newToken || typeof newToken !== 'string') {\n    return currentText;\n  }\n\n  let combined = currentText + newToken;\n  \n  combined = normalizeWhitespace(combined);\n  \n  return combined;\n}\n\n/**\n * Check if text contains excessive whitespace that needs normalization\n */\nexport function hasExcessiveWhitespace(text: string): boolean {\n  if (!text || typeof text !== 'string') {\n    return false;\n  }\n  \n  // Three or more consecutive newlines.\n  if (/\\n{3,}/.test(text)) {\n    return true;\n  }\n  \n  // Three or more consecutive spaces or tabs.\n  if (/[ \\t]{3,}/.test(text)) {\n    return true;\n  }\n  \n  // Three line breaks separated only by spaces or tabs.\n  if (/[ \\t]*\\n[ \\t]*\\n[ \\t]*\\n/.test(text)) {\n    return true;\n  }\n  \n  return false;\n}\n"
  },
  {
    "path": "start-docker.sh",
    "content": "#!/bin/bash\n\n# LocalGPT Docker Startup Script\n# This script provides easy options for running LocalGPT in Docker\n\nset -e\n\necho \"🐳 LocalGPT Docker Deployment\"\necho \"============================\"\n\n# Function to check if local Ollama is running\ncheck_local_ollama() {\n    if curl -s http://localhost:11434/api/tags >/dev/null 2>&1; then\n        echo \"✅ Local Ollama detected on port 11434\"\n        return 0\n    else\n        echo \"❌ No local Ollama detected on port 11434\"\n        return 1\n    fi\n}\n\n# Function to start with local Ollama\nstart_with_local_ollama() {\n    echo \"🚀 Starting LocalGPT containers (using local Ollama)...\"\n    echo \"📝 Note: Make sure your local Ollama is running on port 11434\"\n    \n    # Use the docker.env file for configuration\n    docker compose --env-file docker.env up --build -d\n    \n    echo \"\"\n    echo \"🎉 LocalGPT is starting up!\"\n    echo \"📱 Frontend: http://localhost:3000\"\n    echo \"🔧 Backend API: http://localhost:8000\"\n    echo \"🧠 RAG API: http://localhost:8001\"\n    echo \"🤖 Ollama: http://localhost:11434 (local)\"\n    echo \"\"\n    echo \"📊 Check container status: docker compose ps\"\n    echo \"📝 View logs: docker compose logs -f\"\n    echo \"🛑 Stop services: docker compose down\"\n}\n\n# Function to start with containerized Ollama\nstart_with_container_ollama() {\n    echo \"🚀 Starting LocalGPT containers (including Ollama container)...\"\n    \n    # Set environment variable for containerized Ollama\n    export OLLAMA_HOST=http://ollama:11434\n    \n    # Start all services including Ollama\n    docker compose --profile with-ollama up --build -d\n    \n    echo \"\"\n    echo \"🎉 LocalGPT is starting up!\"\n    echo \"📱 Frontend: http://localhost:3000\"\n    echo \"🔧 Backend API: http://localhost:8000\"\n    echo \"🧠 RAG API: http://localhost:8001\"\n    echo \"🤖 Ollama: http://localhost:11434 (containerized)\"\n    echo \"\"\n    echo \"⏳ Note: First startup may 
take longer as Ollama container initializes\"\n    echo \"📊 Check container status: docker compose --profile with-ollama ps\"\n    echo \"📝 View logs: docker compose --profile with-ollama logs -f\"\n    echo \"🛑 Stop services: docker compose --profile with-ollama down\"\n}\n\n# Function to show usage\nshow_usage() {\n    echo \"Usage: $0 [option]\"\n    echo \"\"\n    echo \"Options:\"\n    echo \"  local     - Use local Ollama instance (default)\"\n    echo \"  container - Use containerized Ollama\"\n    echo \"  stop      - Stop all containers\"\n    echo \"  logs      - Show container logs\"\n    echo \"  status    - Show container status\"\n    echo \"  help      - Show this help message\"\n    echo \"\"\n    echo \"Examples:\"\n    echo \"  $0 local      # Use local Ollama (recommended)\"\n    echo \"  $0 container  # Use containerized Ollama\"\n    echo \"  $0 stop       # Stop all services\"\n}\n\n# Function to stop containers\nstop_containers() {\n    echo \"🛑 Stopping LocalGPT containers...\"\n    docker compose down\n    docker compose --profile with-ollama down 2>/dev/null || true\n    echo \"✅ All containers stopped\"\n}\n\n# Function to show logs\nshow_logs() {\n    echo \"📝 Showing container logs (Ctrl+C to exit)...\"\n    if docker compose ps | grep -q \"rag-ollama\"; then\n        docker compose --profile with-ollama logs -f\n    else\n        docker compose logs -f\n    fi\n}\n\n# Function to show status\nshow_status() {\n    echo \"📊 Container Status:\"\n    docker compose ps\n    echo \"\"\n    echo \"🐳 All Docker containers:\"\n    docker ps | grep -E \"(rag-|CONTAINER)\" || echo \"No LocalGPT containers running\"\n}\n\n# Main script logic\ncase \"${1:-local}\" in\n    \"local\")\n        if check_local_ollama; then\n            start_with_local_ollama\n        else\n            echo \"\"\n            echo \"⚠️  No local Ollama detected. Options:\"\n            echo \"1. Start local Ollama: 'ollama serve'\"\n            echo \"2. 
Use containerized Ollama: '$0 container'\"\n            echo \"\"\n            read -p \"Start with containerized Ollama instead? (y/N): \" -n 1 -r\n            echo\n            if [[ $REPLY =~ ^[Yy]$ ]]; then\n                start_with_container_ollama\n            else\n                echo \"❌ Cancelled. Please start local Ollama or use '$0 container'\"\n                exit 1\n            fi\n        fi\n        ;;\n    \"container\")\n        start_with_container_ollama\n        ;;\n    \"stop\")\n        stop_containers\n        ;;\n    \"logs\")\n        show_logs\n        ;;\n    \"status\")\n        show_status\n        ;;\n    \"help\"|\"-h\"|\"--help\")\n        show_usage\n        ;;\n    *)\n        echo \"❌ Unknown option: $1\"\n        echo \"\"\n        show_usage\n        exit 1\n        ;;\nesac "
  },
  {
    "path": "system_health_check.py",
    "content": "#!/usr/bin/env python3\n\"\"\"\nSystem Health Check for RAG System\nQuick validation of configurations, models, and data access.\n\"\"\"\n\nimport sys\nimport traceback\nfrom pathlib import Path\n\ndef print_status(message, success=None):\n    \"\"\"Print status with emoji\"\"\"\n    if success is True:\n        print(f\"✅ {message}\")\n    elif success is False:\n        print(f\"❌ {message}\")\n    else:\n        print(f\"🔍 {message}\")\n\ndef check_imports():\n    \"\"\"Test basic imports\"\"\"\n    print_status(\"Testing basic imports...\")\n    try:\n        from rag_system.main import get_agent, EXTERNAL_MODELS, OLLAMA_CONFIG, PIPELINE_CONFIGS\n        print_status(\"Basic imports successful\", True)\n        return True\n    except Exception as e:\n        print_status(f\"Import failed: {e}\", False)\n        return False\n\ndef check_configurations():\n    \"\"\"Validate configurations\"\"\"\n    print_status(\"Checking configurations...\")\n    try:\n        from rag_system.main import EXTERNAL_MODELS, OLLAMA_CONFIG, PIPELINE_CONFIGS\n        \n        print(f\"📊 External Models: {EXTERNAL_MODELS}\")\n        print(f\"📊 Ollama Config: {OLLAMA_CONFIG}\")\n        print(f\"📊 Pipeline Configs: {PIPELINE_CONFIGS}\")\n        \n        # Check for common model dimension issues\n        embedding_model = EXTERNAL_MODELS.get(\"embedding_model\", \"Unknown\")\n        if \"bge-small\" in embedding_model:\n            print_status(f\"Embedding model: {embedding_model} (384 dims)\", True)\n        elif \"Qwen3-Embedding\" in embedding_model:\n            print_status(f\"Embedding model: {embedding_model} (1024 dims) - Check data compatibility!\", None)\n        else:\n            print_status(f\"Embedding model: {embedding_model} - Verify dimensions!\", None)\n            \n        print_status(\"Configuration check completed\", True)\n        return True\n    except Exception as e:\n        print_status(f\"Configuration check failed: {e}\", False)\n 
       return False\n\ndef check_agent_initialization():\n    \"\"\"Test agent initialization\"\"\"\n    print_status(\"Testing agent initialization...\")\n    try:\n        from rag_system.main import get_agent\n        agent = get_agent('default')\n        print_status(\"Agent initialization successful\", True)\n        return agent\n    except Exception as e:\n        print_status(f\"Agent initialization failed: {e}\", False)\n        traceback.print_exc()\n        return None\n\ndef check_embedding_model(agent):\n    \"\"\"Test embedding model\"\"\"\n    print_status(\"Testing embedding model...\")\n    try:\n        embedder = agent.retrieval_pipeline._get_text_embedder()\n        test_emb = embedder.create_embeddings(['test'])\n        \n        model_name = getattr(embedder.model, 'name_or_path', 'Unknown')\n        dimensions = test_emb.shape[1]\n        \n        print_status(f\"Embedding model: {model_name}\", True)\n        print_status(f\"Vector dimension: {dimensions}\", True)\n        \n        # Warn about dimension compatibility\n        if dimensions == 384:\n            print_status(\"Using 384-dim embeddings (bge-small compatible)\", True)\n        elif dimensions == 1024:\n            print_status(\"Using 1024-dim embeddings (Qwen3 compatible) - Ensure data compatibility!\", None)\n        \n        return True\n    except Exception as e:\n        print_status(f\"Embedding model test failed: {e}\", False)\n        return False\n\ndef check_database_access():\n    \"\"\"Test database access\"\"\"\n    print_status(\"Testing database access...\")\n    try:\n        import lancedb\n        db = lancedb.connect('./lancedb')\n        tables = db.table_names()\n        \n        print_status(f\"LanceDB connected - {len(tables)} tables available\", True)\n        if tables:\n            print(\"📋 Available tables:\")\n            for table in tables[:5]:  # Show first 5 tables\n                print(f\"   - {table}\")\n            if len(tables) > 5:\n 
               print(f\"   ... and {len(tables) - 5} more\")\n        else:\n            print_status(\"No tables found - may need to index documents first\", None)\n            \n        return True\n    except Exception as e:\n        print_status(f\"Database access failed: {e}\", False)\n        return False\n\ndef check_sample_query(agent):\n    \"\"\"Test a sample query if tables exist\"\"\"\n    print_status(\"Testing sample query...\")\n    try:\n        import lancedb\n        db = lancedb.connect('./lancedb')\n        tables = db.table_names()\n        \n        if not tables:\n            print_status(\"No tables available for query test\", None)\n            return True\n            \n        # Use first available table\n        table_name = tables[0]\n        print_status(f\"Testing query on table: {table_name}\")\n        \n        result = agent.run('what is this document about?', table_name=table_name)\n        \n        if result and 'answer' in result:\n            print_status(\"Sample query successful\", True)\n            print(f\"📝 Answer preview: {result['answer'][:100]}...\")\n            print(f\"📊 Found {len(result.get('source_documents', []))} source documents\")\n        else:\n            print_status(\"Query returned empty result\", None)\n            \n        return True\n    except Exception as e:\n        print_status(f\"Sample query failed: {e}\", False)\n        return False\n\ndef main():\n    \"\"\"Run complete system health check\"\"\"\n    print(\"🏥 RAG System Health Check\")\n    print(\"=\" * 50)\n    \n    checks_passed = 0\n    total_checks = 6\n    \n    # Basic checks\n    if check_imports():\n        checks_passed += 1\n    \n    if check_configurations():\n        checks_passed += 1\n    \n    if check_database_access():\n        checks_passed += 1\n    \n    # Agent-dependent checks\n    agent = check_agent_initialization()\n    if agent:\n        checks_passed += 1\n        \n        if 
check_embedding_model(agent):\n            checks_passed += 1\n            \n        if check_sample_query(agent):\n            checks_passed += 1\n    \n    # Summary\n    print(\"\\n\" + \"=\" * 50)\n    print(f\"🏥 Health Check Complete: {checks_passed}/{total_checks} checks passed\")\n    \n    if checks_passed == total_checks:\n        print_status(\"System is healthy! 🎉\", True)\n        return 0\n    elif checks_passed >= total_checks - 1:\n        print_status(\"System mostly healthy with minor issues\", None)\n        return 0\n    else:\n        print_status(\"System has significant issues that need attention\", False)\n        return 1\n\nif __name__ == \"__main__\":\n    sys.exit(main()) "
  },
  {
    "path": "tailwind.config.js",
    "content": "/** @type {import('tailwindcss').Config} */\nmodule.exports = {\n  content: [\n    './src/**/*.{js,ts,jsx,tsx}',\n  ],\n  theme: {\n    extend: {},\n  },\n  plugins: [],\n} "
  },
  {
    "path": "test_docker_build.sh",
    "content": "#!/bin/bash\n\n# Test Docker builds individually\necho \"🐳 Testing Docker builds individually...\"\n\n# Function to check if Docker is running\ncheck_docker() {\n    if ! docker version >/dev/null 2>&1; then\n        echo \"❌ Docker is not running. Please start Docker Desktop.\"\n        exit 1\n    fi\n    echo \"✅ Docker is running\"\n}\n\n# Function to build and test a single container\nbuild_and_test() {\n    local service=$1\n    local dockerfile=$2\n    local port=$3\n    \n    echo \"\"\n    echo \"🔨 Building $service...\"\n    docker build -f $dockerfile -t \"rag-$service\" .\n    if [ $? -ne 0 ]; then\n        echo \"❌ Failed to build $service\"\n        return 1\n    fi\n    \n    echo \"✅ $service built successfully\"\n    \n    # Test running the container\n    echo \"🚀 Testing $service container...\"\n    docker run -d --name \"test-$service\" -p \"$port:$port\" \"rag-$service\"\n    if [ $? -ne 0 ]; then\n        echo \"❌ Failed to run $service\"\n        return 1\n    fi\n    \n    echo \"⏳ Waiting for $service to start...\"\n    sleep 10\n    \n    # Test health\n    if [ \"$service\" = \"frontend\" ]; then\n        curl -f \"http://localhost:$port\" >/dev/null 2>&1\n    elif [ \"$service\" = \"backend\" ]; then\n        curl -f \"http://localhost:$port/health\" >/dev/null 2>&1\n    elif [ \"$service\" = \"rag-api\" ]; then\n        curl -f \"http://localhost:$port/models\" >/dev/null 2>&1\n    fi\n    \n    if [ $? 
-eq 0 ]; then\n        echo \"✅ $service is healthy\"\n    else\n        echo \"⚠️ $service health check failed (but container is running)\"\n        docker logs \"test-$service\" | tail -10\n    fi\n    \n    # Cleanup\n    docker stop \"test-$service\" >/dev/null 2>&1\n    docker rm \"test-$service\" >/dev/null 2>&1\n    \n    return 0\n}\n\n# Main execution\ncheck_docker\n\necho \"🧹 Cleaning up old containers and images...\"\ndocker container prune -f >/dev/null 2>&1\ndocker image prune -f >/dev/null 2>&1\n\n# Build in dependency order\necho \"📦 Building containers in dependency order...\"\n\n# 1. RAG API (no dependencies)\nbuild_and_test \"rag-api\" \"Dockerfile.rag-api\" \"8001\"\nif [ $? -ne 0 ]; then\n    echo \"❌ RAG API build failed, stopping\"\n    exit 1\nfi\n\n# 2. Backend (depends on RAG API)\nbuild_and_test \"backend\" \"Dockerfile.backend\" \"8000\"\nif [ $? -ne 0 ]; then\n    echo \"❌ Backend build failed, stopping\"\n    exit 1\nfi\n\n# 3. Frontend (depends on Backend)\nbuild_and_test \"frontend\" \"Dockerfile.frontend\" \"3000\"\nif [ $? -ne 0 ]; then\n    echo \"❌ Frontend build failed, stopping\"\n    exit 1\nfi\n\necho \"\"\necho \"🎉 All containers built and tested successfully!\"\necho \"🚀 You can now run: ./start-docker.sh\" "
  },
  {
    "path": "test_markdown_streaming.js",
    "content": "\nconst testMarkdownWithExcessiveNewlines = `# Test Response\n\nThis is a test response with excessive newlines.\n\n\n\nHere's some content after multiple empty lines.\n\n\n\n\n## Section Header\n\nMore content here.\n\n\n\n\n\n\n### Subsection\n\nFinal content with lots of spacing.\n\n\n\n\nThe end.`;\n\nconst testStreamingTokens = [\n  \"# Test Response\\n\\n\",\n  \"This is a test response\",\n  \" with excessive newlines.\\n\\n\\n\\n\",\n  \"Here's some content after\",\n  \" multiple empty lines.\\n\\n\\n\\n\\n\",\n  \"## Section Header\\n\\n\",\n  \"More content here.\\n\\n\\n\\n\\n\\n\\n\",\n  \"### Subsection\\n\\n\",\n  \"Final content with lots\",\n  \" of spacing.\\n\\n\\n\\n\\n\",\n  \"The end.\"\n];\n\nfunction currentCleanup(text) {\n  return text.replace(/\\n{3,}/g, '\\n\\n');\n}\n\nfunction improvedCleanup(text) {\n  text = text.replace(/\\n{3,}/g, '\\n\\n');\n  \n  text = text.replace(/[ \\t]+$/gm, '');\n  \n  text = text.replace(/[ \\t]{3,}/g, ' ');\n  \n  text = text.replace(/[ \\t]*\\n[ \\t]*\\n[ \\t]*\\n/g, '\\n\\n');\n  \n  text = text.trim();\n  \n  return text;\n}\n\nconsole.log(\"=== ORIGINAL TEXT ===\");\nconsole.log(JSON.stringify(testMarkdownWithExcessiveNewlines));\n\nconsole.log(\"\\n=== CURRENT CLEANUP ===\");\nconsole.log(JSON.stringify(currentCleanup(testMarkdownWithExcessiveNewlines)));\n\nconsole.log(\"\\n=== IMPROVED CLEANUP ===\");\nconsole.log(JSON.stringify(improvedCleanup(testMarkdownWithExcessiveNewlines)));\n\nconsole.log(\"\\n=== STREAMING SIMULATION ===\");\nlet streamedText = \"\";\ntestStreamingTokens.forEach((token, i) => {\n  streamedText += token;\n  console.log(`Token ${i + 1}: \"${token}\"`);\n  console.log(`Accumulated (current): \"${currentCleanup(streamedText)}\"`);\n  console.log(`Accumulated (improved): \"${improvedCleanup(streamedText)}\"`);\n  console.log(\"---\");\n});\n"
  },
  {
    "path": "tsconfig.json",
    "content": "{\n  \"compilerOptions\": {\n    \"target\": \"ES2017\",\n    \"lib\": [\"dom\", \"dom.iterable\", \"esnext\"],\n    \"allowJs\": true,\n    \"skipLibCheck\": true,\n    \"strict\": true,\n    \"noEmit\": true,\n    \"esModuleInterop\": true,\n    \"module\": \"esnext\",\n    \"moduleResolution\": \"bundler\",\n    \"resolveJsonModule\": true,\n    \"isolatedModules\": true,\n    \"jsx\": \"preserve\",\n    \"incremental\": true,\n    \"plugins\": [\n      {\n        \"name\": \"next\"\n      }\n    ],\n    \"paths\": {\n      \"@/*\": [\"./src/*\"]\n    }\n  },\n  \"include\": [\"next-env.d.ts\", \"**/*.ts\", \"**/*.tsx\", \".next/types/**/*.ts\"],\n  \"exclude\": [\"node_modules\"]\n}\n"
  }
]