Full Code of zhongyu09/openchatbi for AI

Repository: zhongyu09/openchatbi
Branch: main
Commit: 428f5d88bb12
Files: 133
Total size: 701.7 KB

Directory structure:
openchatbi/

├── .github/
│   ├── ISSUE_TEMPLATE/
│   │   ├── bug_report.md
│   │   └── feature_request.md
│   └── workflows/
│       ├── docs.yml
│       ├── publish.yml
│       └── runledger.yml
├── .gitignore
├── CONTRIBUTING.md
├── Dockerfile.python-executor
├── LICENSE
├── README.md
├── baselines/
│   └── runledger-openchatbi.json
├── docs/
│   ├── Makefile
│   ├── make.bat
│   └── source/
│       ├── _templates/
│       │   └── layout.html
│       ├── catalog.rst
│       ├── code.rst
│       ├── conf.py
│       ├── config.rst
│       ├── core.rst
│       ├── index.rst
│       ├── llm.rst
│       ├── text2sql.rst
│       ├── timeseries.rst
│       └── tools.rst
├── evals/
│   ├── __init__.py
│   └── runledger/
│       ├── README.md
│       ├── __init__.py
│       ├── agent/
│       │   └── agent.py
│       ├── cases/
│       │   └── t1.yaml
│       ├── cassettes/
│       │   └── t1.jsonl
│       ├── schema.json
│       ├── suite.yaml
│       └── tools.py
├── example/
│   ├── bi.yaml
│   ├── common_columns.csv
│   ├── config.yaml
│   ├── sql_example.yaml
│   ├── table_columns.csv
│   ├── table_info.yaml
│   └── table_selection_example.csv
├── openchatbi/
│   ├── __init__.py
│   ├── agent_graph.py
│   ├── catalog/
│   │   ├── __init__.py
│   │   ├── catalog_loader.py
│   │   ├── catalog_store.py
│   │   ├── factory.py
│   │   ├── helper.py
│   │   ├── retrival_helper.py
│   │   ├── schema_retrival.py
│   │   ├── store/
│   │   │   ├── __init__.py
│   │   │   └── file_system.py
│   │   └── token_service.py
│   ├── code/
│   │   ├── docker_executor.py
│   │   ├── executor_base.py
│   │   ├── local_executor.py
│   │   └── restricted_local_executor.py
│   ├── config.yaml.template
│   ├── config_loader.py
│   ├── constants.py
│   ├── context_config.py
│   ├── context_manager.py
│   ├── graph_state.py
│   ├── llm/
│   │   └── llm.py
│   ├── prompts/
│   │   ├── agent_prompt.md
│   │   ├── extraction_prompt.md
│   │   ├── schema_linking_prompt.md
│   │   ├── sql_dialect/
│   │   │   └── presto.md
│   │   ├── summary_prompt.md
│   │   ├── system_prompt.py
│   │   ├── text2sql_prompt.md
│   │   └── visualization_prompt.md
│   ├── text2sql/
│   │   ├── __init__.py
│   │   ├── data.py
│   │   ├── extraction.py
│   │   ├── generate_sql.py
│   │   ├── schema_linking.py
│   │   ├── sql_graph.py
│   │   ├── text2sql_utils.py
│   │   └── visualization.py
│   ├── text_segmenter.py
│   ├── tool/
│   │   ├── ask_human.py
│   │   ├── mcp_tools.py
│   │   ├── memory.py
│   │   ├── run_python_code.py
│   │   ├── save_report.py
│   │   ├── search_knowledge.py
│   │   └── timeseries_forecast.py
│   └── utils.py
├── pyproject.toml
├── run_streamlit_ui.py
├── run_tests.py
├── sample_api/
│   └── async_api.py
├── sample_ui/
│   ├── async_graph_manager.py
│   ├── memory_ui.py
│   ├── plotly_utils.py
│   ├── simple_ui.py
│   ├── streaming_ui.py
│   ├── streamlit_ui.py
│   └── style.py
├── tests/
│   ├── README.md
│   ├── __init__.py
│   ├── conftest.py
│   ├── context_management/
│   │   ├── README.md
│   │   ├── __init__.py
│   │   ├── conftest.py
│   │   ├── test_agent_graph_integration.py
│   │   ├── test_context_config.py
│   │   ├── test_context_manager.py
│   │   ├── test_edge_cases.py
│   │   ├── test_runner.py
│   │   └── test_state_operations.py
│   ├── test_catalog_loader.py
│   ├── test_catalog_store.py
│   ├── test_config_loader.py
│   ├── test_graph_state.py
│   ├── test_incomplete_tool_calls.py
│   ├── test_memory.py
│   ├── test_plotly_utils.py
│   ├── test_simple_store.py
│   ├── test_text2sql_extraction.py
│   ├── test_text2sql_generate_sql.py
│   ├── test_text2sql_schema_linking.py
│   ├── test_text2sql_visualization.py
│   ├── test_tools_ask_human.py
│   ├── test_tools_run_python_code.py
│   ├── test_tools_search_knowledge.py
│   └── test_utils.py
└── timeseries_forecasting/
    ├── Dockerfile
    ├── README.md
    ├── app.py
    ├── build_and_run.sh
    ├── model_handler.py
    └── test_forecasting.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .github/ISSUE_TEMPLATE/bug_report.md
================================================
---
name: Bug report
about: Create a report to help us improve
title: ''
labels: ''
assignees: ''

---

**Describe the bug**
A clear and concise description of what the bug is.

**To Reproduce**
Steps to reproduce the behavior:
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error

**Expected behavior**
A clear and concise description of what you expected to happen.

**Screenshots**
If applicable, add screenshots to help explain your problem.

**Additional context**
Add any other context about the problem here.


================================================
FILE: .github/ISSUE_TEMPLATE/feature_request.md
================================================
---
name: Feature request
about: Suggest an idea for this project
title: ''
labels: ''
assignees: ''

---

**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

**Describe the solution you'd like**
A clear and concise description of what you want to happen.

**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.

**Additional context**
Add any other context or screenshots about the feature request here.


================================================
FILE: .github/workflows/docs.yml
================================================
name: Build and Deploy Documentation

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

permissions:
  contents: read
  pages: write
  id-token: write

concurrency:
  group: "pages"
  cancel-in-progress: false

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v4

    - name: Set up Python
      uses: actions/setup-python@v4
      with:
        python-version: '3.11'

    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip setuptools wheel
        pip install -e ".[docs]"

    - name: Build documentation
      run: |
        cd docs
        make html

    - name: Setup Pages
      uses: actions/configure-pages@v4

    - name: Upload artifact
      uses: actions/upload-pages-artifact@v3
      with:
        path: './docs/build/html'

  deploy:
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    runs-on: ubuntu-latest
    needs: build
    if: github.ref == 'refs/heads/main'
    steps:
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4

================================================
FILE: .github/workflows/publish.yml
================================================
name: Publish to PyPI

on:
  release:
    types: [published]  # Trigger when a release is published
  workflow_dispatch:  # Allow manual triggering

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.11', '3.12']

    steps:
    - uses: actions/checkout@v4

    - name: Install uv
      uses: astral-sh/setup-uv@v4
      with:
        version: "latest"

    - name: Set up Python ${{ matrix.python-version }}
      run: uv python install ${{ matrix.python-version }}

    - name: Install dependencies
      run: |
        uv sync --all-extras

    - name: Run linting
      run: |
        uv run black --check .

    - name: Run tests
      run: |
        uv run pytest -v --cov=openchatbi --cov-report=xml

    - name: Upload coverage to Codecov
      uses: codecov/codecov-action@v3
      with:
        file: ./coverage.xml
        flags: unittests
        name: codecov-umbrella

  build:
    needs: test
    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v4

    - name: Install uv
      uses: astral-sh/setup-uv@v4
      with:
        version: "latest"

    - name: Set up Python
      run: uv python install 3.11

    - name: Build package
      run: |
        uv build

    - name: Check build artifacts
      run: |
        ls -la dist/
        uv run twine check dist/*

    - name: Upload build artifacts
      uses: actions/upload-artifact@v4
      with:
        name: dist
        path: dist/

  publish:
    needs: build
    runs-on: ubuntu-latest
    if: github.event_name == 'release'
    environment:
      name: pypi
      url: https://pypi.org/p/openchatbi
    permissions:
      id-token: write  # Required for PyPI trusted publishing
      contents: read

    steps:
    - name: Download build artifacts
      uses: actions/download-artifact@v4
      with:
        name: dist
        path: dist/

    - name: Publish to PyPI
      uses: pypa/gh-action-pypi-publish@release/v1
      with:
        # Uses PyPI trusted publishing, no API token needed
        verbose: true
        print-hash: true

  publish-test:
    needs: build
    runs-on: ubuntu-latest
    if: github.event_name == 'workflow_dispatch'  # Only publish to TestPyPI when manually triggered
    environment:
      name: testpypi
      url: https://test.pypi.org/p/openchatbi
    permissions:
      id-token: write
      contents: read

    steps:
    - name: Download build artifacts
      uses: actions/download-artifact@v4
      with:
        name: dist
        path: dist/

    - name: Publish to TestPyPI
      uses: pypa/gh-action-pypi-publish@release/v1
      with:
        repository-url: https://test.pypi.org/legacy/
        verbose: true
        print-hash: true

================================================
FILE: .github/workflows/runledger.yml
================================================
name: runledger
on:
  workflow_dispatch:
  pull_request:
    paths:
      - "openchatbi/**"

jobs:
  runledger:
    if: github.event_name == 'workflow_dispatch' || contains(github.event.pull_request.labels.*.name, 'runledger')
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          python -m pip install runledger
          python -m pip install .
      - name: Run deterministic evals (replay)
        run: |
          runledger run evals/runledger --mode replay --baseline baselines/runledger-openchatbi.json
      - name: Upload artifacts
        uses: actions/upload-artifact@v4
        with:
          name: runledger-artifacts
          path: runledger_out/**


================================================
FILE: .gitignore
================================================
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[codz]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py.cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
#   For a library or package, you might want to ignore these files since the code is
#   intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
#   However, in case of collaboration, if having platform-specific dependencies or dependencies
#   having no cross-platform support, pipenv may install dependencies that don't work, or not
#   install all needed dependencies.
#Pipfile.lock

# UV
#   Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.
#   This is especially recommended for binary packages to ensure reproducibility, and is more
#   commonly ignored for libraries.
#uv.lock

# poetry
#   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
#   This is especially recommended for binary packages to ensure reproducibility, and is more
#   commonly ignored for libraries.
#   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock
#poetry.toml

# pdm
#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#   pdm recommends including project-wide configuration in pdm.toml, but excluding .pdm-python.
#   https://pdm-project.org/en/latest/usage/project/#working-with-version-control
#pdm.lock
#pdm.toml
.pdm-python
.pdm-build/

# pixi
#   Similar to Pipfile.lock, it is generally recommended to include pixi.lock in version control.
#pixi.lock
#   Pixi creates a virtual environment in the .pixi directory, just like venv module creates one
#   in the .venv directory. It is recommended not to include this directory in version control.
.pixi

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.envrc
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
#  and can be added to the global gitignore or merged into this file.  For a more nuclear
#  option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

# Abstra
# Abstra is an AI-powered process automation framework.
# Ignore directories containing user credentials, local state, and settings.
# Learn more at https://abstra.io/docs
.abstra/

# Visual Studio Code
#  Visual Studio Code specific template is maintained in a separate VisualStudioCode.gitignore 
#  that can be found at https://github.com/github/gitignore/blob/main/Global/VisualStudioCode.gitignore
#  and can be added to the global gitignore or merged into this file. However, if you prefer, 
#  you could uncomment the following to ignore the entire vscode folder
# .vscode/

# Ruff stuff:
.ruff_cache/

# PyPI configuration file
.pypirc

# Cursor
#  Cursor is an AI-powered code editor. `.cursorignore` specifies files/directories to
#  exclude from AI features like autocomplete and code analysis. Recommended for sensitive data
#  refer to https://docs.cursor.com/context/ignore-files
.cursorignore
.cursorindexingignore

# Marimo
marimo/_static/
marimo/_lsp/
__marimo__/

# project spec
openchatbi/config.yaml
memory.db
checkpoints.db
data
hf_model
timeseries_forecasting/hf_model
runledger_out/


================================================
FILE: CONTRIBUTING.md
================================================
# Contributing to OpenChatBI
Hi there! Thank you for your interest in contributing to OpenChatBI.

OpenChatBI started as a personal project, with the hope of making it easier for businesses to build their own ChatBI applications with less effort. To achieve this goal, I made it open source, and I greatly appreciate contributions of all kinds.

Whether you’d like to propose a new feature, refactor the code, enhance documentation, or fix bugs, your contributions are always welcome.


================================================
FILE: Dockerfile.python-executor
================================================
FROM python:3.11-slim

# Set working directory
WORKDIR /app

# Install basic packages that might be needed for data analysis
RUN pip install --no-cache-dir \
    pandas \
    numpy \
    matplotlib \
    seaborn \
    requests \
    json5

# Create a directory for code execution
RUN mkdir -p /app/code

# Set up a non-root user for security
RUN useradd -m -u 1000 executor
USER executor

# Set the default command
CMD ["python3"]

================================================
FILE: LICENSE
================================================
MIT License

Copyright (c) 2025 Yu Zhong

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


================================================
FILE: README.md
================================================
# OpenChatBI

OpenChatBI is an open source, chat-based intelligent BI tool powered by large language models, designed to help users 
query, analyze, and visualize data through natural language conversations. Built on LangGraph and LangChain ecosystem, 
it provides chat agents and workflows that support natural language to SQL conversion and streamlined data analysis.

Join the Slack channel to discuss: https://join.slack.com/t/openchatbicommunity/shared_invite/zt-3jpzpx9mv-Sk88RxpO4Up0L~YTZYf4GQ

<img src="https://github.com/zhongyu09/openchatbi/raw/main/example/demo.gif" alt="Demo" width="800">

## Core Features

1. **Natural Language Interaction**: Get data analysis results by asking questions in natural language
2. **Automatic SQL Generation**: Convert natural language queries into SQL statements using advanced text2sql workflows
   with schema linking and well-organized prompt engineering
3. **Data Visualization**: Generate intuitive data visualizations (via plotly)
4. **Data Catalog Management**: Automatically discovers and indexes database table structures, supports flexible catalog
   storage backends with vector-based or BM25-based retrieval, and makes it easy to maintain business explanations for
   tables and columns and to refine prompts
5. **Time Series Forecasting**: Forecasting models deployed in-house that can be called as tools
6. **Code Execution**: Execute Python code for data analysis and visualization
7. **Interactive Problem-Solving**: Proactively ask users for more context when information is incomplete
8. **Persistent Memory**: Conversation management and user characteristic memory based on LangGraph checkpointing
9. **MCP Support**: Integration with MCP tools by configuration
10. **Knowledge Base Integration**: Answer complex questions by combining catalog-based knowledge retrieval with
    external knowledge base retrieval (via MCP tools)
11. **Web UI Interface**: Provides two sample web UIs, simple and streaming, built with Gradio and Streamlit, both easy
    to integrate into other web applications

## Roadmap

1. **Anomaly Detection Algorithm**: Time series anomaly detection
2. **Root Cause Analysis Algorithm**: Multi-dimensional drill-down capabilities for anomaly investigation

# Getting started

## Installation & Setup

### Prerequisites

- Python 3.11 or higher
- Access to a supported LLM provider (OpenAI, Anthropic, etc.)
- Data Warehouse (Database) credentials (like Presto, PostgreSQL, MySQL, etc.)
- (Optional) Embedding model for vector-based retrieval - if not available, BM25-based retrieval will be used
- (Optional) Docker - required only for `docker` executor mode

**Note on Chinese Text Segmentation**: For better Chinese text retrieval, `jieba` is used for word segmentation. However, `jieba` is not compatible with Python 3.12+. On Python 3.12 and higher, the system automatically falls back to simple punctuation-based segmentation for Chinese text.

### Installation

1. **Using uv (recommended):**

```bash
git clone git@github.com:zhongyu09/openchatbi.git
cd openchatbi
uv sync
```

2. **Using pip:**

```bash
pip install openchatbi
```

3. **For development:**

```bash
git clone git@github.com:zhongyu09/openchatbi.git
cd openchatbi
uv sync --group dev
```

Optional: If you want to use `pysqlite3` (for newer SQLite builds), install it manually. If the build fails, install the SQLite development headers first:

On macOS, install SQLite using Homebrew:
```bash
brew install sqlite
brew info sqlite
export LDFLAGS="-L/opt/homebrew/opt/sqlite/lib"
export CPPFLAGS="-I/opt/homebrew/opt/sqlite/include"
```
On Amazon Linux / RHEL / CentOS:
```bash
sudo yum install sqlite-devel
```
On Ubuntu / Debian:
```bash
sudo apt-get update
sudo apt-get install libsqlite3-dev
```

### Run Demo

Run the demo using an **example dataset** derived from the Spider dataset. You need to provide "YOUR OPENAI API KEY", or change the config to use another LLM provider.

**Note**: The demo example includes embedding model configuration. If you want to run without an embedding model, you can remove the `embedding_model` section in the config - BM25 retrieval will be used automatically.

```bash
cp example/config.yaml openchatbi/config.yaml
sed -i 's/YOUR_API_KEY_HERE/[YOUR OPENAI API KEY]/g' openchatbi/config.yaml
python run_streamlit_ui.py
```

### Configuration

1. **Create configuration file**

Copy the configuration template:
```bash
cp openchatbi/config.yaml.template openchatbi/config.yaml
```
Or create an empty YAML file.

2. **Configure your LLMs:**

```yaml
# Select which provider to use
default_llm: openai

# Define one or more providers
llm_providers:
  openai:
    default_llm:
      class: langchain_openai.ChatOpenAI
      params:
        api_key: YOUR_API_KEY_HERE
        model: gpt-4.1
        temperature: 0.02
        max_tokens: 8192

    # Optional: Embedding model for vector-based retrieval and memory tools
    # If not configured, BM25-based retrieval will be used, and the memory tools will not work
    embedding_model:
      class: langchain_openai.OpenAIEmbeddings
      params:
        api_key: YOUR_API_KEY_HERE
        model: text-embedding-3-large
        chunk_size: 1024
```

3. **Configure your data warehouse:**

```yaml
organization: Your Company
dialect: presto
data_warehouse_config:
  uri: "presto://user@host:8080/catalog/schema"
  include_tables:
    - your_table_name
  database_name: "catalog.schema"
```

### Running the Application

1. **Invoking LangGraph:**

```bash
export CONFIG_FILE=YOUR_CONFIG_FILE_PATH
```

```python
from openchatbi import get_default_graph

graph = get_default_graph()
graph.invoke({"messages": [{"role": "user", "content": "Show me ctr trends for the past 7 days"}]},
    config={"configurable": {"thread_id": "1"}})
```

```sql
-- Example of system-generated SQL
SELECT date, SUM(clicks) / SUM(impression) AS ctr
FROM ad_performance
WHERE date >= CURRENT_DATE - INTERVAL '7' DAY
GROUP BY date
ORDER BY date;
```

2. **Sample Web UI:**

Run the Streamlit-based UI:
```bash
streamlit run sample_ui/streamlit_ui.py
```

Run the Gradio-based UI:
```bash
python sample_ui/streaming_ui.py
```

## Configuration Instructions

The configuration template is provided at `config.yaml.template`. Key configuration sections include:

### Basic Settings

- `organization`: Organization name (e.g., "Your Company")
- `dialect`: Database dialect (e.g., "presto")
- `bi_config_file`: Path to BI configuration file (e.g., "example/bi.yaml")

### Catalog Store Configuration

- `catalog_store`: Configuration for data catalog storage
    - `store_type`: Storage type (e.g., "file_system")
    - `data_path`: Path to catalog data stored by file system (e.g., "./example")
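Putting the keys above together, a minimal `catalog_store` section could look like this (values mirror the examples given above):

```yaml
catalog_store:
  store_type: file_system   # storage backend type
  data_path: ./example      # where the file-system store keeps catalog data
```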

### Data Warehouse Configuration

- `data_warehouse_config`: Database connection settings
    - `uri`: Connection string for your database
    - `include_tables`: List of tables to include in catalog, leave empty to include all tables
    - `database_name`: Database name for catalog
    - `token_service`: Token service URL (for data warehouses that require token authentication, such as Presto)
    - `user_name` / `password`: Token service credentials
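As a sketch, a full `data_warehouse_config` block combining the keys above might look as follows; the `token_service` URL and credentials are illustrative placeholders and are only needed for warehouses (such as Presto) that use token authentication:

```yaml
data_warehouse_config:
  uri: "presto://user@host:8080/catalog/schema"
  include_tables: []               # leave empty to include all tables
  database_name: "catalog.schema"
  token_service: "https://token-service.example.com"   # placeholder URL
  user_name: "svc_user"            # placeholder credentials
  password: "CHANGE_ME"
```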

### LLM Configuration

A wide range of LLMs is supported via LangChain; see the LangChain API
reference (https://python.langchain.com/api_reference/reference.html#integrations) for the full list of integrations
that support `chat_models`. You can configure different LLMs for different tasks:

- `default_llm`: Primary language model for general tasks
- `embedding_model`: (Optional) Model for embedding generation. If not configured, BM25-based text retrieval will be used as fallback, and the memory tools will not work
- `text2sql_llm`: (Optional) Specialized model for SQL generation. If not configured, uses `default_llm`
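For example, to route SQL generation to a cheaper or specialized model while keeping `default_llm` for everything else, you could add a `text2sql_llm` entry. This sketch assumes the same nesting as `default_llm` in the provider block shown earlier; the model names are illustrative, and `config.yaml.template` is the authoritative reference for the layout:

```yaml
llm_providers:
  openai:
    default_llm:
      class: langchain_openai.ChatOpenAI
      params:
        api_key: YOUR_API_KEY_HERE
        model: gpt-4.1
    text2sql_llm:               # optional: falls back to default_llm if omitted
      class: langchain_openai.ChatOpenAI
      params:
        api_key: YOUR_API_KEY_HERE
        model: gpt-4.1-mini     # illustrative model choice
        temperature: 0.0        # keep SQL generation deterministic
```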

Multiple providers (optional):

- Configure multiple providers under `llm_providers` and select with `default_llm: <provider_name>`.
- In `sample_ui/streamlit_ui.py`, a provider dropdown appears when `llm_providers` is configured.
- In `sample_api/async_api.py`, pass `provider` in the `/chat/stream` request body.

Commonly used LLM providers and their corresponding classes and installation commands:

- **Anthropic**: `langchain_anthropic.ChatAnthropic`, `pip install langchain-anthropic`
- **OpenAI**: `langchain_openai.ChatOpenAI`, `pip install langchain-openai`
- **Azure OpenAI**: `langchain_openai.AzureChatOpenAI`, `pip install langchain-openai`
- **Google Vertex AI**: `langchain_google_vertexai.ChatVertexAI`, `pip install langchain-google-vertexai`
- **Bedrock**: `langchain_aws.ChatBedrock`, `pip install langchain-aws`
- **Huggingface**: `langchain_huggingface.ChatHuggingFace`, `pip install langchain-huggingface`
- **Deepseek**: `langchain_deepseek.ChatDeepSeek`, `pip install langchain-deepseek`
- **Ollama**: `langchain_ollama.ChatOllama`, `pip install langchain-ollama`
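A multi-provider setup following the pattern above might look like this (API keys and model names are illustrative placeholders):

```yaml
default_llm: anthropic          # select the active provider by name

llm_providers:
  openai:
    default_llm:
      class: langchain_openai.ChatOpenAI
      params:
        api_key: YOUR_OPENAI_KEY
        model: gpt-4.1
  anthropic:
    default_llm:
      class: langchain_anthropic.ChatAnthropic
      params:
        api_key: YOUR_ANTHROPIC_KEY
        model: claude-3-5-sonnet-latest   # illustrative model name
```

Remember to install the matching integration package for each provider you configure (here, `pip install langchain-anthropic`).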

### Advanced Configuration

OpenChatBI supports sophisticated customization through prompt engineering and catalog management features:

- **Prompt Engineering Configuration**: Customize system prompts, business glossaries, and data warehouse introductions
- **Data Catalog Management**: Configure table metadata, column descriptions, and SQL generation rules
- **Business Rules**: Define table selection criteria and domain-specific SQL constraints
- **Forecasting Service**: Configure the forecasting service URL and prompt based on your own deployment

For detailed configuration options and examples, see the [Advanced Features](#advanced-features) section.

## Architecture Overview

OpenChatBI is built using a modular architecture with clear separation of concerns:

1. **LangGraph Workflows**: Core orchestration using state machines for complex multi-step processes
2. **Catalog Management**: Flexible data catalog system with intelligent retrieval (vector-based or BM25 fallback)
3. **Text2SQL Pipeline**: Advanced natural language to SQL conversion with schema linking
4. **Code Execution**: Sandboxed Python execution environment for data analysis
5. **Tool Integration**: Extensible tool system for human interaction and knowledge search
6. **Persistent Memory**: SQLite-based conversation state management

## Technology Stack

- **Frameworks**: LangGraph, LangChain, FastAPI, Gradio/Streamlit
- **Large Language Models**: Azure OpenAI (GPT-4), Anthropic Claude, OpenAI GPT models
- **Text Retrieval**: Vector-based (with embedding models) or BM25-based (fallback without embeddings)
- **Databases**: Presto, Trino, MySQL with SQLAlchemy support
- **Code Execution**: Local Python, RestrictedPython, Docker containerization
- **Development**: Python 3.11+, with modern tooling (Black, Ruff, MyPy, Pytest)
- **Storage**: SQLite for conversation checkpointing, file system catalog storage
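To give a feel for the BM25 fallback mentioned above, here is a minimal, self-contained scorer. This is an illustrative sketch only, not OpenChatBI's actual implementation (the real one is the `SimpleStore` in `openchatbi/utils.py`); `bm25_scores` and the toy documents are invented for the example:

```python
# Minimal BM25 scorer, illustrating the keyword-based fallback retrieval
# idea. NOTE: illustrative sketch only, NOT OpenChatBI's implementation.
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Return a BM25 score for each tokenized document in `docs`."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n  # average document length
    df = Counter()                         # document frequency per term
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)                    # term frequency in this doc
        score = 0.0
        for t in query_terms:
            if tf[t] == 0:
                continue
            idf = math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1)
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Toy catalog entries tokenized into keywords (hypothetical data).
docs = [
    "ad performance clicks impression ctr date".split(),
    "user signup funnel conversion rate".split(),
]
print(bm25_scores(["ctr", "clicks"], docs))  # first doc scores higher
```

In the real system the documents would be table and column descriptions from the catalog, and vector retrieval replaces this scoring whenever an embedding model is configured.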

### Agent Graph
<img src="https://github.com/zhongyu09/openchatbi/raw/main/assets/agent_graph.png" alt="Agent Graph" width="800">

### Text2SQL Graph
<img src="https://github.com/zhongyu09/openchatbi/raw/main/assets/text2sql_graph.png" alt="Text2SQL Graph" width="800">

## Project Structure

```
openchatbi/
├── README.md                    # Project documentation
├── pyproject.toml               # Modern Python project configuration
├── Dockerfile.python-executor  # Docker image for isolated code execution
├── run_tests.py                # Test runner script
├── run_streamlit_ui.py         # Streamlit UI launcher
├── openchatbi/                 # Core application code
│   ├── __init__.py             # Package initialization
│   ├── config.yaml.template    # Configuration template
│   ├── config_loader.py        # Configuration management
│   ├── constants.py            # Application constants
│   ├── agent_graph.py          # Main LangGraph workflow
│   ├── graph_state.py          # State definition for workflows
│   ├── context_config.py       # Context management configuration
│   ├── context_manager.py      # Context window and token management
│   ├── text_segmenter.py       # Text segmentation with jieba support
│   ├── utils.py                # Utility functions and SimpleStore (BM25-based retrieval)
│   ├── catalog/                # Data catalog management
│   │   ├── __init__.py         # Package initialization
│   │   ├── catalog_loader.py   # Catalog loading logic
│   │   ├── catalog_store.py    # Catalog storage interface
│   │   ├── factory.py          # Catalog factory patterns
│   │   ├── helper.py           # Catalog helper functions
│   │   ├── retrival_helper.py  # Retrieval helper utilities
│   │   ├── schema_retrival.py  # Schema retrieval logic
│   │   ├── token_service.py    # Token service integration
│   │   └── store/              # Catalog storage implementations
│   │       └── file_system.py  # File system-based catalog storage
│   ├── code/                   # Code execution framework
│   │   ├── __init__.py         # Package initialization
│   │   ├── executor_base.py    # Base executor interface
│   │   ├── local_executor.py   # Local Python execution
│   │   ├── restricted_local_executor.py # RestrictedPython execution
│   │   └── docker_executor.py  # Docker-based isolated execution
│   ├── llm/                    # LLM integration layer
│   │   ├── __init__.py         # Package initialization
│   │   └── llm.py              # LLM management and retry logic
│   ├── prompts/                # Prompt templates and engineering
│   │   ├── __init__.py         # Package initialization
│   │   ├── agent_prompt.md     # Main agent prompts
│   │   ├── extraction_prompt.md # Information extraction prompts
│   │   ├── system_prompt.py    # System prompt management
│   │   ├── summary_prompt.md   # Summary conversation prompts
│   │   ├── table_selection_prompt.md # Table selection prompts
│   │   ├── text2sql_prompt.md  # Text-to-SQL prompts
│   │   └── sql_dialect/        # SQL dialect-specific prompts
│   ├── text2sql/               # Text-to-SQL conversion pipeline
│   │   ├── __init__.py         # Package initialization
│   │   ├── data.py             # Data and retriever for Text-to-SQL
│   │   ├── extraction.py       # Information extraction
│   │   ├── generate_sql.py     # SQL generation and execution logic
│   │   ├── schema_linking.py   # Schema linking process
│   │   ├── sql_graph.py        # SQL generation LangGraph workflow
│   │   ├── text2sql_utils.py   # Text2SQL utilities
│   │   └── visualization.py    # Data visualization functions
│   └── tool/                   # LangGraph tools and functions
│       ├── ask_human.py        # Human-in-the-loop interactions
│       ├── memory.py           # Memory management tool
│       ├── mcp_tools.py        # MCP (Model Context Protocol) integration
│       ├── run_python_code.py  # Configurable Python code execution
│       ├── save_report.py      # Report saving functionality
│       ├── search_knowledge.py # Knowledge base search
│       └── timeseries_forecast.py # Time series forecasting tool
├── sample_api/                 # API implementations
│   └── async_api.py            # Asynchronous FastAPI example
├── sample_ui/                  # Web interface implementations
│   ├── memory_ui.py            # Memory-enhanced UI interface
│   ├── plotly_utils.py         # Plotly utilities and helpers
│   ├── simple_ui.py            # Simple non-streaming Gradio UI
│   ├── streaming_ui.py         # Streaming Gradio UI with real-time updates
│   ├── streamlit_ui.py         # Streaming Streamlit UI with enhanced features
│   └── style.py                # UI styling and CSS
├── example/                    # Example configurations and data
│   ├── bi.yaml                 # BI configuration example
│   ├── config.yaml             # Application config example
│   ├── table_info.yaml         # Table information
│   ├── table_columns.csv       # Table column registry
│   ├── common_columns.csv      # Common column definitions
│   ├── sql_example.yaml        # SQL examples for retrieval
│   ├── table_selection_example.csv # Table selection examples
│   └── tracking_orders.sqlite  # Sample SQLite database
├── timeseries_forecasting/     # Time series forecasting service
│   ├── README.md               # Forecasting service documentation
│   └── ...                     # Forecasting service implementation
├── tests/                      # Test suite
│   ├── __init__.py             # Package initialization
│   ├── conftest.py             # Test configuration
│   ├── test_*.py               # Test modules for various components
│   └── README.md               # Testing documentation
├── docs/                       # Documentation
│   ├── source/                 # Sphinx documentation source
│   ├── build/                  # Built documentation
│   ├── Makefile                # Documentation build scripts
│   └── make.bat                # Windows build script
└── .github/                    # GitHub workflows and templates
    └── workflows/              # CI/CD workflows
```

## Advanced Features

### Visualization Configuration
You can choose rule-based or LLM-based visualization, or disable visualization entirely.
```yaml
# Options: "rule" (rule-based), "llm" (LLM-based), or null (skip visualization)
visualization_mode: llm
```

### Prompt Engineering
#### Basic Knowledge & Glossary

You can define basic knowledge and glossary in `example/bi.yaml`, for example:

```yaml
basic_knowledge_glossary: |
  # Basic Knowledge Introduction
    The basic knowledge about your company and its business, including key concepts, metrics, and processes.
  # Glossary
    Common terms and their definitions used in your business context.
```

#### Data Warehouse Introduction

You can provide a brief introduction of your data warehouse in `example/bi.yaml`, for example:

```yaml
data_warehouse_introduction: |
  # Data Warehouse Introduction
    This data warehouse is built on Presto and contains various tables related to XXXXX.
    The main fact tables include XXXX metrics, while dimension tables include XXXXX.
    The data is updated hourly and is used for reporting and analysis purposes.
```

#### Table Selection Rules

You can configure table selection rules in `example/bi.yaml`, for example:

```yaml
table_selection_extra_rule: |
  - All tables with an is_valid column support both valid and invalid traffic
```

#### Custom SQL Rules

You can define additional SQL generation rules for tables in `example/table_info.yaml`, for example:

```yaml
sql_rule: |
  ### SQL Rules
  - All event_date in the table are stored in **UTC**. If the user specifies a timezone (e.g., CET, PST), convert between timezones accordingly.

```


### Catalog Management

#### Introduction

High-quality catalog data is essential for accurate Text2SQL generation and data analysis. OpenChatBI automatically 
discovers and indexes data warehouse table structures while providing flexible management for business metadata, column 
descriptions, and query optimization rules.

#### Catalog Structure

The catalog system organizes metadata in a hierarchical structure:

**Database Level**
- Top-level container for all tables and schemas

**Table Level**
- `description`: Business functionality and purpose of the table
- `selection_rule`: Guidelines for when and how to use this table in queries
- `sql_rule`: Specific SQL generation rules and constraints for this table

**Column Level**
- **Required Fields**: Essential metadata for each column to enable effective Text2SQL generation
  - `column_name`: Technical database column name
  - `display_name`: Human-readable name for business users
  - `alias`: Alternative names or abbreviations
  - `type`: Data type (string, integer, date, etc.)
  - `category`: Business category, dimension or metric
  - `tag`: Additional labels for filtering and organization
  - `description`: Detailed explanation of column purpose and usage
- **Two Types of Columns**
  - **Common Columns**: Columns with standardized business meanings shared across tables
  - **Table-Specific Columns**: Columns with context-dependent meanings that vary between tables
- **Derived Metrics**: Virtual metrics calculated from existing columns using SQL formulas
  - Computed dynamically during query execution rather than stored as physical columns
  - Examples: CTR (clicks/impressions), conversion rates, profit margins
  - Enable complex business calculations without pre-computing values
  
#### Loading Catalog from Database

OpenChatBI can automatically discover and load table structures from your data warehouse:

1. **Automatic Discovery**: Connects to your configured data warehouse and scans table schemas
2. **Metadata Extraction**: Extracts column names, data types, and basic structural information
3. **Incremental Updates**: Supports updating catalog data as your database schema evolves

Configure automatic catalog loading in your `config.yaml`:

```yaml
catalog_store:
  store_type: file_system
  data_path: ./catalog_data
data_warehouse_config:
  include_tables:
    - your_table_pattern
  # Leave empty to include all accessible tables
```

#### File System Catalog Store

The file system catalog store organizes metadata across multiple files for maintainability and version control:

**Core Table Information**
- `table_info.yaml`: Comprehensive table metadata organized hierarchically (database → table → information)
  - `type`: Table classification (e.g., "fact" for Fact Tables, "dimension" for Dimension Tables)
  - `description`: Business functionality and purpose
  - `selection_rule`: Usage guidelines in markdown list format (each line starts with `-`)
  - `sql_rule`: SQL generation rules in markdown header format (each rule starts with `####`)
  - `derived_metric`: Virtual metrics with calculation formulas, organized by groups:
    ```md
    #### Derived Ratio Metrics
    Click-through Rate (alias CTR): SUM(clicks) / SUM(impression)
    Conversion Rate (alias CVR): SUM(conversions) / SUM(clicks)
    ```

**Column Management**
- `table_columns.csv`: Basic column registry with schema `db_name,table_name,column_name`
- `table_spec_columns.csv`: Table-specific column metadata with full schema:
  `db_name,table_name,column_name,display_name,alias,type,category,tag,description`
- `common_columns.csv`: Shared column definitions across tables with schema:
  `column_name,display_name,alias,type,category,tag,description`
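
As a quick illustration, here is a minimal Python sketch (with made-up sample rows, not data from the repository) showing how a `common_columns.csv` file following the schema above can be loaded into a lookup table keyed by column name:

```python
import csv
import io

# Hypothetical sample rows following the common_columns.csv schema above.
SAMPLE = """column_name,display_name,alias,type,category,tag,description
clicks,Clicks,clk,integer,metric,traffic,Number of ad clicks
event_date,Event Date,date,date,dimension,time,UTC date of the event
"""


def load_common_columns(fp) -> dict[str, dict]:
    """Read a common-columns CSV into a dict keyed by column_name."""
    return {row["column_name"]: row for row in csv.DictReader(fp)}


columns = load_common_columns(io.StringIO(SAMPLE))
```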

**Query Examples and Training Data**
- `table_selection_example.csv`: Table selection training examples with schema `question,selected_tables`
- `sql_example.yaml`: Query examples organized by database and table structure:
  ```yaml
  your_database:
    ad_performance: |
      Q: Show me CTR trends for the past 7 days
      A: SELECT date, SUM(clicks)/SUM(impressions) AS ctr
         FROM ad_performance
         WHERE date >= CURRENT_DATE - INTERVAL 7 DAY
         GROUP BY date
         ORDER BY date;
  ```
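
Stored examples like these are retrieved at query time to ground SQL generation. As a toy illustration of the idea only (the project itself uses the BM25-based `SimpleStore` in `utils.py`, not this code), a token-overlap matcher might look like:

```python
# Toy example retrieval by token overlap -- illustrates the idea of picking
# the stored SQL example most similar to a new question.
def tokenize(text: str) -> set[str]:
    return set(text.lower().split())


def best_example(question: str, examples: list[dict]) -> dict:
    q = tokenize(question)
    # Pick the stored example whose question shares the most tokens.
    return max(examples, key=lambda ex: len(q & tokenize(ex["question"])))


examples = [
    {"question": "Show me CTR trends for the past 7 days", "sql": "SELECT ..."},
    {"question": "Total spend by campaign last month", "sql": "SELECT ..."},
]
match = best_example("What is the CTR trend this week?", examples)
```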

### Time Series Forecasting Service Setup

OpenChatBI can integrate with a time series forecasting service for advanced predictive analytics. Follow these steps to set up the service:

#### 1. Build and Run the Forecasting Service

See detailed instructions in [timeseries_forecasting/README.md](timeseries_forecasting/README.md)

Quick start:
```bash
cd timeseries_forecasting
./build_and_run.sh
```

#### 2. Configure Tool Usage Rules

In your `bi.yaml`, add constraints for the timeseries_forecast tool, e.g., if you are using the `timer-base-84m` model:
```yaml
extra_tool_use_rule: |
  - timeseries_forecast tool requires at least 96 time points in input data. If there is not enough input data, set input_len to 96 to pad with zeros.
```
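
The zero-padding rule above can be sketched in Python. `pad_series` is a hypothetical helper for illustration, not part of the OpenChatBI codebase:

```python
def pad_series(values: list[float], target_len: int = 96) -> list[float]:
    """Left-pad a series with zeros up to the model's minimum input length.

    Illustrative helper only; the agent applies this rule via the tool-use
    prompt rather than a fixed function.
    """
    values = list(values)
    if len(values) >= target_len:
        # Keep only the most recent target_len points.
        return values[-target_len:]
    # Prepend zeros so the oldest (real) values stay at the end.
    return [0.0] * (target_len - len(values)) + values


padded = pad_series([1.0, 2.0, 3.0])
```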

#### 3. Configure Service URL

In your `config.yaml`:
```yaml
# Time Series Forecasting Service Configuration
timeseries_forecasting_service_url: "http://localhost:8765"
```

**Important**: Adjust the URL based on your deployment scenario:
- **Local development** (OpenChatBI on host, Forecasting service in Docker): `http://localhost:8765`
- **Remote service**: `http://your-service-host:8765`


#### 4. Verify Service Health

Check that the service is accessible:
```bash
curl http://localhost:8765/health
```

Expected response:
```json
{
  "status": "healthy",
  "model_initialized": true,
  "uptime_seconds": 123.45
}
``` 
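
If you prefer to check health from Python instead of curl, a minimal sketch might look like the following. The helper names are illustrative, not part of OpenChatBI:

```python
import json
import urllib.request


def is_healthy(payload: dict) -> bool:
    # The service is ready only when it reports healthy status
    # and a fully initialized model.
    return payload.get("status") == "healthy" and bool(payload.get("model_initialized"))


def check_service(url: str = "http://localhost:8765/health", timeout: float = 5.0) -> bool:
    """Return True if the forecasting service responds and reports healthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return is_healthy(json.load(resp))
    except (OSError, ValueError):
        return False
```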

### Python Code Execution Configuration

OpenChatBI supports multiple execution environments for running Python code with different security and performance characteristics:

```yaml
# Python Code Execution Configuration
python_executor: local  # Options: "local", "restricted_local", "docker"
```

#### Executor Types

- **`local`** (Default)
  - **Performance**: Fastest execution
  - **Security**: Least secure (code runs in current Python process)
  - **Capabilities**: Full Python capabilities and library access
  - **Use Case**: Development environments, trusted code execution

- **`restricted_local`**
  - **Performance**: Moderate execution speed
  - **Security**: Moderate security with RestrictedPython sandboxing
  - **Capabilities**: Limited Python features (no imports, file access, etc.)
  - **Use Case**: Semi-trusted environments with controlled execution

- **`docker`**
  - **Performance**: Slower due to container overhead
  - **Security**: Highest security with complete process isolation
  - **Capabilities**: Full Python capabilities within isolated container
  - **Use Case**: Production environments, untrusted code execution
  - **Requirements**: Docker must be installed and running
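
To illustrate the idea behind the `restricted_local` mode, here is a deliberately simplified sandbox that whitelists a handful of builtins. This is not RestrictedPython (which rewrites the AST and guards attribute access, among much else), only a sketch of the concept:

```python
import builtins

# Whitelist a few harmless builtins; notably absent: __import__, open, eval.
SAFE_BUILTINS = {name: getattr(builtins, name) for name in ("len", "range", "sum", "min", "max")}


def run_restricted(code: str) -> dict:
    """Execute code with a tiny builtins whitelist and return its namespace."""
    namespace: dict = {}
    # Without __import__ in builtins, `import` statements raise ImportError.
    exec(compile(code, "<sandbox>", "exec"), {"__builtins__": SAFE_BUILTINS}, namespace)
    return namespace


ns = run_restricted("total = sum(range(10))")
```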

#### Docker Executor Setup

For production deployments or when running untrusted code, the Docker executor provides complete isolation:

1. **Install Docker**: Download and install Docker Desktop or Docker Engine
2. **Configure executor**: Set `python_executor: docker` in your config
3. **Automatic setup**: OpenChatBI will automatically build the required Docker image
4. **Fallback behavior**: If Docker is unavailable, automatically falls back to local executor

**Docker Executor Features**:
- Pre-installed data science libraries (pandas, numpy, matplotlib, seaborn)
- Network isolation for security
- Automatic container cleanup
- Resource isolation from host system
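
The general pattern behind Docker-based isolation can be sketched as follows. The flags and image name here are illustrative assumptions, not OpenChatBI's actual implementation:

```python
import subprocess


def build_docker_command(code: str, image: str = "python:3.11-slim") -> list[str]:
    # --rm: auto-clean the container; --network none: block network access;
    # --memory: cap resource usage. All choices here are illustrative.
    return [
        "docker", "run", "--rm",
        "--network", "none",
        "--memory", "256m",
        image,
        "python", "-c", code,
    ]


def run_in_docker(code: str, timeout: float = 30.0) -> str:
    """Run a code string in an isolated container and return its stdout."""
    result = subprocess.run(
        build_docker_command(code), capture_output=True, text=True, timeout=timeout
    )
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip())
    return result.stdout
```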

## Development & Testing

### Code Quality Tools

The project uses modern Python tooling for code quality:

```bash
# Format code
uv run black .

# Lint code  
uv run ruff check .

# Type checking
uv run mypy openchatbi/

# Security scanning
uv run bandit -r openchatbi/
```

### Testing

Run the test suite:

```bash
# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov=openchatbi --cov-report=html

# Run specific test files
uv run pytest tests/test_generate_sql.py
uv run pytest tests/test_agent_graph.py
```

### Pre-commit Hooks

Install pre-commit hooks for automatic code quality checks:

```bash
uv run pre-commit install
```

## Contribution Guidelines

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/fooBar`)
3. Commit your changes (`git commit -am 'Add some fooBar'`)
4. Push to the branch (`git push origin feature/fooBar`)
5. Create a new Pull Request

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Contact & Support

- **Author**: Yu Zhong ([zhongyu8@gmail.com](mailto:zhongyu8@gmail.com))
- **Repository**: [github.com/zhongyu09/openchatbi](https://github.com/zhongyu09/openchatbi)
- **Issues**: [Report bugs and feature requests](https://github.com/zhongyu09/openchatbi/issues)


================================================
FILE: baselines/runledger-openchatbi.json
================================================
{
  "aggregates": {
    "cases_error": 0,
    "cases_fail": 0,
    "cases_pass": 1,
    "cases_total": 1,
    "metrics": {
      "cost_usd": {
        "max": null,
        "mean": null,
        "min": null,
        "p50": null,
        "p95": null
      },
      "steps": {
        "max": null,
        "mean": null,
        "min": null,
        "p50": null,
        "p95": null
      },
      "tokens_in": {
        "max": null,
        "mean": null,
        "min": null,
        "p50": null,
        "p95": null
      },
      "tokens_out": {
        "max": null,
        "mean": null,
        "min": null,
        "p50": null,
        "p95": null
      },
      "tool_calls": {
        "max": 1.0,
        "mean": 1.0,
        "min": 1.0,
        "p50": 1.0,
        "p95": 1.0
      },
      "tool_errors": {
        "max": 0.0,
        "mean": 0.0,
        "min": 0.0,
        "p50": 0.0,
        "p95": 0.0
      },
      "wall_ms": {
        "max": 1.0,
        "mean": 1.0,
        "min": 1.0,
        "p50": 1.0,
        "p95": 1.0
      }
    },
    "pass_rate": 1.0
  },
  "cases": [
    {
      "assertions": {
        "failed": 0,
        "total": 1
      },
      "cost_usd": null,
      "failed_assertions": null,
      "failure_reason": null,
      "id": "t1",
      "replay": {
        "cassette_path": "evals/runledger/cassettes/t1.jsonl",
        "cassette_sha256": "7e9830609490d140bf09178106dfa647bba4c9ec15859072b5aa2c3ae1659289"
      },
      "status": "pass",
      "steps": null,
      "tokens_in": null,
      "tokens_out": null,
      "tool_calls": 1,
      "tool_calls_by_name": {
        "search_knowledge": 1
      },
      "tool_errors": 0,
      "tool_errors_by_name": {},
      "wall_ms": 1
    }
  ],
  "generated_at": "2026-01-03T19:10:00Z",
  "run": {
    "ci": null,
    "exit_status": "success",
    "git_sha": null,
    "mode": "replay",
    "run_id": "baseline"
  },
  "runledger_version": "0.1.1",
  "schema_version": 1,
  "suite": {
    "agent_command": [
      "python",
      "evals/runledger/agent/agent.py"
    ],
    "cases_total": 1,
    "name": "runledger-openchatbi",
    "suite_config_hash": null,
    "suite_path": "evals/runledger/suite.yaml",
    "tool_mode": "replay"
  }
}


================================================
FILE: docs/Makefile
================================================
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS    ?=
SPHINXBUILD   ?= sphinx-build
SOURCEDIR     = source
BUILDDIR      = build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)


================================================
FILE: docs/make.bat
================================================
@ECHO OFF

pushd %~dp0

REM Command file for Sphinx documentation

if "%SPHINXBUILD%" == "" (
	set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=source
set BUILDDIR=build

%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
	echo.
	echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
	echo.installed, then set the SPHINXBUILD environment variable to point
	echo.to the full path of the 'sphinx-build' executable. Alternatively you
	echo.may add the Sphinx directory to PATH.
	echo.
	echo.If you don't have Sphinx installed, grab it from
	echo.https://www.sphinx-doc.org/
	exit /b 1
)

if "%1" == "" goto help

%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end

:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%

:end
popd


================================================
FILE: docs/source/_templates/layout.html
================================================
{% extends "!layout.html" %}

{% block extrahead %}
  {{ super() }}
  <meta name="google-site-verification" content="geDcsz839O_UHavbn1pIpMOk6sJgneL4NlULcVJ4-KA" />
  <script async src="https://www.googletagmanager.com/gtag/js?id=AW-17595718197"></script>
  <script>
    window.dataLayer = window.dataLayer || [];
    function gtag(){dataLayer.push(arguments);}
    gtag('js', new Date());
    gtag('config', 'AW-17595718197');
  </script>
  <script>
    gtag('event', 'conversion', {
        'send_to': 'AW-17595718197/JxBiCPzC06AbELW0pcZB',
        'value': 1.0,
        'currency': 'SGD'
    });
  </script>
{% endblock %}

================================================
FILE: docs/source/catalog.rst
================================================
Catalog System
==============

Overview
--------

The catalog system manages metadata for database tables, columns, and business rules.

Catalog Store
-------------

.. automodule:: openchatbi.catalog.catalog_store
    :members:
    :undoc-members:
    :show-inheritance:

Filesystem Implementation
^^^^^^^^^^^^^^^^^^^^^^^^^

.. automodule:: openchatbi.catalog.store.file_system
    :members:
    :show-inheritance:

Catalog Loader
--------------

.. automodule:: openchatbi.catalog.catalog_loader
    :members:
    :undoc-members:
    :show-inheritance:

Schema Retrieval
----------------

.. automodule:: openchatbi.catalog.schema_retrival
    :members:
    :undoc-members:
    :show-inheritance:

================================================
FILE: docs/source/code.rst
================================================
Code Execution
==============

Code Module
-----------

.. automodule:: openchatbi.code
    :members:
    :undoc-members:
    :show-inheritance:

Executor Base
-------------

.. automodule:: openchatbi.code.executor_base
    :members:
    :undoc-members:
    :show-inheritance:

Local Executor
--------------

.. automodule:: openchatbi.code.local_executor
    :members:
    :undoc-members:
    :show-inheritance:

================================================
FILE: docs/source/conf.py
================================================
# Configuration file for the Sphinx documentation builder.
#
# For the full list of built-in configuration values, see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

# -- Project information -----------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information
import os
import sys

sys.path.insert(0, os.path.abspath("../.."))

project = "OpenChatBI"
copyright = "2025, Yu Zhong"
author = "Yu Zhong"
release = "0.2.2"

# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration

# Mock dependencies for documentation build
autodoc_mock_imports = []
extensions = [
    "sphinx.ext.autodoc",
    "sphinx.ext.napoleon",
    "sphinx.ext.viewcode",
    "sphinx.ext.githubpages",
    "myst_parser",
]

# Set an environment variable to indicate we're building docs
os.environ["SPHINX_BUILD"] = "1"

# MyST parser configuration
myst_enable_extensions = [
    "colon_fence",
    "deflist",
    "html_admonition",
    "html_image",
]
myst_heading_anchors = 3

templates_path = ["_templates"]
exclude_patterns = []

# Autodoc configuration
autodoc_default_options = {
    "members": True,
    "member-order": "bysource",
    "special-members": "__init__",
    "undoc-members": True,
    "exclude-members": "__weakref__",
}

# Napoleon configuration for Google/NumPy style docstrings
napoleon_google_docstring = True
napoleon_numpy_docstring = True
napoleon_include_init_with_doc = False
napoleon_include_private_with_doc = False
napoleon_include_special_with_doc = True
napoleon_use_admonition_for_examples = False
napoleon_use_admonition_for_notes = False
napoleon_use_admonition_for_references = False
napoleon_use_ivar = False
napoleon_use_param = True
napoleon_use_rtype = True
napoleon_preprocess_types = False
napoleon_type_aliases = None
napoleon_attr_annotations = True

# -- Options for HTML output -------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output

html_theme = "sphinx_rtd_theme"
html_static_path = ["_static"]

# GitHub Pages configuration
html_baseurl = "https://zhongyu09.github.io/openchatbi/"

# Theme options for RTD theme
html_theme_options = {
    "navigation_depth": 4,
    "collapse_navigation": False,
    "sticky_navigation": True,
    "includehidden": True,
    "titles_only": False,
}


================================================
FILE: docs/source/config.rst
================================================
Configuration
=============

The configuration system consists of two main classes:

- **Config**: Defines the configuration model.
- **ConfigLoader**: Manages loading and accessing configuration.

Config
------

.. autoclass:: openchatbi.config_loader.Config
    :exclude-members: organization, dialect, default_llm, embedding_model, text2sql_llm, bi_config, data_warehouse_config, catalog_store, mcp_servers, report_directory, python_executor

ConfigLoader
------------

.. autoclass:: openchatbi.config_loader.ConfigLoader
    :members:
    :undoc-members:
    :show-inheritance:

================================================
FILE: docs/source/core.rst
================================================
Core Module
===========

Main Module
-----------

.. automodule:: openchatbi
    :members:
    :undoc-members:
    :show-inheritance:

Agent Graph
-----------

.. automodule:: openchatbi.agent_graph
    :members:
    :undoc-members:
    :show-inheritance:

State Management
----------------

.. automodule:: openchatbi.graph_state
    :members:
    :undoc-members:
    :show-inheritance:

Utilities
---------

.. automodule:: openchatbi.utils
    :members:
    :undoc-members:
    :show-inheritance:

================================================
FILE: docs/source/index.rst
================================================
OpenChatBI Documentation
========================

`GitHub Repository <https://github.com/zhongyu09/openchatbi>`_

.. include:: ../../README.md
   :parser: myst_parser.sphinx_

.. toctree::
   :maxdepth: 4
   :caption: Documentation:
   :titlesonly:

   self

.. toctree::
   :maxdepth: 2
   :caption: API Reference:

   Core Module <core>
   Configuration <config>
   Catalog System <catalog>
   Text2SQL System <text2sql>
   Code Execution <code>
   LLM Integration <llm>
   Tools and Utilities <tools>
   Time Series Forecasting Service <timeseries>

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`


================================================
FILE: docs/source/llm.rst
================================================
LLM Integration
===============

LLM Module
----------

.. automodule:: openchatbi.llm
    :members:
    :undoc-members:
    :show-inheritance:

LLM Implementation
------------------

.. automodule:: openchatbi.llm.llm
    :members:
    :undoc-members:
    :show-inheritance:

================================================
FILE: docs/source/text2sql.rst
================================================
Text2SQL System
===============

Overview
--------

Natural language to SQL conversion pipeline with schema linking and prompt engineering.


SQL Graph
---------

.. automodule:: openchatbi.text2sql.sql_graph
    :members:
    :undoc-members:
    :show-inheritance:

SQL Generation
--------------

.. automodule:: openchatbi.text2sql.generate_sql
    :members:
    :undoc-members:
    :show-inheritance:

Schema Linking
--------------

.. automodule:: openchatbi.text2sql.schema_linking
    :members:
    :undoc-members:
    :show-inheritance:

Information Extraction
----------------------

.. automodule:: openchatbi.text2sql.extraction
    :members:
    :undoc-members:
    :show-inheritance:

Text2SQL Utilities
-------------------

.. automodule:: openchatbi.text2sql.text2sql_utils
    :members:
    :undoc-members:
    :show-inheritance:

================================================
FILE: docs/source/timeseries.rst
================================================
Time Series Forecasting Service
===============================

`GitHub Repository <https://github.com/zhongyu09/openchatbi/tree/main/timeseries_forecasting>`_

.. include:: ../../timeseries_forecasting/README.md
   :parser: myst_parser.sphinx_


================================================
FILE: docs/source/tools.rst
================================================
Tools and Utilities
===================

Overview
--------

LangGraph tools for human interaction, code execution, and knowledge search.

Python Code Execution
----------------------

.. automodule:: openchatbi.tool.run_python_code
    :members:
    :undoc-members:
    :show-inheritance:

Human Interaction
-----------------

.. automodule:: openchatbi.tool.ask_human
    :members:
    :undoc-members:
    :show-inheritance:

Memory Management
-----------------

.. automodule:: openchatbi.tool.memory
    :members:
    :undoc-members:
    :show-inheritance:

Knowledge Search
----------------

.. automodule:: openchatbi.tool.search_knowledge
    :members:
    :undoc-members:
    :show-inheritance:

================================================
FILE: evals/__init__.py
================================================
"""Evaluation suites for RunLedger."""


================================================
FILE: evals/runledger/README.md
================================================
# RunLedger eval (OpenChatBI)

This suite is **replay-only** by default. It runs a deterministic CI check using a JSONL adapter that proxies tool calls through RunLedger and replays results from a cassette.

## Run (replay)

```bash
runledger run evals/runledger --mode replay --baseline baselines/runledger-openchatbi.json
```

## Record / update cassette (optional)

If you want to re-record the cassette with real tool outputs, run in record mode in a fully configured OpenChatBI environment (valid `openchatbi/config.yaml`, data warehouse/catalog, LLM keys).

```bash
runledger run evals/runledger --mode record \
  --baseline baselines/runledger-openchatbi.json \
  --tool-module evals.runledger.tools
```

Notes:
- Tool args are passed as JSON objects; see `evals/runledger/cassettes/t1.jsonl` for the exact shape.
- After recording, promote the new baseline:

```bash
runledger baseline promote \
  --from runledger_out/runledger-openchatbi/<run_id> \
  --to baselines/runledger-openchatbi.json
```



================================================
FILE: evals/runledger/__init__.py
================================================
"""RunLedger eval suite for OpenChatBI."""


================================================
FILE: evals/runledger/agent/agent.py
================================================
import json
import sys
from itertools import count
from typing import Any
from unittest.mock import MagicMock

import builtins
from langchain_core.messages import AIMessage, HumanMessage, ToolMessage
from langchain_core.tools import StructuredTool
from langgraph.checkpoint.memory import MemorySaver
from pydantic import BaseModel, Field

from openchatbi import config
import openchatbi.agent_graph as agent_graph


_CALL_COUNTER = count(1)
_ORIG_PRINT = builtins.print


def _safe_print(*args: Any, **kwargs: Any) -> None:
    """Suppress stdout prints so JSONL stays clean; allow stderr."""
    target = kwargs.get("file")
    if target is None or target is sys.stdout:
        return
    _ORIG_PRINT(*args, **kwargs)


builtins.print = _safe_print


class JsonlChannel:
    def __init__(self, stream: Any) -> None:
        self._stream = stream

    def read(self) -> dict[str, Any] | None:
        while True:
            line = self._stream.readline()
            if not line:
                return None
            line = line.strip()
            if not line:
                continue
            try:
                return json.loads(line)
            except json.JSONDecodeError:
                continue

    @staticmethod
    def send(payload: dict[str, Any]) -> None:
        sys.stdout.write(json.dumps(payload) + "\n")
        sys.stdout.flush()


def _last_user_text(messages: list[Any]) -> str:
    for message in reversed(messages):
        if isinstance(message, HumanMessage):
            return str(message.content).strip()
    return "OpenChatBI"


def _runledger_tool_call(channel: JsonlChannel, name: str, args: dict[str, Any]) -> Any:
    call_id = f"c{next(_CALL_COUNTER)}"
    channel.send({"type": "tool_call", "name": name, "call_id": call_id, "args": args})
    while True:
        message = channel.read()
        if message is None:
            raise RuntimeError("Tool result missing")
        if message.get("type") != "tool_result":
            continue
        if message.get("call_id") != call_id:
            continue
        if message.get("ok"):
            return message.get("result")
        raise RuntimeError(message.get("error") or "Tool error")


class SearchKnowledgeInput(BaseModel):
    reasoning: str = Field(description="Reason for searching knowledge")
    query_list: list[str] = Field(description="Query terms")
    knowledge_bases: list[str] = Field(description="Knowledge bases to search")
    with_table_list: bool = Field(default=False, description="Include table list")


class ShowSchemaInput(BaseModel):
    reasoning: str = Field(description="Reason for showing schema")
    tables: list[str] = Field(description="Table names")


class Text2SQLInput(BaseModel):
    reasoning: str = Field(description="Reason for calling text2sql")
    context: str = Field(description="Full context for the SQL graph")


class RunPythonInput(BaseModel):
    reasoning: str = Field(description="Reason for running python code")
    code: str = Field(description="Python code to execute")


class SaveReportInput(BaseModel):
    content: str = Field(description="Report content")
    title: str = Field(description="Report title")
    file_format: str = Field(description="File extension")


def _build_tool_proxies(channel: JsonlChannel) -> dict[str, StructuredTool]:
    def search_knowledge(
        reasoning: str,
        query_list: list[str],
        knowledge_bases: list[str],
        with_table_list: bool = False,
    ) -> Any:
        return _runledger_tool_call(
            channel,
            "search_knowledge",
            {
                "reasoning": reasoning,
                "query_list": query_list,
                "knowledge_bases": knowledge_bases,
                "with_table_list": with_table_list,
            },
        )

    def show_schema(reasoning: str, tables: list[str]) -> Any:
        return _runledger_tool_call(
            channel,
            "show_schema",
            {"reasoning": reasoning, "tables": tables},
        )

    def text2sql(reasoning: str, context: str) -> Any:
        return _runledger_tool_call(
            channel,
            "text2sql",
            {"reasoning": reasoning, "context": context},
        )

    def run_python_code(reasoning: str, code: str) -> Any:
        return _runledger_tool_call(
            channel,
            "run_python_code",
            {"reasoning": reasoning, "code": code},
        )

    def save_report(content: str, title: str, file_format: str = "md") -> Any:
        return _runledger_tool_call(
            channel,
            "save_report",
            {"content": content, "title": title, "file_format": file_format},
        )

    return {
        "search_knowledge": StructuredTool.from_function(
            func=search_knowledge,
            name="search_knowledge",
            description="RunLedger proxy for search_knowledge",
            args_schema=SearchKnowledgeInput,
        ),
        "show_schema": StructuredTool.from_function(
            func=show_schema,
            name="show_schema",
            description="RunLedger proxy for show_schema",
            args_schema=ShowSchemaInput,
        ),
        "text2sql": StructuredTool.from_function(
            func=text2sql,
            name="text2sql",
            description="RunLedger proxy for text2sql",
            args_schema=Text2SQLInput,
        ),
        "run_python_code": StructuredTool.from_function(
            func=run_python_code,
            name="run_python_code",
            description="RunLedger proxy for run_python_code",
            args_schema=RunPythonInput,
        ),
        "save_report": StructuredTool.from_function(
            func=save_report,
            name="save_report",
            description="RunLedger proxy for save_report",
            args_schema=SaveReportInput,
        ),
    }


def _stub_llm_call(chat_model: Any, messages: list[Any], **_kwargs: Any) -> AIMessage:
    tool_seen = any(isinstance(msg, ToolMessage) or getattr(msg, "type", None) == "tool" for msg in messages)
    if tool_seen:
        return AIMessage(content="Here is a deterministic summary based on the tool result.", tool_calls=[])

    user_text = _last_user_text(messages)
    tool_args = {
        "reasoning": "Look up relevant knowledge",
        "query_list": [user_text],
        "knowledge_bases": ["columns"],
        "with_table_list": False,
    }
    return AIMessage(
        content="Searching knowledge base.",
        tool_calls=[{"name": "search_knowledge", "args": tool_args, "id": "call_1"}],
    )


def _configure_agent_graph(channel: JsonlChannel) -> None:
    tool_proxies = _build_tool_proxies(channel)

    agent_graph.search_knowledge = tool_proxies["search_knowledge"]
    agent_graph.show_schema = tool_proxies["show_schema"]
    agent_graph.run_python_code = tool_proxies["run_python_code"]
    agent_graph.save_report = tool_proxies["save_report"]
    agent_graph.get_sql_tools = lambda *_args, **_kwargs: tool_proxies["text2sql"]
    agent_graph.build_sql_graph = lambda *_args, **_kwargs: object()
    agent_graph.get_memory_tools = lambda *_args, **_kwargs: []
    agent_graph.create_mcp_tools_sync = lambda *_args, **_kwargs: []
    agent_graph.check_forecast_service_health = lambda: False
    agent_graph.call_llm_chat_model_with_retry = _stub_llm_call


def _bootstrap_config() -> None:
    config.set(
        {
            "default_llm": MagicMock(),
            "data_warehouse_config": {},
            "catalog_store": {"store_type": "file_system", "auto_load": False},
        }
    )


def main() -> int:
    channel = JsonlChannel(sys.stdin)
    message = channel.read()
    if not message or message.get("type") != "task_start":
        return 1

    _bootstrap_config()
    _configure_agent_graph(channel)

    prompt = ""
    payload = message.get("input", {})
    if isinstance(payload, dict):
        prompt = payload.get("prompt") or payload.get("question") or payload.get("query") or ""
    if not prompt:
        prompt = "OpenChatBI"

    graph = agent_graph.build_agent_graph_sync(
        catalog=config.get().catalog_store,
        checkpointer=MemorySaver(),
        memory_store=None,
        enable_context_management=False,
    )

    result = graph.invoke({"messages": [{"role": "user", "content": prompt}]})
    output = ""
    if isinstance(result, dict) and result.get("messages"):
        output = str(result["messages"][-1].content)

    channel.send(
        {
            "type": "final_output",
            "output": {"category": "bi", "reply": output or "Completed request."},
        }
    )
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
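The agent above talks to the harness through a simple JSONL request/response protocol: `_runledger_tool_call` emits one `tool_call` record with a fresh `call_id`, then reads until a `tool_result` with the matching `call_id` arrives. A minimal self-contained sketch of that round trip (the `FakeChannel` class and `tool_call` helper below are illustrative stand-ins, not part of the repo):

```python
# A stand-in for JsonlChannel (illustrative, not part of the repo):
# it records outgoing messages and replays a canned incoming stream.
import itertools
from typing import Any

_COUNTER = itertools.count(1)


class FakeChannel:
    def __init__(self, incoming):
        self.sent = []
        self._incoming = iter(incoming)

    def send(self, record):
        self.sent.append(record)

    def read(self):
        return next(self._incoming, None)


def tool_call(channel, name: str, args: dict) -> Any:
    # Same shape as _runledger_tool_call: write one tool_call record,
    # then skip records until the matching tool_result arrives.
    call_id = f"c{next(_COUNTER)}"
    channel.send({"type": "tool_call", "name": name, "call_id": call_id, "args": args})
    while True:
        message = channel.read()
        if message is None:
            raise RuntimeError("Tool result missing")
        if message.get("type") != "tool_result" or message.get("call_id") != call_id:
            continue  # unrelated record; keep reading
        if message.get("ok"):
            return message.get("result")
        raise RuntimeError(message.get("error") or "Tool error")


# An unrelated record is skipped; the matching tool_result is returned.
channel = FakeChannel([
    {"type": "log", "msg": "noise"},
    {"type": "tool_result", "call_id": "c1", "ok": True, "result": {"rows": 3}},
])
result = tool_call(channel, "search_knowledge", {"query_list": ["OpenChatBI"]})
```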


================================================
FILE: evals/runledger/cases/t1.yaml
================================================
id: t1
description: "basic BI flow with a single search_knowledge tool call"
input:
  prompt: "OpenChatBI"
cassette: cassettes/t1.jsonl


================================================
FILE: evals/runledger/cassettes/t1.jsonl
================================================
{"tool":"search_knowledge","args":{"knowledge_bases":["columns"],"query_list":["OpenChatBI"],"reasoning":"Look up relevant knowledge","with_table_list":false},"ok":true,"result":{"columns":"# Relevant Columns and Description:\n## openchatbi\n- Column Category: metric\n- Display Name: OpenChatBI\n- Description \"Project overview\""}}


================================================
FILE: evals/runledger/schema.json
================================================
{
  "type": "object",
  "properties": {
    "category": {
      "type": "string"
    },
    "reply": {
      "type": "string"
    }
  },
  "required": [
    "category",
    "reply"
  ]
}
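The schema above requires string `category` and `reply` fields on the final output. A stdlib-only sketch of that contract check (the `matches_schema` helper is hypothetical; a real harness would presumably use a full JSON Schema validator such as `jsonschema`):

```python
# Hand-rolled check of the suite's output contract (illustrative helper;
# the real harness presumably runs a proper JSON Schema validator).
def matches_schema(payload):
    if not isinstance(payload, dict):
        return False
    return all(isinstance(payload.get(key), str) for key in ("category", "reply"))


ok = matches_schema({"category": "bi", "reply": "Completed request."})
bad = matches_schema({"category": "bi"})  # missing required "reply"
```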


================================================
FILE: evals/runledger/suite.yaml
================================================
suite_name: runledger-openchatbi
agent_command: ["python", "evals/runledger/agent/agent.py"]
mode: replay
cases_path: cases
tool_registry:
  - search_knowledge
tool_module: evals.runledger.tools

assertions:
  - type: json_schema
    schema_path: schema.json

budgets:
  max_wall_ms: 20000
  max_tool_calls: 5
  max_tool_errors: 0

baseline_path: ../../baselines/runledger-openchatbi.json


================================================
FILE: evals/runledger/tools.py
================================================
from __future__ import annotations

from typing import Any

from openchatbi.tool.search_knowledge import search_knowledge


def _invoke_tool(tool, args: dict[str, Any]) -> Any:
    return tool.invoke(args)


def _search_knowledge(args: dict[str, Any]) -> Any:
    return _invoke_tool(search_knowledge, args)


TOOLS = {
    "search_knowledge": _search_knowledge,
}
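A replay harness can dispatch through a name-to-callable registry shaped like `TOOLS` above, where each handler takes a single args dict. A minimal sketch with a stub handler (the `_fake_search_knowledge` function and `dispatch` helper are illustrative, not part of the repo):

```python
# Registry dispatch in the shape of the TOOLS mapping above; the handler
# here is a stub rather than the real search_knowledge tool.
from typing import Any, Callable


def _fake_search_knowledge(args):
    return {"echo": args["query_list"]}


REGISTRY: dict[str, Callable[[dict], Any]] = {
    "search_knowledge": _fake_search_knowledge,
}


def dispatch(name, args):
    handler = REGISTRY.get(name)
    if handler is None:
        raise KeyError(f"Unknown tool: {name}")
    return handler(args)


result = dispatch("search_knowledge", {"query_list": ["OpenChatBI"]})
```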


================================================
FILE: example/bi.yaml
================================================
extra_tool_use_rule: |
  - Try your best to give appropriate parameters when calling tools.
  - timeseries_forecast tool requires at least 96 time points in input data. If there is not enough input data, set input_len to 96 to pad with zeros.

table_selection_extra_rule: |
  - When users ask about orders, consider if they need customer information (join with Customers table)
  - For product-related queries, check if order information is needed (join with Order_Items)  
  - Shipment queries often require order and product details (join with multiple tables)
  - Invoice questions may need shipment information for complete tracking

text2sql_extra_rule: |
  - Use proper JOIN syntax when connecting related tables
  - Use LIKE operator for partial string matches in product names or customer names
  - Handle NULL values properly in optional fields like details columns

basic_knowledge_glossary: |
  # Sales Business System Glossary
  
  ## Overview
  You're answering questions related to a sales order tracking business system that manages the complete customer order lifecycle from placement to delivery.

  ## Key Business Concepts
  
  **Customer Management:**
  - Customer: Individual or entity who places orders
  - Customer Details: Additional information like contact info, preferences, or notes
  
  **Order Processing:**
  - Order: A request from a customer to purchase products
  - Order Status: Current state - Valid values: "Shipped", "Packing", "On Road"
  - Order Item: Individual product within an order (orders can contain multiple items)
  - Order Item Status: Status of specific items - Valid values: "Finish", "Payed", "Cancel"
  
  **Product Catalog:**
  - Product: Items available for purchase
  - Product Details: Specifications, descriptions, or additional product information
  
  **Fulfillment & Shipping:**
  - Shipment: Physical delivery package sent to customer
  - Shipment Items: Specific order items included in a shipment
  - Tracking Number: Unique identifier for package tracking
  - Shipment Date: When package was dispatched
  
  **Financial Processing:**
  - Invoice: Bill generated for completed orders
  - Invoice Number: Unique identifier for billing purposes
  - Invoice Date: When billing document was created
  
  ## Business Rules
  - One order can have multiple items (products)
  - One order can be fulfilled through multiple shipments
  - Each shipment links to one invoice for billing
  - Order items can have different statuses within the same order
  - Customers can have multiple orders over time

================================================
FILE: example/common_columns.csv
================================================
column_name,display_name,alias,type,category,tag,description,dimension_table,default
customer_id,Customer ID,cust_id,INTEGER,identifier,customer,Unique identifier for customers,Customers,
customer_name,Customer Name,cust_name,VARCHAR(80),attribute,customer,Name of the customer,Customers,
customer_details,Customer Details,cust_details,VARCHAR(255),attribute,customer,Additional customer information,Customers,
invoice_number,Invoice Number,inv_num,INTEGER,identifier,financial,Unique invoice identifier,Invoices,
invoice_date,Invoice Date,inv_date,DATETIME,temporal,financial,Date the invoice was created,Invoices,
invoice_details,Invoice Details,inv_details,VARCHAR(255),attribute,financial,Additional invoice information,Invoices,
order_item_id,Order Item ID,oi_id,INTEGER,identifier,order,Unique identifier for order items,Order_Items,
product_id,Product ID,prod_id,INTEGER,identifier,product,Unique identifier for products,Products,
order_id,Order ID,ord_id,INTEGER,identifier,order,Unique identifier for orders,Orders,
order_item_status,Order Item Status,oi_status,VARCHAR(10),status,order,Current status of the order item (Finish|Payed|Cancel),Order_Items,
order_item_details,Order Item Details,oi_details,VARCHAR(255),attribute,order,Additional order item information,Order_Items,
order_status,Order Status,ord_status,VARCHAR(10),status,order,Current status of the order (Shipped|Packing|On Road),Orders,
date_order_placed,Order Placed Date,ord_date,DATETIME,temporal,order,Date when the order was placed,Orders,
order_details,Order Details,ord_details,VARCHAR(255),attribute,order,Additional order information,Orders,
product_name,Product Name,prod_name,VARCHAR(80),attribute,product,Name of the product,Products,
product_details,Product Details,prod_details,VARCHAR(255),attribute,product,Additional product information,Products,
shipment_id,Shipment ID,ship_id,INTEGER,identifier,shipment,Unique identifier for shipments,Shipments,
shipment_tracking_number,Tracking Number,track_num,VARCHAR(80),identifier,shipment,Tracking number for shipment,Shipments,
shipment_date,Shipment Date,ship_date,DATETIME,temporal,shipment,Date when the shipment was sent,Shipments,
other_shipment_details,Shipment Details,ship_details,VARCHAR(255),attribute,shipment,Additional shipment information,Shipments,
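Catalog CSVs like this one can be consumed with the stdlib `csv` module; the two data rows in the sketch below are excerpted from the file above:

```python
# Parse column metadata rows into a name-keyed lookup.
import csv
import io

sample = """column_name,display_name,alias,type,category,tag,description,dimension_table,default
customer_id,Customer ID,cust_id,INTEGER,identifier,customer,Unique identifier for customers,Customers,
order_status,Order Status,ord_status,VARCHAR(10),status,order,Current status of the order (Shipped|Packing|On Road),Orders,
"""

rows = list(csv.DictReader(io.StringIO(sample)))
by_name = {row["column_name"]: row for row in rows}
```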


================================================
FILE: example/config.yaml
================================================
organization: MyCompany
dialect: sqlite
bi_config_file: example/bi.yaml

python_executor: docker

# Visualization configuration
visualization_mode: llm

# Catalog store configuration
catalog_store:
  store_type: file_system
  data_path: ./example

# Data warehouse configuration
data_warehouse_config:
  # sqlite from spider->tracking_orders dataset
  uri: "sqlite:///example/tracking_orders.sqlite"
  database_name: ""

# LLM configurations
# Use OpenAI LLM, replace YOUR_API_KEY_HERE with your actual API key
default_llm:
  class: langchain_openai.ChatOpenAI
  params:
    api_key: YOUR_API_KEY_HERE
    model: gpt-4.1
    temperature: 0.01
    max_tokens: 8192

embedding_model:
  class: langchain_openai.OpenAIEmbeddings
  params:
    api_key: YOUR_API_KEY_HERE
    model: text-embedding-3-large
    chunk_size: 1024

# If you cannot access OpenAI or another cloud LLM provider,
# uncomment the following lines instead to use Ollama local LLM
#default_llm:
#  class: langchain_ollama.ChatOllama
#  params:
#    model: gpt-oss:20b
#    temperature: 0.01
#    num_predict: 8192
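The `class`/`params` pairs above suggest a dynamic-instantiation pattern: import the module from the dotted class path, look up the class, and call it with the params. A sketch of that pattern (the `instantiate` helper is illustrative and openchatbi's actual loader may differ; a stdlib class stands in for the LLM class so the sketch runs without langchain installed):

```python
# Illustrative "class path + params" loader.
import importlib


def instantiate(class_path, params):
    module_name, _, class_name = class_path.rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls(**params)


# Stand-in target: fractions.Fraction instead of langchain_openai.ChatOpenAI.
frac = instantiate("fractions.Fraction", {"numerator": 3, "denominator": 4})
```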


================================================
FILE: example/sql_example.yaml
================================================
'':
  Customers: |
    Q: Show me all customers with their names and details
    A: SELECT customer_id, customer_name, customer_details 
    FROM Customers 
    ORDER BY customer_name
  Invoices: |
    Q: List all invoices from the last 30 days
    A: SELECT invoice_number, invoice_date, invoice_details 
    FROM Invoices 
    WHERE invoice_date >= DATE('now', '-30 days')
    ORDER BY invoice_date DESC
    
  Order_Items: |
    Q: Show me all items in order 123
    A: SELECT oi.order_item_id, p.product_name, oi.order_item_status, oi.order_item_details 
    FROM Order_Items oi 
    JOIN Products p ON oi.product_id = p.product_id 
    WHERE oi.order_id = 123
  Orders: |
    Q: Find all pending orders with customer information
    A: SELECT o.order_id, c.customer_name, o.order_status, o.date_order_placed 
    FROM Orders o 
    JOIN Customers c ON o.customer_id = c.customer_id 
    WHERE o.order_status = 'pending'
    ORDER BY o.date_order_placed
  Products: |
    Q: Search for products containing 'laptop' in the name
    A: SELECT product_id, product_name, product_details 
    FROM Products 
    WHERE product_name LIKE '%laptop%'
    ORDER BY product_name
  Shipment_Items: |
    Q: Show which order items are in shipment 456
    A: SELECT si.shipment_id, si.order_item_id, p.product_name 
    FROM Shipment_Items si 
    JOIN Order_Items oi ON si.order_item_id = oi.order_item_id 
    JOIN Products p ON oi.product_id = p.product_id 
    WHERE si.shipment_id = 456
  Shipments: |
    Q: Track all shipments for order 789
    A: SELECT shipment_id, shipment_tracking_number, shipment_date, other_shipment_details 
    FROM Shipments 
    WHERE order_id = 789 
    ORDER BY shipment_date


================================================
FILE: example/table_columns.csv
================================================
db_name,table_name,column_name
,Customers,customer_id
,Customers,customer_name
,Customers,customer_details
,Invoices,invoice_number
,Invoices,invoice_date
,Invoices,invoice_details
,Order_Items,order_item_id
,Order_Items,product_id
,Order_Items,order_id
,Order_Items,order_item_status
,Order_Items,order_item_details
,Orders,order_id
,Orders,customer_id
,Orders,order_status
,Orders,date_order_placed
,Orders,order_details
,Products,product_id
,Products,product_name
,Products,product_details
,Shipment_Items,shipment_id
,Shipment_Items,order_item_id
,Shipments,shipment_id
,Shipments,order_id
,Shipments,invoice_number
,Shipments,shipment_tracking_number
,Shipments,shipment_date
,Shipments,other_shipment_details


================================================
FILE: example/table_info.yaml
================================================
? ''
: Customers:
    description: 'Contains customer information including unique ID, name, and additional details'
    selection_rule: 'Select when queries involve customer information, customer names, or need to join orders with customer data'
    sql_rule: 'Use customer_id as primary key for joins. Always include customer_name when displaying customer information'
  Invoices:
    description: 'Stores invoice information with unique invoice numbers, dates, and details'
    selection_rule: 'Select when queries involve billing, invoice tracking, or financial reporting'
    sql_rule: 'Use invoice_number as primary key. Filter by invoice_date for temporal queries'
  Order_Items:
    description: 'Links products to orders with individual item status and details'
    selection_rule: 'Select when queries need product details within orders or item-level status tracking'
    sql_rule: 'Always join with Products table via product_id and Orders table via order_id for complete information'
  Orders:
    description: 'Main order table containing order status, placement date, and customer relationships'
    selection_rule: 'Select when queries involve order status, order history, or customer order relationships'
    sql_rule: 'Use order_id as primary key. Join with Customers via customer_id for customer information'
  Products:
    description: 'Product catalog containing product names, IDs, and detailed product information'
    selection_rule: 'Select when queries involve product information, product searches, or inventory-related questions'
    sql_rule: 'Use product_id as primary key. Use LIKE operator for product_name searches'
  Shipment_Items:
    description: 'Junction table linking shipments to specific order items'
    selection_rule: 'Select when queries need to track which specific items are in which shipments'
    sql_rule: 'Always join with both Shipments and Order_Items tables. No primary key - composite key of shipment_id and order_item_id'
  Shipments:
    description: 'Shipment tracking information including tracking numbers, dates, and shipment details'
    selection_rule: 'Select when queries involve shipping, delivery tracking, or fulfillment information'
    sql_rule: 'Use shipment_id as primary key. Join with Orders via order_id and Invoices via invoice_number for complete shipping context'


================================================
FILE: example/table_selection_example.csv
================================================
question,selected_tables
"Show me all customers","[""Customers""]"
"What orders were placed today?","[""Orders""]"
"List all products and their details","[""Products""]"
"Show me customer orders with their details","[""Customers"", ""Orders""]"
"What products are in each order?","[""Orders"", ""Order_Items"", ""Products""]"
"Show shipment tracking information","[""Shipments""]"
"Which items are in each shipment?","[""Shipments"", ""Shipment_Items"", ""Order_Items""]"
"Show order status and customer information","[""Orders"", ""Customers""]"
"What invoices were created this month?","[""Invoices""]"
"Show complete order fulfillment chain","[""Orders"", ""Order_Items"", ""Products"", ""Shipments"", ""Invoices""]"


================================================
FILE: openchatbi/__init__.py
================================================
"""OpenChatBI core module initialization."""

import os

from langgraph.graph.state import CompiledStateGraph

from openchatbi.config_loader import ConfigLoader

# Global configuration instance
config = ConfigLoader()
# Skip config loading during documentation build
if not os.environ.get("SPHINX_BUILD"):
    config.load()
else:
    config.set({})


def get_default_graph():
    """
    Build the agent graph in synchronous mode using the default catalog from config.

    Returns:
        CompiledStateGraph: Compiled agent graph ready for execution.
    """
    if os.environ.get("SPHINX_BUILD"):
        return None

    from langgraph.checkpoint.memory import MemorySaver

    from openchatbi.agent_graph import build_agent_graph_sync
    from openchatbi.tool.memory import get_sync_memory_store

    checkpointer = MemorySaver()
    return build_agent_graph_sync(
        config.get().catalog_store, checkpointer=checkpointer, memory_store=get_sync_memory_store()
    )


================================================
FILE: openchatbi/agent_graph.py
================================================
"""Main agent graph construction and execution logic."""

import datetime
import logging
import traceback
from collections.abc import Callable
from typing import Any

from langchain_core.language_models import BaseChatModel
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage
from langchain_core.tools import StructuredTool
from langchain_openai.chat_models.base import BaseChatOpenAI
from langgraph.constants import START
from langgraph.errors import GraphInterrupt
from langgraph.graph import END, StateGraph
from langgraph.graph.state import CompiledStateGraph
from langgraph.prebuilt import ToolNode
from langgraph.store.base import BaseStore
from langgraph.types import Checkpointer, Send, interrupt
from pydantic import BaseModel, Field

from openchatbi import config
from openchatbi.catalog import CatalogStore
from openchatbi.constants import datetime_format
from openchatbi.context_config import get_context_config
from openchatbi.context_manager import ContextManager
from openchatbi.graph_state import AgentState, InputState, OutputState
from openchatbi.llm.llm import call_llm_chat_model_with_retry, get_llm
from openchatbi.prompts.system_prompt import get_agent_prompt_template
from openchatbi.text2sql.sql_graph import build_sql_graph
from openchatbi.tool.ask_human import AskHuman
from openchatbi.tool.mcp_tools import create_mcp_tools_sync, get_mcp_tools_async
from openchatbi.tool.memory import get_memory_tools
from openchatbi.tool.run_python_code import run_python_code
from openchatbi.tool.save_report import save_report
from openchatbi.tool.search_knowledge import search_knowledge, show_schema
from openchatbi.tool.timeseries_forecast import check_forecast_service_health, timeseries_forecast
from openchatbi.utils import log, recover_incomplete_tool_calls

logger = logging.getLogger(__name__)


def get_mcp_servers():
    """Get MCP servers from config with fallback for tests."""
    try:
        return config.get().mcp_servers
    except ValueError:
        return []


def ask_human(state: AgentState) -> dict[str, Any]:
    """Node function to ask human for additional information or clarification.

    Args:
        state (AgentState): The current graph state containing messages and context.

    Returns:
        dict: Updated state with human feedback as a tool message and user input.
    """
    tool_call = state["messages"][-1].tool_calls[0]
    tool_call_id = tool_call["id"]
    args = tool_call["args"]
    user_feedback = interrupt({"text": args["question"], "buttons": args.get("options", None)})
    tool_message = [{"tool_call_id": tool_call_id, "type": "tool", "content": user_feedback}]
    return {
        "messages": tool_message,
        "history_messages": [AIMessage(args["question"]), HumanMessage(user_feedback)],
        "user_input": user_feedback,
    }


class CallSQLGraphInput(BaseModel):
    reasoning: str = Field(
        description="Explanation of why Text2SQL tool is needed",
    )
    context: str = Field(
        description="""The full context pass to Text2SQL tool, make sure do not miss any potential information that related to user's question.
        Following the format: History Conversation: (user and assistant history dialog)
        Information: (the knowledge you retrival that is relevant, like metrics and dimensions)
        User's latest question:""",
    )


# Description for SQL tools
TEXT2SQL_TOOL_DESCRIPTION = """Text2SQL tool that generates and executes a SQL query and builds a visualization DSL for the UI
based on the user's question and context.

Returns:
    str: A formatted response containing SQL, data, and visualization status.

Important notes:
- If the user wants to change the visualization chart type or style, include that requirement in the question
- Make sure to provide the question in English
"""


def _format_sql_response(sql_graph_response: dict) -> str:
    """Format SQL graph response into a standardized string format.

    Args:
        sql_graph_response: The response dictionary from the SQL graph

    Returns:
        str: Formatted response string
    """
    sql = sql_graph_response.get("sql", "")
    data = sql_graph_response.get("data", "")
    visualization_dsl = sql_graph_response.get("visualization_dsl", {})

    response_parts = []
    if sql:
        response_parts.append(f"SQL Query:\n```sql\n{sql}\n```")
    if data:
        response_parts.append(f"\nQuery Results (CSV format):\n```csv\n{data}\n```")

    # Include visualization status
    if visualization_dsl and "error" not in visualization_dsl:
        chart_type = visualization_dsl.get("chart_type", "unknown")
        response_parts.append(
            f"\nVisualization Created: {chart_type} chart has been automatically generated and will be displayed in the UI."
        )
    elif visualization_dsl and "error" in visualization_dsl:
        response_parts.append(f"\nVisualization Error: {visualization_dsl['error']}")

    return "\n\n".join(response_parts) if response_parts else "No results returned."


def get_sql_tools(sql_graph: CompiledStateGraph, sync_mode: bool = False) -> Callable:
    """Create SQL generation tool from compiled SQL graph.

    Args:
        sql_graph (CompiledStateGraph): The compiled SQL generation subgraph.
        sync_mode (bool): Whether to create synchronous or asynchronous tools

    Returns:
        function: Tool function for SQL generation.
    """

    def call_sql_graph_sync(reasoning: str, context: str) -> str:
        """Sync node function for Text2SQL tool"""
        log(f"Call SQL graph (sync) with reasoning: {reasoning}, context: {context}")
        try:
            sql_graph_response = sql_graph.invoke({"messages": context})
            return _format_sql_response(sql_graph_response)
        except GraphInterrupt as e:
            log(f"Sql graph interrupted:\n{repr(e)}")
            raise e
        except Exception as e:
            log(f"Run sql graph error:\n{repr(e)}")
            traceback.print_exc()
        return "Error occurred when calling Text2SQL tool."

    async def call_sql_graph_async(reasoning: str, context: str) -> str:
        """Async node function for Text2SQL tool"""
        log(f"Call SQL graph (async) with reasoning: {reasoning}, context: {context}")
        try:
            sql_graph_response = await sql_graph.ainvoke({"messages": context})
            return _format_sql_response(sql_graph_response)
        except GraphInterrupt as e:
            log(f"Sql graph interrupted:\n{repr(e)}")
            raise e
        except Exception as e:
            log(f"Run sql graph error:\n{repr(e)}")
            traceback.print_exc()
        return "Error occurred when calling Text2SQL tool."

    if sync_mode:
        return StructuredTool.from_function(
            func=call_sql_graph_sync,
            name="text2sql",
            description=TEXT2SQL_TOOL_DESCRIPTION,
            args_schema=CallSQLGraphInput,
            return_direct=False,
        )
    else:
        return StructuredTool.from_function(
            coroutine=call_sql_graph_async,
            name="text2sql",
            description=TEXT2SQL_TOOL_DESCRIPTION,
            args_schema=CallSQLGraphInput,
            return_direct=False,
        )


def agent_llm_call(llm: BaseChatModel, tools: list, context_manager: ContextManager = None) -> Callable:
    """Create llm call function to generate reasoning and determine next node based on tool calls in LLM response.

    Args:
        llm (BaseChatModel): The LLM for agent decision-making.
        tools: List of tools.
        context_manager: Optional context manager for handling long conversations.

    Returns:
        function: function that processes state and determines next node.
    """

    # OpenAI models support strict tool calling
    if isinstance(llm, BaseChatOpenAI):
        llm_with_tools = llm.bind_tools(tools, strict=True)
    else:
        llm_with_tools = llm.bind_tools(tools)

    def _call_model(state: AgentState):
        # First, check and recover any incomplete tool calls
        recovery_ops = recover_incomplete_tool_calls(state)
        if recovery_ops:
            return {"messages": recovery_ops, "agent_next_node": "llm_node"}

        messages = state["messages"]
        final_messages = []
        if isinstance(messages[-1], HumanMessage):
            final_messages.append(messages[-1])

        # Apply context management if available (before processing)
        if context_manager:
            original_count = len(messages)
            context_manager.manage_context_messages(messages)
            if len(messages) != original_count:
                logger.info(f"Context management: modified messages from {original_count} to {len(messages)}")

        system_prompt = get_agent_prompt_template().replace(
            "[time_field_placeholder]", datetime.datetime.now().strftime(datetime_format)
        )

        response = call_llm_chat_model_with_retry(
            llm_with_tools,
            ([SystemMessage(system_prompt)] + messages),
            streaming_tokens=True,
            bound_tools=tools,
            parallel_tool_call=True,
        )
        if isinstance(response, AIMessage):
            tool_calls = response.tool_calls
            print("Tool Call:", ", ".join(tool["name"] for tool in tool_calls))
            if tool_calls:
                # Group tool calls by type for parallel routing
                ask_human_calls = [call for call in tool_calls if call["name"] == "AskHuman"]
                normal_tool_calls = [call for call in tool_calls if call["name"] != "AskHuman"]

                # Create Send objects for parallel routing
                sends = []
                if ask_human_calls:
                    # Create message with only AskHuman calls
                    ask_human_msg = AIMessage(content=response.content, tool_calls=ask_human_calls)
                    sends.append(Send("ask_human", {"messages": [ask_human_msg]}))

                if normal_tool_calls:
                    # Create message with only normal tool calls
                    tool_msg = AIMessage(content=response.content, tool_calls=normal_tool_calls)
                    sends.append(Send("use_tool", {"messages": [tool_msg]}))

                return {"messages": [response], "history_messages": final_messages, "sends": sends}
            else:
                final_messages.append(AIMessage(response.content))
                return {
                    "messages": [response],
                    "final_answer": response.content,
                    "history_messages": final_messages,
                    "agent_next_node": END,
                }
        elif response is None:
            return {
                "messages": [AIMessage("Sorry, the LLM service is currently unavailable.")],
                "history_messages": final_messages,
                "agent_next_node": END,
            }
        else:
            return {"messages": [response], "history_messages": final_messages, "agent_next_node": END}

    return _call_model
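The `agent_llm_call` factory above fans AskHuman requests and ordinary tool calls out to separate graph nodes. A minimal sketch of that grouping step, using plain dicts in place of LangChain tool-call objects (the call names and args here are illustrative, not from the repository):

```python
def group_tool_calls(tool_calls):
    """Split tool calls so human-input requests and normal tools run on separate branches."""
    ask_human = [c for c in tool_calls if c["name"] == "AskHuman"]
    normal = [c for c in tool_calls if c["name"] != "AskHuman"]
    return ask_human, normal

calls = [
    {"name": "AskHuman", "args": {"question": "Which date range?"}},
    {"name": "run_python_code", "args": {"code": "print(1)"}},
]
ask_human, normal = group_tool_calls(calls)
```

In the real node, each group is wrapped in its own `AIMessage` and routed via a `Send` object, which is what allows both branches to execute in parallel.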


def _build_graph_core(
    catalog: CatalogStore,
    sync_mode: bool,
    checkpointer: Checkpointer,
    memory_store: BaseStore,
    memory_tools: list[Callable] | None,
    mcp_tools: list,
    enable_context_management: bool = True,
    llm_provider: str | None = None,
) -> CompiledStateGraph:
    """Core graph building logic shared by both sync and async versions.

    Args:
        catalog: Catalog store containing schema information
        sync_mode: Whether to use synchronous mode for tools and operations
        checkpointer: The Checkpointer for state persistence
        memory_store: The BaseStore to use for long-term memory
        memory_tools: List of memory tools (manage_memory_tool, search_memory_tool)
        mcp_tools: Pre-initialized MCP tools
        enable_context_management: Whether to enable context management
        llm_provider: Optional LLM provider name passed to get_llm()

    Returns:
        CompiledStateGraph: Compiled agent graph ready for execution
    """
    sql_graph = build_sql_graph(catalog, checkpointer, memory_store, llm_provider=llm_provider)
    call_sql_graph_tool = get_sql_tools(sql_graph=sql_graph, sync_mode=sync_mode)

    # Use provided memory tools or create them
    if not memory_tools:
        memory_tools = get_memory_tools(get_llm(llm_provider), sync_mode=sync_mode, store=memory_store)

    log(f"MCP tools: {mcp_tools}")
    normal_tools = [
        search_knowledge,
        show_schema,
        call_sql_graph_tool,
        run_python_code,
        save_report,
    ]
    if memory_tools:
        normal_tools.extend(memory_tools)
    if check_forecast_service_health():
        normal_tools.append(timeseries_forecast)
    else:
        logger.warning("Time series forecasting service is not healthy. Skipping timeseries_forecast tool.")
    normal_tools.extend(mcp_tools)

    # Initialize context manager if enabled
    context_manager = None
    if enable_context_management:
        context_manager = ContextManager(llm=get_llm(llm_provider), config=get_context_config())

    tool_node = ToolNode(normal_tools)

    # Define the agent graph
    graph = StateGraph(AgentState, input_schema=InputState, output_schema=OutputState)

    # Add nodes to the graph
    graph.add_node("llm_node", agent_llm_call(get_llm(llm_provider), normal_tools + [AskHuman], context_manager))
    graph.add_node("ask_human", ask_human)
    graph.add_node("use_tool", tool_node)

    # Add edges between nodes
    graph.add_edge(START, "llm_node")
    graph.add_edge("ask_human", "llm_node")
    graph.add_edge("use_tool", "llm_node")

    # Add conditional routing from llm node
    def route_tools(state: AgentState):
        # Only use sends if the last message came from the llm node (has tool_calls)
        last_message = state["messages"][-1] if state["messages"] else None
        if (
            last_message
            and isinstance(last_message, AIMessage)
            and last_message.tool_calls
            and "sends" in state
            and state["sends"]
        ):
            return state["sends"]  # Return Send objects for parallel execution
        elif "agent_next_node" in state:
            return state["agent_next_node"]  # Return single node name
        else:
            return END

    graph.add_conditional_edges(
        "llm_node",
        route_tools,
        # mapping of paths to node names (for single routing)
        {
            "llm_node": "llm_node",
            "ask_human": "ask_human",
            "use_tool": "use_tool",
            END: END,
        },
    )

    graph = graph.compile(name="agent_graph", checkpointer=checkpointer, store=memory_store)
    return graph
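The `route_tools` precedence above (parallel `sends` first, then an explicit `agent_next_node`, then `END`) can be sketched as a pure function. This simplified version omits the `AIMessage.tool_calls` guard and uses a stand-in value for langgraph's `END` sentinel:

```python
END = "__end__"  # stand-in for langgraph.graph.END

def route(state: dict):
    # Parallel Send branches take precedence, then an explicit next node,
    # then termination.
    if state.get("sends"):
        return state["sends"]
    return state.get("agent_next_node", END)

parallel = route({"sends": ["ask_human", "use_tool"]})
single = route({"agent_next_node": "use_tool"})
fallback = route({})
```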


def build_agent_graph_sync(
    catalog: CatalogStore,
    checkpointer: Checkpointer = None,
    memory_store: BaseStore = None,
    enable_context_management: bool = True,
    llm_provider: str | None = None,
) -> CompiledStateGraph:
    """Build the main agent graph with all nodes and edges (sync version).

    Args:
        catalog: Catalog store containing schema information.
        checkpointer: The Checkpointer for state persistence (short memory). If None, no short memory.
        memory_store: The BaseStore to use for long-term memory. If None, one is assigned automatically based on sync_mode.
        enable_context_management: Whether to enable context management for long conversations.
        llm_provider: Optional LLM provider name passed to get_llm().

    Returns:
        CompiledStateGraph: Compiled agent graph ready for execution.
    """
    # Get MCP tools for sync context
    mcp_tools = create_mcp_tools_sync(get_mcp_servers())

    return _build_graph_core(
        catalog=catalog,
        sync_mode=True,
        checkpointer=checkpointer,
        memory_store=memory_store,
        memory_tools=None,  # Always None for sync version - creates its own
        mcp_tools=mcp_tools,
        enable_context_management=enable_context_management,
        llm_provider=llm_provider,
    )


async def build_agent_graph_async(
    catalog: CatalogStore,
    checkpointer: Checkpointer = None,
    memory_store: BaseStore = None,
    memory_tools: list[Callable] = None,
    enable_context_management: bool = True,
    llm_provider: str | None = None,
) -> CompiledStateGraph:
    """Build the main agent graph with all nodes and edges (async version).

    This function is identical to build_agent_graph_sync but properly handles
    async MCP tool initialization when called from async contexts.

    Args:
        catalog: Catalog store containing schema information.
        checkpointer: The Checkpointer for state persistence (short memory). If None, no short memory.
        memory_store: The BaseStore to use for long-term memory. If None, one is assigned automatically based on sync_mode.
        memory_tools: List of memory tools (manage_memory_tool, search_memory_tool). If None, creates async tools.
        enable_context_management: Whether to enable context management for long conversations.
        llm_provider: Optional LLM provider name passed to get_llm().

    Returns:
        CompiledStateGraph: Compiled agent graph ready for execution.
    """
    # Get MCP tools for async context
    mcp_tools = await get_mcp_tools_async(get_mcp_servers())

    return _build_graph_core(
        catalog=catalog,
        sync_mode=False,
        checkpointer=checkpointer,
        memory_store=memory_store,
        memory_tools=memory_tools,
        mcp_tools=mcp_tools,
        enable_context_management=enable_context_management,
        llm_provider=llm_provider,
    )


================================================
FILE: openchatbi/catalog/__init__.py
================================================
"""Data catalog management module for OpenChatBI."""

from openchatbi.catalog.catalog_loader import (
    DataCatalogLoader,
    load_catalog_from_data_warehouse,
)
from openchatbi.catalog.catalog_store import CatalogStore
from openchatbi.catalog.factory import create_catalog_store

__all__ = [
    "CatalogStore",
    "DataCatalogLoader",
    "load_catalog_from_data_warehouse",
]


================================================
FILE: openchatbi/catalog/catalog_loader.py
================================================
import logging
from typing import Any

from sqlalchemy import MetaData, inspect
from sqlalchemy.engine import Engine

from .catalog_store import CatalogStore

logger = logging.getLogger(__name__)


class DataCatalogLoader:
    """
    The loader to load data catalog from data warehouse metadata and save to catalog store.
    """

    def __init__(self, engine: Engine, include_tables: list[str] | None = None):
        """
        Initialize catalog loader.

        Args:
            engine (Engine): SQLAlchemy engine instance
            include_tables (Optional[List[str]]): List of table names to include, None for all
        """
        self.engine = engine
        self.include_tables = include_tables
        self.metadata = MetaData()
        self.inspector = inspect(engine)

    def get_tables_and_columns(self) -> dict[str, list[dict[str, Any]]]:
        """
        Extract table and column metadata including comments using SQLAlchemy inspector.

        Returns:
            Dict[str, List[Dict[str, Any]]]: Dictionary mapping table names to list of column information
        """
        try:
            tables_columns = {}

            # Get all table names
            table_names = self.inspector.get_table_names()

            # Filter to specific tables if configured
            if self.include_tables:
                table_names = [name for name in table_names if name in self.include_tables]

            logger.info(f"Found {len(table_names)} tables to process")

            for table_name in table_names:
                try:
                    # Get column information for the table
                    columns = self.inspector.get_columns(table_name)
                    column_list = []
                    for column in columns:
                        is_common_column = column["name"] not in ("id", "name", "type", "status")
                        column_info = {
                            "column_name": column["name"],
                            "display_name": "",
                            "alias": "",
                            "type": str(column["type"]),
                            "category": "",
                            "tag": "",
                            "description": column.get("comment", "") or "",
                            "dimension_table": "",
                            "default": str(column.get("default", "")) if column.get("default") is not None else "",
                            "is_common": is_common_column,
                        }
                        column_list.append(column_info)

                    tables_columns[table_name] = column_list
                    logger.debug(f"Processed table {table_name} with {len(column_list)} columns")

                except Exception as e:
                    logger.error(f"Failed to process table {table_name}: {e}")
                    continue

            logger.info(f"Successfully processed {len(tables_columns)} tables")
            return tables_columns

        except Exception as e:
            logger.error(f"Failed to get tables and columns from data warehouse: {e}")
            return {}
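    # The per-column mapping inside get_tables_and_columns can be illustrated
    # without a live database. to_column_info below is a hypothetical standalone
    # version of the same transformation, fed a fake inspector row:

```python
def to_column_info(column: dict) -> dict:
    """Map a SQLAlchemy inspector column dict to a catalog column entry."""
    return {
        "column_name": column["name"],
        "type": str(column["type"]),
        "description": column.get("comment") or "",
        "default": str(column["default"]) if column.get("default") is not None else "",
        # Generic bookkeeping columns are not treated as common columns.
        "is_common": column["name"] not in ("id", "name", "type", "status"),
    }

info = to_column_info({"name": "id", "type": "INTEGER", "comment": None, "default": None})
```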

    def get_table_indexes(self, table_name: str) -> list[dict[str, Any]]:
        """
        Get index information for a specific table.

        Args:
            table_name (str): Name of the table

        Returns:
            List[Dict[str, Any]]: List of index information
        """
        try:
            indexes = self.inspector.get_indexes(table_name)
            return indexes
        except Exception as e:
            logger.warning(f"Failed to get indexes for table {table_name}: {e}")
            return []

    def get_foreign_keys(self, table_name: str) -> list[dict[str, Any]]:
        """
        Get foreign key information for a specific table.

        Args:
            table_name (str): Name of the table

        Returns:
            List[Dict[str, Any]]: List of foreign key information
        """
        try:
            foreign_keys = self.inspector.get_foreign_keys(table_name)
            return foreign_keys
        except Exception as e:
            logger.warning(f"Failed to get foreign keys for table {table_name}: {e}")
            return []

    def save_to_catalog_store(
        self, catalog_store: CatalogStore, database_name: str | None = None, update: bool = False
    ) -> bool:
        """
        Extract warehouse metadata and save to catalog store.

        Args:
            catalog_store (CatalogStore): Target catalog store to load data to
            database_name (Optional[str]): Database name in catalog, defaults to 'default'
            update (bool): Update existing catalog store to sync with data warehouse

        Returns:
            bool: True if load was successful, False otherwise
        """
        try:
            if database_name is None:
                database_name = "default"

            # Get tables and columns from data warehouse
            tables_columns = self.get_tables_and_columns()

            if not tables_columns:
                logger.warning("No tables found in data warehouse")
                return True

            # Import each table
            success_count = 0
            total_count = len(tables_columns)

            for table_name, columns in tables_columns.items():
                try:
                    # Get table comment if available
                    table_comment = ""
                    try:
                        table_info = self.inspector.get_table_comment(table_name)
                        table_comment = (table_info.get("text") or "") if table_info else ""
                    except Exception:
                        # Some databases don't support table comments
                        pass

                    table_info = {"description": table_comment, "selection_rule": "", "sql_rule": ""}
                    if catalog_store.save_table_information(table_name, table_info, columns, database_name):
                        success_count += 1
                        logger.info(f"Successfully loaded table: {database_name}.{table_name}")
                    else:
                        logger.error(f"Failed to load table: {database_name}.{table_name}")

                    # init null SQL examples
                    catalog_store.save_table_sql_examples(
                        table_name, [{"question": "null", "answer": "null"}], database_name
                    )

                except Exception as e:
                    logger.error(f"Error loading table {table_name}: {e}")

            # init empty table selection examples
            catalog_store.save_table_selection_examples([("", [])])

            logger.info(f"Load completed: {success_count}/{total_count} tables loaded successfully")
            return success_count == total_count

        except Exception as e:
            logger.error(f"Failed to load data warehouse to catalog store: {e}")
            return False


def load_catalog_from_data_warehouse(catalog_store: CatalogStore) -> bool:
    """
    Load catalog data from data warehouse using SQLAlchemy based on data warehouse config (URI)

    Main entry point for catalog loading.

    Args:
        catalog_store (CatalogStore): Target catalog store

    Returns:
        bool: True if load was successful, False otherwise
    """
    database_uri = None
    try:
        data_warehouse_config = catalog_store.get_data_warehouse_config()
        database_uri = data_warehouse_config.get("uri")
        include_tables = data_warehouse_config.get("include_tables")
        database_name = data_warehouse_config.get("database_name", "default")
        engine = catalog_store.get_sql_engine()

        loader = DataCatalogLoader(engine, include_tables)
        return loader.save_to_catalog_store(catalog_store, database_name)

    except Exception as e:
        logger.error(f"Failed to import catalog from data warehouse URI {database_uri}: {e}")
        return False


================================================
FILE: openchatbi/catalog/catalog_store.py
================================================
from abc import ABC, abstractmethod
from typing import Any

from sqlalchemy import Engine


class CatalogStore(ABC):
    """
    Abstract base class defining the storage interface for data catalog (database, table, column definitions, descriptions, and additional prompts).

    Common columns that have the same meaning across tables are stored centrally to avoid data duplication.

    Column attribute:

        - column_name: the name of the column
        - display_name: the display name of the column
        - type: the data type of the column
        - category: dimension or metric
        - description: the description of the column
        - is_common: is common column or not
    """

    @abstractmethod
    def get_data_warehouse_config(self) -> dict:
        """
        Get the data warehouse configuration

        Returns:
            dict: Data warehouse configuration
        """
        pass

    @abstractmethod
    def get_sql_engine(self) -> Engine:
        """
        Get the SQLAlchemy engine for the catalog

        Returns:
            Engine: SQLAlchemy engine
        """
        pass

    @abstractmethod
    def get_database_list(self) -> list[str]:
        """
        Get a list of all databases

        Returns:
            List[str]: List of database names
        """
        pass

    @abstractmethod
    def get_table_list(self, database: str | None = None) -> list[str]:
        """
        Get a list of all tables in the specified database, if database is None, return all tables

        Args:
            database (Optional[str]): Database name

        Returns:
            List[str]: List of table names
        """
        pass

    @abstractmethod
    def get_column_list(self, table: str | None = None, database: str | None = None) -> list[dict[str, Any]]:
        """
        Get all column information for the specified table, if table is None, return all common columns in the catalog

        Args:
            table (Optional[str]): Table name
            database (Optional[str]): Database name

        Returns:
            List[Dict[str, Any]]: List of column information, each column contains name, type, description, etc.
        """
        pass

    @abstractmethod
    def get_table_information(self, table: str, database: str | None = None) -> dict[str, Any]:
        """
        Get the information for the specified table

        Args:
            table (str): Table name
            database (Optional[str]): Database name

        Returns:
            Dict[str, Any]: Table information, including description text, selection rules, etc.
        """
        pass

    @abstractmethod
    def get_sql_examples(
        self, table: str | None = None, database: str | None = None
    ) -> list[tuple[str, str, list[str]]]:
        """
        Get SQL examples

        Args:
            table (Optional[str]): Table name
            database (Optional[str]): Database name

        Returns:
            List[Tuple[str, str, List[str]]]: List of SQL examples, each example is a Tuple-3 as (question, SQL, full_table_names)
        """
        pass

    @abstractmethod
    def get_table_selection_examples(self) -> list[tuple[str, list[str]]]:
        """
        Get table selection examples

        Returns:
            List[Tuple[str, List[str]]]: List of table selection examples, each example is a Tuple-2 as (question, selected tables)
        """
        pass

    @abstractmethod
    def save_table_information(
        self,
        table: str,
        information: dict[str, Any],
        columns: list[dict[str, Any]],
        database: str | None = None,
        update_existing: bool = False,
    ) -> bool:
        """
        Save the information and columns for a table

        Args:
            table (str): Table name
            information (Dict[str, Any]): Table information
            columns (List[Dict[str, Any]]): List of column information, each column dict contains at least
                column_name, type, category, description
            database (Optional[str]): Database name
            update_existing (bool): Update existing table and column information

        Returns:
            bool: Whether the save was successful
        """
        pass

    @abstractmethod
    def save_table_sql_examples(self, table: str, examples: list[dict[str, str]], database: str | None = None) -> bool:
        """
        Save SQL examples for a table

        Args:
            table (str): Table name
            examples (List[Dict[str, str]]): List of SQL examples
            database (Optional[str]): Database name

        Returns:
            bool: Whether the save was successful
        """
        pass

    @abstractmethod
    def save_table_selection_examples(self, examples: list[tuple[str, list[str]]]) -> bool:
        """
        Save table selection examples

        Args:
            examples (List[Tuple[str, List[str]]]): List of table selection examples

        Returns:
            bool: Whether the save was successful
        """
        pass

    @abstractmethod
    def check_exists(self) -> bool:
        """
        Check if the catalog store has existing data/content

        Returns:
            bool: True if catalog store has existing data, False if empty or missing essential files
        """
        pass


def split_db_table_name(table: str, database: str | None = None) -> tuple[str, str, str]:
    """
    Split full table name into db name and table name
    Args:
        table (str): if database is None, should be full table name like `db.table`, otherwise should be only table name
        database (Optional[str]): Database name
    Returns:
        Tuple[str, str, str]: full_table_name, db_name, table_name

    """
    full_table_name = table
    if database is not None and "." not in table:
        full_table_name = f"{database}.{table}"
    if "." in full_table_name:
        db_name, table_name = full_table_name.rsplit(".", 1)
    else:
        db_name = ""
        table_name = full_table_name
    return full_table_name, db_name, table_name
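A usage sketch for `split_db_table_name`, with the helper reproduced from above so the snippet runs standalone (table names are illustrative):

```python
def split_db_table_name(table, database=None):
    """Split a (possibly qualified) table name into (full_name, db, table)."""
    full_table_name = table
    if database is not None and "." not in table:
        full_table_name = f"{database}.{table}"
    if "." in full_table_name:
        db_name, table_name = full_table_name.rsplit(".", 1)
    else:
        db_name = ""
        table_name = full_table_name
    return full_table_name, db_name, table_name

full, db, tbl = split_db_table_name("orders", "sales")
```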


================================================
FILE: openchatbi/catalog/factory.py
================================================
import logging
import os

from openchatbi.catalog.catalog_loader import load_catalog_from_data_warehouse
from openchatbi.catalog.catalog_store import CatalogStore
from openchatbi.catalog.store.file_system import FileSystemCatalogStore

logger = logging.getLogger(__name__)


# Factory function for creating CatalogStore instances
def create_catalog_store(
    store_type: str, auto_load: bool = True, data_warehouse_config: dict = None, **kwargs
) -> CatalogStore:
    """
    Create a CatalogStore instance

    Args:
        store_type (str): Storage type, supports 'file_system'
        auto_load (bool): Whether to autoload from database if catalog files don't exist
        data_warehouse_config (dict): Data warehouse configuration dictionary
        **kwargs: Other parameters

    Returns:
        CatalogStore: CatalogStore instance

    Raises:
        ValueError: If the storage type is not supported
    """
    if store_type == "file_system":
        data_path = kwargs.get("data_path", "data")
        # convert relative path to absolute path
        if not os.path.isabs(data_path):
            data_path = os.path.join(os.path.dirname(os.path.dirname(os.path.dirname(__file__))), data_path)
        catalog_store = FileSystemCatalogStore(data_path, data_warehouse_config)

        # Check if autoload is enabled and if catalog files are missing
        if auto_load:
            _auto_load_catalog_if_needed(catalog_store)

        return catalog_store
    else:
        raise ValueError(f"Unsupported storage type: {store_type}")


def _auto_load_catalog_if_needed(catalog_store: CatalogStore) -> None:
    """
    Autoload catalog from data warehouse if catalog files are missing or empty

    Args:
        catalog_store (CatalogStore): The catalog store instance
    """

    # Check if catalog store has existing data using the store's own check_exists method
    if not catalog_store.check_exists():
        logger.info("Catalog files missing or empty, attempting to load from data warehouse...")

        try:
            # Get data warehouse config from loaded configuration
            data_warehouse_config = catalog_store.get_data_warehouse_config()
            if not data_warehouse_config:
                logger.warning("No data warehouse configuration found, skipping autoload")
                return

            warehouse_uri = data_warehouse_config.get("uri")
            if not warehouse_uri:
                logger.warning("No data warehouse URI found in configuration, skipping autoload")
                return

            # load catalog from data warehouse
            success = load_catalog_from_data_warehouse(catalog_store)

            if success:
                logger.info("Successfully loaded catalog from data warehouse")
            else:
                logger.error("Failed to load catalog from data warehouse")
                raise Exception("Failed to load catalog from data warehouse")

        except Exception as e:
            logger.warning(f"Autoload from data warehouse failed: {e}")
            raise Exception("Failed to load catalog from data warehouse") from e


================================================
FILE: openchatbi/catalog/helper.py
================================================
from typing import Any

import requests
from sqlalchemy import Engine, create_engine

from openchatbi.catalog.token_service import apply_token_for_user
from openchatbi.utils import log


def get_requests_session(token: str, header_extra_params: dict) -> requests.Session:
    """Create HTTP session with bearer token authentication."""
    session = requests.Session()
    session.headers.update({"Authorization": f"Bearer {token}"})
    if header_extra_params:
        session.headers.update(header_extra_params)
    return session


def create_sqlalchemy_engine_instance(data_warehouse_config: dict[str, Any]) -> Engine:
    """
    Create SQLAlchemy engine instance from data warehouse config

    Args:
        data_warehouse_config: Config dict with 'uri' and optional 'token_service'

    Returns:
        Configured SQLAlchemy engine
    """
    database_uri = data_warehouse_config.get("uri")

    engine_args = {"echo": True}

    # Handle Presto authentication
    if "presto" in database_uri and "token_service" in data_warehouse_config:
        token_service = data_warehouse_config.get("token_service")
        user_name = data_warehouse_config.get("user_name")
        password = data_warehouse_config.get("password")
        header_extra_params = data_warehouse_config.get("header_extra_params", {})
        token = apply_token_for_user(token_service, user_name, password)
        log(f"Applied presto token: {token} for user: {user_name}")
        engine_args["connect_args"] = {
            "protocol": "https",
            "requests_session": get_requests_session(token, header_extra_params),
        }
        database_uri = database_uri.format(user_name=user_name)

    engine = create_engine(database_uri, **engine_args)

    return engine


================================================
FILE: openchatbi/catalog/retrival_helper.py
================================================
"""Helper functions for building column retrieval systems."""

from rank_bm25 import BM25Okapi

from openchatbi.llm.llm import get_embedding_model
from openchatbi.text_segmenter import _segmenter
from openchatbi.utils import create_vector_db, log


def get_columns_metadata(catalog):
    """Extract column metadata for indexing.

    Args:
        catalog: Catalog store instance.

    Returns:
        tuple: (columns, col_dict, column_tokens, embedding_keys)
    """
    columns = catalog.get_column_list()
    col_dict = {}
    column_tokens = []
    embedding_keys = []
    for column in columns:
        col_dict[column["column_name"]] = column
        text_parts = [
            column.get("column_name", ""),
            column.get("display_name", ""),
            column.get("alias", ""),
            column.get("tag", ""),
            column.get("description", ""),
        ]
        text = " ".join(text_parts)
        tokens = [token for token in _segmenter.cut(text) if token not in ("_", " ")]
        column_tokens.append(tokens)
        embedding_key = f"{column['column_name']}: {column['display_name']}"
        embedding_keys.append(embedding_key)
    return columns, col_dict, column_tokens, embedding_keys


def build_column_tables_mapping(catalog):
    """Build a mapping of column names to their corresponding table names."""
    column_tables_mapping = {}
    for table_name in catalog.get_table_list():
        for column in catalog.get_column_list(table_name):
            column_name = column["column_name"]
            if column_name not in column_tables_mapping:
                column_tables_mapping[column_name] = []
            column_tables_mapping[column_name].append(table_name)
    return column_tables_mapping
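The inverted column-to-tables index built above can be demonstrated with an in-memory stand-in for the catalog store (`FakeCatalog` and its tables are hypothetical; the real code works against a `CatalogStore`):

```python
class FakeCatalog:
    """Minimal stand-in exposing the two CatalogStore methods the mapping needs."""
    tables = {
        "orders": [{"column_name": "order_id"}, {"column_name": "user_id"}],
        "users": [{"column_name": "user_id"}, {"column_name": "email"}],
    }

    def get_table_list(self):
        return list(self.tables)

    def get_column_list(self, table=None):
        return self.tables[table]

def build_column_tables_mapping(catalog):
    """Invert table->columns into column_name -> [tables containing it]."""
    mapping = {}
    for table_name in catalog.get_table_list():
        for column in catalog.get_column_list(table_name):
            mapping.setdefault(column["column_name"], []).append(table_name)
    return mapping

mapping = build_column_tables_mapping(FakeCatalog())
```

Columns that appear in several tables (like `user_id` here) map to every table that defines them, which is what lets the retriever resolve a matched column back to candidate tables.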


def build_columns_retriever(catalog, vector_db_path: str = None):
    """Build BM25 and vector retrievers for columns.

    Args:
        catalog: Catalog store instance.
        vector_db_path: Path to the vector database file.

    Returns:
        tuple: (bm25, vector_db, columns, col_dict)
    """
    columns, col_dict, column_tokens, embedding_keys = get_columns_metadata(catalog)

    bm25 = BM25Okapi(column_tokens)

    log("Building vector database for columns...")
    vector_db = create_vector_db(
        embedding_keys,
        get_embedding_model(),
        metadatas=columns,
        collection_name="columns",
        collection_metadata={"hnsw:space": "cosine"},
        chroma_db_path=vector_db_path,
    )

    return bm25, vector_db, columns, col_dict


================================================
FILE: openchatbi/catalog/schema_retrival.py
================================================
"""Schema and column retrieval functionality for finding relevant database structures."""

import os
import re

import Levenshtein

from openchatbi import config
from openchatbi.catalog.retrival_helper import build_column_tables_mapping, build_columns_retriever
from openchatbi.text_segmenter import _segmenter
from openchatbi.utils import log

# Skip build during documentation build
if not os.environ.get("SPHINX_BUILD"):
    try:
        _catalog_store = config.get().catalog_store
    except ValueError:
        _catalog_store = None
else:
    _catalog_store = None

if _catalog_store:
    bm25, vector_db, columns, col_dict = build_columns_retriever(_catalog_store, config.get().vector_db_path)
    column_tables_mapping = build_column_tables_mapping(_catalog_store)
else:
    bm25, vector_db, columns, col_dict = None, None, [], {}
    column_tables_mapping = {}


def column_retrieval(query, db, k=10, threshold=0.5, filter=None):
    """Retrieves relevant columns based on a similarity search.

    Args:
        query (str): The query string to search for.
        db: The vector database to search in.
        k (int, optional): The number of top results to return. Defaults to 10.
        threshold (float, optional): The similarity threshold for filtering results. Defaults to 0.5.
        filter (dict, optional): A filter to apply to the search. Defaults to None.

    Returns:
        list: List of relevant column names.
    """
    log(f"Get the top relevant columns for query: {query}")
    similar_column_key_scores = db.similarity_search_with_score(query, k=k, filter=filter)
    # log(f"similar_column_key_scores: {similar_column_key_scores}")
    column_names = [key.metadata["column_name"] for (key, score) in similar_column_key_scores if score < threshold]
    log(f"Filtered relevant columns: {column_names}")
    return column_names


def merge_list(list1, list2):
    return list(set(list1 + list2))


def edit_distance_score(key1, key2):
    """Calculate normalized edit distance score between two strings.

    Returns:
        float: Score between 0 (identical) and 1 (completely different).
    """
    dist = Levenshtein.distance(key1, key2)
    max_len = max(len(key1), len(key2))
    return dist / max_len if max_len > 0 else 1
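The score above depends on the third-party `Levenshtein` C extension; for illustration, a stdlib dynamic-programming equivalent (assuming the standard unit costs for insertion, deletion, and substitution):

```python
def levenshtein(a: str, b: str) -> int:
    """Classic row-by-row DP edit distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,                 # deletion
                curr[j - 1] + 1,             # insertion
                prev[j - 1] + (ca != cb),    # substitution (free on match)
            ))
        prev = curr
    return prev[-1]

def edit_distance_score(key1: str, key2: str) -> float:
    """Normalized distance: 0 for identical strings, up to 1 for disjoint ones."""
    dist = levenshtein(key1, key2)
    max_len = max(len(key1), len(key2))
    return dist / max_len if max_len > 0 else 1
```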


def edit_distance_search(keywords_list, top_k=10, threshold=0.5):
    """Searches for columns using edit distance similarity.

    Args:
        keywords_list (list): List of keywords to search for.
        top_k (int, optional): The number of top results to return per keyword. Defaults to 10.
        threshold (float, optional): The maximum edit distance score to consider. Defaults to 0.5.

    Returns:
        list: List of relevant column names.
    """
    keys = set([re.sub(r"(_id|_name| id| name)$", "", key.lower()) for key in keywords_list])
    matched_columns = set()
    for key in keys:
        key_column_similarity_score = {}
        for column_name, row in col_dict.items():
            column_name_score = edit_distance_score(
                key, re.sub(r"(_id|_name| id| name)$", "", row.get("column_name", ""))
            )
            display_score = edit_distance_score(
                key, re.sub(r"(_id|_name| id| name)$", "", row.get("display_name", "").lower())
            )
            if column_name_score < threshold or display_score < threshold:
                key_column_similarity_score[column_name] = min(column_name_score, display_score)
        # Lower score means more similar, so sort ascending to keep the closest matches
        key_top_columns = [
            col for col, _ in sorted(key_column_similarity_score.items(), key=lambda x: x[1])[:top_k]
        ]
        matched_columns.update(key_top_columns)
    return list(matched_columns)


def bm25_search(query_list, top_k=5, score_threshold=0.5):
    """Performs a BM25 search on columns based on the query.

    Args:
        query_list (list): List of query terms.
        top_k (int, optional): The number of top results to return. Defaults to 5.
        score_threshold (float, optional): The minimum BM25 score to consider. Defaults to 0.5.

    Returns:
        list: List of relevant column names.
    """
    query_tokens = [token for token in _segmenter.cut(" ".join(query_list)) if token not in ("_", " ")]
    scores = bm25.get_scores(query_tokens)
    ranked = sorted(enumerate(scores), key=lambda x: x[1], reverse=True)
    results = []
    for idx, score in ranked[:top_k]:
        if score_threshold and score < score_threshold:
            continue
        results.append(columns[idx]["column_name"])
    return results


def get_relevant_columns(keywords_list, dimensions, metrics):
    """Get the most relevant columns for given keywords, dimensions, and metrics.

    Uses multiple retrieval methods (BM25, edit distance, vector similarity)
    to find the best matching columns.

    Args:
        keywords_list (list): General keywords to search for.
        dimensions (list): Dimension-specific keywords.
        metrics (list): Metric-specific keywords.

    Returns:
        list: Relevant column names.
    """
    # 1. BM25 search for general keywords
    total_results = bm25_search(keywords_list, top_k=len(keywords_list) * 4)

    # 2. Edit distance search for exact matches
    keyword_len = len(keywords_list + dimensions + metrics)
    ed_results = edit_distance_search(keywords_list + dimensions + metrics, top_k=keyword_len, threshold=0.3)
    total_results = merge_list(total_results, ed_results)

    # 3. Vector similarity search for dimensions
    if dimensions:
        d_results = column_retrieval(" ".join(dimensions), vector_db, k=10, filter={"category": "dimension"})
        total_results = merge_list(total_results, d_results)

    # 4. Vector similarity search for metrics
    if metrics:
        m_results = column_retrieval(" ".join(metrics), vector_db, k=10, threshold=0.55, filter={"category": "metric"})
        total_results = merge_list(total_results, m_results)

    log(f"Relevant columns: {total_results}")
    return total_results
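
The retrieval channels above are combined by plain set union (`merge_list`). A minimal standalone sketch of that merge step, with the BM25, edit-distance, and vector outputs stubbed as hypothetical lists (sorted here only to make the result deterministic):

```python
def merge_results(*result_lists: list[str]) -> list[str]:
    # Deduplicate across all retrieval channels, mirroring merge_list() above;
    # sorted purely for deterministic output (merge_list does not preserve order)
    merged: set[str] = set()
    for results in result_lists:
        merged.update(results)
    return sorted(merged)


# Hypothetical outputs from the three channels
bm25_hits = ["campaign_id", "impressions"]
edit_distance_hits = ["campaign_id", "campaign_name"]
vector_hits = ["impressions", "clicks"]

print(merge_results(bm25_hits, edit_distance_hits, vector_hits))
# ['campaign_id', 'campaign_name', 'clicks', 'impressions']
```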


================================================
FILE: openchatbi/catalog/store/__init__.py
================================================
"""Catalog store implementations."""

from .file_system import FileSystemCatalogStore


================================================
FILE: openchatbi/catalog/store/file_system.py
================================================
"""File system-based catalog store implementation."""

import csv
import logging
import os
import re
import traceback
from typing import Any

import yaml
from sqlalchemy import Engine

from ..catalog_store import CatalogStore, split_db_table_name
from ..helper import create_sqlalchemy_engine_instance

logger = logging.getLogger(__name__)


class FileSystemCatalogStore(CatalogStore):
    """File system-based data catalog storage implementation.

    Stores catalog data in CSV and YAML files on the local filesystem.
    """

    data_path: str
    table_info_file: str
    sql_example_file: str
    table_selection_example_file: str
    table_columns_file: str
    common_columns_file: str
    table_spec_columns_file: str

    _table_info_cache: dict | None
    _table_columns_cache: dict | None
    _common_columns_cache: dict | None
    _table_spec_columns_cache: dict | None
    _sql_example_cache: dict | None
    _table_selection_example_cache: dict | None

    _data_warehouse_config: dict
    _sql_engine: Engine | None

    def __init__(self, data_path: str, data_warehouse_config: dict):
        """Initialize filesystem catalog store.

        Args:
            data_path (str): Directory absolute path for storing catalog files.
            data_warehouse_config (dict): Data warehouse configuration dictionary with keys:
                - uri (str): Database connection URI
                - include_tables (Optional[List[str]]): List of tables to include, if None include all
                - database_name (Optional[str]): Database name to use in catalog
        """
        if not isinstance(data_path, str) or not data_path.strip():
            raise ValueError("data_path must be a non-empty string")

        if data_warehouse_config is None:
            data_warehouse_config = {}
        elif not isinstance(data_warehouse_config, dict):
            raise ValueError("data_warehouse_config must be a dictionary")

        self.data_path = data_path.strip()
        self.table_info_file = os.path.join(data_path, "table_info.yaml")
        self.sql_example_file = os.path.join(data_path, "sql_example.yaml")
        self.table_selection_example_file = os.path.join(data_path, "table_selection_example.csv")
        self.table_columns_file = os.path.join(data_path, "table_columns.csv")
        self.common_columns_file = os.path.join(data_path, "common_columns.csv")
        self.table_spec_columns_file = os.path.join(data_path, "table_spec_columns.csv")

        # Ensure directory exists with proper error handling
        try:
            os.makedirs(self.data_path, exist_ok=True)
        except (OSError, PermissionError) as e:
            raise RuntimeError(f"Failed to create data directory '{self.data_path}': {e}") from e

        # Initialize cache
        self._table_info_cache = None
        self._table_columns_cache = None
        self._common_columns_cache = None
        self._table_spec_columns_cache = None
        self._sql_example_cache = None
        self._table_selection_example_cache = None

        self._data_warehouse_config = data_warehouse_config
        try:
            self._sql_engine = create_sqlalchemy_engine_instance(data_warehouse_config)
        except Exception as e:
            logger.warning(f"Failed to create SQL engine: {e}. Some catalog operations may not work.")
            self._sql_engine = None

    def _clear_cache(self) -> None:
        """
        Clear all cached data to ensure consistency after data modifications
        """
        self._table_info_cache = None
        self._table_columns_cache = None
        self._common_columns_cache = None
        self._table_spec_columns_cache = None
        self._sql_example_cache = None
        self._table_selection_example_cache = None
        logger.debug("Cleared all caches")

    def get_data_warehouse_config(self) -> dict:
        return self._data_warehouse_config

    def get_sql_engine(self) -> Engine:
        if self._sql_engine is None:
            raise RuntimeError("SQL engine is not available. Check data warehouse configuration.")
        return self._sql_engine

    def _validate_table_name(self, table: str) -> bool:
        """
        Validate table name

        Args:
            table (str): Table name

        Returns:
            bool: Whether the table name is valid

        Raises:
            ValueError: If table name is invalid
        """
        if not table or not isinstance(table, str):
            raise ValueError("Table name must be a non-empty string")

        # Check for invalid characters (allow dots for db.table format)
        invalid_chars = ["/", "\\", "*", "?", "<", ">", "|", '"', "'"]
        if any(char in table for char in invalid_chars):
            raise ValueError(f"Table name contains invalid characters: {table}")

        return True

    def _validate_column_data(self, columns: list[dict[str, Any]]) -> bool:
        """
        Validate column data format

        Args:
            columns (List[Dict[str, Any]]): List of column information

        Returns:
            bool: Whether the column data is valid

        Raises:
            ValueError: If column data is invalid
        """
        if not isinstance(columns, list):
            raise ValueError("Columns must be a list")

        required_fields = {"column_name", "type"}

        for i, column in enumerate(columns):
            if not isinstance(column, dict):
                raise ValueError(f"Column {i} must be a dictionary")

            # Check required fields
            missing_fields = required_fields - set(column.keys())
            if missing_fields:
                raise ValueError(f"Column {i} missing required fields: {missing_fields}")

            # Validate column_name
            column_name = column.get("column_name")
            if not isinstance(column_name, str) or not column_name.strip():
                raise ValueError(f"Column {i}: column_name must be a non-empty string")

            # Validate type
            column_type = column.get("type")
            if not isinstance(column_type, str) or not column_type.strip():
                raise ValueError(f"Column {i}: type must be a non-empty string")

        return True

    def _validate_table_information(self, information: dict[str, Any]) -> bool:
        """
        Validate table information format

        Args:
            information (Dict[str, Any]): Table information

        Returns:
            bool: Whether the table information is valid

        Raises:
            ValueError: If table information is invalid
        """
        if not isinstance(information, dict):
            raise ValueError("Table information must be a dictionary")

        # Validate optional string fields
        string_fields = ["description", "selection_rule"]
        for field in string_fields:
            if field in information:
                value = information[field]
                if value is not None and not isinstance(value, str):
                    raise ValueError(f"Table information field '{field}' must be a string or None")

        return True

    def _validate_sql_examples(self, examples: list[dict[str, str]]) -> bool:
        """
        Validate SQL examples format

        Args:
            examples (List[Dict[str, str]]): List of SQL examples

        Returns:
            bool: Whether the SQL examples are valid

        Raises:
            ValueError: If SQL examples are invalid
        """
        if not isinstance(examples, list):
            raise ValueError("Examples must be a list")

        required_fields = {"question", "answer"}

        for i, example in enumerate(examples):
            if not isinstance(example, dict):
                raise ValueError(f"Example {i} must be a dictionary")

            # Check required fields
            missing_fields = required_fields - set(example.keys())
            if missing_fields:
                raise ValueError(f"Example {i} missing required fields: {missing_fields}")

            # Validate fields are non-empty strings
            for field in required_fields:
                value = example.get(field)
                if not isinstance(value, str) or not value.strip():
                    raise ValueError(f"Example {i}: {field} must be a non-empty string")

        return True

    @staticmethod
    def _load_yaml_file(file_path: str) -> dict:
        """
        Load YAML file

        Args:
            file_path (str): File path

        Returns:
            Dict: YAML content
        """
        if not os.path.exists(file_path):
            logger.debug(f"YAML file does not exist: {file_path}")
            return {}

        try:
            with open(file_path, encoding="utf-8") as f:
                data = yaml.safe_load(f) or {}
                logger.debug(f"Successfully loaded YAML file: {file_path}")
                return data
        except Exception as e:
            logger.error(f"Failed to load YAML file {file_path}: {e}")
            logger.error(traceback.format_exc())
            return {}

    @staticmethod
    def _load_csv_file(file_path: str) -> list[dict[str, str]]:
        """
        Load CSV file

        Args:
            file_path (str): File path

        Returns:
            List[Dict[str, str]]: List of rows as dictionaries
        """
        if not os.path.exists(file_path):
            logger.debug(f"CSV file does not exist: {file_path}")
            return []

        try:
            result = []
            with open(file_path, encoding="utf-8") as f:
                reader = csv.DictReader(f)
                for row in reader:
                    result.append(row)
            logger.debug(f"Successfully loaded CSV file: {file_path} with {len(result)} rows")
            return result
        except Exception as e:
            logger.error(f"Failed to load CSV file {file_path}: {e}")
            logger.error(traceback.format_exc())
            return []

    @staticmethod
    def _save_yaml_file(file_path: str, data: dict) -> bool:
        """
        Save YAML file

        Args:
            file_path (str): File path
            data (Dict): Data to save

        Returns:
            bool: Whether the save was successful
        """
        try:
            with open(file_path, "w", encoding="utf-8") as f:
                yaml.dump(data, f, default_flow_style=False, allow_unicode=True)
            return True
        except Exception as e:
            logger.error(f"Failed to save YAML file {file_path}: {e}")
            logger.error(traceback.format_exc())
            return False

    @staticmethod
    def _save_csv_file(file_path: str, data: list[dict[str, str]], headers: list[str] | None = None) -> bool:
        """
        Save CSV file

        Args:
            file_path (str): File path
            data (List[Dict[str, str]]): List of rows as dictionaries
            headers (List[str]): List of header names in sequence

        Returns:
            bool: Whether the save was successful
        """
        try:
            if not data:
                return True

            # Get all possible headers from all rows
            all_headers = set()
            for row in data:
                all_headers.update(row.keys())

            # If headers are specified, append any keys they are missing;
            # otherwise derive the header list from the data itself
            if headers is not None:
                for key in all_headers:
                    if key not in headers:
                        headers.append(key)
            else:
                headers = sorted(all_headers)

            with open(file_path, "w", encoding="utf-8", newline="") as f:
                writer = csv.DictWriter(f, fieldnames=headers)
                writer.writeheader()
                for row in data:
                    writer.writerow(row)

            return True
        except Exception as e:
            logger.error(f"Failed to save CSV file {file_path}: {e}")
            logger.error(traceback.format_exc())
            return False

    def _load_tables(self) -> dict[str, list[str]]:
        # Load table_columns.csv
        table_columns_csv = self._load_csv_file(self.table_columns_file)

        # Get unique db_name.table_name combinations
        table_dict = {}
        for row in table_columns_csv:
            if "db_name" in row and "table_name" in row and "column_name" in row:
                db_name = row["db_name"]
                table_name = row["table_name"]
                column_name = row["column_name"]
                full_table_name = f"{db_name}.{table_name}"
                if full_table_name not in table_dict:
                    table_dict[full_table_name] = []
                table_dict[full_table_name].append(column_name)
        return table_dict

    def _load_common_columns(self) -> dict[str, dict[str, Any]]:
        # Load common_columns.csv to get column details
        columns_csv = self._load_csv_file(self.common_columns_file)

        # Filter and return column details
        column_dict = {}
        for row in columns_csv:
            if row.get("column_name") and row.get("type"):
                # Convert row to Dict[str, Any]
                column_info = {}
                for key, value in row.items():
                    if key != "":
                        column_info[key] = value
                column_dict[row["column_name"]] = column_info

        return column_dict

    def _load_table_spec_columns(self) -> dict[str, dict[str, Any]]:
        """
        Load info of table spec columns
        Returns:
            Dict[str, Dict[str, Any]]: Dictionary of table specific columns information, keyed by "full_table_name:column_name"
        """
        # Load table_spec_columns.csv to get table specific column details
        columns_csv = self._load_csv_file(self.table_spec_columns_file)

        # Filter and return column details
        column_dict = {}
        for row in columns_csv:
            if "db_name" in row and "table_name" in row and "column_name" in row and row["column_name"]:
                # Convert row to Dict[(str, str), Any]
                full_table_name = f"{row['db_name']}.{row['table_name']}"
                column_info = {}
                for key, value in row.items():
                    if key != "":
                        column_info[key] = value
                column_dict[f"{full_table_name}:{row['column_name']}"] = column_info

        return column_dict

    def _parse_example_text(self, example_text: str) -> list[tuple[str, str]]:
        """
        Parse example text, format is Q: ... A: ...

        Args:
            example_text (str): Example text

        Returns:
            List[Tuple[str, str]]: List of parsed question-answer pairs
        """
        examples = []
        lines = example_text.strip().split("\n")

        question = ""
        answer = ""
        current_type = None

        for line in lines:
            if line.startswith("Q:"):
                # If there is already a complete question-answer pair, add it to the results
                if question and answer:
                    examples.append((question.strip(), answer.strip()))
                    question = ""
                    answer = ""

                question = line[2:]
                current_type = "Q"
            elif line.startswith("A:"):
                answer = line[2:]
                current_type = "A"
            else:
                # Continue adding to the current type
                if current_type == "Q":
                    question += "\n" + line
                elif current_type == "A":
                    answer += "\n" + line

        # Add the last question-answer pair
        if question and answer:
            examples.append((question.strip(), answer.strip()))

        return examples
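
For illustration, a standalone re-implementation of the `Q:`/`A:` state machine above, run against a hypothetical two-example block with a multi-line answer:

```python
def parse_example_text(example_text: str) -> list[tuple[str, str]]:
    # Mirrors _parse_example_text: "Q:" opens a new pair, "A:" starts the
    # answer, and any other line continues whichever field is current
    examples, question, answer, current = [], "", "", None
    for line in example_text.strip().split("\n"):
        if line.startswith("Q:"):
            if question and answer:
                examples.append((question.strip(), answer.strip()))
                answer = ""
            question, current = line[2:], "Q"
        elif line.startswith("A:"):
            answer, current = line[2:], "A"
        elif current == "Q":
            question += "\n" + line
        elif current == "A":
            answer += "\n" + line
    if question and answer:
        examples.append((question.strip(), answer.strip()))
    return examples


text = """Q: How many clicks yesterday?
A: SELECT SUM(clicks) FROM ads
WHERE dt = '2024-01-01'

Q: Top campaign?
A: SELECT campaign_id FROM ads ORDER BY clicks DESC LIMIT 1"""

pairs = parse_example_text(text)
print(len(pairs))  # 2
```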

    def get_database_list(self) -> list[str]:
        # Extract unique database names
        databases = set()
        for table in self._get_all_table_schema().keys():
            full_table_name, db_name, table_name = split_db_table_name(table)
            databases.add(db_name)

        return list(databases)

    def _get_all_table_schema(self) -> dict[str, list[str]]:
        """
        Get all tables schema (columns of table)
        Returns:
            Dict[str, List[str]]: Tables schema (columns) dict, keyed by table name
        """
        if self._table_columns_cache is None:
            self._table_columns_cache = self._load_tables()
        # Return a deep copy to prevent external modifications
        return {k: v.copy() for k, v in self._table_columns_cache.items()}

    def get_table_list(self, database: str | None = None) -> list[str]:
        tables = self._get_all_table_schema()
        if database is None:
            return list(tables.keys())

        # Filter by database
        filtered_tables = []
        for full_table_name in tables.keys():
            _, db_name, table_name = split_db_table_name(full_table_name)
            if db_name == database:
                filtered_tables.append(full_table_name)

        return filtered_tables

    def _get_common_columns(self) -> dict[str, dict[str, Any]]:
        """
        Get information of all common columns
        Returns:
            Dict[str, Dict[str, Any]]: Dictionary of columns information, keyed by column name
        """
        if self._common_columns_cache is None:
            self._common_columns_cache = self._load_common_columns()
        # Return a deep copy to prevent external modifications
        return {k: v.copy() for k, v in self._common_columns_cache.items()}

    def _get_table_spec_columns(self) -> dict[str, dict[str, Any]]:
        """
        Get information of all table specific columns
        Returns:
            Dict[str, Dict[str, Any]]: Dictionary of table specific columns information, keyed by "full_table_name:column_name"
        """
        if self._table_spec_columns_cache is None:
            self._table_spec_columns_cache = self._load_table_spec_columns()
        # Return a deep copy to prevent external modifications
        return {k: v.copy() for k, v in self._table_spec_columns_cache.items()}

    def get_column_list(self, table: str | None = None, database: str | None = None) -> list[dict[str, Any]]:
        _common_columns = self._get_common_columns()
        if table is None:
            return list(_common_columns.values())

        # Get the full table name
        full_table_name, db_name, table_name = split_db_table_name(table, database)

        # Filter table columns
        tables_dict = self._get_all_table_schema()
        if full_table_name not in tables_dict:
            return []

        table_columns = tables_dict[full_table_name]

        # If no columns found, return empty list
        if not table_columns:
            return []

        # Filter and return column details
        result = []
        _table_spec_columns = self._get_table_spec_columns()
        for column in table_columns:
            # check if the column is table specific
            key = f"{full_table_name}:{column}"
            if key in _table_spec_columns:
                column_info = _table_spec_columns[key]
                column_info["is_common"] = False
                result.append(column_info)
            else:
                column_info = _common_columns.get(column)
                if column_info:
                    column_info["is_common"] = True
                    result.append(column_info)
        return result

    def get_table_information(self, table: str, database: str | None = None) -> dict[str, Any]:
        full_table_name, db_name, table_name = split_db_table_name(table, database)

        if self._table_info_cache is None:
            self._table_info_cache = self._load_yaml_file(self.table_info_file)

        if db_name in self._table_info_cache and table_name in self._table_info_cache[db_name]:
            # Return a copy to prevent external modifications
            return self._table_info_cache[db_name][table_name].copy()

        return {}

    def get_sql_examples(
        self, table: str | None = None, database: str | None = None
    ) -> list[tuple[str, str, list[str]]]:
        if self._sql_example_cache is None:
            self._sql_example_cache = self._load_yaml_file(self.sql_example_file)

        if table is None:
            # If no table specified, return all examples
            examples = []
            for db_name, tables in self._sql_example_cache.items():
                for table_name, example_text in tables.items():
                    qa_pairs = self._parse_example_text(example_text)
                    examples.extend([(q, a, [f"{db_name}.{table_name}"]) for (q, a) in qa_pairs])
            return examples

        full_table_name, db_name, table_name = split_db_table_name(table, database)

        # Find examples that include this table
        examples = []

        # Check the fact section
        if db_name in self._sql_example_cache:
            if table_name in self._sql_example_cache[db_name]:
                # Parse example text, format is Q: ... A: ...
                qa_pairs = self._parse_example_text(self._sql_example_cache[db_name][table_name])
                examples.extend([(q, a, [full_table_name]) for (q, a) in qa_pairs])

        return examples

    @staticmethod
    def _load_table_selection_examples_from_csv(file_path: str) -> list[tuple[str, list[str]]]:
        examples = []
        try:
            with open(file_path, encoding="utf-8") as f:
                reader = csv.DictReader(f)
                for row in reader:
                    question = row.get("question", "").strip()
                    selected_tables = row.get("selected_tables", "").strip()
                    if question and selected_tables:
                        table_list = [p.strip() for p in re.split(r"[ ,\n]", selected_tables) if p.strip()]
                        examples.append((question, table_list))
        except (FileNotFoundError, PermissionError, UnicodeDecodeError) as e:
            logger.warning(f"Failed to load table selection examples from {file_path}: {e}")
        return examples
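
A quick standalone check of the separator handling above: `re.split(r"[ ,\n]", ...)` accepts spaces, commas, or newlines between table names, and empty fragments are dropped (table names below are hypothetical):

```python
import re

# Hypothetical "selected_tables" cell mixing comma and newline separators
selected = "ads.daily_stats, ads.campaigns\nbilling.invoices"
tables = [p.strip() for p in re.split(r"[ ,\n]", selected) if p.strip()]
print(tables)  # ['ads.daily_stats', 'ads.campaigns', 'billing.invoices']
```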

    def get_table_selection_examples(self) -> list[tuple[str, list[str]]]:
        if self._table_selection_example_cache is None:
            self._table_selection_example_cache = self._load_table_selection_examples_from_csv(
                self.table_selection_example_file
            )
        return self._table_selection_example_cache

    def save_table_information(
        self,
        table: str,
        information: dict[str, Any],
        columns: list[dict[str, Any]],
        database: str | None = None,
        update_existing: bool = False,
    ) -> bool:
        # Validate input data (let validation errors propagate)
        self._validate_table_name(table)
        self._validate_table_information(information)
        self._validate_column_data(columns)

        try:
            full_table_name, db_name, table_name = split_db_table_name(table, database)

            table_info = self._load_yaml_file(self.table_info_file)

            # Save columns first
            if not self._save_columns(table_name, columns, db_name, update_existing):
                logger.error(f"Failed to save columns for table {full_table_name}")
                return False

            # Save table information (ensure proper structure)
            if db_name not in table_info:
                table_info[db_name] = {}
            if update_existing or table_name not in table_info[db_name]:
                table_info[db_name][table_name] = information
            success = self._save_yaml_file(self.table_info_file, table_info)

            if success:
                logger.info(f"Successfully saved table information for {full_table_name}")
                # Clear cache to ensure consistency
                self._clear_cache()

            return success
        except Exception as e:
            logger.error(f"Unexpected error when saving table information: {e}")
            logger.error(traceback.format_exc())
            return False

    def _save_columns(
        self, table_name: str, columns: list[dict[str, Any]], db_name: str = "", update_existing: bool = False
    ) -> bool:
        """
        Save columns information to common_columns.csv and columns of tables to table_columns.csv

        Args:
            table_name (str): Table name
            columns (List[Dict[str, Any]]): List of column information
            db_name (str): Database name
            update_existing (bool): Update existing column information

        Returns:
            bool: Whether the save was successful
        """
        full_table_name, db_name, table_name = split_db_table_name(table_name, db_name)
        # Load existing data
        tables_data = self._load_csv_file(self.table_columns_file)
        common_columns_dict = self._load_common_columns()
        table_spec_columns_dict = self._load_table_spec_columns()

        # Create a set of existing table-column combinations
        existing_table_columns = set()
        for row in tables_data:
            if "db_name" in row and "table_name" in row and "column_name" in row:
                key = f"{row['db_name']}.{row['table_name']}:{row['column_name']}"
                existing_table_columns.add(key)

        # Update table_columns.csv and track new columns to add

        for column in columns:
            if "column_name" not in column:
                continue

            column_name = column["column_name"]
            is_common_column = column.get("is_common", False)

            key = f"{full_table_name}:{column_name}"
            column_info = {k: str(v) for k, v in column.items() if k != "is_common"}
            if not is_common_column:
                column_info["db_name"] = db_name
                column_info["table_name"] = table_name

            # New column of the table -> add to table_columns.csv
            if key not in existing_table_columns:
                tables_data.append({"db_name": db_name, "table_name": table_name, "column_name": column_name})
                existing_table_columns.add(key)
                if is_common_column:
                    # Handle common_columns.csv - avoid duplicates
                    if column_name not in common_columns_dict:
                        # Add new columns to columns_data
                        logger.info(f"Add new common column {column_name}")
                        common_columns_dict[column_name] = column_info
                else:
                    table_spec_columns_dict[key] = column_info
            # Apply updates to existing columns in columns_data
            elif update_existing:
                if is_common_column:
                    common_columns_dict[column_name] = column_info
                else:
                    table_spec_columns_dict[key] = column_info

        # Save updated data
        tables_success = self._save_csv_file(
            self.table_columns_file, tables_data, ["db_name", "table_name", "column_name"]
        )
        common_columns_success = self._save_csv_file(
            self.common_columns_file,
            list(common_columns_dict.values()),
            ["column_name", "display_name", "alias", "type", "category", "tag", "description"],
        )
        table_spec_columns_success = self._save_csv_file(
            self.table_spec_columns_file,
            list(table_spec_columns_dict.values()),
            ["db_name", "table_name", "column_name", "display_name", "alias", "type", "category", "tag", "description"],
        )

        success = tables_success and common_columns_success and table_spec_columns_success
        if success:
            # Clear cache to ensure consistency
            self._clear_cache()
            logger.debug(f"Successfully saved columns for table {table_name}")

        return success

    def save_table_sql_examples(self, table: str, examples: list[dict[str, str]], database: str | None = None) -> bool:
        # Validate input data (let validation errors propagate)
        self._validate_table_name(table)
        self._validate_sql_examples(examples)

        try:
            full_table_name, db_name, table_name = split_db_table_name(table, database)

            sql_examples = self._load_yaml_file(self.sql_example_file)

            # Ensure database exists in structure
            if db_name not in sql_examples:
                sql_examples[db_name] = {}

            # example text
            example_text = ""
            for example in examples:
                example_text += f"Q: {example['question']}\nA: {example['answer']}\n\n"

            sql_examples[db_name][table_name] = example_text.strip()

            success = self._save_yaml_file(self.sql_example_file, sql_examples)

            if success:
                logger.info(f"Successfully saved {len(examples)} examples for table {full_table_name}")
                # Update cache
                self._sql_example_cache = sql_examples

            return success
        except Exception as e:
            logger.error(f"Unexpected error when saving table examples: {e}")
            logger.error(traceback.format_exc())
            return False

    def save_table_selection_examples(self, examples: list[tuple[str, list[str]]]) -> bool:
        """Save (question, selected_tables) examples to the CSV example file."""
        example_data = []
        for example in examples:
            example_data.append({"question": example[0], "selected_tables": example[1]})
        save_success = self._save_csv_file(
            self.table_selection_example_file, example_data, ["question", "selected_tables"]
        )
        if save_success:
            logger.info(f"Successfully saved {len(examples)} table selection examples.")
        return save_success

    def check_exists(self) -> bool:
        """Check whether the essential catalog CSV files exist and have content."""
        try:
            # Check if essential catalog files exist and have content
            files_missing = (
                not os.path.exists(self.table_columns_file)
                or not os.path.exists(self.common_columns_file)
                or os.path.getsize(self.table_columns_file) <= 1  # Effectively empty file
                or os.path.getsize(self.common_columns_file) <= 1
            )

            return not files_missing

        except Exception as e:
            logger.warning(f"Error checking catalog existence: {e}")
            logger.error(traceback.format_exc())
            return False
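
# Editor's aside: `_save_csv_file` is defined outside this excerpt; the
# csv.DictWriter pattern it implies can be sketched standalone. The names
# below are illustrative, not repo code:

```python
import csv
import io


def save_rows(rows: list[dict], fieldnames: list[str]) -> str:
    # Write a header row followed by one CSV row per dict, ignoring extra keys
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

# A file-backed version would open self.table_columns_file for writing
# instead of rendering into a StringIO buffer.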


================================================
FILE: openchatbi/catalog/token_service.py
================================================
"""Token service for authentication with external services."""

import json

import requests


class TokenService:
    """Service for managing authentication tokens.

    Handles token application, validation, and authentication
    with external services.
    """

    base_url = None
    token = None
    user_name = None
    password = None

    def __init__(self, user_name: str, password: str):
        """Initialize token service."""
        self.user_name = user_name
        self.password = password

    def apply_token(self):
        """Apply for authentication token using credentials."""
        response = requests.post(
            self.base_url + "/apply_token",
            data=json.dumps({"user_name": self.user_name, "password": self.password}),
            headers={"Content-Type": "application/json"},
            timeout=10,
        )
        resp_json = response.json()
        self.token = resp_json.get("token")


def apply_token_for_user(token_url: str, user_name: str, password: str):
    """Apply for token and return token with username.

    Args:
        token_url (str): Base URL for token service.
        user_name (str): The user name.
        password (str): The password.

    Returns:
        token
    """
    token_service = TokenService(user_name, password)
    token_service.base_url = token_url
    token_service.apply_token()
    return token_service.token


================================================
FILE: openchatbi/code/docker_executor.py
================================================
import os
import shutil
import subprocess
import tempfile
from pathlib import Path

import docker
from docker.errors import ContainerError

from openchatbi.code.executor_base import ExecutorBase


def check_docker_status() -> tuple[bool, str]:
    """
    Check Docker installation and status without initializing DockerExecutor.

    Returns:
        Tuple[bool, str]: (is_available, status_message)
    """
    try:
        # Check if Docker CLI is installed
        if not shutil.which("docker"):
            return False, "Docker is not installed. Please install Docker."

        # Check if Docker daemon is running
        result = subprocess.run(["docker", "info"], capture_output=True, text=True, timeout=10)

        if result.returncode == 0:
            return True, "Docker is installed and running"
        else:
            if "Cannot connect to the Docker daemon" in result.stderr:
                return False, "Docker is installed but not running. Please start the Docker daemon."
            else:
                return False, f"Docker is not available: {result.stderr.strip()}"

    except subprocess.TimeoutExpired:
        return False, "Docker command timed out. Docker may not be running properly."
    except FileNotFoundError:
        return False, "Docker command not found. Please install Docker."
    except Exception as e:
        return False, f"Error checking Docker status: {str(e)}"


class DockerExecutor(ExecutorBase):
    """Docker-based Python code executor for isolated execution."""

    def __init__(self, variable: dict | None = None):
        super().__init__(variable)
        self.image_name = "python-executor"
        self.dockerfile_path = Path(__file__).parent.parent.parent / "Dockerfile.python-executor"

        # Check Docker installation and status
        self._check_docker_availability()

        try:
            self.client = docker.from_env()
            # Build Docker image if it doesn't exist
            self._ensure_image_exists()
        except Exception as e:
            self._handle_docker_error(e)

    @staticmethod
    def _check_docker_availability():
        """Check if Docker is installed and available."""
        # Check if Docker CLI is installed
        if not shutil.which("docker"):
            raise RuntimeError("Docker is not installed. Please install Docker and ensure it's in your system PATH.")

        # Check if Docker daemon is running
        try:
            result = subprocess.run(["docker", "info"], capture_output=True, text=True, timeout=10)
            if result.returncode != 0:
                if "Cannot connect to the Docker daemon" in result.stderr:
                    raise RuntimeError(
                        "Docker is installed but not running. Please start the Docker daemon and try again."
                    )
                else:
                    raise RuntimeError(
                        f"Docker is not available. Please check Docker installation and status. "
                        f"Error: {result.stderr.strip()}"
                    )
        except subprocess.TimeoutExpired:
            raise RuntimeError("Docker command timed out. Please check if Docker is running properly.")
        except FileNotFoundError:
            raise RuntimeError("Docker command not found. Please install Docker and ensure it's in your system PATH.")

    @staticmethod
    def _handle_docker_error(error: Exception):
        """Handle Docker-related errors with specific error messages."""
        error_str = str(error).lower()

        if "connection aborted" in error_str and "no such file or directory" in error_str:
            raise RuntimeError("Docker is not running. Please start the Docker daemon and try again.")
        elif "permission denied" in error_str:
            raise RuntimeError(
                "Permission denied accessing Docker. Please ensure your user has Docker permissions "
                "or try running with appropriate privileges."
            )
        elif "docker daemon" in error_str or "connection refused" in error_str:
            raise RuntimeError("Cannot connect to Docker daemon. Please start the Docker daemon and try again.")
        else:
            raise RuntimeError(
                f"Failed to initialize Docker client. Please ensure Docker is installed and running. "
                f"Error: {str(error)}"
            )

    def _ensure_image_exists(self):
        """Build Docker image if it doesn't exist."""
        try:
            self.client.images.get(self.image_name)
        except docker.errors.ImageNotFound:
            print(f"Building Docker image '{self.image_name}'...")
            self.client.images.build(
                path=str(self.dockerfile_path.parent),
                dockerfile=self.dockerfile_path.name,
                tag=self.image_name,
                rm=True,
            )
            print(f"Docker image '{self.image_name}' built successfully.")

    def run_code(self, code: str) -> tuple[bool, str]:
        """Execute Python code in a Docker container."""
        try:
            # Create a temporary file with the code
            with tempfile.NamedTemporaryFile(mode="w", suffix=".py", delete=False) as f:
                # Add variable definitions to the code; repr() round-trips
                # strings (with quoting and escaping), numbers, lists and dicts
                variable_code = ""
                for key, value in self._variable.items():
                    variable_code += f"{key} = {value!r}\n"

                full_code = variable_code + "\n" + code
                f.write(full_code)
                temp_file_path = f.name

            try:
                # Run the code in a Docker container; with detach=False this call
                # returns the container's combined output as bytes, not a Container
                logs = self.client.containers.run(
                    self.image_name,
                    command=["python3", f"/app/{os.path.basename(temp_file_path)}"],
                    volumes={temp_file_path: {"bind": f"/app/{os.path.basename(temp_file_path)}", "mode": "ro"}},
                    remove=True,
                    detach=False,
                    stdout=True,
                    stderr=True,
                    network_mode="none",  # Disable network access for security
                )

                # Decode the captured output
                output = logs.decode("utf-8")
                return True, output

            except ContainerError as e:
                # Container exited with a non-zero code; stderr comes back as bytes
                stderr = e.stderr.decode("utf-8", errors="replace") if isinstance(e.stderr, bytes) else e.stderr
                return False, f"Container execution failed: {stderr or e}"

        except Exception as e:
            return False, f"Docker execution error: {str(e)}"

        finally:
            # Clean up temporary file
            if "temp_file_path" in locals() and os.path.exists(temp_file_path):
                try:
                    os.unlink(temp_file_path)
                except (OSError, PermissionError) as e:
                    # Log but don't fail the operation for cleanup issues
                    print(f"Warning: Failed to clean up temporary file {temp_file_path}: {e}")

    def __del__(self):
        """Clean up Docker client on deletion."""
        try:
            if hasattr(self, "client") and self.client is not None:
                self.client.close()
        except Exception:
            # Ignore cleanup errors during object destruction
            pass
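
# Editor's aside: the variable-injection step in run_code above reduces to a
# pure helper. A minimal sketch (not repo code), relying on repr() to
# round-trip strings, numbers, lists, and dicts:

```python
def render_variables(variables: dict) -> str:
    # Emit one `name = <repr>` assignment per variable; repr() handles
    # quoting and escaping, so embedded quotes in strings stay valid code
    return "".join(f"{key} = {value!r}\n" for key, value in variables.items())
```

# Prepending this rendered block to the user code gives the container
# script its input variables without any templating machinery.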


================================================
FILE: openchatbi/code/executor_base.py
================================================
from typing import Any


class ExecutorBase:
    """Base class for executing python code."""

    _variable: dict

    def __init__(self, variable: dict | None = None):
        self._variable = {} if variable is None else variable

    def run_code(self, code: str) -> tuple[bool, str]:
        """Execute python code."""
        raise NotImplementedError()

    def set_variable(self, key: str, value: Any) -> None:
        """Set variable."""
        self._variable[key] = value


================================================
FILE: openchatbi/code/local_executor.py
================================================
import sys
from io import StringIO

from openchatbi.code.executor_base import ExecutorBase


class LocalExecutor(ExecutorBase):
    """Execute Python code in the current process (fastest, least secure)."""

    def run_code(self, code: str) -> tuple[bool, str]:
        # Expose full builtins plus any injected variables; no sandboxing here
        exec_globals = {"__builtins__": __builtins__, **self._variable}
        original_stdout = sys.stdout
        output_buffer = StringIO()
        sys.stdout = output_buffer
        try:
            exec(code, exec_globals, exec_globals)
            output = output_buffer.getvalue()
            return True, output
        except Exception as e:
            return False, str(e)
        finally:
            sys.stdout = original_stdout


================================================
FILE: openchatbi/code/restricted_local_executor.py
================================================
import sys
from io import StringIO

from RestrictedPython import compile_restricted, safe_globals, utility_builtins
from RestrictedPython.Guards import safe_builtins, safer_getattr

from openchatbi.code.executor_base import ExecutorBase


class RestrictedLocalExecutor(ExecutorBase):
    """Execute Python code under RestrictedPython with restricted builtins."""

    def run_code(self, code: str) -> tuple[bool, str]:
        try:
            # compile restricted code
            byte_code = compile_restricted(code, "<string>", "exec")
            if byte_code is None:
                return False, "Failed to compile restricted code"

            restricted_locals = {}
            restricted_globals = safe_globals.copy()

            # Set up restricted environment: keep the safe builtins and layer
            # the utility modules (math, string, random, ...) on top of them
            restricted_globals["__builtins__"] = {**safe_builtins, **utility_builtins}
            restricted_globals["_getattr_"] = safer_getattr

            # Add variable definitions to the restricted locals
            for key, value in self._variable.items():
                restricted_locals[key] = value

            # Capture print output
            original_stdout = sys.stdout
            output_buffer = StringIO()
            sys.stdout = output_buffer

            # RestrictedPython rewrites print() into calls on a collector created
            # via _print_(_getattr_); return the real print function so output
            # lands in the redirected stdout buffer above
            restricted_globals["_print_"] = lambda _getattr_=None: print

            exec(byte_code, restricted_globals, restricted_locals)
            output = output_buffer.getvalue()

            return True, output

        except Exception as e:
            return False, str(e)
        finally:
            if "original_stdout" in locals():
                sys.stdout = original_stdout


================================================
FILE: openchatbi/config.yaml.template
================================================
organization: The Company
dialect: presto
bi_config_file: example/bi.yaml

# Python Code Execution Configuration
# Options: "local", "restricted_local", "docker"
# - local: Run code in the current Python process (fastest, least secure)
# - restricted_local: Run code with RestrictedPython (moderate security, some limitations)
# - docker: Run code in isolated Docker containers (slowest, most secure, requires Docker to be installed)
python_executor: local

# Visualization configuration
# Options: "rule" (rule-based), "llm" (LLM-based), or null (skip visualization)
# visualization_mode: llm

# Context management configuration
# Controls how conversation context is managed and compressed when it becomes too long
context_config:
  # Enable/disable context management entirely
  enabled: true

  # Token limit that triggers context management (when conversation exceeds this, compression starts)
  summary_trigger_tokens: 12000

  # Number of recent messages to always preserve in full (never compress these)
  keep_recent_messages: 20

  # Historical tool output compression limits
  max_tool_output_length: 2000  # Max length for historical tool outputs
  max_sql_result_rows: 50       # Max rows to keep in CSV results
  max_code_output_lines: 50     # Max lines for code execution output

  # Conversation summarization settings
  enable_summarization: true         # Enable conversation summarization
  enable_conversation_summary: true  # Enable detailed conversation summary
  summary_max_messages: 50           # Max messages to include in summary context

  # Content preservation settings
  preserve_tool_errors: true    # Always preserve error messages in full
  preserve_recent_sql: true     # Preserve SQL content (less aggressive compression)

# Time Series Forecasting Service Configuration
# URL for the time series forecasting service endpoint, adjust based on your deployment scenario:
# - Local development (OpenChatBI on host, Forecasting service in Docker): "http://localhost:8765"
# - Remote service: "http://your-service-host:8765"
timeseries_forecasting_service_url: "http://localhost:8765"

# Catalog store configuration
catalog_store:
  store_type: file_system
  data_path: ./example

# Data warehouse configuration
data_warehouse_config:
  uri: "presto://{user_name}@domain:8080/db/default"
  include_tables:
    - null  # null means include all tables; otherwise list table names here
  database_name: "db.default"  # database name to use in catalog
  token_service: "https://tokens-domain:8080/v1"
  user_name: TOKEN_SERVICE_USER_NAME
  password: TOKEN_SERVICE_PASSWORD

# Vector database (chroma) path
# vector_db_path: ./.chroma_db

# LLM configurations (multiple providers)
#
# 1) Define providers under `llm_providers`
# 2) Select which one to use by setting `default_llm: <provider_name>`
default_llm: openai
llm_providers:
  openai:
    default_llm:
      class: langchain_openai.ChatOpenAI
      params:
        api_key: YOUR_API_KEY_HERE
        model: gpt-4.1
        temperature: 0.01
        max_tokens: 8192
    embedding_model:
      class: langchain_openai.OpenAIEmbeddings
      params:
        api_key: YOUR_API_KEY_HERE
        model: text-embedding-3-large
        chunk_size: 1024
    # Optional
    text2sql_llm:
      class: langchain_openai.ChatOpenAI
      params:
        api_key: YOUR_API_KEY_HERE
        model: gpt-4.1
        temperature: 0.0
        max_tokens: 8192
  # anthropic:
  #   default_llm:
  #     class: langchain_anthropic.ChatAnthropic
  #     params:
  #       api_key: YOUR_API_KEY_HERE
  #       model: claude-3-5-sonnet-latest

# MCP (Model Context Protocol) server configurations
mcp_servers:
  # File system MCP server (stdio transport)
  - name: filesystem
    transport: stdio
    command: ["npx", "-y", "@modelcontextprotocol/server-filesystem"]
    args: ["--path", "/tmp"]
    enabled: false
    timeout: 30
  

  # Example HTTP-based MCP server (streamable_http transport)
  - name: weather
    transport: streamable_http
    url: "http://localhost:8000/mcp/"
    headers:
      Authorization: "Bearer YOUR_TOKEN"
    enabled: false
    timeout: 30


================================================
FILE: openchatbi/config_loader.py
================================================
import importlib
import os
from importlib.util import find_spec
from typing import Any
from unittest.mock import MagicMock

from langchain_core.language_models import BaseChatModel
from pydantic import BaseModel

from openchatbi.catalog.factory import create_catalog_store
from openchatbi.utils import log


class LLMProviderConfig(BaseModel):
    """Resolved LLM objects for a single provider."""

    model_config = {"arbitrary_types_allowed": True}

    default_llm: BaseChatModel | MagicMock
    embedding_model: BaseModel | MagicMock | None = None
    text2sql_llm: BaseChatModel | MagicMock | None = None


class Config(BaseModel):
    """Configuration model for the OpenChatBI application.

    Attributes:
        organization (str): Organization name. Defaults to "The Company".
        dialect (str): SQL dialect to use. Defaults to "presto".
        default_llm (BaseChatModel): Default language model for general tasks.
        embedding_model (BaseModel): Language model for embedding generation.
        text2sql_llm (Optional[BaseChatModel]): Language model specifically for text-to-SQL tasks.
        bi_config (Dict[str, Any]): BI configuration loaded from YAML file. Defaults to empty dict.
        data_warehouse_config (Dict[str, Any]): Data warehouse configuration. Defaults to empty dict.
    """

    model_config = {"arbitrary_types_allowed": True}

    # General Configurations
    organization: str = "The Company"
    dialect: str = "presto"

    # LLM Configurations
    default_llm: BaseChatModel | MagicMock
    embedding_model: BaseModel | MagicMock | None = None
    text2sql_llm: BaseChatModel | MagicMock | None = None
    # Multiple LLM providers (optional)
    llm_provider: str | None = None
    llm_providers: dict[str, LLMProviderConfig] = {}

    # BI Configuration
    bi_config: dict[str, Any] = {}

    # Data Warehouse Configuration
    data_warehouse_config: dict[str, Any] = {}

    # Catalog Store
    catalog_store: Any = None

    # Path to the vector database file
    vector_db_path: str | None = None

    # MCP Servers Configuration
    mcp_servers: list[dict[str, Any]] = []

    # Report Configuration
    report_directory: str = "./data"

    # Code Execution Configuration
    python_executor: str = "local"  # Options: "local", "restricted_local", "docker"

    # Visualization Configuration
    visualization_mode: str | None = "rule"  # Options: "rule", "llm", None (skip visualization)

    # Context Management Configuration
    context_config: dict[str, Any] = {}

    # Time Series Service Configuration
    timeseries_forecasting_service_url: str = "http://localhost:8765"

    @classmethod
    def from_dict(cls, config: dict[str, Any]) -> "Config":
        """Creates a Config instance from a dictionary.

        Args:
            config (Dict[str, Any]): Dictionary containing configuration values.

        Returns:
            Config: A new Config instance with the provided values.
        """
        return cls(**config)


class ConfigLoader:
    """Singleton class to load and manage configuration settings for OpenChatBI.

    This class provides methods to load, get, and set configuration parameters
    for the application, including LLM models, SQL dialect, and other settings.
    """

    _instance = None
    _config: Config = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    llm_configs = ["default_llm", "embedding_model", "text2sql_llm"]

    def get(self) -> Config:
        """Get the current configuration.

        Returns:
            Config: The current configuration instance.

        Raises:
            ValueError: If the configuration has not been loaded.
        """
        if self._config is None:
            raise ValueError("Configuration has not been loaded. Please call load() or set() first.")
        return self._config

    def load(self, config_file: str = None) -> None:
        """Load configuration from a YAML file.

        Args:
            config_file (str, optional): Path to configuration file. Uses CONFIG_FILE
                environment variable or 'openchatbi/config.yaml' if not provided.

        Raises:
            ImportError: If pyyaml is not installed.
            FileNotFoundError: If the configuration file cannot be found.
        """
        if config_file is None:
            config_file = os.getenv("CONFIG_FILE", "openchatbi/config.yaml")

        if not find_spec("yaml"):
            raise ImportError("Please install pyyaml to use this feature.")

        import yaml

        try:
            with open(config_file, encoding="utf-8") as file:
                config_data = yaml.safe_load(file)
                if config_data is None:
                    config_data = {}
        except FileNotFoundError:
            log(f"Configuration file not found: {config_file}, leave config un-loaded.")
            return
        except yaml.YAMLError as e:
            raise ValueError(f"Invalid YAML in configuration file {config_file}: {e}") from e
        except Exception as e:
            raise RuntimeError(f"Failed to read configuration file {config_file}: {e}") from e

        self._process_config_dict(config_data)
        self._config = Config.from_dict(config_data)

    def _process_config_dict(self, config_data: dict[str, Any]) -> None:
        """
        Processes a configuration dictionary.
        """
        self._process_llm_providers(config_data)

        providers = config_data.get("llm_providers", {})
        selected_provider = None

        default_llm_value = config_data.get("default_llm")
        if isinstance(default_llm_value, str):
            # Simplified multi-provider config: default_llm: <provider_name>
            if not providers:
                raise ValueError("default_llm is a provider name but llm_providers is missing.")
            selected_provider = default_llm_value
        elif providers:
            # Backwards-compat: allow selecting provider via llm_provider
            legacy_provider = config_data.get("llm_provider")
            if isinstance(legacy_provider, str):
                selected_provider = legacy_provider
            elif "default_llm" not in config_data:
                # Pick the first provider in config order for backwards-compatible YAML behavior
                selected_provider = next(iter(providers.keys()), None)
            elif isinstance(default_llm_value, dict):
                raise ValueError(
                    "When using llm_providers, set default_llm to a provider name (e.g. default_llm: openai), "
                    "not a class config."
                )

        if providers:
            if not selected_provider or selected_provider not in providers:
                raise ValueError(f"Unknown LLM provider '{selected_provider}'. Available: {sorted(providers.keys())}")
            # Store selected provider for runtime lookups (UI/API can still override per-request)
            config_data["llm_provider"] = selected_provider
            # Populate top-level LLM objects for legacy call sites
            config_data["default_llm"] = providers[selected_provider].default_llm
            config_data.setdefault("embedding_model", providers[selected_provider].embedding_model)
            config_data.setdefault("text2sql_llm", providers[selected_provider].text2sql_llm)
        elif "default_llm" not in config_data:
            raise ValueError("Missing LLM config key: default_llm")

        if not config_data.get("embedding_model"):
            log("WARN: Missing LLM config key: embedding_model, will use BM25 based retrival only")
        if "data_warehouse_config" not in config_data:
            raise ValueError("Missing Data Warehouse config key: data_warehouse_config")

        # Load BI configuration
        if "bi_config_file" in config_data:
            bi_config = self.load_bi_config(config_data["bi_config_file"])
            bi_config.update(config_data.get("bi_config", {}))
            config_data["bi_config"] = bi_config

        if "catalog_store" in config_data:
            if "store_type" not in config_data["catalog_store"]:
                raise ValueError("catalog_store must have a store_type field.")
            catalog_store = create_catalog_store(
                **config_data["catalog_store"],
                auto_load=config_data["catalog_store"].get("auto_load", True),
                data_warehouse_config=config_data.get("data_warehouse_config"),
            )
        else:
            log("Catalog store config key `catalog_store` not found. Using default file system store.")
            catalog_store = create_catalog_store(
                store_type="file_system",
                auto_load=True,
                data_warehouse_config=config_data.get("data_warehouse_config"),
            )
        config_data["catalog_store"] = catalog_store

        for config_key in self.llm_configs:
            config_item = config_data.get(config_key)
            if not isinstance(config_item, dict) or "class" not in config_item:
                continue
            config_data[config_key] = self._instantiate_from_config_dict(config_item, config_key=config_key)

    def _instantiate_from_config_dict(self, config_item: dict[str, Any], *, config_key: str) -> Any:
        try:
            class_path = config_item["class"]
            if "." not in class_path:
                raise ValueError(f"Invalid class path format: {class_path}")
            module_name, class_name = class_path.rsplit(".", 1)
            module = importlib.import_module(module_name)
            llm_cls = getattr(module, class_name)
            params = config_item.get("params", {})
            return llm_cls(**params)
        except (ImportError, AttributeError, ValueError, TypeError) as e:
            raise RuntimeError(f"Failed to load {config_key} class '{config_item.get('class', '')}': {e}") from e

    def _process_llm_providers(self, config_data: dict[str, Any]) -> None:
        """Resolve llm_providers into instantiated provider configs (if present)."""
        raw_providers = config_data.get("llm_providers")
        if not raw_providers:
            return
        if not isinstance(raw_providers, dict):
            raise ValueError("llm_providers must be a mapping of provider_name -> config")

        providers: dict[str, LLMProviderConfig] = {}
        for provider_name, provider_cfg in raw_providers.items():
            if isinstance(provider_cfg, LLMProviderConfig):
                providers[str(provider_name)] = provider_cfg
                continue
            if not isinstance(provider_cfg, dict):
                raise ValueError(f"llm_providers.{provider_name} must be a mapping")

            resolved_cfg: dict[str, Any] = dict(provider_cfg)
            for config_key in self.llm_configs:
                config_item = resolved_cfg.get(config_key)
                if not isinstance(config_item, dict) or "class" not in config_item:
                    continue
                resolved_cfg[config_key] = self._instantiate_from_config_dict(
                    config_item, config_key=f"llm_providers.{provider_name}.{config_key}"
                )

            if "default_llm" not in resolved_cfg or resolved_cfg["default_llm"] is None:
                raise ValueError(f"llm_providers.{provider_name} missing default_llm")

            providers[str(provider_name)] = LLMProviderConfig(**resolved_cfg)

        config_data["llm_providers"] = providers

    def load_bi_config(self, bi_config_file: str) -> dict[str, Any]:
        """Load BI configuration from a YAML file.

        Args:
            bi_config_file (str): Path to the BI configuration file.

        Returns:
            Dict[str, Any]: The loaded BI configuration as a dictionary.

        Raises:
            ImportError: If pyyaml is not installed.
            FileNotFoundError: If the BI configuration file cannot be found.
        """
        if not find_spec("yaml"):
            raise ImportError("Please install pyyaml to use this feature.")

        import yaml

        bi_config_data = {}

        try:
            with open(bi_config_file, encoding="utf-8") as file:
                bi_config_data = yaml.safe_load(file) or {}
        except FileNotFoundError:
            log(f"Warning: BI config file '{bi_config_file}' not found. Ignore load BI config from yaml file.")
        except yaml.YAMLError as e:
            log(f"Warning: Invalid YAML in BI config file '{bi_config_file}': {e}. Using empty config.")
        except Exception as e:
            log(f"Warning: Failed to read BI config file '{bi_config_file}': {e}. Using empty config.")

        return bi_config_data

    def set(self, config: dict[str, Any]) -> None:
        """Set the configuration from a dictionary.

        Args:
            config (Dict[str, Any]): Dictionary containing configuration values.
        """
        self._process_config_dict(config)
        self._config = Config.from_dict(config)
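
# Editor's aside: the dynamic import in _instantiate_from_config_dict boils
# down to this standalone helper (a sketch, not repo code):

```python
import importlib


def load_class(class_path: str):
    # Split "package.module.ClassName" and resolve the attribute on the module
    if "." not in class_path:
        raise ValueError(f"Invalid class path format: {class_path}")
    module_name, class_name = class_path.rsplit(".", 1)
    return getattr(importlib.import_module(module_name), class_name)
```

# With this helper, load_class("langchain_openai.ChatOpenAI")(**params) is
# the whole of the instantiation step the config loader performs.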


================================================
FILE: openchatbi/constants.py
================================================
"""Constants used throughout the OpenChatBI application."""

# Date/time format strings
datetime_format = "%Y-%m-%d %H:%M:%S"
date_format = "%Y-%m-%d"
datetime_format_ms = "%Y-%m-%d %H:%M:%S.%f"
datetime_format_ms_T = "%Y-%m-%dT%H:%M:%S.%fZ"

# SQL execution status codes
SQL_NA = "SQL_NA"
SQL_SUCCESS = "SQL_SUCCESS"
SQL_EXECUTE_TIMEOUT = "SQL_CHECK_TIMEOUT"
SQL_SYNTAX_ERROR = "SQL_SYNTAX_ERROR"
SQL_UNKNOWN_ERROR = "SQL_UNKNOWN_ERROR"


MCP_TOOL_DEFAULT_TIMEOUT_SECONDS = 60


================================================
FILE: openchatbi/context_config.py
================================================
"""Configuration for context management settings."""

from dataclasses import dataclass

from openchatbi import config


@dataclass
class ContextConfig:
    """Configuration class for context management settings."""

    # Enable/disable context management
    enabled: bool = True

    # Token limits for triggering context management
    summary_trigger_tokens: int = 12000

    # Message retention (how many recent messages to always preserve)
    keep_recent_messages: int = 20

    # Historical tool output compression limits
    max_tool_output_length: int = 2000  # Max length for historical tool outputs
    max_sql_result_rows: int = 50  # Max rows to keep in CSV results
    max_code_output_lines: int = 50  # Max lines for code execution output

    # Conversation summarization
    enable_summarization: bool = True
    enable_conversation_summary: bool = True
    summary_max_messages: int = 50  # Max messages to include in summary context

    # Content preservation settings
    preserve_tool_errors: bool = True  # Always preserve error messages in full
    preserve_recent_sql: bool = True  # Preserve SQL content (less aggressive compression)


def get_context_config() -> ContextConfig:
    """Get the current context configuration.

    This function loads context configuration from the main config system.
    Falls back to default configuration if not available.

    Returns:
        ContextConfig: The current context configuration
    """
    try:
        main_config = config.get()

        # Check if context_config exists in the main config
        if hasattr(main_config, "context_config") and main_config.context_config:
            context_config_dict = main_config.context_config
            # Create ContextConfig from the loaded configuration
            context_config = ContextConfig()
            for key, value in context_config_dict.items():
                if hasattr(context_config, key):
                    setattr(context_config, key, value)
            return context_config
    except (ImportError, ValueError, AttributeError):
        # Fall back to default if config system is not available or configured
        pass

    return ContextConfig()


def update_context_config(**kwargs) -> ContextConfig:
    """Update context configuration with new values.

    Args:
        **kwargs: Configuration parameters to update

    Returns:
        ContextConfig: Updated configuration
    """
    context_config = get_context_config()
    for key, value in kwargs.items():
        if hasattr(context_config, key):
            setattr(context_config, key, value)
    return context_config
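Both `get_context_config` and `update_context_config` rely on the same `hasattr`-guarded override loop: only keys matching known dataclass fields are applied, and unknown keys are silently ignored. A self-contained sketch of that pattern, using a simplified `ContextConfig` with just three of the real fields:

```python
from dataclasses import dataclass


@dataclass
class ContextConfig:
    """Simplified stand-in with a subset of the real fields."""

    enabled: bool = True
    summary_trigger_tokens: int = 12000
    keep_recent_messages: int = 20


def apply_overrides(cfg: ContextConfig, overrides: dict) -> ContextConfig:
    """Copy only keys that match known fields; unknown keys are dropped."""
    for key, value in overrides.items():
        if hasattr(cfg, key):
            setattr(cfg, key, value)
    return cfg
```

Because unrecognized keys are dropped rather than rejected, a typo in a config file degrades silently; callers who need strict validation would have to check keys against `dataclasses.fields` instead.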


================================================
FILE: openchatbi/context_manager.py
================================================
"""Context management utilities for handling long conversations."""

import json
import re
import uuid

from langchain_core.language_models import BaseChatModel
from langchain_core.messages import AIMessage, BaseMessage, HumanMessage, SystemMessage, ToolMessage

from openchatbi.context_config import ContextConfig, get_context_config
from openchatbi.llm.llm import call_llm_chat_model_with_retry
from openchatbi.prompts.system_prompt import get_summary_prompt_template
from openchatbi.utils import log


class ContextManager:
    """Manages conversation context to prevent token limit issues."""

    def __init__(self, llm: BaseChatModel, config: ContextConfig | None = None):
        """Initialize context manager.

        Args:
            llm: Language model for summarization
            config: Context configuration. If None, uses default config.
        """
        self.llm = llm
        self.config = config or get_context_config()

    # ============================================================================
    # PUBLIC API METHODS
    # ============================================================================

    def manage_context_messages(self, messages: list) -> None:
        """Main context management function that directly modifies messages list.

        Args:
            messages: The list of messages to manage (modified in place)
        """
        if not self.config.enabled:
            return

        if not messages:
            return

        # Check if we need to manage context
        estimated_tokens = self.estimate_message_tokens(messages)
        if estimated_tokens <= self.config.summary_trigger_tokens:
            return  # No action needed

        log(f"Context management triggered: {estimated_tokens} tokens > {self.config.summary_trigger_tokens}")

        # Apply historical tool message compression directly
        self._compress_historical_tool_messages(messages)

        # Check if we still need summarization after compression
        remaining_tokens = self.estimate_message_tokens(messages)
        if remaining_tokens > self.config.summary_trigger_tokens and self.config.enable_summarization:
            self._apply_conversation_summarization(messages)

        log("Context management completed")

    # ============================================================================
    # TOKEN ESTIMATION METHODS
    # ============================================================================

    @staticmethod
    def estimate_tokens(text: str) -> int:
        """Rough token estimation (1 token ≈ 4 characters for most languages)."""
        return len(text) // 4

    def estimate_message_tokens(self, messages: list[BaseMessage]) -> int:
        """Estimate total tokens in a list of messages."""
        total = 0
        for msg in messages:
            total += self.estimate_tokens(str(msg.content))
            # Add tokens for metadata and structure
            total += 50
        return total

    # ============================================================================
    # TOOL OUTPUT TRIMMING METHODS
    # ============================================================================

    def trim_tool_output(self, content: str, tool_name: str = "") -> str:
        """Trim tool output to manageable size while preserving key information."""
        if len(content) <= self.config.max_tool_output_length:
            return content

        # Preserve full error messages if configured
        if self.config.preserve_tool_errors and ("Error:" in content or "Traceback" in content):
            return content

        # For SQL results, preserve structure
        if "```sql" in content or "```csv" in content:
            return self._trim_structured_output(content)

        # For code execution results
        if "```python" in content or "Traceback" in content:
            return self._trim_code_output(content)

        # Generic trimming
        max_len = self.config.max_tool_output_length
        trimmed = content[: max_len // 2] + "\n\n... [Output truncated] ...\n\n" + content[-max_len // 2 :]
        return trimmed

    def _trim_structured_output(self, content: str) -> str:
        """Trim SQL/CSV output while preserving structure."""
        parts = []

        # Extract SQL query (always keep)
        sql_match = re.search(r"```sql\n(.*?)\n```", content, re.DOTALL)
        if sql_match:
            parts.append(f"```sql\n{sql_match.group(1)}\n```")

        # Extract and trim CSV data
        csv_match = re.search(r"```csv\n(.*?)\n```", content, re.DOTALL)
        if csv_match:
            csv_data = csv_match.group(1)
            lines = csv_data.split("\n")
            max_rows = self.config.max_sql_result_rows

            if len(lines) > max_rows:  # Keep header + first half + last quarter
                keep_start = max_rows // 2
                keep_end = max_rows // 4
                trimmed_csv = "\n".join(
                    lines[: keep_start + 1]
                    + [f"... [{len(lines) - keep_start - keep_end - 1} rows omitted] ..."]
                    + lines[-keep_end:]
                )
                parts.append(f"```csv\n{trimmed_csv}\n```")
            else:
                parts.append(f"```csv\n{csv_data}\n```")

        # Keep visualization info
        viz_match = re.search(r"Visualization Created:.*", content)
        if viz_match:
            parts.append(viz_match.group(0))

        return "\n\n".join(parts)

    def _trim_code_output(self, content: str) -> str:
        """Trim Python code execution output."""
        # Keep error messages (full) if configured
        if self.config.preserve_tool_errors and ("Traceback" in content or "Error:" in content):
            return content

        lines = content.split("\n")
        max_lines = self.config.max_code_output_lines

        if len(lines) <= max_lines:
            return content

        # Keep first half and last quarter
        keep_start = max_lines // 2
        keep_end = max_lines // 4
        return "\n".join(lines[:keep_start] + ["... [Output truncated] ..."] + lines[-keep_end:])

    # ============================================================================
    # CONVERSATION SUMMARIZATION METHODS
    # ============================================================================

    def summarize_conversation(self, messages: list[BaseMessage]) -> str:
        """Create a summary of conversation history."""
        if not self.config.enable_conversation_summary:
            return ""

        # Filter out system messages for summarization
        # Note: The messages passed in are already historical messages (split point already calculated)
        messages_to_summarize = []
        for msg in messages:
            if not isinstance(msg, SystemMessage):
                messages_to_summarize.append(msg)

        if not messages_to_summarize:
            return ""

        # Create summarization prompt
        conversation_text = self._format_messages_for_summary(messages_to_summarize)

        # Get the summary prompt template from the file and replace placeholder
        summary_prompt = get_summary_prompt_template().replace("[conversation_text]", conversation_text)

        try:
            response = call_llm_chat_model_with_retry(
                self.llm, [HumanMessage(content=summary_prompt)], parallel_tool_call=False
            )

            if isinstance(response, AIMessage):
                return f"[Conversation Summary]: {response.content}"
            return "[Summary generation failed]"

        except Exception as e:
            log(f"Failed to generate conversation summary: {e}")
            return "[Summary generation failed]"

    def _truncate_text(self, text: str, truncate_len: int = 500) -> str:
        # do not truncate Conversation Summary
        if text.startswith("[Conversation Summary]"):
            return text
        if len(text) > truncate_len:
            return text[:truncate_len] + "... [truncated]"
        return text

    def _truncate_text_or_list(self, content):
        results = []
        if isinstance(content, str):
            results.append(self._truncate_text(content))
        elif isinstance(content, list):
            for item in content:
                if isinstance(item, str):
                    results.append(self._truncate_text(item))
                elif isinstance(item, dict):
                    if item.get("type") == "text":
                        results.append(self._truncate_text(item["text"]))
                    elif item.get("type") == "tool_use":
                        results.append(json.dumps(item))
        return results

    def _format_messages_for_summary(self, messages: list[BaseMessage]) -> str:
        """Format messages for summary generation."""
        formatted = []
        max_messages = self.config.summary_max_messages

        # Limit messages for summary context
        for msg in messages[-max_messages:]:
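The file is cut off here by the preview, but the trimming arithmetic used throughout `ContextManager` is small enough to exercise standalone. A sketch mirroring the keep-first-half / keep-last-quarter logic of `_trim_code_output` (a re-implementation for illustration, not the repo's own function):

```python
def trim_lines(lines: list[str], max_lines: int) -> list[str]:
    """Keep the first half and last quarter of the budget, like
    ContextManager._trim_code_output, inserting a truncation marker."""
    if len(lines) <= max_lines:
        return lines
    keep_start = max_lines // 2  # first half of the budget
    keep_end = max_lines // 4    # last quarter of the budget
    return lines[:keep_start] + ["... [Output truncated] ..."] + lines[-keep_end:]
```

With the default `max_code_output_lines` of 50, a 100-line output shrinks to 25 leading lines, one marker, and 12 trailing lines.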
SYMBOL INDEX (728 symbols across 73 files)

FILE: evals/runledger/agent/agent.py
  function _safe_print (line 21) | def _safe_print(*args: Any, **kwargs: Any) -> None:
  class JsonlChannel (line 32) | class JsonlChannel:
    method __init__ (line 33) | def __init__(self, stream: Any) -> None:
    method read (line 36) | def read(self) -> dict[str, Any] | None:
    method send (line 50) | def send(payload: dict[str, Any]) -> None:
  function _last_user_text (line 55) | def _last_user_text(messages: list[Any]) -> str:
  function _runledger_tool_call (line 62) | def _runledger_tool_call(channel: JsonlChannel, name: str, args: dict[st...
  class SearchKnowledgeInput (line 78) | class SearchKnowledgeInput(BaseModel):
  class ShowSchemaInput (line 85) | class ShowSchemaInput(BaseModel):
  class Text2SQLInput (line 90) | class Text2SQLInput(BaseModel):
  class RunPythonInput (line 95) | class RunPythonInput(BaseModel):
  class SaveReportInput (line 100) | class SaveReportInput(BaseModel):
  function _build_tool_proxies (line 106) | def _build_tool_proxies(channel: JsonlChannel) -> dict[str, StructuredTo...
  function _stub_llm_call (line 186) | def _stub_llm_call(chat_model: Any, messages: list[Any], **_kwargs: Any)...
  function _configure_agent_graph (line 204) | def _configure_agent_graph(channel: JsonlChannel) -> None:
  function _bootstrap_config (line 219) | def _bootstrap_config() -> None:
  function main (line 229) | def main() -> int:

FILE: evals/runledger/tools.py
  function _invoke_tool (line 8) | def _invoke_tool(tool, args: dict[str, Any]) -> Any:
  function _search_knowledge (line 12) | def _search_knowledge(args: dict[str, Any]) -> Any:

FILE: openchatbi/__init__.py
  function get_default_graph (line 18) | def get_default_graph():

FILE: openchatbi/agent_graph.py
  function get_mcp_servers (line 43) | def get_mcp_servers():
  function ask_human (line 51) | def ask_human(state: AgentState) -> dict[str, Any]:
  class CallSQLGraphInput (line 72) | class CallSQLGraphInput(BaseModel):
  function _format_sql_response (line 97) | def _format_sql_response(sql_graph_response: dict) -> str:
  function get_sql_tools (line 128) | def get_sql_tools(sql_graph: CompiledStateGraph, sync_mode: bool = False...
  function agent_llm_call (line 185) | def agent_llm_call(llm: BaseChatModel, tools: list, context_manager: Con...
  function _build_graph_core (line 273) | def _build_graph_core(
  function build_agent_graph_sync (line 373) | def build_agent_graph_sync(
  function build_agent_graph_async (line 406) | async def build_agent_graph_async(

FILE: openchatbi/catalog/catalog_loader.py
  class DataCatalogLoader (line 12) | class DataCatalogLoader:
    method __init__ (line 17) | def __init__(self, engine: Engine, include_tables: list[str] | None = ...
    method get_tables_and_columns (line 30) | def get_tables_and_columns(self) -> dict[str, list[dict[str, Any]]]:
    method get_table_indexes (line 84) | def get_table_indexes(self, table_name: str) -> list[dict[str, Any]]:
    method get_foreign_keys (line 101) | def get_foreign_keys(self, table_name: str) -> list[dict[str, Any]]:
    method save_to_catalog_store (line 118) | def save_to_catalog_store(
  function load_catalog_from_data_warehouse (line 184) | def load_catalog_from_data_warehouse(catalog_store: CatalogStore) -> bool:

FILE: openchatbi/catalog/catalog_store.py
  class CatalogStore (line 7) | class CatalogStore(ABC):
    method get_data_warehouse_config (line 24) | def get_data_warehouse_config(self) -> dict:
    method get_sql_engine (line 34) | def get_sql_engine(self) -> Engine:
    method get_database_list (line 44) | def get_database_list(self) -> list[str]:
    method get_table_list (line 54) | def get_table_list(self, database: str | None = None) -> list[str]:
    method get_column_list (line 67) | def get_column_list(self, table: str | None = None, database: str | No...
    method get_table_information (line 81) | def get_table_information(self, table: str, database: str | None = Non...
    method get_sql_examples (line 95) | def get_sql_examples(
    method get_table_selection_examples (line 111) | def get_table_selection_examples(self) -> list[tuple[str, list[str]]]:
    method save_table_information (line 121) | def save_table_information(
    method save_table_sql_examples (line 146) | def save_table_sql_examples(self, table: str, examples: list[dict[str,...
    method save_table_selection_examples (line 161) | def save_table_selection_examples(self, examples: list[tuple[str, list...
    method check_exists (line 174) | def check_exists(self) -> bool:
  function split_db_table_name (line 184) | def split_db_table_name(table: str, database: str | None = None) -> tupl...

FILE: openchatbi/catalog/factory.py
  function create_catalog_store (line 12) | def create_catalog_store(
  function _auto_load_catalog_if_needed (line 46) | def _auto_load_catalog_if_needed(catalog_store: CatalogStore) -> None:

FILE: openchatbi/catalog/helper.py
  function get_requests_session (line 10) | def get_requests_session(token: str, header_extra_params: dict) -> reque...
  function create_sqlalchemy_engine_instance (line 19) | def create_sqlalchemy_engine_instance(data_warehouse_config: dict[str, A...

FILE: openchatbi/catalog/retrival_helper.py
  function get_columns_metadata (line 10) | def get_columns_metadata(catalog):
  function build_column_tables_mapping (line 40) | def build_column_tables_mapping(catalog):
  function build_columns_retriever (line 52) | def build_columns_retriever(catalog, vector_db_path: str = None):

FILE: openchatbi/catalog/schema_retrival.py
  function column_retrieval (line 30) | def column_retrieval(query, db, k=10, threshold=0.5, filter=None):
  function merge_list (line 51) | def merge_list(list1, list2):
  function edit_distance_score (line 55) | def edit_distance_score(key1, key2):
  function edit_distance_search (line 66) | def edit_distance_search(keywords_list, top_k=10, threshold=0.5):
  function bm25_search (line 97) | def bm25_search(query_list, top_k=5, score_threshold=0.5):
  function get_relevant_columns (line 119) | def get_relevant_columns(keywords_list, dimensions, metrics):

FILE: openchatbi/catalog/store/file_system.py
  class FileSystemCatalogStore (line 19) | class FileSystemCatalogStore(CatalogStore):
    method __init__ (line 43) | def __init__(self, data_path: str, data_warehouse_config: dict):
    method _clear_cache (line 90) | def _clear_cache(self) -> None:
    method get_data_warehouse_config (line 102) | def get_data_warehouse_config(self) -> dict:
    method get_sql_engine (line 105) | def get_sql_engine(self) -> Engine:
    method _validate_table_name (line 110) | def _validate_table_name(self, table: str) -> bool:
    method _validate_column_data (line 133) | def _validate_column_data(self, columns: list[dict[str, Any]]) -> bool:
    method _validate_table_information (line 172) | def _validate_table_information(self, information: dict[str, Any]) -> ...
    method _validate_sql_examples (line 198) | def _validate_sql_examples(self, examples: list[dict[str, str]]) -> bool:
    method _load_yaml_file (line 234) | def _load_yaml_file(file_path: str) -> dict:
    method _load_csv_file (line 259) | def _load_csv_file(file_path: str) -> list[dict[str, str]]:
    method _save_yaml_file (line 287) | def _save_yaml_file(file_path: str, data: dict) -> bool:
    method _save_csv_file (line 308) | def _save_csv_file(file_path: str, data: list[dict[str, str]], headers...
    method _load_tables (line 347) | def _load_tables(self) -> dict[str, list[str]]:
    method _load_common_columns (line 364) | def _load_common_columns(self) -> dict[str, dict[str, Any]]:
    method _load_table_spec_columns (line 381) | def _load_table_spec_columns(self) -> dict[str, dict[str, Any]]:
    method _parse_example_text (line 404) | def _parse_example_text(self, example_text: str) -> list[tuple[str, st...
    method get_database_list (line 447) | def get_database_list(self) -> list[str]:
    method _get_all_table_schema (line 456) | def _get_all_table_schema(self) -> dict[str, list[str]]:
    method get_table_list (line 467) | def get_table_list(self, database: str | None = None) -> list[str]:
    method _get_common_columns (line 481) | def _get_common_columns(self) -> dict[str, dict[str, Any]]:
    method _get_table_spec_columns (line 492) | def _get_table_spec_columns(self) -> dict[str, dict[str, Any]]:
    method get_column_list (line 503) | def get_column_list(self, table: str | None = None, database: str | No...
    method get_table_information (line 539) | def get_table_information(self, table: str, database: str | None = Non...
    method get_sql_examples (line 551) | def get_sql_examples(
    method _load_table_selection_examples_from_csv (line 581) | def _load_table_selection_examples_from_csv(file_path: str) -> list[tu...
    method get_table_selection_examples (line 596) | def get_table_selection_examples(self) -> list[tuple[str, list[str]]]:
    method save_table_information (line 603) | def save_table_information(
    method _save_columns (line 644) | def _save_columns(
    method save_table_sql_examples (line 729) | def save_table_sql_examples(self, table: str, examples: list[dict[str,...
    method save_table_selection_examples (line 763) | def save_table_selection_examples(self, examples: list[tuple[str, list...
    method check_exists (line 774) | def check_exists(self) -> bool:

FILE: openchatbi/catalog/token_service.py
  class TokenService (line 8) | class TokenService:
    method __init__ (line 20) | def __init__(self, user_name: str, password: str):
    method apply_token (line 25) | def apply_token(self):
  function apply_token_for_user (line 34) | def apply_token_for_user(token_url: str, user_name: str, password: str):

FILE: openchatbi/code/docker_executor.py
  function check_docker_status (line 13) | def check_docker_status() -> tuple[bool, str]:
  class DockerExecutor (line 44) | class DockerExecutor(ExecutorBase):
    method __init__ (line 47) | def __init__(self, variable: dict = None):
    method _check_docker_availability (line 63) | def _check_docker_availability():
    method _handle_docker_error (line 88) | def _handle_docker_error(error: Exception):
    method _ensure_image_exists (line 107) | def _ensure_image_exists(self):
    method run_code (line 121) | def run_code(self, code: str) -> tuple[bool, str]:
    method __del__ (line 172) | def __del__(self):

FILE: openchatbi/code/executor_base.py
  class ExecutorBase (line 4) | class ExecutorBase:
    method __init__ (line 9) | def __init__(self, variable: dict = None):
    method run_code (line 15) | def run_code(self, code: str) -> (bool, str):
    method set_variable (line 19) | def set_variable(self, key: str, value: Any) -> None:

FILE: openchatbi/code/local_executor.py
  class LocalExecutor (line 7) | class LocalExecutor(ExecutorBase):
    method run_code (line 9) | def run_code(self, code: str) -> str:

FILE: openchatbi/code/restricted_local_executor.py
  class RestrictedLocalExecutor (line 10) | class RestrictedLocalExecutor(ExecutorBase):
    method run_code (line 12) | def run_code(self, code: str) -> (bool, str):

FILE: openchatbi/config_loader.py
  class LLMProviderConfig (line 14) | class LLMProviderConfig(BaseModel):
  class Config (line 24) | class Config(BaseModel):
    method from_dict (line 82) | def from_dict(cls, config: dict[str, Any]) -> "Config":
  class ConfigLoader (line 94) | class ConfigLoader:
    method __new__ (line 104) | def __new__(cls):
    method get (line 111) | def get(self) -> Config:
    method load (line 124) | def load(self, config_file: str = None) -> None:
    method _process_config_dict (line 159) | def _process_config_dict(self, config_data: dict[str, Any]) -> None:
    method _instantiate_from_config_dict (line 234) | def _instantiate_from_config_dict(self, config_item: dict[str, Any], *...
    method _process_llm_providers (line 247) | def _process_llm_providers(self, config_data: dict[str, Any]) -> None:
    method load_bi_config (line 279) | def load_bi_config(self, bi_config_file: str) -> dict[str, Any]:
    method set (line 312) | def set(self, config: dict[str, Any]) -> None:

FILE: openchatbi/context_config.py
  class ContextConfig (line 9) | class ContextConfig:
  function get_context_config (line 36) | def get_context_config() -> ContextConfig:
  function update_context_config (line 64) | def update_context_config(**kwargs) -> ContextConfig:

FILE: openchatbi/context_manager.py
  class ContextManager (line 16) | class ContextManager:
    method __init__ (line 19) | def __init__(self, llm: BaseChatModel, config: ContextConfig = None):
    method manage_context_messages (line 33) | def manage_context_messages(self, messages: list) -> None:
    method estimate_tokens (line 67) | def estimate_tokens(text: str) -> int:
    method estimate_message_tokens (line 71) | def estimate_message_tokens(self, messages: list[BaseMessage]) -> int:
    method trim_tool_output (line 84) | def trim_tool_output(self, content: str, tool_name: str = "") -> str:
    method _trim_structured_output (line 106) | def _trim_structured_output(self, content: str) -> str:
    method _trim_code_output (line 141) | def _trim_code_output(self, content: str) -> str:
    method summarize_conversation (line 162) | def summarize_conversation(self, messages: list[BaseMessage]) -> str:
    method _truncate_text (line 196) | def _truncate_text(self, text: str, truncate_len: int = 500) -> str:
    method _truncate_text_or_list (line 204) | def _truncate_text_or_list(self, content):
    method _format_messages_for_summary (line 219) | def _format_messages_for_summary(self, messages: list[BaseMessage]) ->...
    method _compress_historical_tool_messages (line 247) | def _compress_historical_tool_messages(self, messages: list[BaseMessag...
    method _apply_conversation_summarization (line 274) | def _apply_conversation_summarization(self, messages: list[BaseMessage...
    method _find_safe_split_point (line 306) | def _find_safe_split_point(self, messages: list[BaseMessage]) -> int:
    method _should_compress_historical_tool_message (line 333) | def _should_compress_historical_tool_message(self, tool_msg: ToolMessa...
    method _is_error_content (line 366) | def _is_error_content(self, content: str) -> bool:
    method _is_sql_content (line 390) | def _is_sql_content(self, content: str) -> bool:
    method _is_data_query_result (line 406) | def _is_data_query_result(self, content: str) -> bool:
    method _is_python_execution_result (line 420) | def _is_python_execution_result(self, content: str) -> bool:

FILE: openchatbi/graph_state.py
  function add_history_messages (line 10) | def add_history_messages(left: list, right: list):
  class AgentState (line 18) | class AgentState(MessagesState):
  class SQLGraphState (line 31) | class SQLGraphState(MessagesState):
  class InputState (line 49) | class InputState(MessagesState):
  class OutputState (line 55) | class OutputState(MessagesState):
  class SQLOutputState (line 61) | class SQLOutputState(MessagesState):

FILE: openchatbi/llm/llm.py
  function list_llm_providers (line 13) | def list_llm_providers() -> list[str]:
  function _get_provider_config (line 22) | def _get_provider_config(provider: str | None):
  function get_embedding_model (line 34) | def get_embedding_model(provider: str | None = None):
  function get_default_llm (line 42) | def get_default_llm(provider: str | None = None):
  function get_llm (line 50) | def get_llm(provider: str | None = None):
  function get_text2sql_llm (line 55) | def get_text2sql_llm(provider: str | None = None):
  function _invalid_tool_names (line 63) | def _invalid_tool_names(valid_tools, tool_calls) -> str:
  function call_llm_chat_model_with_retry (line 71) | def call_llm_chat_model_with_retry(

FILE: openchatbi/prompts/system_prompt.py
  function get_basic_knowledge (line 17) | def get_basic_knowledge():
  function get_data_warehouse_introduction (line 25) | def get_data_warehouse_introduction():
  function get_agent_extra_tool_use_rule (line 33) | def get_agent_extra_tool_use_rule():
  function get_organization (line 41) | def get_organization():
  function get_dialect_rules (line 49) | def get_dialect_rules():
  function get_agent_prompt_template (line 65) | def get_agent_prompt_template() -> str:
  function get_extraction_prompt_template (line 80) | def get_extraction_prompt_template() -> str:
  function get_table_selection_prompt_template (line 93) | def get_table_selection_prompt_template() -> str:
  function get_text2sql_prompt_template (line 105) | def get_text2sql_prompt_template() -> str:
  function get_visualization_prompt_template (line 119) | def get_visualization_prompt_template() -> str:
  function get_summary_prompt_template (line 128) | def get_summary_prompt_template() -> str:
  function get_text2sql_dialect_prompt_template (line 137) | def get_text2sql_dialect_prompt_template(dialect: str) -> str:
  function reset_cache (line 148) | def reset_cache():
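The pairing of many `get_*_prompt_template` accessors with a single `reset_cache` above suggests memoized template assembly that can be invalidated when configuration changes. The sketch below shows that caching pattern with `functools.lru_cache`; the template text is a stub assumption.

```python
from functools import lru_cache

# Hypothetical sketch of the cache-plus-reset pattern implied by the
# get_*_prompt_template / reset_cache pairing above.

@lru_cache(maxsize=None)
def get_agent_prompt_template() -> str:
    # The real module presumably assembles basic knowledge, organization
    # info, and tool-use rules; this body is a placeholder.
    return "You are a BI agent.\n{basic_knowledge}\n{tool_rules}"

def reset_cache() -> None:
    """Drop cached templates so updated config takes effect on next call."""
    get_agent_prompt_template.cache_clear()
```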

FILE: openchatbi/text2sql/extraction.py
  function generate_extraction_prompt (line 17) | def generate_extraction_prompt() -> str:
  function parse_extracted_info_json (line 31) | def parse_extracted_info_json(llm_answer_content: Any) -> dict[str, Any]:
  function information_extraction (line 49) | def information_extraction(llm: BaseChatModel) -> Callable:
  function information_extraction_conditional_edges (line 95) | def information_extraction_conditional_edges(state: SQLGraphState):

FILE: openchatbi/text2sql/generate_sql.py
  function create_sql_nodes (line 34) | def create_sql_nodes(
  function should_retry_sql (line 354) | def should_retry_sql(state: SQLGraphState) -> str:
  function should_execute_sql (line 387) | def should_execute_sql(state: SQLGraphState) -> str:

FILE: openchatbi/text2sql/schema_linking.py
  function schema_linking (line 17) | def schema_linking(llm: BaseChatModel, catalog: CatalogStore):

FILE: openchatbi/text2sql/sql_graph.py
  function ask_human (line 23) | def ask_human(state):
  function should_generate_visualization_or_retry (line 40) | def should_generate_visualization_or_retry(state: SQLGraphState) -> str:
  function build_sql_graph (line 61) | def build_sql_graph(

FILE: openchatbi/text2sql/text2sql_utils.py
  function init_sql_example_retriever (line 7) | def init_sql_example_retriever(catalog, vector_db_path: str = None):
  function init_table_selection_example_dict (line 34) | def init_table_selection_example_dict(catalog, vector_db_path: str = None):

FILE: openchatbi/text2sql/visualization.py
  class ChartType (line 15) | class ChartType(Enum):
  class VisualizationConfig (line 29) | class VisualizationConfig:
  class VisualizationDSL (line 46) | class VisualizationDSL:
    method to_dict (line 54) | def to_dict(self) -> dict[str, Any]:
  class VisualizationService (line 64) | class VisualizationService:
    method __init__ (line 79) | def __init__(self, llm: BaseChatModel | None = None):
    method _get_chart_type_by_rule (line 87) | def _get_chart_type_by_rule(self, question: str, schema_info: dict[str...
    method generate_visualization_dsl (line 129) | def generate_visualization_dsl(
    method _llm_recommend_chart_type (line 246) | def _llm_recommend_chart_type(self, question: str, schema_info: dict[s...
    method generate_visualization (line 278) | def generate_visualization(

FILE: openchatbi/text_segmenter.py
  class TextSegmenter (line 19) | class TextSegmenter:
    method __init__ (line 30) | def __init__(self, use_jieba: bool = True):
    method _contains_chinese (line 47) | def _contains_chinese(text: str) -> bool:
    method _simple_cut (line 58) | def _simple_cut(self, text: str) -> list[str]:
    method cut (line 74) | def cut(self, text: str) -> list[str]:
  class SimpleSegmenter (line 97) | class SimpleSegmenter:
    method __init__ (line 107) | def __init__(self):
    method cut (line 114) | def cut(self, text: str) -> list[str]:
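`TextSegmenter` above exposes `_contains_chinese`, `_simple_cut`, and a `use_jieba` flag, which implies CJK-aware segmentation with a plain fallback when jieba is unavailable. A self-contained sketch of that fallback path follows; the regex ranges and tokenization rule are illustrative assumptions.

```python
import re

# Hypothetical sketch of the fallback segmentation path implied by
# TextSegmenter above: detect CJK text, emit each CJK character as its
# own token, and group ASCII word characters.

CJK = re.compile(r"[\u4e00-\u9fff]")

def contains_chinese(text: str) -> bool:
    return bool(CJK.search(text))

def simple_cut(text: str) -> list[str]:
    """Single CJK chars and ASCII word runs become tokens; the rest is dropped."""
    return re.findall(r"[\u4e00-\u9fff]|[A-Za-z0-9_]+", text)

def cut(text: str) -> list[str]:
    # A real implementation would try `import jieba` first and only
    # fall back to simple_cut when jieba is missing or use_jieba=False.
    return simple_cut(text)
```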

FILE: openchatbi/tool/ask_human.py
  class AskHuman (line 6) | class AskHuman(BaseModel):

FILE: openchatbi/tool/mcp_tools.py
  function make_tool_sync_compatible (line 21) | def make_tool_sync_compatible(tool: StructuredTool, timeout: int) -> Str...
  class MCPServerConfig (line 81) | class MCPServerConfig(BaseModel):
  function create_mcp_tools_async (line 101) | async def create_mcp_tools_async(server_configs: list[dict[str, Any]]) -...
  function create_mcp_tools_sync (line 195) | def create_mcp_tools_sync(server_configs: list[dict[str, Any]]) -> list[...
  function get_mcp_tools_async (line 237) | async def get_mcp_tools_async(server_configs: list[dict[str, Any]]) -> l...
  function reset_mcp_tools_cache (line 254) | def reset_mcp_tools_cache() -> None:
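`make_tool_sync_compatible(tool, timeout)` above implies bridging async MCP tools into a sync call path with a timeout. The real function wraps a langchain `StructuredTool`; this stub instead wraps a bare coroutine function to keep the sketch self-contained, and the `asyncio.run` + `wait_for` combination is an assumption about how the bridge works.

```python
import asyncio

# Hypothetical sketch of wrapping an async callable as a sync one with
# a timeout, as the signature of make_tool_sync_compatible suggests.

def make_sync(async_fn, timeout: float):
    def sync_fn(*args, **kwargs):
        # Runs the coroutine on a fresh event loop; raises
        # asyncio.TimeoutError if it exceeds `timeout` seconds.
        return asyncio.run(asyncio.wait_for(async_fn(*args, **kwargs), timeout))
    return sync_fn

async def _echo(x: str) -> str:
    """Stand-in for an async MCP tool invocation."""
    await asyncio.sleep(0)
    return x

echo_sync = make_sync(_echo, timeout=5.0)
```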

FILE: openchatbi/tool/memory.py
  class UserProfile (line 39) | class UserProfile(BaseModel):
  function get_sync_memory_store (line 48) | def get_sync_memory_store() -> SqliteStore | None:
  function get_async_memory_store (line 72) | async def get_async_memory_store() -> AsyncSqliteStore | None:
  function cleanup_async_memory_store (line 92) | async def cleanup_async_memory_store() -> None:
  function setup_async_memory_store (line 105) | async def setup_async_memory_store() -> Any:
  function fix_schema_for_openai (line 110) | def fix_schema_for_openai(schema: dict) -> None:
  function get_memory_manager (line 130) | def get_memory_manager() -> Any:
  class StructuredToolWithRequired (line 142) | class StructuredToolWithRequired(StructuredTool):
    method __init__ (line 143) | def __init__(self, orig_tool: StructuredTool):
    method tool_call_schema (line 154) | def tool_call_schema(self) -> "ArgsSchema":
  function get_memory_tools (line 166) | def get_memory_tools(
  function get_async_memory_tools (line 188) | async def get_async_memory_tools(llm: BaseChatModel) -> list[StructuredT...

FILE: openchatbi/tool/run_python_code.py
  class PythonCodeInput (line 13) | class PythonCodeInput(BaseModel):
  function _create_executor (line 18) | def _create_executor():
  function run_python_code (line 51) | def run_python_code(reasoning: str, code: str) -> str:

FILE: openchatbi/tool/save_report.py
  class SaveReportInput (line 13) | class SaveReportInput(BaseModel):
  function save_report (line 22) | def save_report(content: str, title: str, file_format: str = "md") -> str:

FILE: openchatbi/tool/search_knowledge.py
  class SearchInput (line 11) | class SearchInput(BaseModel):
  function search_knowledge (line 27) | def search_knowledge(
  class ShowSchemaInput (line 42) | class ShowSchemaInput(BaseModel):
  function show_schema (line 50) | def show_schema(reasoning: str, tables: list[str]) -> list[str]:
  function search_column_from_catalog (line 60) | def search_column_from_catalog(query_list: list[str], with_table_list: b...
  function list_table_from_catalog (line 70) | def list_table_from_catalog(tables: list[str]) -> list[str]:
  function render_column_result (line 92) | def render_column_result(column_list: list[str], with_table_list: bool =...

FILE: openchatbi/tool/timeseries_forecast.py
  class TimeseriesForecastInput (line 16) | class TimeseriesForecastInput(BaseModel):
  function _check_service_health (line 35) | def _check_service_health(service_url: str) -> bool:
  function check_forecast_service_health (line 47) | def check_forecast_service_health() -> bool:
  function _call_timeseries_service (line 56) | def _call_timeseries_service(
  function _format_forecast_result (line 95) | def _format_forecast_result(result: dict[str, Any], reasoning: str, inpu...
  function timeseries_forecast (line 151) | def timeseries_forecast(

FILE: openchatbi/utils.py
  function log (line 22) | def log(args) -> None:
  function get_text_from_content (line 27) | def get_text_from_content(content: str | list[str | dict]) -> str:
  function get_text_from_message_chunk (line 46) | def get_text_from_message_chunk(chunk: AIMessageChunk) -> str:
  function extract_json_from_answer (line 60) | def extract_json_from_answer(answer: str) -> dict:
  function get_report_download_response (line 75) | def get_report_download_response(filename: str) -> FileResponse:
  function _create_chroma_from_texts (line 129) | def _create_chroma_from_texts(
  function create_vector_db (line 148) | def create_vector_db(
  function recover_incomplete_tool_calls (line 223) | def recover_incomplete_tool_calls(state: AgentState) -> list:
  class SimpleStore (line 313) | class SimpleStore(VectorStore):
    method __init__ (line 316) | def __init__(
    method _tokenize (line 344) | def _tokenize(self, text: str) -> list[str]:
    method similarity_search (line 355) | def similarity_search(self, query: str, k: int = 4, **kwargs: Any) -> ...
    method similarity_search_with_score (line 382) | def similarity_search_with_score(self, query: str, k: int = 4, **kwarg...
    method _select_relevance_score_fn (line 409) | def _select_relevance_score_fn(self):
    method add_texts (line 416) | def add_texts(
    method delete (line 460) | def delete(self, ids: list[str] | None = None, **kwargs: Any) -> bool ...
    method get_by_ids (line 495) | def get_by_ids(self, ids: list[str], /) -> list[Document]:
    method from_texts (line 508) | def from_texts(
    method max_marginal_relevance_search (line 531) | def max_marginal_relevance_search(
    method _calculate_similarity (line 611) | def _calculate_similarity(self, doc1: Document, doc2: Document) -> float:
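`SimpleStore` above implements the `VectorStore` interface with `_tokenize` and `similarity_search` but no embedding model, which suggests a keyword-overlap fallback store. The Jaccard scoring below is an assumption for illustration; the real class may rank differently.

```python
# Hypothetical sketch of an embedding-free similarity search like the
# SimpleStore listed above. Jaccard overlap of token sets is an
# assumed scoring function.

def tokenize(text: str) -> set[str]:
    return set(text.lower().split())

def similarity_search(docs: list[str], query: str, k: int = 4) -> list[str]:
    """Return the k documents with the highest token overlap with the query."""
    q = tokenize(query)

    def score(doc: str) -> float:
        d = tokenize(doc)
        union = q | d
        return len(q & d) / len(union) if union else 0.0

    return sorted(docs, key=score, reverse=True)[:k]
```

A store like this trades recall for zero dependencies, which is a sensible fallback when no embedding provider is configured.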

FILE: run_streamlit_ui.py
  function main (line 16) | def main():

FILE: run_tests.py
  function run_command (line 9) | def run_command(cmd, description):
  function main (line 34) | def main():

FILE: sample_api/async_api.py
  function get_or_build_graph (line 25) | async def get_or_build_graph(provider: str | None):
  function lifespan (line 38) | async def lifespan(app: FastAPI):
  class UserRequest (line 50) | class UserRequest(BaseModel):
  function chat_stream (line 60) | async def chat_stream(req: UserRequest):
  function get_user_memories (line 96) | async def get_user_memories(user_id: str):
  function download_report (line 140) | async def download_report(filename: str):

FILE: sample_ui/async_graph_manager.py
  class AsyncGraphManager (line 13) | class AsyncGraphManager:
    method __init__ (line 16) | def __init__(self):
    method initialize (line 24) | async def initialize(self):
    method get_graph (line 52) | async def get_graph(self, llm_provider: str | None = None):
    method cleanup (line 71) | async def cleanup(self):

FILE: sample_ui/memory_ui.py
  function get_thread_memory_store (line 13) | def get_thread_memory_store() -> Any:
  function list_all_memories (line 33) | def list_all_memories() -> list[dict[str, Any]]:
  function format_memories_for_display (line 67) | def format_memories_for_display(memories: list[dict[str, Any]]) -> str:
  function refresh_memories (line 111) | def refresh_memories() -> list[list[str]]:
  function delete_memory_by_key (line 117) | def delete_memory_by_key(namespace_str: str, key: str) -> str:

FILE: sample_ui/plotly_utils.py
  function create_plotly_chart (line 11) | def create_plotly_chart(data_csv: str, visualization_dsl: dict[str, Any]...
  function create_line_chart (line 60) | def create_line_chart(df: pd.DataFrame, config: dict[str, Any], layout: ...
  function create_bar_chart (line 94) | def create_bar_chart(df: pd.DataFrame, config: dict[str, Any], layout: d...
  function create_pie_chart (line 123) | def create_pie_chart(df: pd.DataFrame, config: dict[str, Any], layout: d...
  function create_scatter_chart (line 136) | def create_scatter_chart(df: pd.DataFrame, config: dict[str, Any], layou...
  function create_histogram_chart (line 149) | def create_histogram_chart(df: pd.DataFrame, config: dict[str, Any], lay...
  function create_box_chart (line 162) | def create_box_chart(df: pd.DataFrame, config: dict[str, Any], layout: d...
  function create_table_chart (line 179) | def create_table_chart(df: pd.DataFrame, config: dict[str, Any], layout:...
  function create_empty_chart (line 203) | def create_empty_chart(message: str) -> go.Figure:
  function visualization_dsl_to_gradio_plot (line 225) | def visualization_dsl_to_gradio_plot(data_csv: str, visualization_dsl: d...
  function create_inline_chart_markdown (line 247) | def create_inline_chart_markdown(data_csv: str, visualization_dsl: dict[...
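`create_plotly_chart` above fans out to one `create_*_chart` helper per chart type, implying a dispatch table keyed by the visualization DSL. The sketch below shows that dispatch shape with builder names as strings so it stays runnable without plotly; the DSL key name (`chart_type`) and the table fallback are assumptions.

```python
# Hypothetical sketch of chart-type dispatch over the create_*_chart
# helpers listed above. Builders are named as strings for illustration.

CHART_BUILDERS = {
    "line": "create_line_chart",
    "bar": "create_bar_chart",
    "pie": "create_pie_chart",
    "scatter": "create_scatter_chart",
    "histogram": "create_histogram_chart",
    "box": "create_box_chart",
}

def select_builder(visualization_dsl: dict) -> str:
    """Map a DSL chart_type to a builder, defaulting to a table view."""
    return CHART_BUILDERS.get(
        visualization_dsl.get("chart_type", ""), "create_table_chart"
    )
```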

FILE: sample_ui/simple_ui.py
  function chat_fn (line 32) | def chat_fn(message: str, history: list[tuple[str, str]], user_id: str =...
  function respond (line 77) | def respond(
  function download_report (line 90) | def download_report(filename: str):

FILE: sample_ui/streaming_ui.py
  function lifespan (line 38) | async def lifespan(app: FastAPI):
  function get_or_create_event_loop (line 54) | def get_or_create_event_loop():
  function _async_respond_helper (line 65) | async def _async_respond_helper(message, chat_history, user_id, session_...
  function respond (line 175) | def respond(message, chat_history, user_id, session_id="default"):
  function list_user_memories (line 209) | def list_user_memories(user_id: str) -> str:
  function show_chart_panel (line 325) | def show_chart_panel():
  function hide_chart_panel (line 329) | def hide_chart_panel():
  function download_report (line 359) | async def download_report(filename: str):

FILE: sample_ui/streamlit_ui.py
  function process_user_message_stream (line 42) | async def process_user_message_stream(
  function get_available_reports (line 233) | def get_available_reports() -> list[str]:
  function get_report_file_content (line 255) | def get_report_file_content(filename: str) -> tuple[bytes | None, str | ...
  function process_download_links (line 303) | def process_download_links(content: str) -> str:
  function render_content_with_downloads (line 335) | def render_content_with_downloads(content: str) -> None:
  function display_message_with_thinking (line 365) | def display_message_with_thinking(
  function cleanup_session (line 544) | def cleanup_session():

FILE: tests/conftest.py
  function test_config (line 20) | def test_config() -> dict[str, Any]:
  function temp_dir (line 35) | def temp_dir() -> Generator[Path, None, None]:
  function mock_llm (line 42) | def mock_llm() -> FakeListChatModel:
  function sample_agent_state (line 50) | def sample_agent_state() -> AgentState:
  function mock_catalog_store (line 61) | def mock_catalog_store(temp_dir: Path) -> FileSystemCatalogStore:
  function mock_database_engine (line 113) | def mock_database_engine():
  function sample_table_info (line 127) | def sample_table_info() -> dict[str, Any]:
  function sample_messages (line 142) | def sample_messages() -> list:
  function reset_config_loader (line 152) | def reset_config_loader():
  function mock_config (line 164) | def mock_config():
  function setup_test_env (line 186) | def setup_test_env(monkeypatch, temp_dir):
  class MockTokenService (line 192) | class MockTokenService:
    method __init__ (line 195) | def __init__(self):
    method get_token (line 198) | def get_token(self) -> str:
  function mock_token_service (line 203) | def mock_token_service() -> MockTokenService:
  function sample_sql_examples (line 209) | def sample_sql_examples() -> list:
  function mock_presto_connection (line 222) | def mock_presto_connection():

FILE: tests/context_management/conftest.py
  function mock_llm (line 12) | def mock_llm():
  function mock_llm_with_summary_response (line 20) | def mock_llm_with_summary_response():
  function standard_config (line 28) | def standard_config():
  function minimal_config (line 44) | def minimal_config():
  function disabled_config (line 57) | def disabled_config():
  function sample_conversation (line 65) | def sample_conversation():
  function large_sql_output (line 85) | def large_sql_output():
  function large_python_output (line 107) | def large_python_output():
  function error_output (line 119) | def error_output():
  function pytest_configure (line 132) | def pytest_configure(config):
  function pytest_collection_modifyitems (line 138) | def pytest_collection_modifyitems(config, items):

FILE: tests/context_management/test_agent_graph_integration.py
  class TestAgentGraphIntegration (line 15) | class TestAgentGraphIntegration:
    method mock_catalog (line 19) | def mock_catalog(self):
    method mock_llm (line 26) | def mock_llm(self):
    method mock_tools (line 33) | def mock_tools(self):
    method test_config (line 43) | def test_config(self):
    method test_agent_llm_node_with_context_manager (line 52) | def test_agent_llm_node_with_context_manager(self, mock_llm, mock_tool...
    method test_agent_llm_node_without_context_manager (line 76) | def test_agent_llm_node_without_context_manager(self, mock_llm, mock_t...
    method test_build_graph_core_with_context_management (line 88) | def test_build_graph_core_with_context_management(self, mock_catalog, ...
    method test_build_graph_core_without_context_management (line 135) | def test_build_graph_core_without_context_management(self, mock_catalo...
    method test_build_agent_graph_sync_with_context_management (line 180) | def test_build_agent_graph_sync_with_context_management(self, mock_cat...
    method test_build_agent_graph_async_with_context_management (line 201) | async def test_build_agent_graph_async_with_context_management(self, m...
    method test_full_context_management_flow (line 223) | def test_full_context_management_flow(self, mock_llm_call, mock_catalog):
  class TestContextManagementEdgeCases (line 264) | class TestContextManagementEdgeCases:
    method test_empty_message_handling (line 267) | def test_empty_message_handling(self):
    method test_state_message_type_validation (line 277) | def test_state_message_type_validation(self):
    method test_context_management_with_tool_calls (line 299) | def test_context_management_with_tool_calls(self):
    method test_summarization_failure_fallback (line 320) | def test_summarization_failure_fallback(self, mock_llm_call):

FILE: tests/context_management/test_context_config.py
  class TestContextConfig (line 6) | class TestContextConfig:
    method test_default_config_values (line 9) | def test_default_config_values(self):
    method test_custom_config_values (line 27) | def test_custom_config_values(self):
    method test_config_validation_logic (line 47) | def test_config_validation_logic(self):
    method test_get_context_config (line 63) | def test_get_context_config(self):
    method test_update_context_config_single_value (line 68) | def test_update_context_config_single_value(self):
    method test_update_context_config_multiple_values (line 78) | def test_update_context_config_multiple_values(self):
    method test_update_context_config_invalid_attribute (line 92) | def test_update_context_config_invalid_attribute(self):
    method test_update_context_config_returns_copy (line 98) | def test_update_context_config_returns_copy(self):
  class TestContextConfigPresets (line 108) | class TestContextConfigPresets:
    method test_minimal_context_config (line 111) | def test_minimal_context_config(self):
    method test_aggressive_compression_config (line 124) | def test_aggressive_compression_config(self):
    method test_development_debug_config (line 142) | def test_development_debug_config(self):
    method test_production_optimized_config (line 154) | def test_production_optimized_config(self):
  class TestContextConfigEdgeCases (line 171) | class TestContextConfigEdgeCases:
    method test_zero_values (line 174) | def test_zero_values(self):
    method test_very_large_values (line 187) | def test_very_large_values(self):
    method test_inconsistent_token_limits (line 197) | def test_inconsistent_token_limits(self):
    method test_all_features_disabled (line 202) | def test_all_features_disabled(self):
    method test_config_serialization (line 214) | def test_config_serialization(self):
    method test_config_immutability_simulation (line 229) | def test_config_immutability_simulation(self):
    method test_realistic_configuration_scenarios (line 238) | def test_realistic_configuration_scenarios(self):

FILE: tests/context_management/test_context_manager.py
  class TestContextManager (line 12) | class TestContextManager:
    method mock_llm (line 16) | def mock_llm(self):
    method default_config (line 25) | def default_config(self):
    method context_manager (line 39) | def context_manager(self, mock_llm, default_config):
    method test_token_estimation (line 43) | def test_token_estimation(self, context_manager):
    method test_message_token_estimation (line 53) | def test_message_token_estimation(self, context_manager):
    method test_trim_short_tool_output (line 66) | def test_trim_short_tool_output(self, context_manager):
    method test_trim_long_generic_output (line 72) | def test_trim_long_generic_output(self, context_manager):
    method test_trim_sql_output (line 82) | def test_trim_sql_output(self, context_manager):
    method test_trim_code_output (line 113) | def test_trim_code_output(self, context_manager):
    method test_preserve_error_output (line 122) | def test_preserve_error_output(self, context_manager):
    method test_conversation_summary_disabled (line 140) | def test_conversation_summary_disabled(self, context_manager):
    method test_conversation_summary_success (line 149) | def test_conversation_summary_success(self, mock_llm_call, context_man...
    method test_conversation_summary_failure (line 170) | def test_conversation_summary_failure(self, mock_llm_call, context_man...
    method test_manage_context_disabled (line 187) | def test_manage_context_disabled(self, context_manager):
    method test_manage_context_empty_messages (line 196) | def test_manage_context_empty_messages(self, context_manager):
    method test_manage_context_tool_message_trimming (line 203) | def test_manage_context_tool_message_trimming(self, context_manager):
    method test_manage_context_with_summarization (line 227) | def test_manage_context_with_summarization(self, mock_llm_call, contex...
    method test_format_messages_for_summary (line 256) | def test_format_messages_for_summary(self, context_manager):
    method test_format_long_ai_message_for_summary (line 272) | def test_format_long_ai_message_for_summary(self, context_manager):
  function sample_sql_output (line 288) | def sample_sql_output():
  function sample_error_output (line 318) | def sample_error_output():

FILE: tests/context_management/test_edge_cases.py
  class TestContextManagementEdgeCases (line 12) | class TestContextManagementEdgeCases:
    method edge_case_config (line 16) | def edge_case_config(self):
    method context_manager (line 26) | def context_manager(self, edge_case_config):
    method test_empty_and_none_inputs (line 30) | def test_empty_and_none_inputs(self, context_manager):
    method test_malformed_messages (line 45) | def test_malformed_messages(self, context_manager):
    method test_extremely_long_single_message (line 59) | def test_extremely_long_single_message(self, context_manager):
    method test_tool_message_without_tool_call_id (line 73) | def test_tool_message_without_tool_call_id(self, context_manager):
    method test_circular_references_in_content (line 86) | def test_circular_references_in_content(self, context_manager):
    method test_zero_configuration_values (line 109) | def test_zero_configuration_values(self):
    method test_negative_configuration_values (line 126) | def test_negative_configuration_values(self):
    method test_unicode_and_encoding_edge_cases (line 143) | def test_unicode_and_encoding_edge_cases(self, context_manager):
    method test_extremely_nested_or_complex_structures (line 162) | def test_extremely_nested_or_complex_structures(self, context_manager):
    method test_sql_output_edge_cases (line 183) | def test_sql_output_edge_cases(self, context_manager):
    method test_conversation_state_consistency (line 227) | def test_conversation_state_consistency(self, context_manager):

FILE: tests/context_management/test_runner.py
  function run_tests (line 9) | def run_tests(test_type="all", verbose=False, coverage=False):
  function main (line 72) | def main():

FILE: tests/context_management/test_state_operations.py
  class TestMessageBasedContextManagement (line 12) | class TestMessageBasedContextManagement:
    method test_config (line 16) | def test_config(self):
    method context_manager (line 28) | def context_manager(self, test_config):
    method test_no_operations_when_disabled (line 33) | def test_no_operations_when_disabled(self, context_manager):
    method test_no_operations_when_under_limit (line 43) | def test_no_operations_when_under_limit(self, context_manager):
    method test_historical_tool_compression (line 52) | def test_historical_tool_compression(self, context_manager):
    method test_error_message_preservation (line 89) | def test_error_message_preservation(self, context_manager):
    method test_sql_content_preservation (line 113) | def test_sql_content_preservation(self, context_manager):
    method test_conversation_summarization (line 147) | def test_conversation_summarization(self, mock_llm_call, context_manag...
    method test_content_type_detection (line 183) | def test_content_type_detection(self, context_manager):
    method test_should_compress_logic (line 216) | def test_should_compress_logic(self, context_manager):
    method test_recent_messages_always_preserved (line 237) | def test_recent_messages_always_preserved(self, context_manager):
    method test_message_order_preservation (line 270) | def test_message_order_preservation(self, context_manager):

FILE: tests/test_catalog_loader.py
  class TestDataCatalogLoader (line 10) | class TestDataCatalogLoader:
    method mock_engine (line 14) | def mock_engine(self):
    method test_catalog_loader_initialization (line 19) | def test_catalog_loader_initialization(self, mock_engine):
    method test_catalog_loader_without_include_tables (line 29) | def test_catalog_loader_without_include_tables(self, mock_engine):
    method test_get_tables_and_columns (line 37) | def test_get_tables_and_columns(self, mock_engine):
    method test_get_table_indexes (line 54) | def test_get_table_indexes(self, mock_engine):
    method test_get_foreign_keys (line 66) | def test_get_foreign_keys(self, mock_engine):
    method test_save_to_catalog_store_success (line 80) | def test_save_to_catalog_store_success(self, mock_engine):
    method test_save_to_catalog_store_failure (line 103) | def test_save_to_catalog_store_failure(self, mock_engine):
    method test_load_catalog_from_data_warehouse (line 118) | def test_load_catalog_from_data_warehouse(self):
    method test_error_handling_in_get_tables_and_columns (line 138) | def test_error_handling_in_get_tables_and_columns(self, mock_engine):

FILE: tests/test_catalog_store.py
  class TestCatalogStore (line 9) | class TestCatalogStore:
    method test_catalog_store_is_abstract (line 12) | def test_catalog_store_is_abstract(self):
    method test_catalog_store_interface_methods (line 17) | def test_catalog_store_interface_methods(self):
  class TestFileSystemCatalogStore (line 28) | class TestFileSystemCatalogStore:
    method test_filesystem_store_initialization (line 31) | def test_filesystem_store_initialization(self, temp_dir):
    method test_get_tables_from_csv (line 40) | def test_get_tables_from_csv(self, mock_catalog_store):
    method test_get_columns_from_csv (line 47) | def test_get_columns_from_csv(self, mock_catalog_store):
    method test_get_table_info (line 57) | def test_get_table_info(self, mock_catalog_store):
    method test_get_tables_file_not_found (line 63) | def test_get_tables_file_not_found(self, temp_dir):
    method test_get_columns_file_not_found (line 75) | def test_get_columns_file_not_found(self, temp_dir):
    method test_get_tables_malformed_csv (line 87) | def test_get_tables_malformed_csv(self, temp_dir):
    method test_get_tables_pandas_error (line 100) | def test_get_tables_pandas_error(self, temp_dir):
    method test_get_table_schema (line 109) | def test_get_table_schema(self, mock_catalog_store):
    method test_search_tables (line 116) | def test_search_tables(self, mock_catalog_store):
    method test_get_all_table_names (line 124) | def test_get_all_table_names(self, mock_catalog_store):
    method test_case_insensitive_table_lookup (line 133) | def test_case_insensitive_table_lookup(self, mock_catalog_store):
    method test_data_path_validation (line 142) | def test_data_path_validation(self):
    method test_concurrent_access (line 153) | def test_concurrent_access(self, mock_catalog_store):

FILE: tests/test_config_loader.py
  class TestConfigLoader (line 11) | class TestConfigLoader:
    method test_config_initialization (line 14) | def test_config_initialization(self):
    method test_config_from_dict (line 27) | def test_config_from_dict(self):
    method test_config_loader_initialization (line 46) | def test_config_loader_initialization(self):
    method test_load_config_from_file (line 52) | def test_load_config_from_file(self, temp_dir):
    method test_load_config_missing_file (line 91) | def test_load_config_missing_file(self):
    method test_load_config_invalid_yaml (line 105) | def test_load_config_invalid_yaml(self, temp_dir):
    method test_load_config_with_bi_config_file (line 115) | def test_load_config_with_bi_config_file(self, temp_dir):
    method test_load_config_with_catalog_store (line 155) | def test_load_config_with_catalog_store(self, temp_dir):
    method test_load_config_with_llm_configs (line 190) | def test_load_config_with_llm_configs(self, temp_dir):
    method test_load_config_with_llm_providers_selected_by_default_llm (line 229) | def test_load_config_with_llm_providers_selected_by_default_llm(self, ...
    method test_set_config (line 278) | def test_set_config(self):
    method test_get_config_not_loaded (line 308) | def test_get_config_not_loaded(self):
    method test_load_bi_config_missing_file (line 316) | def test_load_bi_config_missing_file(self, temp_dir):
    method test_catalog_store_missing_store_type (line 326) | def test_catalog_store_missing_store_type(self, temp_dir):

FILE: tests/test_graph_state.py
  class TestAgentState (line 8) | class TestAgentState:
    method test_agent_state_with_data (line 11) | def test_agent_state_with_data(self):
    method test_agent_state_message_types (line 25) | def test_agent_state_message_types(self):
    method test_agent_state_immutability (line 40) | def test_agent_state_immutability(self):
  class TestInputState (line 68) | class TestInputState:
    method test_input_state_creation (line 71) | def test_input_state_creation(self):
    method test_input_state_empty_messages (line 79) | def test_input_state_empty_messages(self):
  class TestOutputState (line 86) | class TestOutputState:
    method test_output_state_creation (line 89) | def test_output_state_creation(self):
    method test_output_state_with_multiple_messages (line 97) | def test_output_state_with_multiple_messages(self):
  class TestStateIntegration (line 112) | class TestStateIntegration:
    method test_input_to_agent_state_conversion (line 115) | def test_input_to_agent_state_conversion(self):
    method test_agent_to_output_state_conversion (line 126) | def test_agent_to_output_state_conversion(self):
    method test_state_serialization_compatibility (line 142) | def test_state_serialization_compatibility(self):

FILE: tests/test_incomplete_tool_calls.py
  class TestIncompleteToolCallRecovery (line 12) | class TestIncompleteToolCallRecovery:
    method test_no_messages (line 15) | def test_no_messages(self):
    method test_no_tool_calls (line 21) | def test_no_tool_calls(self):
    method test_complete_tool_calls (line 28) | def test_complete_tool_calls(self):
    method test_incomplete_single_tool_call (line 42) | def test_incomplete_single_tool_call(self):
    method test_incomplete_multiple_tool_calls (line 61) | def test_incomplete_multiple_tool_calls(self):
    method test_partial_incomplete_tool_calls (line 88) | def test_partial_incomplete_tool_calls(self):
    method test_multiple_ai_messages_with_tool_calls (line 124) | def test_multiple_ai_messages_with_tool_calls(self):
    method test_llm_node_integration_with_recovery (line 145) | def test_llm_node_integration_with_recovery(self):

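The recovery tests above cover AI messages whose tool calls never received a tool result (e.g. after an interrupt). One plausible recovery strategy, sketched here with plain dicts rather than LangChain message objects, is to append a synthetic tool result for every unanswered call id (an assumption; openchatbi's actual recovery logic may differ):

```python
def recover_incomplete_tool_calls(messages: list) -> list:
    """Append a synthetic error result for every tool call that has no
    matching tool message. Message shape is a simplified stand-in for
    LangChain's AIMessage/ToolMessage."""
    answered = {m.get("tool_call_id") for m in messages if m.get("type") == "tool"}
    patched = list(messages)
    for m in messages:
        if m.get("type") != "ai":
            continue
        for call in m.get("tool_calls", []):
            if call["id"] not in answered:
                patched.append({
                    "type": "tool",
                    "tool_call_id": call["id"],
                    "content": "Tool call was interrupted before completion.",
                })
    return patched
```

This keeps the message history well-formed for the next LLM call, since chat APIs reject an assistant tool call with no corresponding tool result.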
FILE: tests/test_memory.py
  class TestUserProfile (line 27) | class TestUserProfile:
    method test_user_profile_basic_initialization (line 30) | def test_user_profile_basic_initialization(self):
    method test_user_profile_optional_fields (line 39) | def test_user_profile_optional_fields(self):
    method test_user_profile_partial_initialization (line 48) | def test_user_profile_partial_initialization(self):
    method test_user_profile_serialization (line 57) | def test_user_profile_serialization(self):
  class TestMemoryStoreManagement (line 68) | class TestMemoryStoreManagement:
    method setup_test_env (line 72) | def setup_test_env(self, tmp_path: Path):
    method test_get_sync_memory_store (line 84) | def test_get_sync_memory_store(self, mock_config, mock_connect):
    method test_get_async_memory_store (line 104) | async def test_get_async_memory_store(self, mock_config, mock_from_con...
    method test_cleanup_async_memory_store (line 123) | async def test_cleanup_async_memory_store(self, mock_context_manager):
    method test_setup_async_memory_store (line 133) | async def test_setup_async_memory_store(self, mock_get_store):
  class TestMemoryTools (line 144) | class TestMemoryTools:
    method test_get_memory_tools_sync_mode (line 150) | def test_get_memory_tools_sync_mode(self, mock_get_store, mock_search_...
    method test_get_memory_tools_with_openai_llm (line 172) | def test_get_memory_tools_with_openai_llm(self, mock_config, mock_sear...
    method test_get_async_memory_tools (line 198) | async def test_get_async_memory_tools(self, mock_config, mock_get_tool...
  class TestMemoryManager (line 217) | class TestMemoryManager:
    method test_get_memory_manager (line 222) | def test_get_memory_manager(self, mock_config, mock_create_manager):
    method test_get_memory_manager_singleton (line 242) | def test_get_memory_manager_singleton(self, mock_config, mock_create_m...
  class TestSchemaFixer (line 257) | class TestSchemaFixer:
    method test_fix_schema_for_openai_basic (line 260) | def test_fix_schema_for_openai_basic(self):
    method test_fix_schema_for_openai_nested_object (line 268) | def test_fix_schema_for_openai_nested_object(self):
    method test_fix_schema_for_openai_with_arrays (line 281) | def test_fix_schema_for_openai_with_arrays(self):
  class TestStructuredToolWithRequired (line 291) | class TestStructuredToolWithRequired:
    method test_structured_tool_with_required_initialization (line 294) | def test_structured_tool_with_required_initialization(self):
    method test_tool_call_schema_property (line 312) | def test_tool_call_schema_property(self):

FILE: tests/test_plotly_utils.py
  function sample_csv_data (line 14) | def sample_csv_data():
  function sample_line_dsl (line 26) | def sample_line_dsl():
  function sample_bar_dsl (line 37) | def sample_bar_dsl():
  function sample_pie_dsl (line 48) | def sample_pie_dsl():
  class TestPlotlyChartCreation (line 58) | class TestPlotlyChartCreation:
    method test_create_line_chart_success (line 61) | def test_create_line_chart_success(self, sample_csv_data, sample_line_...
    method test_create_line_chart_with_color (line 69) | def test_create_line_chart_with_color(self, sample_csv_data):
    method test_create_line_chart_with_multiple_y_columns (line 84) | def test_create_line_chart_with_multiple_y_columns(self):
    method test_create_bar_chart_success (line 104) | def test_create_bar_chart_success(self, sample_csv_data, sample_bar_dsl):
    method test_create_pie_chart_success (line 112) | def test_create_pie_chart_success(self, sample_csv_data, sample_pie_dsl):
    method test_create_scatter_chart (line 120) | def test_create_scatter_chart(self, sample_csv_data):
    method test_create_histogram_chart (line 134) | def test_create_histogram_chart(self, sample_csv_data):
    method test_create_box_chart (line 148) | def test_create_box_chart(self, sample_csv_data):
    method test_create_table_chart (line 162) | def test_create_table_chart(self, sample_csv_data):
  class TestErrorHandling (line 178) | class TestErrorHandling:
    method test_empty_data (line 181) | def test_empty_data(self):
    method test_invalid_csv_data (line 188) | def test_invalid_csv_data(self, sample_bar_dsl):
    method test_missing_columns (line 197) | def test_missing_columns(self, sample_csv_data):
    method test_unsupported_chart_type (line 211) | def test_unsupported_chart_type(self, sample_csv_data):
    method test_visualization_dsl_error (line 220) | def test_visualization_dsl_error(self):
  class TestVisualizationDslToGradioPlot (line 230) | class TestVisualizationDslToGradioPlot:
    method test_successful_conversion (line 233) | def test_successful_conversion(self, sample_csv_data, sample_line_dsl):
    method test_empty_dsl (line 242) | def test_empty_dsl(self, sample_csv_data):
    method test_no_data (line 250) | def test_no_data(self, sample_line_dsl):
  class TestCreateEmptyChart (line 258) | class TestCreateEmptyChart:
    method test_create_empty_chart (line 261) | def test_create_empty_chart(self):
  function sample_time_series_data (line 274) | def sample_time_series_data():
  class TestIntegrationScenarios (line 284) | class TestIntegrationScenarios:
    method test_sales_dashboard_scenario (line 287) | def test_sales_dashboard_scenario(self, sample_csv_data):
    method test_time_series_scenario (line 305) | def test_time_series_scenario(self, sample_time_series_data):
    method test_multiple_metrics_scenario (line 320) | def test_multiple_metrics_scenario(self, sample_time_series_data):

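The error-handling tests above (empty data, missing columns, unsupported chart type) suggest the chart builder validates a visualization DSL against the CSV before plotting. A stdlib-only validation sketch, with assumed DSL keys (`chart_type`, `x`, `y`) that only mirror the cases tested:

```python
import csv
import io

SUPPORTED_TYPES = {"line", "bar", "pie", "scatter", "histogram", "box", "table"}

def validate_chart_dsl(csv_data: str, dsl: dict) -> list:
    """Return a list of problems found when checking a chart DSL against
    CSV data; an empty list means the DSL looks renderable."""
    errors = []
    rows = list(csv.reader(io.StringIO(csv_data)))
    if not rows:
        return ["empty data"]
    header = rows[0]
    if dsl.get("chart_type") not in SUPPORTED_TYPES:
        errors.append("unsupported chart type: %s" % dsl.get("chart_type"))
    for col in [dsl.get("x")] + list(dsl.get("y", [])):
        if col and col not in header:
            errors.append("missing column: %s" % col)
    return errors
```

Validating before handing the data to plotly lets the UI surface a readable message instead of a traceback.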
FILE: tests/test_simple_store.py
  class TestSimpleStore (line 8) | class TestSimpleStore:
    method sample_texts (line 12) | def sample_texts(self):
    method sample_metadatas (line 22) | def sample_metadatas(self):
    method simple_store (line 32) | def simple_store(self, sample_texts):
    method test_initialization_basic (line 36) | def test_initialization_basic(self, sample_texts):
    method test_initialization_with_metadata_and_ids (line 45) | def test_initialization_with_metadata_and_ids(self, sample_texts, samp...
    method test_similarity_search (line 59) | def test_similarity_search(self, simple_store):
    method test_similarity_search_with_score (line 71) | def test_similarity_search_with_score(self, simple_store):
    method test_empty_store (line 86) | def test_empty_store(self):
    method test_add_texts (line 94) | def test_add_texts(self, simple_store):
    method test_delete (line 117) | def test_delete(self):
    method test_get_by_ids (line 144) | def test_get_by_ids(self, sample_texts):
    method test_from_texts (line 163) | def test_from_texts(self, sample_texts, sample_metadatas):
    method test_as_retriever (line 173) | def test_as_retriever(self, simple_store):
    method test_chinese_and_mixed_language (line 181) | def test_chinese_and_mixed_language(self):
    method test_max_marginal_relevance_search (line 211) | def test_max_marginal_relevance_search(self, simple_store):
    method test_calculate_similarity (line 245) | def test_calculate_similarity(self, simple_store):

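`SimpleStore` is exercised above as an in-memory vector store with `similarity_search`, `add_texts`, and retrieval helpers. A toy stand-in for that interface, using bag-of-words cosine similarity instead of a real embedding model (purely illustrative; the actual `SimpleStore` API and scoring are richer):

```python
import math
from collections import Counter

def _vec(text: str) -> Counter:
    # Naive bag-of-words "embedding" -- a stand-in for a real model.
    return Counter(text.lower().split())

def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class MiniStore:
    """Minimal in-memory similarity store sketch."""

    def __init__(self, texts=None):
        self.texts = []
        self.add_texts(texts or [])

    def add_texts(self, texts):
        self.texts.extend(texts)

    def similarity_search_with_score(self, query: str, k: int = 4):
        q = _vec(query)
        scored = sorted(
            ((t, _cosine(q, _vec(t))) for t in self.texts),
            key=lambda pair: pair[1],
            reverse=True,
        )
        return scored[:k]
```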
FILE: tests/test_text2sql_extraction.py
  class TestText2SQLExtraction (line 18) | class TestText2SQLExtraction:
    method test_generate_extraction_prompt (line 21) | def test_generate_extraction_prompt(self):
    method test_parse_extracted_info_json_valid (line 33) | def test_parse_extracted_info_json_valid(self):
    method test_parse_extracted_info_json_invalid (line 53) | def test_parse_extracted_info_json_invalid(self):
    method test_information_extraction_function_creation (line 63) | def test_information_extraction_function_creation(self):
    method test_information_extraction_successful (line 72) | def test_information_extraction_successful(self):
    method test_information_extraction_empty_response (line 100) | def test_information_extraction_empty_response(self):
    method test_information_extraction_conditional_edges_success (line 118) | def test_information_extraction_conditional_edges_success(self):
    method test_information_extraction_conditional_edges_failure (line 132) | def test_information_extraction_conditional_edges_failure(self):
    method test_information_extraction_conditional_edges_missing (line 143) | def test_information_extraction_conditional_edges_missing(self):
    method test_information_extraction_with_retry_on_failure (line 152) | def test_information_extraction_with_retry_on_failure(self):
    method test_information_extraction_time_period_detection (line 178) | def test_information_extraction_time_period_detection(self):
    method test_information_extraction_error_handling (line 206) | def test_information_extraction_error_handling(self):

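The valid/invalid JSON cases above test parsing of the LLM's information-extraction reply. A common, defensive shape for that parser (a sketch of the general technique, not necessarily openchatbi's exact implementation) is to try strict JSON first and then fall back to scanning for the first `{...}` block:

```python
import json
import re

def parse_extracted_info(response: str):
    """Pull a JSON object out of an LLM extraction response, returning
    None when nothing parseable is found."""
    try:
        return json.loads(response)
    except json.JSONDecodeError:
        # LLMs often wrap JSON in prose; grab the outermost braces.
        match = re.search(r"\{.*\}", response, re.DOTALL)
        if match:
            try:
                return json.loads(match.group(0))
            except json.JSONDecodeError:
                return None
    return None
```

Returning `None` instead of raising lets the conditional edges tested above (`..._conditional_edges_failure`, `..._with_retry_on_failure`) route to a retry node.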
FILE: tests/test_text2sql_generate_sql.py
  class TestText2SQLGenerateSQL (line 12) | class TestText2SQLGenerateSQL:
    method mock_llm (line 16) | def mock_llm(self):
    method mock_catalog (line 23) | def mock_catalog(self):
    method test_create_sql_nodes (line 60) | def test_create_sql_nodes(self, mock_llm, mock_catalog):
    method test_generate_sql_node_success (line 71) | def test_generate_sql_node_success(self, mock_llm, mock_catalog):
    method test_generate_sql_node_missing_rewrite_question (line 90) | def test_generate_sql_node_missing_rewrite_question(self, mock_llm, mo...
    method test_generate_sql_node_missing_tables (line 103) | def test_generate_sql_node_missing_tables(self, mock_llm, mock_catalog):
    method test_execute_sql_node_success (line 114) | def test_execute_sql_node_success(self, mock_llm, mock_catalog):
    method test_execute_sql_node_empty_sql (line 128) | def test_execute_sql_node_empty_sql(self, mock_llm, mock_catalog):
    method test_execute_sql_node_syntax_error (line 141) | def test_execute_sql_node_syntax_error(self, mock_llm, mock_catalog):
    method test_regenerate_sql_node_success (line 162) | def test_regenerate_sql_node_success(self, mock_llm, mock_catalog):
    method test_should_retry_sql_success (line 186) | def test_should_retry_sql_success(self):
    method test_should_retry_sql_timeout (line 196) | def test_should_retry_sql_timeout(self):
    method test_should_retry_sql_retry_needed (line 206) | def test_should_retry_sql_retry_needed(self):
    method test_should_retry_sql_max_retries_reached (line 213) | def test_should_retry_sql_max_retries_reached(self):
    method test_should_execute_sql_with_sql (line 220) | def test_should_execute_sql_with_sql(self):
    method test_should_execute_sql_without_sql (line 227) | def test_should_execute_sql_without_sql(self):
    method test_sql_generation_with_examples (line 234) | def test_sql_generation_with_examples(self, mock_llm, mock_catalog):
    method test_sql_error_handling_database_error (line 260) | def test_sql_error_handling_database_error(self, mock_llm, mock_catalog):
    method test_regenerate_sql_empty_response (line 280) | def test_regenerate_sql_empty_response(self, mock_llm, mock_catalog):

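The `should_retry_sql`/`should_execute_sql` tests above describe conditional-edge routing in the SQL subgraph: execute only when SQL was generated, and regenerate on failure until a retry budget is exhausted. A sketch of those routing functions (the state keys and the retry limit are assumptions mirroring the test names):

```python
MAX_SQL_RETRIES = 3  # assumed limit; the real value lives in configuration

def should_execute_sql(state: dict) -> str:
    # Route to execution only when generation actually produced SQL.
    return "execute_sql" if state.get("sql") else "end"

def should_retry_sql(state: dict) -> str:
    """Stop on success or timeout, give up past the retry budget,
    otherwise route back to SQL regeneration."""
    status = state.get("sql_result_status")
    if status in ("success", "timeout"):
        return "end"
    if state.get("retry_count", 0) >= MAX_SQL_RETRIES:
        return "end"
    return "regenerate_sql"
```

Treating timeout as terminal (per `test_should_retry_sql_timeout`) avoids re-running a query that is likely to time out again.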
FILE: tests/test_text2sql_schema_linking.py
  class TestText2SQLSchemaLinking (line 12) | class TestText2SQLSchemaLinking:
    method mock_llm (line 16) | def mock_llm(self):
    method mock_catalog (line 23) | def mock_catalog(self):
    method test_select_table_function_creation (line 32) | def test_select_table_function_creation(self, mock_llm, mock_catalog):
    method test_select_table_success (line 38) | def test_select_table_success(self, mock_llm, mock_catalog):
    method test_select_table_missing_rewrite_question (line 98) | def test_select_table_missing_rewrite_question(self, mock_llm, mock_ca...
    method test_select_table_with_examples (line 111) | def test_select_table_with_examples(self, mock_llm, mock_catalog):
    method test_select_table_invalid_table_selection (line 168) | def test_select_table_invalid_table_selection(self, mock_llm, mock_cat...
    method test_select_table_retry_mechanism (line 207) | def test_select_table_retry_mechanism(self, mock_llm, mock_catalog):
    method test_select_table_with_time_filter (line 249) | def test_select_table_with_time_filter(self, mock_llm, mock_catalog):
    method test_select_table_llm_error_handling (line 299) | def test_select_table_llm_error_handling(self, mock_llm, mock_catalog):
    method test_select_table_max_retries_exceeded (line 335) | def test_select_table_max_retries_exceeded(self, mock_llm, mock_catalog):

FILE: tests/test_text2sql_visualization.py
  class TestVisualizationService (line 8) | class TestVisualizationService:
    method test_generate_visualization_dsl_basic (line 11) | def test_generate_visualization_dsl_basic(self):
    method test_get_chart_type_by_rule_with_datetime (line 31) | def test_get_chart_type_by_rule_with_datetime(self):
    method test_generate_visualization_dsl_error_handling (line 47) | def test_generate_visualization_dsl_error_handling(self):
    method test_get_chart_type_by_rule_line_chart (line 58) | def test_get_chart_type_by_rule_line_chart(self):
    method test_get_chart_type_by_rule_pie_chart (line 73) | def test_get_chart_type_by_rule_pie_chart(self):
    method test_get_chart_type_by_rule_bar_chart (line 89) | def test_get_chart_type_by_rule_bar_chart(self):
    method test_get_chart_type_by_rule_scatter_plot (line 105) | def test_get_chart_type_by_rule_scatter_plot(self):
    method test_get_chart_type_by_rule_histogram (line 120) | def test_get_chart_type_by_rule_histogram(self):
    method test_get_chart_type_by_rule_data_based_priority (line 130) | def test_get_chart_type_by_rule_data_based_priority(self):
    method test_generate_visualization_dsl_line_chart (line 147) | def test_generate_visualization_dsl_line_chart(self):
    method test_generate_visualization_dsl_bar_chart (line 167) | def test_generate_visualization_dsl_bar_chart(self):
    method test_generate_visualization_dsl_pie_chart (line 188) | def test_generate_visualization_dsl_pie_chart(self):
    method test_generate_visualization_dsl_empty_data (line 207) | def test_generate_visualization_dsl_empty_data(self):
    method test_visualization_config_dataclass (line 224) | def test_visualization_config_dataclass(self):
    method test_visualization_dsl_to_dict (line 236) | def test_visualization_dsl_to_dict(self):
  class TestChartType (line 253) | class TestChartType:
    method test_chart_type_values (line 256) | def test_chart_type_values(self):
  function sample_csv_data (line 269) | def sample_csv_data():
  function sample_time_series_data (line 281) | def sample_time_series_data():
  class TestVisualizationIntegration (line 291) | class TestVisualizationIntegration:
    method test_complete_workflow_line_chart (line 294) | def test_complete_workflow_line_chart(self, sample_time_series_data):
    method test_complete_workflow_bar_chart (line 319) | def test_complete_workflow_bar_chart(self, sample_csv_data):

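The `test_get_chart_type_by_rule_*` cases above imply a rule-based chart picker keyed on column types and row counts. A heuristic sketch of that idea; the thresholds and rule ordering here are assumptions that only mirror the rule names being tested (datetime → line, few categories → pie, and so on):

```python
def chart_type_by_rule(columns: dict, row_count: int) -> str:
    """Pick a chart type from a column-name -> dtype map.
    dtype values used here: 'datetime', 'number', 'category'."""
    dtypes = set(columns.values())
    numeric = [c for c, t in columns.items() if t == "number"]
    categorical = [c for c, t in columns.items() if t == "category"]
    if "datetime" in dtypes and numeric:
        return "line"          # time on x-axis, metric on y-axis
    if len(categorical) == 1 and len(numeric) == 1 and row_count <= 6:
        return "pie"           # small share-of-whole breakdown
    if categorical and numeric:
        return "bar"           # category comparison
    if len(numeric) >= 2:
        return "scatter"       # metric-vs-metric relationship
    if len(numeric) == 1:
        return "histogram"     # single-metric distribution
    return "table"             # fallback when nothing plottable
```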
FILE: tests/test_tools_ask_human.py
  class TestAskHuman (line 9) | class TestAskHuman:
    method test_ask_human_basic_initialization (line 12) | def test_ask_human_basic_initialization(self):
    method test_ask_human_empty_options (line 22) | def test_ask_human_empty_options(self):
    method test_ask_human_validation_error (line 29) | def test_ask_human_validation_error(self):
    method test_ask_human_serialization (line 37) | def test_ask_human_serialization(self):

FILE: tests/test_tools_run_python_code.py
  class TestRunPythonCode (line 8) | class TestRunPythonCode:
    method test_run_python_code_basic (line 11) | def test_run_python_code_basic(self):
    method test_run_python_code_with_variables (line 25) | def test_run_python_code_with_variables(self):
    method test_run_python_code_data_analysis (line 44) | def test_run_python_code_data_analysis(self):
    method test_run_python_code_matplotlib_plot (line 63) | def test_run_python_code_matplotlib_plot(self):
    method test_run_python_code_syntax_error (line 84) | def test_run_python_code_syntax_error(self):
    method test_run_python_code_runtime_error (line 99) | def test_run_python_code_runtime_error(self):
    method test_run_python_code_import_error (line 119) | def test_run_python_code_import_error(self):
    method test_run_python_code_multiline_output (line 137) | def test_run_python_code_multiline_output(self):
    method test_run_python_code_with_sql_data (line 156) | def test_run_python_code_with_sql_data(self):
    method test_run_python_code_empty_code (line 178) | def test_run_python_code_empty_code(self):
    method test_run_python_code_whitespace_only (line 192) | def test_run_python_code_whitespace_only(self):
    method test_run_python_code_with_comments (line 206) | def test_run_python_code_with_comments(self):
    method test_run_python_code_security_restrictions (line 224) | def test_run_python_code_security_restrictions(self):
    method test_run_python_code_timeout_handling (line 249) | def test_run_python_code_timeout_handling(self):
    method test_run_python_code_memory_limit (line 268) | def test_run_python_code_memory_limit(self):
    method test_run_python_code_return_values (line 287) | def test_run_python_code_return_values(self):
    method test_run_python_code_exception_details (line 311) | def test_run_python_code_exception_details(self):
    method test_run_python_code_executor_selection (line 332) | def test_run_python_code_executor_selection(self):

FILE: tests/test_tools_search_knowledge.py
  class TestSearchKnowledge (line 10) | class TestSearchKnowledge:
    method test_search_knowledge_basic (line 13) | def test_search_knowledge_basic(self):
    method test_search_knowledge_table_matching (line 36) | def test_search_knowledge_table_matching(self):
    method test_search_knowledge_empty_query (line 58) | def test_search_knowledge_empty_query(self):
    method test_search_knowledge_no_matches (line 80) | def test_search_knowledge_no_matches(self):
    method test_search_knowledge_multiple_matches (line 102) | def test_search_knowledge_multiple_matches(self):
    method test_search_knowledge_with_synonyms (line 125) | def test_search_knowledge_with_synonyms(self):
    method test_search_knowledge_case_insensitive (line 146) | def test_search_knowledge_case_insensitive(self):
    method test_search_knowledge_partial_matches (line 167) | def test_search_knowledge_partial_matches(self):
    method test_search_knowledge_error_handling (line 187) | def test_search_knowledge_error_handling(self):
    method test_show_schema_basic (line 207) | def test_show_schema_basic(self):
    method test_show_schema_detailed_info (line 222) | def test_show_schema_detailed_info(self):
    method test_show_schema_nonexistent_table (line 240) | def test_show_schema_nonexistent_table(self):
    method test_show_schema_table_error (line 253) | def test_show_schema_table_error(self):
    method test_show_schema_complex_table (line 264) | def test_show_schema_complex_table(self):
    method test_search_knowledge_with_metrics (line 280) | def test_search_knowledge_with_metrics(self):
    method test_search_knowledge_contextual_search (line 302) | def test_search_knowledge_contextual_search(self):
    method test_search_knowledge_with_aggregations (line 323) | def test_search_knowledge_with_aggregations(self):
    method test_show_schema_with_examples (line 343) | def test_show_schema_with_examples(self):
    method test_search_knowledge_performance (line 359) | def test_search_knowledge_performance(self):
    method test_search_knowledge_special_characters (line 380) | def test_search_knowledge_special_characters(self):
    method test_search_knowledge_unicode_support (line 400) | def test_search_knowledge_unicode_support(self):
    method test_knowledge_integration_with_state (line 420) | def test_knowledge_integration_with_state(self):

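The case-insensitivity, partial-match, and no-match tests above outline the matching behavior of the `search_knowledge` tool. Reduced to its core, that behavior looks like a case-insensitive substring lookup over a name → description map (a deliberately simplified stand-in; the real tool searches catalog tables/columns with synonyms and richer scoring):

```python
def search_knowledge_entries(knowledge: dict, query: str) -> list:
    """Return entry names whose name or description contains the query,
    case-insensitively."""
    q = query.lower()
    return [
        name
        for name, desc in knowledge.items()
        if q in name.lower() or q in desc.lower()
    ]
```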
FILE: tests/test_utils.py
  class TestUtilityFunctions (line 11) | class TestUtilityFunctions:
    method test_log_function_basic (line 14) | def test_log_function_basic(self):
    method test_log_function_multiple_messages (line 25) | def test_log_function_multiple_messages(self):
    method test_log_function_empty_message (line 37) | def test_log_function_empty_message(self):
    method test_log_function_none_message (line 48) | def test_log_function_none_message(self):
    method test_log_function_complex_objects (line 59) | def test_log_function_complex_objects(self):
    method test_log_function_with_exception (line 74) | def test_log_function_with_exception(self):
    method test_log_function_stderr_error (line 89) | def test_log_function_stderr_error(self, mock_stderr):
    method test_log_function_unicode_handling (line 97) | def test_log_function_unicode_handling(self):
    method test_log_function_large_message (line 110) | def test_log_function_large_message(self):
    method test_log_function_newline_handling (line 123) | def test_log_function_newline_handling(self):
    method test_log_function_timestamp_format (line 137) | def test_log_function_timestamp_format(self):
    method test_log_function_concurrent_calls (line 149) | def test_log_function_concurrent_calls(self):

FILE: timeseries_forecasting/app.py
  class ForecastRequest (line 28) | class ForecastRequest(BaseModel):
  class ForecastResponse (line 42) | class ForecastResponse(BaseModel):
  class ErrorResponse (line 51) | class ErrorResponse(BaseModel):
  function startup_event (line 64) | async def startup_event():
  function health_check (line 85) | async def health_check():
  function ping (line 97) | async def ping():
  function predict (line 111) | async def predict(request: ForecastRequest):
  function model_info (line 157) | async def model_info():
  function root (line 171) | async def root():
  function http_exception_handler (line 189) | async def http_exception_handler(request: Request, exc: HTTPException):
  function general_exception_handler (line 198) | async def general_exception_handler(request: Request, exc: Exception):

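The listing above shows the forecasting service's FastAPI surface: pydantic request/response models plus `predict`, health, and error-handler endpoints. The request → predict → response shape can be sketched with stdlib dataclasses (field names here are assumptions, and the persistence forecast is a placeholder for the real transformer model):

```python
from dataclasses import dataclass, field

@dataclass
class ForecastRequest:
    series: list                 # historical values, oldest first
    prediction_length: int = 24  # horizon, assumed default

@dataclass
class ForecastResponse:
    forecast: list = field(default_factory=list)

def predict(request: ForecastRequest) -> ForecastResponse:
    if not request.series:
        # The real service maps this to an HTTP 4xx via ErrorResponse.
        raise ValueError("series must be non-empty")
    # Naive last-value (persistence) forecast as a model placeholder.
    last = request.series[-1]
    return ForecastResponse(forecast=[last] * request.prediction_length)
```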
FILE: timeseries_forecasting/model_handler.py
  class TransformerModelHandler (line 16) | class TransformerModelHandler:
    method __init__ (line 21) | def __init__(self, model_path: str = "hf_model"):
    method initialize (line 30) | def initialize(self) -> bool:
    method preprocess (line 69) | def preprocess(
    method inference (line 152) | def inference(self, input_tensor: torch.Tensor, metadata: dict[str, An...
    method postprocess (line 183) | def postprocess(self, output_tensor: torch.Tensor, metadata: dict[str,...
    method predict (line 222) | def predict(
  function get_model_handler (line 282) | def get_model_handler() -> TransformerModelHandler:

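`TransformerModelHandler.predict` above composes `preprocess` → `inference` → `postprocess`, and `get_model_handler` exposes a module-level singleton. The wiring can be sketched without torch (method bodies here are trivial placeholders for the real tensor code):

```python
class MiniHandler:
    """Sketch of the preprocess -> inference -> postprocess pipeline."""

    def preprocess(self, values):
        mean = sum(values) / len(values)
        return [v - mean for v in values]  # e.g. center the series

    def inference(self, tensor):
        return tensor[-3:]  # stand-in for the transformer forward pass

    def postprocess(self, output, mean):
        return [v + mean for v in output]  # undo the preprocessing shift

    def predict(self, values):
        mean = sum(values) / len(values)
        return self.postprocess(self.inference(self.preprocess(values)), mean)

_handler = None

def get_model_handler():
    # Lazily build one shared handler, matching the accessor above,
    # so the model is loaded once per process.
    global _handler
    if _handler is None:
        _handler = MiniHandler()
    return _handler
```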
FILE: timeseries_forecasting/test_forecasting.py
  class TimeseriesForecastingTester (line 12) | class TimeseriesForecastingTester:
    method __init__ (line 15) | def __init__(self, base_url="http://localhost:8765"):
    method generate_sample_data (line 21) | def generate_sample_data(self, length=100, frequency="H"):
    method test_basic_forecasting (line 43) | def test_basic_forecasting(self):
    method test_structured_data (line 81) | def test_structured_data(self):
    method test_different_windows (line 125) | def test_different_windows(self):
    method test_error_handling (line 153) | def test_error_handling(self):
    method test_health_check (line 184) | def test_health_check(self):
    method run_all_tests (line 206) | def run_all_tests(self):
  function main (line 255) | def main():
Condensed preview — 133 files, each showing path, character count, and a content snippet (full structured content: 764K chars).
[
  {
    "path": ".github/ISSUE_TEMPLATE/bug_report.md",
    "chars": 535,
    "preview": "---\nname: Bug report\nabout: Create a report to help us improve\ntitle: ''\nlabels: ''\nassignees: ''\n\n---\n\n**Describe the b"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/feature_request.md",
    "chars": 595,
    "preview": "---\nname: Feature request\nabout: Suggest an idea for this project\ntitle: ''\nlabels: ''\nassignees: ''\n\n---\n\n**Is your fea"
  },
  {
    "path": ".github/workflows/docs.yml",
    "chars": 1130,
    "preview": "name: Build and Deploy Documentation\n\non:\n  push:\n    branches: [ main ]\n  pull_request:\n    branches: [ main ]\n\npermiss"
  },
  {
    "path": ".github/workflows/publish.yml",
    "chars": 2721,
    "preview": "name: Publish to PyPI\n\non:\n  release:\n    types: [published]  # Trigger when a release is published\n  workflow_dispatch:"
  },
  {
    "path": ".github/workflows/runledger.yml",
    "chars": 874,
    "preview": "name: runledger\non:\n  workflow_dispatch:\n  pull_request:\n    paths:\n      - \"openchatbi/**\"\n\njobs:\n  runledger:\n    if: "
  },
  {
    "path": ".gitignore",
    "chars": 4813,
    "preview": "# Byte-compiled / optimized / DLL files\n__pycache__/\n*.py[codz]\n*$py.class\n\n# C extensions\n*.so\n\n# Distribution / packag"
  },
  {
    "path": "CONTRIBUTING.md",
    "chars": 485,
    "preview": "# Contributing to OpenChatBI\nHi there! Thank you for your interest in contributing to OpenChatBI.\n\nOpenChatBI started as"
  },
  {
    "path": "Dockerfile.python-executor",
    "chars": 430,
    "preview": "FROM python:3.11-slim\n\n# Set working directory\nWORKDIR /app\n\n# Install basic packages that might be needed for data anal"
  },
  {
    "path": "LICENSE",
    "chars": 1065,
    "preview": "MIT License\n\nCopyright (c) 2025 Yu Zhong\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\no"
  },
  {
    "path": "README.md",
    "chars": 27221,
    "preview": "# OpenChatBI\n\nOpenChatBI is an open source, chat-based intelligent BI tool powered by large language models, designed to"
  },
  {
    "path": "baselines/runledger-openchatbi.json",
    "chars": 2230,
    "preview": "{\n  \"aggregates\": {\n    \"cases_error\": 0,\n    \"cases_fail\": 0,\n    \"cases_pass\": 1,\n    \"cases_total\": 1,\n    \"metrics\":"
  },
  {
    "path": "docs/Makefile",
    "chars": 638,
    "preview": "# Minimal makefile for Sphinx documentation\n#\n\n# You can set these variables from the command line, and also\n# from the "
  },
  {
    "path": "docs/make.bat",
    "chars": 804,
    "preview": "@ECHO OFF\r\n\r\npushd %~dp0\r\n\r\nREM Command file for Sphinx documentation\r\n\r\nif \"%SPHINXBUILD%\" == \"\" (\r\n\tset SPHINXBUILD=sp"
  },
  {
    "path": "docs/source/_templates/layout.html",
    "chars": 625,
    "preview": "{% extends \"!layout.html\" %}\n\n{% block extrahead %}\n  {{ super() }}\n  <meta name=\"google-site-verification\" content=\"geD"
  },
  {
    "path": "docs/source/catalog.rst",
    "chars": 698,
    "preview": "Catalog System\n==============\n\nOverview\n--------\n\nThe catalog system manages metadata for database tables, columns, and "
  },
  {
    "path": "docs/source/code.rst",
    "chars": 413,
    "preview": "Code Execution\n==============\n\nCode Module\n-----------\n\n.. automodule:: openchatbi.code\n    :members:\n    :undoc-members"
  },
  {
    "path": "docs/source/conf.py",
    "chars": 2510,
    "preview": "# Configuration file for the Sphinx documentation builder.\n#\n# For the full list of built-in configuration values, see t"
  },
  {
    "path": "docs/source/config.rst",
    "chars": 582,
    "preview": "Configuration\n=============\n\nThe configuration system consists of two main classes:\n\n- **Config**: Defines the configura"
  },
  {
    "path": "docs/source/core.rst",
    "chars": 499,
    "preview": "Core Module\n===========\n\nMain Module\n-----------\n\n.. automodule:: openchatbi\n    :members:\n    :undoc-members:\n    :show"
  },
  {
    "path": "docs/source/index.rst",
    "chars": 645,
    "preview": "OpenChatBI Documentation\n========================\n\n`GitHub Repository <https://github.com/zhongyu09/openchatbi>`_\n\n.. in"
  },
  {
    "path": "docs/source/llm.rst",
    "chars": 275,
    "preview": "LLM Integration\n===============\n\nLLM Module\n----------\n\n.. automodule:: openchatbi.llm\n    :members:\n    :undoc-members:"
  },
  {
    "path": "docs/source/text2sql.rst",
    "chars": 844,
    "preview": "Text2SQL System\n===============\n\nOverview\n--------\n\nNatural language to SQL conversion pipeline with schema linking and "
  },
  {
    "path": "docs/source/timeseries.rst",
    "chars": 229,
    "preview": "Time Series Forecasting Service\n========================\n\n`GitHub Repository <https://github.com/zhongyu09/openchatbi/ti"
  },
  {
    "path": "docs/source/tools.rst",
    "chars": 701,
    "preview": "Tools and Utilities\n===================\n\nOverview\n--------\n\nLangGraph tools for human interaction, code execution, and k"
  },
  {
    "path": "evals/__init__.py",
    "chars": 39,
    "preview": "\"\"\"Evaluation suites for RunLedger.\"\"\"\n"
  },
  {
    "path": "evals/runledger/README.md",
    "chars": 1007,
    "preview": "# RunLedger eval (OpenChatBI)\n\nThis suite is **replay-only** by default. It runs a deterministic CI check using a JSONL "
  },
  {
    "path": "evals/runledger/__init__.py",
    "chars": 43,
    "preview": "\"\"\"RunLedger eval suite for OpenChatBI.\"\"\"\n"
  },
  {
    "path": "evals/runledger/agent/agent.py",
    "chars": 8688,
    "preview": "import json\nimport sys\nfrom itertools import count\nfrom typing import Any\nfrom unittest.mock import MagicMock\n\nimport bu"
  },
  {
    "path": "evals/runledger/cases/t1.yaml",
    "chars": 136,
    "preview": "id: t1\ndescription: \"basic BI flow with a single search_knowledge tool call\"\ninput:\n  prompt: \"OpenChatBI\"\ncassette: cas"
  },
  {
    "path": "evals/runledger/cassettes/t1.jsonl",
    "chars": 335,
    "preview": "{\"tool\":\"search_knowledge\",\"args\":{\"knowledge_bases\":[\"columns\"],\"query_list\":[\"OpenChatBI\"],\"reasoning\":\"Look up releva"
  },
  {
    "path": "evals/runledger/schema.json",
    "chars": 187,
    "preview": "{\n  \"type\": \"object\",\n  \"properties\": {\n    \"category\": {\n      \"type\": \"string\"\n    },\n    \"reply\": {\n      \"type\": \"st"
  },
  {
    "path": "evals/runledger/suite.yaml",
    "chars": 389,
    "preview": "suite_name: runledger-openchatbi\nagent_command: [\"python\", \"evals/runledger/agent/agent.py\"]\nmode: replay\ncases_path: ca"
  },
  {
    "path": "evals/runledger/tools.py",
    "chars": 365,
    "preview": "from __future__ import annotations\n\nfrom typing import Any\n\nfrom openchatbi.tool.search_knowledge import search_knowledg"
  },
  {
    "path": "example/bi.yaml",
    "chars": 2545,
    "preview": "extra_tool_use_rule: |\n  - Try your best to give appropriate parameters when calling tools.\n  - timeseries_forecast tool"
  },
  {
    "path": "example/common_columns.csv",
    "chars": 2323,
    "preview": "column_name,display_name,alias,type,category,tag,description,dimension_table,default\r\ncustomer_id,Customer ID,cust_id,IN"
  },
  {
    "path": "example/config.yaml",
    "chars": 1081,
    "preview": "organization: MyCompany\ndialect: sqlite\nbi_config_file: example/bi.yaml\n\npython_executor: docker\n\n# Visualization config"
  },
  {
    "path": "example/sql_example.yaml",
    "chars": 1717,
    "preview": "'':\n  Customers: |\n    Q: Show me all customers with their names and details\n    A: SELECT customer_id, customer_name, c"
  },
  {
    "path": "example/table_columns.csv",
    "chars": 743,
    "preview": "db_name,table_name,column_name\r\n,Customers,customer_id\r\n,Customers,customer_name\r\n,Customers,customer_details\r\n,Invoices"
  },
  {
    "path": "example/table_info.yaml",
    "chars": 2343,
    "preview": "? ''\n: Customers:\n    description: 'Contains customer information including unique ID, name, and additional details'\n   "
  },
  {
    "path": "example/table_selection_example.csv",
    "chars": 731,
    "preview": "question,selected_tables\r\n\"Show me all customers\",\"[\"\"Customers\"\"]\"\r\n\"What orders were placed today?\",\"[\"\"Orders\"\"]\"\r\n\"L"
  },
  {
    "path": "openchatbi/__init__.py",
    "chars": 974,
    "preview": "\"\"\"OpenChatBI core module initialization.\"\"\"\n\nimport os\n\nfrom langgraph.graph.state import CompiledStateGraph\n\nfrom open"
  },
  {
    "path": "openchatbi/agent_graph.py",
    "chars": 17415,
    "preview": "\"\"\"Main agent graph construction and execution logic.\"\"\"\n\nimport datetime\nimport logging\nimport traceback\nfrom collectio"
  },
  {
    "path": "openchatbi/catalog/__init__.py",
    "chars": 383,
    "preview": "\"\"\"Data catalog management module for OpenChatBI.\"\"\"\n\nfrom openchatbi.catalog.catalog_loader import (\n    DataCatalogLoa"
  },
  {
    "path": "openchatbi/catalog/catalog_loader.py",
    "chars": 7956,
    "preview": "import logging\nfrom typing import Any\n\nfrom sqlalchemy import MetaData, inspect\nfrom sqlalchemy.engine import Engine\n\nfr"
  },
  {
    "path": "openchatbi/catalog/catalog_store.py",
    "chars": 6099,
    "preview": "from abc import ABC, abstractmethod\nfrom typing import Any\n\nfrom sqlalchemy import Engine\n\n\nclass CatalogStore(ABC):\n   "
  },
  {
    "path": "openchatbi/catalog/factory.py",
    "chars": 3129,
    "preview": "import logging\nimport os\n\nfrom openchatbi.catalog.catalog_loader import load_catalog_from_data_warehouse\nfrom openchatbi"
  },
  {
    "path": "openchatbi/catalog/helper.py",
    "chars": 1761,
    "preview": "from typing import Any\n\nimport requests\nfrom sqlalchemy import Engine, create_engine\n\nfrom openchatbi.catalog.token_serv"
  },
  {
    "path": "openchatbi/catalog/retrival_helper.py",
    "chars": 2522,
    "preview": "\"\"\"Helper functions for building column retrieval systems.\"\"\"\n\nfrom rank_bm25 import BM25Okapi\n\nfrom openchatbi.llm.llm "
  },
  {
    "path": "openchatbi/catalog/schema_retrival.py",
    "chars": 6004,
    "preview": "\"\"\"Schema and column retrieval functionality for finding relevant database structures.\"\"\"\n\nimport os\nimport re\n\nimport L"
  },
  {
    "path": "openchatbi/catalog/store/__init__.py",
    "chars": 86,
    "preview": "\"\"\"Catalog store implementations.\"\"\"\n\nfrom .file_system import FileSystemCatalogStore\n"
  },
  {
    "path": "openchatbi/catalog/store/file_system.py",
    "chars": 30454,
    "preview": "\"\"\"File system-based catalog store implementation.\"\"\"\n\nimport csv\nimport logging\nimport os\nimport re\nimport traceback\nfr"
  },
  {
    "path": "openchatbi/catalog/token_service.py",
    "chars": 1306,
    "preview": "\"\"\"Token service for authentication with external services.\"\"\"\n\nimport json\n\nimport requests\n\n\nclass TokenService:\n    \""
  },
  {
    "path": "openchatbi/code/docker_executor.py",
    "chars": 7469,
    "preview": "import os\nimport shutil\nimport subprocess\nimport tempfile\nfrom pathlib import Path\n\nimport docker\nfrom docker.errors imp"
  },
  {
    "path": "openchatbi/code/executor_base.py",
    "chars": 520,
    "preview": "from typing import Any\n\n\nclass ExecutorBase:\n    \"\"\"Base class for executing python code.\"\"\"\n\n    _variable: dict\n\n    d"
  },
  {
    "path": "openchatbi/code/local_executor.py",
    "chars": 596,
    "preview": "import sys\nfrom io import StringIO\n\nfrom openchatbi.code.executor_base import ExecutorBase\n\n\nclass LocalExecutor(Executo"
  },
  {
    "path": "openchatbi/code/restricted_local_executor.py",
    "chars": 1693,
    "preview": "import sys\nfrom io import StringIO\n\nfrom RestrictedPython import compile_restricted, safe_globals, utility_builtins\nfrom"
  },
  {
    "path": "openchatbi/config.yaml.template",
    "chars": 4125,
    "preview": "organization: The Company\ndialect: presto\nbi_config_file: example/bi.yaml\n\n# Python Code Execution Configuration\n# Optio"
  },
  {
    "path": "openchatbi/config_loader.py",
    "chars": 13093,
    "preview": "import importlib\nimport os\nfrom importlib.util import find_spec\nfrom typing import Any\nfrom unittest.mock import MagicMo"
  },
  {
    "path": "openchatbi/constants.py",
    "chars": 478,
    "preview": "\"\"\"Constants used throughout the OpenChatBI application.\"\"\"\n\n# Date/time format strings\ndatetime_format = \"%Y-%m-%d %H:%"
  },
  {
    "path": "openchatbi/context_config.py",
    "chars": 2599,
    "preview": "\"\"\"Configuration for context management settings.\"\"\"\n\nfrom dataclasses import dataclass\n\nfrom openchatbi import config\n\n"
  },
  {
    "path": "openchatbi/context_manager.py",
    "chars": 17092,
    "preview": "\"\"\"Context management utilities for handling long conversations.\"\"\"\n\nimport json\nimport re\nimport uuid\n\nfrom langchain_c"
  },
  {
    "path": "openchatbi/graph_state.py",
    "chars": 1763,
    "preview": "\"\"\"State classes for OpenChatBI graph execution.\"\"\"\n\nfrom typing import Annotated, Any\n\nfrom langchain_core.messages imp"
  },
  {
    "path": "openchatbi/llm/llm.py",
    "chars": 5577,
    "preview": "import time\nimport traceback\n\nfrom langchain_core.language_models import BaseChatModel\nfrom langchain_core.runnables.bas"
  },
  {
    "path": "openchatbi/prompts/agent_prompt.md",
    "chars": 3311,
    "preview": "You are a helpful BI assistant that can answer user's question. \nUse the instructions below and the tools available to y"
  },
  {
    "path": "openchatbi/prompts/extraction_prompt.md",
    "chars": 7480,
    "preview": "You are a specialized language expert responsible for analyzing user questions and extracting structured information for"
  },
  {
    "path": "openchatbi/prompts/schema_linking_prompt.md",
    "chars": 2769,
    "preview": "You are a language expert and professional SQL engineer tasked with analyzing questions from [organization] users and se"
  },
  {
    "path": "openchatbi/prompts/sql_dialect/presto.md",
    "chars": 5616,
    "preview": "# Rules for Presto SQL\n- Use 'LIKE' instead of 'ILIKE' in the Presto SQL.\n- If there is a 'GROUP BY' clause in the user "
  },
  {
    "path": "openchatbi/prompts/summary_prompt.md",
    "chars": 2350,
    "preview": "Create a concise summary of this conversation for continuing the data analysis work. Focus on:\n\n1. **User's Main Questio"
  },
  {
    "path": "openchatbi/prompts/system_prompt.py",
    "chars": 6206,
    "preview": "\"\"\"System prompt templates and business configuration.\"\"\"\n\nimport importlib.resources\n\nfrom openchatbi import config\n\n# "
  },
  {
    "path": "openchatbi/prompts/text2sql_prompt.md",
    "chars": 1694,
    "preview": "You are a professional SQL engineer, your task is to transform user query into [dialect] SQL. \n- I will give you the bus"
  },
  {
    "path": "openchatbi/prompts/visualization_prompt.md",
    "chars": 1296,
    "preview": "You are a data visualization expert. Analyze the user's question and data to recommend the most appropriate chart type.\n"
  },
  {
    "path": "openchatbi/text2sql/__init__.py",
    "chars": 52,
    "preview": "\"\"\"Text-to-SQL conversion module for OpenChatBI.\"\"\"\n"
  },
  {
    "path": "openchatbi/text2sql/data.py",
    "chars": 802,
    "preview": "import os\n\nfrom openchatbi import config\nfrom openchatbi.text2sql.text2sql_utils import init_sql_example_retriever, init"
  },
  {
    "path": "openchatbi/text2sql/extraction.py",
    "chars": 4078,
    "preview": "\"\"\"Information extraction module for text2sql processing.\"\"\"\n\nimport traceback\nfrom collections.abc import Callable\nfrom"
  },
  {
    "path": "openchatbi/text2sql/generate_sql.py",
    "chars": 16817,
    "preview": "import datetime\nfrom collections.abc import Callable\nfrom typing import Any\n\nimport pandas as pd\nfrom langchain_core.lan"
  },
  {
    "path": "openchatbi/text2sql/schema_linking.py",
    "chars": 10262,
    "preview": "\"\"\"Schema linking module for table and column selection in text2sql.\"\"\"\n\nfrom datetime import datetime\n\nfrom langchain_c"
  },
  {
    "path": "openchatbi/text2sql/sql_graph.py",
    "chars": 6006,
    "preview": "\"\"\"SQL generation graph construction and execution.\"\"\"\n\nfrom langchain_openai.chat_models.base import BaseChatOpenAI\nfro"
  },
  {
    "path": "openchatbi/text2sql/text2sql_utils.py",
    "chars": 2074,
    "preview": "\"\"\"Utility functions for text2sql retrieval systems.\"\"\"\n\nfrom openchatbi.llm.llm import get_embedding_model\nfrom opencha"
  },
  {
    "path": "openchatbi/text2sql/visualization.py",
    "chars": 12758,
    "preview": "\"\"\"Visualization generation for SQL query results using Plotly.\"\"\"\n\nfrom dataclasses import dataclass\nfrom enum import E"
  },
  {
    "path": "openchatbi/text_segmenter.py",
    "chars": 4166,
    "preview": "\"\"\"Text segmentation utility with jieba support.\"\"\"\n\nimport re\nimport string\nimport sys\n\n# Try to import jieba, fallback"
  },
  {
    "path": "openchatbi/tool/ask_human.py",
    "chars": 605,
    "preview": "\"\"\"Tool for asking human clarification when information is ambiguous.\"\"\"\n\nfrom pydantic import BaseModel, Field\n\n\nclass "
  },
  {
    "path": "openchatbi/tool/mcp_tools.py",
    "chars": 9650,
    "preview": "\"\"\"MCP (Model Context Protocol) tools integration for OpenChatBI.\n\nThis module provides integration with MCP servers usi"
  },
  {
    "path": "openchatbi/tool/memory.py",
    "chars": 6385,
    "preview": "import functools\nimport sys\nfrom typing import Any\n\ntry:\n    import pysqlite3 as sqlite3\nexcept ImportError:  # pragma: "
  },
  {
    "path": "openchatbi/tool/run_python_code.py",
    "chars": 2891,
    "preview": "\"\"\"Tool for running python code.\"\"\"\n\nfrom langchain.tools import tool\nfrom pydantic import BaseModel, Field\n\nfrom opench"
  },
  {
    "path": "openchatbi/tool/save_report.py",
    "chars": 2437,
    "preview": "\"\"\"Tool for saving reports to files.\"\"\"\n\nimport datetime\nfrom pathlib import Path\n\nfrom langchain.tools import tool\nfrom"
  },
  {
    "path": "openchatbi/tool/search_knowledge.py",
    "chars": 4558,
    "preview": "\"\"\"Tools for searching knowledge bases and schema information.\"\"\"\n\nfrom langchain.tools import tool\nfrom pydantic import"
  },
  {
    "path": "openchatbi/tool/timeseries_forecast.py",
    "chars": 8479,
    "preview": "\"\"\"Tool for time series forecasting.\"\"\"\n\nimport logging\nfrom typing import Any\n\nimport requests\nfrom langchain.tools imp"
  },
  {
    "path": "openchatbi/utils.py",
    "chars": 21646,
    "preview": "\"\"\"Utility functions for OpenChatBI.\"\"\"\n\nimport json\nimport sys\nimport uuid\nfrom pathlib import Path\nfrom typing import "
  },
  {
    "path": "pyproject.toml",
    "chars": 5995,
    "preview": "[project]\nname = \"openchatbi\"\nversion = \"0.2.2\"\ndescription = \"OpenChatBI - Natural language business intelligence power"
  },
  {
    "path": "run_streamlit_ui.py",
    "chars": 1448,
    "preview": "#!/usr/bin/env python3\n\"\"\"\nLaunch script for the Streamlit-based OpenChatBI interface.\n\nUsage:\n    python run_streamlit_"
  },
  {
    "path": "run_tests.py",
    "chars": 3646,
    "preview": "#!/usr/bin/env python3\n\"\"\"Test runner script for OpenChatBI.\"\"\"\n\nimport argparse\nimport subprocess\nimport sys\n\n\ndef run_"
  },
  {
    "path": "sample_api/async_api.py",
    "chars": 4903,
    "preview": "\"\"\"Async API for streaming chat responses from OpenChatBI.\"\"\"\n\nimport asyncio\nfrom typing import Any\nfrom collections im"
  },
  {
    "path": "sample_ui/async_graph_manager.py",
    "chars": 2957,
    "preview": "\"\"\"Common AsyncGraphManager for UIs.\"\"\"\n\nfrom typing import Any\n\nfrom langgraph.checkpoint.sqlite.aio import AsyncSqlite"
  },
  {
    "path": "sample_ui/memory_ui.py",
    "chars": 5884,
    "preview": "\"\"\"Memory listing UI for OpenChatBI using FastAPI and Gradio.\"\"\"\n\nimport json\nfrom typing import Any\n\nimport gradio as g"
  },
  {
    "path": "sample_ui/plotly_utils.py",
    "chars": 10015,
    "preview": "\"\"\"Plotly utilities for generating charts from visualization DSL.\"\"\"\n\nfrom io import StringIO\nfrom typing import Any\n\nim"
  },
  {
    "path": "sample_ui/simple_ui.py",
    "chars": 3816,
    "preview": "\"\"\"Simple web UI for OpenChatBI using FastAPI and Gradio.\"\"\"\n\nfrom collections import defaultdict\n\nimport gradio as gr\ni"
  },
  {
    "path": "sample_ui/streaming_ui.py",
    "chars": 15149,
    "preview": "\"\"\"Gradio-based Streaming UI for OpenChatBI with real-time chat interface.\"\"\"\n\nimport asyncio\nimport sys\nfrom collection"
  },
  {
    "path": "sample_ui/streamlit_ui.py",
    "chars": 21835,
    "preview": "\"\"\"Streamlit-based Streaming UI for OpenChatBI with collapsible thinking sections.\"\"\"\n\nimport asyncio\nimport sys\nimport "
  },
  {
    "path": "sample_ui/style.py",
    "chars": 855,
    "preview": "# Custom CSS for styling the chat interface\ncustom_css = \"\"\"\n#chatbot {\n    height: 600px !important;\n    font-family: \""
  },
  {
    "path": "tests/README.md",
    "chars": 8563,
    "preview": "# OpenChatBI Test Suite\n\nThis directory contains comprehensive unit tests for the OpenChatBI project. The test suite is "
  },
  {
    "path": "tests/__init__.py",
    "chars": 35,
    "preview": "\"\"\"Test package for OpenChatBI.\"\"\"\n"
  },
  {
    "path": "tests/conftest.py",
    "chars": 7038,
    "preview": "\"\"\"Pytest configuration and shared fixtures.\"\"\"\n\nimport tempfile\nfrom collections.abc import Generator\nfrom pathlib impo"
  },
  {
    "path": "tests/context_management/README.md",
    "chars": 6243,
    "preview": "# Context Management Test Suite\n\nThis directory contains comprehensive tests for the context management functionality in"
  },
  {
    "path": "tests/context_management/__init__.py",
    "chars": 70,
    "preview": "\"\"\"Context management test package.\"\"\"\n\n# Test package initialization\n"
  },
  {
    "path": "tests/context_management/conftest.py",
    "chars": 4546,
    "preview": "\"\"\"Pytest configuration and fixtures for context management tests.\"\"\"\n\nfrom unittest.mock import Mock\n\nimport pytest\nfro"
  },
  {
    "path": "tests/context_management/test_agent_graph_integration.py",
    "chars": 14762,
    "preview": "\"\"\"Integration tests for agent graph with context management.\"\"\"\n\nfrom unittest.mock import Mock, patch\n\nimport pytest\nf"
  },
  {
    "path": "tests/context_management/test_context_config.py",
    "chars": 9624,
    "preview": "\"\"\"Unit tests for context configuration.\"\"\"\n\nfrom openchatbi.context_config import ContextConfig, get_context_config, up"
  },
  {
    "path": "tests/context_management/test_context_manager.py",
    "chars": 13218,
    "preview": "\"\"\"Unit tests for ContextManager class.\"\"\"\n\nfrom unittest.mock import Mock, patch\n\nimport pytest\nfrom langchain_core.mes"
  },
  {
    "path": "tests/context_management/test_edge_cases.py",
    "chars": 10236,
    "preview": "\"\"\"Edge cases for context management.\"\"\"\n\nfrom unittest.mock import Mock, patch\n\nimport pytest\nfrom langchain_core.messa"
  },
  {
    "path": "tests/context_management/test_runner.py",
    "chars": 2801,
    "preview": "\"\"\"Test runner script for context management tests.\"\"\"\n\nimport argparse\nimport subprocess\nimport sys\nfrom pathlib import"
  },
  {
    "path": "tests/context_management/test_state_operations.py",
    "chars": 13705,
    "preview": "\"\"\"Tests for message-based context management operations.\"\"\"\n\nfrom unittest.mock import Mock, patch\n\nimport pytest\nfrom "
  },
  {
    "path": "tests/test_catalog_loader.py",
    "chars": 6396,
    "preview": "\"\"\"Tests for catalog loader functionality.\"\"\"\n\nfrom unittest.mock import Mock, patch\n\nimport pytest\n\nfrom openchatbi.cat"
  },
  {
    "path": "tests/test_catalog_store.py",
    "chars": 7621,
    "preview": "\"\"\"Tests for catalog store functionality.\"\"\"\n\nimport pytest\n\nfrom openchatbi.catalog.catalog_store import CatalogStore\nf"
  },
  {
    "path": "tests/test_config_loader.py",
    "chars": 14031,
    "preview": "\"\"\"Tests for configuration loading functionality.\"\"\"\n\nfrom unittest.mock import MagicMock, patch\n\nimport pytest\nimport y"
  },
  {
    "path": "tests/test_graph_state.py",
    "chars": 6126,
    "preview": "\"\"\"Tests for graph state management.\"\"\"\n\nfrom langchain_core.messages import AIMessage, HumanMessage, ToolMessage\n\nfrom "
  },
  {
    "path": "tests/test_incomplete_tool_calls.py",
    "chars": 7352,
    "preview": "\"\"\"Tests for incomplete tool call recovery functionality.\"\"\"\n\nfrom unittest.mock import Mock\n\nfrom langchain_core.messag"
  },
  {
    "path": "tests/test_memory.py",
    "chars": 12934,
    "preview": "\"\"\"Tests for memory tool functionality.\"\"\"\n\nfrom pathlib import Path\nfrom unittest.mock import AsyncMock, Mock, patch\n\ni"
  },
  {
    "path": "tests/test_plotly_utils.py",
    "chars": 11924,
    "preview": "\"\"\"Tests for plotly utilities in the UI.\"\"\"\n\nimport plotly.graph_objects as go\nimport pytest\n\nfrom sample_ui.plotly_util"
  },
  {
    "path": "tests/test_simple_store.py",
    "chars": 10224,
    "preview": "\"\"\"Unit tests for SimpleStore.\"\"\"\n\nimport pytest\n\nfrom openchatbi.utils import SimpleStore\n\n\nclass TestSimpleStore:\n    "
  },
  {
    "path": "tests/test_text2sql_extraction.py",
    "chars": 8803,
    "preview": "\"\"\"Tests for text2sql information extraction functionality.\"\"\"\n\nimport json\nfrom datetime import date\nfrom unittest.mock"
  },
  {
    "path": "tests/test_text2sql_generate_sql.py",
    "chars": 11226,
    "preview": "\"\"\"Tests for text2sql SQL generation functionality.\"\"\"\n\nfrom unittest.mock import Mock, patch\n\nimport pytest\nfrom langch"
  },
  {
    "path": "tests/test_text2sql_schema_linking.py",
    "chars": 17829,
    "preview": "\"\"\"Tests for text2sql schema linking functionality.\"\"\"\n\nfrom unittest.mock import Mock, patch\n\nimport pytest\nfrom langch"
  },
  {
    "path": "tests/test_text2sql_visualization.py",
    "chars": 12278,
    "preview": "\"\"\"Tests for text2sql visualization functionality.\"\"\"\n\nimport pytest\n\nfrom openchatbi.text2sql.visualization import Char"
  },
  {
    "path": "tests/test_tools_ask_human.py",
    "chars": 1525,
    "preview": "\"\"\"Tests for ask_human tool functionality.\"\"\"\n\nimport pytest\nfrom pydantic import ValidationError\n\nfrom openchatbi.tool."
  },
  {
    "path": "tests/test_tools_run_python_code.py",
    "chars": 13123,
    "preview": "\"\"\"Tests for run_python_code tool functionality.\"\"\"\n\nfrom unittest.mock import patch\n\nfrom openchatbi.tool.run_python_co"
  },
  {
    "path": "tests/test_tools_search_knowledge.py",
    "chars": 17259,
    "preview": "\"\"\"Tests for search_knowledge tool functionality.\"\"\"\n\nfrom unittest.mock import patch\n\nimport pytest\n\nfrom openchatbi.to"
  },
  {
    "path": "tests/test_utils.py",
    "chars": 5637,
    "preview": "\"\"\"Tests for utility functions.\"\"\"\n\nimport io\nfrom unittest.mock import patch\n\nimport pytest\n\nfrom openchatbi.utils impo"
  },
  {
    "path": "timeseries_forecasting/Dockerfile",
    "chars": 882,
    "preview": "FROM python:3.10-slim\n\n# Install only essential build tools\nRUN apt-get update && apt-get install -y --no-install-recomm"
  },
  {
    "path": "timeseries_forecasting/README.md",
    "chars": 5916,
    "preview": "# Transformer Time Series Forecasting Service\n\nA Docker-based time series forecasting service using Transformer based mo"
  },
  {
    "path": "timeseries_forecasting/app.py",
    "chars": 6720,
    "preview": "\"\"\"app.py: FastAPI application for Transformer time series forecasting.\"\"\"\n\nimport logging\nimport time\nfrom typing impor"
  },
  {
    "path": "timeseries_forecasting/build_and_run.sh",
    "chars": 2120,
    "preview": "#!/bin/bash\n\n# Build and run script for time series forecasting service\nset -e\n\necho \"=== Building Timeseries Forecastin"
  },
  {
    "path": "timeseries_forecasting/model_handler.py",
    "chars": 9843,
    "preview": "\"\"\"model_handler.py: Transformer based model handler for time series forecasting.\"\"\"\n\nimport logging\nfrom typing import "
  },
  {
    "path": "timeseries_forecasting/test_forecasting.py",
    "chars": 9717,
    "preview": "#!/usr/bin/env python3\n\"\"\"test_forecasting.py: Test script for Timer forecasting service.\"\"\"\n\nimport time\nfrom datetime "
  }
]

About this extraction

This page contains the full source code of the zhongyu09/openchatbi GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 133 files (701.7 KB), approximately 156.6k tokens, and a symbol index with 728 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.
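The file index above is a JSON array of `{path, chars, preview}` entries, one per extracted file. A minimal sketch of how an agent might consume that index — the inline `index_json` below is a tiny stand-in built from two real entries (with shortened previews), not the full 133-file array:

```python
import json

# Miniature stand-in for the file index above: same structure
# ({path, chars, preview}), populated with two real entries.
index_json = """
[
  {"path": "openchatbi/agent_graph.py", "chars": 17415,
   "preview": "Main agent graph construction and execution logic."},
  {"path": "example/config.yaml", "chars": 1081,
   "preview": "organization: MyCompany"}
]
"""

entries = json.loads(index_json)

# Total extracted characters across the indexed files.
total_chars = sum(e["chars"] for e in entries)

# Filter paths by extension to see what the repo is made of.
python_files = [e["path"] for e in entries if e["path"].endswith(".py")]

print(total_chars)   # 18496
print(python_files)  # ['openchatbi/agent_graph.py']
```

The same two lines of aggregation scale to the full index: summing `chars` over all 133 entries reproduces the ~701.7 KB total quoted above.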

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.
