Repository: premAI-io/premsql
Branch: main
Commit: 7041239e5ce1
Files: 90
Total size: 593.0 KB
Directory structure:
gitextract_acehzbqp/
├── .gitignore
├── README.md
├── examples/
│   ├── agent_server.ipynb
│   ├── agents.ipynb
│   ├── datasets.ipynb
│   ├── error_dataset.ipynb
│   ├── evaluation.ipynb
│   ├── finetuning.ipynb
│   ├── generators.ipynb
│   ├── lora_tuning.py
│   └── simple_pipeline.ipynb
├── premsql/
│   ├── __init__.py
│   ├── agents/
│   │   ├── __init__.py
│   │   ├── base.py
│   │   ├── baseline/
│   │   │   ├── __init__.py
│   │   │   ├── main.py
│   │   │   ├── prompts.py
│   │   │   └── workers/
│   │   │       ├── __init__.py
│   │   │       ├── analyser.py
│   │   │       ├── followup.py
│   │   │       ├── plotter.py
│   │   │       └── text2sql.py
│   │   ├── memory.py
│   │   ├── models.py
│   │   ├── router.py
│   │   ├── tools/
│   │   │   ├── __init__.py
│   │   │   └── plot/
│   │   │       ├── base.py
│   │   │       └── matplotlib_tool.py
│   │   └── utils.py
│   ├── cli.py
│   ├── datasets/
│   │   ├── __init__.py
│   │   ├── base.py
│   │   ├── collator.py
│   │   ├── error_dataset.py
│   │   ├── real/
│   │   │   ├── bird.py
│   │   │   ├── domains.py
│   │   │   └── spider.py
│   │   └── synthetic/
│   │       └── gretel.py
│   ├── evaluator/
│   │   ├── README.md
│   │   ├── __init__.py
│   │   └── base.py
│   ├── executors/
│   │   ├── __init__.py
│   │   ├── base.py
│   │   ├── from_langchain.py
│   │   └── from_sqlite.py
│   ├── generators/
│   │   ├── __init__.py
│   │   ├── base.py
│   │   ├── huggingface.py
│   │   ├── mlx.py
│   │   ├── ollama_model.py
│   │   ├── openai.py
│   │   └── premai.py
│   ├── logger.py
│   ├── playground/
│   │   ├── __init__.py
│   │   ├── backend/
│   │   │   ├── api/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── admin.py
│   │   │   │   ├── apps.py
│   │   │   │   ├── migrations/
│   │   │   │   │   ├── 0001_initial.py
│   │   │   │   │   └── __init__.py
│   │   │   │   ├── models.py
│   │   │   │   ├── pydantic_models.py
│   │   │   │   ├── serializers.py
│   │   │   │   ├── services.py
│   │   │   │   ├── tests.py
│   │   │   │   ├── urls.py
│   │   │   │   ├── utils.py
│   │   │   │   └── views.py
│   │   │   ├── backend/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── asgi.py
│   │   │   │   ├── settings.py
│   │   │   │   ├── urls.py
│   │   │   │   └── wsgi.py
│   │   │   ├── backend_client.py
│   │   │   └── manage.py
│   │   ├── frontend/
│   │   │   ├── components/
│   │   │   │   ├── chat.py
│   │   │   │   ├── session.py
│   │   │   │   ├── streamlit_plot.py
│   │   │   │   └── uploader.py
│   │   │   ├── main.py
│   │   │   └── utils.py
│   │   └── inference_server/
│   │       ├── api_client.py
│   │       └── service.py
│   ├── prompts.py
│   ├── tuner/
│   │   ├── __init__.py
│   │   ├── callback.py
│   │   ├── config.py
│   │   ├── full.py
│   │   └── peft.py
│   └── utils.py
└── pyproject.toml
================================================
FILE CONTENTS
================================================
================================================
FILE: .gitignore
================================================
data
experiments
output
test.py
exps
# Python specific
*.pyc
*.pyo
__pycache__/
# Virtual environments
venv/
env/
env.bak/
env1/
env2/
.env
# IDE specific
.idea/
.vscode/
# Compiled files
*.pyc
*.pyo
*.pyd
*.so
*.dll
*.exe
*.out
*.pyc
*.whl
# Logs and databases
*.log
*.sqlite3
*.db
# Data science and ML specific
data/
models/
*.h5
*.pkl
*.joblib
# Jupyter Notebook specific
.ipynb_checkpoints/
================================================
FILE: README.md
================================================
# PremSQL | Easy to use fully local RAG on Databases
[](https://pepy.tech/projects/premsql)
<a href="https://huggingface.co/premai-io/prem-1B-SQL"><img src="https://img.shields.io/badge/HuggingFace (downloads till Feb14: 26K+) -yellow"></a>
PremSQL is an open-source library designed to help developers create secure, fully local Text-to-SQL solutions using small language models. It provides all the essential tools to build and deploy end-to-end Text-to-SQL pipelines with customizable components, making it ideal for secure, autonomous AI-powered data analysis.

## New: PremSQL Playground, Agents and API
We just released the latest version of PremSQL. It comes with the following:
- **PremSQL Agents:** With PremSQL agents you can run analyses, plot charts, and query databases, all in natural language. For now it ships with a baseline agent; using the library, you can customize agents and build on top of them.
- **PremSQL API**: A self-hosted API that can be called from any language to make requests to the deployed agents.
- **PremSQL Playground**: A self-hosted playground UI that you can use to interact with Text-to-SQL agents for your analysis tasks. You can also test your customized agents in this playground. Watch it in action.
https://github.com/user-attachments/assets/b6db9737-cd42-4848-8a44-f23a5de1f600
## News and blogs
- [Nov 18th 2024] [Prem-1B-SQL](https://huggingface.co/premai-io/prem-1B-SQL) reached 10K+ downloads on Hugging Face
- [Nov 7th 2024] Release of [Prem-1B-SQL Ollama](https://ollama.com/anindya/prem1b-sql-ollama-fp116) and Ollama support.
- [Nov 5th 2024] Release of PremSQL agents, AgentServer and Playground
- [Oct 30th 2024] Prem-1B-SQL crossed 5K+ downloads on Hugging Face
- [Sep 20th 2024] First release of [Prem-1B-SQL](https://huggingface.co/premai-io/prem-1B-SQL) (51.54% on the BirdBench private dataset) | [Blog post](https://blog.premai.io/prem-1b-sql-fully-local-performant-slm-for-text-to-sql/)
- [Sep 10th 2024] First release of PremSQL | [Blog post](https://blog.premai.io/premsql-towards-end-to-end-local-text-to-sql-pipelines-2/)
- [Blog post]: [Using PremSQL to evaluate different open and closed source models](https://blog.premai.io/premsql-towards-end-to-end-local-text-to-sql-pipelines-2/)
- [Blog post]: [State of Text to SQL 2024](https://blog.premai.io/state-of-text2sql-2024/)
## 🚀 Features
- **Local-First**: Avoid third-party closed-source providers and keep your data secure.
- **Multiple connectors**: Supports [PremAI](https://app.premai.io/projects/), [Ollama](https://ollama.com/), [HuggingFace](https://huggingface.co/), [Apple MLX](https://github.com/ml-explore/mlx), [OpenAI](https://openai.com/).
- **Customizable Datasets**: Create, fine-tune, and evaluate models with built-in or custom datasets.
- **Robust Executors and Evaluators**: Easily connect to databases and assess model performance.
- **Advanced Generators**: Convert natural language prompts into executable SQL queries.
- **Error Handling and Self-Correction**: Automatically correct SQL queries during inference.
- **Fine-Tuning Support**: Fine-tune models with LoRA, QLoRA, or full fine-tuning strategies.
- **Agents**: Use the PremSQL baseline agent to perform Text-to-SQL, write analysis reports, and plot simple charts on databases.
- **Playground**: Do the same for agents through a ChatGPT-like UI dedicated to AI-powered data analysis.
- **Import CSVs or Kaggle CSV datasets directly into the PremSQL playground**: Analyse any CSV dataset from Kaggle or from any local folder using PremSQL.
Last but not least, all the features are extensible for your own customization and private data.
## 📚 Table of Contents
- [PremSQL](#premsql)
- [🚀 Features](#-features)
- [📚 Table of Contents](#-table-of-contents)
- [🛠️ Installation](#️-installation)
- [🚀 Quickstart](#-quickstart)
- [📦 Components Overview](#-components-overview)
- [Datasets](#datasets)
- [Executors](#executors)
- [Evaluators](#evaluators)
- [Generators](#generators)
- [Error Handling](#error-handling)
- [Tuner](#tuner)
- [Agents](#agents)
- [AgentServer and Playground](#playground)
- [🤝 Contributing](#-contributing)
- [🛣️ Roadmap](#️-roadmap)
- [📝 License](#-license)
## 🛠️ Installation
PremSQL requires Python 3.8 or higher. Install the library via pip:
```bash
pip install -U premsql
```
## 🚀 Quickstart
Here’s a quick example of how to use PremSQL to generate SQL queries, plot charts, and analyse dataframes, all in natural language. You can name this file `start_agent.py`:
```python start_agent.py
import os

from dotenv import load_dotenv

from premsql.playground import AgentServer
from premsql.agents import BaseLineAgent
from premsql.generators import Text2SQLGeneratorPremAI
from premsql.executors import ExecutorUsingLangChain
from premsql.agents.tools import SimpleMatplotlibTool

load_dotenv()

text2sql_model = Text2SQLGeneratorPremAI(
    model_name="gpt-4o", experiment_name="text2sql_model", type="test",
    premai_api_key=os.environ.get("PREMAI_API_KEY"),
    project_id=os.environ.get("PREMAI_PROJECT_ID")
)

analyser_plotter_model = Text2SQLGeneratorPremAI(
    model_name="gpt-4o", experiment_name="analyser_model", type="test",
    premai_api_key=os.environ.get("PREMAI_API_KEY"),
    project_id=os.environ.get("PREMAI_PROJECT_ID")
)

# Enter your database connection URI and a unique session name.
# SQLite, Postgres, and MySQL are supported.
db_connection_uri = "<sqlite:///db_path>"
session_name = "<session_name>"

agent = BaseLineAgent(
    session_name=session_name,
    db_connection_uri=db_connection_uri,
    specialized_model1=text2sql_model,
    specialized_model2=analyser_plotter_model,
    executor=ExecutorUsingLangChain(),
    auto_filter_tables=False,
    plot_tool=SimpleMatplotlibTool()
)

# Query the database
response = agent(
    "/query show me the phone numbers of direct charter-funded schools opened after 2000/1/1"
)

# Analyze the results
analysis = agent(
    "/analyse what patterns do you see in the data?"
)

# Create a visualization
plot = agent(
    "/plot create a bar chart showing school counts by year"
)
```
You can launch the PremSQL Playground (as shown in the video above) by adding these two lines after instantiating the agent:
```python
agent_server = AgentServer(agent=agent, port={port})
agent_server.launch()
```
Then open two terminals. In the first, run:
```bash
premsql launch all
```
and in the second, run:
```bash
python start_agent.py
```
## 📦 Components Overview
### [Datasets](https://docs.premai.io/premsql/introduction)
PremSQL provides a simple API to use various pre-processed datasets for Text-to-SQL tasks. Text-to-SQL is complex as it requires data dependencies on databases and tables. The premsql datasets help streamline this by providing easy access to datasets and enabling you to create your own datasets with private databases.
Currently, the following datasets are readily available:
1. [BirdBench Dataset](https://huggingface.co/datasets/premai-io/birdbench)
2. [Spider Unified Datasets](https://huggingface.co/datasets/premai-io/spider)
3. [Domains Dataset](https://huggingface.co/datasets/premai-io/domains)
4. [Gretel AI Dataset](https://huggingface.co/datasets/gretelai/synthetic_text_to_sql)
**Example usage:**
```python
from premsql.datasets import Text2SQLDataset

bird_dataset = Text2SQLDataset(
    dataset_name='bird', split="train", force_download=False,
    # change this to the path where you want to store the dataset
    dataset_folder="/path/to/your/data"
)
```
### Generators
PremSQL generators are responsible for converting natural language questions into SQL queries. Think of these as modular inference APIs specific to text-to-SQL. You can integrate various third-party APIs, models, or custom pipelines.
**Example:**
```python
from premsql.generators import Text2SQLGeneratorHF
from premsql.datasets import Text2SQLDataset

# Define a dataset
bird_dataset = Text2SQLDataset(
    dataset_name='bird', split="train", force_download=False,
    dataset_folder="/path/to/dataset"
).setup_dataset(num_rows=10, num_fewshot=3)

# Define a generator
generator = Text2SQLGeneratorHF(
    model_or_name_or_path="premai-io/prem-1B-SQL",
    experiment_name="test_generators",
    device="cuda:0",
    type="test"
)

# Generate on the full dataset
responses = generator.generate_and_save_results(
    dataset=bird_dataset,
    temperature=0.1,
    max_new_tokens=256
)

print(responses)
```
Results are saved in the `experiment_path` as `predict.json`.
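As a quick sketch, the saved predictions can be read back with the standard library. The exact record fields inside `predict.json` (e.g. a `generated` key) are assumptions for illustration; check your own experiment folder for the actual structure.

```python
# Sketch of reading back saved predictions from an experiment folder.
# The record fields below are illustrative assumptions, not premsql's
# guaranteed output format.
import json
from pathlib import Path
from tempfile import TemporaryDirectory

def load_predictions(experiment_path: str) -> list[dict]:
    """Load the predict.json file saved under the experiment path."""
    return json.loads((Path(experiment_path) / "predict.json").read_text())

with TemporaryDirectory() as exp:
    # Write a stand-in predict.json so the example is self-contained.
    (Path(exp) / "predict.json").write_text(
        json.dumps([{"question": "top movie of 1945?", "generated": "SELECT ..."}])
    )
    preds = load_predictions(exp)
    print(len(preds))  # → 1
```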
We also support execution guided decoding. This strategy executes the generated SQL against the DB and, if it fails, uses the error message for correction, repeating until it gets a valid result or the retries run out.

A quick glance on execution guided decoding:
```python
from premsql.executors import SQLiteExecutor

executor = SQLiteExecutor()

response = generator.generate_and_save_results(
    dataset=bird_dataset,
    temperature=0.1,
    max_new_tokens=256,
    force=True,
    executor=executor,
    max_retries=5  # this is optional (default is already set to 5)
)
```
### [Executors](https://docs.premai.io/premsql/executors)
An executor executes the generated SQL queries against the database and fetches the results. It is a crucial component in the Text-to-SQL pipeline, as it ensures that the generated SQL queries are valid and return the expected results. PremSQL supports a native executor for SQLite databases and also supports [LangChain's SQLDatabase](https://python.langchain.com/v0.2/docs/integrations/tools/sql_database/)
as an executor.
**Example usage**
```python
from premsql.executors import SQLiteExecutor

# Instantiate the executor
executor = SQLiteExecutor()

# Set a sample database path
db_path = "./data/db/california_schools.sqlite"
sql = 'SELECT movie_title FROM movies WHERE movie_release_year = 1945 ORDER BY movie_popularity DESC LIMIT 1'

# Execute the SQL
result = executor.execute_sql(
    sql=sql,
    dsn_or_db_path=db_path
)

print(result)
```
This will show:
```python
{'result': [('Brief Encounter',)], 'error': None, 'execution_time': 0.03717160224914551}
```
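Since the executor always returns this `{"result", "error", "execution_time"}` shape, downstream code can branch on it. The helper below is a minimal illustrative sketch (the function name and messages are ours, not part of premsql):

```python
# Minimal sketch of handling an executor-style result dict. The dict
# keys follow the output shown above; `summarize_result` itself is a
# hypothetical helper for illustration.

def summarize_result(result: dict) -> str:
    """Return a row summary on success, or the error message on failure."""
    if result.get("error") is not None:
        return f"SQL failed: {result['error']}"
    rows = result.get("result", [])
    return f"{len(rows)} row(s) in {result.get('execution_time', 0):.3f}s"

sample = {
    "result": [("Brief Encounter",)],
    "error": None,
    "execution_time": 0.0371,
}
print(summarize_result(sample))  # → 1 row(s) in 0.037s
```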
### [Evaluators](https://docs.premai.io/premsql/evaluators)
Executors connect to databases and execute SQL, while evaluators assess the performance of your models against predefined metrics like Execution Accuracy (EX) and Valid Efficiency Score (VES).
**Example Usage:**
```python
from premsql.executors import SQLiteExecutor
from premsql.evaluator import Text2SQLEvaluator

# Define the executor
executor = SQLiteExecutor()

# Define the evaluator
evaluator = Text2SQLEvaluator(
    executor=executor,
    experiment_path=generator.experiment_path
)

# Now evaluate the models
results = evaluator.execute(
    metric_name="accuracy",
    model_responses=responses,
    filter_by="db_id",
    meta_time_out=10
)

print(results)
```
Using the `filter_by` option to filter results by `db_id` allows you to see overall accuracy and its distribution across different databases. If a key like `difficulty` is available, it will show performance distribution over various difficulty levels. Filtering evaluations by available keys helps in analyzing and understanding model performance empirically. Below is a visualization of model performance across different databases based on the applied filters.

### [Error Handling](https://docs.premai.io/premsql/error_dataset)
Error-handling prompts are crucial for refining model performance, especially in complex tasks like Text-to-SQL generation. The prompts help the model learn how to handle errors by providing additional context and guidance based on past mistakes. By training on these prompts, the model can self-correct during inference, improving the quality of its output.
**Example Error Correction Prompt:**
```plaintext
{existing_prompt}
# Generated SQL: {sql}
## Error Message
{error_msg}
Carefully review the original question and error message, then rewrite the SQL query to address the identified issues.
```
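The template above can be filled with ordinary `str.format`. The sketch below uses hypothetical values for the three placeholders; only the placeholder names come from the prompt shown above.

```python
# Filling the error-correction prompt template with str.format.
# The question, SQL, and error message are made-up illustration values.
ERROR_PROMPT = """{existing_prompt}
# Generated SQL: {sql}

## Error Message
{error_msg}

Carefully review the original question and error message, then rewrite
the SQL query to address the identified issues."""

filled = ERROR_PROMPT.format(
    existing_prompt="Question: top movie of 1945?",
    sql="SELECT title FROM movie WHERE year = 1945",
    error_msg="no such column: year",
)
print(filled)
```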
To create a self-correction / error-correction dataset:
- Start with an existing training dataset.
- Run an evaluation on that training dataset using an untrained model.
- Gather the failures and pass them to the error-handling prompt.
- Finally, save the results, ready to be used for fine-tuning.
Here is the code to get started making a self-correction dataset from existing datasets:
```python
from premsql.datasets.error_dataset import ErrorDatasetGenerator
from premsql.generators.huggingface import Text2SQLGeneratorHF
from premsql.executors.from_langchain import ExecutorUsingLangChain
from premsql.datasets import BirdDataset

generator = Text2SQLGeneratorHF(
    model_or_name_or_path="premai-io/prem-1B-SQL",
    experiment_name="testing_error_gen",
    type="train",  # do not use type="test", since this will be used during training
    device="cuda:0"
)

executor = ExecutorUsingLangChain()

bird_train = BirdDataset(
    split="train",
    dataset_folder="/path/to/dataset"
).setup_dataset(num_rows=10)

error_dataset_gen = ErrorDatasetGenerator(generator=generator, executor=executor)

error_dataset = error_dataset_gen.generate_and_save(
    datasets=bird_train,
    force=True
)
```
### [Tuner](https://docs.premai.io/premsql/tuner)
`premsql tuner` is a module designed to fine-tune models specifically for text-to-SQL tasks. The module offers multiple ways of fine-tuning, providing flexibility based on your project's needs.
### Supported Fine-Tuning Methods
1. **Full Fine-Tuning**: Standard model fine-tuning with all its parameters.
2. **PEFT using LoRA**: Parameter-efficient fine-tuning with LoRA (Low-Rank Adaptation) for faster and more efficient training.
3. **PEFT using QLoRA**: Another PEFT approach using Quantized LoRA, optimizing resource use during training.
In addition to these methods, you can create custom fine-tuning pipelines using the components and tools provided by premsql.
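To see why LoRA-style PEFT is so much cheaper than full fine-tuning, compare the trainable parameter counts. For a d × k weight matrix, full fine-tuning updates all d·k entries, while a rank-r LoRA update W + B·A (B: d × r, A: r × k) trains only r·(d + k). The shapes below are illustrative, not tied to any particular model:

```python
# Back-of-the-envelope comparison of trainable parameters:
# full fine-tuning of a d x k matrix vs. a rank-r LoRA update.

def full_params(d: int, k: int) -> int:
    """All entries of the d x k weight matrix are trainable."""
    return d * k

def lora_params(d: int, k: int, r: int) -> int:
    """Only the low-rank factors B (d x r) and A (r x k) are trainable."""
    return r * (d + k)

d, k, r = 2048, 2048, 8
print(full_params(d, k))     # → 4194304
print(lora_params(d, k, r))  # → 32768  (~0.8% of the full matrix)
```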
### Agents
Agents have been popular for a while. Simply put, an agent is an orchestrated workflow between different LLMs/SLMs. PremSQL agents are focused on executing tasks related to databases. Briefly, PremSQL agents can:

- Query (`/query`) a database from the user's natural language input.
- Analyse (`/analyse`) the database output and the user query, and give back an answer in natural language.
- Plot (`/plot`) basic charts based on the user's query.
- Lastly, for anything (`/followup`) that does not fit the above three categories, give you a followup on what to do next.
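The prefix-based routing described above can be sketched in a few lines. This is an illustrative stand-in, not PremSQL's internal router; the function name and the default-to-followup behaviour are our assumptions based on the description:

```python
# Minimal sketch of prefix-based routing: match a leading /query,
# /analyse, /plot, or /followup marker and fall back to /followup.
ROUTES = ("/query", "/analyse", "/plot", "/followup")

def parse_route(user_input: str) -> tuple[str, str]:
    """Split a message into (route, question); default to /followup."""
    for route in ROUTES:
        if user_input.startswith(route):
            return route, user_input[len(route):].strip()
    return "/followup", user_input.strip()

print(parse_route("/query show me all schools"))
# → ('/query', 'show me all schools')
```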
PremSQL comes with a minimal agentic implementation (more implementation variants will come in later versions), which can query a DB, provide analysis over dataframes, answer user questions, and plot simple graphs. This is how you use our baseline Text-to-SQL agent:
```python
import os

from dotenv import load_dotenv

from premsql.playground import AgentServer
from premsql.agents import BaseLineAgent
from premsql.generators import Text2SQLGeneratorPremAI
from premsql.executors import ExecutorUsingLangChain
from premsql.agents.tools import SimpleMatplotlibTool

load_dotenv()

text2sql_model = Text2SQLGeneratorPremAI(
    model_name="gpt-4o", experiment_name="text2sql_model", type="test",
    premai_api_key=os.environ.get("PREMAI_API_KEY"),
    project_id=os.environ.get("PREMAI_PROJECT_ID")
)

analyser_plotter_model = Text2SQLGeneratorPremAI(
    model_name="gpt-4o", experiment_name="analyser_model", type="test",
    premai_api_key=os.environ.get("PREMAI_API_KEY"),
    project_id=os.environ.get("PREMAI_PROJECT_ID")
)

# Enter your database connection URI and a unique session name.
# SQLite, Postgres, and MySQL are supported.
db_connection_uri = "<sqlite:///db_path>"
session_name = "<session_name>"

agent = BaseLineAgent(
    session_name=session_name,
    db_connection_uri=db_connection_uri,
    specialized_model1=text2sql_model,
    specialized_model2=analyser_plotter_model,
    executor=ExecutorUsingLangChain(),
    auto_filter_tables=False,
    plot_tool=SimpleMatplotlibTool()
)

# Query the database
response = agent(
    "/query show me the phone numbers of direct charter-funded schools opened after 2000/1/1"
)

# Analyze the results
analysis = agent(
    "/analyse what patterns do you see in the data?"
)

# Create a visualization
plot = agent(
    "/plot create a bar chart showing school counts by year"
)
```
You can learn more about PremSQL agents and their design patterns in detail in [the documentation](https://docs.premai.io/premsql/introduction).
### Playground
You can think of the Playground as a ChatGPT-like UI environment specialized for RAG on databases. There are different personas for using the PremSQL playground. To launch the Playground, run in a terminal:
```bash
premsql launch all
```
This will run two things:
- a Django backend API server (running on port 8000)
- a Streamlit UI, which is our Playground.
In the section above you saw how we defined our agent. You can deploy this agent anywhere using the AgentServer, which is a FastAPI wrapper. With it you can deploy as many instances of the PremSQL baseline agent (or any agent of your choice) as you like, and connect them to the playground either to test them or to use them on your internal database. Here is how you define your server and launch it:
```python
# File name: start_agent_server.py
from premsql.playground import AgentServer
from premsql.agents import BaseLineAgent
# Define your agent as shown above:
agent = BaseLineAgent(...)
agent_server = AgentServer(agent=agent, port={port})
agent_server.launch()
```
Now, in another terminal, run:
```bash
python start_agent_server.py
```
(The Python file can have any name.) This will run a FastAPI server. Copy the deployed URL and paste it into the `Register New Session` part of the UI. Below is the basic backend architecture showing how the Playground communicates with the server.

As you can see from the architecture above, you can create independent sessions using the starter script, and you can customize them at different levels. For instance:
- You can use different generators and different models.
- You can add your own DB executor.
- Last but not least, you can add a new worker or build your own agent by combining our pre-existing worker implementations with your own logic.
So you can add as many such agents, with different customizations, or your own PremSQL-compatible agents, and test and use them with the PremSQL Playground. You can learn more technical details in [the documentation](https://docs.premai.io/premsql/introduction).
## 🛣️ Roadmap
PremSQL is continuously evolving, with exciting features planned for future releases:
- **Synthesizer Component**: A tool to generate synthetic datasets from private data, enabling fully private text-to-SQL workflows and enhancing model fine-tuning capabilities.
- **Training Better Small Language Models**: Ongoing training and optimization of small language models specifically tailored to PremSQL’s unique requirements, ensuring efficient and effective performance in text-to-SQL tasks.
- **Optimization of Generators and Executors**: Improvements to enhance the robustness of existing components, including parallel processing to speed up generation and execution times.
- **Standard Tests and Stability Improvements**: Introduction of comprehensive tests for greater stability of the library and the planned rollout of a simple user interface to improve the overall user experience.
Stay tuned for these exciting updates! We encourage you to contribute and provide feedback to help us shape the future of PremSQL.
## 📝 License
PremSQL is licensed under the MIT License. See the [LICENSE](LICENSE) file for more information.
## ☘️ Citation
```
@misc{Anindyadeep2024PremSQL,
  author = {Anindyadeep},
  title = {PremSQL: End-to-End Local-First Text-to-SQL Pipelines},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/premAI-io/premsql}},
  note = {Accessed: 2024-12-10}
}
```
================================================
FILE: examples/agent_server.ipynb
================================================
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/Users/anindya/personal/PremSQL/v2_agent/premsql\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/anindya/Library/Caches/pypoetry/virtualenvs/text2sql-jLjiS8B5-py3.11/lib/python3.11/site-packages/IPython/core/magics/osm.py:417: UserWarning: This is now an optional IPython functionality, setting dhist requires you to install the `pickleshare` library.\n",
" self.shell.db['dhist'] = compress_dhist(dhist)[-100:]\n"
]
}
],
"source": [
"cd .."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import random"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[7546]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"random.sample(range(7000, 9000), k=1)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"8194"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"random.choice(range(7000, 9000))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"from premsql.generators import Text2SQLGeneratorOpenAI\n",
"\n",
"# Read the API key from the environment instead of hard-coding it\n",
"Text2SQLGeneratorOpenAI(openai_api_key=os.environ.get(\"OPENAI_API_KEY\"))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create a file named `serve.py` (or it could be anything) and add the following lines there:\n",
"\n",
"```Python\n",
"from premsql.playground import AgentServer\n",
"from premsql.agents import BaseLineAgent\n",
"from premsql.generators import Text2SQLGeneratorMLX\n",
"from premsql.executors import ExecutorUsingLangChain\n",
"from premsql.agents.tools import SimpleMatplotlibTool\n",
"\n",
"db_connection_uri = (\n",
" \"sqlite://///Users/anindya/personal/PremSQL/v2_agent/premsql/codebase_community.sqlite\"\n",
")\n",
"text2sql_model = Text2SQLGeneratorMLX(\n",
" model_name_or_path=\"premai-io/prem-1B-SQL\", experiment_name=\"text2sql_model\", type=\"test\"\n",
")\n",
"\n",
"analyser_plotter_model = Text2SQLGeneratorMLX(\n",
" model_name_or_path=\"meta-llama/Llama-3.2-1B-Instruct\", experiment_name=\"analyser_model\", type=\"test\",\n",
")\n",
"\n",
"baseline = BaseLineAgent(\n",
" session_name=\"local_db_rag\", # A unique session name is required\n",
" db_connection_uri=db_connection_uri, # DB to connect to for Text to SQL \n",
" specialized_model1=text2sql_model, # This refers to the Text to SQL model\n",
" specialized_model2=analyser_plotter_model, # This refers to any model other than Text to SQL\n",
" executor=ExecutorUsingLangChain(), # Which DB executor to use\n",
" auto_filter_tables=False, # Whether to filter tables before Text to SQL or not (uses LLM)\n",
" plot_tool=SimpleMatplotlibTool() # Matplotlib Tool which will be used by plotter worker\n",
")\n",
"\n",
"agent_server = AgentServer(agent=baseline, port=8263)\n",
"agent_server.launch()\n",
"```\n",
"\n",
"After this just run:\n",
"\n",
"```bash\n",
"python serve.py\n",
"```\n",
"\n",
"You will see a FastAPI server start at the specified port, with the following output:\n",
"\n",
"```bash\n",
"INFO: Started server process [78518]\n",
"INFO: Waiting for application startup.\n",
"2024-10-28 00:29:46,953 - [FASTAPI-INFERENCE-SERVICE] - INFO - Starting up the application\n",
"INFO: Application startup complete.\n",
"INFO: Uvicorn running on http://0.0.0.0:8263 (Press CTRL+C to quit)\n",
"```\n",
"\n",
"This means that our server has started; now we can query it from the terminal using curl, Python requests, or JavaScript axios. "
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"from premsql.playground import InferenceServerAPIClient\n",
"from premsql.agents.tools import SimpleMatplotlibTool"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "text2sql-jLjiS8B5-py3.11",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.10"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
================================================
FILE: examples/agents.ipynb
================================================
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/Users/anindya/personal/PremSQL/v2_agent/premsql\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/anindya/Library/Caches/pypoetry/virtualenvs/text2sql-jLjiS8B5-py3.11/lib/python3.11/site-packages/IPython/core/magics/osm.py:417: UserWarning: This is now an optional IPython functionality, setting dhist requires you to install the `pickleshare` library.\n",
" self.shell.db['dhist'] = compress_dhist(dhist)[-100:]\n"
]
}
],
"source": [
"cd .."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Agents\n",
"\n",
"We are all familiar with agents. Simply put, an agent is an orchestrated workflow between different LLMs. In PremSQL we are bringing the very first versions of Text2SQL agents. Agents in PremSQL are made using the available modular components like generators, executors etc. \n",
"\n",
"You can even extend agents with your custom logic and workflows with very little code using PremSQL. We will explore this in the coming sections. Agents for database-specific RAGs mainly consist of the following tasks:\n",
"\n",
"1. Executing queries on databases from natural language contexts (a.k.a. Text to SQL).\n",
"2. Analysing the table and giving out insights in natural language. \n",
"3. Plotting different graphs to draw out relationships between entities from natural language questions. \n",
"4. A followup, which includes error handling for agents and asking followup questions of the user. \n",
"\n",
"Additionally we maintain a memory that keeps track of the previous conversation, which is used as context to get the current result. To summarise, PremSQL has four \"routes\" that it needs to define before running. Here is a schematic diagram to understand how\n",
"PremSQL agents work. "
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<img src=\"../examples/agent_flow.png\" width=\"1000\" height=\"550\">"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from IPython.display import HTML\n",
"HTML('<img src=\"../examples/agent_flow.png\" width=\"1000\" height=\"550\">')\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So in any typical DB-based agentic RAG workflow, the following sequence of events happens:\n",
"\n",
"1. The user asks a query. In PremSQL, when you ask a query:\n",
" - for Text to SQL, use `/query`\n",
" - for analysing the output dataframe, use `/analyse` \n",
" - for plotting something, use `/plot`\n",
" - anything else goes under `/followup`. If you do not provide these markers, it goes to the followup\n",
" route by default. We could also implement an \"LLM\"-based router, but we think it is overkill. \n",
"\n",
"2. Once the user provides a query specifying the proper route, it goes to the following set of \"Workers\". Workers are specialized components whose job is to complete one specific task, so each worker has a specific output schema. You can learn more about the different output schemas [here](/premsql/agents/models.py)\n",
"\n",
"3. Once the worker processes the input, it produces some output. Our output parser then parses the output and gives the result back to the user. Additionally, it updates the memory. \n",
"\n",
"### Building on top of Workers\n",
"\n",
"So the above workflow is fixed in PremSQL. However, you can create your own custom Text to SQL / Analyser / Plotter or Followup worker. As long as it adheres to the [output schema](/premsql/agents/models.py), it will be compatible with other PremSQL features like the Agent Server and Playground. \n",
"\n",
"### Now let's watch agents in action"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/anindya/Library/Caches/pypoetry/virtualenvs/text2sql-jLjiS8B5-py3.11/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
" from .autonotebook import tqdm as notebook_tqdm\n"
]
}
],
"source": [
"# Since this demo is run on a Mac, I am using MLX. However, the same can be done with the Prem, HuggingFace, and OpenAI SDKs.\n",
"\n",
"from premsql.agents import BaseLineAgent\n",
"from premsql.generators import Text2SQLGeneratorMLX\n",
"from premsql.executors import ExecutorUsingLangChain\n",
"from premsql.agents.tools import SimpleMatplotlibTool"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2024-10-28 00:03:38,828 - [GENERATOR] - INFO - Experiment folder found in: experiments/test/text2sql_model\n",
"Fetching 9 files: 100%|██████████| 9/9 [00:00<00:00, 127100.12it/s]\n",
"Fetching 9 files: 100%|██████████| 9/9 [00:00<00:00, 76260.07it/s]\n",
"2024-10-28 00:03:40,709 - [GENERATOR] - INFO - Experiment folder found in: experiments/test/analyser_model\n",
"Fetching 8 files: 100%|██████████| 8/8 [00:00<00:00, 68900.27it/s]\n",
"Fetching 8 files: 100%|██████████| 8/8 [00:00<00:00, 57952.39it/s]\n"
]
}
],
"source": [
"# Define the generator that will do the Text to SQL task\n",
"\n",
"text2sql_model = Text2SQLGeneratorMLX(\n",
" model_name_or_path=\"premai-io/prem-1B-SQL\", experiment_name=\"text2sql_model\", type=\"test\"\n",
")\n",
"\n",
"analyser_plotter_model = Text2SQLGeneratorMLX(\n",
" model_name_or_path=\"meta-llama/Llama-3.2-1B-Instruct\", experiment_name=\"analyser_model\", type=\"test\",\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"# Now define your agent\n",
"db_connection_uri = (\n",
" \"sqlite://///Users/anindya/personal/PremSQL/v2_agent/premsql/codebase_community.sqlite\"\n",
")\n",
"\n",
"baseline = BaseLineAgent(\n",
"    session_name=\"local_db_rag\",                 # A unique session name is required\n",
"    db_connection_uri=db_connection_uri,         # The database to connect to for Text to SQL\n",
"    specialized_model1=text2sql_model,           # This refers to the Text to SQL model\n",
"    specialized_model2=analyser_plotter_model,   # This refers to any model other than Text to SQL\n",
"    executor=ExecutorUsingLangChain(),           # Which DB executor to use\n",
"    auto_filter_tables=False,                    # Whether to filter tables before Text to SQL (uses an LLM)\n",
"    plot_tool=SimpleMatplotlibTool()             # Matplotlib tool used by the plotter worker\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2024-10-28 00:03:42,866 - [BASELINE-ROUTER] - INFO - Routing to: query\n",
"2024-10-28 00:03:46,238 - [BASELINE-TEXT2SQL-WORKER] - INFO - Taking the following selected table in schema: ['badges', 'comments', 'posts', 'tags', 'users', 'votes']\n",
"2024-10-28 00:03:49,252 - [PIPELINE-MEMORY] - INFO - Pushed to the database\n"
]
}
],
"source": [
"output = baseline(\n",
" question=\"/query what all tables are present in the database\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>type</th>\n",
" <th>name</th>\n",
" <th>tbl_name</th>\n",
" <th>rootpage</th>\n",
" <th>sql</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>table</td>\n",
" <td>badges</td>\n",
" <td>badges</td>\n",
" <td>4</td>\n",
" <td>CREATE TABLE badges\\n(\\n Id INTEGER ...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>table</td>\n",
" <td>comments</td>\n",
" <td>comments</td>\n",
" <td>5645</td>\n",
" <td>CREATE TABLE comments\\n(\\n Id ...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>table</td>\n",
" <td>postHistory</td>\n",
" <td>postHistory</td>\n",
" <td>5646</td>\n",
" <td>CREATE TABLE postHistory\\n(\\n Id ...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>index</td>\n",
" <td>sqlite_autoindex_postHistory_1</td>\n",
" <td>postHistory</td>\n",
" <td>5647</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>table</td>\n",
" <td>postLinks</td>\n",
" <td>postLinks</td>\n",
" <td>5648</td>\n",
" <td>CREATE TABLE postLinks\\n(\\n Id I...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>table</td>\n",
" <td>posts</td>\n",
" <td>posts</td>\n",
" <td>5649</td>\n",
" <td>CREATE TABLE posts\\n(\\n Id ...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>index</td>\n",
" <td>sqlite_autoindex_posts_1</td>\n",
" <td>posts</td>\n",
" <td>5650</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>table</td>\n",
" <td>tags</td>\n",
" <td>tags</td>\n",
" <td>5651</td>\n",
" <td>CREATE TABLE tags\\n(\\n Id INTEGE...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>table</td>\n",
" <td>users</td>\n",
" <td>users</td>\n",
" <td>5652</td>\n",
" <td>CREATE TABLE users\\n(\\n Id INT...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>index</td>\n",
" <td>sqlite_autoindex_users_1</td>\n",
" <td>users</td>\n",
" <td>5653</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>table</td>\n",
" <td>votes</td>\n",
" <td>votes</td>\n",
" <td>5656</td>\n",
" <td>CREATE TABLE votes\\n(\\n Id INTEGE...</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" type name tbl_name rootpage \\\n",
"0 table badges badges 4 \n",
"1 table comments comments 5645 \n",
"2 table postHistory postHistory 5646 \n",
"3 index sqlite_autoindex_postHistory_1 postHistory 5647 \n",
"4 table postLinks postLinks 5648 \n",
"5 table posts posts 5649 \n",
"6 index sqlite_autoindex_posts_1 posts 5650 \n",
"7 table tags tags 5651 \n",
"8 table users users 5652 \n",
"9 index sqlite_autoindex_users_1 users 5653 \n",
"10 table votes votes 5656 \n",
"\n",
" sql \n",
"0 CREATE TABLE badges\\n(\\n Id INTEGER ... \n",
"1 CREATE TABLE comments\\n(\\n Id ... \n",
"2 CREATE TABLE postHistory\\n(\\n Id ... \n",
"3 None \n",
"4 CREATE TABLE postLinks\\n(\\n Id I... \n",
"5 CREATE TABLE posts\\n(\\n Id ... \n",
"6 None \n",
"7 CREATE TABLE tags\\n(\\n Id INTEGE... \n",
"8 CREATE TABLE users\\n(\\n Id INT... \n",
"9 None \n",
"10 CREATE TABLE votes\\n(\\n Id INTEGE... "
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"output.show_output_dataframe()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2024-10-28 00:04:50,910 - [BASELINE-ROUTER] - INFO - Routing to: analyse\n",
"2024-10-28 00:04:56,051 - [PIPELINE-MEMORY] - INFO - Pushed to the database\n"
]
}
],
"source": [
"analysis = baseline(\n",
" question=\"/analyse Which tables I should use for understand relation about user votes\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Use the votes table to understand the relation about user votes.'"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"analysis.analysis"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2024-10-28 00:05:45,790 - [BASELINE-ROUTER] - INFO - Routing to: query\n",
"2024-10-28 00:05:48,085 - [BASELINE-TEXT2SQL-WORKER] - INFO - Taking the following selected table in schema: ['votes']\n",
"2024-10-28 00:05:48,704 - [PIPELINE-MEMORY] - INFO - Pushed to the database\n"
]
}
],
"source": [
"output = baseline(\n",
" question=\"/query show me the first 10 rows in votes\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Id</th>\n",
" <th>PostId</th>\n",
" <th>VoteTypeId</th>\n",
" <th>CreationDate</th>\n",
" <th>UserId</th>\n",
" <th>BountyAmount</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>2</td>\n",
" <td>2010-07-19</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>2010-07-19</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>5</td>\n",
" <td>2</td>\n",
" <td>2010-07-19</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>5</td>\n",
" <td>2</td>\n",
" <td>2010-07-19</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>3</td>\n",
" <td>2</td>\n",
" <td>2010-07-19</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>6</td>\n",
" <td>4</td>\n",
" <td>2</td>\n",
" <td>2010-07-19</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>7</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>2010-07-19</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>10</td>\n",
" <td>3</td>\n",
" <td>2</td>\n",
" <td>2010-07-19</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>11</td>\n",
" <td>5</td>\n",
" <td>2</td>\n",
" <td>2010-07-19</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>12</td>\n",
" <td>6</td>\n",
" <td>2</td>\n",
" <td>2010-07-19</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Id PostId VoteTypeId CreationDate UserId BountyAmount\n",
"0 1 3 2 2010-07-19 None None\n",
"1 2 2 2 2010-07-19 None None\n",
"2 3 5 2 2010-07-19 None None\n",
"3 4 5 2 2010-07-19 None None\n",
"4 5 3 2 2010-07-19 None None\n",
"5 6 4 2 2010-07-19 None None\n",
"6 7 2 2 2010-07-19 None None\n",
"7 10 3 2 2010-07-19 None None\n",
"8 11 5 2 2010-07-19 None None\n",
"9 12 6 2 2010-07-19 None None"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"output.show_output_dataframe()"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'SELECT * FROM votes LIMIT 10;'"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# You can also see the SQL that was used\n",
"output.sql_string"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2024-10-28 00:07:39,085 - [BASELINE-ROUTER] - INFO - Routing to: query\n",
"2024-10-28 00:07:42,631 - [BASELINE-TEXT2SQL-WORKER] - INFO - Error while selecting table: 'include'\n",
"2024-10-28 00:07:42,632 - [BASELINE-TEXT2SQL-WORKER] - INFO - Taking the following selected table in schema: ['votes']\n",
"2024-10-28 00:07:43,507 - [PIPELINE-MEMORY] - INFO - Pushed to the database\n"
]
}
],
"source": [
"output = baseline(\"/query what is the max and min value of creation date in votes\")"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>max(CreationDate)</th>\n",
" <th>min(CreationDate)</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2011-05-01</td>\n",
" <td>2010-07-19</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" max(CreationDate) min(CreationDate)\n",
"0 2011-05-01 2010-07-19"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"output.show_output_dataframe()"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2024-10-28 00:08:46,680 - [BASELINE-ROUTER] - INFO - Routing to: query\n",
"2024-10-28 00:08:48,365 - [BASELINE-TEXT2SQL-WORKER] - INFO - Taking the following selected table in schema: ['votes']\n",
"2024-10-28 00:08:49,891 - [PIPELINE-UTILS] - INFO - Truncating output table to first 200 rows only\n",
"2024-10-28 00:08:49,893 - [PIPELINE-MEMORY] - INFO - Pushed to the database\n"
]
}
],
"source": [
"output = baseline(\"/query show me all the rows in votes where creation date was in the month of march 2011\")"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Id</th>\n",
" <th>PostId</th>\n",
" <th>VoteTypeId</th>\n",
" <th>CreationDate</th>\n",
" <th>UserId</th>\n",
" <th>BountyAmount</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>33262</td>\n",
" <td>7672</td>\n",
" <td>2</td>\n",
" <td>2011-03-01</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>33263</td>\n",
" <td>7648</td>\n",
" <td>2</td>\n",
" <td>2011-03-01</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>33264</td>\n",
" <td>7721</td>\n",
" <td>2</td>\n",
" <td>2011-03-01</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>33265</td>\n",
" <td>7674</td>\n",
" <td>2</td>\n",
" <td>2011-03-01</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>33266</td>\n",
" <td>7687</td>\n",
" <td>2</td>\n",
" <td>2011-03-01</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>195</th>\n",
" <td>33483</td>\n",
" <td>1164</td>\n",
" <td>2</td>\n",
" <td>2011-03-02</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>196</th>\n",
" <td>33484</td>\n",
" <td>1164</td>\n",
" <td>5</td>\n",
" <td>2011-03-02</td>\n",
" <td>1720.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>197</th>\n",
" <td>33485</td>\n",
" <td>5591</td>\n",
" <td>2</td>\n",
" <td>2011-03-02</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>198</th>\n",
" <td>33486</td>\n",
" <td>5591</td>\n",
" <td>5</td>\n",
" <td>2011-03-02</td>\n",
" <td>1720.0</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>199</th>\n",
" <td>33487</td>\n",
" <td>7764</td>\n",
" <td>2</td>\n",
" <td>2011-03-02</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>200 rows × 6 columns</p>\n",
"</div>"
],
"text/plain": [
" Id PostId VoteTypeId CreationDate UserId BountyAmount\n",
"0 33262 7672 2 2011-03-01 NaN NaN\n",
"1 33263 7648 2 2011-03-01 NaN NaN\n",
"2 33264 7721 2 2011-03-01 NaN NaN\n",
"3 33265 7674 2 2011-03-01 NaN NaN\n",
"4 33266 7687 2 2011-03-01 NaN NaN\n",
".. ... ... ... ... ... ...\n",
"195 33483 1164 2 2011-03-02 NaN NaN\n",
"196 33484 1164 5 2011-03-02 1720.0 NaN\n",
"197 33485 5591 2 2011-03-02 NaN NaN\n",
"198 33486 5591 5 2011-03-02 1720.0 NaN\n",
"199 33487 7764 2 2011-03-02 NaN NaN\n",
"\n",
"[200 rows x 6 columns]"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"output.show_output_dataframe()"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2024-10-28 00:10:06,403 - [BASELINE-ROUTER] - INFO - Routing to: plot\n",
"2024-10-28 00:10:06,407 - [PLOT-WORKER] - INFO - Going for generation\n",
"2024-10-28 00:10:07,197 - [PLOT-WORKER] - INFO - Plot config: {'x': 'VoteTypeId', 'y': 'CreationDate', 'plot_type': 'scatter'}\n",
"2024-10-28 00:10:07,274 - [PLOT-WORKER] - INFO - Done base64 conversion\n",
"2024-10-28 00:10:07,276 - [PIPELINE-MEMORY] - INFO - Pushed to the database\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA90AAAJOCAYAAACqS2TfAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAABTDElEQVR4nO3dd3RU1d7G8WfSQ0ICAQIEIQnNEAiCmCtFmnQQBESKSrVy6SoXUVqQot6roNJsF0QFBAWUJtJFmgiCdJCq9GZCaIHMef/gZl6GFJKQzSTh+1lrluScPef8Zs4k5sneZ2+bZVmWAAAAAABAlnNzdQEAAAAAAORWhG4AAAAAAAwhdAMAAAAAYAihGwAAAAAAQwjdAAAAAAAYQugGAAAAAMAQQjcAAAAAAIYQugEAAAAAMITQDQAAAACAIYRuAABScejQIdlsNk2ZMsXVpSCLhYWFqUuXLq4uI8dZuXKlbDabVq5c6epSACDHIHQDwD1o27ZtatOmjUJDQ+Xj46NixYqpQYMG+vDDD42dc9q0aRo7dmyy7ceOHdOwYcO0ZcsWY+e+VVJwSHp4enqqZMmS6tSpkw4cOJAl51i7dq2GDRumv//+O1PP37x5s2w2mwYNGpRqm3379slms+nll19O93FTuw7pMWzYMKf3LbVHnTp1MnX8O9WlSxf5+/u75Nx3Ii4uTjExMXrggQfk7+8vX19fVahQQQMGDNCxY8dcUtOECRNc+semmz9PHh4eCgoKUpUqVdSnTx/t3Lkz08e9dOmShg0bxh8NANxVHq4uAABwd61du1Z169ZViRIl9Pzzz6tIkSL6888/tX79er3//vvq1auXkfNOmzZN27dvV9++fZ22Hzt2TDExMQoLC1OlSpWMnDs1vXv3VnR0tK5du6bNmzfr448/1oIFC7Rt2zaFhITc0bHXrl2rmJgYdenSRfny5cvw8x988EFFRERo+vTpGjFiRIptpk2bJkl65pln0n3c1K5DerRu3VqlS5d2fB0fH6/u3burVatWat26tWN74cKFM3zse9WBAwdUv359HTlyRE8++aReeOEFeXl56ffff9dnn32mOXPmaO/evXe9rgkTJqhgwYLJRgPUqlVLly9flpeXl/EaGjRooE6dOsmyLMXGxmrr1q36/PPPNWHCBL399tsZ+mNTkkuXLikmJkaSXPbHIQD3HkI3ANxjRo4cqcDAQG3cuDFZGDx16pRrijLg4sWL8vPzS7NNzZo11aZNG0lS165dVbZsWfXu3Vuff/65Bg4ceDfKTNPTTz+twYMHa/369apatWqy/dOnT1dERIQefPDBu1JPxYoVVbFiRcfXZ86cUffu3VWxYsUMBX/ccP36dbVu3VonT57UypUr9cgjjzjtHzlypN5+++00j3Hp0iXlyZPHZJlO3Nzc5OPjc1fOVbZs2WSfq7feekvNmzfXK6+8ooiICDVt2vSu1AIAd4Lh5QBwj9m/f7/Kly+fYu9rcHBwsm1ffvml/vGPfyhPnjzKnz+/atWqpR9//NGx/7vvvlOzZs0UEhIib29vlSpVSm+++aYSExMdberUqaMFCxbo8OHDjiGjYWFhWrlypaKjoyXdCL1J+24e1rphwwY1btxYgYGBypMnj2rXrq01a9Y41Zg07Hnnzp166qmnlD9//mQBJj0effRRSdLBgwfTbLd8+XLVrFlTfn5+ypcvnx5//HHt2rXLqZ7+/ftLksLDwx2v69ChQ5JuhNXdu3fr0qVLaZ7n6aeflvT/Pdo327Rpk/bs2eNoI93onSxfvry8vb0VEhKiHj16OA1vT+06JLl69aqGDh2q0qVLy9vbW8WLF9e//vUvXb16Nc06kxw4cEA2m01jxoxJtm/t2rWy2WyaPn264z2y2WzavXu32rZtq4CAABUoUEB9+vTRlStXkj3/yy+/VJUqVeTr66ugoCC1b99ef/75521rsixLI0aM0H333ac8efKobt262rFjx22fd+
3aNQUFBalr167J9sXFxcnHx0evvvqqY9uHH36o8uXLO75PHnrooRSv282+/fZbbd26VW+88UaKn9eAgACNHDnS8XWdOnVUoUIFbdq0SbVq1VKePHn0+uuvS0r/tZs8ebIeffRRBQcHy9vbW5GRkZo4caJTm7CwMO3YsUOrVq1KdstAavd0z5o1y3F9ChYsqGeeeUZHjx51apM0/P/o0aNq2bKl/P39VahQIb366qtOPy/SUqBAAc2YMUMeHh5O701CQoKGDBmiKlWqKDAwUH5+fqpZs6ZWrFjhaHPo0CEVKlRIkhQTE+N4bcOGDXO02b17t9q0aaOgoCD5+PjooYce0vfff5+u2gAgNfR0A8A9JjQ0VOvWrdP27dtVoUKFNNvGxMRo2LBhql69uoYPHy4vLy9t2LBBy5cvV8OGDSVJU6ZMkb+/v15++WX5+/tr+fLlGjJkiOLi4vTvf/9bkvTGG28oNjZWf/31lyOQ+fv7q1y5cho+fLiGDBmiF154QTVr1pQkVa9eXdKNcNukSRNVqVJFQ4cOlZubmyM0rF69Wv/4xz+c6n3yySdVpkwZjRo1SpZlZfi92b9/v6Qbv9inZunSpWrSpIlKliypYcOG6fLly/rwww9Vo0YNbd68WWFhYWrdurX27t2r6dOna8yYMSpYsKAkOX7hHzdunGJiYrRixYo0h7iGh4erevXqmjlzpsaMGSN3d3fHvqRA99RTT0m6EWJjYmJUv359de/eXXv27NHEiRO1ceNGrVmzRp6enqleB0my2+1q0aKFfv75Z73wwgsqV66ctm3bpjFjxmjv3r2aO3fubd+/kiVLqkaNGvrqq6/Ur18/p31fffWV8ubNq8cff9xpe9u2bRUWFqbRo0dr/fr1+uCDD3T+/HlNnTrV0WbkyJEaPHiw2rZtq+eee06nT5/Whx9+qFq1aum3335Lc/j+kCFDNGLECDVt2lRNmzbV5s2b1bBhQyUkJKT5Wjw9PdWqVSvNnj1bH330kdNw6rlz5+rq1atq3769JOmTTz5R79691aZNG8cfDX7//Xdt2LDBcX1SkhTmOnbsmGYtNzt79qyaNGmi9u3b65lnnlHhwoUzdO0mTpyo8uXLq0WLFvLw8NC8efP0z3/+U3a7XT169JAkjR07Vr169ZK/v7/eeOMNSWnfMjBlyhR17dpV0dHRGj16tE6ePKn3339fa9asSXZ9EhMT1ahRIz388MP6z3/+o6VLl+rdd99VqVKl1L1793S9ByVKlFDt2rW1YsUKxcXFKSAgQHFxcfr000/VoUMHPf/887pw4YI+++wzNWrUSL/88osqVaqkQoUKaeLEicluiUgavbFjxw7VqFFDxYoV02uvvSY/Pz/NnDlTLVu21LfffqtWrVql+zoBgBMLAHBP+fHHHy13d3fL3d3dqlatmvWvf/3LWrx4sZWQkODUbt++fZabm5vVqlUrKzEx0Wmf3W53/PvSpUvJzvHiiy9aefLksa5cueLY1qxZMys0NDRZ240bN1qSrMmTJyc7R5kyZaxGjRolO194eLjVoEEDx7ahQ4dakqwOHTqk6z1YsWKFJcn673//a50+fdo6duyYtWDBAissLMyy2WzWxo0bLcuyrIMHDyarrVKlSlZwcLB19uxZx7atW7dabm5uVqdOnRzb/v3vf1uSrIMHDyY7f1K9K1asuG2t48ePtyRZixcvdmxLTEy0ihUrZlWrVs2yLMs6deqU5eXlZTVs2NDpWo0bN87xOpOkdh2++OILy83NzVq9erXT9kmTJlmSrDVr1iR7zunTpy1J1tChQx3bPvroI0uStWvXLse2hIQEq2DBglbnzp2TvQctWrRwOuY///lPS5K1detWy7Is69ChQ5a7u7s1cuRIp3bbtm2zPDw8nLZ37tzZ8vPzc3yd9L40a9bM6TP0+uuvW5Kc6knJ4sWLLUnWvHnznLY3bdrUKlmypOPrxx9/3Cpfvnyax0pJ5cqVrcDAwHS3r127tiXJmj
RpktP2jFy7lL5fGzVq5PR6LMuyypcvb9WuXTtZ26TvnaTPbkJCghUcHGxVqFDBunz5sqPd/PnzLUnWkCFDHNs6d+5sSbKGDx/udMzKlStbVapUcdomyerRo0cK78INffr0cfqcXL9+3bp69apTm/Pnz1uFCxe2unXr5tiW0mc2Sb169ayoqCinn1t2u92qXr26VaZMmVRrAYDbYXg5ANxjGjRooHXr1qlFixbaunWr3nnnHTVq1EjFihVzGkY5d+5c2e12DRkyRG5uzv+7sNlsjn/7+vo6/n3hwgWdOXNGNWvW1KVLl7R79+5M17llyxbt27dPTz31lM6ePaszZ87ozJkzunjxourVq6effvpJdrvd6TkvvfRShs7RrVs3FSpUSCEhIWrWrJkuXryozz//XA899FCK7Y8fP64tW7aoS5cuCgoKcmyvWLGiGjRooIULF6brvMOGDZNlWemayKldu3by9PR0Gqq8atUqHT161DG0fOnSpUpISFDfvn2drtXzzz+vgIAALViw4LbnmTVrlsqVK6eIiAjHe33mzBnHkPubh+mmpW3btvLx8dFXX33l2LZ48WKdOXMmxfu+k3pXkyRN5Jf0Xs6ePVt2u11t27Z1qqtIkSIqU6ZMmnUlvS+9evVy+symdxK5Rx99VAULFtTXX3/t2Hb+/HktWbJE7dq1c2zLly+f/vrrL23cuDFdx00SFxenvHnzZug53t7eyYa8Z+Ta3fz9GhsbqzNnzqh27do6cOCAYmNjM1SLJP366686deqU/vnPfzrd692sWTNFRESk+Nm79fu0Zs2aGV41IGmExoULFyRJ7u7ujtEIdrtd586d0/Xr1/XQQw9p8+bNtz3euXPntHz5crVt29bxc+zMmTM6e/asGjVqpH379iUbLg8A6cXwcgC4B0VHR2v27NlKSEjQ1q1bNWfOHI0ZM0Zt2rTRli1bFBkZqf3798vNzU2RkZFpHmvHjh0aNGiQli9frri4OKd9mfklPsm+ffskSZ07d061TWxsrPLnz+/4Ojw8PEPnGDJkiGrWrCl3d3cVLFhQ5cqVk4dH6v9rPHz4sCTp/vvvT7avXLlyWrx4cbomcMuIAgUKqFGjRpozZ44mTZokHx8fTZs2TR4eHmrbtm2adXl5ealkyZKO/WnZt2+fdu3a5RgCf6v0TrKXL18+NW/eXNOmTdObb74p6cbQ8mLFijlC4M3KlCnj9HWpUqXk5ubmuP993759siwrWbsknp6eqdaS9LpvfW6hQoWcPjep8fDw0BNPPKFp06bp6tWr8vb21uzZs3Xt2jWn0D1gwAAtXbpU//jHP1S6dGk1bNhQTz31lGrUqJHm8QMCAjIcNosVK5Zs5vCMXLs1a9Zo6NChWrduXbI5BWJjYxUYGJihetL6noiIiNDPP//stM3HxydZnfnz59f58+czdN74+HhJcvqjxeeff653331Xu3fv1rVr1xzb0/Nz4Y8//pBlWRo8eLAGDx6cYptTp06pWLFiGaoTACRCNwDc07y8vBQdHa3o6GiVLVtWXbt21axZszR06NB0Pf/vv/9W7dq1FRAQoOHDh6tUqVLy8fHR5s2bNWDAgGQ90RmR9Nx///vfqS4lduuazDf34qVHVFSU6tevn6n67qZnnnlG8+fP1/z589WiRQt9++23atiwYaohKzPsdruioqL03nvvpbi/ePHi6T5Wp06dNGvWLK1du1ZRUVH6/vvv9c9//jPZiImU3NwjnVSXzWbTokWLnO5pT2J6Xe727dvro48+0qJFi9SyZUvNnDlTEREReuCBBxxtypUrpz179mj+/Pn64Ycf9O2332rChAkaMmSIY3mqlEREROi3337Tn3/+me73N6XPeHqv3f79+1WvXj1FRETovffeU/HixeXl5aWFCxdqzJgxd/T9ml4pXcPM2L59u9zd3R2B+ssvv1SXLl3UsmVL9e/fX8HBwXJ3d9fo0aMdczWkJem1v/rqq2rUqFGKbW5eLg8AMo
LQDQCQJMeQ6uPHj0u60eNot9u1c+fOVEPvypUrdfbsWc2ePVu1atVybE9p9u9bw9TttpcqVUrSjd7A7BKMQ0NDJUl79uxJtm/37t0qWLCgo5c7tdeVGS1atFDevHk1bdo0eXp66vz5806zlt9cV8mSJR3bExISdPDgQaf3L633e+vWrapXr94d1964cWMVKlRIX331lR5++GFdunQp1cnC9u3b59QT+ccff8hutztmVS9VqpQsy1J4eLjKli2boTqS3pd9+/Y5vS+nT59Od89qrVq1VLRoUX399dd65JFHtHz5csfkYjfz8/NTu3bt1K5dOyUkJKh169YaOXKkBg4cmOoSW82bN9f06dP15Zdf3tESdem9dvPmzdPVq1f1/fffq0SJEo7tKQ3RT+9n4ObP3q0jGfbs2ePYn5WOHDmiVatWqVq1ao6e7m+++UYlS5bU7NmznWq/9Q+Iqb2upM+Hp6dntvl5AyD34J5uALjHrFixIsWZvZPuoU0aJtqyZUu5ublp+PDhyXrAkp6f1Gt18/ESEhI0YcKEZMf38/NLcbh5Uki9eWkrSapSpYpKlSql//znP46hpDc7ffp0qq/RlKJFi6pSpUr6/PPPnerdvn27fvzxR6c1g1N7XVL6lwxL4uvrq1atWmnhwoWaOHGi/Pz8nGYBr1+/vry8vPTBBx84XYvPPvtMsbGxatasmVNdKV2Htm3b6ujRo/rkk0+S7bt8+bIuXryYrlqlG8OyO3TooJkzZ2rKlCmKiopyWt/7ZuPHj3f6+sMPP5QkNWnSRJLUunVrubu7KyYmJtnn1rIsnT17NtU66tevL09PT3344YdOzx07dmy6X4ubm5vatGmjefPm6YsvvtD169edhpZLSlaDl5eXIiMjZVmW0zDnW7Vp00ZRUVEaOXKk1q1bl2z/hQsXUgz4t0rvtUvp+zU2NlaTJ09O9jw/P78UP7u3euihhxQcHKxJkyY5LU+2aNEi7dq1y+mzlxXOnTunDh06KDEx0em9Sem1bdiwIdn7mrSm+a2vLTg4WHXq1NFHH33k+MPjzVzx8wZA7kFPNwDcY3r16qVLly6pVatWioiIUEJCgtauXauvv/5aYWFhjkmaSpcurTfeeENvvvmmatasqdatW8vb21sbN25USEiIRo8ererVqyt//vzq3LmzevfuLZvNpi+++CLFUF+lShV9/fXXevnllxUdHS1/f381b95cpUqVUr58+TRp0iTlzZtXfn5+evjhhxUeHq5PP/1UTZo0Ufny5dW1a1cVK1ZMR48e1YoVKxQQEKB58+bd7bdP//73v9WkSRNVq1ZNzz77rGPJsMDAQKf1fqtUqSLpxnJp7du3l6enp5o3by4/P790Lxl2s2eeeUZTp07V4sWL9fTTTzvdN16oUCENHDhQMTExaty4sVq0aKE9e/ZowoQJio6OdprALLXr0LFjR82cOVMvvfSSVqxYoRo1aigxMVG7d+/WzJkztXjx4lQnmEtJp06d9MEHH2jFihV6++23U2138OBBtWjRQo0bN9a6dev05Zdf6qmnnnIM3y5VqpRGjBihgQMH6tChQ2rZsqXy5s2rgwcPas6cOXrhhRec1su+WdIa0KNHj9Zjjz2mpk2b6rffftOiRYscy7ilR7t27fThhx9q6NChioqKUrly5Zz2N2zYUEWKFFGNGjVUuHBh7dq1S+PGjVOzZs3SnCjN09NTs2fPVv369VWrVi21bdtWNWrUkKenp3bs2KFp06Ypf/78TutRpyS9165hw4by8vJS8+bN9eKLLyo+Pl6ffPKJgoODkwXNKlWqaOLEiRoxYoRKly6t4ODgFO/J9/T01Ntvv62uXbuqdu3a6tChg2PJsLCwsGRLx2XE3r179eWXX8qyLMXFxWnr1q2aNWuW4uPj9d5776lx48aOto899phmz56tVq1aqVmzZjp48KAmTZqkyMhIpz/a+fr6KjIyUl9//bXKli2roKAgVahQQRUqVND48eP1yC
OPKCoqSs8//7xKliypkydPat26dfrrr7+0devWTL8WAPe4uz9hOgDAlRYtWmR169bNioiIsPz9/S0vLy+rdOnSVq9evayTJ08ma//f//7Xqly5suXt7W3lz5/fql27trVkyRLH/jVr1lhVq1a1fH19rZCQEMcSZLplSaz4+HjrqaeesvLly2dJclq26rvvvrMiIyMtDw+PZEt0/fbbb1br1q2tAgUKWN7e3lZoaKjVtm1ba9myZY42SctPnT59Ol3vQdKyR7NmzUqzXUpLhlmWZS1dutSqUaOG5evrawUEBFjNmze3du7cmez5b775plWsWDHLzc3NafmwjCwZluT69etW0aJFLUnWwoULU2wzbtw4KyIiwvL09LQKFy5sde/e3Tp//rxTm7SuQ0JCgvX2229b5cuXd1zvKlWqWDExMVZsbGyy86W1/JJl3Vh2ys3Nzfrrr7+S7Ut6D3bu3Gm1adPGyps3r5U/f36rZ8+eTktPJfn222+tRx55xPLz87P8/PysiIgIq0ePHtaePXscbW5dMsyybiyvFhMTYxUtWtTy9fW16tSpY23fvt0KDQ297ZJhSex2u1W8eHFLkjVixIhk+z/66COrVq1ajs9oqVKlrP79+6f4nqXk/Pnz1pAhQ6yoqCgrT548lo+Pj1WhQgVr4MCB1vHjxx3tateunerSZOm9dt9//71VsWJFy8fHxwoLC7Pefvtt67///W+y5e1OnDhhNWvWzMqbN68lybF82K1LhiX5+uuvHT8ngoKCrKeffjrZdU/p+ljW/38WbibJ8XBzc7Py5ctnVa5c2erTp4+1Y8eOZMew2+3WqFGjrNDQUMvb29uqXLmyNX/+fKtz587Jlshbu3atVaVKFcvLyyvZ53f//v1Wp06drCJFilienp5WsWLFrMcee8z65ptvUnzfASA9bJaVQncEAADAHapcubKCgoK0bNmyZPuGDRummJgYnT59OkO9zgAA5DTc0w0AALLcr7/+qi1btqhTp06uLgUAAJfinm4AAJBltm/frk2bNundd99V0aJFk006BgDAvYaebgAAkGW++eYbde3aVdeuXdP06dNTXS4LAIB7Bfd0AwAAAABgCD3dAAAAAAAYQugGAAAAAMAQJlLLpex2u44dO6a8efPKZrO5uhwAAAAAyFUsy9KFCxcUEhIiN7fU+7MJ3bnUsWPHVLx4cVeXAQAAAAC52p9//qn77rsv1f2E7lwqb968km58AAICAlxcDQAAAADkLnFxcSpevLgje6WG0J1LJQ0pDwgIIHQDAAAAgCG3u52XidQAAAAAADCE0A0AAAAAgCGEbgAAAAAADCF0AwAAAABgCKEbAAAAAABDCN0AAAAAABhC6AYAAAAAwBBCNwAAAAAAhhC6AQAAAAAwhNANAAAAAIAhhG4AAAAAAAwhdAMAAAAAYAihGwAAAAAAQwjdAAAAAAAYQugGAAAAAMAQD1cXAOR05+IT1P7jtTp1IUHBeb0044XqCvL3cnVZyKUuJyRq1MKdOnT2ksIK5NHrTSPl6+Xu6rIAAACQCpf2dI8ePVrR0dHKmzevgoOD1bJlS+3Zs8epzZUrV9SjRw8VKFBA/v7+euKJJ3Ty5EmnNr1791aVKlXk7e2tSpUqJTvPlStX1KVLF0VFRcnDw0MtW7ZMd42zZs1SRESEfHx8FBUVpYULFzrtHzZsmCIiIuTn56f8+fOrfv362rBhw22Pe+TIETVr1kx58uRRcHCw+vfvr+vXrzv2z549Ww0aNFChQoUUEBCgatWqafHixemuG3dH9IglenDEEu09dVF/X76mvacu6sERSxQ9YomrS0Mu9PzUjSo35Ad9sf6IVu87oy/WH1G5IT/o+akbXV0aAAAAUuHS0L1q1Sr16NFD69ev15IlS3Tt2jU1bNhQFy9edLTp16+f5s2bp1mzZmnVqlU6duyYWrdunexY3bp1U7t27VI8T2Jionx9fdW7d2/Vr18/3fWtXbtWHTp00LPPPq
vffvtNLVu2VMuWLbV9+3ZHm7Jly2rcuHHatm2bfv75Z4WFhalhw4Y6ffp0qsdNTExUs2bNlJCQoLVr1+rzzz/XlClTNGTIEEebn376SQ0aNNDChQu1adMm1a1bV82bN9dvv/2W7vphVvSIJTodn5DivtPxCQRvZKnnp27Ukp2nUty3ZOcpgjcAAEA2ZbMsy3J1EUlOnz6t4OBgrVq1SrVq1VJsbKwKFSqkadOmqU2bNpKk3bt3q1y5clq3bp2qVq3q9Pxhw4Zp7ty52rJlS6rn6NKli/7++2/NnTv3tvW0a9dOFy9e1Pz58x3bqlatqkqVKmnSpEkpPicuLk6BgYFaunSp6tWrl2KbRYsW6bHHHtOxY8dUuHBhSdKkSZM0YMAAnT59Wl5eKQ9NLl++vNq1a+cUzlOTVEdsbKwCAgJu2x4Zcy4+QQ+mI1RvHtSAoea4Y5cTElVuyA+3bbdreGOGmgMAANwl6c1c2WoitdjYWElSUFCQJGnTpk26du2aU+90RESESpQooXXr1hmvZ926dcl6xhs1apTquRMSEvTxxx8rMDBQDzzwQJrHjYqKcgTupOPGxcVpx44dKT7HbrfrwoULjvfmVlevXlVcXJzTA+a0/3htlrYD0jJq4c4sbQcAAIC7J9uEbrvdrr59+6pGjRqqUKGCJOnEiRPy8vJSvnz5nNoWLlxYJ06cMF7TiRMnnIJxaueeP3++/P395ePjozFjxmjJkiUqWLBgho+btC8l//nPfxQfH6+2bdumuH/06NEKDAx0PIoXL37b14fMO3Uh5WHlmW0HpOXQ2UtZ2g4AAAB3T7YJ3T169ND27ds1Y8aMu37uI0eOyN/f3/EYNWpUhp5ft25dbdmyRWvXrlXjxo3Vtm1bnTp1497LJk2aOI5bvnz5TNU3bdo0xcTEaObMmQoODk6xzcCBAxUbG+t4/Pnnn5k6F9InOG/6hoyntx2QlrACebK0HQAAAO6ebLFkWM+ePTV//nz99NNPuu+++xzbixQpooSEBP39999Ovd0nT55UkSJFsuz8ISEhTveBJw3hLlKkSLKZ0lM6t5+fn0qXLq3SpUuratWqKlOmjD777DMNHDhQn376qS5fvixJ8vT0dBz3l19+SXbcpH03mzFjhp577jnNmjUrzUngvL295e3tnYFXjTsx44Xq6bqne8YL1e9CNcjtXm8aqS/WH0lXOwAAAGQvLu3ptixLPXv21Jw5c7R8+XKFh4c77a9SpYo8PT21bNkyx7Y9e/boyJEjqlatWpbV4eHh4QjNpUuXdoTuatWqOZ1bkpYsWXLbc9vtdl29elWSVKxYMcdxQ0NDHcfdtm2bozc86bgBAQGKjPz/X5qnT5+url27avr06WrWrFmWvFZkjSB/LxW6zQRphfy9mEQNWcLXy10NIlMe5ZKkQWQwk6gBAABkQy7t6e7Ro4emTZum7777Tnnz5nXczxwYGChfX18FBgbq2Wef1csvv6ygoCAFBASoV69eqlatmtPM5X/88Yfi4+N14sQJXb582dFrHRkZ6ZgJfOfOnUpISNC5c+d04cIFR5uU1vVO0qdPH9WuXVvvvvuumjVrphkzZujXX3/Vxx9/LEm6ePGiRo4cqRYtWqho0aI6c+aMxo8fr6NHj+rJJ59M9bgNGzZUZGSkOnbsqHfeeUcnTpzQoEGD1KNHD0dv9bRp09S5c2e9//77evjhhx3vTdL7AtfbOKhBqsuGFfL30sZBDVxQFXKrTzpFp7psWIPIYH3SKdoFVQEAAOB2XLpkmM1mS3H75MmT1aVLF0nSlStX9Morr2j69Om6evWqGjVqpAkTJjgNw65Tp45WrVqV7DgHDx5UWFiYJCksLEyHDx9O1uZ2L3/WrFkaNGiQDh06pDJlyuidd95R06ZNHbU99dRT2rBhg86cOaMCBQooOjpagwYNUnR02r8AHz58WN27d9fKlSvl5+enzp0766233pKHh0ear6lz586aMmVKmseWWDLsbjoXn6
D2H6/VqQsJCs7rpRkvVKeHG8ZcTkjUqIU7dejsJYUVyKPXm0bSww0AAOAC6c1c2WqdbmQdQjcAAAAAmJMj1+kGAAAAACA3IXQDAAAAAGAIoRsAAAAAAEMI3QAAAAAAGELoBgAAAADAEEI3AAAAAACGELoBAAAAADCE0A0AAAAAgCGEbgAAAAAADCF0AwAAAABgCKEbAAAAAABDCN0AAAAAABhC6AYAAAAAwBBCNwAAAAAAhhC6AQAAAAAwhNANAAAAAIAhhG4AAAAAAAwhdAMAAAAAYAihGwAAAAAAQwjdAAAAAAAYQugGAAAAAMAQQjcAAAAAAIYQugEAAAAAMITQDQAAAACAIYRuAAAAAAAMIXQDAAAAAGAIoRsAAAAAAEMI3QAAAAAAGELoBgAAAADAEEI3AAAAAACGELoBAAAAADCE0A0AAAAAgCGEbgAAAAAADCF0AwAAAABgCKEbAAAAAABDCN0AAAAAABhC6AYAAAAAwBBCNwAAAAAAhhC6AQAAAAAwhNANAAAAAIAhhG4AAAAAAAwhdAMAAAAAYAihGwAAAAAAQwjdAAAAAAAYQugGAAAAAMAQQjcAAAAAAIYQugEAAAAAMITQDQAAAACAIYRuAAAAAAAMIXQDAAAAAGAIoRsAAAAAAEMI3QAAAAAAGELoBgAAAADAEEI3AAAAAACGELoBAAAAADCE0A0AAAAAgCGEbgAAAAAADCF0AwAAAABgCKEbAAAAAABDCN0AAAAAABhC6AYAAAAAwBBCNwAAAAAAhhC6AQAAAAAwhNANAAAAAIAhhG4AAAAAAAwhdAMAAAAAYAihGwAAAAAAQwjdAAAAAAAYQugGAAAAAMAQQjcAAAAAAIYQugEAAAAAMITQDQAAAACAIYRuAAAAAAAMIXQDAAAAAGAIoRsAAAAAAEMI3QAAAAAAGELoBgAAAADAEEI3AAAAAACGELoBAAAAADCE0A0AAAAAgCGEbgAAAAAADCF0AwAAAABgCKEbAAAAAABDCN0AAAAAABhC6AYAAAAAwBBCNwAAAAAAhhC6AQAAAAAwhNANAAAAAIAhhG4AAAAAAAwhdAMAAAAAYAihGwAAAAAAQwjdAAAAAAAYQugGAAAAAMAQQjcAAAAAAIYQugEAAAAAMITQDQAAAACAIYRuAAAAAAAMIXQDAAAAAGAIoRsAAAAAAEMI3QAAAAAAGELoBgAAAADAEEI3AAAAAACGELoBAAAAADCE0A0AAAAAgCGEbgAAAAAADCF0AwAAAABgCKEbAAAAAABDCN0AAAAAABhC6AYAAAAAwBBCNwAAAAAAhhC6AQAAAAAwhNANAAAAAIAhhG4AAAAAAAwhdAMAAAAAYAihGwAAAAAAQwjdAAAAAAAYQugGAAAAAMAQQjcAAAAAAIYQugEAAAAAMITQDQAAAACAIYRuAAAAAAAMIXQDAAAAAGAIoRsAAAAAAEMI3QAAAAAAGELoBgAAAADAEEI3AAAAAACGELoBAAAAADCE0A0AAAAAgCGEbgAAAAAADCF0AwAAAABgCKEbAAAAAABDCN0AAAAAABhC6AYAAAAAwBBCNwAAAAAAhhC6AQAAAAAwhNANAAAAAIAhhG4AAAAAAAwhdAMAAAAAYAihGwAAAAAAQwjdAAAAAAAYQugGAAAAAMAQQjcAAAAAAIYQugEAAAAAMITQDQAAAACAIYRuAAAAAAAMIXQDAAAAAGAIoRsAAAAAAEMI3QAAAAAAGELoBgAAAADAEEI3AAAAAACGELoBAAAAADCE0A0AAAAAgCGEbgAAAAAADCF0AwAAAABgCKEbAAAAAABDCN0AAAAAABhC6AYAAAAAwBBCNwAAAAAAhhC6AQAAAAAwhNANAAAAAIAhhG4AAAAAAAwhdAMAAAAAYAihGwAAAAAAQwjdAAAAAAAYQugGAAAAAMAQQjcAAAAAAIYQugEAAAAAMITQDQAAAACAIYRuAAAAAAAMIXQDAAAAAGAIoRsAAAAAAEMI3QAAAA
AAGELoBgAAAADAEEI3AAAAAACGELoBAAAAADCE0A0AAAAAgCGEbgAAAAAADCF0AwAAAABgCKEbAAAAAABDMh269+/fr0GDBqlDhw46deqUJGnRokXasWNHlhUHAAAAAEBOlqnQvWrVKkVFRWnDhg2aPXu24uPjJUlbt27V0KFDs7RAAAAAAAByqkyF7tdee00jRozQkiVL5OXl5dj+6KOPav369VlWHAAAAAAAOVmmQve2bdvUqlWrZNuDg4N15syZOy4KAAAAAIDcIFOhO1++fDp+/Hiy7b/99puKFSt2x0UBAAAAAJAbZCp0t2/fXgMGDNCJEydks9lkt9u1Zs0avfrqq+rUqVNW1wgAAAAAQI6UqdA9atQoRUREqHjx4oqPj1dkZKRq1aql6tWra9CgQVldIwAAAAAAOZLNsiwrs0/+888/tW3bNsXHx6ty5coqU6ZMVtaGOxAXF6fAwEDFxsYqICDA1eUAAAAAQK6S3syVqZ7u4cOH69KlSypevLiaNm2qtm3bqkyZMrp8+bKGDx+e6aIBAAAAAMhNMtXT7e7uruPHjys4ONhp+9mzZxUcHKzExMQsKxCZQ083AAAAAJhjtKfbsizZbLZk27du3aqgoKDMHBIAAAAAgFzHIyON8+fPL5vNJpvNprJlyzoF78TERMXHx+ull17K8iIBAAAAAMiJMhS6x44dK8uy1K1bN8XExCgwMNCxz8vLS2FhYapWrVqWFwkAAAAAQE6UodDduXNnSVJ4eLiqV68uT09PI0UBAAAAAJAbZCh0J6ldu7bj31euXFFCQoLTfibuAgAAAAAgkxOpXbp0ST179lRwcLD8/PyUP39+pwcAAAAAAMhk6O7fv7+WL1+uiRMnytvbW59++qliYmIUEhKiqVOnZnWNAAAAAADkSJkaXj5v3jxNnTpVderUUdeuXVWzZk2VLl1aoaGh+uqrr/T0009ndZ0AAAAAAOQ4merpPnfunEqWLCnpxv3b586dkyQ98sgj+umnn7KuOgAAAAAAcrBMhe6SJUvq4MGDkqSIiAjNnDlT0o0e8Hz58mVZcQAAAAAA5GSZCt1du3bV1q1bJUmvvfaaxo8fLx8fH/Xr10/9+/fP0gIBAAAAAMipbJZlWXd6kMOHD2vTpk0qXbq0KlasmBV14Q7FxcUpMDBQsbGxLOEGAAAAAFksvZkrUxOp3So0NFShoaFZcSgAAAAAAHKNDIduu92uKVOmaPbs2Tp06JBsNpvCw8PVpk0bdezYUTabzUSdAAAAAADkOBm6p9uyLLVo0ULPPfecjh49qqioKJUvX16HDx9Wly5d1KpVK1N1AgAAAACQ42Sop3vKlCn66aeftGzZMtWtW9dp3/Lly9WyZUtNnTpVnTp1ytIiAQAAAADIiTLU0z19+nS9/vrryQK3JD366KN67bXX9NVXX2VZcQAAAAAA5GQZCt2///67GjdunOr+Jk2aOJYSAwAAAADgXpeh0H3u3DkVLlw41f2FCxfW+fPn77goAAAAAABygwyF7sTERHl4pH4buLu7u65fv37HRQEAAAAAkBtkaCI1y7LUpUsXeXt7p7j/6tWrWVIUAAAAAAC5QYZCd+fOnW/bhpnLAQAAAAC4IUOhe/LkyabqAAAAAAAg18nQPd0AAAAAACD9MtTTneTixYt66623tGzZMp06dUp2u91p/4EDB7KkOAAAAAAAcrJMhe7nnntOq1atUseOHVW0aFHZbLasrgsAAAAAgBwvU6F70aJFWrBggWrUqJHV9QAAAAAAkGtk6p7u/PnzKygoKKtrAQAAAAAgV8lU6H7zzTc1ZMgQXbp0KavrAQAAAAAg18jU8PJ3331X+/fvV+HChRUWFiZPT0+n/Zs3b86S4gAAAAAAyMkyFbpbtmyZxWUAAAAAAJD72CzLslxdBLJeXFycAgMDFRsbq4CAAFeXAwAAAAC5SnozV6Z6upNs2rRJu3btkiSVL19elStXvpPDAQAAAACQq2QqdJ86dU
rt27fXypUrlS9fPknS33//rbp162rGjBkqVKhQVtYIAAAAAECOlKnZy3v16qULFy5ox44dOnfunM6dO6ft27crLi5OvXv3zuoaAQAAAADIkTJ1T3dgYKCWLl2q6Ohop+2//PKLGjZsqL///jur6kMmcU83AAAAAJiT3syVqZ5uu92ebJkwSfL09JTdbs/MIQEAAAAAyHUyFbofffRR9enTR8eOHXNsO3r0qPr166d69eplWXEAAAAAAORkmQrd48aNU1xcnMLCwlSqVCmVKlVK4eHhiouL04cffpjVNQIAAAAAkCNlavby4sWLa/PmzVq6dKl2794tSSpXrpzq16+fpcUBAAAAAJCTZWoiNWR/TKQGAAAAAOakN3Olu6f7gw8+0AsvvCAfHx998MEHabZl2TAAAAAAADLQ0x0eHq5ff/1VBQoUUHh4eOoHtNl04MCBLCsQmUNPNwAAAACYk+U93QcPHkzx3wAAAAAAIGWZmr18+PDhunTpUrLtly9f1vDhw++4KAAAAAAAcoNMTaTm7u6u48ePKzg42Gn72bNnFRwcrMTExCwrEJnD8HIAAAAAMCe9mStTPd2WZclmsyXbvnXrVgUFBWXmkAAAAAAA5DoZWqc7f/78stlsstlsKlu2rFPwTkxMVHx8vF566aUsLxIAAAAAgJwoQ6F77NixsixL3bp1U0xMjAIDAx37vLy8FBYWpmrVqmV5kQAAAAAA5EQZCt2dO3eWdGP5sOrVq8vT09NIUQAAAAAA5AYZCt1Jateu7fj3lStXlJCQ4LSfibsAAAAAAMjkRGqXLl1Sz549FRwcLD8/P+XPn9/pAQAAAAAAMhm6+/fvr+XLl2vixIny9vbWp59+qpiYGIWEhGjq1KlZXSMAAAAAADlSpoaXz5s3T1OnTlWdOnXUtWtX1axZU6VLl1ZoaKi++uorPf3001ldJwAAAAAAOU6merrPnTunkiVLSrpx//a5c+ckSY888oh++umnrKsOAAAAAIAcLFOhu2TJkjp48KAkKSIiQjNnzpR0owc8X758WVYcAAAAAAA5WaZCd9euXbV161ZJ0muvvabx48fLx8dH/fr1U//+/bO0QAAAAAAAciqbZVnWnR7k8OHD2rRpk0qXLq2KFStmRV24Q3FxcQoMDFRsbCxLuAEAAABAFktv5srURGo3u3LlikJDQxUaGnqnhwIAAAAAIFfJ1PDyxMREvfnmmypWrJj8/f114MABSdLgwYP12WefZWmBAAAAAADkVJkK3SNHjtSUKVP0zjvvyMvLy7G9QoUK+vTTT7OsOAAAAAAAcrJMhe6pU6fq448/1tNPPy13d3fH9gceeEC7d+/OsuIAAAAAAMjJMhW6jx49qtKlSyfbbrfbde3atTsuCgAAAACA3CBToTsyMlKrV69Otv2bb75R5cqV77goAAAAAAByg0zNXj5kyBB17txZR48eld1u1+zZs7Vnzx5NnTpV8+fPz+oaAQAAAADIkTLV0/34449r3rx5Wrp0qfz8/DRkyBDt2rVL8+bNU4MGDbK6RgAAAAAAcqQM93Rfv35do0aNUrdu3bRkyRITNQEAAAAAkCtkuKfbw8ND77zzjq5fv26iHgAAAAAAco1MDS+vV6+eVq1aldW1AAAAAACQq2RqIrUmTZrotdde07Zt21SlShX5+fk57W/RokWWFAcAAAAAQE5msyzLyuiT3NxS7yC32WxKTEy8o6Jw5+Li4hQYGKjY2FgFBAS4uhwAAAAAyFXSm7ky1dNtt9szXRgAAAAAAPeKDN3TvXz5ckVGRiouLi7ZvtjYWJUvX16rV6/OsuIAAAAAAMjJMhS6x44dq+effz7FrvPAwEC9+OKLeu+997KsOAAAAAAAcrIMhe6tW7eqcePGqe5v2LChNm3adMdFAQAAAACQG2QodJ88eVKenp6p7vfw8NDp06fvuCgAAAAAAHKDDIXuYsWKafv27anu//3331W0aNE7LgoAAAAAgNwgQ6G7adOmGj
x4sK5cuZJs3+XLlzV06FA99thjWVYcAAAAAAA5WYbW6T558qQefPBBubu7q2fPnrr//vslSbt379b48eOVmJiozZs3q3DhwsYKRvqwTjcAAAAAmGNkne7ChQtr7dq16t69uwYOHKikvG6z2dSoUSONHz+ewA0AAAAAwP9kKHRLUmhoqBYuXKjz58/rjz/+kGVZKlOmjPLnz2+iPgAAAAAAcqwMh+4k+fPnV3R0dFbWAgAAAABArpKhidQAAAAAAED6EboBAAAAADCE0A0AAAAAgCGEbgAAAAAADCF0AwAAAABgCKEbAAAAAABDCN0AAAAAABhC6AYAAAAAwBBCNwAAAAAAhhC6AQAAAAAwhNANAAAAAIAhhG4AAAAAAAwhdAMAAAAAYAihGwAAAAAAQwjdAAAAAAAYQugGAAAAAMAQQjcAAAAAAIYQugEAAAAAMITQDQAAAACAIYRuAAAAAAAMIXQDAAAAAGAIoRsAAAAAAEMI3QAAAAAAGELoBgAAAADAEEI3AAAAAACGELoBAAAAADCE0A0AAAAAgCGEbgAAAAAADCF0AwAAAABgCKEbAAAAAABDCN0AAAAAABhC6AYAAAAAwBBCNwAAAAAAhhC6AQAAAAAwhNANAAAAAIAhhG4AAAAAAAwhdAMAAAAAYAihGwAAAAAAQwjdAAAAAAAYQugGAAAAAMAQQjcAAAAAAIYQugEAAAAAMITQDQAAAACAIYRuAAAAAAAMIXQDAAAAAGAIoRsAAAAAAEMI3QAAAAAAGELoBgAAAADAEEI3AAAAAACGELoBAAAAADCE0A0AAAAAgCGEbgAAAAAADCF0AwAAAABgCKEbAAAAAABDCN0AAAAAABhC6AYAAAAAwBBCNwAAAAAAhhC6AQAAAAAwhNANAAAAAIAhhG4AAAAAAAwhdAMAAAAAYAihGwAAAAAAQwjdAAAAAAAYQugGAAAAAMAQQjcAAAAAAIYQugEAAAAAMITQDQAAAACAIYRuAAAAAAAMIXQDAAAAAGAIoRsAAAAAAEMI3QAAAAAAGELoBgAAAADAEEI3AAAAAACGELoBAAAAADCE0A0AAAAAgCGEbgAAAAAADCF0AwAAAABgCKEbAAAAAABDCN0AAAAAABhC6AYAAAAAwBBCNwAAAAAAhhC6AQAAAAAwhNANAAAAAIAhhG4AAAAAAAwhdAMAAAAAYAihGwAAAAAAQwjdAAAAAAAYQugGAAAAAMAQQjcAAAAAAIYQugEAAAAAMITQDQAAAACAIYRuAAAAAAAMIXQDAAAAAGAIoRsAAAAAAEMI3QAAAAAAGELoBgAAAADAEEI3AAAAAACGELoBAAAAADCE0A0AAAAAgCGEbgAAAAAADCF0AwAAAABgCKEbAAAAAABDCN0AAAAAABhC6AYAAAAAwBBCNwAAAAAAhhC6AQAAAAAwhNANAAAAAIAhhG4AAAAAAAwhdAMAAAAAYAihGwAAAAAAQwjdAAAAAAAYQugGAAAAAMAQQjcAAAAAAIYQugEAAAAAMITQDQAAAACAIYRuAAAAAAAMIXQDAAAAAGAIoRsAAAAAAEMI3QAAAAAAGELoBgAAAADAEEI3AAAAAACGELoBAAAAADCE0A0AAAAAgCGEbgAAAAAADCF0AwAAAABgCKEbAAAAAABDCN0AAAAAABhC6AYAAAAAwBBCNwAAAAAAhhC6AQAAAAAwhNANAAAAAIAhhG4AAAAAAAwhdAMAAAAAYAihGwAAAAAAQwjdAAAAAAAYQugGAAAAAMAQQjcAAAAAAIYQugEAAAAAMITQDQAAAACAIYRuAAAAAAAMIXQDAAAAAGAIoRsAAAAAAEMI3QAAAAAAGELoBgAAAADAEEI3AAAAAACGELoBAAAAADCE0A0AAAAAgCGEbgAAAAAADCF0AwAAAABgCKEbAAAAAABDCN0AAAAAABhC6AYAAAAAwBBCNwAAAAAAhhC6AQAAAAAwhNANAAAAAIAhhG4AAAAAAAwhdAMAAAAAYA
ihGwAAAAAAQwjdAAAAAAAYQugGAAAAAMAQQjcAAAAAAIYQugEAAAAAMITQDQAAAACAIYRuAAAAAAAMIXQDAAAAAGAIoRsAAAAAAEMI3QAAAAAAGELoBgAAAADAEEI3AAAAAACGELoBAAAAADCE0A0AAAAAgCGEbgAAAAAADCF0AwAAAABgCKEbAAAAAABDCN0AAAAAABhC6AYAAAAAwBCXhu7Ro0crOjpaefPmVXBwsFq2bKk9e/Y4tbly5Yp69OihAgUKyN/fX0888YROnjzp1KZ3796qUqWKvL29ValSpWTnuXLlirp06aKoqCh5eHioZcuW6a5x1qxZioiIkI+Pj6KiorRw4UKn/cOGDVNERIT8/PyUP39+1a9fXxs2bLjtcY8cOaJmzZopT548Cg4OVv/+/XX9+nXH/uPHj+upp55S2bJl5ebmpr59+6a75uwu4bpdn60+oCHfbddnqw8o4brd1SXdkbDXFiR7AKas33vW6bO2fu9ZV5cEAACyiUS7pXX7z+q7LUe1bv9ZJdotV5d0Ry4nJGrw3G3q+NkGDZ67TZcTEl1dUqZ4uPLkq1atUo8ePRQdHa3r16/r9ddfV8OGDbVz5075+flJkvr166cFCxZo1qxZCgwMVM+ePdW6dWutWbPG6VjdunXThg0b9Pvvvyc7T2Jionx9fdW7d299++236a5v7dq16tChg0aPHq3HHntM06ZNU8uWLbV582ZVqFBBklS2bFmNGzdOJUuW1OXLlzVmzBg1bNhQf/zxhwoVKpTicRMTE9WsWTMVKVJEa9eu1fHjx9WpUyd5enpq1KhRkqSrV6+qUKFCGjRokMaMGZPumrO70Qt36pPVB3Xz9//Ihbv0fM1wDWwa6brCMim1gB322gIdeqvZXa4GuV1Kn7f2/10vSXzeAAC4x/2w/bhi5u3U8dgrjm1FA300tHmkGlco6sLKMuf5qRu1ZOcpx9er90lfrD+iBpHB+qRTtAsryzibZVnZ5s8fp0+fVnBwsFatWqVatWopNjZWhQoV0rRp09SmTRtJ0u7du1WuXDmtW7dOVatWdXr+sGHDNHfuXG3ZsiXVc3Tp0kV///235s6de9t62rVrp4sXL2r+/PmObVWrVlWlSpU0adKkFJ8TFxenwMBALV26VPXq1UuxzaJFi/TYY4/p2LFjKly4sCRp0qRJGjBggE6fPi0vLy+n9nXq1FGlSpU0duzY29Z8ax2xsbEKCAhI9/NMGr1wpz766WCq+1+slbOCd3p6tAlCyCp83gAAQGp+2H5c3b/crFuDne1//534zIM5KnjfGrhvlV2Cd3ozV7a6pzs2NlaSFBQUJEnatGmTrl27pvr16zvaREREqESJElq3bp3xetatW+d0bklq1KhRqudOSEjQxx9/rMDAQD3wwANpHjcqKsoRuJOOGxcXpx07dmRN8dlMwnW7PlmdeuCWpE9WH8wxQ83TO4ScoebICukdQs5QcwAA7j2Jdksx83YmC9ySHNti5u3MMUPNLyckphm4JWnJzlM5aqh5tgnddrtdffv2VY0aNRxDt0+cOCEvLy/ly5fPqW3hwoV14sQJ4zWdOHHCKRindu758+fL399fPj4+GjNmjJYsWaKCBQtm+LhJ+zLj6tWriouLc3pkJ1+sO6TbfZ/brRvtADhLGkKeVe0AAEDu8cvBc05Dym9lSToee0W/HDx394q6A6MW7szSdtlBtgndPXr00Pbt2zVjxoy7fu4jR47I39/f8Ui6rzq96tatqy1btmjt2rVq3Lix2rZtq1Onbvx1pkmTJo7jli9f3kT5km5MShcYGOh4FC9e3Ni5MuPwuUtZ2g4AAACAdOpC6oE7M+1c7dDZ9OWB9LbLDlw6kVqSnj17av78+frpp5903333ObYXKVJECQkJ+vvvv516u0+ePKkiRYpk2flDQkKc7gNPGt5epEiRZDOlp3RuPz8/lS5dWqVLl1bVqlVVpkwZffbZZxo4cKA+/fRTXb58WZLk6e
npOO4vv/yS7LhJ+zJj4MCBevnllx1fx8XFZavgHRqUJ0vbAQAAAJCC8/pkaTtXCyuQR6v3pa9dTuHSnm7LstSzZ0/NmTNHy5cvV3h4uNP+KlWqyNPTU8uWLXNs27Nnj44cOaJq1aplWR0eHh6O0Fy6dGlH6K5WrZrTuSVpyZIltz233W7X1atXJUnFihVzHDc0NNRx3G3btjl6w5OOGxAQoMjIzE0k5u3trYCAAKdHdtKxWpjcbGm3cbPdaAfA2YxuVW/fKAPtAABA7vGP8CAVDfRRar9q23RjFvN/hAfdzbIy7fV0Tqyc3nbZgUtDd48ePfTll19q2rRpyps3r06cOKETJ044eoYDAwP17LPP6uWXX9aKFSu0adMmde3aVdWqVXOaufyPP/7Qli1bHM/dsmWLtmzZooSEBEebnTt3asuWLTp37pxiY2MdbdLSp08f/fDDD3r33Xe1e/duDRs2TL/++qt69uwpSbp48aJef/11rV+/XocPH9amTZvUrVs3HT16VE8++WSqx23YsKEiIyPVsWNHbd26VYsXL9agQYPUo0cPeXt7O9ol1RgfH6/Tp09ry5Yt2rkz59y7cDMvDzc9XzM8zTbP1wyXl0e2ueMhTemdJZrZpJEVqpYtkKXtAABA7uHuZtPQ5jcC6K3BO+nroc0j5X67HrBswtfLXQ0ig9Ns0yAyWL5e7nepojvn0iXDbLaUL/zkyZPVpUsXSdKVK1f0yiuvaPr06bp69aoaNWqkCRMmOA3DrlOnjlatWpXsOAcPHlRYWJgkKSwsTIcPH07W5nYvf9asWRo0aJAOHTqkMmXK6J133lHTpk0dtT311FPasGGDzpw5owIFCig6OlqDBg1SdHTaU9gfPnxY3bt318qVK+Xn56fOnTvrrbfekofH/4/4T+n9CQ0N1aFDh9I8tpQ9lwyTUl6n282mXLdOt0TgRtbj8wYAAFKT29fpTpJdlguT0p+5stU63cg62TV0SzeWD/ti3SEdPndJoUF51LFaWI7p4U5JSkGIAART1u896zRL+YxuVenhBgAAkm4sH/bLwXM6deGKgvPeGFKeU3q4U3I5IVGjFu7UobOXFFYgj15vGpmtergJ3fe47By6AQAAACCnS2/myrndiwAAAAAAZHOEbgAAAAAADCF0AwAAAABgCKEbAAAAAABDCN0AAAAAABhC6AYAAAAAwBBCNwAAAAAAhhC6AQAAAAAwhNANAAAAAIAhhG4AAAAAAAwhdAMAAAAAYAihGwAAAAAAQwjdAAAAAAAYQugGAAAAAMAQQjcAAAAAAIZ4uLoAmGFZliQpLi7OxZUAAAAAQO6TlLWSsldqCN251IULFyRJxYsXd3ElAAAAAJB7XbhwQYGBganut1m3i+XIkex2u44dO6a8efPKZrO5upxcLy4uTsWLF9eff/6pgIAAV5eDW3B9sjeuT/bHNcreuD7ZG9cn++MaZW/Z+fpYlqULFy4oJCREbm6p37lNT3cu5ebmpvvuu8/VZdxzAgICst0PA/w/rk/2xvXJ/rhG2RvXJ3vj+mR/XKPsLbten7R6uJMwkRoAAAAAAIYQugEAAAAAMITQDWQBb29vDR06VN7e3q4uBSng+mRvXJ/sj2uUvXF9sjeuT/bHNcrecsP1YSI1AAAAAAAMoacbAAAAAABDCN0AAAAAABhC6AYAAAAAwBBCN5BJo0ePVnR0tPLmzavg4GC1bNlSe/bscXVZSMVbb70lm82mvn37uroU3OTo0aN65plnVKBAAfn6+ioqKkq//vqrq8uCpMTERA0ePFjh4eHy9fVVqVKl9Oabb4qpYFznp59+UvPmzRUSEiKbzaa5c+c67bcsS0OGDFHRokXl6+ur+vXra9++fa4p9h6U1vW5du2aBgwYoKioKPn5+SkkJESdOnXSsWPHXFfwPeZ23z83e+mll2Sz2TR27Ni7Vh/Sd4127dqlFi1aKDAwUH5+foqOjtaRI0fufrEZROgGMmnVqlXq0aOH1q
9fryVLlujatWtq2LChLl686OrScIuNGzfqo48+UsWKFV1dCm5y/vx51ahRQ56enlq0aJF27typd999V/nz53d1aZD09ttva+LEiRo3bpx27dqlt99+W++8844+/PBDV5d2z7p48aIeeOABjR8/PsX977zzjj744ANNmjRJGzZskJ+fnxo1aqQrV67c5UrvTWldn0uXLmnz5s0aPHiwNm/erNmzZ2vPnj1q0aKFCyq9N93u+yfJnDlztH79eoWEhNylypDkdtdo//79euSRRxQREaGVK1fq999/1+DBg+Xj43OXK804Zi8Hssjp06cVHBysVatWqVatWq4uB/8THx+vBx98UBMmTNCIESNUqVIl/nKdTbz22mtas2aNVq9e7epSkILHHntMhQsX1meffebY9sQTT8jX11dffvmlCyuDJNlsNs2ZM0ctW7aUdKOXOyQkRK+88opeffVVSVJsbKwKFy6sKVOmqH379i6s9t5z6/VJycaNG/WPf/xDhw8fVokSJe5ecUj1+hw9elQPP/ywFi9erGbNmqlv376MkHORlK5R+/bt5enpqS+++MJ1hWUSPd1AFomNjZUkBQUFubgS3KxHjx5q1qyZ6tev7+pScIvvv/9eDz30kJ588kkFBwercuXK+uSTT1xdFv6nevXqWrZsmfbu3StJ2rp1q37++Wc1adLExZUhJQcPHtSJEyecftYFBgbq4Ycf1rp161xYGVITGxsrm82mfPnyuboUSLLb7erYsaP69++v8uXLu7oc3MJut2vBggUqW7asGjVqpODgYD388MNp3iaQnRC6gSxgt9vVt29f1ahRQxUqVHB1OfifGTNmaPPmzRo9erSrS0EKDhw4oIkTJ6pMmTJavHixunfvrt69e+vzzz93dWnQjZEI7du3V0REhDw9PVW5cmX17dtXTz/9tKtLQwpOnDghSSpcuLDT9sKFCzv2Ifu4cuWKBgwYoA4dOiggIMDV5UA3bqnx8PBQ7969XV0KUnDq1CnFx8frrbfeUuPGjfXjjz+qVatWat26tVatWuXq8m7Lw9UFALlBjx49tH37dv3888+uLgX/8+eff6pPnz5asmRJjrjX515kt9v10EMPadSoUZKkypUra/v27Zo0aZI6d+7s4uowc+ZMffXVV5o2bZrKly+vLVu2qG/fvgoJCeH6AHfg2rVratu2rSzL0sSJE11dDiRt2rRJ77//vjZv3iybzebqcpACu90uSXr88cfVr18/SVKlSpW0du1aTZo0SbVr13ZlebdFTzdwh3r27Kn58+drxYoVuu+++1xdDv5n06ZNOnXqlB588EF5eHjIw8NDq1at0gcffCAPDw8lJia6usR7XtGiRRUZGem0rVy5cjliFtJ7Qf/+/R293VFRUerYsaP69evHyJFsqkiRIpKkkydPOm0/efKkYx9cLylwHz58WEuWLKGXO5tYvXq1Tp06pRIlSjh+Zzh8+LBeeeUVhYWFubo8SCpYsKA8PDxy7O8N9HQDmWRZlnr16qU5c+Zo5cqVCg8Pd3VJuEm9evW0bds2p21du3ZVRESEBgwYIHd3dxdVhiQ1atRItsze3r17FRoa6qKKcLNLly7Jzc35b/Pu7u6O3gZkL+Hh4SpSpIiWLVumSpUqSZLi4uK0YcMGde/e3bXFQdL/B+59+/ZpxYoVKlCggKtLwv907Ngx2dwvjRo1UseOHdW1a1cXVYWbeXl5KTo6Osf+3kDoBjKpR48emjZtmr777jvlzZvXcc9cYGCgfH19XVwd8ubNm+z+ej8/PxUoUID77rOJfv36qXr16ho1apTatm2rX375RR9//LE+/vhjV5cGSc2bN9fIkSNVokQJlS9fXr/99pvee+89devWzdWl3bPi4+P1xx9/OL4+ePCgtmzZoqCgIJUoUUJ9+/bViBEjVKZMGYWHh2vw4MEKCQlJcwZtZJ20rk/RokXVpk0bbd68WfPnz1diYqLj94agoCB5eXm5qux7xu2+f279I4
inp6eKFCmi+++//26Xes+63TXq37+/2rVrp1q1aqlu3br64YcfNG/ePK1cudJ1RaeXBSBTJKX4mDx5sqtLQypq165t9enTx9Vl4Cbz5s2zKlSoYHl7e1sRERHWxx9/7OqS8D9xcXFWnz59rBIlSlg+Pj5WyZIlrTfeeMO6evWqq0u7Z61YsSLF/+907tzZsizLstvt1uDBg63ChQtb3t7eVr169aw9e/a4tuh7SFrX5+DBg6n+3rBixQpXl35PuN33z61CQ0OtMWPG3NUa73XpuUafffaZVbp0acvHx8d64IEHrLlz57qu4AxgnW4AAAAAAAxhIjUAAAAAAAwhdAMAAAAAYAihGwAAAAAAQwjdAAAAAAAYQugGAAAAAMAQQjcAAAAAAIYQugEAAAAAMITQDQAAAACAIYRuAACAW4SFhWns2LGuLgMAkAsQugEAuAc1b95cjRs3TnHf6tWrZbPZ9Pvvv6d5jIwG00OHDslms6X5mDJlSgZeRebUqVNHffv2NX4eAAAkycPVBQAAgLvv2Wef1RNPPKG//vpL9913n9O+yZMn66GHHlLFihWz9JzFixfX8ePHHV//5z//0Q8//KClS5c6tgUGBmbpOQEAcDV6ugEAuAc99thjKlSoULKe5fj4eM2aNUvPPvusvv32W5UvX17e3t4KCwvTu+++62hXp04dHT58WP369XP0Uif5+eefVbNmTfn6+qp48eLq3bu3Ll68KHd3dxUpUsTx8Pf3l4eHh4oUKaIrV64oJCREO3bscKpn7NixCg0Nld1u18qVK2Wz2bRgwQJVrFhRPj4+qlq1qrZv3+70nNTOn5pTp06pefPm8vX1VXh4uL766qs7eGcBAHBG6AYA4B7k4eGhTp06acqUKbIsy7F91qxZSkxMVLly5dS2bVu1b99e27Zt07BhwzR48GBHSJ89e7buu+8+DR8+XMePH3f0YO/fv1+NGzfWE088od9//11ff/21fv75Z/Xs2TPNesLCwlS/fn1NnjzZafvkyZPVpUsXubn9/68s/fv317vvvquNGzeqUKFCat68ua5du5bp83fp0kV//vmnVqxYoW+++UYTJkzQqVOnMvR+AgCQGpt18/9pAQDAPWP37t0qV66cVqxYoTp16kiSatWq5ehZPn36tH788UdH+3/9619asGCBozc6LCxMffv2dbo/+rnnnpO7u7s++ugjx7aff/5ZtWvX1sWLF+Xj4+PYPmzYMM2dO1dbtmyRJM2cOVMvvfSSjh8/Lm9vb23evFkPPfSQDhw4oLCwMK1cuVJ169bVjBkz1K5dO0nSuXPndN9992nKlClq27Ztus5fp04dVapUSWPHjtXevXt1//3365dfflF0dLTT+zJmzBju/QYA3DF6ugEAuEdFRESoevXq+u9//ytJ+uOPP7R69Wo9++yz2rVrl2rUqOHUvkaNGtq3b58SExNTPebWrVs1ZcoU+fv7Ox6NGjWS3W7XwYMH06ynZcuWcnd315w5cyRJU6ZMUd26dRUWFubUrlq1ao5/BwUF6f7779euXbsydf5du3bJw8NDVapUcXpf8uXLl2atAACkFxOpAQBwD3v22WfVq1cvjR8/XpMnT1apUqVUu3btTB8vPj5eL774onr37p1sX4kSJdJ8rpeXlzp16qTJkyerdevWmjZtmt5///27dn4AAEwgdAMAcA9r27at+vTpo2nTpmnq1Knq3r27bDabypUrpzVr1ji1XbNmjcqWLSt3d3dJN0Lyrb3eDz74oHbu3KnSpUtnqp7nnntOFSpU0IQJE3T9+nW1bt06WZv169c7AvT58+e1d+9elStXLlPnj4iI0PXr17Vp0ybH8PI9e/bo77//zlT9AADciuHlAADcw/z9/dWuXTsNHDhQx48fV5cuXSRJr7zyipYtW6Y333xTe/fu1eeff65x48bp1VdfdTw3LCxMP/30k44ePaozZ85IkgYMGKC1a9eqZ8+e2rJli/bt26fvvvvuthOpJS
lXrpyqVq2qAQMGqEOHDvL19U3WZvjw4Vq2bJm2b9+uLl26qGDBgmrZsmWmzn///fercePGevHFF7VhwwZt2rRJzz33XIrnBQAgMwjdAADc45599lmdP39ejRo1UkhIiKQbPcYzZ87UjBkzVKFCBQ0ZMkTDhw93hHLpRvg9dOiQSpUqpUKFCkmSKlasqFWrVmnv3r2qWbOmKleurCFDhjiOm956EhIS1K1btxT3v/XWW+rTp4+qVKmiEydOaN68efLy8sr0+SdPnqyQkBDVrl1brVu31gsvvKDg4OB01wsAQFqYvRwAAGQrb775pmbNmqXff//daXvS7OXnz59nojMAQI5BTzcAAMgW4uPjtX37do0bN069evVydTkAAGQJQjcAAMgWevbsqSpVqqhOnTqpDi0HACCnYXg5AAAAAACG0NMNAAAAAIAhhG4AAAAAAAwhdAMAAAAAYAihGwAAAAAAQwjdAAAAAAAYQugGAAAAAMAQQjcAAAAAAIYQugEAAAAAMITQDQAAAACAIf8Hn3rI71df6W8AAAAASUVORK5CYII=",
"text/plain": [
"<Figure size 1000x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Now let's plot a time series between creation date and vote type id\n",
"\n",
"plot = baseline(question=\"/plot a relation between vote type and creation date in scatter\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Super cool right? Remmember, all of these things are done locally in a single MacBook M3 Pro. We are loading two 1B parameter model and seeing the magic at the same levels of GPT-3.5 and so on. However PremSQL also supports closed models too. So if your data is not sensitive then you can surely go for those models as well. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Customization over PremSQL Agents\n",
"\n",
"You can customize lot of things in PremSQL. For starters, you can put any type of generators in PremSQL. Here we are\n",
"using MLX. You can use huggingface or Prem AI SDK (which provides different models) or other APIs as well. You can also build your own worker from scratch. As you have seen here, that we are using MatplotLib tool, you can also make your seaboarn / Plotly tool for the same thing for more interactive visualization. \n",
"\n",
"You can put as many number of arguments in your custom agent constructors and workers. As long as it adheres with the output schema, you can enjoy other functionalities like AgentServer and Playground. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### A note about PremSQL Memory, Router and other limitations:\n",
"\n",
"Since this is the first version, where we are introducing agents and it's capabilities, so it comes with certain limitations as follows:\n",
"\n",
"1. Abscence of a Planner. In other words, we do not support \"multi-agent\" workflows. For example, after connecting to the database, if you directly ask something complex to plot, as of now, it will not able to plot things. In ideal case, it should \"plan\" what all things it needs to query, and then which columns needs to be used for plotting. However we are going to support multi-agent framework in coming versions. PRs are welcomed.\n",
"\n",
"2. Context handling in memory. Memory has also a very simple implementation. When you instantiate an agent to work with some \"session_name\" then it captures all the history and saves it inside a local \"sqlite\" database in the name of \"premsql_pipeline_memory.db\" (However you can change the path and name of the db). However, if you want to \"analyse\" or \"plot\" something over your previous output, the way it works is, it searches for the latest output dataframe and take that as an input and then output the plot or analysis. As of now, it can not understand history in a semantic sense. "
]
}
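,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The \"latest output dataframe\" lookup described in point 2 can be sketched with the standard-library `sqlite3` module. This is a hypothetical stand-alone illustration of the idea, not premsql's actual memory schema: the table and column names (`history`, `session_name`, `output_df`, `created_at`) are assumptions."
]
}

```python
import json
import sqlite3

def latest_output_df(db_path: str, session_name: str):
    """Fetch the most recent output dataframe (stored as JSON) for a session.

    Hypothetical schema: history(session_name TEXT, output_df TEXT, created_at TEXT).
    """
    con = sqlite3.connect(db_path)
    try:
        row = con.execute(
            "SELECT output_df FROM history "
            "WHERE session_name = ? AND output_df IS NOT NULL "
            "ORDER BY created_at DESC LIMIT 1",
            (session_name,),
        ).fetchone()
        return json.loads(row[0]) if row else None
    finally:
        con.close()
```

A real memory implementation would store richer records (questions, SQL, analysis text), but the "take the latest output as input" behaviour boils down to a query like this.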
],
"metadata": {
"kernelspec": {
"display_name": "text2sql-jLjiS8B5-py3.11",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.10"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
================================================
FILE: examples/datasets.ipynb
================================================
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/root/anindya/Submission/text2sql/text2sql\n"
]
}
],
"source": [
"cd .."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Datasets\n",
"\n",
"premsql datasets helps to use different already available and pre-processed datasets in a simple way. Since Text-to-SQL is a complex task and requires data which has a depdenency of database and tables. \n",
"\n",
"premsql datasets provides simple APIs to use those and also helps you to create your own dataset using your own private databases. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Currently the following datasets are readily available:\n",
"\n",
"1. [BirdBench Dataset](https://huggingface.co/datasets/premai-io/birdbench)\n",
"2. [Spider Unified Datasets](https://huggingface.co/datasets/premai-io/spider)\n",
"3. [Domains](https://huggingface.co/datasets/premai-io/domains)\n",
"4. [Gretel AI Dataset](https://huggingface.co/datasets/gretelai/synthetic_text_to_sql) (A synthetic text to SQL dataset by Gretel AI)\n",
"\n",
"Now we are going to see how to use these datasets in a simple way."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/root/miniconda3/envs/deep/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
" from .autonotebook import tqdm as notebook_tqdm\n",
"2024-09-09 04:26:34,697 - [BIRD-DATASET] - INFO - Loaded Bird Dataset\n"
]
}
],
"source": [
"from premsql.datasets import Text2SQLDataset\n",
"from premsql.utils import print_data\n",
"# load the bird dataset\n",
"\n",
"bird_dataset = Text2SQLDataset(\n",
" dataset_name='bird', split=\"train\", force_download=False,\n",
" dataset_folder=\"/root/anindya/text2sql/data\"\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Currently, this is just the object which has the raw the data. This object consist of two methods: \n",
"\n",
"1. `raw_dataset`: This will return a dict containing the raw data opened form the json file. \n",
"2. `filters_available`: This will return the list of filters available for the dataset.\n",
"\n",
"So for our train dataset here is how we can see the raw data."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'db_id': 'movie_platform',\n",
" 'question': 'Name movie titles released in year 1945. Sort the listing by the descending order of movie popularity.',\n",
" 'evidence': 'released in the year 1945 refers to movie_release_year = 1945;',\n",
" 'SQL': 'SELECT movie_title FROM movies WHERE movie_release_year = 1945 ORDER BY movie_popularity DESC LIMIT 1'}"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"raw_bird_training_dataset = bird_dataset.raw_dataset\n",
"raw_bird_training_dataset[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, we can also see what all filters are available for the dataset. You can simply use `.filters_available` to see the available filters."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['db_id']"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"bird_dataset.filter_availables"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, in order to load the processed dataset, you can simply call `setup_dataset` method. This will load the processed dataset and return the dataset object. \n",
"\n",
"This dataset has certain (optional) methods available for furthur customization:\n",
"\n",
"- filter_by: tuple | None: This will filter the dataset based on the given filter.\n",
"\n",
"- num_rows: int | None: This will return the number of rows from the dataset.\n",
"\n",
"- num_fewshot: int | None: This will determine how many few shot examples to create in the prompt\n",
"\n",
"- model_name_or_path: str | None: This will apply the prompt template of the model you choose. For example, if you want to finetune a llama model then it will wrap the prompt with the llama model prompt template.\n",
"\n",
"Also if this is not provided then it will not tokenize the dataset. \n",
"\n",
"- prompt_template: str | None: If you want to use any other kind of prompt template then you can provide that here. You can check out the default prompt template [here](/premsql/datasets/prompts.py). \n",
"\n",
"**Note**:\n",
"If `model_name_or_path` is provided then it will automatically use the prompt template of that model and tokenize, otherwise it will not."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2024-09-09 04:26:49,099 - [BIRD-DATASET] - INFO - Setting up Bird Dataset\n",
"Applying prompt: 100%|██████████| 3/3 [00:00<00:00, 1865.52it/s]\n",
"2024-09-09 04:26:49,509 - [DATASET] - INFO - Casted dataset with model chat template\n",
"2024-09-09 04:26:49,510 - [DATASET] - INFO - Starting Tokenization ...\n",
"Tokenizing: 100%|██████████| 3/3 [00:00<00:00, 105.07it/s]\n",
"Tokenizing: 100%|██████████| 3/3 [00:00<00:00, 179.29it/s]\n"
]
},
{
"data": {
"text/plain": [
"{'input_ids': tensor([32013, 32013, 2042, ..., 207, 16, 32021]),\n",
" 'labels': tensor([ -100, -100, -100, ..., 207, 16, 32021]),\n",
" 'raw': {'db_id': 'movie_platform',\n",
" 'question': 'Name movie titles released in year 1945. Sort the listing by the descending order of movie popularity.',\n",
" 'evidence': 'released in the year 1945 refers to movie_release_year = 1945;',\n",
" 'SQL': 'SELECT movie_title FROM movies WHERE movie_release_year = 1945 ORDER BY movie_popularity DESC LIMIT 1',\n",
" 'db_path': '/root/anindya/text2sql/data/bird/train/train_databases/movie_platform/movie_platform.sqlite',\n",
" 'prompt': '<|begin▁of▁sentence|>You are an AI programming assistant, utilizing the Deepseek Coder model, develo....tles released in year 1945. Sort the listing by the descending order of movie popularity.\\n\\n# SQL: \\n\\n'}}"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Now let's setup the bird dataset \n",
"\n",
"bird_dataset = bird_dataset.setup_dataset(\n",
" model_name_or_path=\"premai-io/prem-1B-SQL\", \n",
" num_fewshot=3, \n",
" num_rows=3\n",
")\n",
"\n",
"print_data(bird_dataset[0])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sometimes tokenization could be time consuming, and it could be computation heavt. So, you can also preview the dataset without even tokenizing first. Here is\n",
"how you do it. "
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2024-09-09 04:27:12,344 - [BIRD-DATASET] - INFO - Loaded Bird Dataset\n",
"2024-09-09 04:27:12,345 - [BIRD-DATASET] - INFO - Setting up Bird Dataset\n",
"Applying prompt: 100%|██████████| 3/3 [00:00<00:00, 1908.24it/s]\n"
]
},
{
"data": {
"text/plain": [
"{'db_id': 'movie_platform',\n",
" 'question': 'Name movie titles released in year 1945. Sort the listing by the descending order of movie popularity.',\n",
" 'evidence': 'released in the year 1945 refers to movie_release_year = 1945;',\n",
" 'SQL': 'SELECT movie_title FROM movies WHERE movie_release_year = 1945 ORDER BY movie_popularity DESC LIMIT 1',\n",
" 'db_path': '/root/anindya/text2sql/data/bird/train/train_databases/movie_platform/movie_platform.sqlite',\n",
" 'prompt': '\\n# Follow these instruction:\\nYou will be given schemas of tables of a database. Your job is to write....itles released in year 1945. Sort the listing by the descending order of movie popularity.\\n\\n# SQL: \\n'}"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"bird_dataset_without_tokenization = Text2SQLDataset(\n",
" dataset_name='bird', split=\"train\", force_download=False,\n",
" dataset_folder=\"/root/anindya/text2sql/data\"\n",
").setup_dataset(\n",
" model_name_or_path=None, num_fewshot=3, num_rows=3\n",
")\n",
"\n",
"print_data(bird_dataset_without_tokenization[0])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"BirdDataset has two instance, a `train` and `validation` instance. For train dataset, you can only filter by `db_id`. This will only return results which are belonging to that database id. \n",
"\n",
"For BirdDevDataset you can filter by `db_id` and `difficulty`. Here is how you load a validation dataset and then filter by `difficulty`. "
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2024-09-09 04:27:20,270 - [BIRD-DATASET] - INFO - Loaded Bird Dataset\n",
"2024-09-09 04:27:20,271 - [BIRD-DATASET] - INFO - Setting up Bird Dataset\n",
"Applying prompt: 100%|██████████| 100/100 [00:00<00:00, 2101.37it/s]\n"
]
},
{
"data": {
"text/plain": [
"100"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Load the BirdBench dev dataset and filter the dataset by \n",
"# difficulty\n",
"\n",
"bird_validation = Text2SQLDataset(\n",
" dataset_name='bird', split=\"validation\", force_download=False,\n",
" dataset_folder=\"/root/anindya/text2sql/data\"\n",
").setup_dataset(\n",
" model_name_or_path=None, \n",
" num_fewshot=3, \n",
" num_rows=100,\n",
" filter_by=(\"difficulty\", \"simple\")\n",
")\n",
"\n",
"# count the number of examples in the dataset which has \n",
"# difficulty level as simple\n",
"\n",
"len([\n",
" example for example in bird_validation \n",
" if example[\"difficulty\"] == \"simple\"\n",
"])"
]
},
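{
"cell_type": "markdown",
"metadata": {},
"source": [
"Conceptually, a `filter_by=(key, value)` argument just keeps the raw entries whose field matches the value. A minimal stand-alone sketch of that idea (not the premsql implementation):"
]
},

```python
def filter_dataset(raw_dataset, filter_by):
    """Keep only entries whose `key` field equals `value`.

    `raw_dataset` is a list of dicts and `filter_by` a (key, value) tuple,
    mirroring the filter_by=("difficulty", "simple") call above.
    """
    key, value = filter_by
    return [example for example in raw_dataset if example.get(key) == value]

data = [
    {"db_id": "movie_platform", "difficulty": "simple"},
    {"db_id": "california_schools", "difficulty": "moderate"},
]
simple_only = filter_dataset(data, ("difficulty", "simple"))
```

This is why the `len(...)` count above equals `num_rows`: every returned example already satisfies the filter.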
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Similarly we can also filter by the dataset by `db_id`. "
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2024-09-09 04:27:28,490 - [BIRD-DATASET] - INFO - Loaded Bird Dataset\n",
"2024-09-09 04:27:28,491 - [BIRD-DATASET] - INFO - Setting up Bird Dataset\n",
"Applying prompt: 100%|██████████| 1534/1534 [00:00<00:00, 2537.01it/s]\n",
"2024-09-09 04:27:29,414 - [DATASET] - INFO - Casted dataset with model chat template\n",
"2024-09-09 04:27:29,415 - [DATASET] - INFO - Starting Tokenization ...\n",
"Tokenizing: 100%|██████████| 1534/1534 [00:09<00:00, 161.93it/s]\n",
"Tokenizing: 100%|██████████| 1534/1534 [00:09<00:00, 161.71it/s]\n"
]
},
{
"data": {
"text/plain": [
"{'input_ids': tensor([32013, 32013, 2042, ..., 207, 16, 32021]),\n",
" 'labels': tensor([ -100, -100, -100, ..., 207, 16, 32021]),\n",
" 'raw': {'question_id': 0,\n",
" 'db_id': 'california_schools',\n",
" 'question': 'What is the highest eligible free rate for K-12 students in the schools in Alameda County?',\n",
" 'evidence': 'Eligible free rate for K-12 = `Free Meal Count (K-12)` / `Enrollment (K-12)`',\n",
" 'SQL': \"SELECT `Free Meal Count (K-12)` / `Enrollment (K-12)` FROM frpm WHERE `County Name` = 'Alameda' ORDER BY (CAST(`Free Meal Count (K-12)` AS REAL) / `Enrollment (K-12)`) DESC LIMIT 1\",\n",
" 'difficulty': 'simple',\n",
" 'db_path': '/root/anindya/text2sql/data/bird/validation/dev_databases/california_schools/california_schools.sqlite',\n",
" 'prompt': '<|begin▁of▁sentence|>You are an AI programming assistant, utilizing the Deepseek Coder model, develo....hat is the highest eligible free rate for K-12 students in the schools in Alameda County?\\n\\n# SQL: \\n\\n'}}"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"bird_validation = Text2SQLDataset(\n",
" dataset_name='bird', split=\"validation\", force_download=False,\n",
" dataset_folder=\"/root/anindya/text2sql/data\"\n",
").setup_dataset(\n",
" model_name_or_path=\"premai-io/prem-1B-SQL\",\n",
")\n",
"print_data(bird_validation[0])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"That's it, thats how easy it is to use the datasets. Similarly you can also use other available datasets"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Fetching 401 files: 100%|██████████| 401/401 [00:17<00:00, 22.75it/s]\n",
"2024-09-09 04:28:35,739 - [SPIDER-DATASET] - INFO - Loaded Spider Dataset\n",
"2024-09-09 04:28:35,744 - [SPIDER-DATASET] - INFO - Setting up Spider Dataset\n",
"Applying prompt: 100%|██████████| 3/3 [00:00<00:00, 1572.67it/s]\n",
"2024-09-09 04:28:36,088 - [DATASET] - INFO - Casted dataset with model chat template\n",
"2024-09-09 04:28:36,089 - [DATASET] - INFO - Starting Tokenization ...\n",
"Tokenizing: 100%|██████████| 3/3 [00:00<00:00, 248.45it/s]\n",
"Tokenizing: 100%|██████████| 3/3 [00:00<00:00, 276.99it/s]\n"
]
}
],
"source": [
"# Loading Spider Dataset\n",
"\n",
"spider_dataset = Text2SQLDataset(\n",
" dataset_name=\"spider\",\n",
" split=\"train\",\n",
" dataset_folder=\"../data\",\n",
").setup_dataset(\n",
" num_fewshot=3,\n",
" num_rows=3,\n",
" model_name_or_path=\"premai-io/prem-1B-SQL\",\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Fetching 6 files: 100%|██████████| 6/6 [00:05<00:00, 1.03it/s]\n",
"2024-09-09 04:38:55,864 - [DOMAINS-DATASET] - INFO - Loaded Domains Dataset\n",
"2024-09-09 04:38:55,867 - [DOMAINS-DATASET] - INFO - Setting up Domains Dataset\n",
"Applying prompt: 100%|██████████| 3/3 [00:00<00:00, 1377.59it/s]\n",
"2024-09-09 04:38:56,437 - [DATASET] - INFO - Casted dataset with model chat template\n",
"2024-09-09 04:38:56,438 - [DATASET] - INFO - Starting Tokenization ...\n",
"Tokenizing: 100%|██████████| 3/3 [00:00<00:00, 145.70it/s]\n",
"Tokenizing: 100%|██████████| 3/3 [00:00<00:00, 160.63it/s]\n"
]
}
],
"source": [
"## Loading Domains dataset\n",
"\n",
"domains = Text2SQLDataset(\n",
" dataset_name=\"domains\",\n",
" split=\"train\",\n",
" dataset_folder=\"../data\",\n",
").setup_dataset(\n",
" num_fewshot=3,\n",
" num_rows=3,\n",
" model_name_or_path=\"premai-io/prem-1B-SQL\",\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2024-09-09 04:38:45,602 - [UTILS] - INFO - Saved JSON in: ../data/gretel/train.json\n",
"Applying prompt: 100%|██████████| 3/3 [00:00<00:00, 1909.97it/s]\n",
"2024-09-09 04:38:46,543 - [DATASET] - INFO - Casted dataset with model chat template\n",
"2024-09-09 04:38:46,543 - [DATASET] - INFO - Starting Tokenization ...\n",
"Tokenizing: 100%|██████████| 3/3 [00:00<00:00, 326.80it/s]\n",
"Tokenizing: 100%|██████████| 3/3 [00:00<00:00, 400.61it/s]\n"
]
}
],
"source": [
"# Loading Gretel AI Dataset (This is a synthetic dataset)\n",
"\n",
"gretel_dataset = Text2SQLDataset(\n",
" dataset_name=\"gretel\",\n",
" split=\"train\",\n",
" dataset_folder=\"../data\",\n",
").setup_dataset(\n",
" num_fewshot=3,\n",
" num_rows=3,\n",
" model_name_or_path=\"premai-io/prem-1B-SQL\",\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'id': 5097,\n",
" 'question': 'What is the total volume of timber sold by each salesperson, sorted by salesperson?',\n",
" 'schema': \"CREATE TABLE salesperson (salesperson_id INT, name TEXT, region TEXT); INSERT INTO salesperson (salesperson_id, name, region) VALUES (1, 'John Doe', 'North'), (2, 'Jane Smith', 'South'); CREATE TABLE timber_sales (sales_id INT, salesperson_id INT, volume REAL, sale_date DATE); INSERT INTO timber_sales (sales_id, salesperson_id, volume, sale_date) VALUES (1, 1, 120, '2021-01-01'), (2, 1, 150, '2021-02-01'), (3, 2, 180, '2021-01-01');\",\n",
" 'SQL': 'SELECT salesperson_id, name, SUM(volume) as total_volume FROM timber_sales JOIN salesperson ON timber_sales.salesperson_id = salesperson.salesperson_id GROUP BY salesperson_id, name ORDER BY total_volume DESC;',\n",
" 'context': \"CREATE TABLE salesperson (salesperson_id INT, name TEXT, region TEXT); INSERT INTO salesperson (salesperson_id, name, region) VALUES (1, 'John Doe', 'North'), (2, 'Jane Smith', 'South'); CREATE TABLE timber_sales (sales_id INT, salesperson_id INT, volume REAL, sale_date DATE); INSERT INTO timber_sales (sales_id, salesperson_id, volume, sale_date) VALUES (1, 1, 120, '2021-01-01'), (2, 1, 150, '2021-02-01'), (3, 2, 180, '2021-01-01');\",\n",
" 'task_type': 'analytics and reporting',\n",
" 'complexity': 'single join',\n",
" 'db_id': 'forestry',\n",
" 'db_path': None,\n",
" 'prompt': '<|begin▁of▁sentence|>You are an AI programming assistant, utilizing the Deepseek Coder model, develo....tion: What is the total volume of timber sold by each salesperson, sorted by salesperson?\\n\\n# SQL: \\n\\n'}"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"print_data(gretel_dataset[0][\"raw\"])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One of the best things of the premsql datasets is that it supports packing. This means you can pack multiple datasets together and use them as a single dataset. This is very useful when you want to train on multiple datasets."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Length of bird dataset: 3\n",
"Length of spider dataset: 3\n",
"Length of domains dataset: 3\n",
"Length of gretel dataset: 3\n",
"Length of merged dataset: 12\n"
]
}
],
"source": [
"# Merge all the datasets\n",
"\n",
"print(f\"Length of bird dataset: {len(bird_dataset)}\")\n",
"print(f\"Length of spider dataset: {len(spider_dataset)}\")\n",
"print(f\"Length of domains dataset: {len(domains)}\")\n",
"print(f\"Length of gretel dataset: {len(gretel_dataset)}\")\n",
"\n",
"merged_dataset = [*bird_dataset, *spider_dataset, *domains, *gretel_dataset]\n",
"print(f\"Length of merged dataset: {len(merged_dataset)}\")"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'input_ids': tensor([32013, 32013, 2042, ..., 207, 16, 32021]),\n",
" 'labels': tensor([ -100, -100, -100, ..., 207, 16, 32021]),\n",
" 'raw': {'db_id': 'movie_platform',\n",
" 'question': 'Name movie titles released in year 1945. Sort the listing by the descending order of movie popularity.',\n",
" 'evidence': 'released in the year 1945 refers to movie_release_year = 1945;',\n",
" 'SQL': 'SELECT movie_title FROM movies WHERE movie_release_year = 1945 ORDER BY movie_popularity DESC LIMIT 1',\n",
" 'db_path': '/root/anindya/text2sql/data/bird/train/train_databases/movie_platform/movie_platform.sqlite',\n",
" 'prompt': '<|begin▁of▁sentence|>You are an AI programming assistant, utilizing the Deepseek Coder model, develo....tles released in year 1945. Sort the listing by the descending order of movie popularity.\\n\\n# SQL: \\n\\n'}}"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"print_data(merged_dataset[0])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### How does a prompt looks like in premsql\n",
"\n",
"You might wonder how does a prompt looks like in premsql. This is how a single prompt looks like when wrapped around a model's prompt template. "
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<|begin▁of▁sentence|>You are an AI programming assistant, utilizing the Deepseek Coder model, developed by Deepseek Company, and you only answer questions related to computer science. For politically sensitive questions, security and privacy issues, and other non-computer science questions, you will refuse to answer\n",
"### Instruction:\n",
"\n",
"# Follow these instruction:\n",
"You will be given schemas of tables of a database. Your job is to write correct\n",
"error free SQL query based on the question asked. Please make sure:\n",
"\n",
"1. Do not add ``` at start / end of the query. It should be a single line query in a single line (string format)\n",
"2. Make sure the column names are correct and exists in the table\n",
"3. For column names which has a space with it, make sure you have put `` in that column name\n",
"4. Think step by step and always check schema and question and the column names before writing the\n",
"query. \n",
"\n",
"# Database and Table Schema:\n",
"CREATE TABLE salesperson (salesperson_id INT, name TEXT, region TEXT); INSERT INTO salesperson (salesperson_id, name, region) VALUES (1, 'John Doe', 'North'), (2, 'Jane Smith', 'South'); CREATE TABLE timber_sales (sales_id INT, salesperson_id INT, volume REAL, sale_date DATE); INSERT INTO timber_sales (sales_id, salesperson_id, volume, sale_date) VALUES (1, 1, 120, '2021-01-01'), (2, 1, 150, '2021-02-01'), (3, 2, 180, '2021-01-01');\n",
"\n",
"\n",
"\n",
"# Here are some Examples on how to generate SQL statements and use column names:\n",
"\n",
"Question: What is the total volume of timber sold by each salesperson, sorted by salesperson?\n",
"SQL: SELECT salesperson_id, name, SUM(volume) as total_volume FROM timber_sales JOIN salesperson ON timber_sales.salesperson_id = salesperson.salesperson_id GROUP BY salesperson_id, name ORDER BY total_volume DESC;\n",
"\n",
"\n",
"# Question: What is the total volume of timber sold by each salesperson, sorted by salesperson?\n",
"\n",
"# SQL: \n",
"\n",
"\n"
]
}
],
"source": [
"print(gretel_dataset[0][\"raw\"][\"prompt\"])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Creating your own dataset"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this section, we are going to see how we can make our own dataset similar like the above. Creating your own dataset could come with several customization and variables. One of the easiest ways to create your own dataset is to simply annotate the dataset in the given file structure:\n",
"\n",
"```\n",
"├── databases\n",
"│ ├── california_schools\n",
"│ ├── california_schools.sqlite\n",
"│ ├── card_games\n",
"│ ├── codebase_community\n",
"│ ├── debit_card_specializing\n",
"│ ├── european_football_2\n",
"│ ├── financial\n",
"│ ├── formula_1\n",
"│ ├── student_club\n",
"│ ├── superhero\n",
"│ ├── thrombosis_prediction\n",
"│ └── toxicology\n",
"├── train.json \n",
"├── validation.json # Optional \n",
"```\n",
"\n",
"The reason we do this hierchy is, in a real world scenerio, we can have \n",
"multiple databases, and each databases could be multiple tables. So this is how we organize them.\n",
"\n",
"Suppose you are saving everything inside `./data` folder then inside that folder you should have a `databases` folder (you can name it something else too) and a `train/validation.json` file. \n",
"\n",
"Inside the databases folder you should have multple sub folders where under each sub-folder you should have a `.sqlite` file of the same name. For example: if the db name is `california_schools` then you should have a .sqlite file inside `california_schools` folder. \n",
"\n",
"The `train` or `validation` JSON file, should be a list of dictionaries, having the following (required) keys:\n",
"\n",
"1. `db_id`: this represent the folder and the `.sqlite` file name.\n",
"2. `question`: this represent the question asked by the user.\n",
"3. `SQL`: This is the ground truth SQL.\n",
"\n",
"**Please note:** All the keys are case sensitive. Here is an example of a single datapoint. \n",
"\n",
"```json\n",
"\"question_id\": 0,\n",
"\"db_id\": \"california_schools\",\n",
"\"question\": \"What is the highest eligible free rate for K-12 students in the schools in Alameda County?\",\n",
"\"evidence\": \"Eligible free rate for K-12 = `Free Meal Count (K-12)` / `Enrollment (K-12)`\",\n",
"\"SQL\": \"SELECT `Free Meal Count (K-12)` / `Enrollment (K-12)` FROM frpm WHERE `County Name` = 'Alameda' ORDER BY (CAST(`Free Meal Count (K-12)` AS REAL) / `Enrollment (K-12)`) DESC LIMIT 1\",\n",
"\"difficulty\": \"simple\"\n",
"```\n",
"\n",
"You can also keep other keys too, those will be automatically used as filter keys. Now you can use the code to automatically load your dataset from the folder. "
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"from premsql.datasets import StandardDataset\n",
"\n",
"path = \"../data/bird/validation\"\n",
"dataset = StandardDataset(\n",
" split=\"validation\",\n",
" dataset_path=path,\n",
" database_folder_name=\"dev_databases\",\n",
" json_file_name=\"validation.json\",\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"></pre>\n"
],
"text/plain": []
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"\u001b[1m[\u001b[0m\u001b[32m'db_id'\u001b[0m, \u001b[32m'difficulty'\u001b[0m\u001b[1m]\u001b[0m"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dataset.filter_availables"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We have loaded our Bird dev database but this time we have used the `StandardDataset` class. A `StandardDataset` class acts like a template for all text2sql compatible datasets when following the above structure. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Towards more customization"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Last but not the least, there is one more level of customization that you can do while creating text-to-sql datasets. Till now all of these use cases shown above were tightly coupled with `.sqlite` specific databases. However if you:\n",
"\n",
"1. have different databases (like postgres or any cloud DB instance)\n",
"2. or want to have lot of custom logics, before making prompts\n",
"3. or add more utility on top of premsql\n",
"\n",
"This section will help you to achieve that. \n",
"\n",
"**Note** In case of the point number one, you can also migrate one subset of the dataset to SQLite. Once you have migrated a subset of your database content to SQLite and have done annotations for that, you can then go for the first route to create a Text2SQL compatible dataset for fine-tuning and inference. \n",
"\n",
"If you still want to go for full customization then you can achieve this with three steps. A detailed tutorial on this will be coming on future versions. However in short, you need to define two things for making a premsql fully custom dataset.\n",
"\n",
"**DatasetInstance:** A dataset instance helps to operations on individual datapoints. You need to extend `premsql.datasets.base.Text2SQLBaseInstance` class to define your own. Here is how a blueprint looks like:\n",
"\n",
"```python\n",
"\n",
"class CustomDataInstance(Text2SQLBaseInstance):\n",
" def __init__(self, dataset: list[dict]) -> None:\n",
" super().__init__(dataset=dataset)\n",
"\n",
" def schema_prompt(self, db_path: str) -> str:\n",
" # write your schema prompt here\n",
" # you need to fetch the schema from your database\n",
" # and format it. For sqlite database it would look\n",
" # like this: SELECT sql FROM sqlite_master WHERE type='table' AND name='{table_name}\n",
" # check out Text2SQLBaseInstance premsql/datasets/base for more details\n",
"```\n",
"\n",
"Additionally this class some more methods: `additional_prompt` `apply_prompt` those have some db agnostic default implementation, however you can change those too if you want. \n",
"\n",
"Once you have your instance defined, you can now define your custom class by inheriting from\n",
"`premsql.datasets.base.Text2SQLBaseDataset` class, like this:\n",
"\n",
"\n",
"```python\n",
"class CustomText2SQLDataset(Text2SQLBaseDataset):\n",
" def __init__(\n",
" self,\n",
" split: str,\n",
" dataset_folder: Optional[Union[str, Path]] = \"./data\",\n",
" hf_token: Optional[str] = None,\n",
" force_download: Optional[bool] = False,\n",
" ):\n",
" # Define your logic here\n",
" pass \n",
"\n",
" def setup_dataset(\n",
" self,\n",
" filter_by: tuple | None = None,\n",
" num_rows: int | None = None,\n",
" num_fewshot: int | None = None,\n",
" model_name_or_path: str | None = None,\n",
" prompt_template: str | None = None,\n",
" ):\n",
" logger.info(\"Setting up Spider Dataset\")\n",
" return super().setup_dataset(\n",
" filter_by, num_rows, num_fewshot, model_name_or_path, prompt_template\n",
" )\n",
"```\n",
"\n",
"Based on your requirements you can define all the necessary things in __init__ method and `setup_dataset` method. You can checkout `Text2SQLBaseDataset` class to see how things are defined. We will roll out a detailed tutorial on how to make a dataset for a different database very soon. "
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
================================================
FILE: examples/error_dataset.ipynb
================================================
{
"cells": [
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/root/anindya/Submission/text2sql/text2sql\n"
]
}
],
"source": [
"# cd text2sql"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Error Handling Datasets and Prompts\n",
"\n",
"In this section we are going to discuss on how you can create error handling prompt which you can pass it to the models during inference for self-correction from errors, or make error handling prompts to fine-tune your models furthur to make them learn how to handle errors. "
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[2024-09-09 13:55:27,850] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/root/miniconda3/envs/deep/compiler_compat/ld: cannot find -laio: No such file or directory\n",
"collect2: error: ld returned 1 exit status\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[93m [WARNING] \u001b[0m async_io requires the dev libaio .so object and headers but these were not found.\n",
"\u001b[93m [WARNING] \u001b[0m async_io: please install the libaio-dev package with apt\n",
"\u001b[93m [WARNING] \u001b[0m If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.\n",
"\u001b[93m [WARNING] \u001b[0m Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH\n",
"\u001b[93m [WARNING] \u001b[0m sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4\n",
"\u001b[93m [WARNING] \u001b[0m using untested triton version (3.0.0), only 1.0.0 is known to be compatible\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/root/miniconda3/envs/deep/lib/python3.10/site-packages/deepspeed/runtime/zero/linear.py:49: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.\n",
" def forward(ctx, input, weight, bias=None):\n",
"/root/miniconda3/envs/deep/lib/python3.10/site-packages/deepspeed/runtime/zero/linear.py:67: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n",
" def backward(ctx, grad_output):\n"
]
}
],
"source": [
"from premsql.datasets.error_dataset import ErrorDatasetGenerator\n",
"from premsql.generators.huggingface import Text2SQLGeneratorHF\n",
"from premsql.executors.from_langchain import ExecutorUsingLangChain"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In order to make a error handling dataset or error handling prompt, make sure the data entity has: `db_id`, `db_path` and existing `prompt` which was used earlier to generate results from the model. Let's see an example to understand this better. We will be using our standard BirdBench dataset for this. We also define our generators in this case it will be [Prem-1B-SQL](https://huggingface.co/premai-io/prem-1B-SQL) model and a DB executor from langchain. \n",
"\n",
"You are't aware of generators, executors and datasets then you can check out the following:\n",
"\n",
"1. [Datasets tutorial](/examples/datasets.ipynb)\n",
"2. [Generators tutorial](/examples/generators.ipynb)\n",
"3. [Executors and evaluators tutorial](/examples/evaluation.ipynb)\n",
"\n",
"Since we are making a error dataset, so we will be using existing datasets. Because our goal is to transform the existing train datasets to a error handling datasets. \n",
"\n",
"The flow is simple:\n",
"\n",
"### For training\n",
"\n",
"1. Start with a exising datasets which is compatible with premsql datasets. \n",
"2. Then use a generator to run on that dataset. The executor will gather errors for in-correct generations. \n",
"3. Now use the existing response, initial prompt and the error to create the new data points which will be now using a error handling prompt. \n",
"\n",
"### For Inference\n",
"\n",
"premsql already handles automatic error handling in the [simple-pipeline](/premsql/pipelines/simple.py) and [execution guided decoding](/examples/generators.ipynb) section. So that you do not need to worry about that. \n",
"\n",
"\n",
"Now let's start with defining our generators and execuror first. "
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2024-09-09 13:55:49,797 - [GENERATOR] - INFO - Experiment folder found in: experiments/train/testing_error_gen\n",
"Unrecognized keys in `rope_scaling` for 'rope_type'='linear': {'type'}\n",
"Loading checkpoint shards: 100%|██████████| 2/2 [00:06<00:00, 3.03s/it]\n"
]
}
],
"source": [
"generator = Text2SQLGeneratorHF(\n",
" model_or_name_or_path=\"premai-io/prem-1B-SQL\",\n",
" experiment_name=\"testing_error_gen\",\n",
" type=\"train\", # do not type: 'test' since this will be used during training\n",
" device=\"cuda:0\"\n",
")\n",
"\n",
"executor = ExecutorUsingLangChain()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After this we define our existing training dataset. We are using BirdBench dataset but you can also use your own text2sql compatible datasets or any of our existing datasets. For demo purposes, we have set `num_rows` to 10, but in actual scenerio you should be using full length of the training datasets. Because generally your error dataset will be lesser than the length of the training dataset if you are using a descent trained model which can generate SQL."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2024-09-09 14:02:05,011 - [BIRD-DATASET] - INFO - Loaded Bird Dataset\n",
"2024-09-09 14:02:05,012 - [BIRD-DATASET] - INFO - Setting up Bird Dataset\n",
"Applying prompt: 100%|██████████| 10/10 [00:00<00:00, 2779.53it/s]\n"
]
}
],
"source": [
"from premsql.datasets import BirdDataset\n",
"\n",
"bird_train = BirdDataset(\n",
" split=\"train\",\n",
" dataset_folder=\"/root/anindya/text2sql/data\"\n",
").setup_dataset(\n",
" num_rows=10,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we define our error handling dataset. It is simple, all you need is to feed in the generator of your choice and the executor. "
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"error_dataset_gen = ErrorDatasetGenerator(\n",
" generator=generator,\n",
" executor=executor\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we generate and save the results. You can use `force` if you want to force the generation once more. Once the error prompt creations are done, it will save the dataset inside `./experiments/train/<generator-experiment-name>/error_dataset.json`. "
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Generating result ...: 0%| | 0/10 [00:00<?, ?it/s]/root/miniconda3/envs/deep/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:567: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.1` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.\n",
" warnings.warn(\n",
"The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.\n",
"Generating result ...: 100%|██████████| 10/10 [00:17<00:00, 1.78s/it]\n",
"2024-09-09 14:03:20,614 - [GENERATOR] - INFO - All responses are written to: experiments/train/testing_error_gen\n",
"2024-09-09 14:03:20,615 - [ERROR-HANDLING-DATASET] - INFO - Starting Evaluation\n",
"100%|██████████| 10/10 [00:29<00:00, 2.93s/it]\n",
"2024-09-09 14:03:49,870 - [UTILS] - INFO - Saved JSON in: experiments/train/testing_error_gen/accuracy.json\n",
"2024-09-09 14:03:49,872 - [UTILS] - INFO - Saved JSON in: experiments/train/testing_error_gen/predict.json\n",
"Applying error prompt: 100%|██████████| 10/10 [00:00<00:00, 47339.77it/s]\n"
]
}
],
"source": [
"error_dataset = error_dataset_gen.generate_and_save(\n",
" datasets=bird_train,\n",
" force=True\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once generations are fininshed, this is how a sample datapoint would look like. The `prompt` key will now contain error handling prompt. This is how the error_prompt template looks like:\n",
"\n",
"\n",
"```python\n",
"ERROR_HANDLING_PROMPT = \"\"\"\n",
"{existing_prompt}\n",
"\n",
"# Generated SQL: {sql}\n",
"\n",
"## Error Message\n",
"\n",
"{error_msg}\n",
"\n",
"Carefully review the original question and error message, then rewrite the SQL query to address the identified issues. \n",
"Ensure your corrected query uses correct column names, \n",
"follows proper SQL syntax, and accurately answers the original question \n",
"without introducing new errors.\n",
"\n",
"# SQL: \n",
"\"\"\"\n",
"```\n",
"\n",
"You can also change the prompt by the following method:\n",
"\n",
"```python\n",
"error_dataset = error_dataset_gen.generate_and_save(\n",
" datasets=bird_train,\n",
" force=True,\n",
" prompt_template=your_prompt_template\n",
")\n",
"```\n",
"\n",
"Make sure your prompt template should atleast contain the four keys as laid down by the default error handling prompt. "
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'db_id': 'movie_platform',\n",
" 'question': 'Name movie titles released in year 1945. Sort the listing by the descending order of movie popularity.',\n",
" 'SQL': 'SELECT movie_title FROM movies WHERE movie_release_year = 1945 ORDER BY movie_popularity DESC LIMIT 1',\n",
" 'prompt': '\\n# Follow these instruction:\\nYou will be given schemas of tables of a database. Your job is to write correct\\nerror free SQL query based on the question asked. Please make sure:\\n\\n1. Do not add ``` at start / end of the query. It should be a single line query in a single line (string format)\\n2. Make sure the column names are correct and exists in the table\\n3. For column names which has a space with it, make sure you have put `` in that column name\\n4. Think step by step and always check schema and question and the column names before writing the\\nquery. \\n\\n# Database and Table Schema:\\nCREATE TABLE \"lists\"\\n(\\n user_id INTEGER\\n references lists_users (user_id),\\n list_id INTEGER not null\\n primary key,\\n list_title TEXT,\\n list_movie_number INTEGER,\\n list_update_timestamp_utc TEXT,\\n list_creation_timestamp_utc TEXT,\\n list_followers INTEGER,\\n list_url TEXT,\\n list_comments INTEGER,\\n list_description TEXT,\\n list_cover_image_url TEXT,\\n list_first_image_url TEXT,\\n list_second_image_url TEXT,\\n list_third_image_url TEXT\\n)\\nCREATE TABLE \"movies\"\\n(\\n movie_id INTEGER not null\\n primary key,\\n movie_title TEXT,\\n movie_release_year INTEGER,\\n movie_url TEXT,\\n movie_title_language TEXT,\\n movie_popularity INTEGER,\\n movie_image_url TEXT,\\n director_id TEXT,\\n director_name TEXT,\\n director_url TEXT\\n)\\nCREATE TABLE \"ratings_users\"\\n(\\n user_id INTEGER\\n references lists_users (user_id),\\n rating_date_utc TEXT,\\n user_trialist INTEGER,\\n user_subscriber INTEGER,\\n user_avatar_image_url TEXT,\\n user_cover_image_url TEXT,\\n user_eligible_for_trial INTEGER,\\n user_has_payment_method INTEGER\\n)\\nCREATE TABLE lists_users\\n(\\n user_id INTEGER not null ,\\n list_id INTEGER not null ,\\n list_update_date_utc TEXT,\\n list_creation_date_utc TEXT,\\n user_trialist INTEGER,\\n user_subscriber INTEGER,\\n user_avatar_image_url TEXT,\\n user_cover_image_url TEXT,\\n user_eligible_for_trial 
TEXT,\\n user_has_payment_method TEXT,\\n primary key (user_id, list_id),\\n foreign key (list_id) references lists(list_id),\\n foreign key (user_id) references lists(user_id)\\n)\\nCREATE TABLE ratings\\n(\\n movie_id INTEGER,\\n rating_id INTEGER,\\n rating_url TEXT,\\n rating_score INTEGER,\\n rating_timestamp_utc TEXT,\\n critic TEXT,\\n critic_likes INTEGER,\\n critic_comments INTEGER,\\n user_id INTEGER,\\n user_trialist INTEGER,\\n user_subscriber INTEGER,\\n user_eligible_for_trial INTEGER,\\n user_has_payment_method INTEGER,\\n foreign key (movie_id) references movies(movie_id),\\n foreign key (user_id) references lists_users(user_id),\\n foreign key (rating_id) references ratings(rating_id),\\n foreign key (user_id) references ratings_users(user_id)\\n)\\n\\n\\n\\n# Here are some Examples on how to generate SQL statements and use column names:\\n\\n\\n# Question: Name movie titles released in year 1945. Sort the listing by the descending order of movie popularity.\\n\\n# Generated SQL: SELECT movie_title FROM movies WHERE movie_release_year = 1945 ORDER BY movie_popularity DESC LIMIT 1;\\n\\n## Error Message\\n\\nNone\\n\\nCarefully review the original question and error message, then rewrite the SQL query to address the identified issues. \\nEnsure your corrected query uses correct column names, \\nfollows proper SQL syntax, and accurately answers the original question \\nwithout introducing new errors.\\n\\n# SQL: \\n',\n",
" 'db_path': '/root/anindya/text2sql/data/bird/train/train_databases/movie_platform/movie_platform.sqlite'}"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"error_dataset[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You do not need to run the error handling pipeline again and again once you have generated them once. The next time you require this dataset (most probably to use it during fine-tuning) you just need to use `from_existing` class method. \n",
"\n",
"It requires `experiment_name` as an required argument. Make sure that experiment exists. It is the same experiment name which was used in the generators that was used for error handling dataset generations. Here is an example below. "
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'db_id': 'movie_platform', 'question': 'Name movie titles released in year 1945. Sort the listing by the descending order of movie popularity.', 'SQL': 'SELECT movie_title FROM movies WHERE movie_release_year = 1945 ORDER BY movie_popularity DESC LIMIT 1', 'prompt': '\\n# Follow these instruction:\\nYou will be given schemas of tables of a database. Your job is to write correct\\nerror free SQL query based on the question asked. Please make sure:\\n\\n1. Do not add ``` at start / end of the query. It should be a single line query in a single line (string format)\\n2. Make sure the column names are correct and exists in the table\\n3. For column names which has a space with it, make sure you have put `` in that column name\\n4. Think step by step and always check schema and question and the column names before writing the\\nquery. \\n\\n# Database and Table Schema:\\nCREATE TABLE \"lists\"\\n(\\n user_id INTEGER\\n references lists_users (user_id),\\n list_id INTEGER not null\\n primary key,\\n list_title TEXT,\\n list_movie_number INTEGER,\\n list_update_timestamp_utc TEXT,\\n list_creation_timestamp_utc TEXT,\\n list_followers INTEGER,\\n list_url TEXT,\\n list_comments INTEGER,\\n list_description TEXT,\\n list_cover_image_url TEXT,\\n list_first_image_url TEXT,\\n list_second_image_url TEXT,\\n list_third_image_url TEXT\\n)\\nCREATE TABLE \"movies\"\\n(\\n movie_id INTEGER not null\\n primary key,\\n movie_title TEXT,\\n movie_release_year INTEGER,\\n movie_url TEXT,\\n movie_title_language TEXT,\\n movie_popularity INTEGER,\\n movie_image_url TEXT,\\n director_id TEXT,\\n director_name TEXT,\\n director_url TEXT\\n)\\nCREATE TABLE \"ratings_users\"\\n(\\n user_id INTEGER\\n references lists_users (user_id),\\n rating_date_utc TEXT,\\n user_trialist INTEGER,\\n user_subscriber INTEGER,\\n user_avatar_image_url TEXT,\\n user_cover_image_url TEXT,\\n user_eligible_for_trial INTEGER,\\n user_has_payment_method INTEGER\\n)\\nCREATE TABLE lists_users\\n(\\n user_id 
INTEGER not null ,\\n list_id INTEGER not null ,\\n list_update_date_utc TEXT,\\n list_creation_date_utc TEXT,\\n user_trialist INTEGER,\\n user_subscriber INTEGER,\\n user_avatar_image_url TEXT,\\n user_cover_image_url TEXT,\\n user_eligible_for_trial TEXT,\\n user_has_payment_method TEXT,\\n primary key (user_id, list_id),\\n foreign key (list_id) references lists(list_id),\\n foreign key (user_id) references lists(user_id)\\n)\\nCREATE TABLE ratings\\n(\\n movie_id INTEGER,\\n rating_id INTEGER,\\n rating_url TEXT,\\n rating_score INTEGER,\\n rating_timestamp_utc TEXT,\\n critic TEXT,\\n critic_likes INTEGER,\\n critic_comments INTEGER,\\n user_id INTEGER,\\n user_trialist INTEGER,\\n user_subscriber INTEGER,\\n user_eligible_for_trial INTEGER,\\n user_has_payment_method INTEGER,\\n foreign key (movie_id) references movies(movie_id),\\n foreign key (user_id) references lists_users(user_id),\\n foreign key (rating_id) references ratings(rating_id),\\n foreign key (user_id) references ratings_users(user_id)\\n)\\n\\n\\n\\n# Here are some Examples on how to generate SQL statements and use column names:\\n\\n\\n# Question: Name movie titles released in year 1945. Sort the listing by the descending order of movie popularity.\\n\\n# Generated SQL: SELECT movie_title FROM movies WHERE movie_release_year = 1945 ORDER BY movie_popularity DESC LIMIT 1;\\n\\n## Error Message\\n\\nNone\\n\\nCarefully review the original question and error message, then rewrite the SQL query to address the identified issues. \\nEnsure your corrected query uses correct column names, \\nfollows proper SQL syntax, and accurately answers the original question \\nwithout introducing new errors.\\n\\n# SQL: \\n', 'db_path': '/root/anindya/text2sql/data/bird/train/train_databases/movie_platform/movie_platform.sqlite'}\n"
]
}
],
"source": [
"existing_error_dataset = ErrorDatasetGenerator.from_existing(\n",
" experiment_name=\"testing_error_gen\"\n",
")\n",
"\n",
"print(existing_error_dataset[0])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can even tokenize the entities as well if you want. Right now we only support huggingface transformers tokenizers to tokenize the error dataset during the time of loading. This is how we do it while loading an existing dataset. "
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2024-09-09 14:05:49,798 - [DATASET] - INFO - Casted dataset with model chat template\n",
"2024-09-09 14:05:49,799 - [DATASET] - INFO - Starting Tokenization ...\n",
"Tokenizing: 100%|██████████| 10/10 [00:00<00:00, 173.26it/s]\n",
"Tokenizing: 100%|██████████| 10/10 [00:00<00:00, 182.71it/s]\n"
]
},
{
"data": {
"text/plain": [
"{'input_ids': tensor([32013, 32013, 2042, ..., 207, 16, 32021]),\n",
" 'labels': tensor([ -100, -100, -100, ..., 207, 16, 32021]),\n",
" 'raw': {'db_id': 'movie_platform',\n",
" 'question': 'Name movie titles released in year 1945. Sort the listing by the descending order of movie popularity.',\n",
" 'SQL': 'SELECT movie_title FROM movies WHERE movie_release_year = 1945 ORDER BY movie_popularity DESC LIMIT 1',\n",
" 'prompt': '<|begin▁of▁sentence|>You are an AI programming assistant, utilizing the Deepseek Coder model, developed by Deepseek Company, and you only answer questions related to computer science. For politically sensitive questions, security and privacy issues, and other non-computer science questions, you will refuse to answer\\n### Instruction:\\n\\n# Follow these instruction:\\nYou will be given schemas of tables of a database. Your job is to write correct\\nerror free SQL query based on the question asked. Please make sure:\\n\\n1. Do not add ``` at start / end of the query. It should be a single line query in a single line (string format)\\n2. Make sure the column names are correct and exists in the table\\n3. For column names which has a space with it, make sure you have put `` in that column name\\n4. Think step by step and always check schema and question and the column names before writing the\\nquery. \\n\\n# Database and Table Schema:\\nCREATE TABLE \"lists\"\\n(\\n user_id INTEGER\\n references lists_users (user_id),\\n list_id INTEGER not null\\n primary key,\\n list_title TEXT,\\n list_movie_number INTEGER,\\n list_update_timestamp_utc TEXT,\\n list_creation_timestamp_utc TEXT,\\n list_followers INTEGER,\\n list_url TEXT,\\n list_comments INTEGER,\\n list_description TEXT,\\n list_cover_image_url TEXT,\\n list_first_image_url TEXT,\\n list_second_image_url TEXT,\\n list_third_image_url TEXT\\n)\\nCREATE TABLE \"movies\"\\n(\\n movie_id INTEGER not null\\n primary key,\\n movie_title TEXT,\\n movie_release_year INTEGER,\\n movie_url TEXT,\\n movie_title_language TEXT,\\n movie_popularity INTEGER,\\n movie_image_url TEXT,\\n director_id TEXT,\\n director_name TEXT,\\n director_url TEXT\\n)\\nCREATE TABLE \"ratings_users\"\\n(\\n user_id INTEGER\\n references lists_users (user_id),\\n rating_date_utc TEXT,\\n user_trialist INTEGER,\\n user_subscriber INTEGER,\\n user_avatar_image_url TEXT,\\n user_cover_image_url TEXT,\\n user_eligible_for_trial 
INTEGER,\\n user_has_payment_method INTEGER\\n)\\nCREATE TABLE lists_users\\n(\\n user_id INTEGER not null ,\\n list_id INTEGER not null ,\\n list_update_date_utc TEXT,\\n list_creation_date_utc TEXT,\\n user_trialist INTEGER,\\n user_subscriber INTEGER,\\n user_avatar_image_url TEXT,\\n user_cover_image_url TEXT,\\n user_eligible_for_trial TEXT,\\n user_has_payment_method TEXT,\\n primary key (user_id, list_id),\\n foreign key (list_id) references lists(list_id),\\n foreign key (user_id) references lists(user_id)\\n)\\nCREATE TABLE ratings\\n(\\n movie_id INTEGER,\\n rating_id INTEGER,\\n rating_url TEXT,\\n rating_score INTEGER,\\n rating_timestamp_utc TEXT,\\n critic TEXT,\\n critic_likes INTEGER,\\n critic_comments INTEGER,\\n user_id INTEGER,\\n user_trialist INTEGER,\\n user_subscriber INTEGER,\\n user_eligible_for_trial INTEGER,\\n user_has_payment_method INTEGER,\\n foreign key (movie_id) references movies(movie_id),\\n foreign key (user_id) references lists_users(user_id),\\n foreign key (rating_id) references ratings(rating_id),\\n foreign key (user_id) references ratings_users(user_id)\\n)\\n\\n\\n\\n# Here are some Examples on how to generate SQL statements and use column names:\\n\\n\\n# Question: Name movie titles released in year 1945. Sort the listing by the descending order of movie popularity.\\n\\n# Generated SQL: SELECT movie_title FROM movies WHERE movie_release_year = 1945 ORDER BY movie_popularity DESC LIMIT 1;\\n\\n## Error Message\\n\\nNone\\n\\nCarefully review the original question and error message, then rewrite the SQL query to address the identified issues. \\nEnsure your corrected query uses correct column names, \\nfollows proper SQL syntax, and accurately answers the original question \\nwithout introducing new errors.\\n\\n# SQL: \\n\\n',\n",
" 'db_path': '/root/anindya/text2sql/data/bird/train/train_databases/movie_platform/movie_platform.sqlite'}}"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Even tokenize this\n",
"\n",
"existing_error_dataset = ErrorDatasetGenerator.from_existing(\n",
" experiment_name=\"testing_error_gen\",\n",
" tokenize_model_name_or_path=\"premai-io/prem-1B-SQL\",\n",
")\n",
"\n",
"existing_error_dataset[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Another example using sqlite executor\n",
"\n",
"This is an another example which uses sqlite executor to do the same thing as done above. This shows how easy it is to plug and play the components and customize it accordingly. "
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2024-09-09 14:06:12,390 - [GENERATOR] - INFO - Experiment folder found in: experiments/train/testing_error_sqlite\n",
"Unrecognized keys in `rope_scaling` for 'rope_type'='linear': {'type'}\n",
"Loading checkpoint shards: 100%|██████████| 2/2 [00:05<00:00, 2.95s/it]\n"
]
}
],
"source": [
"from premsql.executors import SQLiteExecutor\n",
"\n",
"generator = Text2SQLGeneratorHF(\n",
" model_or_name_or_path=\"premai-io/prem-1B-SQL\",\n",
" experiment_name=\"testing_error_sqlite\",\n",
" type=\"train\",\n",
" device=\"cuda:0\"\n",
")\n",
"sqlite_executor = SQLiteExecutor()\n",
"\n",
"error_dataset_gen = ErrorDatasetGenerator(\n",
" generator=generator,\n",
" executor=sqlite_executor\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also generate a tokenized dataset on the fly. Here is how you do that. "
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Generating result ...: 0%| | 0/10 [00:00<?, ?it/s]/root/miniconda3/envs/deep/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:567: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.1` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.\n",
" warnings.warn(\n",
"Generating result ...: 100%|██████████| 10/10 [00:22<00:00, 2.22s/it]\n",
"2024-09-07 09:22:09,232 - [GENERATOR] - INFO - All responses are written to: experiments/train/testing_error_sqlite\n",
"2024-09-07 09:22:09,233 - [ERROR-HANDLING-DATASET] - INFO - Starting Evaluation\n",
"100%|██████████| 10/10 [00:29<00:00, 2.91s/it]\n",
"2024-09-07 09:22:38,359 - [UTILS] - INFO - Saved JSON in: experiments/train/testing_error_sqlite/accuracy.json\n",
"2024-09-07 09:22:38,361 - [UTILS] - INFO - Saved JSON in: experiments/train/testing_error_sqlite/predict.json\n",
"Applying error prompt: 100%|██████████| 10/10 [00:00<00:00, 44104.14it/s]\n",
"2024-09-07 09:22:38,583 - [DATASET] - INFO - Casted dataset with model chat template\n",
"2024-09-07 09:22:38,584 - [DATASET] - INFO - Starting Tokenization ...\n",
"Tokenizing: 100%|██████████| 10/10 [00:00<00:00, 158.85it/s]\n",
"Tokenizing: 100%|██████████| 10/10 [00:00<00:00, 182.43it/s]\n"
]
}
],
"source": [
"error_dataset_from_sqlite = error_dataset_gen.generate_and_save(\n",
" datasets=bird_train,\n",
" tokenize=True,\n",
" force=True\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'input_ids': tensor([32013, 32013, 2042, ..., 207, 16, 32021]),\n",
" 'labels': tensor([ -100, -100, -100, ..., 207, 16, 32021]),\n",
" 'raw': {'db_id': 'movie_platform',\n",
" 'question': 'Name movie titles released in year 1945. Sort the listing by the descending order of movie popularity.',\n",
" 'SQL': 'SELECT movie_title FROM movies WHERE movie_release_year = 1945 ORDER BY movie_popularity DESC LIMIT 1',\n",
" 'prompt': '<|begin▁of▁sentence|>You are an AI programming assistant, utilizing the Deepseek Coder model, developed by Deepseek Company, and you only answer questions related to computer science. For politically sensitive questions, security and privacy issues, and other non-computer science questions, you will refuse to answer\\n### Instruction:\\n\\n# Follow these instruction:\\nYou will be given schemas of tables of a database. Your job is to write correct\\nerror free SQL query based on the question asked. Please make sure:\\n\\n1. Do not add ``` at start / end of the query. It should be a single line query in a single line (string format)\\n2. Make sure the column names are correct and exists in the table\\n3. For column names which has a space with it, make sure you have put `` in that column name\\n4. Think step by step and always check schema and question and the column names before writing the\\nquery. \\n\\n# Database and Table Schema:\\nCREATE TABLE \"lists\"\\n(\\n user_id INTEGER\\n references lists_users (user_id),\\n list_id INTEGER not null\\n primary key,\\n list_title TEXT,\\n list_movie_number INTEGER,\\n list_update_timestamp_utc TEXT,\\n list_creation_timestamp_utc TEXT,\\n list_followers INTEGER,\\n list_url TEXT,\\n list_comments INTEGER,\\n list_description TEXT,\\n list_cover_image_url TEXT,\\n list_first_image_url TEXT,\\n list_second_image_url TEXT,\\n list_third_image_url TEXT\\n)\\nCREATE TABLE \"movies\"\\n(\\n movie_id INTEGER not null\\n primary key,\\n movie_title TEXT,\\n movie_release_year INTEGER,\\n movie_url TEXT,\\n movie_title_language TEXT,\\n movie_popularity INTEGER,\\n movie_image_url TEXT,\\n director_id TEXT,\\n director_name TEXT,\\n director_url TEXT\\n)\\nCREATE TABLE \"ratings_users\"\\n(\\n user_id INTEGER\\n references lists_users (user_id),\\n rating_date_utc TEXT,\\n user_trialist INTEGER,\\n user_subscriber INTEGER,\\n user_avatar_image_url TEXT,\\n user_cover_image_url TEXT,\\n user_eligible_for_trial 
INTEGER,\\n user_has_payment_method INTEGER\\n)\\nCREATE TABLE lists_users\\n(\\n user_id INTEGER not null ,\\n list_id INTEGER not null ,\\n list_update_date_utc TEXT,\\n list_creation_date_utc TEXT,\\n user_trialist INTEGER,\\n user_subscriber INTEGER,\\n user_avatar_image_url TEXT,\\n user_cover_image_url TEXT,\\n user_eligible_for_trial TEXT,\\n user_has_payment_method TEXT,\\n primary key (user_id, list_id),\\n foreign key (list_id) references lists(list_id),\\n foreign key (user_id) references lists(user_id)\\n)\\nCREATE TABLE ratings\\n(\\n movie_id INTEGER,\\n rating_id INTEGER,\\n rating_url TEXT,\\n rating_score INTEGER,\\n rating_timestamp_utc TEXT,\\n critic TEXT,\\n critic_likes INTEGER,\\n critic_comments INTEGER,\\n user_id INTEGER,\\n user_trialist INTEGER,\\n user_subscriber INTEGER,\\n user_eligible_for_trial INTEGER,\\n user_has_payment_method INTEGER,\\n foreign key (movie_id) references movies(movie_id),\\n foreign key (user_id) references lists_users(user_id),\\n foreign key (rating_id) references ratings(rating_id),\\n foreign key (user_id) references ratings_users(user_id)\\n)\\n\\n\\n\\n# Here are some Examples on how to generate SQL statements and use column names:\\n\\n\\n# Question: Name movie titles released in year 1945. Sort the listing by the descending order of movie popularity.\\n\\n# Generated SQL: SELECT movie_title FROM movies WHERE movie_release_year = 1945 ORDER BY movie_popularity DESC LIMIT 1;\\n\\n## Error Message\\n\\nNone\\n\\nCarefully review the original question and error message, then rewrite the SQL query to address the identified issues. \\nEnsure your corrected query uses correct column names, \\nfollows proper SQL syntax, and accurately answers the original question \\nwithout introducing new errors.\\n\\n# SQL: \\n\\n',\n",
" 'db_path': '/root/anindya/text2sql/data/bird/train/train_databases/movie_platform/movie_platform.sqlite'}}"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"error_dataset_from_sqlite[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Thats it, that is how you generate a error handling dataset. This dataset will be compatible with other premsql datasets. So you can use / mix all of them to use as a singular dataset entity which can be now used collectively for fine-tuning purposes. "
]
}
],
"metadata": {
"kernelspec": {
"display_name": "deep",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
================================================
FILE: examples/evaluation.ipynb
================================================
{
"cells": [
{
"cell_type": "code",
"execution_count": 2,
"id": "30e64251-c3f2-473b-a76f-10bc4a645e93",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/root/anindya/Submission/text2sql/text2sql\n"
]
}
],
"source": [
"# cd .."
]
},
{
"cell_type": "markdown",
"id": "52735847-f54b-4929-acfb-f4fafb509df7",
"metadata": {},
"source": [
"## Evaluator\n",
"\n",
"premsql evaluators helps you to evaluate your text-to-sql models on various your validation datasets. Currently we support two metrics for evaluation:\n",
"\n",
"1. Execution Accuracy\n",
"2. Valid Efficiency Score\n",
"\n",
"**Execution Accuracy (EX):** From the name, it is clear that the correctness of the LLM is measured by comparing the executed results from the LLM with the ground truth.\n",
" \n",
"**Valid Efficiency Score (VES):** The primary objective of LLM-generated SQL queries is to be accurate. However, it also needs to be performance-optimized when dealing with big data. This metric asses both of the objectives. It quantifies how efficient the query is and whether the query is accurate or not. The figure below shows how it is been computed. \n",
"\n",
"Now let's jump in to the code to see how we can use premsql to evaluate models or pipelines using these metrics. \n",
"\n",
"To startoff, we import all the necessary things required to evaluate our models. "
]
},
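{
"cell_type": "markdown",
"metadata": {},
"source": [
"For intuition, here is a rough sketch of how the two metrics are computed, following the BIRD benchmark definitions. This is an illustration, not premsql's internal implementation: EX is the percentage of questions whose predicted query returns the same result as the ground-truth query, while VES additionally weights each correct query by the square root of the ratio of ground-truth to predicted execution time.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"def execution_accuracy(results):\n",
"    # results: list of dicts with a boolean \"correct\" flag per question\n",
"    return 100 * sum(r[\"correct\"] for r in results) / len(results)\n",
"\n",
"def valid_efficiency_score(results):\n",
"    # A wrong query contributes 0; a correct one contributes\n",
"    # sqrt(ground-truth time / predicted time), rewarding faster queries.\n",
"    score = sum(\n",
"        math.sqrt(r[\"gt_time\"] / r[\"pred_time\"])\n",
"        for r in results if r[\"correct\"]\n",
"    )\n",
"    return 100 * score / len(results)\n",
"```"
]
},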
{
"cell_type": "code",
"execution_count": 3,
"id": "b2516212-a777-4ce4-87fa-781304818819",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/root/miniconda3/envs/deep/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
" from .autonotebook import tqdm as notebook_tqdm\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[2024-09-09 12:58:58,000] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/root/miniconda3/envs/deep/compiler_compat/ld: cannot find -laio: No such file or directory\n",
"collect2: error: ld returned 1 exit status\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[93m [WARNING] \u001b[0m async_io requires the dev libaio .so object and headers but these were not found.\n",
"\u001b[93m [WARNING] \u001b[0m async_io: please install the libaio-dev package with apt\n",
"\u001b[93m [WARNING] \u001b[0m If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.\n",
"\u001b[93m [WARNING] \u001b[0m Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH\n",
"\u001b[93m [WARNING] \u001b[0m sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4\n",
"\u001b[93m [WARNING] \u001b[0m using untested triton version (3.0.0), only 1.0.0 is known to be compatible\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/root/miniconda3/envs/deep/lib/python3.10/site-packages/deepspeed/runtime/zero/linear.py:49: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.\n",
" def forward(ctx, input, weight, bias=None):\n",
"/root/miniconda3/envs/deep/lib/python3.10/site-packages/deepspeed/runtime/zero/linear.py:67: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.\n",
" def backward(ctx, grad_output):\n"
]
}
],
"source": [
"import json\n",
"from pathlib import Path \n",
"from premsql.evaluator import Text2SQLEvaluator\n",
"from premsql.executors import SQLiteExecutor"
]
},
{
"cell_type": "markdown",
"id": "226827f8-b3bd-4249-9adf-3a396fa3fa37",
"metadata": {},
"source": [
"Our evaluation methods are agnostic to models or any pipelines. To evaluate we rely on a special response JSON structure. So ideally we assume that, before doing evaluation, you have got all the model responses saved inside a JSON. \n",
"\n",
"In our [premsql.generators](/examples/generators.ipynb) section, we have show you how you can get the model responses for validation or inference purposes. We start off by defining our experiment_path. You can get the experiment path manually or you can also get it from your generator object (more on that below). "
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "7ca7d114",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'result': [('Brief Encounter',)], 'error': None, 'execution_time': 0.03717160224914551}\n"
]
}
],
"source": [
"executor = SQLiteExecutor()\n",
"db_path = (\n",
" '/root/anindya/text2sql/data/bird/train/train_databases/movie_platform/movie_platform.sqlite'\n",
")\n",
"sql = 'SELECT movie_title FROM movies WHERE movie_release_year = 1945 ORDER BY movie_popularity DESC LIMIT 1'\n",
"\n",
"result = executor.execute_sql(\n",
" sql=sql,\n",
" dsn_or_db_path=db_path\n",
")\n",
"\n",
"print(result)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "790a8667",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'result': \"[('Brief Encounter',)]\",\n",
" 'error': None,\n",
" 'execution_time': 0.028678178787231445}"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from premsql.executors import ExecutorUsingLangChain\n",
"\n",
"executor = ExecutorUsingLangChain()\n",
"executor.execute_sql(\n",
" sql=sql,\n",
" dsn_or_db_path=db_path\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "05c40a8c-8972-4992-b035-5ca7ed410211",
"metadata": {},
"outputs": [],
"source": [
"experiment_path = Path(\n",
" \"experiments/test/testing_finetuned_deepseek_full_fewshot/\"\n",
")\n",
"responses = json.load(open(experiment_path / \"predict.json\", \"r\"))"
]
},
{
"cell_type": "markdown",
"id": "3bbeee2b-9cc6-4c22-929e-1927d0ea9ad0",
"metadata": {},
"source": [
"Since text-to-SQL is a database dependent task, so it requires a DB source to execute the SQL generated from the model and compare it with the result executed from the ground truth SQL. \n",
"\n",
"So evaluator depends on an executor object. An executor derives from `premsql.evaluator.base.BaseExecutor` abstract class. This class has one method called `execute_sql`. We are going to use `SQLiteExecutor` to execute in SQLite DBs. You can also make your own executor to evaluate with your custom executor. More on that below. "
]
},
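{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sketch of what a custom executor could look like, here is a duck-typed class with the same `execute_sql` contract: it returns a dict with `result`, `error` and `execution_time` keys, mirroring the outputs shown earlier. In premsql you would derive from the `BaseExecutor` abstract class instead; this standalone version only illustrates the interface.\n",
"\n",
"```python\n",
"import sqlite3\n",
"import time\n",
"\n",
"class MyCustomExecutor:\n",
"    # Duck-typed sketch: in premsql, derive from the BaseExecutor\n",
"    # abstract class. The contract is the same: execute_sql returns\n",
"    # a dict with result, error and execution_time.\n",
"    def execute_sql(self, sql: str, dsn_or_db_path: str) -> dict:\n",
"        connection = sqlite3.connect(dsn_or_db_path)\n",
"        start = time.time()\n",
"        try:\n",
"            result, error = connection.execute(sql).fetchall(), None\n",
"        except Exception as e:\n",
"            result, error = None, str(e)\n",
"        finally:\n",
"            connection.close()\n",
"        return {\n",
"            \"result\": result,\n",
"            \"error\": error,\n",
"            \"execution_time\": time.time() - start,\n",
"        }\n",
"```"
]
},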
{
"cell_type": "code",
"execution_count": 4,
"id": "5c313d53-da57-4687-9a54-0b03bec9f3d0",
"metadata": {},
"outputs": [],
"source": [
"executor = SQLiteExecutor()\n",
"evaluator = Text2SQLEvaluator(\n",
" executor=executor, experiment_path=experiment_path\n",
")"
]
},
{
"cell_type": "markdown",
"id": "0b345933-ff9d-4a07-81c0-43e53279bd0e",
"metadata": {},
"source": [
"Now our setup is done, let's compute the execution accuracy score using premsql evaluator. "
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "a146e8d7-ed2a-444c-a59f-6390a6b1f136",
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "fb8f8c7ccd854d7c918098ad214250fd",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/1534 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"2024-09-05 13:50:28,471 - [UTILS] - INFO - Saved JSON in: experiments/test/testing_finetuned_deepseek_full_fewshot/accuracy.json\n",
"2024-09-05 13:50:28,522 - [UTILS] - INFO - Saved JSON in: experiments/test/testing_finetuned_deepseek_full_fewshot/predict.json\n"
]
}
],
"source": [
"ex = evaluator.execute(\n",
" metric_name=\"accuracy\", \n",
" model_responses=responses, \n",
" filter_by=\"difficulty\"\n",
")"
]
},
{
"cell_type": "markdown",
"id": "083b42bb-6005-4a19-86df-d43f59fe8b63",
"metadata": {},
"source": [
"This will start the evaluation. Since the responses are coming from premsql.generators. So the responses not only has model responses but all the input data (dict) that was given it the generator before generating results. So this means you can similarly filter the responses using any kind of filter similar how you did it for [datasets](/exampels/datasets.ipynb). Let's print the results to see the scores. "
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "b5c2bc6e-8d77-42a6-a6d3-90c42a367be3",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"> Execution Accuracy is: <span style=\"font-weight: bold\">{</span><span style=\"color: #008000; text-decoration-color: #008000\">'simple'</span>: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">49.18918918918919</span>, <span style=\"color: #008000; text-decoration-color: #008000\">'challenging'</span>: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">24.137931034482758</span>, <span style=\"color: #008000; text-decoration-color: #008000\">'moderate'</span>: \n",
"<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">38.146551724137936</span>, <span style=\"color: #008000; text-decoration-color: #008000\">'overall'</span>: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">43.481095176010434</span><span style=\"font-weight: bold\">}</span>\n",
"</pre>\n"
],
"text/plain": [
" Execution Accuracy is: \u001b[1m{\u001b[0m\u001b[32m'simple'\u001b[0m: \u001b[1;36m49.18918918918919\u001b[0m, \u001b[32m'challenging'\u001b[0m: \u001b[1;36m24.137931034482758\u001b[0m, \u001b[32m'moderate'\u001b[0m: \n",
"\u001b[1;36m38.146551724137936\u001b[0m, \u001b[32m'overall'\u001b[0m: \u001b[1;36m43.481095176010434\u001b[0m\u001b[1m}\u001b[0m\n"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"print(f\" Execution Accuracy is: {ex}\")"
]
},
{
"cell_type": "markdown",
"id": "68d88030-7588-478a-9178-3ebfa332ff65",
"metadata": {},
"source": [
"Once you get the result, you will also see that the results are saved in the same `experiment_path` which was initially given to the object. This saves lot of informations like the errors for each of the validation questions and other infos too. This would help us to understand the problems and debug them. Here is an instance of how it looks like:\n",
"\n",
"```json\n",
"\n",
"{\n",
" \"question_id\": 23,\n",
" \"db_id\": \"california_schools\",\n",
" \"question\": \"List the names of schools with more than 30 difference in enrollements between K-12 and ages 5-17? Please also give the full street adress of the schools.\",\n",
" \"evidence\": \"Diffrence in enrollement = `Enrollment (K-12)` - `Enrollment (Ages 5-17)`\",\n",
" \"SQL\": \"SELECT T1.School, T1.Street FROM schools AS T1 INNER JOIN frpm AS T2 ON T1.CDSCode = T2.CDSCode WHERE T2.`Enrollment (K-12)` - T2.`Enrollment (Ages 5-17)` > 30\",\n",
" \"difficulty\": \"moderate\",\n",
" \"db_path\": \"data/bird/validation/dev_databases/california_schools/california_schools.sqlite\",\n",
" \"prompt\": \"<\\uff5cbegin\\u2581of\\u2581sentence\\uff5c>You are an ... Additional Knowledge: Diffrence in enrollement = `Enrollment (K-12)` - `Enrollment (Ages 5-17)`# Question: List the names of schools with more than 30 difference in enrollements between K-12 and ages 5-17? Please also give the full street adress of the schools.\\n\\n# SQL:\\n\",\n",
" \"dataset_type\": \"real\",\n",
" \"generated\": \"SELECT T2.`School Name`, T2.Street, T2.City, T2.Zip FROM satscores AS T1 INNER JOIN frpm AS T2 ON T1.cds = T2.CDSCode WHERE T1.`Enrollment (K-12)` - T1.`Enrollment (Ages 5-17)` > 30\",\n",
" \"error\": \"no such column: T2.Street\",\n",
" \"accuracy\": 0\n",
"},\n",
"```\n",
"\n",
"All of the informations are saved in the provided `experiment_path`. Similarly, you can calculate Valid Efficiency Score (ves) using the same method. Now for this one, let's filter the result based on `db_id`."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "6670b8f2-2a71-42de-a0e9-ec37c55c0305",
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "b68615d1d2fc43de8e5a3929578fb668",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/1534 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"2024-09-05 14:03:40,245 - [UTILS] - INFO - Saved JSON in: experiments/test/testing_finetuned_deepseek_full_fewshot/ves.json\n",
"2024-09-05 14:03:40,296 - [UTILS] - INFO - Saved JSON in: experiments/test/testing_finetuned_deepseek_full_fewshot/predict.json\n"
]
}
],
"source": [
"ves = evaluator.execute(\n",
" metric_name=\"ves\", \n",
" model_responses=responses, \n",
" filter_by=\"db_id\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "3f15d698-6d1e-4c97-a537-601c97d6ab22",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"font-weight: bold\">{</span>\n",
" <span style=\"color: #008000; text-decoration-color: #008000\">'card_games'</span>: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">43.07853075375516</span>,\n",
" <span style=\"color: #008000; text-decoration-color: #008000\">'formula_1'</span>: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">31.775724213967827</span>,\n",
" <span style=\"color: #008000; text-decoration-color: #008000\">'superhero'</span>: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">71.8911921365164</span>,\n",
" <span style=\"color: #008000; text-decoration-color: #008000\">'thrombosis_prediction'</span>: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">32.04531820456654</span>,\n",
" <span style=\"color: #008000; text-decoration-color: #008000\">'european_football_2'</span>: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">55.13455641323744</span>,\n",
" <span style=\"color: #008000; text-decoration-color: #008000\">'debit_card_specializing'</span>: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">31.443812836674972</span>,\n",
" <span style=\"color: #008000; text-decoration-color: #008000\">'financial'</span>: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">20.009163037705697</span>,\n",
" <span style=\"color: #008000; text-decoration-color: #008000\">'toxicology'</span>: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">38.062478162792736</span>,\n",
" <span style=\"color: #008000; text-decoration-color: #008000\">'california_schools'</span>: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">20.28612811030907</span>,\n",
" <span style=\"color: #008000; text-decoration-color: #008000\">'codebase_community'</span>: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">61.334184528675365</span>,\n",
" <span style=\"color: #008000; text-decoration-color: #008000\">'student_club'</span>: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">55.627231251046005</span>,\n",
" <span style=\"color: #008000; text-decoration-color: #008000\">'overall'</span>: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">43.69090268345865</span>\n",
"<span style=\"font-weight: bold\">}</span>\n",
"</pre>\n"
],
"text/plain": [
"\u001b[1m{\u001b[0m\n",
" \u001b[32m'card_games'\u001b[0m: \u001b[1;36m43.07853075375516\u001b[0m,\n",
" \u001b[32m'formula_1'\u001b[0m: \u001b[1;36m31.775724213967827\u001b[0m,\n",
" \u001b[32m'superhero'\u001b[0m: \u001b[1;36m71.8911921365164\u001b[0m,\n",
" \u001b[32m'thrombosis_prediction'\u001b[0m: \u001b[1;36m32.04531820456654\u001b[0m,\n",
" \u001b[32m'european_football_2'\u001b[0m: \u001b[1;36m55.13455641323744\u001b[0m,\n",
" \u001b[32m'debit_card_specializing'\u001b[0m: \u001b[1;36m31.443812836674972\u001b[0m,\n",
" \u001b[32m'financial'\u001b[0m: \u001b[1;36m20.009163037705697\u001b[0m,\n",
" \u001b[32m'toxicology'\u001b[0m: \u001b[1;36m38.062478162792736\u001b[0m,\n",
" \u001b[32m'california_schools'\u001b[0m: \u001b[1;36m20.28612811030907\u001b[0m,\n",
" \u001b[32m'codebase_community'\u001b[0m: \u001b[1;36m61.334184528675365\u001b[0m,\n",
" \u001b[32m'student_club'\u001b[0m: \u001b[1;36m55.627231251046005\u001b[0m,\n",
" \u001b[32m'overall'\u001b[0m: \u001b[1;36m43.69090268345865\u001b[0m\n",
"\u001b[1m}\u001b[0m\n"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"print(ves)"
]
},
{
"cell_type": "markdown",
"id": "8dc48a3f-3c1c-4b71-a9cf-81bd0f71db58",
"metadata": {},
"source": [
"Let's now plot the result to see what is the distribution of result based on different databases. These kind of filters lets us to analyse the results of the model or pipeline on several key aspects which helps us to understand where the next iterations can be done on. "
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "ce44e273-ea0e-4d88-8614-72b71fb524c6",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"></pre>\n"
],
"text/plain": []
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAABKUAAAJOCAYAAABm7rQwAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAADrTUlEQVR4nOzdd3gUZdfH8d8mkARIgYQSSkIJSAQRAUUCKKg0BRsIKrxS7IooIAjYAAvNAipNLGBDVIqIXXkUFbBQRIRHihCKNCEhBUgCyXn/4MmQJYkksOwm8fu5Lq6LnJ2dvc99z8zOnr1n1mVmJgAAAAAAAMCL/HzdAAAAAAAAAPz7UJQCAAAAAACA11GUAgAAAAAAgNdRlAIAAAAAAIDXUZQCAAAAAACA11GUAgAAAAAAgNdRlAIAAAAAAIDXUZQCAAAAAACA11GUAgAAAAAAgNdRlAIAlEjx8fFyuVyaNWuWr5tyxt566y3FxsaqdOnSKl++vEfX3bdvX9WqVcstlpqaqttvv12RkZFyuVwaOHCgJGnv3r264YYbFBERIZfLpUmTJnm0LfCtkrTP/JOzkWde6xw1apRcLpfHXuPfKK8+rFWrlvr27eubBgEAPI6iFADA56655hqVLVtWKSkp+S7Tq1cvBQQE6MCBA15sme/98ccf6tu3r2JiYvTKK69oxowZ+S6b/QEu+1/ZsmUVHR2tq6++WjNnzlR6enqBXnPMmDGaNWuW7rnnHr311lu65ZZbJEmDBg3SF198oREjRuitt95Sp06dPJLj2TB16tRCFx3S0tI0ceJEXXzxxQoLC1NQUJDOOecc3Xfffdq4cWOh27Bs2TKNGjVKBw8eLPRz/y2+/fZbt2325H9z5szxdROLtY8//lidOnVSRESEsz0PGTLkX3ccBQAUXaV83QAAAHr16qVFixZpwYIF6t27d67HDx8+rIULFzofrv5Nvv32W2VlZemFF15Q3bp1C/ScadOmKTg4WOnp6frrr7/0xRdf6NZbb9WkSZP08ccfKyoqyln2lVdeUVZWltvz//Of/6hFixYaOXJkrvi1116rIUOGnHliZ9nUqVNVsWLFAs+o2L9/vzp16qSVK1eqS5cu6tmzp4KDg7VhwwbNmTNHM2bMUEZGRqHasGzZMo0ePVp9+/b1+Ay3s6FmzZo6cuSISpcu7fXXvv/++3XRRRflisfFxXm9LZ7y6KOPavjw4T57/SFDhui5555T48aNNWzYMIWHh2vVqlWaPHmy5syZo8WLF6t+/fo+ax8AABJFKQBAEXDNNdcoJCREs2fPzrMotXDhQh06dEi9evXyQet8a9++fZJUqKLGDTfcoIoVKzp/P/7443rnnXfUu3dvde/eXT/++KPzWF4FiH379qlBgwZ5xj1ZXDl27JiysrIUEBDgsXWerr59+2r16tWaO3euunXr5vbYk08+qUceecRHLTv7co5DUFCQT9pwySWX6IYbbvDJa58tpUqVUqlSvjnVfvfdd/Xcc8/pxhtv1DvvvCN/f3/nsb59++qyyy5T9+7dtWrVKq+28dChQypXrpzXXg8AUPRx+R4AwOfKlCmjrl27avHixU4RJqfZs2crJCRE11xzjRISEjRkyBA1atRIwcHBCg0N1ZVXXqk1a9ac8nXatm2rtm3b5orndV+lrKwsTZo0SQ0bNlRQUJCqVKmiu+66S4mJiW7LrVixQh07dlTFihVVpkwZ1a5dW7feemuB8p46daoaNmyowMBAVatWTf3793e71KtWrVrObKVKlSrJ5XJp1KhRBVr3yXr16qXbb79dP/30k7766qs8c8++lGrr1q365JNPnEuoZs2aJZfLJTPTlClTnHi2gwcPauDAgYqKilJgYKDq1q2r8ePHu83Ayr7nzrPPPqtJkyYpJiZGgYGBWr9+vaTjlynecMMNCg8PV1BQkC688EJ99NFHbjlkt2Pp0qUaPHiwKlWqpHLlyun666/X33//7dZv69
at05IlS5y25jXu2X766Sd98sknuu2223IVpCQpMDBQzz77rPP3b7/9pr59+6pOnToKCgpSZGSkbr31VrdLokaNGqWhQ4dKkmrXru20Iz4+3lnm7bffVrNmzVSmTBmFh4frpptu0o4dO3K9/pQpU1SnTh2VKVNGzZs31/fff5/ntrxv3z7ddtttqlKlioKCgtS4cWO98cYbbsv80zjkd6+lgozN0aNHNXr0aNWrV09BQUGKiIhQ69at3ba1MzFz5ky5XC69/vrrbvExY8bI5XLp008/dWIHDx7UoEGDVKtWLQUGBqpGjRrq3bu39u/fn+/6C3NsOHjwoPr27auwsDCVL19effr0yfMSzbzuh+RyuXTffffpww8/1HnnnafAwEA1bNhQn3/+ea7nf/vtt7rwwgsVFBSkmJgYvfzyywW+T9Xo0aNVoUIFzZgxw60gJUnNmzfXsGHDtHbtWs2dO1eSdN999yk4OFiHDx/Ota6bb75ZkZGRyszMdGKfffaZLrnkEpUrV04hISHq3Lmz1q1b5/a8vn37Kjg4WH/++aeuuuoqhYSEOF8sfP/99+revbuio6MVGBioqKgoDRo0SEeOHDllbgCAkoWZUgCAIqFXr15644039P777+u+++5z4gkJCfriiy908803q0yZMlq3bp0+/PBDde/eXbVr19bevXv18ssvq02bNlq/fr2qVavmkfbcddddmjVrlvr166f7779fW7du1eTJk7V69WotXbpUpUuX1r59+9ShQwdVqlRJw4cPV/ny5RUfH6/58+efcv2jRo3S6NGj1a5dO91zzz3asGGDpk2bpl9++cVZ/6RJk/Tmm29qwYIFziV5559//mnndMstt2jGjBn68ssv1b59+1yPn3vuuXrrrbc0aNAg1ahRQw8++KAkqUmTJs69pdq3b+82m+3w4cNq06aN/vrrL911112Kjo7WsmXLNGLECO3evTvXzdBnzpyptLQ03XnnnQoMDFR4eLjWrVunVq1aqXr16ho+fLjKlSun999/X9ddd53mzZun66+/3m0dAwYMUIUKFTRy5EjFx8dr0qRJuu+++/Tee+9JkiZNmqQBAwYoODjYmeFUpUqVfPslu8CSfe+sU/nqq6+0ZcsW9evXT5GRkVq3bp1mzJihdevW6ccff5TL5VLXrl21ceNGvfvuu5o4caIzc61SpUqSpKefflqPPfaYevToodtvv11///23XnrpJV166aVavXq1MyNt2rRpuu+++3TJJZdo0KBBio+P13XXXacKFSqoRo0aTpuOHDmitm3bavPmzbrvvvtUu3ZtffDBB+rbt68OHjyoBx544JTjcPJlnJIKPDajRo3S2LFjdfvtt6t58+ZKTk7WihUrtGrVqjy3tZOlpKTkWTTKvql+v379NH/+fA0ePFjt27dXVFSU1q5dq9GjR+u2227TVVddJen4TfovueQS/fe//9Wtt96qpk2bav/+/froo4+0c+dOtxmEp8PMdO211+qHH37Q3XffrXPPPVcLFixQnz59CryOH374QfPnz9e9996rkJAQvfjii+rWrZu2b9/uXJ68evVqderUSVWrVtXo0aOVmZmpJ554wtl+/smmTZu0YcMG9e3bV6GhoXku07t3b40cOVIff/yxbrrpJt14442aMmWKPvnkE3Xv3t1Z7vDhw1q0aJH69u3rFLfeeust9enTRx07dtT48eN1+PBhTZs2Ta1bt9bq1avdinjHjh1Tx44d1bp1az377LMqW7asJOmDDz7Q4cOHdc899ygiIkI///yzXnrpJe3cuVMffPBBgfsSAFACGAAARcCxY8esatWqFhcX5xafPn26SbIvvvjCzMzS0tIsMzPTbZmtW7daYGCgPfHEE24xSTZz5kwn1qZNG2vTpk2u1+7Tp4/VrFnT+fv77783SfbOO++4Lff555+7xRcsWGCS7JdffilUrvv27bOAgADr0KGDWy6TJ082Sfb66687sZEjR5ok+/vvv0+53l
Mtm5iYaJLs+uuvd2In525mVrNmTevcuXOu50uy/v37u8WefPJJK1eunG3cuNEtPnz4cPP397ft27eb2YnxCA0NtX379rkte8UVV1ijRo0sLS3NiWVlZVnLli2tXr16TmzmzJkmydq1a2dZWVlOfNCgQebv728HDx50Yg0bNsxzrPNy/fXXmyRLTEws0PKHDx/OFXv33XdNkn333XdO7JlnnjFJtnXrVrdl4+Pjzd/f355++mm3+Nq1a61UqVJOPD093SIiIuyiiy6yo0ePOsvNmjXLJLnlN2nSJJNkb7/9thPLyMiwuLg4Cw4OtuTkZDP753HIa58p6Ng0btw4z23mVL755huTlO+/3bt3O8vu3r3bwsPDrX379paenm5NmjSx6OhoS0pKcpZ5/PHHTZLNnz8/12tlbzNncmz48MMPTZJNmDDBiR07dswuueSSXOvM3h9zkmQBAQG2efNmJ7ZmzRqTZC+99JITu/rqq61s2bL2119/ObFNmzZZqVKlcq3zZNltnDhx4j8uFxoaak2bNjWz431TvXp169atm9sy77//vtt2nZKSYuXLl7c77rjDbbk9e/ZYWFiYW7xPnz4myYYPH57rtfPah8aOHWsul8u2bdvmxPLqw5o1a1qfPn3+MTcAQPHB5XsAgCLB399fN910k5YvX+52idPs2bNVpUoVXXHFFZKOX0rl53f87SszM1MHDhxQcHCw6tevr1WrVnmkLR988IHCwsLUvn177d+/3/nXrFkzBQcH65tvvpF04j5PH3/8sY4ePVrg9X/99dfKyMjQwIEDnVwk6Y477lBoaKg++eQTj+RxsuDgYEn6x185LKwPPvhAl1xyiSpUqODWV+3atVNmZqa+++47t+W7devmNtsjISFB//nPf9SjRw9ntsz+/ft14MABdezYUZs2bdJff/3lto4777zT7RKmSy65RJmZmdq2bdtp5ZCcnCxJCgkJKdDyZcqUcf6flpam/fv3q0WLFpJUoG1w/vz5ysrKUo8ePdz6LDIyUvXq1XO2rxUrVujAgQO644473O7706tXL1WoUMFtnZ9++qkiIyN18803O7HSpUvr/vvvV2pqqpYsWeK2/MnjkJfCjE358uW1bt06bdq06ZT55+Xxxx/XV199letfeHi4s0xkZKSmTJmir776Spdccol+/fVXvf76626zgebNm6fGjRvnml0nqUCXvZ3Kp59+qlKlSumee+5xYv7+/howYECB19GuXTvFxMQ4f59//vkKDQ3Vli1bJB0/rn399de67rrr3GZ+1q1bV1deeeUp15+9f59qew4JCXG2fZfLpe7du+vTTz9Vamqqs8x7772n6tWrq3Xr1pKOzxI8ePCgbr75Zrdt19/fXxdffLGz7eaUs6+y5dyHDh06pP3796tly5YyM61evfqUOQIASg4u3wMAFBm9evXSxIkTNXv2bD388MPauXOnvv/+e91///3OpSPZv0Q3depUbd261e0+J576Zb5NmzYpKSlJlStXzvPx7PtetWnTRt26ddPo0aM1ceJEtW3bVtddd5169uypwMDAfNefXTw5+ZevAgICVKdOndMurpxK9ofNghZfCmLTpk367bff8i1wnHyPsNq1a7v9vXnzZpmZHnvsMT322GP5rqN69erO39HR0W6PZxdoTr7fV0FlFzVSUlIKdCP3hIQEjR49WnPmzMmVX1JS0imfv2nTJpmZ6tWrl+fj2Tefz94OTv7VxVKlSuW6z9G2bdtUr149tyKndPySzJzrynbyOOSlMGPzxBNP6Nprr9U555yj8847T506ddItt9xS4MtNGzVqpHbt2p1yuZtuuklvv/22PvnkE915551OsTrbn3/+med9wTxl27Ztqlq1qlPgzVaYX7E7efuVjm/D2dvvvn37dOTIkTx/bbMgv8CZvX+fqvickpLidoy78cYbNWnSJH300Ufq2bOnUlNT9emnn+quu+5yCnrZRcfLL788z3WefLlgqVKl3C4zzb
Z9+3Y9/vjj+uijj3LttwXZhwAAJQdFKQBAkdGsWTPFxsbq3Xff1cMPP6x3331XZub2q3tjxozRY489pltvvVVPPvmkwsPD5efnp4EDB+Z5T5ycsm/WfbKchS3peOGrcuXKeuedd/JcT3YBxuVyae7cufrxxx+1aNEiffHFF7r11lv13HPP6ccff8z1wdXXfv/9d0kF+2BbUFlZWWrfvr0eeuihPB8/55xz3P7OOUMi+/nS8Z+v79ixY57rOLm9J9+4OVteY1sQsbGxkqS1a9fqkksuOeXyPXr00LJlyzR06FBdcMEFCg4OVlZWljp16nTKbVA6nrPL5dJnn32WZy7e2G5OHoe8FGZsLr30Uv35559auHChvvzyS7366quaOHGipk+frttvv91j7T5w4IBWrFghSVq/fr2ysrJyFeJOR0GPDZ7g6e33ZNmFyN9++y3fZbZt26bk5GS3X9ls0aKFatWqpffff189e/bUokWLdOTIEd14443OMtnbxFtvvaXIyMhc6z35l/xyzmzNlpmZqfbt2yshIUHDhg1TbGysypUrp7/++kt9+/Yt0D4EACg5KEoBAIqUXr166bHHHtNvv/2m2bNnq169erroooucx+fOnavLLrtMr732mtvzDh48eMqbGFeoUMG5RCank2eRxMTE6Ouvv1arVq0K9OG9RYsWatGihZ5++mnNnj1bvXr10pw5c/L9MF6zZk1J0oYNG1SnTh0nnpGRoa1btxZoxsjpeOuttyQp3wLD6YiJiVFqaupptzk7/9KlS3s078JcqnX11Vdr7Nixevvtt09ZlEpMTNTixYs1evRoPf744048r8vW8mtDTEyMzEy1a9fOVbTLKXs72bx5sy677DInfuzYMcXHx7vNQqpZs6Z+++23XEWaP/74w21dhVHYsQkPD1e/fv3Ur18/paam6tJLL9WoUaM8WpTq37+/UlJSNHbsWI0YMUKTJk3S4MGDncdjYmKc4mthFPTYULNmTS1evFipqaluxcMNGzYU+jXzU7lyZQUFBWnz5s25HssrdrJzzjlH55xzjj788EO98MILec6MfPPNNyVJXbp0cYv36NFDL7zwgpKTk/Xee++pVq1azqWpkpzLDitXrnza++vatWu1ceNGvfHGG24/muCpX2oEABQv3FMKAFCkZM+Kevzxx/Xrr7+6zZKSjs8yOHlGwQcffJDrvkN5iYmJ0R9//KG///7bia1Zs0ZLly51W65Hjx7KzMzUk08+mWsdx44dc37+PTExMVdbLrjgAklSenp6vu1o166dAgIC9OKLL7o9/7XXXlNSUpI6d+58ylwKa/bs2Xr11VcVFxeX65KnM9GjRw8tX75cX3zxRa7HDh48qGPHjv3j8ytXrqy2bdvq5Zdf1u7du3M9nnOsCqNcuXLOOJ1KXFycOnXqpFdffVUffvhhrsczMjI0ZMgQSSdmuZw87if/ymB2GyTlakfXrl3l7++v0aNH51qPmenAgQOSpAsvvFARERF65ZVX3PrxnXfeyXXJ01VXXaU9e/Y4v0AoHd9WX3rpJQUHB6tNmzb/0AN5K8zYZLc5W3BwsOrWrfuP+0FhzZ07V++9957GjRun4cOH66abbtKjjz6qjRs3Ost069ZNa9as0YIFC3I9/59mIhX02HDVVVfp2LFjmjZtmhPLzMzUSy+9dCapufH391e7du304YcfateuXU588+bN+uyzzwq0jscff1yJiYm6++67c832WrlypcaPH6/zzjsv16WON954o9LT0/XGG2/o888/V48ePdwe79ixo0JDQzVmzJg876NXkP01r33IzPTCCy8UKDcAQMnCTCkAQJFSu3ZttWzZUgsXLpSkXEWpLl266IknnlC/fv3UsmVLrV27Vu+8847bjKP83HrrrXr++efVsWNH3Xbbbdq3b5+mT5+uhg0bOjf8lY7fK+quu+7S2LFj9euvv6pDhw4qXbq0Nm3apA8++EAvvPCCbrjhBr3xxhuaOnWqrr
/+esXExCglJUWvvPKKQkNDnZ+oz0ulSpU0YsQIjR49Wp06ddI111yjDRs2aOrUqbrooov0f//3f6fZe8fNnTtXwcHBysjI0F9//aUvvvhCS5cuVePGjT3+c+tDhw7VRx99pC5duqhv375q1qyZDh06pLVr12ru3LmKj48/5Qy2KVOmqHXr1mrUqJHuuOMO1alTR3v37tXy5cu1c+dOrVmzptDtatasmaZNm6annnpKdevWVeXKlfO9D450fOZIhw4d1LVrV1199dW64oorVK5cOW3atElz5szR7t279eyzzyo0NFSXXnqpJkyYoKNHj6p69er68ssvtXXr1jzbIEmPPPKIbrrpJpUuXVpXX321YmJi9NRTT2nEiBGKj4/Xddddp5CQEG3dulULFizQnXfeqSFDhiggIECjRo3SgAEDdPnll6tHjx6Kj4/XrFmzFBMT4zYT684779TLL7+svn37auXKlapVq5bmzp2rpUuXatKkSad9H7GCjk2DBg3Utm1bNWvWTOHh4VqxYoXmzp2r++67r0Cv8/333ystLS1X/Pzzz9f555+vffv26Z577tFll13mrHPy5Mn65ptv1LdvX/3www/y8/PT0KFDNXfuXHXv3l233nqrmjVrpoSEBH300UeaPn26GjdunOfrF/TYcPXVV6tVq1YaPny44uPj1aBBA82fP9/j90EaNWqUvvzyS7Vq1Ur33HOPMjMzNXnyZJ133nn69ddfT/n8Xr166ZdfftELL7yg9evXOzfHX7VqlV5//XVFRERo7ty5zv3LsjVt2lR169bVI488ovT0dLdL96Tj94yaNm2abrnlFjVt2lQ33XSTKlWqpO3bt+uTTz5Rq1atNHny5H9sW2xsrGJiYjRkyBD99ddfCg0N1bx58077nnAAgGLOy7/2BwDAKU2ZMsUkWfPmzXM9lpaWZg8++KBVrVrVypQpY61atbLly5fn+kn3vH723czs7bfftjp16lhAQIBdcMEF9sUXX+T62fdsM2bMsGbNmlmZMmUsJCTEGjVqZA899JDt2rXLzMxWrVplN998s0VHR1tgYKBVrlzZunTpYitWrChQnpMnT7bY2FgrXbq0ValSxe655x5LTEx0Wyb7J9H//vvvU64ve9nsf0FBQVajRg3r0qWLvf7665aWlpbrOXnlXrNmTevcuXOuZSVZ//79c8VTUlJsxIgRVrduXQsICLCKFStay5Yt7dlnn7WMjAwzOzEezzzzTJ5t//PPP613794WGRlppUuXturVq1uXLl1s7ty5zjIzZ840SfbLL7+4Pfebb74xSfbNN984sT179ljnzp0tJCTEJLltG/k5fPiwPfvss3bRRRdZcHCwBQQEWL169WzAgAG2efNmZ7mdO3fa9ddfb+XLl7ewsDDr3r277dq1yyTZyJEj3db55JNPWvXq1c3Pz88k2datW53H5s2bZ61bt7Zy5cpZuXLlLDY21vr3728bNmxwW8eLL75oNWvWtMDAQGvevLktXbrUmjVrZp06dXJbbu/evdavXz+rWLGiBQQEWKNGjXJt//80DvntMwUZm6eeesqaN29u5cuXtzJlylhsbKw9/fTTzvjnJ3vs8vuX3Z9du3a1kJAQi4+Pd3v+woULTZKNHz/eiR04cMDuu+8+q169ugUEBFiNGjWsT58+tn///n/Ms6DHhgMHDtgtt9xioaGhFhYWZrfccoutXr061zqz98ec8tuHatasaX369HGLLV682Jo0aWIBAQEWExNjr776qj344IMWFBT0j32a04cffmjt27e3ChUqWGBgoNWtW9cefPDBfzyePPLIIybJ6tatm+8y33zzjXXs2NHCwsIsKCjIYmJirG/fvm7Hvj59+li5cuXyfP769eutXbt2FhwcbBUrVrQ77rjD1qxZU6A+zKuvAADFl8vMQ3dVBAAAwFmXlZWlSpUqqWvXrnrllVd83Rx40XXXXad169bleQ8zAACKI+4pBQAAUESlpaXluhfSm2
++qYSEBLVt29Y3jYJXHDlyxO3vTZs26dNPP2XcAQAlCjOlAAAAiqhvv/1WgwYNUvfu3RUREaFVq1bptdde07nnnquVK1cqICDA103EWVK1alX17dtXderU0bZt2zRt2jSlp6dr9erVqlevnq+bBwCAR3CjcwAAgCKqVq1aioqK0osvvqiEhASFh4erd+/eGjduHAWpEq5Tp0569913tWfPHgUGBiouLk5jxoyhIAUAKFGYKQUAAAAAAACv455SAAAAAAAA8DqKUgAAAAAAAPC6En9PqaysLO3atUshISFyuVy+bg4AAAAAAECJZmZKSUlRtWrV5OeX/3yoEl+U2rVrl6KionzdDAAAAAAAgH+VHTt2qEaNGvk+XuKLUiEhIZKOd0RoaKiPWwMAAAAAAFCyJScnKyoqyqnJ5KfEF6WyL9kLDQ2lKAUAAAAAAOAlp7qNEjc6BwAAAAAAgNdRlAIAAAAAAIDXUZQCAAAAAACA11GUAgAAAAAAgNdRlAIAAAAAAIDXUZQCAAAAAACA11GUAgAAAAAAgNdRlAIAAAAAAIDXUZQCAAAAAACA11GUAgAAAAAAgNdRlAIAAAAAAIDXUZQCAAAAAACA11GUAgAAAAAAgNdRlAIAAAAAAIDXUZQCAAAAAACA11GUAgAAAAAAgNdRlAIAAAAAAIDXlfJ1AwDgn4xbvd/XTShWhjep6OsmAAAAAECBMFMKAAAAAAAAXkdRCgAAAAAAAF5HUQoAAAAAAABeR1EKAAAAAAAAXkdRCgAAAAAAAF7n06JUrVq15HK5cv3r37+/JCktLU39+/dXRESEgoOD1a1bN+3du9eXTQYAAAAAAIAH+LQo9csvv2j37t3Ov6+++kqS1L17d0nSoEGDtGjRIn3wwQdasmSJdu3apa5du/qyyQAAAAAAAPCAUr588UqVKrn9PW7cOMXExKhNmzZKSkrSa6+9ptmzZ+vyyy+XJM2cOVPnnnuufvzxR7Vo0cIXTQYAAAAAAIAH+LQolVNGRobefvttDR48WC6XSytXrtTRo0fVrl07Z5nY2FhFR0dr+fLl+Ral0tPTlZ6e7vydnJwsScrMzFRmZqYkyeVyyc/PT1lZWTIzZ9nsePZyp4r7+fnJ5XLlGZekrKysAsX9/f1lZnnGT25jfnFyIqeSmpP+t06X3NticuUdd/lJZmcUN0ly+UmW9b9XyfGaLpdc5t5fpxU/SzllZmay7ZETOZETOZETOZETOZETOZGTT3PK9bkuH0WmKPXhhx/q4MGD6tu3ryRpz549CggIUPny5d2Wq1Klivbs2ZPvesaOHavRo0fniq9bt07BwcGSpPDwcEVHR2vnzp1KSEhwlomMjFRkZKTi4+OVkpLixKOiohQREaFNmzYpLS3NidepU0ehoaFav369W4fXr19fAQEBWrt2rVsbGjVqpIyMDG3YsMGJ+fv7q1GjRkpJSdGWLVuceFBQkGJjY5WYmKgdO3Y48ZCQEMXExGjfvn1u/UBO5FRScyqVWUmZ/qVVJfFEGyVpb4U68s88qorJJ9po8tPe8DoKOHpE4am7nPgxvwDtLx+tMukpCju8z4mnlyqrxNBqCj6SqOC0E20/EhCqpODKCju0X2Uykp14alC4UsuGq3zKHgUeO+zEk8pW1pGgUEUk7VSprAwnnhBcTRkBZVU5MV4unTjI7w+NOms5xcensO2REzmREzmREzmREzmREzmRk09zSk1NVUG47OSyl4907NhRAQEBWrRokSRp9uzZ6tevn9usJ0lq3ry5LrvsMo0fPz7P9eQ1UyoqKkoJCQkKDQ2V9O+pTJITOZWEnJ5Zc/ygxkypgrV9SOMItj1yIidyIidyIidyIidyIidy8mlOycnJCg8PV1JSklOLyUuRmCm1bds2ff3115o/f74Ti4yMVEZGhg4ePOg2W2rv3r2KjIzMd12BgYEKDAzMFff395e/v79bLHtQ8lrW23GXy5VnPL82FjZOTuSUX7zI5+Q6Xqgxt/
LQCXnGXS4Pxf1OKgP97zVdeeda6PhZyCm7/9j2yMlTcXIiJ0+1sbBxciInT7WxsHFyIidPtbGwcXIiJ0+1sbDxs5FTfq+f6zkFWuosmzlzpipXrqzOnTs7sWbNmql06dJavHixE9uwYYO2b9+uuLg4XzQTAAAAAAAAHuLzmVJZWVmaOXOm+vTpo1KlTjQnLCxMt912mwYPHqzw8HCFhoZqwIABiouL45f3AAAAAAAAijmfF6W+/vprbd++XbfeemuuxyZOnCg/Pz9169ZN6enp6tixo6ZOneqDVgIAAAAAAMCTisyNzs+W5ORkhYWFnfLmWgCKpnGr9/u6CcXK8CYVfd0EAAAAAP9yBa3FFIl7SgEAAAAAAODfhaIUAAAAAAAAvI6iFAAAAAAAALyOohQAAAAAAAC8jqIUAAAAAAAAvI6iFAAAAAAAALyOohQAAAAAAAC8jqIUAAAAAAAAvI6iFAAAAAAAALyOohQAAAAAAAC8jqIUAAAAAAAAvI6iFAAAAAAAALyOohQAAAAAAAC8jqIUAAAAAAAAvI6iFAAAAAAAALyOohQAAAAAAAC8jqIUAAAAAAAAvI6iFAAAAAAAALyOohQAAAAAAAC8jqIUAAAAAAAAvI6iFAAAAAAAALyOohQAAAAAAAC8jqIUAAAAAAAAvI6iFAAAAAAAALyOohQAAAAAAAC8jqIUAAAAAAAAvI6iFAAAAAAAALyOohQAAAAAAAC8jqIUAAAAAAAAvI6iFAAAAAAAALyOohQAAAAAAAC8jqIUAAAAAAAAvI6iFAAAAAAAALyOohQAAAAAAAC8jqIUAAAAAAAAvI6iFAAAAAAAALyOohQAAAAAAAC8jqIUAAAAAAAAvI6iFAAAAAAAALyOohQAAAAAAAC8jqIUAAAAAAAAvK6UrxsAAAAAAMDZNm71fl83odgY3qSir5uAfwlmSgEAAAAAAMDrKEoBAAAAAADA63xelPrrr7/0f//3f4qIiFCZMmXUqFEjrVixwnnczPT444+ratWqKlOmjNq1a6dNmzb5sMUAAAAAAAA4Uz4tSiUmJqpVq1YqXbq0PvvsM61fv17PPfecKlSo4CwzYcIEvfjii5o+fbp++uknlStXTh07dlRaWpoPWw4AAAAAAIAz4dMbnY8fP15RUVGaOXOmE6tdu7bzfzPTpEmT9Oijj+raa6+VJL355puqUqWKPvzwQ910001ebzMAAAAAAADOnE+LUh999JE6duyo7t27a8mSJapevbruvfde3XHHHZKkrVu3as+ePWrXrp3znLCwMF188cVavnx5nkWp9PR0paenO38nJydLkjIzM5WZmSlJcrlc8vPzU1ZWlszMWTY7nr3cqeJ+fn5yuVx5xiUpKyurQHF/f3+ZWZ7xk9uYX5ycyKmk5qT/rdMl97aYXHnHXX6S2RnFTZJcfpJl/e9VcrymyyWXuffXacXPUk6ZmZlse+RETuRETuRETuRETnnEc56TFefzvfziHs1JYtsjpzPKKdfnunz4tCi1ZcsWTZs2TYMHD9bDDz+sX375Rffff78CAgLUp08f7dmzR5JUpUoVt+dVqVLFeexkY8eO1ejRo3PF161bp+DgYElSeHi4oqOjtXPnTiUkJDjLREZGKjIyUvHx8UpJSXHiUVFRioiI0KZNm9wuG6xTp45CQ0O1fv16tw6vX7++AgICtHbtWrc2NGrUSBkZGdqwYYMT8/f3V6NGjZSSkqItW7Y48aCgIMXGxioxMVE7duxw4iEhIYqJidG+ffvc+oCcyKmk5lQqs5Iy/UurSuKJNkrS3gp15J95VBWTT7TR5Ke94XUUcPSIwlN3OfFjfgHaXz5aZdJTFHZ4nxNPL1VWiaHVFHwkUcFpJ9p+JCBUScGVFXZov8pkJDvx1KBwpZYNV/mUPQo8dtiJJ5WtrCNBoYpI2qlSWRlOPCG4mjICyqpyYrxcOnGQ3x8addZyio9PYdsjJ3IiJ3IiJ3IiJ3LKI6cqifFOvDif73
njHFYS2x45nVFOqampKgiXnVz28qKAgABdeOGFWrZsmRO7//779csvv2j58uVatmyZWrVqpV27dqlq1arOMj169JDL5dJ7772Xa515zZSKiopSQkKCQkNDJf17KpPkRE4lIadn1hw/qBXbb5m8/M3ZkMYRbHvkRE7kRE7kRE7kRE55xJ/5db8TK87ne/nFPZnT8KaV2PbI6YxySk5OVnh4uJKSkpxaTF58OlOqatWqatCggVvs3HPP1bx58yQdrxRK0t69e92KUnv37tUFF1yQ5zoDAwMVGBiYK+7v7y9/f3+3WPag5LWst+MulyvPeH5tLGycnMgpv3iRz8l1/C3V3N5aT8gz7nJ5KO530tv//17TlXeuhY6fhZyy+49tj5w8FScncvJUGwsbJydy8lQbCxsnp5KbU57nZMXwfO/Ucc/kxLZHTmeSU36vn+s5BVrqLGnVqpXb1DFJ2rhxo2rWrCnp+E3PIyMjtXjxYufx5ORk/fTTT4qLi/NqWwEAAAAAAOA5Pp0pNWjQILVs2VJjxoxRjx499PPPP2vGjBmaMWOGpOPVuoEDB+qpp55SvXr1VLt2bT322GOqVq2arrvuOl82HQAAAAAAAGfAp0Wpiy66SAsWLNCIESP0xBNPqHbt2po0aZJ69erlLPPQQw/p0KFDuvPOO3Xw4EG1bt1an3/+uYKCgnzYcgAAAAAAAJwJn97o3BuSk5MVFhZ2yptrASiaxq3ef+qF4BjepKKvmwAAAFAkcV5ZcJxT4kwVtBbj03tKAQAAAAAA4N+JohQAAAAAAAC8jqIUAAAAAAAAvI6iFAAAAAAAALyOohQAAAAAAAC8jqIUAAAAAAAAvI6iFAAAAAAAALyOohQAAAAAAAC8jqIUAAAAAAAAvI6iFAAAAAAAALyOohQAAAAAAAC8jqIUAAAAAAAAvI6iFAAAAAAAALyOohQAAAAAAAC8jqIUAAAAAAAAvI6iFAAAAAAAALyOohQAAAAAAAC8rpSvGwAAAAAAAEqmcav3+7oJxcrwJhV93QSvYqYUAAAAAAAAvI6iFAAAAAAAALyOy/cAAChCmOJeOP+2Ke4AAAAlCTOlAAAAAAAA4HUUpQAAAAAAAOB1FKUAAAAAAADgdRSlAAAAAAAA4HUUpQAAAAAAAOB1FKUAAAAAAADgdRSlAAAAAAAA4HUUpQAAAAAAAOB1FKUAAAAAAADgdRSlAAAAAAAA4HUUpQAAAAAAAOB1FKUAAAAAAADgdRSlAAAAAAAA4HUUpQAAAAAAAOB1FKUAAAAAAADgdRSlAAAAAAAA4HUUpQAAAAAAAOB1FKUAAAAAAADgdRSlAAAAAAAA4HUUpQAAAAAAAOB1FKUAAAAAAADgdRSlAAAAAAAA4HUUpQAAAAAAAOB1Pi1KjRo1Si6Xy+1fbGys83haWpr69++viIgIBQcHq1u3btq7d68PWwwAAAAAAABP8PlMqYYNG2r37t3Ovx9++MF5bNCgQVq0aJE++OADLVmyRLt27VLXrl192FoAAAAAAAB4QimfN6BUKUVGRuaKJyUl6bXXXtPs2bN1+eWXS5Jmzpypc889Vz/++KNatGjh7aYCAAAAAADAQ3xelNq0aZOqVaumoKAgxcXFaezYsYqOjtbKlSt19OhRtWvXzlk2NjZW0dHRWr58eb5FqfT0dKWnpzt/JycnS5IyMzOVmZkpSXK5XPLz81NWVpbMzFk2O5693Knifn5+crlcecYlKSsrq0Bxf39/mVme8ZPbmF+cnMippOak/63TJfe2mFx5x11+ktkZxU2SXH6SZf3vVXK8pssll7n312nFz1JOmZmZbHvFPCdJxXLbyy9+1ven//Uf2x45kRM5kRM5nSqnnO8hxfl8L7+4R3OSPDZOudZflM4jiuA4FZf96VTHiFyf6/Lh06LUxRdfrFmzZql+/fravXu3Ro8erUsuuUS///679uzZo4CAAJUvX97tOVWqVNGePXvyXefYsWM1evToXPF169YpOD
hYkhQeHq7o6Gjt3LlTCQkJzjKRkZGKjIxUfHy8UlJSnHhUVJQiIiK0adMmpaWlOfE6deooNDRU69evd+vw+vXrKyAgQGvXrnVrQ6NGjZSRkaENGzY4MX9/fzVq1EgpKSnasmWLEw8KClJsbKwSExO1Y8cOJx4SEqKYmBjt27fPrR/IiZxKak6lMisp07+0qiSeaKMk7a1QR/6ZR1Ux+UQbTX7aG15HAUePKDx1lxM/5heg/eWjVSY9RWGH9znx9FJllRhaTcFHEhWcdqLtRwJClRRcWWGH9qtMRrITTw0KV2rZcJVP2aPAY4edeFLZyjoSFKqIpJ0qlZXhxBOCqykjoKwqJ8bLpRMH+f2hUWctp/j4FLa9Yp6TVLZYbnu+2p/S0kLY9siJnMiJnMipQDlVSYx34sX5fM8b77mSPDZOOfugqJ1HFMVxKi7706mOEampqSoIl51c9vKhgwcPqmbNmnr++edVpkwZ9evXz23WkyQ1b95cl112mcaPH5/nOvKaKRUVFaWEhASFhoZKKjmV/pL47QU5kdPJ8WfWHD+oFaVvL4ryNzJDGkew7RXznCasSSiW215+8bO9Pz3UpJIktj1yIidyIidyOnVOz/y634kV5/O9/OKezGl400oeG6cJq/8uEjkVl3Eaen4Ft3UU1f3pVMeI5ORkhYeHKykpyanF5MXnl+/lVL58eZ1zzjnavHmz2rdvr4yMDB08eNBtttTevXvzvAdVtsDAQAUGBuaK+/v7y9/f3y2WPSh5LevtuMvlyjOeXxsLGycncsovXuRzch0/VJvbIfuEPOMul4fifie9rfzvNV1551ro+FnIKbv/2PaKd07Fcds7dfzs5JR9ySPbHjmREzl5Kk5OJTenPN9beM/NN+6pccq734vGeYQTL0LjVFz2p1PF83v9XM8p0FJekpqaqj///FNVq1ZVs2bNVLp0aS1evNh5fMOGDdq+fbvi4uJ82EoAAAAAAACcKZ/OlBoyZIiuvvpq1axZU7t27dLIkSPl7++vm2++WWFhYbrttts0ePBghYeHKzQ0VAMGDFBcXBy/vAcAAAAAAFDM+bQotXPnTt188806cOCAKlWqpNatW+vHH39UpUrH7w8xceJE+fn5qVu3bkpPT1fHjh01depUXzYZAAAAAAAAHuDTotScOXP+8fGgoCBNmTJFU6ZM8VKLAAAAAAAA4A1F6p5SAAAAAAAA+HegKAUAAAAAAACvoygFAAAAAAAAr6MoBQAAAAAAAK+jKAUAAAAAAACvoygFAAAAAAAAr6MoBQAAAAAAAK+jKAUAAAAAAACvoygFAAAAAAAAr6MoBQAAAAAAAK+jKAUAAAAAAACvoygFAAAAAAAAr6MoBQAAAAAAAK+jKAUAAAAAAACvoygFAAAAAAAAr6MoBQAAAAAAAK+jKAUAAAAAAACvoygFAAAAAAAAr6MoBQAAAAAAAK+jKAUAAAAAAACvoygFAAAAAAAAr6MoBQAAAAAAAK+jKAUAAAAAAACvoygFAAAAAAAAr6MoBQAAAAAAAK+jKAUAAAAAAACvoygFAAAAAAAAr6MoBQAAAAAAAK+jKAUAAAAAAACvoygFAAAAAAAAr6MoBQAAAAAAAK8r5esGoHDGrd7v6yYUK8ObVPR1EwAAAAAAQB6YKQUAAAAAAACvoygFAAAAAAAAr6MoBQAAAAAAAK+jKAUAAAAAAACvoygFAAAAAAAAr6MoBQAAAAAAAK+jKAUAAAAAAACvoygFAAAAAAAAryt0UWrHjh3auXOn8/fPP/+sgQMHasaMGR5tGAAAAAAAAEquQhelevbsqW+++UaStGfPHrVv314///yzHnnkET3xxBMebyAAAAAAAABKnkIXpX7//Xc1b95ckvT+++/rvPPO07Jly/TOO+9o1qxZnm4fAAAAAAAASqBCF6WOHj2qwMBASdLXX3+ta665RpIUGxur3bt3n3ZDxo0bJ5fLpYEDBzqxtLQ09e/fXxEREQoODla3bt
20d+/e034NAAAAAAAAFA2FLko1bNhQ06dP1/fff6+vvvpKnTp1kiTt2rVLERERp9WIX375RS+//LLOP/98t/igQYO0aNEiffDBB1qyZIl27dqlrl27ntZrAAAAAAAAoOgodFFq/Pjxevnll9W2bVvdfPPNaty4sSTpo48+ci7rK4zU1FT16tVLr7zyiipUqODEk5KS9Nprr+n555/X5ZdfrmbNmmnmzJlatmyZfvzxx0K/DgAAAAAAAIqOUoV9Qtu2bbV//34lJye7FZHuvPNOlS1bttAN6N+/vzp37qx27drpqaeecuIrV67U0aNH1a5dOycWGxur6OhoLV++XC1atCj0awEAAAAAAKBoKHRRSpLMTCtXrtSff/6pnj17KiQkRAEBAYUuSs2ZM0erVq3SL7/8kuuxPXv2KCAgQOXLl3eLV6lSRXv27Ml3nenp6UpPT3f+Tk5OliRlZmYqMzNTkuRyueTn56esrCyZmbNsdjx7uVPF/fz85HK58oxLUlZWVoHi/v7+MrM847naaFkyl59kJpdOxE2SXH6SZcmVYx0ml+RyyWXu6z6tuOT2mv8Yz6ONhY17IqfMzEyfjFN+8eK87fkqJ/1vncVt2ztl/CzllJmZybZXzHOSVCy3vfziZ31/+l//se2REzmREzmR06lyyvkeUpzP9/KLezQnyWPjlGv9Rek8ogiOU3HZn051jMj1uS4fhS5Kbdu2TZ06ddL27duVnp6u9u3bKyQkROPHj1d6erqmT59eoPXs2LFDDzzwgL766isFBQUVthn5Gjt2rEaPHp0rvm7dOgUHB0uSwsPDFR0drZ07dyohIcFZJjIyUpGRkYqPj1dKSooTj4qKUkREhDZt2qS0tDQnXqdOHYWGhmr9+vVuHV6/fn0FBARo7dq1bm1o1KiRMjIytGHDBifm7++vRo0aKSUlRVu2bHHiQUFBio2NVWJionbs2OHEy6f4KTG0moKPJCo47UTbjwSEKim4ssIO7VeZjGQnnhoUrtSy4SqfskeBxw478aSylXUkKFQRSTtVKivDiScEV1NGQFlVToyXSyc2yv2hUcr0L60qiSfaKEl7K9SRf+ZRVUw+0UaTn/aG11HA0SMKT93lxI/5BWh/+WiVSU9R2OF9Tjy9VNmzltPatbt8Mk4hISGKiYnRvn373IqoxXnb81VOpTIrFcttT/LN/hQfn8K2V8xzksoWy23PV/tTWloI2x45kRM5kRM5FSinKonxTrw4n+954z1XksfGKWcfFLXziKI4TsVlfzrVMSI1NVUF4bKTy16ncN111ykkJESvvfaaIiIitGbNGtWpU0fffvut7rjjDm3atKlA6/nwww91/fXXy9/f34llZmY6FbYvvvhC7dq1U2JiottsqZo1a2rgwIEaNGhQnuvNa6ZUVFSUEhISFBoaejzpYlzpf3bNASrIhWj7kMYRRb6CnDNelLc9X+X0zJrjB7Xitu2dMn6WchrSOIJtr5jnNGFNQrHc9vKLn+396aEmlSSx7ZETOZETOZHTqXN65tf9Tqw4n+/lF/dkTsObVvLYOE1Y/XeRyKm4jNPQ8yu4raOo7k+nOkYkJycrPDxcSUlJTi0mL4WeKfX9999r2bJlCggIcIvXqlVLf/31V4HXc8UVV+Sq3PXr10+xsbEaNmyYoqKiVLp0aS1evFjdunWTJG3YsEHbt29XXFxcvusNDAxUYGBgrri/v79bAUw6MSh5LevtuMvlyjN+chvN5Zf9BGdHcV+R30m7wUnPO9N4Xq+ZXzzfNhY2fvo55exTb47T6caL8rZ3uvEzbqPr+DZR3La9AsXPQk7Z/ce2V7xzKo7b3qnjZyen7Ese2fbIiZzIyVNxciq5OeX53sJ7br5xT41T3v1eNM4jnHgRGqfisj+dKp7f65+s0EWprKysPK8N3Llzp0JCQgq8npCQEJ133nlusXLlyikiIsKJ33bbbRo8eLDCw8MVGhqqAQMGKC4ujpucAwAAAAAAFHP5fE
Wbvw4dOmjSpEnO3y6XS6mpqRo5cqSuuuoqT7ZNEydOVJcuXdStWzddeumlioyM1Pz58z36GgAAAAAAAPC+Qs+Ueu6559SxY0c1aNBAaWlp6tmzpzZt2qSKFSvq3XffPaPGfPvtt25/BwUFacqUKZoyZcoZrRcAAAAAAABFS6GLUjVq1NCaNWs0Z84c/fbbb0pNTdVtt92mXr16qUyZMmejjQAAAAAAAChhCl2UkqRSpUrp//7v/zzdFgAAAAAAAPxLFLoo9eabb/7j47179z7txgAAAAAAAODfodBFqQceeMDt76NHj+rw4cMKCAhQ2bJlKUoBAAAAAADglApdlEpMTMwV27Rpk+655x4NHTrUI40CAAAAcHaNW73f100oNoY3qejrJgBAieTniZXUq1dP48aNyzWLCgAAAAAAAMiLR4pS0vGbn+/atctTqwMAAAAAAEAJVujL9z766CO3v81Mu3fv1uTJk9WqVSuPNQwAAAAAAAAlV6GLUtddd53b3y6XS5UqVdLll1+u5557zlPtAgAAAAAAQAlW6KJUVlbW2WgHAAAAAAAA/kU8dk8pAAAAAAAAoKAKNFNq8ODBBV7h888/f9qNAQAAAAAAwL9DgYpSq1evLtDKXC7XGTUGAAAAAAAA/w4FKkp98803Z7sdAAAAAAAA+BfhnlIAAAAAAADwukL/+p4krVixQu+//762b9+ujIwMt8fmz5/vkYYBAAAAAACg5Cr0TKk5c+aoZcuW+u9//6sFCxbo6NGjWrdunf7zn/8oLCzsbLQRAAAAAAAAJUyhi1JjxozRxIkTtWjRIgUEBOiFF17QH3/8oR49eig6OvpstBEAAAAAAAAlTKGLUn/++ac6d+4sSQoICNChQ4fkcrk0aNAgzZgxw+MNBAAAAAAAQMlT6KJUhQoVlJKSIkmqXr26fv/9d0nSwYMHdfjwYc+2DgAAAAAAACVSgYtS2cWnSy+9VF999ZUkqXv37nrggQd0xx136Oabb9YVV1xxdloJAAAAAACAEqXAv753/vnn66KLLtJ1112n7t27S5IeeeQRlS5dWsuWLVO3bt306KOPnrWGAgAAAAAAoOQocFFqyZIlmjlzpsaOHaunn35a3bp10+23367hw4efzfYBAAAAAACgBCrw5XuXXHKJXn/9de3evVsvvfSS4uPj1aZNG51zzjkaP3689uzZczbbCQAAAAAAgBKk0Dc6L1eunPr166clS5Zo48aN6t69u6ZMmaLo6Ghdc801Z6ONAAAAAAAAKGEKXZTKqW7dunr44Yf16KOPKiQkRJ988omn2gUAAAAAAIASrMD3lDrZd999p9dff13z5s2Tn5+fevToodtuu82TbQMAAAAAAEAJVaii1K5duzRr1izNmjVLmzdvVsuWLfXiiy+qR48eKleu3NlqIwAAAAAAAEqYAhelrrzySn399deqWLGievfurVtvvVX169c/m20DAAAAAABACVXgolTp0qU1d+5cdenSRf7+/mezTQAAAAAAACjhClyU+uijj85mOwAAAAAAAPAvcka/vgcAAAAAAACcDopSAAAAAAAA8DqKUgAAAAAAAPA6ilIAAAAAAADwOopSAAAAAAAA8DqKUgAAAAAAAPC6Ur5uAACgaBq3er+vm1BsDG9S0ddNAAAAAIodZkoBAAAAAADA65gpBRQAM0YKh1kjAAAAAIBTYaYUAAAAAAAAvI6ZUgAAAGJWbGEwIxYAAHgCM6UAAAAAAADgdRSlAAAAAAAA4HU+LUpNmzZN559/vkJDQxUaGqq4uDh99tlnzuNpaWnq37+/IiIiFBwcrG7dumnv3r0+bDEAAAAAAAA8wadFqRo1amjcuHFauXKlVqxYocsvv1zXXnut1q1bJ0kaNGiQFi1apA8++EBLlizRrl
gitextract_acehzbqp/
├── .gitignore
├── README.md
├── examples/
│   ├── agent_server.ipynb
│   ├── agents.ipynb
│   ├── datasets.ipynb
│   ├── error_dataset.ipynb
│   ├── evaluation.ipynb
│   ├── finetuning.ipynb
│   ├── generators.ipynb
│   ├── lora_tuning.py
│   └── simple_pipeline.ipynb
├── premsql/
│   ├── __init__.py
│   ├── agents/
│   │   ├── __init__.py
│   │   ├── base.py
│   │   ├── baseline/
│   │   │   ├── __init__.py
│   │   │   ├── main.py
│   │   │   ├── prompts.py
│   │   │   └── workers/
│   │   │       ├── __init__.py
│   │   │       ├── analyser.py
│   │   │       ├── followup.py
│   │   │       ├── plotter.py
│   │   │       └── text2sql.py
│   │   ├── memory.py
│   │   ├── models.py
│   │   ├── router.py
│   │   ├── tools/
│   │   │   ├── __init__.py
│   │   │   └── plot/
│   │   │       ├── base.py
│   │   │       └── matplotlib_tool.py
│   │   └── utils.py
│   ├── cli.py
│   ├── datasets/
│   │   ├── __init__.py
│   │   ├── base.py
│   │   ├── collator.py
│   │   ├── error_dataset.py
│   │   ├── real/
│   │   │   ├── bird.py
│   │   │   ├── domains.py
│   │   │   └── spider.py
│   │   └── synthetic/
│   │       └── gretel.py
│   ├── evaluator/
│   │   ├── README.md
│   │   ├── __init__.py
│   │   └── base.py
│   ├── executors/
│   │   ├── __init__.py
│   │   ├── base.py
│   │   ├── from_langchain.py
│   │   └── from_sqlite.py
│   ├── generators/
│   │   ├── __init__.py
│   │   ├── base.py
│   │   ├── huggingface.py
│   │   ├── mlx.py
│   │   ├── ollama_model.py
│   │   ├── openai.py
│   │   └── premai.py
│   ├── logger.py
│   ├── playground/
│   │   ├── __init__.py
│   │   ├── backend/
│   │   │   ├── api/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── admin.py
│   │   │   │   ├── apps.py
│   │   │   │   ├── migrations/
│   │   │   │   │   ├── 0001_initial.py
│   │   │   │   │   └── __init__.py
│   │   │   │   ├── models.py
│   │   │   │   ├── pydantic_models.py
│   │   │   │   ├── serializers.py
│   │   │   │   ├── services.py
│   │   │   │   ├── tests.py
│   │   │   │   ├── urls.py
│   │   │   │   ├── utils.py
│   │   │   │   └── views.py
│   │   │   ├── backend/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── asgi.py
│   │   │   │   ├── settings.py
│   │   │   │   ├── urls.py
│   │   │   │   └── wsgi.py
│   │   │   ├── backend_client.py
│   │   │   └── manage.py
│   │   ├── frontend/
│   │   │   ├── components/
│   │   │   │   ├── chat.py
│   │   │   │   ├── session.py
│   │   │   │   ├── streamlit_plot.py
│   │   │   │   └── uploader.py
│   │   │   ├── main.py
│   │   │   └── utils.py
│   │   └── inference_server/
│   │       ├── api_client.py
│   │       └── service.py
│   ├── prompts.py
│   ├── tuner/
│   │   ├── __init__.py
│   │   ├── callback.py
│   │   ├── config.py
│   │   ├── full.py
│   │   └── peft.py
│   └── utils.py
└── pyproject.toml
SYMBOL INDEX (340 symbols across 55 files)
FILE: premsql/agents/base.py
class WorkerBase (line 24) | class WorkerBase(ABC):
method run (line 26) | def run(self):
class AnalysisWorkerBase (line 30) | class AnalysisWorkerBase(ABC):
method run (line 32) | def run(
class ChartPlotWorkerBase (line 38) | class ChartPlotWorkerBase(ABC):
method run (line 40) | def run(
class RouterWorkerBase (line 46) | class RouterWorkerBase(ABC):
method run (line 48) | def run(
class Text2SQLWorkerBase (line 54) | class Text2SQLWorkerBase(ABC):
method __init__ (line 55) | def __init__(
method run (line 73) | def run(self, question: str, **kwargs) -> Text2SQLWorkerOutput:
method initialize_database (line 76) | def initialize_database(
class AgentBase (line 101) | class AgentBase(ABC):
method __init__ (line 102) | def __init__(
method run (line 117) | def run(
method convert_exit_output_to_agent_output (line 126) | def convert_exit_output_to_agent_output(
method __call__ (line 156) | def __call__(
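The base module above defines one abstract interface per worker role, each with a single `run` entry point, which agents then compose. A minimal stdlib sketch of that pattern (class and method names below are illustrative stand-ins, not premsql's actual classes):

```python
from abc import ABC, abstractmethod

class WorkerSketch(ABC):
    """Stand-in for the WorkerBase-style interfaces: one abstract run()."""
    @abstractmethod
    def run(self, question: str) -> dict:
        """Process a question and return a structured output dict."""

class EchoText2SQLWorker(WorkerSketch):
    """Toy worker that 'translates' any question into a canned SQL string."""
    def run(self, question: str) -> dict:
        return {"question": question, "sql": "SELECT 1;", "error": None}

class AgentSketch:
    """Composes workers, mirroring how an AgentBase subclass wires them up."""
    def __init__(self, text2sql_worker: WorkerSketch):
        self.text2sql_worker = text2sql_worker

    def __call__(self, question: str) -> dict:
        # Delegate to the worker; a real agent would route between several.
        return self.text2sql_worker.run(question)

agent = AgentSketch(EchoText2SQLWorker())
result = agent("how many users signed up?")
```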
FILE: premsql/agents/baseline/main.py
class BaseLineAgent (line 20) | class BaseLineAgent(AgentBase):
method __init__ (line 21) | def __init__(
method run (line 57) | def run(
method _execute_worker (line 104) | def _execute_worker(
method _create_exit_worker_output (line 138) | def _create_exit_worker_output(
method _handle_followup (line 173) | def _handle_followup(self, prev_output: ExitWorkerOutput):
method _handle_followup_route (line 180) | def _handle_followup_route(self, question: str) -> ExitWorkerOutput:
FILE: premsql/agents/baseline/workers/analyser.py
class BaseLineAnalyserWorker (line 28) | class BaseLineAnalyserWorker(AnalysisWorkerBase):
method __init__ (line 29) | def __init__(self, generator: Text2SQLGeneratorBase) -> None:
method run_chunkwise_analysis (line 32) | def run_chunkwise_analysis(
method analyse (line 110) | def analyse(
method run (line 155) | def run(
FILE: premsql/agents/baseline/workers/followup.py
class BaseLineFollowupWorker (line 14) | class BaseLineFollowupWorker(WorkerBase):
method __init__ (line 15) | def __init__(self, generator: Text2SQLGeneratorBase) -> None:
method run (line 18) | def run(
FILE: premsql/agents/baseline/workers/plotter.py
class BaseLinePlotWorker (line 15) | class BaseLinePlotWorker(ChartPlotWorkerBase):
method __init__ (line 16) | def __init__(
method run (line 21) | def run(
FILE: premsql/agents/baseline/workers/text2sql.py
class BaseLineText2SQLWorker (line 20) | class BaseLineText2SQLWorker(Text2SQLWorkerBase):
method __init__ (line 21) | def __init__(
method show_dataframe (line 44) | def show_dataframe(output: Text2SQLWorkerOutput):
method filer_tables_from_schema (line 55) | def filer_tables_from_schema(
method _create_prompt (line 76) | def _create_prompt(
method run (line 142) | def run(
method do_correction (line 230) | def do_correction(
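The `run` / `do_correction` pair above suggests a generate → execute → self-correct loop. A hedged sketch of that control flow, with a fake generator standing in for the LLM (the retry behaviour and dict keys are assumptions, not premsql's API):

```python
import sqlite3
from typing import Optional

def fake_generate(question: str, error: Optional[str] = None) -> str:
    # Pretend model: first attempt is broken SQL, the "corrected" retry works.
    if error is None:
        return "SELEC 1"   # deliberately invalid
    return "SELECT 1"

def run_with_correction(question: str, max_retries: int = 2) -> dict:
    conn = sqlite3.connect(":memory:")
    error = None
    sql = ""
    for _ in range(max_retries + 1):
        sql = fake_generate(question, error)
        try:
            rows = conn.execute(sql).fetchall()
            return {"sql": sql, "rows": rows, "error": None}
        except sqlite3.Error as exc:
            # Feed the execution error back into the next generation attempt.
            error = str(exc)
    return {"sql": sql, "rows": None, "error": error}

out = run_with_correction("return the constant one")
```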
FILE: premsql/agents/memory.py
class AgentInteractionMemory (line 15) | class AgentInteractionMemory:
method __init__ (line 16) | def __init__(self, session_name: str, db_path: Optional[str] = None):
method list_sessions (line 28) | def list_sessions(self) -> List[str]:
method create_table_if_not_exists (line 34) | def create_table_if_not_exists(self):
method get (line 68) | def get(
method get_latest_message_id (line 86) | def get_latest_message_id(self) -> Optional[int]:
method generate_messages_from_session (line 93) | def generate_messages_from_session(
method get_by_message_id (line 105) | def get_by_message_id(self, message_id: int) -> Optional[dict]:
method push (line 114) | def push(self, output: ExitWorkerOutput):
method delete_table (line 133) | def delete_table(self):
method _row_to_exit_worker_output (line 144) | def _row_to_exit_worker_output(self, row) -> ExitWorkerOutput:
method _exit_worker_output_to_tuple (line 175) | def _exit_worker_output_to_tuple(self, output: ExitWorkerOutput) -> tu...
method _parse_json (line 201) | def _parse_json(self, json_str):
method _serialize_json (line 212) | def _serialize_json(self, obj):
method clear (line 223) | def clear(self):
method close (line 228) | def close(self):
method __del__ (line 231) | def __del__(self):
method delete_table (line 234) | def delete_table(self):
method get_latest_dataframe (line 245) | def get_latest_dataframe(
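`AgentInteractionMemory` keeps one row per interaction in a session-named SQLite table, with JSON-serialised payloads. A minimal sketch of that storage pattern; the table layout and column names are assumptions, not premsql's schema:

```python
import json
import sqlite3
from typing import Optional

class SessionMemorySketch:
    """Session-scoped message store: one SQLite table per session name."""
    def __init__(self, session_name: str, db_path: str = ":memory:"):
        self.session_name = session_name
        self.conn = sqlite3.connect(db_path)
        self.create_table_if_not_exists()

    def create_table_if_not_exists(self) -> None:
        self.conn.execute(
            f'CREATE TABLE IF NOT EXISTS "{self.session_name}" '
            "(message_id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT)"
        )

    def push(self, output: dict) -> int:
        # Serialise the worker output and return its new message id.
        cur = self.conn.execute(
            f'INSERT INTO "{self.session_name}" (payload) VALUES (?)',
            (json.dumps(output),),
        )
        return cur.lastrowid

    def get_by_message_id(self, message_id: int) -> Optional[dict]:
        row = self.conn.execute(
            f'SELECT payload FROM "{self.session_name}" WHERE message_id = ?',
            (message_id,),
        ).fetchone()
        return json.loads(row[0]) if row else None

memory = SessionMemorySketch("demo_session")
mid = memory.push({"question": "q1", "sql": "SELECT 1"})
```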
FILE: premsql/agents/models.py
class BaseWorkerOutput (line 12) | class BaseWorkerOutput(BaseModel):
class Text2SQLWorkerOutput (line 22) | class Text2SQLWorkerOutput(BaseWorkerOutput):
method show_output_dataframe (line 31) | def show_output_dataframe(self) -> pd.DataFrame:
class AnalyserWorkerOutput (line 39) | class AnalyserWorkerOutput(BaseWorkerOutput):
class ChartPlotWorkerOutput (line 47) | class ChartPlotWorkerOutput(BaseWorkerOutput):
class RouterWorkerOutput (line 57) | class RouterWorkerOutput(BaseWorkerOutput):
class FollowupWorkerOutput (line 66) | class FollowupWorkerOutput(BaseWorkerOutput):
class ExitWorkerOutput (line 74) | class ExitWorkerOutput(BaseModel):
method show_output_dataframe (line 115) | def show_output_dataframe(
class AgentOutput (line 131) | class AgentOutput(BaseModel):
method show_output_dataframe (line 150) | def show_output_dataframe(
FILE: premsql/agents/router.py
class SimpleRouterWorker (line 12) | class SimpleRouterWorker(RouterWorkerBase):
method run (line 13) | def run(
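A router worker's job is to decide which downstream worker handles a question. A toy sketch of one such decision rule; the keyword lists and route labels are invented for illustration, and premsql's actual routing logic may differ:

```python
PLOT_HINTS = ("plot", "chart", "graph", "visualise", "visualize")
ANALYSE_HINTS = ("analyse", "analyze", "summarise", "summarize", "insight")

def route(question: str) -> str:
    """Pick a worker route from surface keywords in the question."""
    q = question.lower()
    if any(hint in q for hint in PLOT_HINTS):
        return "plot"
    if any(hint in q for hint in ANALYSE_HINTS):
        return "analyse"
    return "query"  # default: send to the text2sql worker
```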
FILE: premsql/agents/tools/plot/base.py
class BasePlotTool (line 9) | class BasePlotTool(ABC):
method run (line 11) | def run(self, data: pd.DataFrame, plot_config: dict):
method convert_plot_to_image (line 15) | def convert_plot_to_image(self, fig):
method convert_image_to_base64 (line 18) | def convert_image_to_base64(self, image: Image.Image) -> str:
method save_image (line 23) | def save_image(self, image: Image.Image, file_path: str, format: str =...
method plot_from_base64 (line 26) | def plot_from_base64(self, output_base64: str):
FILE: premsql/agents/tools/plot/matplotlib_tool.py
class SimpleMatplotlibTool (line 16) | class SimpleMatplotlibTool(BasePlotTool):
method __init__ (line 17) | def __init__(self):
method run (line 28) | def run(self, data: pd.DataFrame, plot_config: Dict[str, str]) -> Figure:
method _validate_config (line 49) | def _validate_config(self, df: pd.DataFrame, plot_config: Dict[str, st...
method _area_plot (line 66) | def _area_plot(self, df: pd.DataFrame, x: str, y: str, ax: Axes) -> None:
method _bar_plot (line 69) | def _bar_plot(self, df: pd.DataFrame, x: str, y: str, ax: Axes) -> None:
method _scatter_plot (line 72) | def _scatter_plot(self, df: pd.DataFrame, x: str, y: str, ax: Axes) ->...
method _histogram_plot (line 75) | def _histogram_plot(self, df: pd.DataFrame, x: str, y: str, ax: Axes) ...
method _line_plot (line 78) | def _line_plot(self, df: pd.DataFrame, x: str, y: str, ax: Axes) -> None:
method convert_plot_to_image (line 81) | def convert_plot_to_image(self, fig: Figure) -> Image.Image:
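The plot tools above shuttle figures around as base64 strings (`convert_image_to_base64` / `plot_from_base64`); the core mechanic is a bytes ↔ base64 round trip. A dependency-free sketch using a tiny fake PNG payload in place of a real matplotlib figure:

```python
import base64

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"  # standard 8-byte PNG signature

def image_bytes_to_base64(data: bytes) -> str:
    """Encode raw image bytes as an ASCII-safe base64 string."""
    return base64.b64encode(data).decode("ascii")

def base64_to_image_bytes(encoded: str) -> bytes:
    """Invert image_bytes_to_base64, recovering the original bytes."""
    return base64.b64decode(encoded.encode("ascii"))

fake_png = PNG_MAGIC + b"\x00" * 16  # stand-in for real PNG bytes
encoded = image_bytes_to_base64(fake_png)
```

Any base64 string produced this way from PNG bytes starts with `iVBOR`, which is why notebook image outputs always begin with that prefix.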
FILE: premsql/agents/utils.py
function convert_df_to_dict (line 12) | def convert_df_to_dict(df: pd.DataFrame):
function execute_and_render_result (line 16) | def execute_and_render_result(
function _render_error (line 26) | def _render_error(error: str, sql: str, using: str) -> Dict[str, Any]:
function _render_data (line 36) | def _render_data(result, sql: str, using: str) -> Dict[str, Any]:
function convert_exit_output_to_agent_output (line 57) | def convert_exit_output_to_agent_output(exit_output: ExitWorkerOutput) -...
FILE: premsql/cli.py
function cli (line 10) | def cli():
function launch (line 15) | def launch():
function launch_all (line 21) | def launch_all():
function launch_api (line 60) | def launch_api():
function stop (line 88) | def stop():
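The CLI above exposes launch / launch-all / stop style commands. A hedged stdlib sketch of that command layout using argparse subcommands (the real CLI may be built on a different framework; the help strings are illustrative):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Assemble a premsql-like CLI with one subparser per command."""
    parser = argparse.ArgumentParser(prog="premsql")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("launch", help="launch the playground UI")
    sub.add_parser("launch-all", help="launch the UI plus the API backend")
    sub.add_parser("stop", help="stop all running premsql services")
    return parser

args = build_parser().parse_args(["launch-all"])
```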
FILE: premsql/datasets/__init__.py
class Text2SQLDataset (line 12) | class Text2SQLDataset:
method __init__ (line 13) | def __init__(
method raw_dataset (line 40) | def raw_dataset(self):
method filter_availables (line 44) | def filter_availables(self):
method setup_dataset (line 47) | def setup_dataset(
FILE: premsql/datasets/base.py
class Text2SQLBaseInstance (line 32) | class Text2SQLBaseInstance:
method __init__ (line 33) | def __init__(self, dataset: list[dict]) -> None:
method __repr__ (line 42) | def __repr__(self) -> str:
method __len__ (line 45) | def __len__(self) -> int:
method __getitem__ (line 48) | def __getitem__(self, idx: int) -> dict:
method schema_prompt (line 51) | def schema_prompt(self, db_path: str) -> str:
method additional_prompt (line 76) | def additional_prompt(self, prompt: Optional[str] = None):
method add_few_shot_examples (line 79) | def add_few_shot_examples(self, db_id: str, k: int = 3) -> str:
method apply_prompt (line 86) | def apply_prompt(
class SupervisedDatasetForTraining (line 114) | class SupervisedDatasetForTraining(torch.utils.data.Dataset):
method load_from_pth (line 116) | def load_from_pth(cls, dataset_path: Union[str, Path]):
method __init__ (line 130) | def __init__(
method preprocess (line 190) | def preprocess(self, sources: Sequence[str], targets: Sequence[str]):
method __len__ (line 203) | def __len__(self):
method __getitem__ (line 206) | def __getitem__(self, idx: int):
method save_tokenized_dataset (line 216) | def save_tokenized_dataset(self, path_to_save: Union[str, Path]):
class Text2SQLBaseDataset (line 221) | class Text2SQLBaseDataset(ABC):
method __init__ (line 222) | def __init__(
method raw_dataset (line 240) | def raw_dataset(self):
method filter_availables (line 244) | def filter_availables(self):
method setup_dataset (line 248) | def setup_dataset(
method __len__ (line 281) | def __len__(self):
method __getitem__ (line 284) | def __getitem__(self, idx):
class StandardDataset (line 288) | class StandardDataset(Text2SQLBaseDataset):
method __init__ (line 289) | def __init__(
method setup_dataset (line 305) | def setup_dataset(
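`Text2SQLBaseInstance` assembles prompts from a schema block, optional few-shot examples, and the question (`schema_prompt`, `add_few_shot_examples`, `apply_prompt`). A sketch of that assembly with an invented template, not premsql's actual prompt format:

```python
def build_prompt(schema: str, examples: list, question: str, k: int = 3) -> str:
    """Concatenate schema, up to k few-shot examples, and the question."""
    shots = "\n".join(
        f"-- Q: {ex['question']}\n{ex['sql']}" for ex in examples[:k]
    )
    parts = ["-- Database schema", schema]
    if shots:
        parts += ["-- Examples", shots]
    parts += [f"-- Q: {question}", "-- Write the SQL query:"]
    return "\n".join(parts)

prompt = build_prompt(
    schema="CREATE TABLE users (id INTEGER, name TEXT);",
    examples=[{"question": "count users", "sql": "SELECT COUNT(*) FROM users;"}],
    question="list all user names",
)
```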
FILE: premsql/datasets/collator.py
class DataCollatorForSupervisedDataset (line 15) | class DataCollatorForSupervisedDataset:
method __call__ (line 18) | def __call__(self, instances: Sequence[dict]) -> dict[str, torch.Tensor]:
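The collator's `__call__` takes a sequence of per-example dicts and returns batched tensors. The library version presumably pads with the tokenizer's pad id and masks labels; as a rough self-contained sketch of that padding logic (plain Python lists instead of `torch.Tensor`, and `IGNORE_INDEX = -100` assumed from the common Hugging Face convention):

```python
IGNORE_INDEX = -100  # conventional label-masking value in HF-style training loops (assumption)

def collate(instances: list[dict], pad_token_id: int = 0) -> dict:
    # Right-pad variable-length input_ids/labels to the batch maximum,
    # masking padded label positions so they contribute no loss.
    max_len = max(len(x["input_ids"]) for x in instances)
    input_ids, labels, attention_mask = [], [], []
    for x in instances:
        n = len(x["input_ids"])
        input_ids.append(x["input_ids"] + [pad_token_id] * (max_len - n))
        labels.append(x["labels"] + [IGNORE_INDEX] * (max_len - n))
        attention_mask.append([1] * n + [0] * (max_len - n))
    return {"input_ids": input_ids, "labels": labels, "attention_mask": attention_mask}
```

The real `DataCollatorForSupervisedDataset` would stack these into tensors; the padding and label-masking scheme is the transferable part.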
FILE: premsql/datasets/error_dataset.py
class ErrorDatasetInstance (line 20) | class ErrorDatasetInstance(Text2SQLBaseInstance):
method __init__ (line 22) | def __init__(self, dataset: list[dict]) -> None:
method apply_prompt (line 25) | def apply_prompt(self, prompt_template: Optional[str] = ERROR_HANDLING...
class ErrorDatasetGenerator (line 51) | class ErrorDatasetGenerator:
method from_existing (line 53) | def from_existing(
method __init__ (line 77) | def __init__(
method generate_and_save (line 87) | def generate_and_save(
FILE: premsql/datasets/real/bird.py
class BirdDataset (line 12) | class BirdDataset(Text2SQLBaseDataset):
method __init__ (line 13) | def __init__(
method setup_dataset (line 52) | def setup_dataset(
FILE: premsql/datasets/real/domains.py
class DomainsDataset (line 12) | class DomainsDataset(Text2SQLBaseDataset):
method __init__ (line 13) | def __init__(
method setup_dataset (line 52) | def setup_dataset(
FILE: premsql/datasets/real/spider.py
class SpiderUnifiedDataset (line 12) | class SpiderUnifiedDataset(Text2SQLBaseDataset):
method __init__ (line 13) | def __init__(
method setup_dataset (line 52) | def setup_dataset(
FILE: premsql/datasets/synthetic/gretel.py
class GretelAIInstance (line 19) | class GretelAIInstance(Text2SQLBaseInstance):
method __init__ (line 20) | def __init__(self, dataset: list[dict]) -> None:
method apply_prompt (line 23) | def apply_prompt(
class GretelAIDataset (line 47) | class GretelAIDataset(Text2SQLBaseDataset):
method __init__ (line 48) | def __init__(
method setup_dataset (line 86) | def setup_dataset(
FILE: premsql/evaluator/base.py
class Text2SQLEvaluator (line 13) | class Text2SQLEvaluator:
method __init__ (line 14) | def __init__(
method _execute_model (line 20) | def _execute_model(
method execute (line 66) | def execute(
method compute_metric (line 130) | def compute_metric(self, results: list[dict], metric_name: str) -> float:
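`compute_metric` presumably aggregates per-example execution matches into a single score. A minimal sketch of the standard execution-accuracy (EX) computation, assuming each result dict carries a 0/1 `match` field (the field name and percentage scale are guesses, not confirmed from the source):

```python
def compute_metric(results: list[dict], metric_name: str = "accuracy") -> float:
    # Execution accuracy: percentage of examples whose predicted SQL
    # produced the same result set as the gold SQL (match == 1).
    if metric_name != "accuracy":
        raise ValueError(f"unsupported metric: {metric_name}")
    if not results:
        return 0.0
    return 100.0 * sum(r["match"] for r in results) / len(results)
```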
FILE: premsql/executors/base.py
class BaseExecutor (line 6) | class BaseExecutor(ABC):
method execute_sql (line 8) | def execute_sql(self, sql: str, dsn_or_db_path: str) -> dict:
method match_sqls (line 11) | def match_sqls(
method clean_abnormal (line 28) | def clean_abnormal(self, input: list[float]) -> list[float]:
method iterated_execution (line 34) | def iterated_execution(
FILE: premsql/executors/from_langchain.py
class ExecutorUsingLangChain (line 10) | class ExecutorUsingLangChain(BaseExecutor):
method execute_sql (line 12) | def execute_sql(self, sql: str, dsn_or_db_path: Union[str, SQLDatabase...
FILE: premsql/executors/from_sqlite.py
class OptimizedSQLiteExecutor (line 10) | class OptimizedSQLiteExecutor(BaseExecutor):
method __init__ (line 11) | def __init__(self, timeout: float = 1000.0) -> None:
method get_connection (line 16) | def get_connection(self, db_path: str) -> Generator[sqlite3.Connection...
method execute_sql (line 31) | def execute_sql(self, sql: str, dsn_or_db_path: str) -> Dict[str, Any]:
method match_sqls (line 57) | def match_sqls(self, predicted_sql: str, gold_sql: str, dsn_or_db_path...
method iterated_execution (line 71) | def iterated_execution(self, predicted_sql: str, gold_sql: str, dsn_or...
class SQLiteExecutor (line 91) | class SQLiteExecutor(BaseExecutor):
method execute_sql (line 92) | def execute_sql(self, sql: str, dsn_or_db_path: str) -> dict:
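The executor interface above (`execute_sql`, `match_sqls`, `iterated_execution`) follows a common text-to-SQL evaluation pattern: run a query, capture rows/error/latency, and score a prediction by whether its result set matches the gold query's. A self-contained sketch of that pattern against stdlib `sqlite3` — the return-dict keys and exact semantics here are assumptions modeled on the signatures, not the library's actual implementation:

```python
import sqlite3
import time

def execute_sql(sql: str, db_path: str) -> dict:
    # Run one query and report rows, any error message, and wall-clock latency.
    start = time.perf_counter()
    try:
        with sqlite3.connect(db_path) as conn:
            result = conn.execute(sql).fetchall()
        error = None
    except Exception as e:
        result, error = None, str(e)
    return {"result": result, "error": error, "latency": time.perf_counter() - start}

def match_sqls(predicted_sql: str, gold_sql: str, db_path: str) -> int:
    # Execution match: 1 iff both queries succeed and return the same row set
    # (order-insensitive, the usual convention for execution accuracy).
    pred = execute_sql(predicted_sql, db_path)
    gold = execute_sql(gold_sql, db_path)
    if pred["error"] is not None or gold["error"] is not None:
        return 0
    return int(set(pred["result"]) == set(gold["result"]))
```

`iterated_execution` would then call `execute_sql` repeatedly (with `clean_abnormal` dropping outlier latencies) to compare predicted vs. gold runtime, per the `BaseExecutor` signatures.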
FILE: premsql/generators/base.py
class Text2SQLGeneratorBase (line 18) | class Text2SQLGeneratorBase(ABC):
method __init__ (line 19) | def __init__(
method load_client (line 40) | def load_client(self):
method load_tokenizer (line 45) | def load_tokenizer(self):
method model_name_or_path (line 50) | def model_name_or_path(self):
method generate (line 54) | def generate(
method execution_guided_decoding (line 64) | def execution_guided_decoding(
method postprocess (line 98) | def postprocess(self, output_string: str):
method load_results_from_folder (line 117) | def load_results_from_folder(self):
method generate_and_save_results (line 124) | def generate_and_save_results(
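The `execution_guided_decoding` method's name suggests the standard trick of sampling candidate queries and keeping the first one that actually executes against the target database. A sketch under that assumption, with a caller-supplied candidate generator standing in for the library's real generator/executor objects (the function signature here is illustrative, not PremSQL's):

```python
import sqlite3
from itertools import islice

def execution_guided_decoding(generate_candidates, db_path: str, max_retries: int = 3) -> str:
    # Try candidate SQL strings in order; return the first that executes
    # without error, falling back to the last candidate tried otherwise.
    last = ""
    for sql in islice(generate_candidates(), max_retries):
        last = sql
        try:
            with sqlite3.connect(db_path) as conn:
                conn.execute(sql).fetchall()
            return sql
        except Exception:
            continue  # execution failed; move on to the next candidate
    return last
```

In the real class the candidates would come from `self.generate(...)` at increasing temperature, and execution would go through a `BaseExecutor`.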
FILE: premsql/generators/huggingface.py
class Text2SQLGeneratorHF (line 16) | class Text2SQLGeneratorHF(Text2SQLGeneratorBase):
method __init__ (line 17) | def __init__(
method load_client (line 42) | def load_client(self) -> "transformers.PreTrainedModel":
method load_tokenizer (line 56) | def load_tokenizer(self) -> "transformers.PreTrainedTokenizer":
method model_name_or_path (line 66) | def model_name_or_path(self):
method generate (line 69) | def generate(
FILE: premsql/generators/mlx.py
class Text2SQLGeneratorMLX (line 18) | class Text2SQLGeneratorMLX(Text2SQLGeneratorBase):
method __init__ (line 19) | def __init__(
method load_client (line 38) | def load_client(self):
method load_tokenizer (line 44) | def load_tokenizer(self):
method model_name_or_path (line 49) | def model_name_or_path(self):
method generate (line 52) | def generate(
FILE: premsql/generators/ollama_model.py
class Text2SQLGeneratorOllama (line 16) | class Text2SQLGeneratorOllama(Text2SQLGeneratorBase):
method __init__ (line 17) | def __init__(
method load_client (line 34) | def load_client(self):
method load_tokenizer (line 38) | def load_tokenizer(self):
method model_name_or_path (line 42) | def model_name_or_path(self):
method generate (line 45) | def generate(
FILE: premsql/generators/openai.py
class Text2SQLGeneratorOpenAI (line 12) | class Text2SQLGeneratorOpenAI(Text2SQLGeneratorBase):
method __init__ (line 13) | def __init__(
method load_client (line 30) | def load_client(self):
method load_tokenizer (line 35) | def load_tokenizer(self):
method model_name_or_path (line 39) | def model_name_or_path(self):
method generate (line 42) | def generate(
FILE: premsql/generators/premai.py
class Text2SQLGeneratorPremAI (line 12) | class Text2SQLGeneratorPremAI(Text2SQLGeneratorBase):
method __init__ (line 13) | def __init__(
method load_client (line 35) | def load_client(self) -> Prem:
method load_tokenizer (line 39) | def load_tokenizer(self) -> None:
method model_name_or_path (line 43) | def model_name_or_path(self) -> str:
method generate (line 46) | def generate(
FILE: premsql/logger.py
function setup_console_logger (line 4) | def setup_console_logger(name, level=logging.INFO):
FILE: premsql/playground/backend/api/apps.py
class ApiConfig (line 4) | class ApiConfig(AppConfig):
FILE: premsql/playground/backend/api/migrations/0001_initial.py
class Migration (line 7) | class Migration(migrations.Migration):
FILE: premsql/playground/backend/api/models.py
class Session (line 4) | class Session(models.Model):
class Meta (line 12) | class Meta:
class Completions (line 16) | class Completions(models.Model):
class Meta (line 26) | class Meta:
FILE: premsql/playground/backend/api/pydantic_models.py
class SessionCreationRequest (line 11) | class SessionCreationRequest(BaseModel):
class SessionCreationResponse (line 16) | class SessionCreationResponse(BaseModel):
class SessionSummary (line 28) | class SessionSummary(BaseModel):
class SessionListResponse (line 39) | class SessionListResponse(BaseModel):
class SessionDeleteResponse (line 49) | class SessionDeleteResponse(BaseModel):
class CompletionCreationRequest (line 59) | class CompletionCreationRequest(BaseModel):
class CompletionCreationResponse (line 64) | class CompletionCreationResponse(BaseModel):
class CompletionSummary (line 75) | class CompletionSummary(BaseModel):
class CompletionListResponse (line 85) | class CompletionListResponse(BaseModel):
FILE: premsql/playground/backend/api/serializers.py
class AgentOutputSerializer (line 4) | class AgentOutputSerializer(serializers.Serializer):
class SessionCreationRequestSerializer (line 26) | class SessionCreationRequestSerializer(serializers.Serializer):
class SessionCreationResponseSerializer (line 30) | class SessionCreationResponseSerializer(serializers.Serializer):
class SessionSummarySerializer (line 42) | class SessionSummarySerializer(serializers.Serializer):
class SessionListResponseSerializer (line 51) | class SessionListResponseSerializer(serializers.Serializer):
class SessionDeletionResponse (line 61) | class SessionDeletionResponse(serializers.Serializer):
class CompletionCreationRequestSerializer (line 69) | class CompletionCreationRequestSerializer(serializers.Serializer):
class CompletionCreationResponseSerializer (line 74) | class CompletionCreationResponseSerializer(serializers.Serializer):
class CompletionSummarySerializer (line 85) | class CompletionSummarySerializer(serializers.Serializer):
class CompletionListResponseSerializer (line 93) | class CompletionListResponseSerializer(serializers.Serializer):
function create_model_serializer (line 102) | def create_model_serializer(model_class):
FILE: premsql/playground/backend/api/services.py
class SessionManageService (line 34) | class SessionManageService:
method __init__ (line 35) | def __init__(self) -> None:
method create_session (line 38) | def create_session(
method get_session (line 75) | def get_session(self, session_name: str) -> Optional[Session]:
method list_session (line 81) | def list_session(self, page: int, page_size: int = 20) -> SessionListR...
method delete_session (line 116) | def delete_session(self, session_name: str):
class CompletionService (line 160) | class CompletionService:
method __init__ (line 161) | def __init__(self) -> None:
method completion (line 164) | def completion(
method chat_history (line 225) | def chat_history(
FILE: premsql/playground/backend/api/utils.py
function stop_server_on_port (line 11) | def stop_server_on_port(port: int):
FILE: premsql/playground/backend/api/views.py
function create_session (line 42) | def create_session(request):
function get_session (line 76) | def get_session(request, session_name):
function list_sessions (line 125) | def list_sessions(request):
function delete_session (line 161) | def delete_session(request, session_name):
function create_completion (line 186) | def create_completion(request):
function get_chat_history (line 243) | def get_chat_history(request, session_name):
FILE: premsql/playground/backend/backend_client.py
class BackendAPIClient (line 18) | class BackendAPIClient:
method __init__ (line 19) | def __init__(self):
method create_session (line 26) | def create_session(self, request: SessionCreationRequest) -> SessionCr...
method list_sessions (line 70) | def list_sessions(self, page: int = 1, page_size: int = 20) -> Session...
method get_session (line 100) | def get_session(self, session_name: str) -> SessionListResponse:
method delete_session (line 131) | def delete_session(self, session_name: str) -> SessionDeleteResponse:
method create_completion (line 157) | def create_completion(self, request: CompletionCreationRequest) -> Com...
method get_chat_history (line 185) | def get_chat_history(self, session_name: str, page: int = 1, page_size...
FILE: premsql/playground/backend/manage.py
function main (line 7) | def main():
FILE: premsql/playground/frontend/components/chat.py
class ChatComponent (line 14) | class ChatComponent:
method __init__ (line 15) | def __init__(self) -> None:
method _streamlit_chat_output (line 20) | def _streamlit_chat_output(self, message: AgentOutput | ExitWorkerOutp...
method render_chat_env (line 55) | def render_chat_env(self, session_name: str) -> None:
FILE: premsql/playground/frontend/components/session.py
class SessionComponent (line 13) | class SessionComponent:
method __init__ (line 14) | def __init__(self) -> None:
method render_list_sessions (line 17) | def render_list_sessions(self):
method render_register_session (line 30) | def render_register_session(self):
method render_additional_links (line 54) | def render_additional_links(self):
method render_delete_session_view (line 60) | def render_delete_session_view(self):
FILE: premsql/playground/frontend/components/streamlit_plot.py
class StreamlitPlotTool (line 10) | class StreamlitPlotTool(BasePlotTool):
method __init__ (line 11) | def __init__(self):
method run (line 20) | def run(self, data: pd.DataFrame, plot_config: Dict[str, str]) -> Any:
method _validate_config (line 38) | def _validate_config(self, df: pd.DataFrame, plot_config: Dict[str, st...
method _area_plot (line 65) | def _area_plot(self, df: pd.DataFrame, x: str, y: str) -> Any:
method _bar_plot (line 69) | def _bar_plot(self, df: pd.DataFrame, x: str, y: str) -> Any:
method _scatter_plot (line 73) | def _scatter_plot(self, df: pd.DataFrame, x: str, y: str) -> Any:
method _histogram_plot (line 77) | def _histogram_plot(self, df: pd.DataFrame, x: str, y: str) -> Any:
method _line_plot (line 83) | def _line_plot(self, df: pd.DataFrame, x: str, y: str) -> Any:
method convert_plot_to_image (line 87) | def convert_plot_to_image(self, fig):
FILE: premsql/playground/frontend/components/uploader.py
function render_starter_code (line 127) | def render_starter_code(session_name, db_path):
class UploadComponent (line 179) | class UploadComponent:
method render_kaggle_view (line 181) | def render_kaggle_view() -> Tuple[Optional[str], Optional[Path]]:
method render_csv_upload_view (line 217) | def render_csv_upload_view() -> Tuple[Optional[str], Optional[Path]]:
FILE: premsql/playground/frontend/main.py
function render_main_view (line 8) | def render_main_view():
function main (line 25) | def main():
FILE: premsql/playground/frontend/utils.py
function _is_valid_kaggle_id (line 12) | def _is_valid_kaggle_id(kaggle_id: str) -> bool:
function download_from_kaggle (line 16) | def download_from_kaggle(kaggle_dataset_id: str):
function _migrate_to_sqlite (line 20) | def _migrate_to_sqlite(csv_folder: Path, sqlite_db_path: Path) -> Path:
function migrate_from_csv_to_sqlite (line 38) | def migrate_from_csv_to_sqlite(
function migrate_local_csvs_to_sqlite (line 47) | def migrate_local_csvs_to_sqlite(
FILE: premsql/playground/inference_server/api_client.py
class InferenceServerAPIError (line 7) | class InferenceServerAPIError(Exception):
class InferenceServerAPIClient (line 11) | class InferenceServerAPIClient:
method __init__ (line 12) | def __init__(self, timeout: int = 600) -> None:
method _make_request (line 19) | def _make_request(
method is_online (line 36) | def is_online(self, base_url: str) -> bool:
method post_completion (line 44) | def post_completion(self, base_url: str, question: str) -> Dict[str, A...
method get_session_info (line 51) | def get_session_info(self, base_url: str) -> Dict[str, Any]:
method get_chat_history (line 55) | def get_chat_history(self, base_url: str, message_id: int) -> Dict[str...
method delete_session (line 61) | def delete_session(self, base_url: str) -> Dict[str, Any]:
FILE: premsql/playground/inference_server/service.py
class QuestionInput (line 17) | class QuestionInput(BaseModel):
class SessionInfoResponse (line 21) | class SessionInfoResponse(BaseModel):
class ChatHistoryResponse (line 30) | class ChatHistoryResponse(BaseModel):
class CompletionResponse (line 35) | class CompletionResponse(BaseModel):
class AgentServer (line 40) | class AgentServer:
method __init__ (line 41) | def __init__(
method lifespan (line 53) | async def lifespan(self, app: FastAPI):
method create_app (line 62) | def create_app(self):
method launch (line 158) | def launch(self):
FILE: premsql/tuner/callback.py
class Text2SQLEvaluationCallback (line 24) | class Text2SQLEvaluationCallback(TrainerCallback):
method __init__ (line 25) | def __init__(
method on_step_end (line 53) | def on_step_end(
method on_train_end (line 111) | def on_train_end(
FILE: premsql/tuner/config.py
class DefaultTrainingArguments (line 15) | class DefaultTrainingArguments(TrainingArguments):
class DefaultPeftArguments (line 50) | class DefaultPeftArguments(TrainingArguments):
class DefaultLoraConfig (line 78) | class DefaultLoraConfig(LoraConfig):
FILE: premsql/tuner/full.py
class Text2SQLFullFinetuner (line 15) | class Text2SQLFullFinetuner:
method __init__ (line 16) | def __init__(
method train (line 37) | def train(
FILE: premsql/tuner/peft.py
class Text2SQLPeftTuner (line 23) | class Text2SQLPeftTuner:
method __init__ (line 24) | def __init__(
method train (line 54) | def train(
FILE: premsql/utils.py
function convert_sqlite_path_to_dsn (line 22) | def convert_sqlite_path_to_dsn(path: str):
function convert_sqlite_dsn_to_path (line 29) | def convert_sqlite_dsn_to_path(dsn: str) -> str:
function print_data (line 37) | def print_data(data: dict):
function save_to_json (line 52) | def save_to_json(save_path: Union[str, Path], json_object: dict):
function load_from_json (line 62) | def load_from_json(result_json_path: str) -> dict:
function sqlite_schema_prompt (line 70) | def sqlite_schema_prompt(db_path: str) -> str:
function get_random_few_shot_prompts (line 96) | def get_random_few_shot_prompts(dataset: list[dict], num_few_shot: int):
function get_accepted_filters (line 125) | def get_accepted_filters(data: list[dict]) -> Sequence[str]:
function filter_options (line 137) | def filter_options(
function tokenize_fn (line 159) | def tokenize_fn(strings: Sequence[str], tokenizer: "PreTrainedTokenizer"...
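The two DSN helpers in `premsql/utils.py` most likely wrap and unwrap a `sqlite:///` URI prefix around a filesystem path; a sketch under that assumption (the exact prefix and edge-case handling are guesses from the function names):

```python
SQLITE_SCHEME = "sqlite:///"  # assumed SQLAlchemy-style scheme prefix

def convert_sqlite_path_to_dsn(path: str) -> str:
    # Prefix a plain filesystem path with the sqlite scheme; leave DSNs untouched.
    return path if path.startswith(SQLITE_SCHEME) else f"{SQLITE_SCHEME}{path}"

def convert_sqlite_dsn_to_path(dsn: str) -> str:
    # Strip the scheme back off to recover the plain path.
    return dsn[len(SQLITE_SCHEME):] if dsn.startswith(SQLITE_SCHEME) else dsn
```

The two are intended to round-trip, so callers can pass either form.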
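`sqlite_schema_prompt` is the usual way text-to-SQL pipelines expose a database's layout to the model: collect the stored `CREATE TABLE` statements from `sqlite_master` into one prompt string. A self-contained sketch of that approach (the real helper may add table descriptions or column formatting beyond this):

```python
import sqlite3

def sqlite_schema_prompt(db_path: str) -> str:
    # Concatenate the database's stored CREATE TABLE statements, which
    # SQLite keeps verbatim in the sqlite_master catalog table.
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute(
            "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
        ).fetchall()
    return "\n\n".join(sql for (sql,) in rows)
```

`Text2SQLBaseInstance.schema_prompt` would then splice this string into the base text-to-SQL prompt template alongside the question and few-shot examples.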
Condensed preview — 90 files, each showing path, character count, and a content snippet. Download the .json file or copy for the full structured content (645K chars).
[
{
"path": ".gitignore",
"chars": 404,
"preview": "data\nexperiments\noutput\ntest.py\nexps\n\n\n# Python specific\n*.pyc\n*.pyo\n__pycache__/\n\n# Virtual environments\nvenv/\nenv/\nenv"
},
{
"path": "README.md",
"chars": 20491,
"preview": "# PremSQL | Easy to use fully local RAG on Databases\n\n[](https:"
},
{
"path": "examples/agent_server.ipynb",
"chars": 5212,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"code\",\n \"execution_count\": 1,\n \"metadata\": {},\n \"outputs\": [\n {\n \"name\":"
},
{
"path": "examples/agents.ipynb",
"chars": 61280,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"code\",\n \"execution_count\": 1,\n \"metadata\": {},\n \"outputs\": [\n {\n \"name\":"
},
{
"path": "examples/datasets.ipynb",
"chars": 33128,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"code\",\n \"execution_count\": 1,\n \"metadata\": {},\n \"outputs\": [\n {\n \"name\":"
},
{
"path": "examples/error_dataset.ipynb",
"chars": 37038,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"code\",\n \"execution_count\": 5,\n \"metadata\": {},\n \"outputs\": [\n {\n \"name\":"
},
{
"path": "examples/evaluation.ipynb",
"chars": 103744,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"code\",\n \"execution_count\": 2,\n \"id\": \"30e64251-c3f2-473b-a76f-10bc4a645e93\",\n \""
},
{
"path": "examples/finetuning.ipynb",
"chars": 15890,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"code\",\n \"execution_count\": 1,\n \"metadata\": {},\n \"outputs\": [\n {\n \"name\":"
},
{
"path": "examples/generators.ipynb",
"chars": 57232,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"code\",\n \"execution_count\": 1,\n \"metadata\": {},\n \"outputs\": [\n {\n \"name\":"
},
{
"path": "examples/lora_tuning.py",
"chars": 1689,
"preview": "from premsql.datasets import (\n BirdDataset,\n DomainsDataset,\n GretelAIDataset,\n SpiderUnifiedDataset,\n T"
},
{
"path": "examples/simple_pipeline.ipynb",
"chars": 26431,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"code\",\n \"execution_count\": 3,\n \"metadata\": {},\n \"outputs\": [\n {\n \"name\":"
},
{
"path": "premsql/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "premsql/agents/__init__.py",
"chars": 166,
"preview": "from premsql.agents.baseline.main import BaseLineAgent\nfrom premsql.agents.memory import AgentInteractionMemory\n\n__all__"
},
{
"path": "premsql/agents/base.py",
"chars": 6058,
"preview": "from abc import ABC, abstractmethod\nfrom typing import Optional, Union\n\nimport pandas as pd\n\nfrom premsql.executors.base"
},
{
"path": "premsql/agents/baseline/__init__.py",
"chars": 287,
"preview": "from premsql.agents.baseline.workers import (\n BaseLineAnalyserWorker,\n BaseLineFollowupWorker,\n BaseLinePlotWo"
},
{
"path": "premsql/agents/baseline/main.py",
"chars": 8602,
"preview": "from typing import Any, Optional\n\nimport pandas as pd\n\nfrom premsql.executors.base import BaseExecutor\nfrom premsql.gene"
},
{
"path": "premsql/agents/baseline/prompts.py",
"chars": 12163,
"preview": "# --------------------------------- table selection --------------------------------- #\n\nBASELINE_TEXT2SQL_TABLE_SELECTI"
},
{
"path": "premsql/agents/baseline/workers/__init__.py",
"chars": 430,
"preview": "from premsql.agents.baseline.workers.analyser import BaseLineAnalyserWorker\nfrom premsql.agents.baseline.workers.followu"
},
{
"path": "premsql/agents/baseline/workers/analyser.py",
"chars": 7556,
"preview": "from typing import Optional\n\nimport pandas as pd\nfrom tqdm.auto import tqdm\n\nfrom premsql.generators.base import Text2SQ"
},
{
"path": "premsql/agents/baseline/workers/followup.py",
"chars": 3794,
"preview": "from typing import Optional\n\nimport pandas as pd\n\nfrom premsql.generators.base import Text2SQLGeneratorBase\nfrom premsql"
},
{
"path": "premsql/agents/baseline/workers/plotter.py",
"chars": 3098,
"preview": "from typing import Optional\n\nimport pandas as pd\n\nfrom premsql.generators.base import Text2SQLGeneratorBase\nfrom premsql"
},
{
"path": "premsql/agents/baseline/workers/text2sql.py",
"chars": 9712,
"preview": "from textwrap import dedent\nfrom typing import Literal, Optional\n\nfrom premsql.executors.base import BaseExecutor\nfrom p"
},
{
"path": "premsql/agents/memory.py",
"chars": 9657,
"preview": "import os\nimport tempfile\nimport sqlite3\nfrom platformdirs import user_cache_dir\nfrom typing import List, Literal, Optio"
},
{
"path": "premsql/agents/models.py",
"chars": 5172,
"preview": "from datetime import datetime\nfrom typing import Dict, Literal, Optional\n\nimport pandas as pd\nfrom pydantic import BaseM"
},
{
"path": "premsql/agents/router.py",
"chars": 1244,
"preview": "from typing import Optional\n\nimport pandas as pd\n\nfrom premsql.logger import setup_console_logger\nfrom premsql.agents.ba"
},
{
"path": "premsql/agents/tools/__init__.py",
"chars": 111,
"preview": "from premsql.agents.tools.plot.matplotlib_tool import SimpleMatplotlibTool\n\n__all__ = [\"SimpleMatplotlibTool\"]\n"
},
{
"path": "premsql/agents/tools/plot/base.py",
"chars": 830,
"preview": "import base64\nimport io\nfrom abc import ABC, abstractmethod\n\nimport pandas as pd\nfrom PIL import Image\n\n\nclass BasePlotT"
},
{
"path": "premsql/agents/tools/plot/matplotlib_tool.py",
"chars": 2969,
"preview": "import io\nfrom typing import Callable, Dict\n\nimport matplotlib.pyplot as plt\nimport pandas as pd\nfrom matplotlib.axes im"
},
{
"path": "premsql/agents/utils.py",
"chars": 3153,
"preview": "from typing import Any, Dict, Literal\n\nimport pandas as pd\n\nfrom premsql.executors.from_langchain import SQLDatabase\nfro"
},
{
"path": "premsql/cli.py",
"chars": 3737,
"preview": "import os\nimport subprocess\nimport sys\nfrom pathlib import Path\n\nimport click\n\n@click.group()\n@click.version_option()\nde"
},
{
"path": "premsql/datasets/__init__.py",
"chars": 2248,
"preview": "from pathlib import Path\nfrom typing import Optional, Union\n\nfrom premsql.datasets.base import StandardDataset, Text2SQL"
},
{
"path": "premsql/datasets/base.py",
"chars": 10921,
"preview": "import json\nimport os\nimport sqlite3\nfrom abc import ABC, abstractmethod\nfrom copy import deepcopy\nfrom pathlib import P"
},
{
"path": "premsql/datasets/collator.py",
"chars": 1125,
"preview": "from dataclasses import dataclass\nfrom typing import Sequence\nfrom premsql.logger import setup_console_logger\n\nlogger = "
},
{
"path": "premsql/datasets/error_dataset.py",
"chars": 4758,
"preview": "import json\nfrom pathlib import Path\nfrom typing import Optional, Sequence\n\nfrom tqdm.auto import tqdm\n\nfrom premsql.dat"
},
{
"path": "premsql/datasets/real/bird.py",
"chars": 2232,
"preview": "from pathlib import Path\nfrom typing import Optional, Union\n\nfrom huggingface_hub import snapshot_download\n\nfrom premsql"
},
{
"path": "premsql/datasets/real/domains.py",
"chars": 2304,
"preview": "from pathlib import Path\nfrom typing import Optional, Union\n\nfrom huggingface_hub import snapshot_download\n\nfrom premsql"
},
{
"path": "premsql/datasets/real/spider.py",
"chars": 2298,
"preview": "from pathlib import Path\nfrom typing import Optional, Union\n\nfrom huggingface_hub import snapshot_download\n\nfrom premsql"
},
{
"path": "premsql/datasets/synthetic/gretel.py",
"chars": 3800,
"preview": "from pathlib import Path\nfrom typing import Optional, Union\n\nfrom datasets import load_dataset\nfrom tqdm.auto import tqd"
},
{
"path": "premsql/evaluator/README.md",
"chars": 3796,
"preview": "## Evaluators \n\npremsql evaluators help you to evaluate your text-to-sql models on various validation datasets. \nCurrent"
},
{
"path": "premsql/evaluator/__init__.py",
"chars": 86,
"preview": "from premsql.evaluator.base import Text2SQLEvaluator\n\n__all__ = [\"Text2SQLEvaluator\"]\n"
},
{
"path": "premsql/evaluator/base.py",
"chars": 4902,
"preview": "import math\nimport traceback\nfrom pathlib import Path\nfrom typing import Optional, Union\n\nfrom func_timeout import Funct"
},
{
"path": "premsql/executors/__init__.py",
"chars": 233,
"preview": "from premsql.executors.from_langchain import ExecutorUsingLangChain\nfrom premsql.executors.from_sqlite import SQLiteExec"
},
{
"path": "premsql/executors/base.py",
"chars": 2090,
"preview": "from abc import ABC, abstractmethod\n\nimport numpy as np\n\n\nclass BaseExecutor(ABC):\n @abstractmethod\n def execute_s"
},
{
"path": "premsql/executors/from_langchain.py",
"chars": 947,
"preview": "import time\nfrom typing import Union\n\nfrom langchain_community.utilities.sql_database import SQLDatabase\n\nfrom premsql.e"
},
{
"path": "premsql/executors/from_sqlite.py",
"chars": 4355,
"preview": "import sqlite3\nimport time\n\nfrom contextlib import contextmanager\nfrom typing import Any, Dict, Generator \nfrom premsql."
},
{
"path": "premsql/generators/__init__.py",
"chars": 451,
"preview": "from premsql.generators.huggingface import Text2SQLGeneratorHF\nfrom premsql.generators.openai import Text2SQLGeneratorOp"
},
{
"path": "premsql/generators/base.py",
"chars": 5454,
"preview": "import json\nimport re\nfrom abc import ABC, abstractmethod\nfrom pathlib import Path\nfrom typing import Optional\n\nimport s"
},
{
"path": "premsql/generators/huggingface.py",
"chars": 3606,
"preview": "import os\nfrom typing import Optional, Union\n\nfrom premsql.generators.base import Text2SQLGeneratorBase\nfrom premsql.log"
},
{
"path": "premsql/generators/mlx.py",
"chars": 2062,
"preview": "import os\nfrom typing import Optional\n\nfrom premsql.generators.base import Text2SQLGeneratorBase\nfrom premsql.logger imp"
},
{
"path": "premsql/generators/ollama_model.py",
"chars": 1781,
"preview": "from typing import Optional\n\nfrom premsql.generators.base import Text2SQLGeneratorBase\nfrom premsql.logger import setup_"
},
{
"path": "premsql/generators/openai.py",
"chars": 1791,
"preview": "import os\nfrom typing import Optional\n\nfrom premsql.generators.base import Text2SQLGeneratorBase\n\ntry:\n from openai i"
},
{
"path": "premsql/generators/premai.py",
"chars": 1931,
"preview": "import os\nfrom typing import Optional\n\nfrom premai import Prem\n\nfrom premsql.generators.base import Text2SQLGeneratorBas"
},
{
"path": "premsql/logger.py",
"chars": 433,
"preview": "import logging\n\n\ndef setup_console_logger(name, level=logging.INFO):\n \"\"\"Function to setup a console logger.\"\"\"\n f"
},
{
"path": "premsql/playground/__init__.py",
"chars": 297,
"preview": "from premsql.playground.backend.backend_client import BackendAPIClient\nfrom premsql.playground.inference_server.api_clie"
},
{
"path": "premsql/playground/backend/api/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "premsql/playground/backend/api/admin.py",
"chars": 138,
"preview": "from django.contrib import admin\n\nfrom .models import Completions, Session\n\nadmin.site.register(Session)\nadmin.site.regi"
},
{
"path": "premsql/playground/backend/api/apps.py",
"chars": 138,
"preview": "from django.apps import AppConfig\n\n\nclass ApiConfig(AppConfig):\n default_auto_field = \"django.db.models.BigAutoField\""
},
{
"path": "premsql/playground/backend/api/migrations/0001_initial.py",
"chars": 1728,
"preview": "# Generated by Django 5.1.2 on 2024-10-26 09:06\n\nimport django.db.models.deletion\nfrom django.db import migrations, mode"
},
{
"path": "premsql/playground/backend/api/migrations/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "premsql/playground/backend/api/models.py",
"chars": 908,
"preview": "from django.db import models\n\n\nclass Session(models.Model):\n session_id = models.AutoField(primary_key=True)\n db_c"
},
{
"path": "premsql/playground/backend/api/pydantic_models.py",
"chars": 2362,
"preview": "from datetime import datetime\nfrom typing import List, Literal, Optional\n\nfrom pydantic import BaseModel, ConfigDict, Fi"
},
{
"path": "premsql/playground/backend/api/serializers.py",
"chars": 4457,
"preview": "from rest_framework import serializers\n\n\nclass AgentOutputSerializer(serializers.Serializer):\n session_name = seriali"
},
{
"path": "premsql/playground/backend/api/services.py",
"chars": 10433,
"preview": "import subprocess\nfrom typing import Optional\n\nimport requests\nfrom api.models import Completions, Session\nfrom api.pyda"
},
{
"path": "premsql/playground/backend/api/tests.py",
"chars": 60,
"preview": "from django.test import TestCase\n\n# Create your tests here.\n"
},
{
"path": "premsql/playground/backend/api/urls.py",
"chars": 571,
"preview": "from django.urls import path\n\nfrom . import views\n\nurlpatterns = [\n path(\"session/list/\", views.list_sessions, name=\""
},
{
"path": "premsql/playground/backend/api/utils.py",
"chars": 799,
"preview": "import logging\nimport os\nimport signal\nimport subprocess\n\nfrom premsql.logger import setup_console_logger\n\nlogger = setu"
},
{
"path": "premsql/playground/backend/api/views.py",
"chars": 7920,
"preview": "import json\n\nfrom drf_yasg import openapi\nfrom drf_yasg.utils import swagger_auto_schema\nfrom rest_framework import stat"
},
{
"path": "premsql/playground/backend/backend/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "premsql/playground/backend/backend/asgi.py",
"chars": 391,
"preview": "\"\"\"\nASGI config for backend project.\n\nIt exposes the ASGI callable as a module-level variable named ``application``.\n\nFo"
},
{
"path": "premsql/playground/backend/backend/settings.py",
"chars": 3273,
"preview": "\"\"\"\nDjango settings for backend project.\n\nGenerated by 'django-admin startproject' using Django 5.1.2.\n\nFor more informa"
},
{
"path": "premsql/playground/backend/backend/urls.py",
"chars": 965,
"preview": "from django.contrib import admin\nfrom django.urls import include, path\nfrom drf_yasg import openapi\nfrom drf_yasg.views "
},
{
"path": "premsql/playground/backend/backend/wsgi.py",
"chars": 391,
"preview": "\"\"\"\nWSGI config for backend project.\n\nIt exposes the WSGI callable as a module-level variable named ``application``.\n\nFo"
},
{
"path": "premsql/playground/backend/backend_client.py",
"chars": 9568,
"preview": "import requests\nfrom premsql.logger import setup_console_logger\nfrom premsql.playground.backend.api.pydantic_models impo"
},
{
"path": "premsql/playground/backend/manage.py",
"chars": 1059,
"preview": "#!/usr/bin/env python\n\"\"\"Django's command-line utility for administrative tasks.\"\"\"\nimport os\nimport sys\n\n\ndef main():\n "
},
{
"path": "premsql/playground/frontend/components/chat.py",
"chars": 4586,
"preview": "import pandas as pd \nimport streamlit as st\nfrom premsql.playground.backend.backend_client import BackendAPIClient\nfrom "
},
{
"path": "premsql/playground/frontend/components/session.py",
"chars": 3246,
"preview": "import streamlit as st\nfrom premsql.playground.backend.backend_client import BackendAPIClient\nfrom premsql.playground.ba"
},
{
"path": "premsql/playground/frontend/components/streamlit_plot.py",
"chars": 3778,
"preview": "import traceback\nfrom typing import Dict, Any\nimport pandas as pd\nimport streamlit as st \nfrom premsql.logger import set"
},
{
"path": "premsql/playground/frontend/components/uploader.py",
"chars": 9645,
"preview": "import random\nimport streamlit as st\nfrom typing import Tuple, Optional\nfrom pathlib import Path\nfrom premsql.playground"
},
{
"path": "premsql/playground/frontend/main.py",
"chars": 2044,
"preview": "import streamlit as st\nfrom premsql.playground.frontend.components.chat import ChatComponent \nfrom premsql.playground.fr"
},
{
"path": "premsql/playground/frontend/utils.py",
"chars": 2316,
"preview": "import re \nimport os \nimport pandas as pd \nimport kagglehub\nimport sqlite3\nfrom pathlib import Path\nfrom platformdirs im"
},
{
"path": "premsql/playground/inference_server/api_client.py",
"chars": 2155,
"preview": "from typing import Any, Dict, Optional\nfrom urllib.parse import urljoin\n\nimport requests\n\n\nclass InferenceServerAPIError"
},
{
"path": "premsql/playground/inference_server/service.py",
"chars": 5618,
"preview": "import traceback\n\nfrom contextlib import asynccontextmanager\nfrom datetime import datetime\nfrom typing import Optional\n\n"
},
{
"path": "premsql/prompts.py",
"chars": 1813,
"preview": "BASE_TEXT2SQL_PROMPT = \"\"\"\n# Follow these instruction:\nYou will be given schemas of tables of a database. Your job is to"
},
{
"path": "premsql/tuner/__init__.py",
"chars": 436,
"preview": "from premsql.tuner.callback import Text2SQLEvaluationCallback\nfrom premsql.tuner.config import (\n DefaultLoraConfig,\n"
},
{
"path": "premsql/tuner/callback.py",
"chars": 4118,
"preview": "import os\nfrom typing import Optional\n\nfrom premsql.datasets.base import Text2SQLBaseDataset\nfrom premsql.evaluator.base"
},
{
"path": "premsql/tuner/config.py",
"chars": 3288,
"preview": "from dataclasses import dataclass, field\nfrom typing import List, Optional\nfrom premsql.logger import setup_console_logg"
},
{
"path": "premsql/tuner/full.py",
"chars": 3100,
"preview": "from typing import Optional, Sequence\n\nimport transformers\n\nfrom premsql.datasets.base import Text2SQLBaseDataset\nfrom p"
},
{
"path": "premsql/tuner/peft.py",
"chars": 4017,
"preview": "from typing import Optional, Sequence\n\nfrom premsql.datasets.base import Text2SQLBaseDataset\nfrom premsql.datasets.colla"
},
{
"path": "premsql/utils.py",
"chars": 5582,
"preview": "import json\nimport os\nimport random\nimport re\nimport sqlite3\nfrom collections import defaultdict\nfrom pathlib import Pat"
},
{
"path": "pyproject.toml",
"chars": 1169,
"preview": "[tool.poetry]\nname = \"premsql\"\nversion = \"0.2.10\"\ndescription = \"\"\nauthors = [\"Anindyadeep <proanindyadeep@gmail.com>\"]\n"
}
]