Full Code of KxSystems/kdbai-samples for AI

Repository: KxSystems/kdbai-samples
Branch: main
Commit: ad492f122018
Files: 71
Total size: 143.4 MB

Directory structure:
gitextract_79vuf6ne/

├── .gitignore
├── HuggingFace_search/
│   └── huggingface_inference.ipynb
├── KDB.AI_course/
│   ├── README.md
│   ├── course_specific_content/
│   │   ├── making_queries.ipynb
│   │   ├── managing_tables.ipynb
│   │   └── rag_example.ipynb
│   └── notebook_references.md
├── LICENSE
├── LlamaIndex_advanced_RAG/
│   └── KDBAI_Advanced_RAG_Demo.ipynb
├── LlamaIndex_samples/
│   ├── Hybrid_Search_LlamaIndex_KDBAI.ipynb
│   ├── Multimodal_RAG_LLamaIndex_CLIP_KDBAI.ipynb
│   └── Sub_Question_Query_Engine_LlamaIndex_KDBAI.ipynb
├── LlamaParse_pdf_RAG/
│   └── llamaParse_demo.ipynb
├── README.md
├── TSS_non_transformed/
│   ├── Non_Transformed_TSS_Technical_Analysis.ipynb
│   ├── Temporal_Similarity_Search_KDB+.ipynb
│   ├── Temporal_Similarity_Search_Non-Transformed_Demo.ipynb
│   ├── createHDB.q
│   └── data/
│       └── marketTrades.parquet
├── TSS_transformed/
│   ├── Temporal_Similarity_Search_Transformed_Demo.ipynb
│   ├── Transformed_TSS_pattern_matching.ipynb
│   └── data/
│       └── marketTrades.parquet
├── document_search/
│   └── document_search.ipynb
├── fuzzy_filtering_on_metadata/
│   └── fuzzy_filtering_demo.ipynb
├── hybrid_search/
│   ├── data/
│   │   └── inflation.txt
│   └── hybrid_search_inflation.ipynb
├── image_search/
│   └── image_search.ipynb
├── metadata_filtering/
│   ├── data/
│   │   └── filtered_embedded_movies.pkl
│   └── metadata_filtering_demo.ipynb
├── multi_index_multimodal_search/
│   ├── data/
│   │   ├── bat1.txt
│   │   ├── bat2.txt
│   │   ├── bear1.txt
│   │   ├── bear2.txt
│   │   ├── caterpillar1.txt
│   │   ├── caterpillar2.txt
│   │   ├── deer1.txt
│   │   ├── deer2.txt
│   │   ├── fox1.txt
│   │   ├── fox2.txt
│   │   ├── hedgehog1.txt
│   │   └── hedgehog2.txt
│   └── multi_index_multimodal_search.ipynb
├── multimodal_RAG_VoyageAI/
│   ├── Multimodal_RAG_VoyageAI.ipynb
│   └── data/
│       └── text/
│           ├── bat.txt
│           ├── bear.txt
│           ├── caterpillar.txt
│           ├── deer.txt
│           ├── fox.txt
│           └── hedgehog.txt
├── multimodal_RAG_unified_text/
│   ├── data/
│   │   └── text/
│   │       ├── bat.txt
│   │       ├── bear.txt
│   │       ├── caterpillar.txt
│   │       ├── deer.txt
│   │       ├── fox.txt
│   │       └── hedgehog.txt
│   └── multi_modal_demo.ipynb
├── music_recommendation/
│   ├── data/
│   │   └── song_data.csv
│   └── music_recommendation.ipynb
├── pattern_matching/
│   └── pattern_matching.ipynb
├── qFlat_index_pdf_search/
│   └── pdf_qFlat_Search.ipynb
├── qHnsw_index_pdf_search/
│   └── pdf_qHNSW_Search.ipynb
├── quickstarts/
│   └── python_quickstart.ipynb
├── requirements.txt
├── retrieval_augmented_generation/
│   ├── data/
│   │   └── state_of_the_union.txt
│   ├── retrieval_augmented_generation.ipynb
│   └── retrieval_augmented_generation_evaluation.ipynb
├── sentiment_analysis/
│   ├── data/
│   │   └── disneyland_reviews.csv
│   └── sentiment_analysis.ipynb
├── unstructured_io_RAG/
│   └── Table_RAG_Unstructured_KDBAI_LangChain_RAG.ipynb
└── video_RAG/
    ├── video_RAG_TwelveLabs.ipynb
    └── video_RAG_VoyageAI.ipynb

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
*.ipynb_checkpoints
.venv/
.DS_Store


================================================
FILE: HuggingFace_search/huggingface_inference.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "bb2094b8-13a5-4f7c-bd21-d2c709dab914",
   "metadata": {
    "id": "bb2094b8-13a5-4f7c-bd21-d2c709dab914"
   },
   "source": [
    "# Using Hugging Face Inference with KDB.AI to Create a AI Tool Search Engine\n",
    "\n",
    "##### Note: This example requires a KDB.AI endpoint and API key. Sign up for a free [KDB.AI account](https://kdb.ai/get-started).\n",
    "\n",
    "How to get started with using the Huggingface Inference API with KDB.AI.\n",
    "\n",
    "You will learn how to:\n",
    "\n",
    "1. Connect to KDB.AI\n",
    "2. Create a KDB.AI Database & Table\n",
    "3. Load Data\n",
    "4. Use the Sentence Transformers library to embed every description in the dataset\n",
    "5. Insert the data into our KDB.AI table\n",
    "6. Perform Similarity Search using the Huggingface Inference API\n",
    "7. Delete the KDB.AI Database & Table to Conserve Resources"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "nZHRcTHI9bZG",
   "metadata": {
    "id": "nZHRcTHI9bZG"
   },
   "source": [
    "# Why Use Hugging Face for Embeddings?\n",
    "\n",
    "When building production applications that utilize embeddings, it's often advantageous to use open-source embedding models for several reasons:\n",
    "\n",
    "1. **Control**: Open-source models give developers more control over the embeddings process, reducing dependence on third-party embedding providers.\n",
    "\n",
    "2. **Local Embedding**: With open-source models, you can create embeddings locally, which is particularly useful for embedding your dataset.\n",
    "\n",
    "A common approach is to use a Python framework like sentence-transformers, developed by Hugging Face, which offers state-of-the-art sentence, text, and image embeddings. Here's a typical workflow:\n",
    "\n",
    "1. **Embed your dataset locally**: Use a library like Sentence Transformers to embed your dataset, which might consist of AI tools and associated metadata.\n",
    "\n",
    "2. **Embed queries at inference time**: When a user submits a query, use an external service like Hugging Face's Inference API to embed the query. This eliminates the need to deploy your own model, allowing you to leverage a fully optimized external service.\n",
    "\n",
    "By following this approach, you can build a system that searches through hundreds of AI tools without the need to deploy any infrastructure (and scale to millions!). Additionally, since you embed the dataset locally, you can use Hugging Face's free plan without requiring a credit card or worrying about hitting rate limits, at least until you are ready for production.\n",
    "\n",
    "In this tutorial, we will walk through the process of embedding a dataset of AI tools using Sentence Transformers, and then using Hugging Face's Inference API to embed queries at inference time, enabling efficient and scalable search capabilities.\n",
    "\n",
    "You will need a Hugging Face api token for this sample. Please create a Hugging Face account by going to [Hugging Face – The AI community building the future](https://huggingface.co/) and create a token by going to https://huggingface.co/settings/tokens\n",
    "\n",
    "You can then enter this token below or set it to HF_TOKEN in your environment."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "260d0f4b-ef09-4bd2-a197-a9351be24684",
   "metadata": {
    "id": "260d0f4b-ef09-4bd2-a197-a9351be24684"
   },
   "source": [
    "# 0. Setup"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d1468bd3",
   "metadata": {
    "id": "d1468bd3"
   },
   "source": [
    "### Install dependencies\n",
    "\n",
    "In order to successfully run this sample, note the following steps depending on where you are running this notebook:\n",
    "\n",
    "-***Run Locally / Private Environment:*** The [Setup](https://github.com/KxSystems/kdbai-samples/blob/main/README.md#setup) steps in the repository's `README.md` will guide you on prerequisites and how to run this with Jupyter.\n",
    "\n",
    "\n",
    "-***Colab / Hosted Environment:*** Open this notebook in Colab and run through the cells."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9f4996e9",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install kdbai_client"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "491cd6d6",
   "metadata": {
    "id": "491cd6d6"
   },
   "outputs": [],
   "source": [
    "!pip install sentence-transformers"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cc6d17b7",
   "metadata": {
    "id": "cc6d17b7"
   },
   "source": [
    "### Import Packages"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "805d97da",
   "metadata": {
    "id": "805d97da"
   },
   "outputs": [],
   "source": [
    "# vector DB\n",
    "import os\n",
    "from getpass import getpass\n",
    "import kdbai_client as kdbai\n",
    "import time"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "a55ae34e-472b-4aa7-9add-1fcb2ee24a41",
   "metadata": {
    "id": "a55ae34e-472b-4aa7-9add-1fcb2ee24a41"
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8c660c7d",
   "metadata": {
    "id": "8c660c7d"
   },
   "source": [
    "# 1. Connect to KDB.AI"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d3a3aa22",
   "metadata": {
    "id": "d3a3aa22"
   },
   "source": [
    "To use KDB.AI Server, you will need download and run your own container.\n",
    "To do this, you will first need to sign up for free [here](https://trykdb.kx.com/kdbaiserver/signup/).\n",
    "\n",
    "You will receive an email with the required license file and bearer token needed to download your instance.\n",
    "Follow instructions in the signup email to get your session up and running.\n",
    "\n",
    "Once the [setup steps](https://code.kx.com/kdbai/gettingStarted/kdb-ai-server-setup.html) are complete you can then connect to your KDB.AI Server session using `kdbai.Session` and passing your local endpoint.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2e85c1ff",
   "metadata": {
    "id": "2e85c1ff"
   },
   "outputs": [],
   "source": [
    "#Set up KDB.AI server endpoint \n",
    "KDBAI_ENDPOINT = (\n",
    "    os.environ[\"KDBAI_ENDPOINT\"]\n",
    "    if \"KDBAI_ENDPOINT\" in os.environ\n",
    "    else \"http://localhost:8082\"\n",
    ")\n",
    "\n",
    "#connect to KDB.AI Server, default mode is qipc\n",
    "session = kdbai.Session(endpoint=KDBAI_ENDPOINT)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "Dpi_auWw68cy",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "Dpi_auWw68cy",
    "outputId": "fb43c068-7893-426b-b5bf-559e31a401e2"
   },
   "outputs": [],
   "source": [
    "HF_TOKEN = (\n",
    "    os.environ[\"HF_TOKEN\"]\n",
    "    if \"HF_TOKEN\" in os.environ\n",
    "    else getpass(\"Hugging Face token: \")\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8788a6b1",
   "metadata": {
    "id": "8788a6b1"
   },
   "source": [
    "### Verify Defined Databases\n",
    "\n",
    "We can check our connection using the `session.databases()` function.\n",
    "This will return a list of all the databases we have defined in our vector database thus far.\n",
    "This should return a \"default\" database along with any other databases you have already created."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "id": "7877f51c",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "7877f51c",
    "outputId": "0e6fca8a-e50b-4b01-a080-b082bf23d889"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[KDBAI database \"default\"]"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "session.databases()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "i5NYByShWqeK",
   "metadata": {
    "id": "i5NYByShWqeK"
   },
   "source": [
    "### Create a Database Called \"myDatabase\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "id": "97e5f4a9",
   "metadata": {
    "id": "97e5f4a9"
   },
   "outputs": [],
   "source": [
    "# ensure no database called \"myDatabase\" exists\n",
    "try:\n",
    "    session.database(\"myDatabase\").drop()\n",
    "except kdbai.KDBAIException:\n",
    "    pass"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "id": "Gbvw4SzqWprx",
   "metadata": {
    "id": "Gbvw4SzqWprx"
   },
   "outputs": [],
   "source": [
    "# Create the database\n",
    "db = session.create_database(\"myDatabase\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e33f03c3",
   "metadata": {
    "id": "e33f03c3"
   },
   "source": [
    "# 2. Create a KDB.AI Table\n",
    "\n",
    "To create a table we can use `create_table`, this function takes two arguments - the name and schema of the table.\n",
    "\n",
    "This schema must meet the following criteria:\n",
    "- It must contain a list of columns.\n",
    "- All columns must have either a `type` or a `qtype`.\n",
    "- One column of vector embeddings, this column is implicitly an array of `float64s`."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9da55253",
   "metadata": {
    "id": "9da55253"
   },
   "source": [
    "### Define Schema\n",
    "The schema contains all metadata columns, and a 'description_embedding' column which will be used for similarity search\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "id": "e5e8b782",
   "metadata": {
    "id": "e5e8b782"
   },
   "outputs": [],
   "source": [
    "schema = [\n",
    "        {\"name\": \"id\", \"type\": \"str\"},\n",
    "        {\"name\": \"name\", \"type\": \"str\"},\n",
    "        {\"name\": \"description\", \"type\": \"str\"},\n",
    "        {\"name\": \"summary\", \"type\": \"str\"},\n",
    "        {\"name\": \"title\", \"type\": \"str\"},\n",
    "        {\"name\": \"visitors\", \"type\": \"int64\"},\n",
    "        {\"name\": \"description_embedding\", \"type\": \"float64s\"},\n",
    "    ]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "i9ePLlo3adwt",
   "metadata": {
    "id": "i9ePLlo3adwt"
   },
   "source": [
    "### Define the indexes\n",
    "We will define our dimensionality, similarity metric and index type with the vectorIndex attribute. For this example we chose:\n",
    "\n",
    "- type = hnsw : HNSW enhances efficiency while maintaining accuracy. You have the choice of using other indexes like, qHNSW, and IVFPQ, qFlat or a Flat index here, as with metrics the one you chose depends your data and your overall performance requirements.\n",
    "- name = hnsw_index : this is a custom name you give your index.\n",
    "\n",
    "#### params:\n",
    "- dims = 384 : In the next section, we generate embeddings that are 384-dimensional to match this. The number of dimensions should mirror the output dimensions of your embedding model.\n",
    "- metric = L2 : We chose L2/Euclidean distance. Our dummy dataset is low dimensional which Euclidean distance is suitable for. You have the choice of using other metrics here like IP/Inner Product and CS/Cosine Similarity and the one you chose depends on the specific context and nature of your data.\n",
    "\n",
    "!Note, it is possible to define multiple indexes within a table!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "id": "1-2uL1JMXP37",
   "metadata": {
    "id": "1-2uL1JMXP37"
   },
   "outputs": [],
   "source": [
    "# Define the index\n",
    "indexes = [\n",
    "    {\n",
    "        'type': 'hnsw',\n",
    "        'name': 'hnsw_index',\n",
    "        'column': 'description_embedding',\n",
    "        'params': {'dims': 384, 'metric': \"L2\"},\n",
    "    },\n",
    "]\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "09a5caa0",
   "metadata": {
    "id": "09a5caa0"
   },
   "source": [
    "### Create Table"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "id": "34067680",
   "metadata": {
    "id": "34067680"
   },
   "outputs": [],
   "source": [
    "table = db.create_table(table=\"ai_tools\", schema=schema, indexes=indexes)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "20afbea1",
   "metadata": {
    "id": "20afbea1"
   },
   "source": [
    "# 3. Load Data\n",
    "\n",
    "We fetch data from a github gist containing companies, descriptions, and some metadata. We will then add these to pandas dataframe with column names/types matching the target table."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "id": "37581e86",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 293
    },
    "id": "37581e86",
    "outputId": "aebcdcdb-303e-4eda-8c58-36610243e3ac"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>description</th>\n",
       "      <th>id</th>\n",
       "      <th>name</th>\n",
       "      <th>summary</th>\n",
       "      <th>title</th>\n",
       "      <th>visitors</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Generate 3D textures for your game in seconds ...</td>\n",
       "      <td>rec_cfn1112cibvc11jnn2qg</td>\n",
       "      <td>TextureLab</td>\n",
       "      <td>TextureLab is a website that provides 3D textu...</td>\n",
       "      <td>Instant And Unique 3D Textures For Your Next G...</td>\n",
       "      <td>23913</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Luma Labs enables users to explore 3D modeling...</td>\n",
       "      <td>rec_cfn1112cibvc11jnn2r0</td>\n",
       "      <td>lumalabs</td>\n",
       "      <td>Luma Labs is a website that offers an early ex...</td>\n",
       "      <td>Imagine 3D V1.2 (Alpha)</td>\n",
       "      <td>456963</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Make motion capture from video easier and more...</td>\n",
       "      <td>rec_cfn1112cibvc11jnn2rg</td>\n",
       "      <td>plask</td>\n",
       "      <td>Plask is an AI-powered mocap animation tool th...</td>\n",
       "      <td>Ai-Powered Mocap Animation Tool.</td>\n",
       "      <td>90960</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Get hundreds of interior design ideas for your...</td>\n",
       "      <td>rec_cfn1112cibvc11jnn2s0</td>\n",
       "      <td>AI Room Planner</td>\n",
       "      <td>AI Room Planner is an online platform that uti...</td>\n",
       "      <td>Interior Design By Ai</td>\n",
       "      <td>211540</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>A platform powered by AI to help you create be...</td>\n",
       "      <td>rec_cfn1112cibvc11jnn2sg</td>\n",
       "      <td>AI TWO</td>\n",
       "      <td>AI TWO is a website that provides a platform f...</td>\n",
       "      <td>Aitwo.Co - The Ai-Powered All-In-One Design Pl...</td>\n",
       "      <td>7201</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                         description  \\\n",
       "0  Generate 3D textures for your game in seconds ...   \n",
       "1  Luma Labs enables users to explore 3D modeling...   \n",
       "2  Make motion capture from video easier and more...   \n",
       "3  Get hundreds of interior design ideas for your...   \n",
       "4  A platform powered by AI to help you create be...   \n",
       "\n",
       "                         id             name  \\\n",
       "0  rec_cfn1112cibvc11jnn2qg       TextureLab   \n",
       "1  rec_cfn1112cibvc11jnn2r0         lumalabs   \n",
       "2  rec_cfn1112cibvc11jnn2rg            plask   \n",
       "3  rec_cfn1112cibvc11jnn2s0  AI Room Planner   \n",
       "4  rec_cfn1112cibvc11jnn2sg           AI TWO   \n",
       "\n",
       "                                             summary  \\\n",
       "0  TextureLab is a website that provides 3D textu...   \n",
       "1  Luma Labs is a website that offers an early ex...   \n",
       "2  Plask is an AI-powered mocap animation tool th...   \n",
       "3  AI Room Planner is an online platform that uti...   \n",
       "4  AI TWO is a website that provides a platform f...   \n",
       "\n",
       "                                               title  visitors  \n",
       "0  Instant And Unique 3D Textures For Your Next G...     23913  \n",
       "1                            Imagine 3D V1.2 (Alpha)    456963  \n",
       "2                   Ai-Powered Mocap Animation Tool.     90960  \n",
       "3                              Interior Design By Ai    211540  \n",
       "4  Aitwo.Co - The Ai-Powered All-In-One Design Pl...      7201  "
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import requests\n",
    "\n",
    "gist_url = \"https://gist.github.com/mrmps/2f62a2287cb2c1ca63a2762fcaac89bc/raw\"\n",
    "response = requests.get(gist_url)\n",
    "ai_tools_data = response.json()\n",
    "df = pd.DataFrame.from_dict(ai_tools_data)\n",
    "\n",
    "# drop column with unecessary metadata\n",
    "df.drop(columns=[\"xata\"], inplace=True)\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3bsfi5TO_65G",
   "metadata": {
    "id": "3bsfi5TO_65G"
   },
   "source": [
    "# 4. Use the Sentence Transformers library to embed every description in the dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "PxPJZcUmBajt",
   "metadata": {
    "id": "PxPJZcUmBajt"
   },
   "source": [
    "We set the embedding model to BAAI/bge-small-en-v1.5, which is a fast and small model. This is what we will use during inference time as well.\n",
    "\n",
    "If you want faster inference, you can try the [FastEmbed](https://github.com/qdrant/fastembed) library, a much faster and more lightweight embedding library."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "id": "f5dc41e8",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 528
    },
    "id": "f5dc41e8",
    "outputId": "035258f8-a679-4696-c2fc-d7361eea91d4"
   },
   "outputs": [],
   "source": [
    "from sentence_transformers import SentenceTransformer\n",
    "\n",
    "model = SentenceTransformer(\"BAAI/bge-small-en-v1.5\")\n",
    "\n",
    "descriptions = [tool[\"description\"] for tool in ai_tools_data]\n",
    "embeddings = model.encode(descriptions)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "id": "yxhFJUkwf8M4",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "yxhFJUkwf8M4",
    "outputId": "130f1178-ed5c-491b-9bfb-b615b681106b"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[-0.06802839,  0.01769779,  0.07132471, ...,  0.04166844,\n",
       "        -0.01963805, -0.036344  ],\n",
       "       [ 0.00284367,  0.0034911 ,  0.0392653 , ..., -0.01490238,\n",
       "         0.0041208 ,  0.02246646],\n",
       "       [-0.08536491, -0.05372242,  0.01503714, ...,  0.01607881,\n",
       "         0.04058064, -0.02476997],\n",
       "       ...,\n",
       "       [ 0.00551532, -0.02548731, -0.00431467, ..., -0.00406338,\n",
       "         0.06047558, -0.03689232],\n",
       "       [-0.08149453, -0.00607409, -0.00040346, ...,  0.02765157,\n",
       "         0.04479544, -0.00464933],\n",
       "       [-0.09128137, -0.05604199,  0.01856982, ...,  0.01355306,\n",
       "         0.05817638, -0.05754769]], dtype=float32)"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "embeddings"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "zJeeJHvgAeZ1",
   "metadata": {
    "id": "zJeeJHvgAeZ1"
   },
   "source": [
    "# 5. Insert the data into our KDB.AI table"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "id": "730c9f08",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "730c9f08",
    "outputId": "61dd248a-8372-421c-c1dc-047f405da5b2"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'rowsInserted': 851}"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Create a DataFrame with the AI tools data\n",
    "data = pd.DataFrame(ai_tools_data)[[\"id\", \"name\", \"description\", \"summary\", \"title\", \"visitors\"]]\n",
    "data[\"description_embedding\"] = embeddings.tolist()\n",
    "\n",
    "# Bulk insert the data into KDB.AI\n",
    "table.insert(data)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "EJtF_k4iZSGe",
   "metadata": {
    "id": "EJtF_k4iZSGe"
   },
   "source": [
    "## Confirm data is loaded correctly"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "id": "b4nBmPXrZPUQ",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 770
    },
    "id": "b4nBmPXrZPUQ",
    "outputId": "ec15f76e-9c32-4bf1-baf1-55a67ef1d8bb"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>name</th>\n",
       "      <th>description</th>\n",
       "      <th>summary</th>\n",
       "      <th>title</th>\n",
       "      <th>visitors</th>\n",
       "      <th>description_embedding</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>rec_cfn1112cibvc11jnn2qg</td>\n",
       "      <td>TextureLab</td>\n",
       "      <td>Generate 3D textures for your game in seconds ...</td>\n",
       "      <td>TextureLab is a website that provides 3D textu...</td>\n",
       "      <td>Instant And Unique 3D Textures For Your Next G...</td>\n",
       "      <td>23913</td>\n",
       "      <td>[-0.06802839040756226, 0.017697788774967194, 0...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>rec_cfn1112cibvc11jnn2r0</td>\n",
       "      <td>lumalabs</td>\n",
       "      <td>Luma Labs enables users to explore 3D modeling...</td>\n",
       "      <td>Luma Labs is a website that offers an early ex...</td>\n",
       "      <td>Imagine 3D V1.2 (Alpha)</td>\n",
       "      <td>456963</td>\n",
       "      <td>[0.0028436651919037104, 0.003491099225357175, ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>rec_cfn1112cibvc11jnn2rg</td>\n",
       "      <td>plask</td>\n",
       "      <td>Make motion capture from video easier and more...</td>\n",
       "      <td>Plask is an AI-powered mocap animation tool th...</td>\n",
       "      <td>Ai-Powered Mocap Animation Tool.</td>\n",
       "      <td>90960</td>\n",
       "      <td>[-0.08536490797996521, -0.05372241884469986, 0...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>rec_cfn1112cibvc11jnn2s0</td>\n",
       "      <td>AI Room Planner</td>\n",
       "      <td>Get hundreds of interior design ideas for your...</td>\n",
       "      <td>AI Room Planner is an online platform that uti...</td>\n",
       "      <td>Interior Design By Ai</td>\n",
       "      <td>211540</td>\n",
       "      <td>[0.020655963569879532, 0.028269633650779724, 0...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>rec_cfn1112cibvc11jnn2sg</td>\n",
       "      <td>AI TWO</td>\n",
       "      <td>A platform powered by AI to help you create be...</td>\n",
       "      <td>AI TWO is a website that provides a platform f...</td>\n",
       "      <td>Aitwo.Co - The Ai-Powered All-In-One Design Pl...</td>\n",
       "      <td>7201</td>\n",
       "      <td>[-0.02213478274643421, -0.03189412131905556, 0...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>846</th>\n",
       "      <td>rec_cod2au57l1i4603r3hvg</td>\n",
       "      <td>Scott Krager</td>\n",
       "      <td>Thumbnails.com uses AI to generate dozens of u...</td>\n",
       "      <td>Unlock the power of eye-catching thumbnails wi...</td>\n",
       "      <td>Thumbnails.com</td>\n",
       "      <td>0</td>\n",
       "      <td>[-0.07755479961633682, -0.05978638306260109, -...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>847</th>\n",
       "      <td>rec_codntepuqmnhe7ku1ing</td>\n",
       "      <td>Nen Fard</td>\n",
       "      <td>StockTune: AI-powered, public-domain music for...</td>\n",
       "      <td>\\nStockTune is a revolutionary platform offeri...</td>\n",
       "      <td>StockTune</td>\n",
       "      <td>0</td>\n",
       "      <td>[-0.03542690351605415, -0.057283081114292145, ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>848</th>\n",
       "      <td>rec_codr709uqmnhe7ku1te0</td>\n",
       "      <td>Nen Fard</td>\n",
       "      <td>StockCake: Free, AI-generated stock photos in...</td>\n",
       "      <td>StockCake is a revolutionary stock photo site ...</td>\n",
       "      <td>StockCake</td>\n",
       "      <td>0</td>\n",
       "      <td>[0.005515319295227528, -0.025487307459115982, ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>849</th>\n",
       "      <td>rec_coidgc9uqmnhe7l0eug0</td>\n",
       "      <td>Jason West</td>\n",
       "      <td>FastBots enables anyone to quickly create a po...</td>\n",
       "      <td>FastBots is a no-code AI chatbot builder for b...</td>\n",
       "      <td>FastBots</td>\n",
       "      <td>0</td>\n",
       "      <td>[-0.0814945250749588, -0.006074093747884035, -...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>850</th>\n",
       "      <td>rec_coj3l5aa8o7fb0ajha0g</td>\n",
       "      <td>Dubformer</td>\n",
       "      <td>AI-driven translation and dubbing services</td>\n",
       "      <td>Dubformer is an end-to-end innovative service ...</td>\n",
       "      <td>AI dubbing and video translation solution</td>\n",
       "      <td>0</td>\n",
       "      <td>[-0.09128136932849884, -0.05604198947548866, 0...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>851 rows × 7 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                           id             name  \\\n",
       "0    rec_cfn1112cibvc11jnn2qg       TextureLab   \n",
       "1    rec_cfn1112cibvc11jnn2r0         lumalabs   \n",
       "2    rec_cfn1112cibvc11jnn2rg            plask   \n",
       "3    rec_cfn1112cibvc11jnn2s0  AI Room Planner   \n",
       "4    rec_cfn1112cibvc11jnn2sg           AI TWO   \n",
       "..                        ...              ...   \n",
       "846  rec_cod2au57l1i4603r3hvg     Scott Krager   \n",
       "847  rec_codntepuqmnhe7ku1ing         Nen Fard   \n",
       "848  rec_codr709uqmnhe7ku1te0         Nen Fard   \n",
       "849  rec_coidgc9uqmnhe7l0eug0       Jason West   \n",
       "850  rec_coj3l5aa8o7fb0ajha0g        Dubformer   \n",
       "\n",
       "                                           description  \\\n",
       "0    Generate 3D textures for your game in seconds ...   \n",
       "1    Luma Labs enables users to explore 3D modeling...   \n",
       "2    Make motion capture from video easier and more...   \n",
       "3    Get hundreds of interior design ideas for your...   \n",
       "4    A platform powered by AI to help you create be...   \n",
       "..                                                 ...   \n",
       "846  Thumbnails.com uses AI to generate dozens of u...   \n",
       "847  StockTune: AI-powered, public-domain music for...   \n",
       "848   StockCake: Free, AI-generated stock photos in...   \n",
       "849  FastBots enables anyone to quickly create a po...   \n",
       "850         AI-driven translation and dubbing services   \n",
       "\n",
       "                                               summary  \\\n",
       "0    TextureLab is a website that provides 3D textu...   \n",
       "1    Luma Labs is a website that offers an early ex...   \n",
       "2    Plask is an AI-powered mocap animation tool th...   \n",
       "3    AI Room Planner is an online platform that uti...   \n",
       "4    AI TWO is a website that provides a platform f...   \n",
       "..                                                 ...   \n",
       "846  Unlock the power of eye-catching thumbnails wi...   \n",
       "847  \\nStockTune is a revolutionary platform offeri...   \n",
       "848  StockCake is a revolutionary stock photo site ...   \n",
       "849  FastBots is a no-code AI chatbot builder for b...   \n",
       "850  Dubformer is an end-to-end innovative service ...   \n",
       "\n",
       "                                                 title  visitors  \\\n",
       "0    Instant And Unique 3D Textures For Your Next G...     23913   \n",
       "1                              Imagine 3D V1.2 (Alpha)    456963   \n",
       "2                     Ai-Powered Mocap Animation Tool.     90960   \n",
       "3                                Interior Design By Ai    211540   \n",
       "4    Aitwo.Co - The Ai-Powered All-In-One Design Pl...      7201   \n",
       "..                                                 ...       ...   \n",
       "846                                     Thumbnails.com         0   \n",
       "847                                          StockTune         0   \n",
       "848                                          StockCake         0   \n",
       "849                                           FastBots         0   \n",
       "850          AI dubbing and video translation solution         0   \n",
       "\n",
       "                                 description_embedding  \n",
       "0    [-0.06802839040756226, 0.017697788774967194, 0...  \n",
       "1    [0.0028436651919037104, 0.003491099225357175, ...  \n",
       "2    [-0.08536490797996521, -0.05372241884469986, 0...  \n",
       "3    [0.020655963569879532, 0.028269633650779724, 0...  \n",
       "4    [-0.02213478274643421, -0.03189412131905556, 0...  \n",
       "..                                                 ...  \n",
       "846  [-0.07755479961633682, -0.05978638306260109, -...  \n",
       "847  [-0.03542690351605415, -0.057283081114292145, ...  \n",
       "848  [0.005515319295227528, -0.025487307459115982, ...  \n",
       "849  [-0.0814945250749588, -0.006074093747884035, -...  \n",
       "850  [-0.09128136932849884, -0.05604198947548866, 0...  \n",
       "\n",
       "[851 rows x 7 columns]"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "table.query()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0MOZtXLniJLe",
   "metadata": {
    "id": "0MOZtXLniJLe"
   },
   "source": [
    "# 6. Perform Similarity Search Using the Hugging Face Inference API"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "V6KjGhJOANyf",
   "metadata": {
    "id": "V6KjGhJOANyf"
   },
   "source": [
    "## Embed the Query with the Hugging Face Inference API\n",
    "Use the Hugging Face Inference API to embed the query so that it can be used to search our index.\n",
    "\n",
    "**Note:** You might need to run this cell a few times, as the model can take a few seconds to become ready."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4a31f878",
   "metadata": {
    "id": "4a31f878"
   },
   "outputs": [],
   "source": [
    "# Perform a search using Hugging Face embeddings\n",
    "import requests\n",
    "\n",
    "# Make sure your URL looks like this so that you get results immediately rather than a model-loading error\n",
    "embedding_url = \"https://router.huggingface.co/hf-inference/models/BAAI/bge-small-en-v1.5/pipeline/feature-extraction\"\n",
    "\n",
    "def generate_query_embedding(text: str) -> list[float]:\n",
    "    \"\"\"Embed one query string with the Hugging Face Inference API.\"\"\"\n",
    "    response = requests.post(\n",
    "        embedding_url,\n",
    "        headers={\"Authorization\": f\"Bearer {HF_TOKEN}\", \"x-wait-for-model\": \"true\"},\n",
    "        json={\"inputs\": text}\n",
    "    )\n",
    "\n",
    "    # Surface any non-200 response instead of returning an error payload as an embedding\n",
    "    if response.status_code != 200:\n",
    "        raise ValueError(f\"Request failed with status code {response.status_code}: {response.text}\")\n",
    "    return response.json()\n",
    "\n",
    "query = \"AI tool for creating 3D textures\"\n",
    "query_embedding = generate_query_embedding(query)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "Wrp9GUwEZmvT",
   "metadata": {
    "id": "Wrp9GUwEZmvT"
   },
   "source": [
    "## Run the query with our query embedding"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "Xumxvz7gaGGp",
   "metadata": {
    "id": "Xumxvz7gaGGp"
   },
   "source": [
    "We are searching based on the description for the most relevant startups to the query. Remember that \"hnsw_index\" is the index name we created when defining our index before creating the table."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "id": "FDrpIofxZl4Z",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 299
    },
    "id": "FDrpIofxZl4Z",
    "outputId": "2d5259a5-762a-4ef4-b37c-e8d6ec9d0f68"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>__nn_distance</th>\n",
       "      <th>id</th>\n",
       "      <th>name</th>\n",
       "      <th>description</th>\n",
       "      <th>summary</th>\n",
       "      <th>title</th>\n",
       "      <th>visitors</th>\n",
       "      <th>description_embedding</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.25221</td>\n",
       "      <td>rec_cfn1112cibvc11jnn2qg</td>\n",
       "      <td>TextureLab</td>\n",
       "      <td>Generate 3D textures for your game in seconds ...</td>\n",
       "      <td>TextureLab is a website that provides 3D textu...</td>\n",
       "      <td>Instant And Unique 3D Textures For Your Next G...</td>\n",
       "      <td>23913</td>\n",
       "      <td>[-0.06802839040756226, 0.017697788774967194, 0...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.26723</td>\n",
       "      <td>rec_cfn11a2cibvc11jnndbg</td>\n",
       "      <td>Ponzu.gg</td>\n",
       "      <td>Create realistic 3D images with AI-generated t...</td>\n",
       "      <td>Ponzu is a website that helps 3D artists and d...</td>\n",
       "      <td>Ponzu.</td>\n",
       "      <td>6526</td>\n",
       "      <td>[-0.06463481485843658, -0.014672131277620792, ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.34271</td>\n",
       "      <td>rec_cfn119acibvc11jnncf0</td>\n",
       "      <td>Masterpiece Studio</td>\n",
       "      <td>Create 3D models with Generative AI and deploy...</td>\n",
       "      <td>Masterpiece Studio is a company that has devel...</td>\n",
       "      <td>Masterpiece Studio.</td>\n",
       "      <td>38954</td>\n",
       "      <td>[-0.04131263867020607, -0.0035701903980225325,...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   __nn_distance                        id                name  \\\n",
       "0        0.25221  rec_cfn1112cibvc11jnn2qg          TextureLab   \n",
       "1        0.26723  rec_cfn11a2cibvc11jnndbg            Ponzu.gg   \n",
       "2        0.34271  rec_cfn119acibvc11jnncf0  Masterpiece Studio   \n",
       "\n",
       "                                         description  \\\n",
       "0  Generate 3D textures for your game in seconds ...   \n",
       "1  Create realistic 3D images with AI-generated t...   \n",
       "2  Create 3D models with Generative AI and deploy...   \n",
       "\n",
       "                                             summary  \\\n",
       "0  TextureLab is a website that provides 3D textu...   \n",
       "1  Ponzu is a website that helps 3D artists and d...   \n",
       "2  Masterpiece Studio is a company that has devel...   \n",
       "\n",
       "                                               title  visitors  \\\n",
       "0  Instant And Unique 3D Textures For Your Next G...     23913   \n",
       "1                                             Ponzu.      6526   \n",
       "2                                Masterpiece Studio.     38954   \n",
       "\n",
       "                               description_embedding  \n",
       "0  [-0.06802839040756226, 0.017697788774967194, 0...  \n",
       "1  [-0.06463481485843658, -0.014672131277620792, ...  \n",
       "2  [-0.04131263867020607, -0.0035701903980225325,...  "
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Retrieve the 3 nearest neighbours of the query embedding from the \"hnsw_index\" index\n",
    "results = table.search(vectors={\"hnsw_index\": [query_embedding]}, n=3)\n",
    "\n",
    "results[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d8aed9bc-72b2-4e70-b763-e7ce054557db",
   "metadata": {
    "id": "d8aed9bc-72b2-4e70-b763-e7ce054557db"
   },
   "source": [
    "# 7. Delete the KDB.AI Table & Database to Conserve Resources\n",
    "\n",
    "We can use `table.drop()` to delete a table and `db.drop()` to delete the database."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "id": "548a9d95-aac3-4d63-a87a-99eedfe55f07",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "548a9d95-aac3-4d63-a87a-99eedfe55f07",
    "outputId": "ba89c1d4-997e-46e7-97d6-e33a8e50d51e"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "table.drop()\n",
    "db.drop()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8bc6d801-1371-48d0-98b4-0baa53bc8446",
   "metadata": {
    "id": "8bc6d801-1371-48d0-98b4-0baa53bc8446"
   },
   "source": [
    "<div class=\"alert alert-block alert-warning\">\n",
    "<b>Warning:</b> Once you drop a table, you cannot use it again.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "RU2pzQCAn-Wm",
   "metadata": {
    "id": "RU2pzQCAn-Wm"
   },
   "source": [
    "## Take Our Survey\n",
    "\n",
    "We hope you found this sample helpful! Your feedback is important to us, and we would appreciate it if you could take a moment to fill out our brief survey. Your input helps us improve our content.\n",
    "\n",
    "[**Take the Survey**](https://delighted.com/t/UGvwprmK)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a7672241-42b0-4798-90d7-95aa9fefe68c",
   "metadata": {
    "id": "a7672241-42b0-4798-90d7-95aa9fefe68c"
   },
   "source": [
    "## Next Steps\n",
    "\n",
    "Now that you’re successfully making indexes with KDB.AI, you can start inserting your own data or view more examples:\n",
    "- [PDF Document Search](../document_search)\n",
    "- [MRI Image Search](../image_search)\n",
    "- [Music Recommendation System](../music_recommendation)\n",
    "- [Sensor Pattern Matching](../pattern_matching)\n",
    "- [Retrieval Augmented Generation with LangChain](../retrieval_augmented_generation)\n",
    "- [Sentiment Analysis of Reviews](../sentiment_analysis)"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}


================================================
FILE: KDB.AI_course/README.md
================================================
# KDB.AI Course

Welcome to the KDB.AI course! This course combines custom content with existing examples from the KDB.AI samples repository.

## Course Outline

1. Introduction to KDB.AI
   - [Introduction](./course_specific_content/making_queries.ipynb)
   - [Managing Tables](./course_specific_content/managing_tables.ipynb)

2. Advanced Search Techniques
   - [Hybrid Search](../hybrid_search/hybrid_search_inflation.ipynb)
   - [Temporal Similarity Search (Non-Transformed)](../TSS_non_transformed/Temporal_Similarity_Search_Non-Transformed_Demo.ipynb)
   - [Temporal Similarity Search (Transformed)](../TSS_transformed/Temporal_Similarity_Search_Transformed_Demo.ipynb)

3. Retrieval Augmented Generation
   - [RAG Example](./course_specific_content/rag_example.ipynb)

## Note on Referenced Notebooks

Some notebooks in this course are referenced from other parts of the repository. This ensures you're always working with the most up-to-date versions of these examples. For a full list of referenced notebooks and their locations, please see [notebook_references.md](./notebook_references.md).

================================================
FILE: KDB.AI_course/course_specific_content/making_queries.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "CpIrSWxiuFxX"
   },
   "source": [
    "## Introduction\n",
    "\n",
    "[Video Walkthrough](https://www.youtube.com/watch?v=0kpseJLbEP4&list=PLypX5sYuDqvrqsXTw876gGHosCKvK_7QS&index=7)\n",
    "\n",
    "In this section of the course, we will focus on querying and searching data in KDB.AI tables. By the end of this notebook, you will have a thorough understanding of the following:\n",
    "- Selecting tables to query\n",
    "- Performing queries and applying filters\n",
    "- Customizing filters\n",
    "- Conducting similarity searches\n",
    "- Processing query results"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "3nESqJz6uP_7"
   },
   "source": [
    "### Setup\n",
    "\n",
    "Install `kdbai_client` and `fastembed`, then import the necessary dependencies."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "collapsed": true,
    "id": "HBtJhVzlagJt",
    "outputId": "e7e59f12-d603-4104-9ddf-c0022202dc9b"
   },
   "outputs": [],
   "source": [
    "!pip install kdbai_client fastembed"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "gitkLFwlag8H"
   },
   "outputs": [],
   "source": [
    "import kdbai_client as kdbai\n",
    "import time\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "from fastembed import TextEmbedding\n",
    "import os\n",
    "import getpass"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "GoGkIrL9ugk7"
   },
   "source": [
    "##### Connect to KDB.AI Server\n",
    "Once our embeddings are created, we will store them in a vector database to enable efficient searching.\n",
    "\n",
    "To use KDB.AI Server, you will need to download and run your own container.\n",
    "To do this, first sign up for free [here](https://trykdb.kx.com/kdbaiserver/signup/).\n",
    "\n",
    "You will receive an email with the license file and bearer token needed to download your instance.\n",
    "Follow the instructions in the signup email to get your session up and running.\n",
    "\n",
    "Once the [setup steps](https://code.kx.com/kdbai/gettingStarted/kdb-ai-server-setup.html) are complete, you can connect to your KDB.AI Server by passing your local endpoint to `kdbai.Session`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "collapsed": true,
    "id": "2ApVTaRvajlt",
    "outputId": "90b29cb5-7629-446d-bef2-c66fe35a256f"
   },
   "outputs": [],
   "source": [
    "# Set up the KDB.AI Server endpoint; fall back to the local default if the env var is unset\n",
    "KDBAI_ENDPOINT = os.environ.get(\"KDBAI_ENDPOINT\", \"http://localhost:8082\")\n",
    "\n",
    "# Connect to KDB.AI Server; the default mode is qipc\n",
    "session = kdbai.Session(endpoint=KDBAI_ENDPOINT)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "Ey1mSxH2an1e"
   },
   "outputs": [],
   "source": [
    "database = session.database(\"default\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "8l8qWymnunY7"
   },
   "source": [
    "### Create Our Table and Insert Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "qNQVDDtZapTQ"
   },
   "outputs": [],
   "source": [
    "try:\n",
    "    database.table(\"data\").drop() # Drop the table if it already exists\n",
    "except kdbai.KDBAIException:\n",
    "    pass"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "wRoZgm4LxR0Z"
   },
   "source": [
    "##### Define a Schema"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "574BY9ZQax4p"
   },
   "outputs": [],
   "source": [
    "schema = [\n",
    "    {'name': 'id', 'type': 'int32'},\n",
    "    {'name': 'name', 'type': 'str'},\n",
    "    {'name': 'age', 'type': 'int16'},\n",
    "    {'name': 'city', 'type': 'str'},\n",
    "    {'name': 'description', 'type': 'str'},\n",
    "    {'name': 'embeddings', 'type': 'float32s'}\n",
    "]\n",
    "index_name = 'hnws_index'\n",
    "indexes = [{'name': index_name, 'column': 'embeddings', 'type': 'hnsw', 'params': {'dims': 384}}]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "stFJG7sCa19_"
   },
   "outputs": [],
   "source": [
    "table = database.create_table(\"data\", schema=schema, indexes=indexes)\n",
    "\n",
    "# Free-text descriptions of 30 people; these are embedded with FastEmbed in a later cell\n",
    "descriptions = [\n",
    "    \"A passionate environmentalist with 5 years of experience in conservation projects and enjoys hiking and outdoor activities.\",\n",
    "    \"A software engineer with 7 years of experience in full-stack development, living in London, who loves to cook Italian cuisine.\",\n",
    "    \"A guitarist with over 10 years of experience performing at local cafes and enjoys reading science fiction.\",\n",
    "    \"A data scientist in Tokyo with 4 years of experience in machine learning and a keen interest in AI research.\",\n",
    "    \"An avid reader and travel blogger with 3 years of experience visiting and writing about historic sites around the world.\",\n",
    "    \"A graphic designer based in Berlin with 8 years of experience and a talent for creating digital art.\",\n",
    "    \"A high school teacher with 15 years of experience in education who loves cycling and participates in charity rides.\",\n",
    "    \"A professional photographer with 6 years of experience specializing in wildlife photography.\",\n",
    "    \"A fitness trainer with 5 years of experience who enjoys helping people achieve their health goals.\",\n",
    "    \"A chef with 12 years of experience who runs a popular restaurant and enjoys experimenting with new recipes.\",\n",
    "    \"A journalist with 9 years of experience writing about technology and enjoys exploring new gadgets.\",\n",
    "    \"A musician with 20 years of experience who plays multiple instruments and performs in a jazz band.\",\n",
    "    \"A software developer with 6 years of experience in creating mobile apps and enjoys coding challenges.\",\n",
    "    \"An artist with 10 years of experience who paints abstract pieces and has exhibited in several galleries.\",\n",
    "    \"A historian with 7 years of experience who loves researching and writing about ancient civilizations.\",\n",
    "    \"A marketing manager with 8 years of experience in digital marketing and social media strategy.\",\n",
    "    \"A nurse with 12 years of experience in emergency care and patient management.\",\n",
    "    \"A financial analyst with 5 years of experience in investment banking and portfolio management.\",\n",
    "    \"A project manager with 10 years of experience in IT project coordination and execution.\",\n",
    "    \"A UX designer with 6 years of experience in creating user-friendly interfaces for web and mobile applications.\",\n",
    "    \"A sales executive with 8 years of experience in B2B sales and client relationship management.\",\n",
    "    \"A content writer with 5 years of experience in creating engaging articles and blog posts.\",\n",
    "    \"A civil engineer with 10 years of experience in infrastructure development and urban planning.\",\n",
    "    \"A teacher with 15 years of experience in primary education and curriculum development.\",\n",
    "    \"A business analyst with 7 years of experience in business process optimization and data analysis.\",\n",
    "    \"A psychologist with 6 years of experience in clinical practice and mental health counseling.\",\n",
    "    \"A software architect with 9 years of experience in designing scalable software solutions.\",\n",
    "    \"A research scientist with 8 years of experience in biotechnology and genetic engineering.\",\n",
    "    \"An operations manager with 12 years of experience in supply chain management and logistics.\",\n",
    "    \"A public relations specialist with 7 years of experience in media relations and corporate communications.\"\n",
    "]\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "zk_CkMaPvGwv"
   },
   "source": [
    "##### Define an Embedding Model and Embed People Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 49
    },
    "collapsed": true,
    "id": "AsTjc7_ca39U",
    "outputId": "91abba63-224f-4fa7-bdc5-948e7a008a13"
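   },
   "outputs": [],
   "source": [
    "# Load FastEmbed's default text embedding model and embed every description.\n",
    "# The vector length must match the `dims` value (384) declared for the index.\n",
    "embedding_model = TextEmbedding()\n",
    "embeddings = list(embedding_model.embed(descriptions))"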
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "s2JneJwna7SP"
   },
   "outputs": [],
   "source": [
    "import random\n",
    "random.seed(42) # for reproducibility\n",
    "\n",
    "names = [\"Alice\", \"Bob\", \"Charlie\", \"Monica\", \"Eve\", \"Frank\", \"Grace\", \"Hannah\", \"Ivy\", \"Jack\", \"Kara\", \"Leo\", \"Mia\", \"Nate\", \"Olivia\", \"Paul\", \"Quinn\", \"Rita\", \"Sam\", \"Tina\", \"Uma\", \"Victor\", \"Wendy\", \"Xander\", \"Yara\", \"Zane\", \"Alice\", \"Cody\", \"Diana\", \"Ethan\"]\n",
    "cities = [\"New York\", \"London\", \"New York\", \"Paris\", \"Berlin\", \"New York\", \"San Francisco\", \"Amsterdam\", \"Rome\", \"Toronto\", \"Chicago\", \"Barcelona\", \"Madrid\", \"New York\", \"Moscow\", \"Dubai\", \"Singapore\", \"New York\", \"Istanbul\", \"Munich\", \"Vienna\", \"Dublin\", \"Zurich\", \"Stockholm\", \"Lisbon\", \"Prague\", \"Budapest\", \"Berlin\", \"Copenhagen\", \"Seoul\"]\n",
    "\n",
    "\n",
    "data = pd.DataFrame({\n",
    "    'id': np.array(list(range(0, 30)), dtype='int32'),\n",
    "    'name': names,\n",
    "    'age': np.array([random.randint(18, 60) for _ in range(30)], dtype='int16'),\n",
    "    'city': cities,\n",
    "    'description': descriptions,\n",
    "    'embeddings': embeddings\n",
    "})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "9JCMSVFOxmkg"
   },
   "source": [
    "##### Insert the Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "ZCg1E3hka9SE",
    "outputId": "da6d6a02-5a2d-406c-f4cb-cf68d7349192"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 196,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "table.insert(data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "5-92TfnnxyUx"
   },
   "source": [
    "### Query Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 1000
    },
    "collapsed": true,
    "id": "JY4aJ6Jla_vM",
    "outputId": "37f36ee3-9524-4199-fd7d-1fe6df512fd1"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "All data in the table:\n"
     ]
    },
    {
     "data": {
      "application/vnd.google.colaboratory.intrinsic+json": {
       "summary": "{\n  \"name\": \"table\",\n  \"rows\": 30,\n  \"fields\": [\n    {\n      \"column\": \"id\",\n      \"properties\": {\n        \"dtype\": \"int32\",\n        \"num_unique_values\": 30,\n        \"samples\": [\n          27,\n          15,\n          23\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"name\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 29,\n        \"samples\": [\n          \"Diana\",\n          \"Quinn\",\n          \"Mia\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"age\",\n      \"properties\": {\n        \"dtype\": \"int16\",\n        \"num_unique_values\": 22,\n        \"samples\": [\n          58,\n          31,\n          52\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"city\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 25,\n        \"samples\": [\n          \"Chicago\",\n          \"Vienna\",\n          \"New York\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"description\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 30,\n        \"samples\": [\n          \"A research scientist with 8 years of experience in biotechnology and genetic engineering.\",\n          \"A marketing manager with 8 years of experience in digital marketing and social media strategy.\",\n          \"A teacher with 15 years of experience in primary education and curriculum development.\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"embeddings\",\n      \"properties\": {\n        \"dtype\": \"object\",\n        \"semantic_type\": \"\",\n        
\"description\": \"\"\n      }\n    }\n  ]\n}",
       "type": "dataframe"
      },
      "text/html": [
       "\n",
       "  <div id=\"df-97fcf1bd-13ad-4779-9029-a721eda12bdc\" class=\"colab-df-container\">\n",
       "    <div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>name</th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>description</th>\n",
       "      <th>embeddings</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>Alice</td>\n",
       "      <td>58</td>\n",
       "      <td>New York</td>\n",
       "      <td>A passionate environmentalist with 5 years of ...</td>\n",
       "      <td>[-0.006158471, 0.063678846, 0.09181005, -0.023...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>Bob</td>\n",
       "      <td>25</td>\n",
       "      <td>London</td>\n",
       "      <td>A software engineer with 7 years of experience...</td>\n",
       "      <td>[-0.035581246, 0.07986437, 0.04891828, -0.0604...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>Charlie</td>\n",
       "      <td>19</td>\n",
       "      <td>New York</td>\n",
       "      <td>A guitarist with over 10 years of experience p...</td>\n",
       "      <td>[0.050266247, 0.05255312, 0.048840936, -0.0032...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3</td>\n",
       "      <td>Monica</td>\n",
       "      <td>35</td>\n",
       "      <td>Paris</td>\n",
       "      <td>A data scientist in Tokyo with 4 years of expe...</td>\n",
       "      <td>[-0.008097345, 0.030305384, 0.012246384, -0.04...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>4</td>\n",
       "      <td>Eve</td>\n",
       "      <td>33</td>\n",
       "      <td>Berlin</td>\n",
       "      <td>An avid reader and travel blogger with 3 years...</td>\n",
       "      <td>[0.029772803, 0.07571457, 0.042140756, 0.06809...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>5</td>\n",
       "      <td>Frank</td>\n",
       "      <td>32</td>\n",
       "      <td>New York</td>\n",
       "      <td>A graphic designer based in Berlin with 8 year...</td>\n",
       "      <td>[0.013257692, 0.045190323, 0.0074770325, -0.00...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>6</td>\n",
       "      <td>Grace</td>\n",
       "      <td>26</td>\n",
       "      <td>San Francisco</td>\n",
       "      <td>A high school teacher with 15 years of experie...</td>\n",
       "      <td>[-0.011028861, 0.051242497, 0.063257486, -0.05...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>7</td>\n",
       "      <td>Hannah</td>\n",
       "      <td>24</td>\n",
       "      <td>Amsterdam</td>\n",
       "      <td>A professional photographer with 6 years of ex...</td>\n",
       "      <td>[0.04469839, 0.07050187, 0.046390466, -0.03404...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>8</td>\n",
       "      <td>Ivy</td>\n",
       "      <td>52</td>\n",
       "      <td>Rome</td>\n",
       "      <td>A fitness trainer with 5 years of experience w...</td>\n",
       "      <td>[0.0002550126, 0.024398372, 0.09861772, 0.0062...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>9</td>\n",
       "      <td>Jack</td>\n",
       "      <td>23</td>\n",
       "      <td>Toronto</td>\n",
       "      <td>A chef with 12 years of experience who runs a ...</td>\n",
       "      <td>[-0.008186043, 0.051337104, 0.02683556, -0.030...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>10</td>\n",
       "      <td>Kara</td>\n",
       "      <td>55</td>\n",
       "      <td>Chicago</td>\n",
       "      <td>A journalist with 9 years of experience writin...</td>\n",
       "      <td>[-0.017909497, 0.08548332, 0.0022086229, -0.04...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>11</td>\n",
       "      <td>Leo</td>\n",
       "      <td>45</td>\n",
       "      <td>Barcelona</td>\n",
       "      <td>A musician with 20 years of experience who pla...</td>\n",
       "      <td>[0.008686635, 0.03110498, 0.05405915, -0.07571...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>12</td>\n",
       "      <td>Mia</td>\n",
       "      <td>20</td>\n",
       "      <td>Madrid</td>\n",
       "      <td>A software developer with 6 years of experienc...</td>\n",
       "      <td>[-0.04372146, 0.06704399, 0.022140108, -0.1017...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>13</td>\n",
       "      <td>Nate</td>\n",
       "      <td>19</td>\n",
       "      <td>New York</td>\n",
       "      <td>An artist with 10 years of experience who pain...</td>\n",
       "      <td>[0.01933304, 0.023277232, 0.044062667, 0.01242...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>14</td>\n",
       "      <td>Olivia</td>\n",
       "      <td>23</td>\n",
       "      <td>Moscow</td>\n",
       "      <td>A historian with 7 years of experience who lov...</td>\n",
       "      <td>[-0.0051849326, 0.16519417, 0.06066864, 0.0311...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>15</td>\n",
       "      <td>Paul</td>\n",
       "      <td>31</td>\n",
       "      <td>Dubai</td>\n",
       "      <td>A marketing manager with 8 years of experience...</td>\n",
       "      <td>[0.010789718, 0.017695278, 0.018274685, -0.033...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>16</td>\n",
       "      <td>Quinn</td>\n",
       "      <td>32</td>\n",
       "      <td>Singapore</td>\n",
       "      <td>A nurse with 12 years of experience in emergen...</td>\n",
       "      <td>[-0.041632365, 0.034463193, 0.06313535, 0.0160...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>17</td>\n",
       "      <td>Rita</td>\n",
       "      <td>50</td>\n",
       "      <td>New York</td>\n",
       "      <td>A financial analyst with 5 years of experience...</td>\n",
       "      <td>[0.015000028, 0.024906091, 0.0010010687, 0.011...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>18</td>\n",
       "      <td>Sam</td>\n",
       "      <td>56</td>\n",
       "      <td>Istanbul</td>\n",
       "      <td>A project manager with 10 years of experience ...</td>\n",
       "      <td>[-0.020330371, 0.079401195, 0.02162953, -0.080...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>19</td>\n",
       "      <td>Tina</td>\n",
       "      <td>19</td>\n",
       "      <td>Munich</td>\n",
       "      <td>A UX designer with 6 years of experience in cr...</td>\n",
       "      <td>[-0.030572662, 0.04520395, 0.04553928, -0.0925...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>20</td>\n",
       "      <td>Uma</td>\n",
       "      <td>53</td>\n",
       "      <td>Vienna</td>\n",
       "      <td>A sales executive with 8 years of experience i...</td>\n",
       "      <td>[-0.014194918, 0.032352123, -0.0070426096, -0....</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>21</td>\n",
       "      <td>Victor</td>\n",
       "      <td>30</td>\n",
       "      <td>Dublin</td>\n",
       "      <td>A content writer with 5 years of experience in...</td>\n",
       "      <td>[-0.018195461, 0.032041155, 0.059233848, -0.03...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>22</td>\n",
       "      <td>Wendy</td>\n",
       "      <td>59</td>\n",
       "      <td>Zurich</td>\n",
       "      <td>A civil engineer with 10 years of experience i...</td>\n",
       "      <td>[-0.00980266, 0.04713828, 0.05187823, -0.03932...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>23</td>\n",
       "      <td>Xander</td>\n",
       "      <td>52</td>\n",
       "      <td>Stockholm</td>\n",
       "      <td>A teacher with 15 years of experience in prima...</td>\n",
       "      <td>[-0.013646452, 0.028070105, 0.05104053, -0.064...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>24</td>\n",
       "      <td>Yara</td>\n",
       "      <td>44</td>\n",
       "      <td>Lisbon</td>\n",
       "      <td>A business analyst with 7 years of experience ...</td>\n",
       "      <td>[-0.044623584, 0.054378174, 0.0015794634, -0.0...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>25</td>\n",
       "      <td>Zane</td>\n",
       "      <td>32</td>\n",
       "      <td>Prague</td>\n",
       "      <td>A psychologist with 6 years of experience in c...</td>\n",
       "      <td>[0.016778275, 0.09543604, 0.048281595, -0.0022...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>26</td>\n",
       "      <td>Alice</td>\n",
       "      <td>46</td>\n",
       "      <td>Budapest</td>\n",
       "      <td>A software architect with 9 years of experienc...</td>\n",
       "      <td>[-0.06051296, 0.031862404, -0.031203829, -0.07...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>27</td>\n",
       "      <td>Cody</td>\n",
       "      <td>55</td>\n",
       "      <td>Berlin</td>\n",
       "      <td>A research scientist with 8 years of experienc...</td>\n",
       "      <td>[-0.01787689, 0.07915241, -0.004790489, -0.031...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>28</td>\n",
       "      <td>Diana</td>\n",
       "      <td>35</td>\n",
       "      <td>Copenhagen</td>\n",
       "      <td>An operations manager with 12 years of experie...</td>\n",
       "      <td>[0.011406942, 0.02994747, 0.06136875, -0.02639...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>29</td>\n",
       "      <td>Ethan</td>\n",
       "      <td>18</td>\n",
       "      <td>Seoul</td>\n",
       "      <td>A public relations specialist with 7 years of ...</td>\n",
       "      <td>[-0.001325855, 0.089781284, 0.05144235, -0.036...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>\n",
       "    <div class=\"colab-df-buttons\">\n",
       "\n",
       "  <div class=\"colab-df-container\">\n",
       "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-97fcf1bd-13ad-4779-9029-a721eda12bdc')\"\n",
       "            title=\"Convert this dataframe to an interactive table.\"\n",
       "            style=\"display:none;\">\n",
       "\n",
       "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
       "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
       "  </svg>\n",
       "    </button>\n",
       "\n",
       "  <style>\n",
       "    .colab-df-container {\n",
       "      display:flex;\n",
       "      gap: 12px;\n",
       "    }\n",
       "\n",
       "    .colab-df-convert {\n",
       "      background-color: #E8F0FE;\n",
       "      border: none;\n",
       "      border-radius: 50%;\n",
       "      cursor: pointer;\n",
       "      display: none;\n",
       "      fill: #1967D2;\n",
       "      height: 32px;\n",
       "      padding: 0 0 0 0;\n",
       "      width: 32px;\n",
       "    }\n",
       "\n",
       "    .colab-df-convert:hover {\n",
       "      background-color: #E2EBFA;\n",
       "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
       "      fill: #174EA6;\n",
       "    }\n",
       "\n",
       "    .colab-df-buttons div {\n",
       "      margin-bottom: 4px;\n",
       "    }\n",
       "\n",
       "    [theme=dark] .colab-df-convert {\n",
       "      background-color: #3B4455;\n",
       "      fill: #D2E3FC;\n",
       "    }\n",
       "\n",
       "    [theme=dark] .colab-df-convert:hover {\n",
       "      background-color: #434B5C;\n",
       "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
       "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
       "      fill: #FFFFFF;\n",
       "    }\n",
       "  </style>\n",
       "\n",
       "    <script>\n",
       "      const buttonEl =\n",
       "        document.querySelector('#df-97fcf1bd-13ad-4779-9029-a721eda12bdc button.colab-df-convert');\n",
       "      buttonEl.style.display =\n",
       "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
       "\n",
       "      async function convertToInteractive(key) {\n",
       "        const element = document.querySelector('#df-97fcf1bd-13ad-4779-9029-a721eda12bdc');\n",
       "        const dataTable =\n",
       "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
       "                                                    [key], {});\n",
       "        if (!dataTable) return;\n",
       "\n",
       "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
       "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
       "          + ' to learn more about interactive tables.';\n",
       "        element.innerHTML = '';\n",
       "        dataTable['output_type'] = 'display_data';\n",
       "        await google.colab.output.renderOutput(dataTable, element);\n",
       "        const docLink = document.createElement('div');\n",
       "        docLink.innerHTML = docLinkHtml;\n",
       "        element.appendChild(docLink);\n",
       "      }\n",
       "    </script>\n",
       "  </div>\n",
       "\n",
       "\n",
       "<div id=\"df-626a5255-223c-4f1e-b16c-eaa273fbb22f\">\n",
       "  <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-626a5255-223c-4f1e-b16c-eaa273fbb22f')\"\n",
       "            title=\"Suggest charts\"\n",
       "            style=\"display:none;\">\n",
       "\n",
       "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
       "     width=\"24px\">\n",
       "    <g>\n",
       "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
       "    </g>\n",
       "</svg>\n",
       "  </button>\n",
       "\n",
       "<style>\n",
       "  .colab-df-quickchart {\n",
       "      --bg-color: #E8F0FE;\n",
       "      --fill-color: #1967D2;\n",
       "      --hover-bg-color: #E2EBFA;\n",
       "      --hover-fill-color: #174EA6;\n",
       "      --disabled-fill-color: #AAA;\n",
       "      --disabled-bg-color: #DDD;\n",
       "  }\n",
       "\n",
       "  [theme=dark] .colab-df-quickchart {\n",
       "      --bg-color: #3B4455;\n",
       "      --fill-color: #D2E3FC;\n",
       "      --hover-bg-color: #434B5C;\n",
       "      --hover-fill-color: #FFFFFF;\n",
       "      --disabled-bg-color: #3B4455;\n",
       "      --disabled-fill-color: #666;\n",
       "  }\n",
       "\n",
       "  .colab-df-quickchart {\n",
       "    background-color: var(--bg-color);\n",
       "    border: none;\n",
       "    border-radius: 50%;\n",
       "    cursor: pointer;\n",
       "    display: none;\n",
       "    fill: var(--fill-color);\n",
       "    height: 32px;\n",
       "    padding: 0;\n",
       "    width: 32px;\n",
       "  }\n",
       "\n",
       "  .colab-df-quickchart:hover {\n",
       "    background-color: var(--hover-bg-color);\n",
       "    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
       "    fill: var(--button-hover-fill-color);\n",
       "  }\n",
       "\n",
       "  .colab-df-quickchart-complete:disabled,\n",
       "  .colab-df-quickchart-complete:disabled:hover {\n",
       "    background-color: var(--disabled-bg-color);\n",
       "    fill: var(--disabled-fill-color);\n",
       "    box-shadow: none;\n",
       "  }\n",
       "\n",
       "  .colab-df-spinner {\n",
       "    border: 2px solid var(--fill-color);\n",
       "    border-color: transparent;\n",
       "    border-bottom-color: var(--fill-color);\n",
       "    animation:\n",
       "      spin 1s steps(1) infinite;\n",
       "  }\n",
       "\n",
       "  @keyframes spin {\n",
       "    0% {\n",
       "      border-color: transparent;\n",
       "      border-bottom-color: var(--fill-color);\n",
       "      border-left-color: var(--fill-color);\n",
       "    }\n",
       "    20% {\n",
       "      border-color: transparent;\n",
       "      border-left-color: var(--fill-color);\n",
       "      border-top-color: var(--fill-color);\n",
       "    }\n",
       "    30% {\n",
       "      border-color: transparent;\n",
       "      border-left-color: var(--fill-color);\n",
       "      border-top-color: var(--fill-color);\n",
       "      border-right-color: var(--fill-color);\n",
       "    }\n",
       "    40% {\n",
       "      border-color: transparent;\n",
       "      border-right-color: var(--fill-color);\n",
       "      border-top-color: var(--fill-color);\n",
       "    }\n",
       "    60% {\n",
       "      border-color: transparent;\n",
       "      border-right-color: var(--fill-color);\n",
       "    }\n",
       "    80% {\n",
       "      border-color: transparent;\n",
       "      border-right-color: var(--fill-color);\n",
       "      border-bottom-color: var(--fill-color);\n",
       "    }\n",
       "    90% {\n",
       "      border-color: transparent;\n",
       "      border-bottom-color: var(--fill-color);\n",
       "    }\n",
       "  }\n",
       "</style>\n",
       "\n",
       "  <script>\n",
       "    async function quickchart(key) {\n",
       "      const quickchartButtonEl =\n",
       "        document.querySelector('#' + key + ' button');\n",
       "      quickchartButtonEl.disabled = true;  // To prevent multiple clicks.\n",
       "      quickchartButtonEl.classList.add('colab-df-spinner');\n",
       "      try {\n",
       "        const charts = await google.colab.kernel.invokeFunction(\n",
       "            'suggestCharts', [key], {});\n",
       "      } catch (error) {\n",
       "        console.error('Error during call to suggestCharts:', error);\n",
       "      }\n",
       "      quickchartButtonEl.classList.remove('colab-df-spinner');\n",
       "      quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
       "    }\n",
       "    (() => {\n",
       "      let quickchartButtonEl =\n",
       "        document.querySelector('#df-626a5255-223c-4f1e-b16c-eaa273fbb22f button');\n",
       "      quickchartButtonEl.style.display =\n",
       "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
       "    })();\n",
       "  </script>\n",
       "</div>\n",
       "\n",
       "    </div>\n",
       "  </div>\n"
      ],
      "text/plain": [
       "    id     name  age           city  \\\n",
       "0    0    Alice   58       New York   \n",
       "1    1      Bob   25         London   \n",
       "2    2  Charlie   19       New York   \n",
       "3    3   Monica   35          Paris   \n",
       "4    4      Eve   33         Berlin   \n",
       "5    5    Frank   32       New York   \n",
       "6    6    Grace   26  San Francisco   \n",
       "7    7   Hannah   24      Amsterdam   \n",
       "8    8      Ivy   52           Rome   \n",
       "9    9     Jack   23        Toronto   \n",
       "10  10     Kara   55        Chicago   \n",
       "11  11      Leo   45      Barcelona   \n",
       "12  12      Mia   20         Madrid   \n",
       "13  13     Nate   19       New York   \n",
       "14  14   Olivia   23         Moscow   \n",
       "15  15     Paul   31          Dubai   \n",
       "16  16    Quinn   32      Singapore   \n",
       "17  17     Rita   50       New York   \n",
       "18  18      Sam   56       Istanbul   \n",
       "19  19     Tina   19         Munich   \n",
       "20  20      Uma   53         Vienna   \n",
       "21  21   Victor   30         Dublin   \n",
       "22  22    Wendy   59         Zurich   \n",
       "23  23   Xander   52      Stockholm   \n",
       "24  24     Yara   44         Lisbon   \n",
       "25  25     Zane   32         Prague   \n",
       "26  26    Alice   46       Budapest   \n",
       "27  27     Cody   55         Berlin   \n",
       "28  28    Diana   35     Copenhagen   \n",
       "29  29    Ethan   18          Seoul   \n",
       "\n",
       "                                          description  \\\n",
       "0   A passionate environmentalist with 5 years of ...   \n",
       "1   A software engineer with 7 years of experience...   \n",
       "2   A guitarist with over 10 years of experience p...   \n",
       "3   A data scientist in Tokyo with 4 years of expe...   \n",
       "4   An avid reader and travel blogger with 3 years...   \n",
       "5   A graphic designer based in Berlin with 8 year...   \n",
       "6   A high school teacher with 15 years of experie...   \n",
       "7   A professional photographer with 6 years of ex...   \n",
       "8   A fitness trainer with 5 years of experience w...   \n",
       "9   A chef with 12 years of experience who runs a ...   \n",
       "10  A journalist with 9 years of experience writin...   \n",
       "11  A musician with 20 years of experience who pla...   \n",
       "12  A software developer with 6 years of experienc...   \n",
       "13  An artist with 10 years of experience who pain...   \n",
       "14  A historian with 7 years of experience who lov...   \n",
       "15  A marketing manager with 8 years of experience...   \n",
       "16  A nurse with 12 years of experience in emergen...   \n",
       "17  A financial analyst with 5 years of experience...   \n",
       "18  A project manager with 10 years of experience ...   \n",
       "19  A UX designer with 6 years of experience in cr...   \n",
       "20  A sales executive with 8 years of experience i...   \n",
       "21  A content writer with 5 years of experience in...   \n",
       "22  A civil engineer with 10 years of experience i...   \n",
       "23  A teacher with 15 years of experience in prima...   \n",
       "24  A business analyst with 7 years of experience ...   \n",
       "25  A psychologist with 6 years of experience in c...   \n",
       "26  A software architect with 9 years of experienc...   \n",
       "27  A research scientist with 8 years of experienc...   \n",
       "28  An operations manager with 12 years of experie...   \n",
       "29  A public relations specialist with 7 years of ...   \n",
       "\n",
       "                                           embeddings  \n",
       "0   [-0.006158471, 0.063678846, 0.09181005, -0.023...  \n",
       "1   [-0.035581246, 0.07986437, 0.04891828, -0.0604...  \n",
       "2   [0.050266247, 0.05255312, 0.048840936, -0.0032...  \n",
       "3   [-0.008097345, 0.030305384, 0.012246384, -0.04...  \n",
       "4   [0.029772803, 0.07571457, 0.042140756, 0.06809...  \n",
       "5   [0.013257692, 0.045190323, 0.0074770325, -0.00...  \n",
       "6   [-0.011028861, 0.051242497, 0.063257486, -0.05...  \n",
       "7   [0.04469839, 0.07050187, 0.046390466, -0.03404...  \n",
       "8   [0.0002550126, 0.024398372, 0.09861772, 0.0062...  \n",
       "9   [-0.008186043, 0.051337104, 0.02683556, -0.030...  \n",
       "10  [-0.017909497, 0.08548332, 0.0022086229, -0.04...  \n",
       "11  [0.008686635, 0.03110498, 0.05405915, -0.07571...  \n",
       "12  [-0.04372146, 0.06704399, 0.022140108, -0.1017...  \n",
       "13  [0.01933304, 0.023277232, 0.044062667, 0.01242...  \n",
       "14  [-0.0051849326, 0.16519417, 0.06066864, 0.0311...  \n",
       "15  [0.010789718, 0.017695278, 0.018274685, -0.033...  \n",
       "16  [-0.041632365, 0.034463193, 0.06313535, 0.0160...  \n",
       "17  [0.015000028, 0.024906091, 0.0010010687, 0.011...  \n",
       "18  [-0.020330371, 0.079401195, 0.02162953, -0.080...  \n",
       "19  [-0.030572662, 0.04520395, 0.04553928, -0.0925...  \n",
       "20  [-0.014194918, 0.032352123, -0.0070426096, -0....  \n",
       "21  [-0.018195461, 0.032041155, 0.059233848, -0.03...  \n",
       "22  [-0.00980266, 0.04713828, 0.05187823, -0.03932...  \n",
       "23  [-0.013646452, 0.028070105, 0.05104053, -0.064...  \n",
       "24  [-0.044623584, 0.054378174, 0.0015794634, -0.0...  \n",
       "25  [0.016778275, 0.09543604, 0.048281595, -0.0022...  \n",
       "26  [-0.06051296, 0.031862404, -0.031203829, -0.07...  \n",
       "27  [-0.01787689, 0.07915241, -0.004790489, -0.031...  \n",
       "28  [0.011406942, 0.02994747, 0.06136875, -0.02639...  \n",
       "29  [-0.001325855, 0.089781284, 0.05144235, -0.036...  "
      ]
     },
     "execution_count": 185,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "print(\"All data in the table:\")\n",
    "table.query()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "1IqGPc21yGc3"
   },
   "source": [
    "##### Query with Filters"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 164
    },
    "collapsed": true,
    "id": "9FdMeiywbBpp",
    "outputId": "c1650759-cef5-4857-8a37-d0f1a1823c47"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Querying data where age >= 30 and city is Rome or Paris:\n"
     ]
    },
    {
     "data": {
      "application/vnd.google.colaboratory.intrinsic+json": {
       "summary": "{\n  \"name\": \"table\",\n  \"rows\": 2,\n  \"fields\": [\n    {\n      \"column\": \"id\",\n      \"properties\": {\n        \"dtype\": \"int32\",\n        \"num_unique_values\": 2,\n        \"samples\": [\n          8,\n          3\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"name\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 2,\n        \"samples\": [\n          \"Ivy\",\n          \"Monica\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"age\",\n      \"properties\": {\n        \"dtype\": \"int16\",\n        \"num_unique_values\": 2,\n        \"samples\": [\n          52,\n          35\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"city\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 2,\n        \"samples\": [\n          \"Rome\",\n          \"Paris\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"description\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 2,\n        \"samples\": [\n          \"A fitness trainer with 5 years of experience who enjoys helping people achieve their health goals.\",\n          \"A data scientist in Tokyo with 4 years of experience in machine learning and a keen interest in AI research.\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"embeddings\",\n      \"properties\": {\n        \"dtype\": \"object\",\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    }\n  ]\n}",
       "type": "dataframe"
      },
      "text/html": [
       "\n",
       "  <div id=\"df-508297d5-d28a-47b8-ab66-a37a12fda125\" class=\"colab-df-container\">\n",
       "    <div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>name</th>\n",
       "      <th>age</th>\n",
       "      <th>city</th>\n",
       "      <th>description</th>\n",
       "      <th>embeddings</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3</td>\n",
       "      <td>Monica</td>\n",
       "      <td>35</td>\n",
       "      <td>Paris</td>\n",
       "      <td>A data scientist in Tokyo with 4 years of expe...</td>\n",
       "      <td>[-0.008097345, 0.030305384, 0.012246384, -0.04...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>8</td>\n",
       "      <td>Ivy</td>\n",
       "      <td>52</td>\n",
       "      <td>Rome</td>\n",
       "      <td>A fitness trainer with 5 years of experience w...</td>\n",
       "      <td>[0.0002550126, 0.024398372, 0.09861772, 0.0062...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>\n",
       "    <div class=\"colab-df-buttons\">\n",
       "\n",
       "  <div class=\"colab-df-container\">\n",
       "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-508297d5-d28a-47b8-ab66-a37a12fda125')\"\n",
       "            title=\"Convert this dataframe to an interactive table.\"\n",
       "            style=\"display:none;\">\n",
       "\n",
       "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
       "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
       "  </svg>\n",
       "    </button>\n",
       "\n",
       "  <style>\n",
       "    .colab-df-container {\n",
       "      display:flex;\n",
       "      gap: 12px;\n",
       "    }\n",
       "\n",
       "    .colab-df-convert {\n",
       "      background-color: #E8F0FE;\n",
       "      border: none;\n",
       "      border-radius: 50%;\n",
       "      cursor: pointer;\n",
       "      display: none;\n",
       "      fill: #1967D2;\n",
       "      height: 32px;\n",
       "      padding: 0 0 0 0;\n",
       "      width: 32px;\n",
       "    }\n",
       "\n",
       "    .colab-df-convert:hover {\n",
       "      background-color: #E2EBFA;\n",
       "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
       "      fill: #174EA6;\n",
       "    }\n",
       "\n",
       "    .colab-df-buttons div {\n",
       "      margin-bottom: 4px;\n",
       "    }\n",
       "\n",
       "    [theme=dark] .colab-df-convert {\n",
       "      background-color: #3B4455;\n",
       "      fill: #D2E3FC;\n",
       "    }\n",
       "\n",
       "    [theme=dark] .colab-df-convert:hover {\n",
       "      background-color: #434B5C;\n",
       "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
       "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
       "      fill: #FFFFFF;\n",
       "    }\n",
       "  </style>\n",
       "\n",
       "    <script>\n",
       "      const buttonEl =\n",
       "        document.querySelector('#df-508297d5-d28a-47b8-ab66-a37a12fda125 button.colab-df-convert');\n",
       "      buttonEl.style.display =\n",
       "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
       "\n",
       "      async function convertToInteractive(key) {\n",
       "        const element = document.querySelector('#df-508297d5-d28a-47b8-ab66-a37a12fda125');\n",
       "        const dataTable =\n",
       "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
       "                                                    [key], {});\n",
       "        if (!dataTable) return;\n",
       "\n",
       "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
       "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
       "          + ' to learn more about interactive tables.';\n",
       "        element.innerHTML = '';\n",
       "        dataTable['output_type'] = 'display_data';\n",
       "        await google.colab.output.renderOutput(dataTable, element);\n",
       "        const docLink = document.createElement('div');\n",
       "        docLink.innerHTML = docLinkHtml;\n",
       "        element.appendChild(docLink);\n",
       "      }\n",
       "    </script>\n",
       "  </div>\n",
       "\n",
       "\n",
       "<div id=\"df-915e3ed9-57cc-4c6d-963b-bc953be0ae3c\">\n",
       "  <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-915e3ed9-57cc-4c6d-963b-bc953be0ae3c')\"\n",
       "            title=\"Suggest charts\"\n",
       "            style=\"display:none;\">\n",
       "\n",
       "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
       "     width=\"24px\">\n",
       "    <g>\n",
       "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
       "    </g>\n",
       "</svg>\n",
       "  </button>\n",
       "\n",
       "<style>\n",
       "  .colab-df-quickchart {\n",
       "      --bg-color: #E8F0FE;\n",
       "      --fill-color: #1967D2;\n",
       "      --hover-bg-color: #E2EBFA;\n",
       "      --hover-fill-color: #174EA6;\n",
       "      --disabled-fill-color: #AAA;\n",
       "      --disabled-bg-color: #DDD;\n",
       "  }\n",
       "\n",
       "  [theme=dark] .colab-df-quickchart {\n",
       "      --bg-color: #3B4455;\n",
       "      --fill-color: #D2E3FC;\n",
       "      --hover-bg-color: #434B5C;\n",
       "      --hover-fill-color: #FFFFFF;\n",
       "      --disabled-bg-color: #3B4455;\n",
       "      --disabled-fill-color: #666;\n",
       "  }\n",
       "\n",
       "  .colab-df-quickchart {\n",
       "    background-color: var(--bg-color);\n",
       "    border: none;\n",
       "    border-radius: 50%;\n",
       "    cursor: pointer;\n",
       "    display: none;\n",
       "    fill: var(--fill-color);\n",
       "    height: 32px;\n",
       "    padding: 0;\n",
       "    width: 32px;\n",
       "  }\n",
       "\n",
       "  .colab-df-quickchart:hover {\n",
       "    background-color: var(--hover-bg-color);\n",
       "    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
       "    fill: var(--button-hover-fill-color);\n",
       "  }\n",
       "\n",
       "  .colab-df-quickchart-complete:disabled,\n",
       "  .colab-df-quickchart-complete:disabled:hover {\n",
       "    background-color: var(--disabled-bg-color);\n",
       "    fill: var(--disabled-fill-color);\n",
       "    box-shadow: none;\n",
       "  }\n",
       "\n",
       "  .colab-df-spinner {\n",
       "    border: 2px solid var(--fill-color);\n",
       "    border-color: transparent;\n",
       "    border-bottom-color: var(--fill-color);\n",
       "    animation:\n",
       "      spin 1s steps(1) infinite;\n",
       "  }\n",
       "\n",
       "  @keyframes spin {\n",
       "    0% {\n",
       "      border-color: transparent;\n",
       "      border-bottom-color: var(--fill-color);\n",
       "      border-left-color: var(--fill-color);\n",
       "    }\n",
       "    20% {\n",
       "      border-color: transparent;\n",
       "      border-left-color: var(--fill-color);\n",
       "      border-top-color: var(--fill-color);\n",
       "    }\n",
       "    30% {\n",
       "      border-color: transparent;\n",
       "      border-left-color: var(--fill-color);\n",
       "      border-top-color: var(--fill-color);\n",
       "      border-right-color: var(--fill-color);\n",
       "    }\n",
       "    40% {\n",
       "      border-color: transparent;\n",
       "      border-right-color: var(--fill-color);\n",
       "      border-top-color: var(--fill-color);\n",
       "    }\n",
       "    60% {\n",
       "      border-color: transparent;\n",
       "      border-right-color: var(--fill-color);\n",
       "    }\n",
       "    80% {\n",
       "      border-color: transparent;\n",
       "      border-right-color: var(--fill-color);\n",
       "      border-bottom-color: var(--fill-color);\n",
       "    }\n",
       "    90% {\n",
       "      border-color: transparent;\n",
       "      border-bottom-color: var(--fill-color);\n",
       "    }\n",
       "  }\n",
       "</style>\n",
       "\n",
       "  <script>\n",
       "    async function quickchart(key) {\n",
       "      const quickchartButtonEl =\n",
       "        document.querySelector('#' + key + ' button');\n",
       "      quickchartButtonEl.disabled = true;  // To prevent multiple clicks.\n",
       "      quickchartButtonEl.classList.add('colab-df-spinner');\n",
       "      try {\n",
       "        const charts = await google.colab.kernel.invokeFunction(\n",
       "            'suggestCharts', [key], {});\n",
       "      } catch (error) {\n",
       "        console.error('Error during call to suggestCharts:', error);\n",
       "      }\n",
       "      quickchartButtonEl.classList.remove('colab-df-spinner');\n",
       "      quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
       "    }\n",
       "    (() => {\n",
       "      let quickchartButtonEl =\n",
       "        document.querySelector('#df-915e3ed9-57cc-4c6d-963b-bc953be0ae3c button');\n",
       "      quickchartButtonEl.style.display =\n",
       "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
       "    })();\n",
       "  </script>\n",
       "</div>\n",
       "\n",
       "    </div>\n",
       "  </div>\n"
      ],
      "text/plain": [
       "   id    name  age   city                                        description  \\\n",
       "0   3  Monica   35  Paris  A data scientist in Tokyo with 4 years of expe...   \n",
       "1   8     Ivy   52   Rome  A fitness trainer with 5 years of experience w...   \n",
       "\n",
       "                                          embeddings  \n",
       "0  [-0.008097345, 0.030305384, 0.012246384, -0.04...  \n",
       "1  [0.0002550126, 0.024398372, 0.09861772, 0.0062...  "
      ]
     },
     "execution_count": 158,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "print(\"Querying data where age >= 30 and city is Rome or Paris:\")\n",
    "table.query(filter=[(\">=\", \"age\", 30), (\"in\", \"city\", [\"Rome\", \"Paris\"])])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "collapsed": true,
    "id": "oDnVcKV5bML1",
    "outputId": "01fa9a05-c554-4270-fd78-b04c3b493d6d"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Aggregated query for names and cities:\n",
      "       name           city\n",
      "0     Alice       New York\n",
      "1       Bob         London\n",
      "2   Charlie       New York\n",
      "3    Monica          Paris\n",
      "4       Eve         Berlin\n",
      "5     Frank       New York\n",
      "6     Grace  San Francisco\n",
      "7    Hannah      Amsterdam\n",
      "8       Ivy           Rome\n",
      "9      Jack        Toronto\n",
      "10     Kara        Chicago\n",
      "11      Leo      Barcelona\n",
      "12      Mia         Madrid\n",
      "13     Nate       New York\n",
      "14   Olivia         Moscow\n",
      "15     Paul          Dubai\n",
      "16    Quinn      Singapore\n",
      "17     Rita       New York\n",
      "18      Sam       Istanbul\n",
      "19     Tina         Munich\n",
      "20      Uma         Vienna\n",
      "21   Victor         Dublin\n",
      "22    Wendy         Zurich\n",
      "23   Xander      Stockholm\n",
      "24     Yara         Lisbon\n",
      "25     Zane         Prague\n",
      "26    Alice       Budapest\n",
      "27     Cody         Berlin\n",
      "28    Diana     Copenhagen\n",
      "29    Ethan          Seoul\n"
     ]
    }
   ],
   "source": [
    "print(\"Aggregated query for names and cities:\")\n",
    "print(table.query(aggs={\"name\": \"name\", \"city\": \"city\"})) # returns only the names and cities"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "collapsed": true,
    "id": "dyj4HxfUbTW4",
    "outputId": "8ecf8d36-9cf7-4b56-a270-edd1b567325c"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Aggregated query for maximum age grouped by city:\n",
      "             city  maxAge\n",
      "0           Seoul      18\n",
      "1          Munich      19\n",
      "2          Madrid      20\n",
      "3          Moscow      23\n",
      "4         Toronto      23\n",
      "5       Amsterdam      24\n",
      "6          London      25\n",
      "7   San Francisco      26\n",
      "8          Dublin      30\n",
      "9           Dubai      31\n",
      "10         Prague      32\n",
      "11      Singapore      32\n",
      "12     Copenhagen      35\n",
      "13          Paris      35\n",
      "14         Lisbon      44\n",
      "15      Barcelona      45\n",
      "16       Budapest      46\n",
      "17           Rome      52\n",
      "18      Stockholm      52\n",
      "19         Vienna      53\n",
      "20         Berlin      55\n",
      "21        Chicago      55\n",
      "22       Istanbul      56\n",
      "23       New York      58\n",
      "24         Zurich      59\n"
     ]
    }
   ],
   "source": [
    "print(\"Aggregated query for maximum age grouped by city:\")\n",
    "print(table.query(aggs={'maxAge': ['max', 'age']}, group_by=['city'], sort_columns=['maxAge']))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 851
    },
    "collapsed": true,
    "id": "Ex8dE0cmbUXy",
    "outputId": "8340e11b-c1d4-4734-b7b0-040a93a0ac6a"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Aggregated query for average age and count of distinct people in each city:\n"
     ]
    },
    {
     "data": {
      "application/vnd.google.colaboratory.intrinsic+json": {
       "summary": "{\n  \"name\": \"table\",\n  \"rows\": 25,\n  \"fields\": [\n    {\n      \"column\": \"city\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 25,\n        \"samples\": [\n          \"Dublin\",\n          \"Lisbon\",\n          \"Seoul\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"avgAge\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 13.004219827937904,\n        \"min\": 18.0,\n        \"max\": 59.0,\n        \"num_unique_values\": 20,\n        \"samples\": [\n          18.0,\n          55.0,\n          52.0\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"countCity\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 0,\n        \"min\": 1,\n        \"max\": 5,\n        \"num_unique_values\": 3,\n        \"samples\": [\n          1,\n          5,\n          2\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    }\n  ]\n}",
       "type": "dataframe"
      },
      "text/html": [
       "\n",
       "  <div id=\"df-07ac5e33-bf99-46a5-b6bd-1a4d6df150ea\" class=\"colab-df-container\">\n",
       "    <div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>city</th>\n",
       "      <th>avgAge</th>\n",
       "      <th>countCity</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Seoul</td>\n",
       "      <td>18.0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Munich</td>\n",
       "      <td>19.0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Madrid</td>\n",
       "      <td>20.0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Moscow</td>\n",
       "      <td>23.0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Toronto</td>\n",
       "      <td>23.0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Amsterdam</td>\n",
       "      <td>24.0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>London</td>\n",
       "      <td>25.0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>San Francisco</td>\n",
       "      <td>26.0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Dublin</td>\n",
       "      <td>30.0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Dubai</td>\n",
       "      <td>31.0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>Prague</td>\n",
       "      <td>32.0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>Singapore</td>\n",
       "      <td>32.0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>Copenhagen</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>Paris</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>New York</td>\n",
       "      <td>35.6</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>Berlin</td>\n",
       "      <td>44.0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>Lisbon</td>\n",
       "      <td>44.0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>Barcelona</td>\n",
       "      <td>45.0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>Budapest</td>\n",
       "      <td>46.0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>Rome</td>\n",
       "      <td>52.0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>Stockholm</td>\n",
       "      <td>52.0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>Vienna</td>\n",
       "      <td>53.0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>Chicago</td>\n",
       "      <td>55.0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>Istanbul</td>\n",
       "      <td>56.0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>Zurich</td>\n",
       "      <td>59.0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>\n",
       "  </div>\n"
      ],
      "text/plain": [
       "             city  avgAge  countCity\n",
       "0           Seoul    18.0          1\n",
       "1          Munich    19.0          1\n",
       "2          Madrid    20.0          1\n",
       "3          Moscow    23.0          1\n",
       "4         Toronto    23.0          1\n",
       "5       Amsterdam    24.0          1\n",
       "6          London    25.0          1\n",
       "7   San Francisco    26.0          1\n",
       "8          Dublin    30.0          1\n",
       "9           Dubai    31.0          1\n",
       "10         Prague    32.0          1\n",
       "11      Singapore    32.0          1\n",
       "12     Copenhagen    35.0          1\n",
       "13          Paris    35.0          1\n",
       "14       New York    35.6          5\n",
       "15         Berlin    44.0          2\n",
       "16         Lisbon    44.0          1\n",
       "17      Barcelona    45.0          1\n",
       "18       Budapest    46.0          1\n",
       "19           Rome    52.0          1\n",
       "20      Stockholm    52.0          1\n",
       "21         Vienna    53.0          1\n",
       "22        Chicago    55.0          1\n",
       "23       Istanbul    56.0          1\n",
       "24         Zurich    59.0          1"
      ]
     },
     "execution_count": 199,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "print(\"Aggregated query for average age and count of distinct people in each city:\")\n",
    "table.query(aggs={'avgAge': ['avg', 'age'], 'countCity': ['count', 'id']}, group_by=['city'], sort_columns=['avgAge'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "WKkXE7-yvtQf"
   },
   "source": [
    "##### Customizing Filters"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "collapsed": true,
    "id": "0TR-GFQ7bW5_",
    "outputId": "ef98b740-aad6-4ebb-9266-36e8d49d6ccd"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Querying data where age < 30 and name starts with H:\n",
      "   id    name  age       city  \\\n",
      "0   7  Hannah   24  Amsterdam   \n",
      "\n",
      "                                         description  \\\n",
      "0  A professional photographer with 6 years of ex...   \n",
      "\n",
      "                                          embeddings  \n",
      "0  [0.04469839, 0.07050187, 0.046390466, -0.03404...  \n"
     ]
    }
   ],
   "source": [
    "print(\"Querying data where age < 30 and name starts with H:\")\n",
    "# \"like\" matches a wildcard pattern: \"H*\" selects names that begin with H.\n",
    "print(table.query(filter=[(\"<\", \"age\", 30), (\"like\", \"name\", \"H*\")]))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "collapsed": true,
    "id": "ySsv3TZxbYbu",
    "outputId": "fbf7919c-9f77-4d7b-e108-6eda3b6d210d"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Querying data where age > 30 and city is Rome or New York:\n",
      "   id   name  age      city  \\\n",
      "0   0  Alice   58  New York   \n",
      "1   5  Frank   32  New York   \n",
      "2   8    Ivy   52      Rome   \n",
      "3  17   Rita   50  New York   \n",
      "\n",
      "                                         description  \\\n",
      "0  A passionate environmentalist with 5 years of ...   \n",
      "1  A graphic designer based in Berlin with 8 year...   \n",
      "2  A fitness trainer with 5 years of experience w...   \n",
      "3  A financial analyst with 5 years of experience...   \n",
      "\n",
      "                                          embeddings  \n",
      "0  [-0.006158471, 0.063678846, 0.09181005, -0.023...  \n",
      "1  [0.013257692, 0.045190323, 0.0074770325, -0.00...  \n",
      "2  [0.0002550126, 0.024398372, 0.09861772, 0.0062...  \n",
      "3  [0.015000028, 0.024906091, 0.0010010687, 0.011...  \n"
     ]
    }
   ],
   "source": [
    "print(\"Querying data where age > 30 and city is Rome or New York:\")\n",
    "print(table.query(filter=[(\">\", \"age\", 30), (\"in\", \"city\", [\"Rome\", \"New York\"])]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "I3sX3ucWvzDZ"
   },
   "source": [
    "#### Vector Search"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "cuxZlgX2wwHp"
   },
   "source": [
    "##### Embedding a Query Vector and Searching"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "c8KMGQO2i7gT"
   },
   "outputs": [],
   "source": [
    "person_query = \"a software engineer with lots of experience\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "dBjWSreqjGOd"
   },
   "outputs": [],
   "source": [
    "# embed() yields one vector per input text; take the first and convert it to a plain list.\n",
    "person_embedding = list(embedding_model.embed([person_query]))[0].tolist()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "collapsed": true,
    "id": "9Ud3vcsbjOJ4",
    "outputId": "8d11a7f1-57f8-425c-d676-c82c4065e167"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "384"
      ]
     },
     "execution_count": 174,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(person_embedding)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "collapsed": true,
    "id": "Zhld2bOLbfn3",
    "outputId": "12fe2934-ed17-4d4e-91dd-04a3cc7859ff"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Searching for the three closest people to the example vector:\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "[   id   name  age      city  \\\n",
       " 0  26  Alice   46  Budapest   \n",
       " 1  12    Mia   20    Madrid   \n",
       " 2  19   Tina   19    Munich   \n",
       " \n",
       "                                          description  \\\n",
       " 0  A software architect with 9 years of experienc...   \n",
       " 1  A software developer with 6 years of experienc...   \n",
       " 2  A UX designer with 6 years of experience in cr...   \n",
       " \n",
       "                                           embeddings  __nn_distance  \n",
       " 0  [-0.06051296, 0.031862404, -0.031203829, -0.07...       0.322560  \n",
       " 1  [-0.04372146, 0.06704399, 0.022140108, -0.1017...       0.356629  \n",
       " 2  [-0.030572662, 0.04520395, 0.04553928, -0.0925...       0.457364  ]"
      ]
     },
     "execution_count": 175,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "print(\"Searching for the three closest people to the example vector:\")\n",
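    "# search() returns a list with one DataFrame per query vector; n=3 keeps the three nearest rows.\n",
    "table.search({index_name: [person_embedding]}, n=3)"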
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "ashW6LLjv7zb"
   },
   "source": [
    "##### Batch Search with Multiple Query Vectors"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "collapsed": true,
    "id": "4GFgKzsCbhRg",
    "outputId": "9be28206-4142-4038-8fc6-6b87ecffcb29"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Batch search with multiple query vectors:\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "[   id   name  age      city  \\\n",
       " 0  26  Alice   46  Budapest   \n",
       " 1  12    Mia   20    Madrid   \n",
       " 2  19   Tina   19    Munich   \n",
       " \n",
       "                                          description  \\\n",
       " 0  A software architect with 9 years of experienc...   \n",
       " 1  A software developer with 6 years of experienc...   \n",
       " 2  A UX designer with 6 years of experience in cr...   \n",
       " \n",
       "                                           embeddings  __nn_distance  \n",
       " 0  [-0.06051296, 0.031862404, -0.031203829, -0.07...       0.322560  \n",
       " 1  [-0.04372146, 0.06704399, 0.022140108, -0.1017...       0.356629  \n",
       " 2  [-0.030572662, 0.04520395, 0.04553928, -0.0925...       0.457364  ,\n",
       "    id    name  age    city                                        description  \\\n",
       " 0   3  Monica   35   Paris  A data scientist in Tokyo with 4 years of expe...   \n",
       " 1  24    Yara   44  Lisbon  A business analyst with 7 years of experience ...   \n",
       " 2  27    Cody   55  Berlin  A research scientist with 8 years of experienc...   \n",
       " \n",
       "                                           embeddings  __nn_distance  \n",
       " 0  [-0.008097345, 0.030305384, 0.012246384, -0.04...       0.231473  \n",
       " 1  [-0.044623584, 0.054378174, 0.0015794634, -0.0...       0.617759  \n",
       " 2  [-0.01787689, 0.07915241, -0.004790489, -0.031...       0.642228  ]"
      ]
     },
     "execution_count": 176,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "print(\"Batch search with multiple query vectors:\")\n",
    "queries = [\"a software engineer with lots of experience\", \"a data scientist with experience in machine learning and a keen interest in AI research\"]\n",
    "queries_embeddings = list(embedding_model.embed(queries))\n",
    "table.search(vectors={index_name: [q.tolist() for q in queries_embeddings]}, n=3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "nApqLnYdwIAF"
   },
   "source": [
    "##### Combining Aggregations with Vector Search"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "collapsed": true,
    "id": "VtgsqjOWbisW",
    "outputId": "114c1f09-c68b-45b2-b673-132c72b7292e"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Searching with aggregated results for name, city, and description:\n",
      "[    name      city                                        description\n",
      "0  Alice  Budapest  A software architect with 9 years of experienc...\n",
      "1    Mia    Madrid  A software developer with 6 years of experienc...\n",
      "2   Tina    Munich  A UX designer with 6 years of experience in cr...]\n"
     ]
    }
   ],
   "source": [
    "print(\"Searching with aggregated results for name, city, and description:\")\n",
    "print(table.search(vectors={index_name: [person_embedding]}, n=3, aggs={\"name\": \"name\", \"city\": \"city\", \"description\": \"description\"}))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "7b4GvZSrweRS"
   },
   "source": [
    "##### Vector Search with Filters"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "collapsed": true,
    "id": "cSkZPE3FblJD",
    "outputId": "046c490f-8df8-47cd-8643-0667bcf8bd79"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Searching with filter to find people younger than 30:\n",
      "[    name  age                                        description\n",
      "0    Mia   20  A software developer with 6 years of experienc...\n",
      "1   Tina   19  A UX designer with 6 years of experience in cr...\n",
      "2    Bob   25  A software engineer with 7 years of experience...\n",
      "3  Ethan   18  A public relations specialist with 7 years of ...\n",
      "4   Jack   23  A chef with 12 years of experience who runs a ...]\n"
     ]
    }
   ],
   "source": [
    "print(\"Searching with filter to find people younger than 30:\")\n",
    "print(table.search(vectors={index_name: [person_embedding]}, n=5, filter=[(\"<\", \"age\", 30)], aggs={\"name\": \"name\", \"age\": \"age\", \"description\": \"description\"}))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "_uPg6NsgwgEc"
   },
   "source": [
    "### Drop Table To Conserve Resources"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "wzZvVkTBbmgg",
    "outputId": "84df09a6-b19b-4a46-8e1d-77b0b822590c"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 179,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "table.drop()"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "provenance": [],
   "toc_visible": true
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  },
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}


================================================
FILE: KDB.AI_course/course_specific_content/managing_tables.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "bb2094b8-13a5-4f7c-bd21-d2c709dab914",
   "metadata": {
    "id": "bb2094b8-13a5-4f7c-bd21-d2c709dab914"
   },
   "source": [
    "# Managing Tables in KDB.AI\n",
    "[Video Walkthrough](https://www.youtube.com/watch?v=XH5iNkcFKXc&list=PLypX5sYuDqvrqsXTw876gGHosCKvK_7QS&index=6)\n",
    "\n",
    "##### Note: This example requires a KDB.AI endpoint and API key. Sign up for a free [KDB.AI account](https://kdb.ai/get-started).\n",
    "\n",
    "\n",
    "\n",
    "This notebook shows how to get started with the KDB.AI vector database. You'll get a quick taste of KDB.AI in about 10 minutes.\n",
    "\n",
    "You will learn how to:\n",
    "\n",
    "1. Connect to KDB.AI\n",
    "1. Create a KDB.AI Table\n",
    "1. Add Data to the KDB.AI Table\n",
    "1. Query the Table\n",
    "1. Perform Similarity Search\n",
    "1. Delete the KDB.AI Table"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "260d0f4b-ef09-4bd2-a197-a9351be24684",
   "metadata": {
    "id": "260d0f4b-ef09-4bd2-a197-a9351be24684"
   },
   "source": [
    "## 0. Setup"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d1468bd3",
   "metadata": {
    "id": "d1468bd3"
   },
   "source": [
    "### Install dependencies\n",
    "\n",
    "In order to successfully run this sample, note the following steps depending on where you are running this notebook:\n",
    "\n",
    "- ***Run Locally / Private Environment:*** The [Setup](https://github.com/KxSystems/kdbai-samples/blob/main/README.md#setup) steps in the repository's `README.md` will guide you on prerequisites and how to run this with Jupyter.\n",
    "\n",
    "- ***Colab / Hosted Environment:*** Open this notebook in Colab and run through the cells."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "491cd6d6",
   "metadata": {
    "id": "491cd6d6"
   },
   "outputs": [],
   "source": [
    "!pip install kdbai_client"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cc6d17b7",
   "metadata": {
    "id": "cc6d17b7"
   },
   "source": [
    "### Import Packages"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "805d97da",
   "metadata": {
    "id": "805d97da"
   },
   "outputs": [],
   "source": [
    "# vector DB\n",
    "import os\n",
    "from getpass import getpass\n",
    "import kdbai_client as kdbai\n",
    "import time"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a55ae34e-472b-4aa7-9add-1fcb2ee24a41",
   "metadata": {
    "id": "a55ae34e-472b-4aa7-9add-1fcb2ee24a41"
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8c660c7d",
   "metadata": {
    "id": "8c660c7d"
   },
   "source": [
    "To search vector embeddings efficiently, we need to store them in a vector database.\n",
    "\n",
    "### Connect to KDB.AI Server\n",
    "\n",
    "To use KDB.AI Server, you will need to download and run your own container.\n",
    "To do this, you will first need to sign up for free [here](https://trykdb.kx.com/kdbaiserver/signup/).\n",
    "\n",
    "You will receive an email with the required license file and bearer token needed to download your instance.\n",
    "Follow instructions in the signup email to get your session up and running.\n",
    "\n",
    "Once the [setup steps](https://code.kx.com/kdbai/gettingStarted/kdb-ai-server-setup.html) are complete you can then connect to your KDB.AI Server session using `kdbai.Session` and passing your local endpoint."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2e85c1ff",
   "metadata": {
    "id": "2e85c1ff"
   },
   "outputs": [],
   "source": [
    "#Set up KDB.AI server endpoint \n",
    "KDBAI_ENDPOINT = (\n",
    "    os.environ[\"KDBAI_ENDPOINT\"]\n",
    "    if \"KDBAI_ENDPOINT\" in os.environ\n",
    "    else \"http://localhost:8082\"\n",
    ")\n",
    "\n",
    "\n",
    "#connect to KDB.AI Server, default mode is qipc\n",
    "session = kdbai.Session(endpoint=KDBAI_ENDPOINT)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6f330098",
   "metadata": {
    "id": "6f330098"
   },
   "outputs": [],
   "source": [
    "database = session.database(\"default\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1ec2c77b",
   "metadata": {
    "id": "1ec2c77b"
   },
   "source": [
    "<div class=\"alert alert-block alert-info\">\n",
    "<b>Need help understanding a function?</b><br/>\n",
    "Add ? before or after any function name in KDB.AI to bring up the documentation for that function along with sample code and arguments.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6e54917b",
   "metadata": {
    "id": "6e54917b"
   },
   "outputs": [],
   "source": [
    "?kdbai.Session"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8788a6b1",
   "metadata": {
    "id": "8788a6b1"
   },
   "source": [
    "### Verify Defined Tables\n",
    "\n",
    "We can check our connection using the `database.tables` property.\n",
    "This returns a list of all the tables defined in the database so far.\n",
    "On a fresh server, this should return an empty list."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "97e5f4a9",
   "metadata": {
    "id": "97e5f4a9"
   },
   "outputs": [],
   "source": [
    "# ensure no table called \"data\" exists\n",
    "try:\n",
    "    database.table(\"data\").drop()\n",
    "except kdbai.KDBAIException:\n",
    "    pass"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7877f51c",
   "metadata": {
    "id": "7877f51c",
    "outputId": "a6deb89e-0325-4686-f111-b611f5acb2e5"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[]"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "database.tables"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e33f03c3",
   "metadata": {
    "id": "e33f03c3"
   },
   "source": [
    "## 2. Create a KDB.AI Table\n",
    "\n",
    "To create a table we can use `create_table`. This function takes two mandatory arguments: the name and the schema of the table.\n",
    "\n",
    "This schema must meet the following criteria:\n",
    "- It must contain a list of columns.\n",
    "- All columns must have `type` specified.\n",
    "\n",
    "If you want to create indexes, you must provide them as a separate parameter:\n",
    "- It must contain a list of index definitions.\n",
    "- Each index must have `name`, `column` and `type` attributes. Index-specific parameters can be passed in `params`, which is mandatory for some index types.\n",
    "\n",
    "Run `?database.create_table` for more details and sample code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6f40400b",
   "metadata": {
    "id": "6f40400b"
   },
   "outputs": [],
   "source": [
    "?database.create_table"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9da55253",
   "metadata": {
    "id": "9da55253"
   },
   "source": [
    "### Define Schema\n",
    "\n",
    "Our table will have two columns: the first, `id`, holds a list of dummy IDs; the second, `vectors`, holds the vector embeddings we will use for similarity search later in this example.\n",
    "\n",
    "We will define our dimensionality, similarity metric and index type in the `indexes` parameter. For this example we chose:\n",
    "- `dims = 8` : In the next section, we generate eight-dimensional embeddings to match this. You can choose any value here.\n",
    "- `metric = L2` : We chose [L2/Euclidean distance](https://en.wikipedia.org/wiki/Euclidean_distance), which is well suited to our low-dimensional dummy dataset. You can use other metrics here, such as [IP/Inner Product](https://en.wikipedia.org/wiki/Inner_product_space) and [CS/Cosine Similarity](https://en.wikipedia.org/wiki/Cosine_similarity); the one you choose depends on the specific context and nature of your data.\n",
    "- `type = flat` : We use a [Flat index](https://faiss.ai/cpp_api/struct/structfaiss_1_1IndexFlat.html) here because our data structure is simple, so this is more than adequate. You can use other indexes here, such as [HNSW](https://faiss.ai/cpp_api/struct/structfaiss_1_1IndexHNSW.html) and [IVFPQ](https://faiss.ai/cpp_api/struct/structfaiss_1_1IndexIVFPQ.html); as with metrics, the one you choose depends on your data and your overall performance requirements."
   ]
  },
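  {
   "cell_type": "markdown",
   "id": "metric-comparison-note",
   "metadata": {},
   "source": [
    "As a rough illustration of how the three metrics differ, the sketch below computes each one with plain NumPy for two made-up vectors. The variable names are illustrative, and this is not a description of how KDB.AI computes distances internally."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "metric-comparison-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Two made-up 8-dimensional vectors (illustrative values only)\n",
    "a = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8], dtype=np.float32)\n",
    "b = np.array([0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9], dtype=np.float32)\n",
    "\n",
    "l2 = np.linalg.norm(a - b)  # Euclidean distance: smaller means closer\n",
    "ip = np.dot(a, b)  # inner product: larger means more similar\n",
    "cs = ip / (np.linalg.norm(a) * np.linalg.norm(b))  # cosine similarity: ignores vector length\n",
    "\n",
    "print(f\"L2={l2:.4f}  IP={ip:.4f}  CS={cs:.4f}\")"
   ]
  },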
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e5e8b782",
   "metadata": {
    "id": "e5e8b782"
   },
   "outputs": [],
   "source": [
    "schema = [\n",
    "    {\"name\": \"id\", \"type\": \"str\"},\n",
    "    {\"name\": \"vectors\", \"type\": \"float32s\"},\n",
    "]\n",
    "\n",
    "index_name = \"flat_index\"\n",
    "indexes = [{\"name\": index_name, \"column\": \"vectors\", \"type\": \"flat\", \"params\": {\"dims\": 8, \"metric\": \"L2\"}}]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "09a5caa0",
   "metadata": {
    "id": "09a5caa0"
   },
   "source": [
    "### Create Table"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "34067680",
   "metadata": {
    "id": "34067680"
   },
   "outputs": [],
   "source": [
    "table = database.create_table(\"data\", schema=schema, indexes=indexes)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "20afbea1",
   "metadata": {
    "id": "20afbea1"
   },
   "source": [
    "## 3. Add Data to the KDB.AI Table\n",
    "\n",
    "First, generate an array of five 8-dimensional vectors, which will serve as the vector embeddings in this example. We will then add these to a pandas DataFrame with column names and types matching the target table."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "37581e86",
   "metadata": {
    "id": "37581e86"
   },
   "outputs": [],
   "source": [
    "# Create a NumPy array of 5 eight-dimensional float32 arrays\n",
    "vectors = np.array(\n",
    "    [\n",
    "        [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],\n",
    "        [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],\n",
    "        [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],\n",
    "        [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1],\n",
    "        [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1, 0.2],\n",
    "    ],\n",
    "    dtype=np.float32,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f5dc41e8",
   "metadata": {
    "id": "f5dc41e8"
   },
   "outputs": [],
   "source": [
    "# Example ID values\n",
    "ids = [\"h\", \"e\", \"l\", \"l\", \"o\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "730c9f08",
   "metadata": {
    "id": "730c9f08"
   },
   "outputs": [],
   "source": [
    "# column names/types matching the schema\n",
    "embeddings = pd.DataFrame({\"id\": ids, \"vectors\": list(vectors)})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4a31f878",
   "metadata": {
    "id": "4a31f878",
    "outputId": "933caa30-7fd4-4d11-c717-ecfff36fa6c9"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>vectors</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>h</td>\n",
       "      <td>[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>e</td>\n",
       "      <td>[0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>l</td>\n",
       "      <td>[0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>l</td>\n",
       "      <td>[0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>o</td>\n",
       "      <td>[0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1, 0.2]</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  id                                   vectors\n",
       "0  h  [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]\n",
       "1  e  [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]\n",
       "2  l  [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]\n",
       "3  l  [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1]\n",
       "4  o  [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1, 0.2]"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "embeddings"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "43cd2ad8",
   "metadata": {
    "id": "43cd2ad8"
   },
   "source": [
    "We can now add data to our KDB.AI table using `insert`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b7e0f8c5",
   "metadata": {
    "id": "b7e0f8c5"
   },
   "outputs": [],
   "source": [
    "table.insert(embeddings)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "09577e8e",
   "metadata": {
    "id": "09577e8e"
   },
   "source": [
    "## 4. Query the Table\n",
    "\n",
    "We can use `query` to query data from the table."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f4b8b8e5",
   "metadata": {
    "id": "f4b8b8e5",
    "outputId": "f96e7323-a9f0-4154-abef-dd012be6b1b9"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>vectors</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>h</td>\n",
       "      <td>[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>e</td>\n",
       "      <td>[0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>l</td>\n",
       "      <td>[0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>l</td>\n",
       "      <td>[0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>o</td>\n",
       "      <td>[0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1, 0.2]</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  id                                   vectors\n",
       "0  h  [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]\n",
       "1  e  [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]\n",
       "2  l  [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]\n",
       "3  l  [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1]\n",
       "4  o  [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1, 0.2]"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "table.query()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9c267a58",
   "metadata": {
    "id": "9c267a58"
   },
   "source": [
    "## 5. Perform Similarity Search\n",
    "\n",
    "Finally, let's perform similarity search on the table. We do this using the `search` function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "595829ff",
   "metadata": {
    "id": "595829ff"
   },
   "outputs": [],
   "source": [
    "?table.search"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9bb341f3",
   "metadata": {
    "id": "9bb341f3"
   },
   "source": [
    "<div class=\"alert alert-block alert-warning\">\n",
    "<b>Note:</b> The dimension of the input query vectors must match the embedding dimension of the table, set by <code>dims</code> in the index definition above.\n",
    "</div>"
   ]
  },
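  {
   "cell_type": "markdown",
   "id": "query-dims-check-note",
   "metadata": {},
   "source": [
    "The dimension requirement in the note above can also be checked on the client before calling `search`. The helper below, `check_query_dims`, is hypothetical and not part of `kdbai_client`; it is a minimal sketch that assumes the query vectors can be converted to a 2-D NumPy array."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "query-dims-check-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "\n",
    "def check_query_dims(query_vectors, dims):\n",
    "    # Hypothetical helper: raise ValueError unless every query vector has `dims` elements\n",
    "    arr = np.asarray(query_vectors, dtype=np.float32)\n",
    "    if arr.ndim != 2 or arr.shape[1] != dims:\n",
    "        raise ValueError(f\"expected query vectors of shape (n, {dims}), got {arr.shape}\")\n",
    "    return arr\n",
    "\n",
    "\n",
    "# One query vector with 8 elements, matching dims=8 in the index definition\n",
    "checked = check_query_dims([[0.1, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]], dims=8)\n",
    "print(checked.shape)"
   ]
  },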
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c9301c97",
   "metadata": {
    "id": "c9301c97",
    "outputId": "880dfe5d-86e2-4487-d3d6-771c7be40f57"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[  id                                   vectors  __nn_distance\n",
       " 0  e  [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]           0.01]"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Find the closest neighbor of a single query vector\n",
    "table.search(vectors={index_name: [[0.1, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]]}, n=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "49758e9d",
   "metadata": {
    "id": "49758e9d"
   },
   "source": [
    "<div class=\"alert alert-block alert-warning\">\n",
    "<b>Note:</b> The output was a list of length one, matching the number of vectors we input to the search. This can be indexed on position [0] to extract the dataframe corresponding to the single input vector.\n",
    "</div>"
   ]
  },
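  {
   "cell_type": "markdown",
   "id": "brute-force-check-note",
   "metadata": {},
   "source": [
    "A flat index compares the query against every stored vector, so the result above can be cross-checked with a few lines of NumPy. The sketch below repeats the sample data under new names (`sample_ids`, `sample_vectors`) so that it runs on its own. The squared Euclidean distance to the nearest vector comes out as 0.01, the same value as the `__nn_distance` shown above, which suggests the reported distance is a squared distance; that is an inference from this output, not a documented guarantee."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "brute-force-check-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Same sample data as above, repeated so this cell runs on its own\n",
    "sample_ids = [\"h\", \"e\", \"l\", \"l\", \"o\"]\n",
    "sample_vectors = np.array(\n",
    "    [\n",
    "        [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],\n",
    "        [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],\n",
    "        [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],\n",
    "        [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1],\n",
    "        [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1, 0.2],\n",
    "    ],\n",
    "    dtype=np.float32,\n",
    ")\n",
    "query = np.array([0.1, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9], dtype=np.float32)\n",
    "\n",
    "# Squared Euclidean distance from the query to every stored vector\n",
    "sq_dists = ((sample_vectors - query) ** 2).sum(axis=1)\n",
    "nearest = int(np.argmin(sq_dists))\n",
    "print(sample_ids[nearest], round(float(sq_dists[nearest]), 2))  # prints: e 0.01"
   ]
  },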
  {
   "cell_type": "markdown",
   "id": "d8aed9bc-72b2-4e70-b763-e7ce054557db",
   "metadata": {
    "id": "d8aed9bc-72b2-4e70-b763-e7ce054557db"
   },
   "source": [
    "## 6. Delete the KDB.AI Table\n",
    "\n",
    "We can use `table.drop()` to delete a table."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "548a9d95-aac3-4d63-a87a-99eedfe55f07",
   "metadata": {
    "id": "548a9d95-aac3-4d63-a87a-99eedfe55f07",
    "outputId": "53a714f1-ad13-410c-dbdb-39e5a22e7a86"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "table.drop()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8bc6d801-1371-48d0-98b4-0baa53bc8446",
   "metadata": {
    "id": "8bc6d801-1371-48d0-98b4-0baa53bc8446"
   },
   "source": [
    "<div class=\"alert alert-block alert-warning\">\n",
    "<b>Warning:</b> Once you drop a table, you cannot use it again.\n",
    "</div>"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.16"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}


================================================
FILE: KDB.AI_course/course_specific_content/rag_example.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "on0mJqL80KsJ"
   },
   "source": [
    "## Introduction\n",
    "\n",
    "[Video Walkthrough](https://www.youtube.com/watch?v=Obbn15rZfvQ&list=PLypX5sYuDqvrqsXTw876gGHosCKvK_7QS&index=13)\n",
    "\n",
    "This notebook demonstrates the implementation of a Retrieval-Augmented Generation (RAG) pipeline using KDB.AI and Large Language Models. By the end of this tutorial, you'll understand how to leverage vector databases and LLMs to create an effective RAG system."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "s3Eb0JnV0lVJ"
   },
   "source": [
    "### Setup and Dependencies\n",
    "Install kdbai_client and import the necessary dependencies"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "x68BCmLZ15N2"
   },
   "source": [
    "##### Install Required Libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "collapsed": true,
    "id": "OnhoXtx5ggta",
    "outputId": "1e69c47d-0034-47fb-fcd9-35ccade1d6d2"
   },
   "outputs": [],
   "source": [
    "# Install required libraries\n",
    "!pip install llama-index fastembed openai kdbai_client onnxruntime==1.19.2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "LHMN8-Vd2ANx"
   },
   "source": [
    "##### Import Dependencies"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "id": "RHlEgCWExKo3"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "from getpass import getpass\n",
    "import kdbai_client as kdbai\n",
    "import time\n",
    "from llama_index.core import Document, SimpleDirectoryReader\n",
    "from llama_index.core.node_parser import SentenceSplitter\n",
    "import pandas as pd\n",
    "from fastembed import TextEmbedding\n",
    "import openai\n",
    "import textwrap"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Set up OpenAI API key\n",
    "OPENAI_API_KEY = (\n",
    "    os.environ[\"OPENAI_API_KEY\"]\n",
    "    if \"OPENAI_API_KEY\" in os.environ\n",
    "    else getpass(\"OpenAI API key: \")\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "0aC8tMVy0vPv"
   },
   "source": [
    "To search vector embeddings efficiently, we need to store them in a vector database.\n",
    "\n",
    "### Connect to KDB.AI Server\n",
    "\n",
    "To use KDB.AI Server, you will need to download and run your own container.\n",
    "To do this, you will first need to sign up for free [here](https://trykdb.kx.com/kdbaiserver/signup/).\n",
    "\n",
    "You will receive an email with the required license file and bearer token needed to download your instance.\n",
    "Follow instructions in the signup email to get your session up and running.\n",
    "\n",
    "Once the [setup steps](https://code.kx.com/kdbai/gettingStarted/kdb-ai-server-setup.html) are complete you can then connect to your KDB.AI Server session using `kdbai.Session` and passing your local endpoint."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "4rhRF58Wwxhj",
    "outputId": "355c7966-b409-4a52-f86c-b4f62755df97"
   },
   "outputs": [],
   "source": [
    "\n",
    "\n",
    "#Set up KDB.AI server endpoint \n",
    "KDBAI_ENDPOINT = (\n",
    "    os.environ[\"KDBAI_ENDPOINT\"]\n",
    "    if \"KDBAI_ENDPOINT\" in os.environ\n",
    "    else \"http://localhost:8082\"\n",
    ")\n",
    "\n",
    "\n",
    "#connect to KDB.AI Server, default mode is qipc\n",
    "session = kdbai.Session(endpoint=KDBAI_ENDPOINT)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "TI-33lMv1LYi"
   },
   "source": [
    "##### Initialize Embedding Model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 336
    },
    "id": "cemGdbEkufnu",
    "outputId": "4de02d2c-338e-4034-e784-c26a6abb8550"
   },
   "outputs": [],
   "source": [
    "fastembed = TextEmbedding()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "sd3rjwg-kmWL"
   },
   "source": [
    "### Data Preparation\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "24Sj5uYC2SBf"
   },
   "source": [
    "##### Download Dataset\n",
    "We'll use the Paul Graham Essay Dataset as our corpus."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "ZgL3bM7GPkUa",
    "outputId": "0d2e7772-b5dd-4d02-a23d-9ce62d39343e"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "100%|█████████████████████████████████████████████| 1/1 [00:00<00:00,  4.37it/s]\n",
      "Successfully downloaded PaulGrahamEssayDataset to ./data\n"
     ]
    }
   ],
   "source": [
    "!llamaindex-cli download-llamadataset PaulGrahamEssayDataset --download-dir ./data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "BXoGTn42lE-T"
   },
   "source": [
    "### Create a KDB.AI session and table"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "-gOC5u2KW32F"
   },
   "outputs": [],
   "source": [
    "KDBAI_TABLE_NAME = \"paul_graham\"\n",
    "database = session.database(\"default\")\n",
    "\n",
    "# Drop existing table if it exists\n",
    "try:\n",
    "    database.table(KDBAI_TABLE_NAME).drop()\n",
    "except kdbai.KDBAIException:\n",
    "    pass"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "id": "SP9oocI1Z2j1"
   },
   "outputs": [],
   "source": [
    "# Define table schema\n",
    "\n",
    "schema = [\n",
    "    dict(name=\"text\", type=\"bytes\"),\n",
    "    dict(name=\"embedding\", type=\"float32s\")\n",
    "]\n",
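    "index_name = \"flat_index\"\n",
    "# Flat (exhaustive) index using L2 distance; dims must match the embedding model's output size\n",
    "indexes = [dict(name=index_name, column=\"embedding\", type=\"flat\", params=dict(metric=\"L2\", dims=384))]\n",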
    "\n",
    "table = database.create_table(KDBAI_TABLE_NAME, schema=schema, indexes=indexes)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "THCzKUyS3E2B"
   },
   "source": [
    "##### Load and Parse Documents"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "mwFpmgDSzZ_A",
    "outputId": "78987b65-47af-4d4f-b405-50ad264bb041"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "46"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
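    "# Split the essays into chunks of up to 500 tokens, with 100 tokens of overlap between neighbors\n",
    "node_parser = SentenceSplitter(chunk_size=500, chunk_overlap=100)\n",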
    "essays = SimpleDirectoryReader(input_dir=\"./data/source_files\").load_data()\n",
    "docs = node_parser.get_nodes_from_documents(essays)\n",
    "len(docs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "fpeYGnog3MLs"
   },
   "source": [
    "##### Generate Embeddings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 49
    },
    "id": "gjhkkyqHaA5k",
    "outputId": "d7e8f17b-c3e1-406a-f8d0-e35bb22165cf"
   },
   "outputs": [],
   "source": [
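    "# Embed every chunk with FastEmbed's default model\n",
    "embedding_model = TextEmbedding()\n",
    "documents = [doc.text for doc in docs]\n",
    "# embed() returns a generator, so materialize it into a list\n",
    "embeddings = list(embedding_model.embed(documents))"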
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "xMHUKw8FcYDZ",
    "outputId": "e05c1849-9647-4c9c-bca3-a7f6628bf7b0"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "So I'm not surprised I can't remember any programs I wrote, because they can't have done much. My clearest memory is of the moment I learned it was possible for programs not to terminate, when one of mine didn't. On a machine without time-sharing, this was a social as well as a technical error, as the data center manager's expression made clear.\n",
      "\n",
      "With microcomputers, everything changed. Now you could have a computer sitting right in front of you, on a desk, that could respond to your keystrokes as it was running instead of just churning through a stack of punch cards and then stopping. [1]\n",
      "\n",
      "The first of my friends to get a microcomputer built it himself. It was sold as a kit by Heathkit. I remember vividly how impressed and envious I felt watching him sitting in front of it, typing programs right into the computer.\n",
      "\n",
      "Computers were expensive in those days and it took me years of nagging before I convinced my father to buy one, a TRS-80, in about 1980. The gold standard then was the Apple II, but a TRS-80 was good enough. This was when I really started programming. I wrote simple games, a program to predict how high my model rockets would fly, and a word processor that my father used to write at least one book. There was only room in memory for about 2 pages of text, so he'd write 2 pages at a time and then print them out, but it was a lot better than a typewriter.\n",
      "\n",
      "Though I liked programming, I didn't plan to study it in college. In college I was going to study philosophy, which sounded much more powerful. It seemed, to my naive high school self, to be the study of the ultimate truths, compared to which the things studied in other fields would be mere domain knowledge. What I discovered when I got to college was that the other fields took up so much of the space of ideas that there wasn't much left for these supposed ultimate truths. All that seemed left for philosophy were edge cases that people in other fields felt could safely be ignored.\n",
      "\n",
      "I couldn't have put this into words when I was 18.\n"
     ]
    }
   ],
   "source": [
    "print(documents[1])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "OCWVBc0c3Sbs"
   },
   "source": [
    "##### Insert Data into KDB.AI Table"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "id": "LcWJHw4caCt3"
   },
   "outputs": [],
   "source": [
    "records_to_insert_with_embeddings = pd.DataFrame({\n",
    "    \"text\": [d.encode('utf-8') for d in documents],\n",
    "    \"embedding\": embeddings\n",
    "})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "3TXly52oaGvE",
    "outputId": "3116011d-fdd9-4255-9086-73240d31e4f4"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'rowsInserted': 46}"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "table.insert(records_to_insert_with_embeddings)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "0QtreXjz3W_t"
   },
   "source": [
    "### RAG Implementation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "bQUkhjTS3Ypm"
   },
   "source": [
    "##### Define Query and Generate Embedding"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "id": "m-zgeI0BaLOc"
   },
   "outputs": [],
   "source": [
    "query = \"How does Paul Graham decide what to work on?\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "id": "3-YRNxNjaNZT"
   },
   "outputs": [],
   "source": [
    "query_embedding = list(embedding_model.embed([query]))[0].tolist()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "zg8SklaJ3biG"
   },
   "source": [
    "##### Perform Vector Search"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "id": "gWSOhN6uaQzQ"
   },
   "outputs": [],
   "source": [
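    "# search() returns one DataFrame per query vector; request the 10 nearest chunks for our single query\n",
    "search_results = table.search({index_name: [query_embedding]}, n=10)\n",
    "search_results_df = search_results[0]"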
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 842
    },
    "id": "BcM2f_KPaS72",
    "outputId": "e7397190-88b2-47a2-cd67-dc1c80a7d09d"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Top Search Results Based on Query: How does Paul Graham decide what to work on?\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>__nn_distance</th>\n",
       "      <th>text</th>\n",
       "      <th>embedding</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.823007</td>\n",
       "      <td>b'In late 2015 I spent 3 months writing essays, and when I went back to working on Bel I could barely understand the code. Not so much because it was badly written as because the problem is so convoluted. When you\\'re working on an interpreter written in itself, it\\'s hard to keep track of what\\'s happening at what level, and errors can be practically encrypted by the time you get them.\\n\\nSo I said no more essays till Bel was done. But I told few people about Bel while I was working on it. So for years it must have seemed that I was doing nothing, when in fact I was working harder than I\\'d ever worked on anything. Occasionally after wrestling for hours with some gruesome bug I\\'d check Twitter or HN and see someone asking \"Does Paul Graham still code?\"\\n\\nWorking on Bel was hard but satisfying. I worked on it so intensively that at any given time I had a decent chunk of the code in my head and could write more there. I remember taking the boys to the coast on a sunny day in 2015 and figuring out how to deal with some problem involving continuations while I watched them play in the tide pools. It felt like I was doing life right. I remember that because I was slightly dismayed at how novel it felt. The good news is that I had more moments like this over the next few years.\\n\\nIn the summer of 2016 we moved to England. We wanted our kids to see what it was like living in another country, and since I was a British citizen by birth, that seemed the obvious choice. We only meant to stay for a year, but we liked it so much that we still live there. So most of Bel was written in England.\\n\\nIn the fall of 2019, Bel was finally finished. Like McCarthy\\'s original Lisp, it\\'s a spec rather than an implementation, although like McCarthy\\'s Lisp it\\'s a spec expressed as code.\\n\\nNow that I could write essays again, I wrote a bunch about topics I\\'d had stacked up. 
I kept writing essays through 2020, but I also started to think about other things I could work on. How should I choose what to do?'</td>\n",
       "      <td>[-0.05267877, 0.005840427, -0.01187801, -0.028083289, 0.029767925, -0.01268333, -0.009753024, -0.011209541, 0.030792488, -0.07470311, 0.0005716741, 0.034681723, -0.0025648128, -0.007870674, -0.037071493, -0.0026503617, -0.030294443, -0.046712548, -0.026220752, -0.010382689, -0.047210008, 0.0039388337, -0.009324926, 0.04539282, 0.04298206, 0.051068194, 0.029527958, -0.012021941, -0.051774003, -0.20419116, -0.019487105, 0.03856181, 0.054865412, -0.024023462, 0.005628216, 0.059498444, -0.023029648, -0.011461271, 0.0007990732, 0.01532533, 0.013435846, 0.009714834, 0.010104686, -0.014338494, 0.004052569, 0.020879505, 0.0112869395, -0.048422333, 0.025670612, 0.033183247, -0.071020156, -0.032056253, -0.0013147242, 0.045764726, -0.023884403, 0.013609344, 0.021824384, 0.0791942, 0.0021155155, -0.0058458406, 0.022163069, -0.0010415328, -0.1377265, 0.05194325, -0.035091735, 0.020503322, -0.03358411, -0.039575316, -0.018544003, 0.07090187, -0.030203853, 0.0024145627, -0.050365325, 0.1062729, 0.04504893, 0.020158818, -0.0055481945, 0.0020900085, 0.014658697, -0.01600323, 0.018643875, -0.020128626, 0.001960821, 0.014573526, -0.018745624, -0.011082115, -0.026627902, 0.035287272, 0.033186108, 0.004842385, 0.04288919, -0.051519115, 0.021143924, 0.03511711, -0.032461487, -0.053802498, -2.9269107e-05, 0.022274038, -0.019326271, 0.5066904, ...]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.851789</td>\n",
       "      <td>b\"He wanted to start a startup to make nuclear reactors. But I kept at it, and in October 2013 he finally agreed. We decided he'd take over starting with the winter 2014 batch. For the rest of 2013 I left running YC more and more to Sam, partly so he could learn the job, and partly because I was focused on my mother, whose cancer had returned.\\n\\nShe died on January 15, 2014. We knew this was coming, but it was still hard when it did.\\n\\nI kept working on YC till March, to help get that batch of startups through Demo Day, then I checked out pretty completely. (I still talk to alumni and to new startups working on things I'm interested in, but that only takes a few hours a week.)\\n\\nWhat should I do next? Rtm's advice hadn't included anything about that. I wanted to do something completely different, so I decided I'd paint. I wanted to see how good I could get if I really focused on it. So the day after I stopped working on YC, I started painting. I was rusty and it took a while to get back into shape, but it was at least completely engaging. [18]\\n\\nI spent most of the rest of 2014 painting. I'd never been able to work so uninterruptedly before, and I got to be better than I had been. Not good enough, but better. Then in November, right in the middle of a painting, I ran out of steam. Up till that point I'd always been curious to see how the painting I was working on would turn out, but suddenly finishing this one seemed like a chore. So I stopped working on it and cleaned my brushes and haven't painted since. So far anyway.\\n\\nI realize that sounds rather wimpy. But attention is a zero sum game. If you can choose what to work on, and you choose a project that's not the best one (or at least a good one) for you, then it's getting in the way of another project that is. And at 50 there was some opportunity cost to screwing around.\"</td>\n",
       "      <td>[-0.04173409, -0.020306244, 0.026670614, -0.028619805, 0.013841975, -0.004587492, -0.03740281, -0.0023207841, -0.005583664, -0.02458708, 0.032301717, -0.003981511, -0.0022139344, 0.040776156, 0.008303966, 0.065411426, -0.05266241, -0.0147317415, -0.013039435, -0.02108635, -0.08220996, -0.023095597, 0.009018569, -0.06593445, 0.053503707, 0.02561, -0.011278506, -0.029375598, -0.02894449, -0.17977206, 0.015862752, 0.037204675, 0.028550476, -0.008014831, 0.050124772, 0.053289328, -0.037882008, -0.004310019, -0.040979013, 0.031382367, -0.019382592, 0.041386265, -0.06535482, -0.03808074, 0.013384267, 0.010357172, 0.0032444543, -0.052392986, 0.042238504, 0.020043798, -0.028322041, -0.055793695, -0.011091505, 0.020135079, -0.003494716, 0.01618655, 0.08450317, 0.040414557, 0.032989975, 0.011764182, -0.013049825, -0.029259514, -0.102057606, 0.016020596, 0.016062474, 0.010199196, -0.009390674, -0.043287795, 0.034758028, 0.13968067, 0.025622727, 0.016510569, -0.02354023, 0.073845506, 0.009602881, -0.049839057, 0.022470307, 0.043024465, 0.0017405926, -0.028580481, 0.0027170023, 0.010050958, -0.013109462, 0.014532717, -0.04200619, 0.01677191, -0.07769759, 0.0073121856, 0.0189732, 0.08225239, 0.052873313, 0.020460907, 0.017190987, -0.025781311, -0.057865854, -0.015826138, 0.04352462, 0.040577717, -0.045354914, 0.47870147, ...]</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   __nn_distance  \\\n",
       "0       0.823007   \n",
       "1       0.851789   \n",
       "\n",
       "                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
                            text  \\\n",
       "0  b'In late 2015 I spent 3 months writing essays, and when I went back to working on Bel I could barely understand the code. Not so much because it was badly written as because the problem is so convoluted. When you\\'re working on an interpreter written in itself, it\\'s hard to keep track of what\\'s happening at what level, and errors can be practically encrypted by the time you get them.\\n\\nSo I said no more essays till Bel was done. But I told few people about Bel while I was working on it. So for years it must have seemed that I was doing nothing, when in fact I was working harder than I\\'d ever worked on anything. Occasionally after wrestling for hours with some gruesome bug I\\'d check Twitter or HN and see someone asking \"Does Paul Graham still code?\"\\n\\nWorking on Bel was hard but satisfying. I worked on it so intensively that at any given time I had a decent chunk of the code in my head and could write more there. I remember taking the boys to the coast on a sunny day in 2015 and figuring out how to deal with some problem involving continuations while I watched them play in the tide pools. It felt like I was doing life right. I remember that because I was slightly dismayed at how novel it felt. The good news is that I had more moments like this over the next few years.\\n\\nIn the summer of 2016 we moved to England. We wanted our kids to see what it was like living in another country, and since I was a British citizen by birth, that seemed the obvious choice. We only meant to stay for a year, but we liked it so much that we still live there. So most of Bel was written in England.\\n\\nIn the fall of 2019, Bel was finally finished. Like McCarthy\\'s original Lisp, it\\'s a spec rather than an implementation, although like McCarthy\\'s Lisp it\\'s a spec expressed as code.\\n\\nNow that I could write essays again, I wrote a bunch about topics I\\'d had stacked up. 
I kept writing essays through 2020, but I also started to think about other things I could work on. How should I choose what to do?'   \n",
       "1                                                                                                                                                                  b\"He wanted to start a startup to make nuclear reactors. But I kept at it, and in October 2013 he finally agreed. We decided he'd take over starting with the winter 2014 batch. For the rest of 2013 I left running YC more and more to Sam, partly so he could learn the job, and partly because I was focused on my mother, whose cancer had returned.\\n\\nShe died on January 15, 2014. We knew this was coming, but it was still hard when it did.\\n\\nI kept working on YC till March, to help get that batch of startups through Demo Day, then I checked out pretty completely. (I still talk to alumni and to new startups working on things I'm interested in, but that only takes a few hours a week.)\\n\\nWhat should I do next? Rtm's advice hadn't included anything about that. I wanted to do something completely different, so I decided I'd paint. I wanted to see how good I could get if I really focused on it. So the day after I stopped working on YC, I started painting. I was rusty and it took a while to get back into shape, but it was at least completely engaging. [18]\\n\\nI spent most of the rest of 2014 painting. I'd never been able to work so uninterruptedly before, and I got to be better than I had been. Not good enough, but better. Then in November, right in the middle of a painting, I ran out of steam. Up till that point I'd always been curious to see how the painting I was working on would turn out, but suddenly finishing this one seemed like a chore. So I stopped working on it and cleaned my brushes and haven't painted since. So far anyway.\\n\\nI realize that sounds rather wimpy. But attention is a zero sum game. If you can choose what to work on, and you choose a project that's not the best one (or at least a good one) for you, then it's getting in the way of another project that is. 
And at 50 there was some opportunity cost to screwing around.\"   \n",
       "\n",
       "                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            embedding  \n",
       "0  [-0.05267877, 0.005840427, -0.01187801, -0.028083289, 0.029767925, -0.01268333, -0.009753024, -0.011209541, 0.030792488, -0.07470311, 0.0005716741, 0.034681723, -0.0025648128, -0.007870674, -0.037071493, -0.0026503617, -0.030294443, -0.046712548, -0.026220752, -0.010382689, -0.047210008, 0.0039388337, -0.009324926, 0.04539282, 0.04298206, 0.051068194, 0.029527958, -0.012021941, -0.051774003, -0.20419116, -0.019487105, 0.03856181, 0.054865412, -0.024023462, 0.005628216, 0.059498444, -0.023029648, -0.011461271, 0.0007990732, 0.01532533, 0.013435846, 0.009714834, 0.010104686, -0.014338494, 0.004052569, 0.020879505, 0.0112869395, -0.048422333, 0.025670612, 0.033183247, -0.071020156, -0.032056253, -0.0013147242, 0.045764726, -0.023884403, 0.013609344, 0.021824384, 0.0791942, 0.0021155155, -0.0058458406, 0.022163069, -0.0010415328, -0.1377265, 0.05194325, -0.035091735, 0.020503322, -0.03358411, -0.039575316, -0.018544003, 0.07090187, -0.030203853, 0.0024145627, -0.050365325, 0.1062729, 0.04504893, 0.020158818, -0.0055481945, 0.0020900085, 0.014658697, -0.01600323, 0.018643875, -0.020128626, 0.001960821, 0.014573526, -0.018745624, -0.011082115, -0.026627902, 0.035287272, 0.033186108, 0.004842385, 0.04288919, -0.051519115, 0.021143924, 0.03511711, -0.032461487, -0.053802498, -2.9269107e-05, 0.022274038, -0.019326271, 0.5066904, ...]  \n",
       "1               [-0.04173409, -0.020306244, 0.026670614, -0.028619805, 0.013841975, -0.004587492, -0.03740281, -0.0023207841, -0.005583664, -0.02458708, 0.032301717, -0.003981511, -0.0022139344, 0.040776156, 0.008303966, 0.065411426, -0.05266241, -0.0147317415, -0.013039435, -0.02108635, -0.08220996, -0.023095597, 0.009018569, -0.06593445, 0.053503707, 0.02561, -0.011278506, -0.029375598, -0.02894449, -0.17977206, 0.015862752, 0.037204675, 0.028550476, -0.008014831, 0.050124772, 0.053289328, -0.037882008, -0.004310019, -0.040979013, 0.031382367, -0.019382592, 0.041386265, -0.06535482, -0.03808074, 0.013384267, 0.010357172, 0.0032444543, -0.052392986, 0.042238504, 0.020043798, -0.028322041, -0.055793695, -0.011091505, 0.020135079, -0.003494716, 0.01618655, 0.08450317, 0.040414557, 0.032989975, 0.011764182, -0.013049825, -0.029259514, -0.102057606, 0.016020596, 0.016062474, 0.010199196, -0.009390674, -0.043287795, 0.034758028, 0.13968067, 0.025622727, 0.016510569, -0.02354023, 0.073845506, 0.009602881, -0.049839057, 0.022470307, 0.043024465, 0.0017405926, -0.028580481, 0.0027170023, 0.010050958, -0.013109462, 0.014532717, -0.04200619, 0.01677191, -0.07769759, 0.0073121856, 0.0189732, 0.08225239, 0.052873313, 0.020460907, 0.017190987, -0.025781311, -0.057865854, -0.015826138, 0.04352462, 0.040577717, -0.045354914, 0.47870147, ...]  "
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.set_option('display.max_colwidth', None)\n",
    "print(\"Top Search Results Based on Query:\", query)\n",
    "df = pd.DataFrame(search_results_df)\n",
    "df.head(2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "WnIeeDVJ3g47"
   },
   "source": [
    "##### RAG Function Definition"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "fhGl5YDXaXak"
   },
   "outputs": [],
   "source": [
    "def RAG(retrieved_data, prompt):\n",
    "    \"\"\"Answer `prompt` with gpt-4o using only the retrieved chunks as context.\"\"\"\n",
    "    # Build one prompt string: instruction, query, then the retrieved context\n",
    "    messages = \"Answer the following query in three sentences based on the context and only the context: \" + \"\\n\"\n",
    "    messages += prompt + \"\\n\"\n",
    "    if len(retrieved_data) > 0:\n",
    "        messages += \"Context: \" + \"\\n\"\n",
    "        for data in retrieved_data:\n",
    "            # The table stores text as bytes, so decode each chunk back to str\n",
    "            messages += data.decode('utf-8') + \"\\n\"\n",
    "    openai.api_key = os.environ[\"OPENAI_API_KEY\"]\n",
    "    response = openai.chat.completions.create(\n",
    "        model=\"gpt-4o\",\n",
    "        messages=[\n",
    "            {\n",
    "                \"role\": \"user\",\n",
    "                \"content\": [\n",
    "                    {\"type\": \"text\", \"text\": messages},\n",
    "                ],\n",
    "            },\n",
    "        ],\n",
    "        max_tokens=300,\n",
    "    )\n",
    "    content = response.choices[0].message.content\n",
    "    return content"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Y4yJL1kp3pOr"
   },
   "source": [
    "##### Execute RAG Pipeline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "id": "hqrg036n3yUs"
   },
   "outputs": [],
   "source": [
    "# Utility Function for Text Wrapping\n",
    "\n",
    "def print_wrapped(text, width=80):\n",
    "    wrapper = textwrap.TextWrapper(width=width)\n",
    "    word_list = wrapper.wrap(text=text)\n",
    "    for line in word_list:\n",
    "        print(line)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "bI2xx6IygIN5",
    "outputId": "9af82ab0-4637-4501-c398-f0042946ef4d"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Query: How does Paul Graham decide what to work on?\n",
      "Paul Graham decides what to work on based on a mix of personal interest, the\n",
      "desire to explore untapped potential in various fields, and the influence of\n",
      "pivotal moments and advice from close acquaintances. His transition from working\n",
      "on Y Combinator, painting, and writing essays to developing the Bel programming\n",
      "language and exploring startup ideas, such as the web app for creating web apps,\n",
      "reflects a combination of seeking out deeply engaging projects and responding to\n",
      "unsolicited advice from trusted collaborators that prompts reflection on his\n",
      "trajectory. Graham's choices are driven by the pursuit of projects that not only\n",
      "challenge him but also promise a significant impact or learning opportunity,\n",
      "reflecting a deliberate process of selection influenced by both internal\n",
      "motivations and external inputs.\n"
     ]
    }
   ],
   "source": [
    "print(\"Query:\", query)\n",
    "\n",
    "print_wrapped(RAG(search_results_df[\"text\"],query))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "1TorfCfi37m2"
   },
   "source": [
    "### Drop Table To Conserve Resources"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "id": "tVmA5Dei36s3"
   },
   "outputs": [],
   "source": [
    "table.drop()"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "provenance": [],
   "toc_visible": true
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}


================================================
FILE: KDB.AI_course/notebook_references.md
================================================
# KDB.AI Course Notebook References

This document lists every notebook used in the KDB.AI course, both course-specific content and notebooks referenced from other parts of the repository, so that all course materials have a single point of reference.

## Course-Specific Content

These notebooks are located in the `KDB.AI_course/course_specific_content/` directory:

| Notebook Name | Description |
|---------------|-------------|
| [making_queries.ipynb](./course_specific_content/making_queries.ipynb) | Introduction to making queries in KDB.AI |
| [managing_tables.ipynb](./course_specific_content/managing_tables.ipynb) | Guide to managing tables in KDB.AI |
| [rag_example.ipynb](./course_specific_content/rag_example.ipynb) | Example of Retrieval Augmented Generation with KDB.AI |

## Referenced Notebooks

These notebooks live in other parts of the repository and are referenced by the course:

| Course Section | Notebook Name | Location | Description |
|----------------|---------------|----------|-------------|
| Advanced Search Techniques | Hybrid Search | [/hybrid_search/hybrid_search_inflation.ipynb](../hybrid_search/hybrid_search_inflation.ipynb) | Demonstrates hybrid search techniques using inflation data |
| Advanced Search Techniques | Temporal Similarity Search (Non-Transformed) | [/TSS_non_transformed/Temporal_Similarity_Search_Non-Transformed_Demo.ipynb](../TSS_non_transformed/Temporal_Similarity_Search_Non-Transformed_Demo.ipynb) | Covers non-transformed temporal similarity search |
| Advanced Search Techniques | Temporal Similarity Search (Transformed) | [/TSS_transformed/Temporal_Similarity_Search_Transformed_Demo.ipynb](../TSS_transformed/Temporal_Similarity_Search_Transformed_Demo.ipynb) | Explores transformed temporal similarity search |

## Additional Resources

These notebooks, while not directly part of the course, provide valuable supplementary information:

| Notebook Name | Location | Description |
|---------------|----------|-------------|
| Python Quickstart | [/quickstarts/python_quickstart.ipynb](../quickstarts/python_quickstart.ipynb) | Quick introduction to using KDB.AI with Python |
| Document Search | [/document_search/document_search.ipynb](../document_search/document_search.ipynb) | Example of document search implementation |
| Image Search | [/image_search/image_search.ipynb](../image_search/image_search.ipynb) | Demonstration of image search capabilities |

## Maintenance Notes

- This file should be updated whenever notebooks are added, removed, or relocated within the course or the main repository.
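
One way to keep this file in sync is to check that every relative link in it still resolves. The following is a minimal sketch, not part of the repository: the function name `find_broken_links` and the regular expression are illustrative assumptions, and it handles only inline Markdown links of the form `[text](target)`.

```python
import re
from pathlib import Path
from typing import List

# Matches inline Markdown links such as [text](target); reference-style links are not handled.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def find_broken_links(markdown_text: str, base_dir: Path) -> List[str]:
    """Return relative link targets that do not exist relative to base_dir.

    Hypothetical helper for illustration: base_dir is the directory that
    contains the Markdown file, so "../hybrid_search/x.ipynb" is resolved
    from there.
    """
    broken = []
    for target in LINK_RE.findall(markdown_text):
        # External URLs and in-page anchors cannot be checked on disk.
        if target.startswith(("http://", "https://", "mailto:", "#")):
            continue
        # Drop any "#fragment" suffix before testing the path.
        path_part = target.split("#", 1)[0]
        if not (base_dir / path_part).exists():
            broken.append(target)
    return broken
```

Run from the repository root, `find_broken_links(Path("KDB.AI_course/notebook_references.md").read_text(), Path("KDB.AI_course"))` should return an empty list when every referenced notebook is where the tables say it is.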

================================================
FILE: LICENSE
================================================
                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!)  The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright [yyyy] [name of copyright owner]

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specifi
Condensed preview: 71 files, each showing path, character count, and a content snippet (full structured content: 13,784K chars).
[
  {
    "path": ".gitignore",
    "chars": 37,
    "preview": "*.ipynb_checkpoints\n.venv/\n.DS_Store\n"
  },
  {
    "path": "HuggingFace_search/huggingface_inference.ipynb",
    "chars": 42501,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"bb2094b8-13a5-4f7c-bd21-d2c709dab914\",\n   \"metadata\": {\n    \"id\""
  },
  {
    "path": "KDB.AI_course/README.md",
    "chars": 1101,
    "preview": "# KDB.AI Course\n\nWelcome to the KDB.AI course! This course combines custom content with existing examples from the KDB.A"
  },
  {
    "path": "KDB.AI_course/course_specific_content/making_queries.ipynb",
    "chars": 86210,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {\n    \"id\": \"CpIrSWxiuFxX\"\n   },\n   \"source\": [\n    \"## Int"
  },
  {
    "path": "KDB.AI_course/course_specific_content/managing_tables.ipynb",
    "chars": 19792,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"bb2094b8-13a5-4f7c-bd21-d2c709dab914\",\n   \"metadata\": {\n    \"id\""
  },
  {
    "path": "KDB.AI_course/course_specific_content/rag_example.ipynb",
    "chars": 35329,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {\n    \"id\": \"on0mJqL80KsJ\"\n   },\n   \"source\": [\n    \"## Int"
  },
  {
    "path": "KDB.AI_course/notebook_references.md",
    "chars": 2596,
    "preview": "# KDB.AI Course Notebook References\n\nThis document provides a comprehensive list of all notebooks used in the KDB.AI cou"
  },
  {
    "path": "LICENSE",
    "chars": 11357,
    "preview": "                                 Apache License\n                           Version 2.0, January 2004\n                   "
  },
  {
    "path": "LlamaIndex_advanced_RAG/KDBAI_Advanced_RAG_Demo.ipynb",
    "chars": 38336,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"cb212650-86b7-4298-8bbd-c20a5227fbf0\",\n   \"metadata\": {\n    \"id\""
  },
  {
    "path": "LlamaIndex_samples/Hybrid_Search_LlamaIndex_KDBAI.ipynb",
    "chars": 54375,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"36bda246-f167-4114-a5d1-a053d8bb6faa\",\n   \"metadata\": {\n    \"id\""
  },
  {
    "path": "LlamaIndex_samples/Multimodal_RAG_LLamaIndex_CLIP_KDBAI.ipynb",
    "chars": 1315754,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"TTMDGImH5JOM\",\n   \"metadata\": {\n    \"id\": \"TTMDGImH5JOM\"\n   },\n "
  },
  {
    "path": "LlamaIndex_samples/Sub_Question_Query_Engine_LlamaIndex_KDBAI.ipynb",
    "chars": 32862,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"f134f72f-ba98-4eed-ac63-276c3057fa95\",\n   \"metadata\": {\n    \"id\""
  },
  {
    "path": "LlamaParse_pdf_RAG/llamaParse_demo.ipynb",
    "chars": 322948,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {\n    \"id\": \"NBD-4xlhnyxl\"\n   },\n   \"source\": [\n    \"## Par"
  },
  {
    "path": "README.md",
    "chars": 6858,
    "preview": "![KDB.AI Logo](https://kdb.ai/files/2024/01/kdbai-logo.svg)\n\nThe example [KDB.AI](https://kdb.ai) samples provided aim t"
  },
  {
    "path": "TSS_non_transformed/Non_Transformed_TSS_Technical_Analysis.ipynb",
    "chars": 550100,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {\n    \"id\": \"iSMlD8gdmdpz\"\n   },\n   \"source\": [\n    \"# Non-"
  },
  {
    "path": "TSS_non_transformed/Temporal_Similarity_Search_KDB+.ipynb",
    "chars": 1069208,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Temporal Similarity Search on KDB"
  },
  {
    "path": "TSS_non_transformed/Temporal_Similarity_Search_Non-Transformed_Demo.ipynb",
    "chars": 60013,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"58ae5b26-0b58-4416-a69d-f5662fce320d\",\n   \"metadata\": {},\n   \"so"
  },
  {
    "path": "TSS_non_transformed/createHDB.q",
    "chars": 3723,
    "preview": "dst:`:demo_hdb    / database root\n\nnumpart:10;\ned:2024.08.31; / end date the last date partition\ndts:{[ed;n] reverse n#d"
  },
  {
    "path": "TSS_transformed/Temporal_Similarity_Search_Transformed_Demo.ipynb",
    "chars": 123849,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"58ae5b26-0b58-4416-a69d-f5662fce320d\",\n   \"metadata\": {\n    \"id\""
  },
  {
    "path": "TSS_transformed/Transformed_TSS_pattern_matching.ipynb",
    "chars": 31352,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"b52fbdd6-10c5-4f52-b015-ce0f812c7d94\",\n   \"metadata\": {\n    \"id\""
  },
  {
    "path": "document_search/document_search.ipynb",
    "chars": 103286,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"3280b01a-d3b7-4ef6-9494-789d15bc48ec\",\n   \"metadata\": {\n    \"id\""
  },
  {
    "path": "fuzzy_filtering_on_metadata/fuzzy_filtering_demo.ipynb",
    "chars": 39510,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {\n    \"id\": \"WgVm-xwOXyhY\"\n   },\n   \"source\": [\n    \"# Fuzz"
  },
  {
    "path": "hybrid_search/data/inflation.txt",
    "chars": 13058,
    "preview": " At last year's Jackson Hole symposium, I delivered a brief, direct message. My remarks this year will be a bit longer, "
  },
  {
    "path": "hybrid_search/hybrid_search_inflation.ipynb",
    "chars": 142732,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {\n    \"id\": \"yxwE82TZNkco\"\n   },\n   \"source\": [\n    \"## Hyb"
  },
  {
    "path": "image_search/image_search.ipynb",
    "chars": 1386564,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"25588ca9-dc13-4136-962b-42e5d090fb31\",\n   \"metadata\": {\n    \"id\""
  },
  {
    "path": "metadata_filtering/metadata_filtering_demo.ipynb",
    "chars": 30130,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Metadata Filtering with KDB.AI Ve"
  },
  {
    "path": "multi_index_multimodal_search/data/bat1.txt",
    "chars": 66,
    "preview": "Bat with outstretched wings hovering over an apple on leafy branch"
  },
  {
    "path": "multi_index_multimodal_search/data/bat2.txt",
    "chars": 62,
    "preview": "a bat hanging upside down from a metal bar in an urban setting"
  },
  {
    "path": "multi_index_multimodal_search/data/bear1.txt",
    "chars": 54,
    "preview": "Brown bear walking through a forest with fallen leaves"
  },
  {
    "path": "multi_index_multimodal_search/data/bear2.txt",
    "chars": 57,
    "preview": "Close-up portrait of a grizzly bear's face and upper body"
  },
  {
    "path": "multi_index_multimodal_search/data/caterpillar1.txt",
    "chars": 44,
    "preview": "Bright green caterpillar on a wooden surface"
  },
  {
    "path": "multi_index_multimodal_search/data/caterpillar2.txt",
    "chars": 55,
    "preview": "Hairy caterpillar with spikes and spots on a green leaf"
  },
  {
    "path": "multi_index_multimodal_search/data/deer1.txt",
    "chars": 50,
    "preview": "Buck deer with antlers standing in misty grassland"
  },
  {
    "path": "multi_index_multimodal_search/data/deer2.txt",
    "chars": 59,
    "preview": "Side view of buck deer with large antlers in forest setting"
  },
  {
    "path": "multi_index_multimodal_search/data/fox1.txt",
    "chars": 45,
    "preview": "Fluffy fox curled up in snow with eyes closed"
  },
  {
    "path": "multi_index_multimodal_search/data/fox2.txt",
    "chars": 54,
    "preview": "Red fox with alert expression against rocky background"
  },
  {
    "path": "multi_index_multimodal_search/data/hedgehog1.txt",
    "chars": 71,
    "preview": "Small hedgehog curled up in someone's palm, surrounded by autumn colors"
  },
  {
    "path": "multi_index_multimodal_search/data/hedgehog2.txt",
    "chars": 61,
    "preview": "Hedgehog in a field of red flowers, some petals on its spines"
  },
  {
    "path": "multi_index_multimodal_search/multi_index_multimodal_search.ipynb",
    "chars": 19128,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {\n    \"id\": \"D0wSB_3mKl5D\"\n   },\n   \"source\": [\n    \"## Mul"
  },
  {
    "path": "multimodal_RAG_VoyageAI/Multimodal_RAG_VoyageAI.ipynb",
    "chars": 766939,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {\n    \"id\": \"J5EZ09A9XKSj\"\n   },\n   \"source\": [\n    \"## Mul"
  },
  {
    "path": "multimodal_RAG_VoyageAI/data/text/bat.txt",
    "chars": 338,
    "preview": "Bats are the only mammals capable of sustained flight, distinct from flying squirrels which can only glide. They have wi"
  },
  {
    "path": "multimodal_RAG_VoyageAI/data/text/bear.txt",
    "chars": 366,
    "preview": "Bears are large mammals with a stocky body, powerful limbs, and a short tail. They have a large brain and are considered"
  },
  {
    "path": "multimodal_RAG_VoyageAI/data/text/caterpillar.txt",
    "chars": 328,
    "preview": "Caterpillars are the larval stage of butterflies and moths. They have a segmented body with a distinct head and typicall"
  },
  {
    "path": "multimodal_RAG_VoyageAI/data/text/deer.txt",
    "chars": 314,
    "preview": "Deer are hoofed mammals known for their graceful bodies and long legs. Most male deer have antlers, which they shed and "
  },
  {
    "path": "multimodal_RAG_VoyageAI/data/text/fox.txt",
    "chars": 342,
    "preview": "Foxes are small to medium-sized, omnivorous mammals belonging to the Canidae family, which also includes wolves, dogs, a"
  },
  {
    "path": "multimodal_RAG_VoyageAI/data/text/hedgehog.txt",
    "chars": 396,
    "preview": "Hedgehogs are small, nocturnal mammals known for their distinctive spines, which are modified hairs. These spines provid"
  },
  {
    "path": "multimodal_RAG_unified_text/data/text/bat.txt",
    "chars": 338,
    "preview": "Bats are the only mammals capable of sustained flight, distinct from flying squirrels which can only glide. They have wi"
  },
  {
    "path": "multimodal_RAG_unified_text/data/text/bear.txt",
    "chars": 366,
    "preview": "Bears are large mammals with a stocky body, powerful limbs, and a short tail. They have a large brain and are considered"
  },
  {
    "path": "multimodal_RAG_unified_text/data/text/caterpillar.txt",
    "chars": 328,
    "preview": "Caterpillars are the larval stage of butterflies and moths. They have a segmented body with a distinct head and typicall"
  },
  {
    "path": "multimodal_RAG_unified_text/data/text/deer.txt",
    "chars": 314,
    "preview": "Deer are hoofed mammals known for their graceful bodies and long legs. Most male deer have antlers, which they shed and "
  },
  {
    "path": "multimodal_RAG_unified_text/data/text/fox.txt",
    "chars": 342,
    "preview": "Foxes are small to medium-sized, omnivorous mammals belonging to the Canidae family, which also includes wolves, dogs, a"
  },
  {
    "path": "multimodal_RAG_unified_text/data/text/hedgehog.txt",
    "chars": 396,
    "preview": "Hedgehogs are small, nocturnal mammals known for their distinctive spines, which are modified hairs. These spines provid"
  },
  {
    "path": "multimodal_RAG_unified_text/multi_modal_demo.ipynb",
    "chars": 248857,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Multimodal Retrieval Augmented Ge"
  },
  {
    "path": "music_recommendation/music_recommendation.ipynb",
    "chars": 155150,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"1c9731fb\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Music Recomm"
  },
  {
    "path": "pattern_matching/pattern_matching.ipynb",
    "chars": 537894,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"b52fbdd6-10c5-4f52-b015-ce0f812c7d94\",\n   \"metadata\": {},\n   \"so"
  },
  {
    "path": "qFlat_index_pdf_search/pdf_qFlat_Search.ipynb",
    "chars": 35395,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"3280b01a-d3b7-4ef6-9494-789d15bc48ec\",\n   \"metadata\": {\n    \"id\""
  },
  {
    "path": "qHnsw_index_pdf_search/pdf_qHNSW_Search.ipynb",
    "chars": 85831,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"3280b01a-d3b7-4ef6-9494-789d15bc48ec\",\n   \"metadata\": {\n    \"id\""
  },
  {
    "path": "quickstarts/python_quickstart.ipynb",
    "chars": 66408,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"bb2094b8-13a5-4f7c-bd21-d2c709dab914\",\n   \"metadata\": {\n    \"id\""
  },
  {
    "path": "requirements.txt",
    "chars": 206,
    "preview": "gensim >= 4.3\njupyter >= 1.0\nkdbai_client >= 0.1.2\nmatplotlib >= 3.7\nopenai >= 0.28\npypdf >= 3.0\nsentence-transformers >"
  },
  {
    "path": "retrieval_augmented_generation/data/state_of_the_union.txt",
    "chars": 38539,
    "preview": "Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices "
  },
  {
    "path": "retrieval_augmented_generation/retrieval_augmented_generation.ipynb",
    "chars": 34057,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"48eeba82\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Retrieval Au"
  },
  {
    "path": "retrieval_augmented_generation/retrieval_augmented_generation_evaluation.ipynb",
    "chars": 31724,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"48eeba82\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Retrieval Au"
  },
  {
    "path": "sentiment_analysis/sentiment_analysis.ipynb",
    "chars": 215305,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"48b6c907\",\n   \"metadata\": {\n    \"id\": \"48b6c907\"\n   },\n   \"sourc"
  },
  {
    "path": "unstructured_io_RAG/Table_RAG_Unstructured_KDBAI_LangChain_RAG.ipynb",
    "chars": 115937,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {\n    \"id\": \"Rt-9l5M4gmSD\"\n   },\n   \"source\": [\n    \"# RAG "
  },
  {
    "path": "video_RAG/video_RAG_TwelveLabs.ipynb",
    "chars": 1107798,
    "preview": "{\n  \"cells\": [\n    {\n      \"cell_type\": \"markdown\",\n      \"id\": \"fa68ef64\",\n      \"metadata\": {\n        \"id\": \"fa68ef64\""
  },
  {
    "path": "video_RAG/video_RAG_VoyageAI.ipynb",
    "chars": 4590815,
    "preview": "{\n  \"cells\": [\n    {\n      \"cell_type\": \"markdown\",\n      \"id\": \"fa68ef64\",\n      \"metadata\": {\n        \"id\": \"fa68ef64\""
  }
]

