Repository: KxSystems/kdbai-samples
Branch: main
Commit: ad492f122018
Files: 71
Total size: 143.4 MB
Directory structure:
gitextract_79vuf6ne/
├── .gitignore
├── HuggingFace_search/
│ └── huggingface_inference.ipynb
├── KDB.AI_course/
│ ├── README.md
│ ├── course_specific_content/
│ │ ├── making_queries.ipynb
│ │ ├── managing_tables.ipynb
│ │ └── rag_example.ipynb
│ └── notebook_references.md
├── LICENSE
├── LlamaIndex_advanced_RAG/
│ └── KDBAI_Advanced_RAG_Demo.ipynb
├── LlamaIndex_samples/
│ ├── Hybrid_Search_LlamaIndex_KDBAI.ipynb
│ ├── Multimodal_RAG_LLamaIndex_CLIP_KDBAI.ipynb
│ └── Sub_Question_Query_Engine_LlamaIndex_KDBAI.ipynb
├── LlamaParse_pdf_RAG/
│ └── llamaParse_demo.ipynb
├── README.md
├── TSS_non_transformed/
│ ├── Non_Transformed_TSS_Technical_Analysis.ipynb
│ ├── Temporal_Similarity_Search_KDB+.ipynb
│ ├── Temporal_Similarity_Search_Non-Transformed_Demo.ipynb
│ ├── createHDB.q
│ └── data/
│ └── marketTrades.parquet
├── TSS_transformed/
│ ├── Temporal_Similarity_Search_Transformed_Demo.ipynb
│ ├── Transformed_TSS_pattern_matching.ipynb
│ └── data/
│ └── marketTrades.parquet
├── document_search/
│ └── document_search.ipynb
├── fuzzy_filtering_on_metadata/
│ └── fuzzy_filtering_demo.ipynb
├── hybrid_search/
│ ├── data/
│ │ └── inflation.txt
│ └── hybrid_search_inflation.ipynb
├── image_search/
│ └── image_search.ipynb
├── metadata_filtering/
│ ├── data/
│ │ └── filtered_embedded_movies.pkl
│ └── metadata_filtering_demo.ipynb
├── multi_index_multimodal_search/
│ ├── data/
│ │ ├── bat1.txt
│ │ ├── bat2.txt
│ │ ├── bear1.txt
│ │ ├── bear2.txt
│ │ ├── caterpillar1.txt
│ │ ├── caterpillar2.txt
│ │ ├── deer1.txt
│ │ ├── deer2.txt
│ │ ├── fox1.txt
│ │ ├── fox2.txt
│ │ ├── hedgehog1.txt
│ │ └── hedgehog2.txt
│ └── multi_index_multimodal_search.ipynb
├── multimodal_RAG_VoyageAI/
│ ├── Multimodal_RAG_VoyageAI.ipynb
│ └── data/
│ └── text/
│ ├── bat.txt
│ ├── bear.txt
│ ├── caterpillar.txt
│ ├── deer.txt
│ ├── fox.txt
│ └── hedgehog.txt
├── multimodal_RAG_unified_text/
│ ├── data/
│ │ └── text/
│ │ ├── bat.txt
│ │ ├── bear.txt
│ │ ├── caterpillar.txt
│ │ ├── deer.txt
│ │ ├── fox.txt
│ │ └── hedgehog.txt
│ └── multi_modal_demo.ipynb
├── music_recommendation/
│ ├── data/
│ │ └── song_data.csv
│ └── music_recommendation.ipynb
├── pattern_matching/
│ └── pattern_matching.ipynb
├── qFlat_index_pdf_search/
│ └── pdf_qFlat_Search.ipynb
├── qHnsw_index_pdf_search/
│ └── pdf_qHNSW_Search.ipynb
├── quickstarts/
│ └── python_quickstart.ipynb
├── requirements.txt
├── retrieval_augmented_generation/
│ ├── data/
│ │ └── state_of_the_union.txt
│ ├── retrieval_augmented_generation.ipynb
│ └── retrieval_augmented_generation_evaluation.ipynb
├── sentiment_analysis/
│ ├── data/
│ │ └── disneyland_reviews.csv
│ └── sentiment_analysis.ipynb
├── unstructured_io_RAG/
│ └── Table_RAG_Unstructured_KDBAI_LangChain_RAG.ipynb
└── video_RAG/
├── video_RAG_TwelveLabs.ipynb
└── video_RAG_VoyageAI.ipynb
================================================
FILE CONTENTS
================================================
================================================
FILE: .gitignore
================================================
*.ipynb_checkpoints
.venv/
.DS_Store
================================================
FILE: HuggingFace_search/huggingface_inference.ipynb
================================================
{
"cells": [
{
"cell_type": "markdown",
"id": "bb2094b8-13a5-4f7c-bd21-d2c709dab914",
"metadata": {
"id": "bb2094b8-13a5-4f7c-bd21-d2c709dab914"
},
"source": [
"# Using Hugging Face Inference with KDB.AI to Create an AI Tool Search Engine\n",
"\n",
"##### Note: This example requires a KDB.AI endpoint and API key. Sign up for a free [KDB.AI account](https://kdb.ai/get-started).\n",
"\n",
"This notebook shows how to get started with the Hugging Face Inference API and KDB.AI.\n",
"\n",
"You will learn how to:\n",
"\n",
"1. Connect to KDB.AI\n",
"2. Create a KDB.AI Database & Table\n",
"3. Load Data\n",
"4. Use the Sentence Transformers library to embed every description in the dataset\n",
"5. Insert the data into our KDB.AI table\n",
"6. Perform Similarity Search using the Hugging Face Inference API\n",
"7. Delete the KDB.AI Database & Table to Conserve Resources"
]
},
{
"cell_type": "markdown",
"id": "nZHRcTHI9bZG",
"metadata": {
"id": "nZHRcTHI9bZG"
},
"source": [
"# Why Use Hugging Face for Embeddings?\n",
"\n",
"When building production applications that utilize embeddings, it's often advantageous to use open-source embedding models for several reasons:\n",
"\n",
"1. **Control**: Open-source models give developers more control over the embeddings process, reducing dependence on third-party embedding providers.\n",
"\n",
"2. **Local Embedding**: With open-source models, you can create embeddings locally, which is particularly useful for embedding your dataset.\n",
"\n",
"A common approach is to use a Python framework like sentence-transformers, maintained by Hugging Face, which offers state-of-the-art sentence, text, and image embeddings. Here's a typical workflow:\n",
"\n",
"1. **Embed your dataset locally**: Use a library like Sentence Transformers to embed your dataset, which might consist of AI tools and associated metadata.\n",
"\n",
"2. **Embed queries at inference time**: When a user submits a query, use an external service like Hugging Face's Inference API to embed the query. This eliminates the need to deploy your own model, allowing you to leverage a fully optimized external service.\n",
"\n",
"By following this approach, you can build a system that searches through hundreds of AI tools without the need to deploy any infrastructure (and scale to millions!). Additionally, since you embed the dataset locally, you can use Hugging Face's free plan without requiring a credit card or worrying about hitting rate limits, at least until you are ready for production.\n",
"\n",
"In this tutorial, we will walk through the process of embedding a dataset of AI tools using Sentence Transformers, and then using Hugging Face's Inference API to embed queries at inference time, enabling efficient and scalable search capabilities.\n",
"\n",
"You will need a Hugging Face API token for this sample. Create a Hugging Face account at [Hugging Face – The AI community building the future](https://huggingface.co/), then create a token at https://huggingface.co/settings/tokens\n",
"\n",
"You can then enter this token below or set it as `HF_TOKEN` in your environment."
]
},
{
"cell_type": "markdown",
"id": "260d0f4b-ef09-4bd2-a197-a9351be24684",
"metadata": {
"id": "260d0f4b-ef09-4bd2-a197-a9351be24684"
},
"source": [
"# 0. Setup"
]
},
{
"cell_type": "markdown",
"id": "d1468bd3",
"metadata": {
"id": "d1468bd3"
},
"source": [
"### Install dependencies\n",
"\n",
"In order to successfully run this sample, note the following steps depending on where you are running this notebook:\n",
"\n",
"- ***Run Locally / Private Environment:*** The [Setup](https://github.com/KxSystems/kdbai-samples/blob/main/README.md#setup) steps in the repository's `README.md` will guide you on prerequisites and how to run this with Jupyter.\n",
"\n",
"\n",
"- ***Colab / Hosted Environment:*** Open this notebook in Colab and run through the cells."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9f4996e9",
"metadata": {},
"outputs": [],
"source": [
"!pip install kdbai_client"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "491cd6d6",
"metadata": {
"id": "491cd6d6"
},
"outputs": [],
"source": [
"!pip install sentence-transformers"
]
},
{
"cell_type": "markdown",
"id": "cc6d17b7",
"metadata": {
"id": "cc6d17b7"
},
"source": [
"### Import Packages"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "805d97da",
"metadata": {
"id": "805d97da"
},
"outputs": [],
"source": [
"# vector DB\n",
"import os\n",
"from getpass import getpass\n",
"import kdbai_client as kdbai\n",
"import time"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "a55ae34e-472b-4aa7-9add-1fcb2ee24a41",
"metadata": {
"id": "a55ae34e-472b-4aa7-9add-1fcb2ee24a41"
},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"id": "8c660c7d",
"metadata": {
"id": "8c660c7d"
},
"source": [
"# 1. Connect to KDB.AI"
]
},
{
"cell_type": "markdown",
"id": "d3a3aa22",
"metadata": {
"id": "d3a3aa22"
},
"source": [
"To use KDB.AI Server, you will need to download and run your own container.\n",
"To do this, you will first need to sign up for free [here](https://trykdb.kx.com/kdbaiserver/signup/).\n",
"\n",
"You will receive an email with the required license file and bearer token needed to download your instance.\n",
"Follow instructions in the signup email to get your session up and running.\n",
"\n",
"Once the [setup steps](https://code.kx.com/kdbai/gettingStarted/kdb-ai-server-setup.html) are complete you can then connect to your KDB.AI Server session using `kdbai.Session` and passing your local endpoint.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2e85c1ff",
"metadata": {
"id": "2e85c1ff"
},
"outputs": [],
"source": [
"#Set up KDB.AI server endpoint \n",
"KDBAI_ENDPOINT = (\n",
" os.environ[\"KDBAI_ENDPOINT\"]\n",
" if \"KDBAI_ENDPOINT\" in os.environ\n",
" else \"http://localhost:8082\"\n",
")\n",
"\n",
"#connect to KDB.AI Server, default mode is qipc\n",
"session = kdbai.Session(endpoint=KDBAI_ENDPOINT)\n"
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "Dpi_auWw68cy",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "Dpi_auWw68cy",
"outputId": "fb43c068-7893-426b-b5bf-559e31a401e2"
},
"outputs": [],
"source": [
"HF_TOKEN = (\n",
" os.environ[\"HF_TOKEN\"]\n",
" if \"HF_TOKEN\" in os.environ\n",
" else getpass(\"Hugging Face token: \")\n",
")"
]
},
{
"cell_type": "markdown",
"id": "8788a6b1",
"metadata": {
"id": "8788a6b1"
},
"source": [
"### Verify Defined Databases\n",
"\n",
"We can check our connection using the `session.databases()` function.\n",
"This will return a list of all the databases we have defined in our vector database thus far.\n",
"This should return a \"default\" database along with any other databases you have already created."
]
},
{
"cell_type": "code",
"execution_count": 32,
"id": "7877f51c",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "7877f51c",
"outputId": "0e6fca8a-e50b-4b01-a080-b082bf23d889"
},
"outputs": [
{
"data": {
"text/plain": [
"[KDBAI database \"default\"]"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"session.databases()"
]
},
{
"cell_type": "markdown",
"id": "i5NYByShWqeK",
"metadata": {
"id": "i5NYByShWqeK"
},
"source": [
"### Create a Database Called \"myDatabase\""
]
},
{
"cell_type": "code",
"execution_count": 33,
"id": "97e5f4a9",
"metadata": {
"id": "97e5f4a9"
},
"outputs": [],
"source": [
"# ensure no database called \"myDatabase\" exists\n",
"try:\n",
" session.database(\"myDatabase\").drop()\n",
"except kdbai.KDBAIException:\n",
" pass"
]
},
{
"cell_type": "code",
"execution_count": 34,
"id": "Gbvw4SzqWprx",
"metadata": {
"id": "Gbvw4SzqWprx"
},
"outputs": [],
"source": [
"# Create the database\n",
"db = session.create_database(\"myDatabase\")"
]
},
{
"cell_type": "markdown",
"id": "e33f03c3",
"metadata": {
"id": "e33f03c3"
},
"source": [
"# 2. Create a KDB.AI Table\n",
"\n",
"To create a table, use `create_table`. In this notebook we pass it the table name, the schema, and the index definitions.\n",
"\n",
"The schema must meet the following criteria:\n",
"- It must contain a list of columns.\n",
"- All columns must have either a `type` or a `qtype`.\n",
"- It must include one column of vector embeddings, stored as an array of floats (`float64s` in this schema)."
]
},
{
"cell_type": "markdown",
"id": "9da55253",
"metadata": {
"id": "9da55253"
},
"source": [
"### Define Schema\n",
"The schema contains all metadata columns and a `description_embedding` column, which will be used for similarity search.\n"
]
},
{
"cell_type": "code",
"execution_count": 35,
"id": "e5e8b782",
"metadata": {
"id": "e5e8b782"
},
"outputs": [],
"source": [
"schema = [\n",
" {\"name\": \"id\", \"type\": \"str\"},\n",
" {\"name\": \"name\", \"type\": \"str\"},\n",
" {\"name\": \"description\", \"type\": \"str\"},\n",
" {\"name\": \"summary\", \"type\": \"str\"},\n",
" {\"name\": \"title\", \"type\": \"str\"},\n",
" {\"name\": \"visitors\", \"type\": \"int64\"},\n",
" {\"name\": \"description_embedding\", \"type\": \"float64s\"},\n",
" ]"
]
},
{
"cell_type": "markdown",
"id": "i9ePLlo3adwt",
"metadata": {
"id": "i9ePLlo3adwt"
},
"source": [
"### Define the indexes\n",
"We will define our dimensionality, similarity metric and index type in the `indexes` list. For this example we chose:\n",
"\n",
"- type = hnsw : HNSW enhances efficiency while maintaining accuracy. You can use other indexes here, such as qHNSW, IVFPQ, qFlat or Flat. As with metrics, the one you choose depends on your data and your overall performance requirements.\n",
"- name = hnsw_index : this is a custom name you give your index.\n",
"\n",
"#### params:\n",
"- dims = 384 : In the next section, we generate embeddings that are 384-dimensional to match this. The number of dimensions should mirror the output dimensions of your embedding model.\n",
"- metric = L2 : We chose L2/Euclidean distance, which is suitable for our relatively low-dimensional dummy dataset. You can use other metrics here, such as IP/Inner Product and CS/Cosine Similarity. The one you choose depends on the specific context and nature of your data.\n",
"\n",
"Note: it is possible to define multiple indexes within a table."
]
},
{
"cell_type": "code",
"execution_count": 36,
"id": "1-2uL1JMXP37",
"metadata": {
"id": "1-2uL1JMXP37"
},
"outputs": [],
"source": [
"# Define the index\n",
"indexes = [\n",
" {\n",
" 'type': 'hnsw',\n",
" 'name': 'hnsw_index',\n",
" 'column': 'description_embedding',\n",
" 'params': {'dims': 384, 'metric': \"L2\"},\n",
" },\n",
"]\n"
]
},
{
"cell_type": "markdown",
"id": "09a5caa0",
"metadata": {
"id": "09a5caa0"
},
"source": [
"### Create Table"
]
},
{
"cell_type": "code",
"execution_count": 37,
"id": "34067680",
"metadata": {
"id": "34067680"
},
"outputs": [],
"source": [
"table = db.create_table(table=\"ai_tools\", schema=schema, indexes=indexes)"
]
},
{
"cell_type": "markdown",
"id": "20afbea1",
"metadata": {
"id": "20afbea1"
},
"source": [
"# 3. Load Data\n",
"\n",
"We fetch data from a GitHub gist containing companies, descriptions, and some metadata. We then load these into a pandas DataFrame with column names and types matching the target table."
]
},
{
"cell_type": "code",
"execution_count": 38,
"id": "37581e86",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 293
},
"id": "37581e86",
"outputId": "aebcdcdb-303e-4eda-8c58-36610243e3ac"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>description</th>\n",
" <th>id</th>\n",
" <th>name</th>\n",
" <th>summary</th>\n",
" <th>title</th>\n",
" <th>visitors</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Generate 3D textures for your game in seconds ...</td>\n",
" <td>rec_cfn1112cibvc11jnn2qg</td>\n",
" <td>TextureLab</td>\n",
" <td>TextureLab is a website that provides 3D textu...</td>\n",
" <td>Instant And Unique 3D Textures For Your Next G...</td>\n",
" <td>23913</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Luma Labs enables users to explore 3D modeling...</td>\n",
" <td>rec_cfn1112cibvc11jnn2r0</td>\n",
" <td>lumalabs</td>\n",
" <td>Luma Labs is a website that offers an early ex...</td>\n",
" <td>Imagine 3D V1.2 (Alpha)</td>\n",
" <td>456963</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Make motion capture from video easier and more...</td>\n",
" <td>rec_cfn1112cibvc11jnn2rg</td>\n",
" <td>plask</td>\n",
" <td>Plask is an AI-powered mocap animation tool th...</td>\n",
" <td>Ai-Powered Mocap Animation Tool.</td>\n",
" <td>90960</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Get hundreds of interior design ideas for your...</td>\n",
" <td>rec_cfn1112cibvc11jnn2s0</td>\n",
" <td>AI Room Planner</td>\n",
" <td>AI Room Planner is an online platform that uti...</td>\n",
" <td>Interior Design By Ai</td>\n",
" <td>211540</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>A platform powered by AI to help you create be...</td>\n",
" <td>rec_cfn1112cibvc11jnn2sg</td>\n",
" <td>AI TWO</td>\n",
" <td>AI TWO is a website that provides a platform f...</td>\n",
" <td>Aitwo.Co - The Ai-Powered All-In-One Design Pl...</td>\n",
" <td>7201</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" description \\\n",
"0 Generate 3D textures for your game in seconds ... \n",
"1 Luma Labs enables users to explore 3D modeling... \n",
"2 Make motion capture from video easier and more... \n",
"3 Get hundreds of interior design ideas for your... \n",
"4 A platform powered by AI to help you create be... \n",
"\n",
" id name \\\n",
"0 rec_cfn1112cibvc11jnn2qg TextureLab \n",
"1 rec_cfn1112cibvc11jnn2r0 lumalabs \n",
"2 rec_cfn1112cibvc11jnn2rg plask \n",
"3 rec_cfn1112cibvc11jnn2s0 AI Room Planner \n",
"4 rec_cfn1112cibvc11jnn2sg AI TWO \n",
"\n",
" summary \\\n",
"0 TextureLab is a website that provides 3D textu... \n",
"1 Luma Labs is a website that offers an early ex... \n",
"2 Plask is an AI-powered mocap animation tool th... \n",
"3 AI Room Planner is an online platform that uti... \n",
"4 AI TWO is a website that provides a platform f... \n",
"\n",
" title visitors \n",
"0 Instant And Unique 3D Textures For Your Next G... 23913 \n",
"1 Imagine 3D V1.2 (Alpha) 456963 \n",
"2 Ai-Powered Mocap Animation Tool. 90960 \n",
"3 Interior Design By Ai 211540 \n",
"4 Aitwo.Co - The Ai-Powered All-In-One Design Pl... 7201 "
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import requests\n",
"\n",
"gist_url = \"https://gist.github.com/mrmps/2f62a2287cb2c1ca63a2762fcaac89bc/raw\"\n",
"response = requests.get(gist_url)\n",
"ai_tools_data = response.json()\n",
"df = pd.DataFrame.from_dict(ai_tools_data)\n",
"\n",
"# drop column with unnecessary metadata\n",
"df.drop(columns=[\"xata\"], inplace=True)\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"id": "3bsfi5TO_65G",
"metadata": {
"id": "3bsfi5TO_65G"
},
"source": [
"# 4. Use the Sentence Transformers library to embed every description in the dataset"
]
},
{
"cell_type": "markdown",
"id": "PxPJZcUmBajt",
"metadata": {
"id": "PxPJZcUmBajt"
},
"source": [
"We set the embedding model to BAAI/bge-small-en-v1.5, which is a fast and small model. We will use the same model at inference time so that query and dataset embeddings are comparable.\n",
"\n",
"If you want faster inference, you can try [FastEmbed](https://github.com/qdrant/fastembed), a more lightweight embedding library."
]
},
{
"cell_type": "code",
"execution_count": 39,
"id": "f5dc41e8",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 528
},
"id": "f5dc41e8",
"outputId": "035258f8-a679-4696-c2fc-d7361eea91d4"
},
"outputs": [],
"source": [
"from sentence_transformers import SentenceTransformer\n",
"\n",
"model = SentenceTransformer(\"BAAI/bge-small-en-v1.5\")\n",
"\n",
"descriptions = [tool[\"description\"] for tool in ai_tools_data]\n",
"embeddings = model.encode(descriptions)"
]
},
{
"cell_type": "code",
"execution_count": 40,
"id": "yxhFJUkwf8M4",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "yxhFJUkwf8M4",
"outputId": "130f1178-ed5c-491b-9bfb-b615b681106b"
},
"outputs": [
{
"data": {
"text/plain": [
"array([[-0.06802839, 0.01769779, 0.07132471, ..., 0.04166844,\n",
" -0.01963805, -0.036344 ],\n",
" [ 0.00284367, 0.0034911 , 0.0392653 , ..., -0.01490238,\n",
" 0.0041208 , 0.02246646],\n",
" [-0.08536491, -0.05372242, 0.01503714, ..., 0.01607881,\n",
" 0.04058064, -0.02476997],\n",
" ...,\n",
" [ 0.00551532, -0.02548731, -0.00431467, ..., -0.00406338,\n",
" 0.06047558, -0.03689232],\n",
" [-0.08149453, -0.00607409, -0.00040346, ..., 0.02765157,\n",
" 0.04479544, -0.00464933],\n",
" [-0.09128137, -0.05604199, 0.01856982, ..., 0.01355306,\n",
" 0.05817638, -0.05754769]], dtype=float32)"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"embeddings"
]
},
{
"cell_type": "markdown",
"id": "zJeeJHvgAeZ1",
"metadata": {
"id": "zJeeJHvgAeZ1"
},
"source": [
"# 5. Insert the data into our KDB.AI table"
]
},
{
"cell_type": "code",
"execution_count": 41,
"id": "730c9f08",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "730c9f08",
"outputId": "61dd248a-8372-421c-c1dc-047f405da5b2"
},
"outputs": [
{
"data": {
"text/plain": [
"{'rowsInserted': 851}"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Create a DataFrame with the AI tools data\n",
"data = pd.DataFrame(ai_tools_data)[[\"id\", \"name\", \"description\", \"summary\", \"title\", \"visitors\"]]\n",
"data[\"description_embedding\"] = embeddings.tolist()\n",
"\n",
"# Bulk insert the data into KDB.AI\n",
"table.insert(data)"
]
},
{
"cell_type": "markdown",
"id": "EJtF_k4iZSGe",
"metadata": {
"id": "EJtF_k4iZSGe"
},
"source": [
"## Confirm data is loaded correctly"
]
},
{
"cell_type": "code",
"execution_count": 42,
"id": "b4nBmPXrZPUQ",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 770
},
"id": "b4nBmPXrZPUQ",
"outputId": "ec15f76e-9c32-4bf1-baf1-55a67ef1d8bb"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>name</th>\n",
" <th>description</th>\n",
" <th>summary</th>\n",
" <th>title</th>\n",
" <th>visitors</th>\n",
" <th>description_embedding</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>rec_cfn1112cibvc11jnn2qg</td>\n",
" <td>TextureLab</td>\n",
" <td>Generate 3D textures for your game in seconds ...</td>\n",
" <td>TextureLab is a website that provides 3D textu...</td>\n",
" <td>Instant And Unique 3D Textures For Your Next G...</td>\n",
" <td>23913</td>\n",
" <td>[-0.06802839040756226, 0.017697788774967194, 0...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>rec_cfn1112cibvc11jnn2r0</td>\n",
" <td>lumalabs</td>\n",
" <td>Luma Labs enables users to explore 3D modeling...</td>\n",
" <td>Luma Labs is a website that offers an early ex...</td>\n",
" <td>Imagine 3D V1.2 (Alpha)</td>\n",
" <td>456963</td>\n",
" <td>[0.0028436651919037104, 0.003491099225357175, ...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>rec_cfn1112cibvc11jnn2rg</td>\n",
" <td>plask</td>\n",
" <td>Make motion capture from video easier and more...</td>\n",
" <td>Plask is an AI-powered mocap animation tool th...</td>\n",
" <td>Ai-Powered Mocap Animation Tool.</td>\n",
" <td>90960</td>\n",
" <td>[-0.08536490797996521, -0.05372241884469986, 0...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>rec_cfn1112cibvc11jnn2s0</td>\n",
" <td>AI Room Planner</td>\n",
" <td>Get hundreds of interior design ideas for your...</td>\n",
" <td>AI Room Planner is an online platform that uti...</td>\n",
" <td>Interior Design By Ai</td>\n",
" <td>211540</td>\n",
" <td>[0.020655963569879532, 0.028269633650779724, 0...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>rec_cfn1112cibvc11jnn2sg</td>\n",
" <td>AI TWO</td>\n",
" <td>A platform powered by AI to help you create be...</td>\n",
" <td>AI TWO is a website that provides a platform f...</td>\n",
" <td>Aitwo.Co - The Ai-Powered All-In-One Design Pl...</td>\n",
" <td>7201</td>\n",
" <td>[-0.02213478274643421, -0.03189412131905556, 0...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>846</th>\n",
" <td>rec_cod2au57l1i4603r3hvg</td>\n",
" <td>Scott Krager</td>\n",
" <td>Thumbnails.com uses AI to generate dozens of u...</td>\n",
" <td>Unlock the power of eye-catching thumbnails wi...</td>\n",
" <td>Thumbnails.com</td>\n",
" <td>0</td>\n",
" <td>[-0.07755479961633682, -0.05978638306260109, -...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>847</th>\n",
" <td>rec_codntepuqmnhe7ku1ing</td>\n",
" <td>Nen Fard</td>\n",
" <td>StockTune: AI-powered, public-domain music for...</td>\n",
" <td>\\nStockTune is a revolutionary platform offeri...</td>\n",
" <td>StockTune</td>\n",
" <td>0</td>\n",
" <td>[-0.03542690351605415, -0.057283081114292145, ...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>848</th>\n",
" <td>rec_codr709uqmnhe7ku1te0</td>\n",
" <td>Nen Fard</td>\n",
" <td>StockCake: Free, AI-generated stock photos in...</td>\n",
" <td>StockCake is a revolutionary stock photo site ...</td>\n",
" <td>StockCake</td>\n",
" <td>0</td>\n",
" <td>[0.005515319295227528, -0.025487307459115982, ...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>849</th>\n",
" <td>rec_coidgc9uqmnhe7l0eug0</td>\n",
" <td>Jason West</td>\n",
" <td>FastBots enables anyone to quickly create a po...</td>\n",
" <td>FastBots is a no-code AI chatbot builder for b...</td>\n",
" <td>FastBots</td>\n",
" <td>0</td>\n",
" <td>[-0.0814945250749588, -0.006074093747884035, -...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>850</th>\n",
" <td>rec_coj3l5aa8o7fb0ajha0g</td>\n",
" <td>Dubformer</td>\n",
" <td>AI-driven translation and dubbing services</td>\n",
" <td>Dubformer is an end-to-end innovative service ...</td>\n",
" <td>AI dubbing and video translation solution</td>\n",
" <td>0</td>\n",
" <td>[-0.09128136932849884, -0.05604198947548866, 0...</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>851 rows × 7 columns</p>\n",
"</div>"
],
"text/plain": [
" id name \\\n",
"0 rec_cfn1112cibvc11jnn2qg TextureLab \n",
"1 rec_cfn1112cibvc11jnn2r0 lumalabs \n",
"2 rec_cfn1112cibvc11jnn2rg plask \n",
"3 rec_cfn1112cibvc11jnn2s0 AI Room Planner \n",
"4 rec_cfn1112cibvc11jnn2sg AI TWO \n",
".. ... ... \n",
"846 rec_cod2au57l1i4603r3hvg Scott Krager \n",
"847 rec_codntepuqmnhe7ku1ing Nen Fard \n",
"848 rec_codr709uqmnhe7ku1te0 Nen Fard \n",
"849 rec_coidgc9uqmnhe7l0eug0 Jason West \n",
"850 rec_coj3l5aa8o7fb0ajha0g Dubformer \n",
"\n",
" description \\\n",
"0 Generate 3D textures for your game in seconds ... \n",
"1 Luma Labs enables users to explore 3D modeling... \n",
"2 Make motion capture from video easier and more... \n",
"3 Get hundreds of interior design ideas for your... \n",
"4 A platform powered by AI to help you create be... \n",
".. ... \n",
"846 Thumbnails.com uses AI to generate dozens of u... \n",
"847 StockTune: AI-powered, public-domain music for... \n",
"848 StockCake: Free, AI-generated stock photos in... \n",
"849 FastBots enables anyone to quickly create a po... \n",
"850 AI-driven translation and dubbing services \n",
"\n",
" summary \\\n",
"0 TextureLab is a website that provides 3D textu... \n",
"1 Luma Labs is a website that offers an early ex... \n",
"2 Plask is an AI-powered mocap animation tool th... \n",
"3 AI Room Planner is an online platform that uti... \n",
"4 AI TWO is a website that provides a platform f... \n",
".. ... \n",
"846 Unlock the power of eye-catching thumbnails wi... \n",
"847 \\nStockTune is a revolutionary platform offeri... \n",
"848 StockCake is a revolutionary stock photo site ... \n",
"849 FastBots is a no-code AI chatbot builder for b... \n",
"850 Dubformer is an end-to-end innovative service ... \n",
"\n",
" title visitors \\\n",
"0 Instant And Unique 3D Textures For Your Next G... 23913 \n",
"1 Imagine 3D V1.2 (Alpha) 456963 \n",
"2 Ai-Powered Mocap Animation Tool. 90960 \n",
"3 Interior Design By Ai 211540 \n",
"4 Aitwo.Co - The Ai-Powered All-In-One Design Pl... 7201 \n",
".. ... ... \n",
"846 Thumbnails.com 0 \n",
"847 StockTune 0 \n",
"848 StockCake 0 \n",
"849 FastBots 0 \n",
"850 AI dubbing and video translation solution 0 \n",
"\n",
" description_embedding \n",
"0 [-0.06802839040756226, 0.017697788774967194, 0... \n",
"1 [0.0028436651919037104, 0.003491099225357175, ... \n",
"2 [-0.08536490797996521, -0.05372241884469986, 0... \n",
"3 [0.020655963569879532, 0.028269633650779724, 0... \n",
"4 [-0.02213478274643421, -0.03189412131905556, 0... \n",
".. ... \n",
"846 [-0.07755479961633682, -0.05978638306260109, -... \n",
"847 [-0.03542690351605415, -0.057283081114292145, ... \n",
"848 [0.005515319295227528, -0.025487307459115982, ... \n",
"849 [-0.0814945250749588, -0.006074093747884035, -... \n",
"850 [-0.09128136932849884, -0.05604198947548866, 0... \n",
"\n",
"[851 rows x 7 columns]"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"table.query()"
]
},
{
"cell_type": "markdown",
"id": "0MOZtXLniJLe",
"metadata": {
"id": "0MOZtXLniJLe"
},
"source": [
"# 6. Perform Similarity Search Using the Hugging Face Inference API"
]
},
{
"cell_type": "markdown",
"id": "V6KjGhJOANyf",
"metadata": {
"id": "V6KjGhJOANyf"
},
"source": [
"## Embed the Query with the Hugging Face Inference API\n",
"Use the Hugging Face Inference API to embed the query so that it can be used to search our index.\n",
"\n",
"Note: you might need to run this cell a few times, as it takes a few seconds for the model to be ready."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4a31f878",
"metadata": {
"id": "4a31f878"
},
"outputs": [],
"source": [
"# Perform a search using Hugging Face embeddings\n",
"import requests\n",
"\n",
"# make sure your URL looks like this to ensure you get instant results, and not a model loading error\n",
"embedding_url = \"https://router.huggingface.co/hf-inference/models/BAAI/bge-small-en-v1.5/pipeline/feature-extraction\"\n",
"\n",
"def generate_query_embedding(text: str) -> list[float]:\n",
" response = requests.post(\n",
" embedding_url,\n",
" headers={\"Authorization\": f\"Bearer {HF_TOKEN}\", \"x-wait-for-model\": \"true\"}, \n",
" json={\"inputs\": text}\n",
" )\n",
"\n",
" if response.status_code != 200:\n",
" raise ValueError(f\"Request failed with status code {response.status_code}: {response.text}\")\n",
" return response.json()\n",
"\n",
"query = \"AI tool for creating 3D textures\"\n",
"query_embedding = generate_query_embedding(query)"
]
},
{
"cell_type": "markdown",
"id": "Wrp9GUwEZmvT",
"metadata": {
"id": "Wrp9GUwEZmvT"
},
"source": [
"## Run the query with our query embedding"
]
},
{
"cell_type": "markdown",
"id": "Xumxvz7gaGGp",
"metadata": {
"id": "Xumxvz7gaGGp"
},
"source": [
"We search the description embeddings for the AI tools most relevant to the query. Remember that \"hnsw_index\" is the index name we chose when defining our index before creating the table."
]
},
{
"cell_type": "code",
"execution_count": 44,
"id": "FDrpIofxZl4Z",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 299
},
"id": "FDrpIofxZl4Z",
"outputId": "2d5259a5-762a-4ef4-b37c-e8d6ec9d0f68"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>__nn_distance</th>\n",
" <th>id</th>\n",
" <th>name</th>\n",
" <th>description</th>\n",
" <th>summary</th>\n",
" <th>title</th>\n",
" <th>visitors</th>\n",
" <th>description_embedding</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0.25221</td>\n",
" <td>rec_cfn1112cibvc11jnn2qg</td>\n",
" <td>TextureLab</td>\n",
" <td>Generate 3D textures for your game in seconds ...</td>\n",
" <td>TextureLab is a website that provides 3D textu...</td>\n",
" <td>Instant And Unique 3D Textures For Your Next G...</td>\n",
" <td>23913</td>\n",
" <td>[-0.06802839040756226, 0.017697788774967194, 0...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.26723</td>\n",
" <td>rec_cfn11a2cibvc11jnndbg</td>\n",
" <td>Ponzu.gg</td>\n",
" <td>Create realistic 3D images with AI-generated t...</td>\n",
" <td>Ponzu is a website that helps 3D artists and d...</td>\n",
" <td>Ponzu.</td>\n",
" <td>6526</td>\n",
" <td>[-0.06463481485843658, -0.014672131277620792, ...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0.34271</td>\n",
" <td>rec_cfn119acibvc11jnncf0</td>\n",
" <td>Masterpiece Studio</td>\n",
" <td>Create 3D models with Generative AI and deploy...</td>\n",
" <td>Masterpiece Studio is a company that has devel...</td>\n",
" <td>Masterpiece Studio.</td>\n",
" <td>38954</td>\n",
" <td>[-0.04131263867020607, -0.0035701903980225325,...</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" __nn_distance id name \\\n",
"0 0.25221 rec_cfn1112cibvc11jnn2qg TextureLab \n",
"1 0.26723 rec_cfn11a2cibvc11jnndbg Ponzu.gg \n",
"2 0.34271 rec_cfn119acibvc11jnncf0 Masterpiece Studio \n",
"\n",
" description \\\n",
"0 Generate 3D textures for your game in seconds ... \n",
"1 Create realistic 3D images with AI-generated t... \n",
"2 Create 3D models with Generative AI and deploy... \n",
"\n",
" summary \\\n",
"0 TextureLab is a website that provides 3D textu... \n",
"1 Ponzu is a website that helps 3D artists and d... \n",
"2 Masterpiece Studio is a company that has devel... \n",
"\n",
" title visitors \\\n",
"0 Instant And Unique 3D Textures For Your Next G... 23913 \n",
"1 Ponzu. 6526 \n",
"2 Masterpiece Studio. 38954 \n",
"\n",
" description_embedding \n",
"0 [-0.06802839040756226, 0.017697788774967194, 0... \n",
"1 [-0.06463481485843658, -0.014672131277620792, ... \n",
"2 [-0.04131263867020607, -0.0035701903980225325,... "
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"results = table.search(vectors={\"hnsw_index\":[query_embedding]},n=3,)\n",
"\n",
"results[0]"
]
},
{
"cell_type": "markdown",
"id": "d8aed9bc-72b2-4e70-b763-e7ce054557db",
"metadata": {
"id": "d8aed9bc-72b2-4e70-b763-e7ce054557db"
},
"source": [
"# 7. Delete the KDB.AI Table & Database to Conserve Resources\n",
"\n",
"\n",
"We can use `table.drop()` to delete a table, and db.drop() to delete the database."
]
},
{
"cell_type": "code",
"execution_count": 45,
"id": "548a9d95-aac3-4d63-a87a-99eedfe55f07",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "548a9d95-aac3-4d63-a87a-99eedfe55f07",
"outputId": "ba89c1d4-997e-46e7-97d6-e33a8e50d51e"
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"table.drop()\n",
"db.drop()"
]
},
{
"cell_type": "markdown",
"id": "8bc6d801-1371-48d0-98b4-0baa53bc8446",
"metadata": {
"id": "8bc6d801-1371-48d0-98b4-0baa53bc8446"
},
"source": [
"<div class=\"alert alert-block alert-warning\">\n",
"<b>Warning:</b> Once you drop a table, you cannot use it again.\n",
"</div>"
]
},
{
"cell_type": "markdown",
"id": "RU2pzQCAn-Wm",
"metadata": {
"id": "RU2pzQCAn-Wm"
},
"source": [
"## Take Our Survey\n",
"\n",
"We hope you found this sample helpful! Your feedback is important to us, and we would appreciate it if you could take a moment to fill out our brief survey. Your input helps us improve our content.\n",
"\n",
"[**Take the Survey**](https://delighted.com/t/UGvwprmK)"
]
},
{
"cell_type": "markdown",
"id": "a7672241-42b0-4798-90d7-95aa9fefe68c",
"metadata": {
"id": "a7672241-42b0-4798-90d7-95aa9fefe68c"
},
"source": [
"## Next Steps\n",
"\n",
"Now that you’re successfully making indexes with KDB.AI, you can start inserting your own data or view more examples:\n",
"- [PDF Document Search](../document_search)\n",
"- [MRI Image Search](../image_search)\n",
"- [Music Recommendation System](../music_recommendation)\n",
"- [Sensor Pattern Matching](../pattern_matching)\n",
"- [Retrieval Augmented Generation with LangChain](../retrieval_augmented_generation)\n",
"- [Sentiment Analysis of Reviews](../sentiment_analysis)"
]
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
================================================
FILE: KDB.AI_course/README.md
================================================
# KDB.AI Course
Welcome to the KDB.AI course! This course combines custom content with existing examples from the KDB.AI samples repository.
## Course Outline
1. Introduction to KDB.AI
- [Introduction](./course_specific_content/making_queries.ipynb)
- [Managing Tables](./course_specific_content/managing_tables.ipynb)
2. Advanced Search Techniques
- [Hybrid Search](../hybrid_search/hybrid_search_inflation.ipynb)
- [Temporal Similarity Search (Non-Transformed)](../TSS_non_transformed/Temporal_Similarity_Search_Non-Transformed_Demo.ipynb)
- [Temporal Similarity Search (Transformed)](../TSS_transformed/Temporal_Similarity_Search_Transformed_Demo.ipynb)
3. Retrieval Augmented Generation
- [RAG Example](./course_specific_content/rag_example.ipynb)
## Note on Referenced Notebooks
Some notebooks in this course are referenced from other parts of the repository. This ensures you're always working with the most up-to-date versions of these examples. For a full list of referenced notebooks and their locations, please see [notebook_references.md](./notebook_references.md).
================================================
FILE: KDB.AI_course/course_specific_content/making_queries.ipynb
================================================
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "CpIrSWxiuFxX"
},
"source": [
"## Introduction\n",
"\n",
"[Video Walkthrough](https://www.youtube.com/watch?v=0kpseJLbEP4&list=PLypX5sYuDqvrqsXTw876gGHosCKvK_7QS&index=7)\n",
"\n",
"In this section of the course, we will focus on querying and searching data in KDB.AI tables. By the end of this notebook, you will have a thorough understanding of the following:\n",
"- Selecting tables to query\n",
"- Performing queries and applying filters\n",
"- Customizing filters\n",
"- Conducting similarity searches\n",
"- Processing query results"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "3nESqJz6uP_7"
},
"source": [
"### Setup\n",
"\n",
"Install kdbai_client and import the necessary dependencies"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"collapsed": true,
"id": "HBtJhVzlagJt",
"outputId": "e7e59f12-d603-4104-9ddf-c0022202dc9b"
},
"outputs": [],
"source": [
"!pip install kdbai_client fastembed"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "gitkLFwlag8H"
},
"outputs": [],
"source": [
"import kdbai_client as kdbai\n",
"import time\n",
"import pandas as pd\n",
"import numpy as np\n",
"from fastembed import TextEmbedding\n",
"import os\n",
"import getpass"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "GoGkIrL9ugk7"
},
"source": [
"##### Connect to KDB.AI Server\n",
"With the embeddings created, we need to store them in a vector database to enable efficient searching.\n",
"\n",
"To use KDB.AI Server, you will need download and run your own container.\n",
"To do this, you will first need to sign up for free [here](https://trykdb.kx.com/kdbaiserver/signup/).\n",
"\n",
"You will receive an email with the required license file and bearer token needed to download your instance.\n",
"Follow instructions in the signup email to get your session up and running.\n",
"\n",
"Once the [setup steps](https://code.kx.com/kdbai/gettingStarted/kdb-ai-server-setup.html) are complete you can then connect to your KDB.AI Server session using `kdbai.Session` and passing your local endpoint."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"collapsed": true,
"id": "2ApVTaRvajlt",
"outputId": "90b29cb5-7629-446d-bef2-c66fe35a256f"
},
"outputs": [],
"source": [
"#Set up KDB.AI server endpoint \n",
"KDBAI_ENDPOINT = (\n",
" os.environ[\"KDBAI_ENDPOINT\"]\n",
" if \"KDBAI_ENDPOINT\" in os.environ\n",
" else \"http://localhost:8082\"\n",
")\n",
"\n",
"\n",
"#connect to KDB.AI Server, default mode is qipc\n",
"session = kdbai.Session(endpoint=KDBAI_ENDPOINT)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Ey1mSxH2an1e"
},
"outputs": [],
"source": [
"database = session.database(\"default\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "8l8qWymnunY7"
},
"source": [
"### Create Our Table and Insert Data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "qNQVDDtZapTQ"
},
"outputs": [],
"source": [
"try:\n",
" database.table(\"data\").drop() # Drop the table if it already exists\n",
"except kdbai.KDBAIException:\n",
" pass"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "wRoZgm4LxR0Z"
},
"source": [
"##### Define a Schema"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "574BY9ZQax4p"
},
"outputs": [],
"source": [
"schema = [\n",
" {'name': 'id', 'type': 'int32'},\n",
" {'name': 'name', 'type': 'str'},\n",
" {'name': 'age', 'type': 'int16'},\n",
" {'name': 'city', 'type': 'str'},\n",
" {'name': 'description', 'type': 'str'},\n",
" {'name': 'embeddings', 'type': 'float32s'}\n",
"]\n",
"index_name = 'hnws_index'\n",
"indexes = [{'name': index_name, 'column': 'embeddings', 'type': 'hnsw', 'params': {'dims': 384}}]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "stFJG7sCa19_"
},
"outputs": [],
"source": [
"table = database.create_table(\"data\", schema=schema, indexes=indexes)\n",
"\n",
"# Generate real vectors using FastEmbed\n",
"descriptions = [\n",
" \"A passionate environmentalist with 5 years of experience in conservation projects and enjoys hiking and outdoor activities.\",\n",
" \"A software engineer with 7 years of experience in full-stack development, living in London, who loves to cook Italian cuisine.\",\n",
" \"A guitarist with over 10 years of experience performing at local cafes and enjoys reading science fiction.\",\n",
" \"A data scientist in Tokyo with 4 years of experience in machine learning and a keen interest in AI research.\",\n",
" \"An avid reader and travel blogger with 3 years of experience visiting and writing about historic sites around the world.\",\n",
" \"A graphic designer based in Berlin with 8 years of experience and a talent for creating digital art.\",\n",
" \"A high school teacher with 15 years of experience in education who loves cycling and participates in charity rides.\",\n",
" \"A professional photographer with 6 years of experience specializing in wildlife photography.\",\n",
" \"A fitness trainer with 5 years of experience who enjoys helping people achieve their health goals.\",\n",
" \"A chef with 12 years of experience who runs a popular restaurant and enjoys experimenting with new recipes.\",\n",
" \"A journalist with 9 years of experience writing about technology and enjoys exploring new gadgets.\",\n",
" \"A musician with 20 years of experience who plays multiple instruments and performs in a jazz band.\",\n",
" \"A software developer with 6 years of experience in creating mobile apps and enjoys coding challenges.\",\n",
" \"An artist with 10 years of experience who paints abstract pieces and has exhibited in several galleries.\",\n",
" \"A historian with 7 years of experience who loves researching and writing about ancient civilizations.\",\n",
" \"A marketing manager with 8 years of experience in digital marketing and social media strategy.\",\n",
" \"A nurse with 12 years of experience in emergency care and patient management.\",\n",
" \"A financial analyst with 5 years of experience in investment banking and portfolio management.\",\n",
" \"A project manager with 10 years of experience in IT project coordination and execution.\",\n",
" \"A UX designer with 6 years of experience in creating user-friendly interfaces for web and mobile applications.\",\n",
" \"A sales executive with 8 years of experience in B2B sales and client relationship management.\",\n",
" \"A content writer with 5 years of experience in creating engaging articles and blog posts.\",\n",
" \"A civil engineer with 10 years of experience in infrastructure development and urban planning.\",\n",
" \"A teacher with 15 years of experience in primary education and curriculum development.\",\n",
" \"A business analyst with 7 years of experience in business process optimization and data analysis.\",\n",
" \"A psychologist with 6 years of experience in clinical practice and mental health counseling.\",\n",
" \"A software architect with 9 years of experience in designing scalable software solutions.\",\n",
" \"A research scientist with 8 years of experience in biotechnology and genetic engineering.\",\n",
" \"An operations manager with 12 years of experience in supply chain management and logistics.\",\n",
" \"A public relations specialist with 7 years of experience in media relations and corporate communications.\"\n",
"]\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "zk_CkMaPvGwv"
},
"source": [
"##### Define an Embedding Model and Embed People Data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 49
},
"collapsed": true,
"id": "AsTjc7_ca39U",
"outputId": "91abba63-224f-4fa7-bdc5-948e7a008a13"
},
"outputs": [],
"source": [
"embedding_model = TextEmbedding()\n",
"embeddings = list(embedding_model.embed(descriptions))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "s2JneJwna7SP"
},
"outputs": [],
"source": [
"import random\n",
"random.seed(42) # for reproducibility\n",
"\n",
"names = [\"Alice\", \"Bob\", \"Charlie\", \"Monica\", \"Eve\", \"Frank\", \"Grace\", \"Hannah\", \"Ivy\", \"Jack\", \"Kara\", \"Leo\", \"Mia\", \"Nate\", \"Olivia\", \"Paul\", \"Quinn\", \"Rita\", \"Sam\", \"Tina\", \"Uma\", \"Victor\", \"Wendy\", \"Xander\", \"Yara\", \"Zane\", \"Alice\", \"Cody\", \"Diana\", \"Ethan\"]\n",
"cities = [\"New York\", \"London\", \"New York\", \"Paris\", \"Berlin\", \"New York\", \"San Francisco\", \"Amsterdam\", \"Rome\", \"Toronto\", \"Chicago\", \"Barcelona\", \"Madrid\", \"New York\", \"Moscow\", \"Dubai\", \"Singapore\", \"New York\", \"Istanbul\", \"Munich\", \"Vienna\", \"Dublin\", \"Zurich\", \"Stockholm\", \"Lisbon\", \"Prague\", \"Budapest\", \"Berlin\", \"Copenhagen\", \"Seoul\"]\n",
"\n",
"\n",
"data = pd.DataFrame({\n",
" 'id': np.array(list(range(0, 30)), dtype='int32'),\n",
" 'name': names,\n",
" 'age': np.array([random.randint(18, 60) for _ in range(30)], dtype='int16'),\n",
" 'city': cities,\n",
" 'description': descriptions,\n",
" 'embeddings': embeddings\n",
"})"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "9JCMSVFOxmkg"
},
"source": [
"##### Insert the Data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "ZCg1E3hka9SE",
"outputId": "da6d6a02-5a2d-406c-f4cb-cf68d7349192"
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 196,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"table.insert(data)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5-92TfnnxyUx"
},
"source": [
"### Query Data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"collapsed": true,
"id": "JY4aJ6Jla_vM",
"outputId": "37f36ee3-9524-4199-fd7d-1fe6df512fd1"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"All data in the table:\n"
]
},
{
"data": {
"application/vnd.google.colaboratory.intrinsic+json": {
"summary": "{\n \"name\": \"table\",\n \"rows\": 30,\n \"fields\": [\n {\n \"column\": \"id\",\n \"properties\": {\n \"dtype\": \"int32\",\n \"num_unique_values\": 30,\n \"samples\": [\n 27,\n 15,\n 23\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"name\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 29,\n \"samples\": [\n \"Diana\",\n \"Quinn\",\n \"Mia\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"age\",\n \"properties\": {\n \"dtype\": \"int16\",\n \"num_unique_values\": 22,\n \"samples\": [\n 58,\n 31,\n 52\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"city\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 25,\n \"samples\": [\n \"Chicago\",\n \"Vienna\",\n \"New York\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"description\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 30,\n \"samples\": [\n \"A research scientist with 8 years of experience in biotechnology and genetic engineering.\",\n \"A marketing manager with 8 years of experience in digital marketing and social media strategy.\",\n \"A teacher with 15 years of experience in primary education and curriculum development.\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"embeddings\",\n \"properties\": {\n \"dtype\": \"object\",\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}",
"type": "dataframe"
},
"text/html": [
"\n",
" <div id=\"df-97fcf1bd-13ad-4779-9029-a721eda12bdc\" class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>name</th>\n",
" <th>age</th>\n",
" <th>city</th>\n",
" <th>description</th>\n",
" <th>embeddings</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>Alice</td>\n",
" <td>58</td>\n",
" <td>New York</td>\n",
" <td>A passionate environmentalist with 5 years of ...</td>\n",
" <td>[-0.006158471, 0.063678846, 0.09181005, -0.023...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>Bob</td>\n",
" <td>25</td>\n",
" <td>London</td>\n",
" <td>A software engineer with 7 years of experience...</td>\n",
" <td>[-0.035581246, 0.07986437, 0.04891828, -0.0604...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2</td>\n",
" <td>Charlie</td>\n",
" <td>19</td>\n",
" <td>New York</td>\n",
" <td>A guitarist with over 10 years of experience p...</td>\n",
" <td>[0.050266247, 0.05255312, 0.048840936, -0.0032...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>3</td>\n",
" <td>Monica</td>\n",
" <td>35</td>\n",
" <td>Paris</td>\n",
" <td>A data scientist in Tokyo with 4 years of expe...</td>\n",
" <td>[-0.008097345, 0.030305384, 0.012246384, -0.04...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>4</td>\n",
" <td>Eve</td>\n",
" <td>33</td>\n",
" <td>Berlin</td>\n",
" <td>An avid reader and travel blogger with 3 years...</td>\n",
" <td>[0.029772803, 0.07571457, 0.042140756, 0.06809...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>5</td>\n",
" <td>Frank</td>\n",
" <td>32</td>\n",
" <td>New York</td>\n",
" <td>A graphic designer based in Berlin with 8 year...</td>\n",
" <td>[0.013257692, 0.045190323, 0.0074770325, -0.00...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>6</td>\n",
" <td>Grace</td>\n",
" <td>26</td>\n",
" <td>San Francisco</td>\n",
" <td>A high school teacher with 15 years of experie...</td>\n",
" <td>[-0.011028861, 0.051242497, 0.063257486, -0.05...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>7</td>\n",
" <td>Hannah</td>\n",
" <td>24</td>\n",
" <td>Amsterdam</td>\n",
" <td>A professional photographer with 6 years of ex...</td>\n",
" <td>[0.04469839, 0.07050187, 0.046390466, -0.03404...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>8</td>\n",
" <td>Ivy</td>\n",
" <td>52</td>\n",
" <td>Rome</td>\n",
" <td>A fitness trainer with 5 years of experience w...</td>\n",
" <td>[0.0002550126, 0.024398372, 0.09861772, 0.0062...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>9</td>\n",
" <td>Jack</td>\n",
" <td>23</td>\n",
" <td>Toronto</td>\n",
" <td>A chef with 12 years of experience who runs a ...</td>\n",
" <td>[-0.008186043, 0.051337104, 0.02683556, -0.030...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>10</td>\n",
" <td>Kara</td>\n",
" <td>55</td>\n",
" <td>Chicago</td>\n",
" <td>A journalist with 9 years of experience writin...</td>\n",
" <td>[-0.017909497, 0.08548332, 0.0022086229, -0.04...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>11</td>\n",
" <td>Leo</td>\n",
" <td>45</td>\n",
" <td>Barcelona</td>\n",
" <td>A musician with 20 years of experience who pla...</td>\n",
" <td>[0.008686635, 0.03110498, 0.05405915, -0.07571...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>12</td>\n",
" <td>Mia</td>\n",
" <td>20</td>\n",
" <td>Madrid</td>\n",
" <td>A software developer with 6 years of experienc...</td>\n",
" <td>[-0.04372146, 0.06704399, 0.022140108, -0.1017...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>13</td>\n",
" <td>Nate</td>\n",
" <td>19</td>\n",
" <td>New York</td>\n",
" <td>An artist with 10 years of experience who pain...</td>\n",
" <td>[0.01933304, 0.023277232, 0.044062667, 0.01242...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>14</td>\n",
" <td>Olivia</td>\n",
" <td>23</td>\n",
" <td>Moscow</td>\n",
" <td>A historian with 7 years of experience who lov...</td>\n",
" <td>[-0.0051849326, 0.16519417, 0.06066864, 0.0311...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>15</td>\n",
" <td>Paul</td>\n",
" <td>31</td>\n",
" <td>Dubai</td>\n",
" <td>A marketing manager with 8 years of experience...</td>\n",
" <td>[0.010789718, 0.017695278, 0.018274685, -0.033...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>16</td>\n",
" <td>Quinn</td>\n",
" <td>32</td>\n",
" <td>Singapore</td>\n",
" <td>A nurse with 12 years of experience in emergen...</td>\n",
" <td>[-0.041632365, 0.034463193, 0.06313535, 0.0160...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>17</td>\n",
" <td>Rita</td>\n",
" <td>50</td>\n",
" <td>New York</td>\n",
" <td>A financial analyst with 5 years of experience...</td>\n",
" <td>[0.015000028, 0.024906091, 0.0010010687, 0.011...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>18</td>\n",
" <td>Sam</td>\n",
" <td>56</td>\n",
" <td>Istanbul</td>\n",
" <td>A project manager with 10 years of experience ...</td>\n",
" <td>[-0.020330371, 0.079401195, 0.02162953, -0.080...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>19</td>\n",
" <td>Tina</td>\n",
" <td>19</td>\n",
" <td>Munich</td>\n",
" <td>A UX designer with 6 years of experience in cr...</td>\n",
" <td>[-0.030572662, 0.04520395, 0.04553928, -0.0925...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>20</td>\n",
" <td>Uma</td>\n",
" <td>53</td>\n",
" <td>Vienna</td>\n",
" <td>A sales executive with 8 years of experience i...</td>\n",
" <td>[-0.014194918, 0.032352123, -0.0070426096, -0....</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21</th>\n",
" <td>21</td>\n",
" <td>Victor</td>\n",
" <td>30</td>\n",
" <td>Dublin</td>\n",
" <td>A content writer with 5 years of experience in...</td>\n",
" <td>[-0.018195461, 0.032041155, 0.059233848, -0.03...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>22</td>\n",
" <td>Wendy</td>\n",
" <td>59</td>\n",
" <td>Zurich</td>\n",
" <td>A civil engineer with 10 years of experience i...</td>\n",
" <td>[-0.00980266, 0.04713828, 0.05187823, -0.03932...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td>23</td>\n",
" <td>Xander</td>\n",
" <td>52</td>\n",
" <td>Stockholm</td>\n",
" <td>A teacher with 15 years of experience in prima...</td>\n",
" <td>[-0.013646452, 0.028070105, 0.05104053, -0.064...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>24</th>\n",
" <td>24</td>\n",
" <td>Yara</td>\n",
" <td>44</td>\n",
" <td>Lisbon</td>\n",
" <td>A business analyst with 7 years of experience ...</td>\n",
" <td>[-0.044623584, 0.054378174, 0.0015794634, -0.0...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25</th>\n",
" <td>25</td>\n",
" <td>Zane</td>\n",
" <td>32</td>\n",
" <td>Prague</td>\n",
" <td>A psychologist with 6 years of experience in c...</td>\n",
" <td>[0.016778275, 0.09543604, 0.048281595, -0.0022...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>26</th>\n",
" <td>26</td>\n",
" <td>Alice</td>\n",
" <td>46</td>\n",
" <td>Budapest</td>\n",
" <td>A software architect with 9 years of experienc...</td>\n",
" <td>[-0.06051296, 0.031862404, -0.031203829, -0.07...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27</th>\n",
" <td>27</td>\n",
" <td>Cody</td>\n",
" <td>55</td>\n",
" <td>Berlin</td>\n",
" <td>A research scientist with 8 years of experienc...</td>\n",
" <td>[-0.01787689, 0.07915241, -0.004790489, -0.031...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28</th>\n",
" <td>28</td>\n",
" <td>Diana</td>\n",
" <td>35</td>\n",
" <td>Copenhagen</td>\n",
" <td>An operations manager with 12 years of experie...</td>\n",
" <td>[0.011406942, 0.02994747, 0.06136875, -0.02639...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29</th>\n",
" <td>29</td>\n",
" <td>Ethan</td>\n",
" <td>18</td>\n",
" <td>Seoul</td>\n",
" <td>A public relations specialist with 7 years of ...</td>\n",
" <td>[-0.001325855, 0.089781284, 0.05144235, -0.036...</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>\n",
" <div class=\"colab-df-buttons\">\n",
"\n",
" <div class=\"colab-df-container\">\n",
" <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-97fcf1bd-13ad-4779-9029-a721eda12bdc')\"\n",
" title=\"Convert this dataframe to an interactive table.\"\n",
" style=\"display:none;\">\n",
"\n",
" <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
" <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
" </svg>\n",
" </button>\n",
"\n",
" <style>\n",
" .colab-df-container {\n",
" display:flex;\n",
" gap: 12px;\n",
" }\n",
"\n",
" .colab-df-convert {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-convert:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" .colab-df-buttons div {\n",
" margin-bottom: 4px;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
" </style>\n",
"\n",
" <script>\n",
" const buttonEl =\n",
" document.querySelector('#df-97fcf1bd-13ad-4779-9029-a721eda12bdc button.colab-df-convert');\n",
" buttonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"\n",
" async function convertToInteractive(key) {\n",
" const element = document.querySelector('#df-97fcf1bd-13ad-4779-9029-a721eda12bdc');\n",
" const dataTable =\n",
" await google.colab.kernel.invokeFunction('convertToInteractive',\n",
" [key], {});\n",
" if (!dataTable) return;\n",
"\n",
" const docLinkHtml = 'Like what you see? Visit the ' +\n",
" '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
" + ' to learn more about interactive tables.';\n",
" element.innerHTML = '';\n",
" dataTable['output_type'] = 'display_data';\n",
" await google.colab.output.renderOutput(dataTable, element);\n",
" const docLink = document.createElement('div');\n",
" docLink.innerHTML = docLinkHtml;\n",
" element.appendChild(docLink);\n",
" }\n",
" </script>\n",
" </div>\n",
"\n",
"\n",
"<div id=\"df-626a5255-223c-4f1e-b16c-eaa273fbb22f\">\n",
" <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-626a5255-223c-4f1e-b16c-eaa273fbb22f')\"\n",
" title=\"Suggest charts\"\n",
" style=\"display:none;\">\n",
"\n",
"<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
" width=\"24px\">\n",
" <g>\n",
" <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
" </g>\n",
"</svg>\n",
" </button>\n",
"\n",
"<style>\n",
" .colab-df-quickchart {\n",
" --bg-color: #E8F0FE;\n",
" --fill-color: #1967D2;\n",
" --hover-bg-color: #E2EBFA;\n",
" --hover-fill-color: #174EA6;\n",
" --disabled-fill-color: #AAA;\n",
" --disabled-bg-color: #DDD;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-quickchart {\n",
" --bg-color: #3B4455;\n",
" --fill-color: #D2E3FC;\n",
" --hover-bg-color: #434B5C;\n",
" --hover-fill-color: #FFFFFF;\n",
" --disabled-bg-color: #3B4455;\n",
" --disabled-fill-color: #666;\n",
" }\n",
"\n",
" .colab-df-quickchart {\n",
" background-color: var(--bg-color);\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: var(--fill-color);\n",
" height: 32px;\n",
" padding: 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-quickchart:hover {\n",
" background-color: var(--hover-bg-color);\n",
" box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: var(--button-hover-fill-color);\n",
" }\n",
"\n",
" .colab-df-quickchart-complete:disabled,\n",
" .colab-df-quickchart-complete:disabled:hover {\n",
" background-color: var(--disabled-bg-color);\n",
" fill: var(--disabled-fill-color);\n",
" box-shadow: none;\n",
" }\n",
"\n",
" .colab-df-spinner {\n",
" border: 2px solid var(--fill-color);\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" animation:\n",
" spin 1s steps(1) infinite;\n",
" }\n",
"\n",
" @keyframes spin {\n",
" 0% {\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" border-left-color: var(--fill-color);\n",
" }\n",
" 20% {\n",
" border-color: transparent;\n",
" border-left-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" }\n",
" 30% {\n",
" border-color: transparent;\n",
" border-left-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" border-right-color: var(--fill-color);\n",
" }\n",
" 40% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" }\n",
" 60% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" }\n",
" 80% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" border-bottom-color: var(--fill-color);\n",
" }\n",
" 90% {\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" }\n",
" }\n",
"</style>\n",
"\n",
" <script>\n",
" async function quickchart(key) {\n",
" const quickchartButtonEl =\n",
" document.querySelector('#' + key + ' button');\n",
" quickchartButtonEl.disabled = true; // To prevent multiple clicks.\n",
" quickchartButtonEl.classList.add('colab-df-spinner');\n",
" try {\n",
" const charts = await google.colab.kernel.invokeFunction(\n",
" 'suggestCharts', [key], {});\n",
" } catch (error) {\n",
" console.error('Error during call to suggestCharts:', error);\n",
" }\n",
" quickchartButtonEl.classList.remove('colab-df-spinner');\n",
" quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
" }\n",
" (() => {\n",
" let quickchartButtonEl =\n",
" document.querySelector('#df-626a5255-223c-4f1e-b16c-eaa273fbb22f button');\n",
" quickchartButtonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
" })();\n",
" </script>\n",
"</div>\n",
"\n",
" </div>\n",
" </div>\n"
],
"text/plain": [
" id name age city \\\n",
"0 0 Alice 58 New York \n",
"1 1 Bob 25 London \n",
"2 2 Charlie 19 New York \n",
"3 3 Monica 35 Paris \n",
"4 4 Eve 33 Berlin \n",
"5 5 Frank 32 New York \n",
"6 6 Grace 26 San Francisco \n",
"7 7 Hannah 24 Amsterdam \n",
"8 8 Ivy 52 Rome \n",
"9 9 Jack 23 Toronto \n",
"10 10 Kara 55 Chicago \n",
"11 11 Leo 45 Barcelona \n",
"12 12 Mia 20 Madrid \n",
"13 13 Nate 19 New York \n",
"14 14 Olivia 23 Moscow \n",
"15 15 Paul 31 Dubai \n",
"16 16 Quinn 32 Singapore \n",
"17 17 Rita 50 New York \n",
"18 18 Sam 56 Istanbul \n",
"19 19 Tina 19 Munich \n",
"20 20 Uma 53 Vienna \n",
"21 21 Victor 30 Dublin \n",
"22 22 Wendy 59 Zurich \n",
"23 23 Xander 52 Stockholm \n",
"24 24 Yara 44 Lisbon \n",
"25 25 Zane 32 Prague \n",
"26 26 Alice 46 Budapest \n",
"27 27 Cody 55 Berlin \n",
"28 28 Diana 35 Copenhagen \n",
"29 29 Ethan 18 Seoul \n",
"\n",
" description \\\n",
"0 A passionate environmentalist with 5 years of ... \n",
"1 A software engineer with 7 years of experience... \n",
"2 A guitarist with over 10 years of experience p... \n",
"3 A data scientist in Tokyo with 4 years of expe... \n",
"4 An avid reader and travel blogger with 3 years... \n",
"5 A graphic designer based in Berlin with 8 year... \n",
"6 A high school teacher with 15 years of experie... \n",
"7 A professional photographer with 6 years of ex... \n",
"8 A fitness trainer with 5 years of experience w... \n",
"9 A chef with 12 years of experience who runs a ... \n",
"10 A journalist with 9 years of experience writin... \n",
"11 A musician with 20 years of experience who pla... \n",
"12 A software developer with 6 years of experienc... \n",
"13 An artist with 10 years of experience who pain... \n",
"14 A historian with 7 years of experience who lov... \n",
"15 A marketing manager with 8 years of experience... \n",
"16 A nurse with 12 years of experience in emergen... \n",
"17 A financial analyst with 5 years of experience... \n",
"18 A project manager with 10 years of experience ... \n",
"19 A UX designer with 6 years of experience in cr... \n",
"20 A sales executive with 8 years of experience i... \n",
"21 A content writer with 5 years of experience in... \n",
"22 A civil engineer with 10 years of experience i... \n",
"23 A teacher with 15 years of experience in prima... \n",
"24 A business analyst with 7 years of experience ... \n",
"25 A psychologist with 6 years of experience in c... \n",
"26 A software architect with 9 years of experienc... \n",
"27 A research scientist with 8 years of experienc... \n",
"28 An operations manager with 12 years of experie... \n",
"29 A public relations specialist with 7 years of ... \n",
"\n",
" embeddings \n",
"0 [-0.006158471, 0.063678846, 0.09181005, -0.023... \n",
"1 [-0.035581246, 0.07986437, 0.04891828, -0.0604... \n",
"2 [0.050266247, 0.05255312, 0.048840936, -0.0032... \n",
"3 [-0.008097345, 0.030305384, 0.012246384, -0.04... \n",
"4 [0.029772803, 0.07571457, 0.042140756, 0.06809... \n",
"5 [0.013257692, 0.045190323, 0.0074770325, -0.00... \n",
"6 [-0.011028861, 0.051242497, 0.063257486, -0.05... \n",
"7 [0.04469839, 0.07050187, 0.046390466, -0.03404... \n",
"8 [0.0002550126, 0.024398372, 0.09861772, 0.0062... \n",
"9 [-0.008186043, 0.051337104, 0.02683556, -0.030... \n",
"10 [-0.017909497, 0.08548332, 0.0022086229, -0.04... \n",
"11 [0.008686635, 0.03110498, 0.05405915, -0.07571... \n",
"12 [-0.04372146, 0.06704399, 0.022140108, -0.1017... \n",
"13 [0.01933304, 0.023277232, 0.044062667, 0.01242... \n",
"14 [-0.0051849326, 0.16519417, 0.06066864, 0.0311... \n",
"15 [0.010789718, 0.017695278, 0.018274685, -0.033... \n",
"16 [-0.041632365, 0.034463193, 0.06313535, 0.0160... \n",
"17 [0.015000028, 0.024906091, 0.0010010687, 0.011... \n",
"18 [-0.020330371, 0.079401195, 0.02162953, -0.080... \n",
"19 [-0.030572662, 0.04520395, 0.04553928, -0.0925... \n",
"20 [-0.014194918, 0.032352123, -0.0070426096, -0.... \n",
"21 [-0.018195461, 0.032041155, 0.059233848, -0.03... \n",
"22 [-0.00980266, 0.04713828, 0.05187823, -0.03932... \n",
"23 [-0.013646452, 0.028070105, 0.05104053, -0.064... \n",
"24 [-0.044623584, 0.054378174, 0.0015794634, -0.0... \n",
"25 [0.016778275, 0.09543604, 0.048281595, -0.0022... \n",
"26 [-0.06051296, 0.031862404, -0.031203829, -0.07... \n",
"27 [-0.01787689, 0.07915241, -0.004790489, -0.031... \n",
"28 [0.011406942, 0.02994747, 0.06136875, -0.02639... \n",
"29 [-0.001325855, 0.089781284, 0.05144235, -0.036... "
]
},
"execution_count": 185,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"print(\"All data in the table:\")\n",
"table.query()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "1IqGPc21yGc3"
},
"source": [
"##### Query with Filters"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 164
},
"collapsed": true,
"id": "9FdMeiywbBpp",
"outputId": "c1650759-cef5-4857-8a37-d0f1a1823c47"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Querying data where age >= 30 and city is Rome or Paris:\n"
]
},
{
"data": {
"application/vnd.google.colaboratory.intrinsic+json": {
"summary": "{\n \"name\": \"table\",\n \"rows\": 2,\n \"fields\": [\n {\n \"column\": \"id\",\n \"properties\": {\n \"dtype\": \"int32\",\n \"num_unique_values\": 2,\n \"samples\": [\n 8,\n 3\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"name\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 2,\n \"samples\": [\n \"Ivy\",\n \"Monica\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"age\",\n \"properties\": {\n \"dtype\": \"int16\",\n \"num_unique_values\": 2,\n \"samples\": [\n 52,\n 35\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"city\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 2,\n \"samples\": [\n \"Rome\",\n \"Paris\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"description\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 2,\n \"samples\": [\n \"A fitness trainer with 5 years of experience who enjoys helping people achieve their health goals.\",\n \"A data scientist in Tokyo with 4 years of experience in machine learning and a keen interest in AI research.\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"embeddings\",\n \"properties\": {\n \"dtype\": \"object\",\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}",
"type": "dataframe"
},
"text/html": [
"\n",
" <div id=\"df-508297d5-d28a-47b8-ab66-a37a12fda125\" class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>name</th>\n",
" <th>age</th>\n",
" <th>city</th>\n",
" <th>description</th>\n",
" <th>embeddings</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>3</td>\n",
" <td>Monica</td>\n",
" <td>35</td>\n",
" <td>Paris</td>\n",
" <td>A data scientist in Tokyo with 4 years of expe...</td>\n",
" <td>[-0.008097345, 0.030305384, 0.012246384, -0.04...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>8</td>\n",
" <td>Ivy</td>\n",
" <td>52</td>\n",
" <td>Rome</td>\n",
" <td>A fitness trainer with 5 years of experience w...</td>\n",
" <td>[0.0002550126, 0.024398372, 0.09861772, 0.0062...</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>\n",
" <div class=\"colab-df-buttons\">\n",
"\n",
" <div class=\"colab-df-container\">\n",
" <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-508297d5-d28a-47b8-ab66-a37a12fda125')\"\n",
" title=\"Convert this dataframe to an interactive table.\"\n",
" style=\"display:none;\">\n",
"\n",
" <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
" <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
" </svg>\n",
" </button>\n",
"\n",
" <style>\n",
" .colab-df-container {\n",
" display:flex;\n",
" gap: 12px;\n",
" }\n",
"\n",
" .colab-df-convert {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-convert:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" .colab-df-buttons div {\n",
" margin-bottom: 4px;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
" </style>\n",
"\n",
" <script>\n",
" const buttonEl =\n",
" document.querySelector('#df-508297d5-d28a-47b8-ab66-a37a12fda125 button.colab-df-convert');\n",
" buttonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"\n",
" async function convertToInteractive(key) {\n",
" const element = document.querySelector('#df-508297d5-d28a-47b8-ab66-a37a12fda125');\n",
" const dataTable =\n",
" await google.colab.kernel.invokeFunction('convertToInteractive',\n",
" [key], {});\n",
" if (!dataTable) return;\n",
"\n",
" const docLinkHtml = 'Like what you see? Visit the ' +\n",
" '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
" + ' to learn more about interactive tables.';\n",
" element.innerHTML = '';\n",
" dataTable['output_type'] = 'display_data';\n",
" await google.colab.output.renderOutput(dataTable, element);\n",
" const docLink = document.createElement('div');\n",
" docLink.innerHTML = docLinkHtml;\n",
" element.appendChild(docLink);\n",
" }\n",
" </script>\n",
" </div>\n",
"\n",
"\n",
"<div id=\"df-915e3ed9-57cc-4c6d-963b-bc953be0ae3c\">\n",
" <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-915e3ed9-57cc-4c6d-963b-bc953be0ae3c')\"\n",
" title=\"Suggest charts\"\n",
" style=\"display:none;\">\n",
"\n",
"<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
" width=\"24px\">\n",
" <g>\n",
" <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
" </g>\n",
"</svg>\n",
" </button>\n",
"\n",
"<style>\n",
" .colab-df-quickchart {\n",
" --bg-color: #E8F0FE;\n",
" --fill-color: #1967D2;\n",
" --hover-bg-color: #E2EBFA;\n",
" --hover-fill-color: #174EA6;\n",
" --disabled-fill-color: #AAA;\n",
" --disabled-bg-color: #DDD;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-quickchart {\n",
" --bg-color: #3B4455;\n",
" --fill-color: #D2E3FC;\n",
" --hover-bg-color: #434B5C;\n",
" --hover-fill-color: #FFFFFF;\n",
" --disabled-bg-color: #3B4455;\n",
" --disabled-fill-color: #666;\n",
" }\n",
"\n",
" .colab-df-quickchart {\n",
" background-color: var(--bg-color);\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: var(--fill-color);\n",
" height: 32px;\n",
" padding: 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-quickchart:hover {\n",
" background-color: var(--hover-bg-color);\n",
" box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: var(--button-hover-fill-color);\n",
" }\n",
"\n",
" .colab-df-quickchart-complete:disabled,\n",
" .colab-df-quickchart-complete:disabled:hover {\n",
" background-color: var(--disabled-bg-color);\n",
" fill: var(--disabled-fill-color);\n",
" box-shadow: none;\n",
" }\n",
"\n",
" .colab-df-spinner {\n",
" border: 2px solid var(--fill-color);\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" animation:\n",
" spin 1s steps(1) infinite;\n",
" }\n",
"\n",
" @keyframes spin {\n",
" 0% {\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" border-left-color: var(--fill-color);\n",
" }\n",
" 20% {\n",
" border-color: transparent;\n",
" border-left-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" }\n",
" 30% {\n",
" border-color: transparent;\n",
" border-left-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" border-right-color: var(--fill-color);\n",
" }\n",
" 40% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" }\n",
" 60% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" }\n",
" 80% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" border-bottom-color: var(--fill-color);\n",
" }\n",
" 90% {\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" }\n",
" }\n",
"</style>\n",
"\n",
" <script>\n",
" async function quickchart(key) {\n",
" const quickchartButtonEl =\n",
" document.querySelector('#' + key + ' button');\n",
" quickchartButtonEl.disabled = true; // To prevent multiple clicks.\n",
" quickchartButtonEl.classList.add('colab-df-spinner');\n",
" try {\n",
" const charts = await google.colab.kernel.invokeFunction(\n",
" 'suggestCharts', [key], {});\n",
" } catch (error) {\n",
" console.error('Error during call to suggestCharts:', error);\n",
" }\n",
" quickchartButtonEl.classList.remove('colab-df-spinner');\n",
" quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
" }\n",
" (() => {\n",
" let quickchartButtonEl =\n",
" document.querySelector('#df-915e3ed9-57cc-4c6d-963b-bc953be0ae3c button');\n",
" quickchartButtonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
" })();\n",
" </script>\n",
"</div>\n",
"\n",
" </div>\n",
" </div>\n"
],
"text/plain": [
" id name age city description \\\n",
"0 3 Monica 35 Paris A data scientist in Tokyo with 4 years of expe... \n",
"1 8 Ivy 52 Rome A fitness trainer with 5 years of experience w... \n",
"\n",
" embeddings \n",
"0 [-0.008097345, 0.030305384, 0.012246384, -0.04... \n",
"1 [0.0002550126, 0.024398372, 0.09861772, 0.0062... "
]
},
"execution_count": 158,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"print(\"Querying data where age >= 30 and city is Rome or Paris:\")\n",
"table.query(filter=[(\">=\", \"age\", 30), (\"in\", \"city\", [\"Rome\", \"Paris\"])])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"collapsed": true,
"id": "oDnVcKV5bML1",
"outputId": "01fa9a05-c554-4270-fd78-b04c3b493d6d"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Aggregated query for names and cities:\n",
" name city\n",
"0 Alice New York\n",
"1 Bob London\n",
"2 Charlie New York\n",
"3 Monica Paris\n",
"4 Eve Berlin\n",
"5 Frank New York\n",
"6 Grace San Francisco\n",
"7 Hannah Amsterdam\n",
"8 Ivy Rome\n",
"9 Jack Toronto\n",
"10 Kara Chicago\n",
"11 Leo Barcelona\n",
"12 Mia Madrid\n",
"13 Nate New York\n",
"14 Olivia Moscow\n",
"15 Paul Dubai\n",
"16 Quinn Singapore\n",
"17 Rita New York\n",
"18 Sam Istanbul\n",
"19 Tina Munich\n",
"20 Uma Vienna\n",
"21 Victor Dublin\n",
"22 Wendy Zurich\n",
"23 Xander Stockholm\n",
"24 Yara Lisbon\n",
"25 Zane Prague\n",
"26 Alice Budapest\n",
"27 Cody Berlin\n",
"28 Diana Copenhagen\n",
"29 Ethan Seoul\n"
]
}
],
"source": [
"print(\"Aggregated query for names and cities:\")\n",
"print(table.query(aggs={\"name\": \"name\", \"city\": \"city\"})) # returns only the names and cities"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"collapsed": true,
"id": "dyj4HxfUbTW4",
"outputId": "8ecf8d36-9cf7-4b56-a270-edd1b567325c"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Aggregated query for maximum age grouped by city:\n",
" city maxAge\n",
"0 Seoul 18\n",
"1 Munich 19\n",
"2 Madrid 20\n",
"3 Moscow 23\n",
"4 Toronto 23\n",
"5 Amsterdam 24\n",
"6 London 25\n",
"7 San Francisco 26\n",
"8 Dublin 30\n",
"9 Dubai 31\n",
"10 Prague 32\n",
"11 Singapore 32\n",
"12 Copenhagen 35\n",
"13 Paris 35\n",
"14 Lisbon 44\n",
"15 Barcelona 45\n",
"16 Budapest 46\n",
"17 Rome 52\n",
"18 Stockholm 52\n",
"19 Vienna 53\n",
"20 Berlin 55\n",
"21 Chicago 55\n",
"22 Istanbul 56\n",
"23 New York 58\n",
"24 Zurich 59\n"
]
}
],
"source": [
"print(\"Aggregated query for maximum age grouped by city:\")\n",
"print(table.query(aggs={'maxAge': ['max', 'age']}, group_by=['city'], sort_columns=['maxAge']))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 851
},
"collapsed": true,
"id": "Ex8dE0cmbUXy",
"outputId": "8340e11b-c1d4-4734-b7b0-040a93a0ac6a"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Aggregated query for average age and count of distict people in each city:\n"
]
},
{
"data": {
"application/vnd.google.colaboratory.intrinsic+json": {
"summary": "{\n \"name\": \"table\",\n \"rows\": 25,\n \"fields\": [\n {\n \"column\": \"city\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 25,\n \"samples\": [\n \"Dublin\",\n \"Lisbon\",\n \"Seoul\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"avgAge\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 13.004219827937904,\n \"min\": 18.0,\n \"max\": 59.0,\n \"num_unique_values\": 20,\n \"samples\": [\n 18.0,\n 55.0,\n 52.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"countCity\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0,\n \"min\": 1,\n \"max\": 5,\n \"num_unique_values\": 3,\n \"samples\": [\n 1,\n 5,\n 2\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}",
"type": "dataframe"
},
"text/html": [
"\n",
" <div id=\"df-07ac5e33-bf99-46a5-b6bd-1a4d6df150ea\" class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>city</th>\n",
" <th>avgAge</th>\n",
" <th>countCity</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Seoul</td>\n",
" <td>18.0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Munich</td>\n",
" <td>19.0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Madrid</td>\n",
" <td>20.0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Moscow</td>\n",
" <td>23.0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Toronto</td>\n",
" <td>23.0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Amsterdam</td>\n",
" <td>24.0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>London</td>\n",
" <td>25.0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>San Francisco</td>\n",
" <td>26.0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>Dublin</td>\n",
" <td>30.0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>Dubai</td>\n",
" <td>31.0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>Prague</td>\n",
" <td>32.0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>Singapore</td>\n",
" <td>32.0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>Copenhagen</td>\n",
" <td>35.0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>Paris</td>\n",
" <td>35.0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>New York</td>\n",
" <td>35.6</td>\n",
" <td>5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>Berlin</td>\n",
" <td>44.0</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>Lisbon</td>\n",
" <td>44.0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>Barcelona</td>\n",
" <td>45.0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>Budapest</td>\n",
" <td>46.0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>Rome</td>\n",
" <td>52.0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>Stockholm</td>\n",
" <td>52.0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21</th>\n",
" <td>Vienna</td>\n",
" <td>53.0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>Chicago</td>\n",
" <td>55.0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td>Istanbul</td>\n",
" <td>56.0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>24</th>\n",
" <td>Zurich</td>\n",
" <td>59.0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>\n",
" <div class=\"colab-df-buttons\">\n",
"\n",
" <div class=\"colab-df-container\">\n",
" <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-07ac5e33-bf99-46a5-b6bd-1a4d6df150ea')\"\n",
" title=\"Convert this dataframe to an interactive table.\"\n",
" style=\"display:none;\">\n",
"\n",
" <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
" <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
" </svg>\n",
" </button>\n",
"\n",
" <style>\n",
" .colab-df-container {\n",
" display:flex;\n",
" gap: 12px;\n",
" }\n",
"\n",
" .colab-df-convert {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-convert:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" .colab-df-buttons div {\n",
" margin-bottom: 4px;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
" </style>\n",
"\n",
" <script>\n",
" const buttonEl =\n",
" document.querySelector('#df-07ac5e33-bf99-46a5-b6bd-1a4d6df150ea button.colab-df-convert');\n",
" buttonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"\n",
" async function convertToInteractive(key) {\n",
" const element = document.querySelector('#df-07ac5e33-bf99-46a5-b6bd-1a4d6df150ea');\n",
" const dataTable =\n",
" await google.colab.kernel.invokeFunction('convertToInteractive',\n",
" [key], {});\n",
" if (!dataTable) return;\n",
"\n",
" const docLinkHtml = 'Like what you see? Visit the ' +\n",
" '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
" + ' to learn more about interactive tables.';\n",
" element.innerHTML = '';\n",
" dataTable['output_type'] = 'display_data';\n",
" await google.colab.output.renderOutput(dataTable, element);\n",
" const docLink = document.createElement('div');\n",
" docLink.innerHTML = docLinkHtml;\n",
" element.appendChild(docLink);\n",
" }\n",
" </script>\n",
" </div>\n",
"\n",
"\n",
"<div id=\"df-8a668902-d12b-4b27-8bea-abd518bdff99\">\n",
" <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-8a668902-d12b-4b27-8bea-abd518bdff99')\"\n",
" title=\"Suggest charts\"\n",
" style=\"display:none;\">\n",
"\n",
"<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
" width=\"24px\">\n",
" <g>\n",
" <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
" </g>\n",
"</svg>\n",
" </button>\n",
"\n",
"<style>\n",
" .colab-df-quickchart {\n",
" --bg-color: #E8F0FE;\n",
" --fill-color: #1967D2;\n",
" --hover-bg-color: #E2EBFA;\n",
" --hover-fill-color: #174EA6;\n",
" --disabled-fill-color: #AAA;\n",
" --disabled-bg-color: #DDD;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-quickchart {\n",
" --bg-color: #3B4455;\n",
" --fill-color: #D2E3FC;\n",
" --hover-bg-color: #434B5C;\n",
" --hover-fill-color: #FFFFFF;\n",
" --disabled-bg-color: #3B4455;\n",
" --disabled-fill-color: #666;\n",
" }\n",
"\n",
" .colab-df-quickchart {\n",
" background-color: var(--bg-color);\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: var(--fill-color);\n",
" height: 32px;\n",
" padding: 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-quickchart:hover {\n",
" background-color: var(--hover-bg-color);\n",
" box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: var(--button-hover-fill-color);\n",
" }\n",
"\n",
" .colab-df-quickchart-complete:disabled,\n",
" .colab-df-quickchart-complete:disabled:hover {\n",
" background-color: var(--disabled-bg-color);\n",
" fill: var(--disabled-fill-color);\n",
" box-shadow: none;\n",
" }\n",
"\n",
" .colab-df-spinner {\n",
" border: 2px solid var(--fill-color);\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" animation:\n",
" spin 1s steps(1) infinite;\n",
" }\n",
"\n",
" @keyframes spin {\n",
" 0% {\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" border-left-color: var(--fill-color);\n",
" }\n",
" 20% {\n",
" border-color: transparent;\n",
" border-left-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" }\n",
" 30% {\n",
" border-color: transparent;\n",
" border-left-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" border-right-color: var(--fill-color);\n",
" }\n",
" 40% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" }\n",
" 60% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" }\n",
" 80% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" border-bottom-color: var(--fill-color);\n",
" }\n",
" 90% {\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" }\n",
" }\n",
"</style>\n",
"\n",
" <script>\n",
" async function quickchart(key) {\n",
" const quickchartButtonEl =\n",
" document.querySelector('#' + key + ' button');\n",
" quickchartButtonEl.disabled = true; // To prevent multiple clicks.\n",
" quickchartButtonEl.classList.add('colab-df-spinner');\n",
" try {\n",
" const charts = await google.colab.kernel.invokeFunction(\n",
" 'suggestCharts', [key], {});\n",
" } catch (error) {\n",
" console.error('Error during call to suggestCharts:', error);\n",
" }\n",
" quickchartButtonEl.classList.remove('colab-df-spinner');\n",
" quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
" }\n",
" (() => {\n",
" let quickchartButtonEl =\n",
" document.querySelector('#df-8a668902-d12b-4b27-8bea-abd518bdff99 button');\n",
" quickchartButtonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
" })();\n",
" </script>\n",
"</div>\n",
"\n",
" </div>\n",
" </div>\n"
],
"text/plain": [
" city avgAge countCity\n",
"0 Seoul 18.0 1\n",
"1 Munich 19.0 1\n",
"2 Madrid 20.0 1\n",
"3 Moscow 23.0 1\n",
"4 Toronto 23.0 1\n",
"5 Amsterdam 24.0 1\n",
"6 London 25.0 1\n",
"7 San Francisco 26.0 1\n",
"8 Dublin 30.0 1\n",
"9 Dubai 31.0 1\n",
"10 Prague 32.0 1\n",
"11 Singapore 32.0 1\n",
"12 Copenhagen 35.0 1\n",
"13 Paris 35.0 1\n",
"14 New York 35.6 5\n",
"15 Berlin 44.0 2\n",
"16 Lisbon 44.0 1\n",
"17 Barcelona 45.0 1\n",
"18 Budapest 46.0 1\n",
"19 Rome 52.0 1\n",
"20 Stockholm 52.0 1\n",
"21 Vienna 53.0 1\n",
"22 Chicago 55.0 1\n",
"23 Istanbul 56.0 1\n",
"24 Zurich 59.0 1"
]
},
"execution_count": 199,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"print(\"Aggregated query for average age and count of distict people in each city:\")\n",
"table.query(aggs={'avgAge': ['avg', 'age'], 'countCity': ['count', 'id']}, group_by=['city'], sort_columns=['avgAge'])"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "WKkXE7-yvtQf"
},
"source": [
"##### Customizing Filters"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"collapsed": true,
"id": "0TR-GFQ7bW5_",
"outputId": "ef98b740-aad6-4ebb-9266-36e8d49d6ccd"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Querying data where age < 30 and name starts with H:\n",
" id name age city \\\n",
"0 7 Hannah 24 Amsterdam \n",
"\n",
" description \\\n",
"0 A professional photographer with 6 years of ex... \n",
"\n",
" embeddings \n",
"0 [0.04469839, 0.07050187, 0.046390466, -0.03404... \n"
]
}
],
"source": [
"print(\"Querying data where age < 30 and name starts with H:\")\n",
"print(table.query(filter=[(\"<\", \"age\", 30), (\"like\", \"name\", \"H*\")]))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"collapsed": true,
"id": "ySsv3TZxbYbu",
"outputId": "fbf7919c-9f77-4d7b-e108-6eda3b6d210d"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Querying data where age > 30 and city is Rome or New York:\n",
" id name age city \\\n",
"0 0 Alice 58 New York \n",
"1 5 Frank 32 New York \n",
"2 8 Ivy 52 Rome \n",
"3 17 Rita 50 New York \n",
"\n",
" description \\\n",
"0 A passionate environmentalist with 5 years of ... \n",
"1 A graphic designer based in Berlin with 8 year... \n",
"2 A fitness trainer with 5 years of experience w... \n",
"3 A financial analyst with 5 years of experience... \n",
"\n",
" embeddings \n",
"0 [-0.006158471, 0.063678846, 0.09181005, -0.023... \n",
"1 [0.013257692, 0.045190323, 0.0074770325, -0.00... \n",
"2 [0.0002550126, 0.024398372, 0.09861772, 0.0062... \n",
"3 [0.015000028, 0.024906091, 0.0010010687, 0.011... \n"
]
}
],
"source": [
"print(\"Querying data where age > 30 and city is Rome or New York:\")\n",
"print(table.query(filter=[(\">\", \"age\", 30), (\"in\", \"city\", [\"Rome\", \"New York\"])]))"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "I3sX3ucWvzDZ"
},
"source": [
"#### Vector Search"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "cuxZlgX2wwHp"
},
"source": [
"##### Embedding a Query Vector and Searching"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "c8KMGQO2i7gT"
},
"outputs": [],
"source": [
"person_query = \"a software engineer with lots of experience\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "dBjWSreqjGOd"
},
"outputs": [],
"source": [
"person_embedding = list(embedding_model.embed([person_query]))[0].tolist()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"collapsed": true,
"id": "9Ud3vcsbjOJ4",
"outputId": "8d11a7f1-57f8-425c-d676-c82c4065e167"
},
"outputs": [
{
"data": {
"text/plain": [
"384"
]
},
"execution_count": 174,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(person_embedding)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"collapsed": true,
"id": "Zhld2bOLbfn3",
"outputId": "12fe2934-ed17-4d4e-91dd-04a3cc7859ff"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Searching for the three closest people to the example vector:\n"
]
},
{
"data": {
"text/plain": [
"[ id name age city \\\n",
" 0 26 Alice 46 Budapest \n",
" 1 12 Mia 20 Madrid \n",
" 2 19 Tina 19 Munich \n",
" \n",
" description \\\n",
" 0 A software architect with 9 years of experienc... \n",
" 1 A software developer with 6 years of experienc... \n",
" 2 A UX designer with 6 years of experience in cr... \n",
" \n",
" embeddings __nn_distance \n",
" 0 [-0.06051296, 0.031862404, -0.031203829, -0.07... 0.322560 \n",
" 1 [-0.04372146, 0.06704399, 0.022140108, -0.1017... 0.356629 \n",
" 2 [-0.030572662, 0.04520395, 0.04553928, -0.0925... 0.457364 ]"
]
},
"execution_count": 175,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"print(\"Searching for the three closest people to the example vector:\")\n",
"table.search({index_name: [person_embedding]}, n=3)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ashW6LLjv7zb"
},
"source": [
"##### Batch Search with Multiple Query Vectors"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"collapsed": true,
"id": "4GFgKzsCbhRg",
"outputId": "9be28206-4142-4038-8fc6-6b87ecffcb29"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Batch search with multiple query vectors:\n"
]
},
{
"data": {
"text/plain": [
"[ id name age city \\\n",
" 0 26 Alice 46 Budapest \n",
" 1 12 Mia 20 Madrid \n",
" 2 19 Tina 19 Munich \n",
" \n",
" description \\\n",
" 0 A software architect with 9 years of experienc... \n",
" 1 A software developer with 6 years of experienc... \n",
" 2 A UX designer with 6 years of experience in cr... \n",
" \n",
" embeddings __nn_distance \n",
" 0 [-0.06051296, 0.031862404, -0.031203829, -0.07... 0.322560 \n",
" 1 [-0.04372146, 0.06704399, 0.022140108, -0.1017... 0.356629 \n",
" 2 [-0.030572662, 0.04520395, 0.04553928, -0.0925... 0.457364 ,\n",
" id name age city description \\\n",
" 0 3 Monica 35 Paris A data scientist in Tokyo with 4 years of expe... \n",
" 1 24 Yara 44 Lisbon A business analyst with 7 years of experience ... \n",
" 2 27 Cody 55 Berlin A research scientist with 8 years of experienc... \n",
" \n",
" embeddings __nn_distance \n",
" 0 [-0.008097345, 0.030305384, 0.012246384, -0.04... 0.231473 \n",
" 1 [-0.044623584, 0.054378174, 0.0015794634, -0.0... 0.617759 \n",
" 2 [-0.01787689, 0.07915241, -0.004790489, -0.031... 0.642228 ]"
]
},
"execution_count": 176,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"print(\"Batch search with multiple query vectors:\")\n",
"queries = [\"a software engineer with lots of experience\", \"a data scientist with experience in machine learning and a keen interest in AI research\"]\n",
"queries_embeddings = list(embedding_model.embed(queries))\n",
"table.search(vectors={index_name: [q.tolist() for q in queries_embeddings]}, n=3)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "nApqLnYdwIAF"
},
"source": [
"##### Combining Aggregations with Vector Search"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"collapsed": true,
"id": "VtgsqjOWbisW",
"outputId": "114c1f09-c68b-45b2-b673-132c72b7292e"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Searching with aggregated results for name, city, and description:\n",
"[ name city description\n",
"0 Alice Budapest A software architect with 9 years of experienc...\n",
"1 Mia Madrid A software developer with 6 years of experienc...\n",
"2 Tina Munich A UX designer with 6 years of experience in cr...]\n"
]
}
],
"source": [
"print(\"Searching with aggregated results for name, city, and description:\")\n",
"print(table.search(vectors={index_name: [person_embedding]}, n=3, aggs={\"name\": \"name\", \"city\": \"city\", \"description\": \"description\"}))"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "7b4GvZSrweRS"
},
"source": [
"##### Vector Search with Filters"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"collapsed": true,
"id": "cSkZPE3FblJD",
"outputId": "046c490f-8df8-47cd-8643-0667bcf8bd79"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Searching with filter to find people younger than 30:\n",
"[ name age description\n",
"0 Mia 20 A software developer with 6 years of experienc...\n",
"1 Tina 19 A UX designer with 6 years of experience in cr...\n",
"2 Bob 25 A software engineer with 7 years of experience...\n",
"3 Ethan 18 A public relations specialist with 7 years of ...\n",
"4 Jack 23 A chef with 12 years of experience who runs a ...]\n"
]
}
],
"source": [
"print(\"Searching with filter to find people younger than 30:\")\n",
"print(table.search(vectors={index_name: [person_embedding]}, n=5, filter=[(\"<\", \"age\", 30)], aggs={\"name\": \"name\", \"age\": \"age\", \"description\": \"description\"}))"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "_uPg6NsgwgEc"
},
"source": [
"### Drop Table To Conserve Resources"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "wzZvVkTBbmgg",
"outputId": "84df09a6-b19b-4a46-8e1d-77b0b822590c"
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 179,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"table.drop()"
]
}
],
"metadata": {
"colab": {
"provenance": [],
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
================================================
FILE: KDB.AI_course/course_specific_content/managing_tables.ipynb
================================================
{
"cells": [
{
"cell_type": "markdown",
"id": "bb2094b8-13a5-4f7c-bd21-d2c709dab914",
"metadata": {
"id": "bb2094b8-13a5-4f7c-bd21-d2c709dab914"
},
"source": [
"# Managing Tables in KDB.AI\n",
"[Video Walkthrough](https://www.youtube.com/watch?v=XH5iNkcFKXc&list=PLypX5sYuDqvrqsXTw876gGHosCKvK_7QS&index=6)\n",
"\n",
"##### Note: This example requires a KDB.AI endpoint and API key. Sign up for a free [KDB.AI account](https://kdb.ai/get-started).\n",
"\n",
"\n",
"\n",
"This notebook shows how to get started with the KDB.AI vector database. It gives you a quick taste of KDB.AI in about 10 minutes.\n",
"\n",
"You will learn how to:\n",
"\n",
"1. Connect to KDB.AI\n",
"1. Create a KDB.AI Table\n",
"1. Add Data to the KDB.AI Table\n",
"1. Query the Table\n",
"1. Perform Similarity Search\n",
"1. Delete the KDB.AI Table"
]
},
{
"cell_type": "markdown",
"id": "260d0f4b-ef09-4bd2-a197-a9351be24684",
"metadata": {
"id": "260d0f4b-ef09-4bd2-a197-a9351be24684"
},
"source": [
"## 0. Setup"
]
},
{
"cell_type": "markdown",
"id": "d1468bd3",
"metadata": {
"id": "d1468bd3"
},
"source": [
"### Install dependencies\n",
"\n",
"In order to successfully run this sample, note the following steps depending on where you are running this notebook:\n",
"\n",
"- ***Run Locally / Private Environment:*** The [Setup](https://github.com/KxSystems/kdbai-samples/blob/main/README.md#setup) steps in the repository's `README.md` will guide you on prerequisites and how to run this with Jupyter.\n",
"\n",
"\n",
"- ***Colab / Hosted Environment:*** Open this notebook in Colab and run through the cells."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "491cd6d6",
"metadata": {
"id": "491cd6d6"
},
"outputs": [],
"source": [
"!pip install kdbai_client"
]
},
{
"cell_type": "markdown",
"id": "cc6d17b7",
"metadata": {
"id": "cc6d17b7"
},
"source": [
"### Import Packages"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "805d97da",
"metadata": {
"id": "805d97da"
},
"outputs": [],
"source": [
"# vector DB\n",
"import os\n",
"from getpass import getpass\n",
"import kdbai_client as kdbai\n",
"import time"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a55ae34e-472b-4aa7-9add-1fcb2ee24a41",
"metadata": {
"id": "a55ae34e-472b-4aa7-9add-1fcb2ee24a41"
},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"id": "8c660c7d",
"metadata": {
"id": "8c660c7d"
},
"source": [
"Vector embeddings need to be stored in a vector database to enable efficient searching.\n",
"\n",
"## 1. Connect to KDB.AI Server\n",
"\n",
"To use KDB.AI Server, you will need to download and run your own container.\n",
"To do this, you will first need to sign up for free [here](https://trykdb.kx.com/kdbaiserver/signup/).\n",
"\n",
"You will receive an email with the required license file and bearer token needed to download your instance.\n",
"Follow the instructions in the signup email to get your session up and running.\n",
"\n",
"Once the [setup steps](https://code.kx.com/kdbai/gettingStarted/kdb-ai-server-setup.html) are complete, you can connect to your KDB.AI Server session by passing your local endpoint to `kdbai.Session`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2e85c1ff",
"metadata": {
"id": "2e85c1ff"
},
"outputs": [],
"source": [
"# Set up the KDB.AI Server endpoint, falling back to the local default\n",
"KDBAI_ENDPOINT = os.environ.get(\"KDBAI_ENDPOINT\", \"http://localhost:8082\")\n",
"\n",
"# Connect to KDB.AI Server (the default mode is qipc)\n",
"session = kdbai.Session(endpoint=KDBAI_ENDPOINT)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6f330098",
"metadata": {
"id": "6f330098"
},
"outputs": [],
"source": [
"database = session.database(\"default\")"
]
},
{
"cell_type": "markdown",
"id": "1ec2c77b",
"metadata": {
"id": "1ec2c77b"
},
"source": [
"<div class=\"alert alert-block alert-info\">\n",
"<b>Need help understanding a function?</b><br/>\n",
"In a Jupyter notebook, add <code>?</code> before or after any KDB.AI function name to bring up the documentation for that function, along with its arguments and sample code.\n",
"</div>"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6e54917b",
"metadata": {
"id": "6e54917b"
},
"outputs": [],
"source": [
"?kdbai.Session"
]
},
{
"cell_type": "markdown",
"id": "8788a6b1",
"metadata": {
"id": "8788a6b1"
},
"source": [
"### Verify Defined Tables\n",
"\n",
"We can check our connection by inspecting `database.tables`.\n",
"This returns a list of all the tables we have defined in our vector database thus far.\n",
"After dropping any existing table called `data`, this should return an empty list on a fresh instance."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "97e5f4a9",
"metadata": {
"id": "97e5f4a9"
},
"outputs": [],
"source": [
"# ensure no table called \"data\" exists\n",
"try:\n",
" database.table(\"data\").drop()\n",
"except kdbai.KDBAIException:\n",
" pass"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7877f51c",
"metadata": {
"id": "7877f51c",
"outputId": "a6deb89e-0325-4686-f111-b611f5acb2e5"
},
"outputs": [
{
"data": {
"text/plain": [
"[]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"database.tables"
]
},
{
"cell_type": "markdown",
"id": "e33f03c3",
"metadata": {
"id": "e33f03c3"
},
"source": [
"## 2. Create a KDB.AI Table\n",
"\n",
"To create a table, we can use `create_table`. This function takes two mandatory arguments: the name and the schema of the table.\n",
"\n",
"This schema must meet the following criteria:\n",
"- It must contain a list of columns.\n",
"- All columns must have `type` specified.\n",
"\n",
"If you want to create indexes, you must provide them as a separate parameter, which must meet the following criteria:\n",
"- It must contain a list of index definitions.\n",
"- Each index must have `name`, `column` and `type` attributes. Index-specific parameters can be passed in `params`, which is mandatory for some index types.\n",
"\n",
"Run `?database.create_table` for more details and sample code."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6f40400b",
"metadata": {
"id": "6f40400b"
},
"outputs": [],
"source": [
"?database.create_table"
]
},
{
"cell_type": "markdown",
"id": "9da55253",
"metadata": {
"id": "9da55253"
},
"source": [
"### Define Schema\n",
"\n",
"Our table will have two columns: the first, `id`, holds a list of dummy IDs; the second, `vectors`, holds the vector embeddings we will use for similarity search later in this example.\n",
"\n",
"We will define our dimensionality, similarity metric and index type in the `indexes` parameter. For this example we chose:\n",
"- `dims = 8` : In the next section, we generate embeddings that are eight-dimensional to match this. You can choose any value here.\n",
"- `metric = L2` : We chose [L2/Euclidean distance](https://en.wikipedia.org/wiki/Euclidean_distance), which is suitable for our low-dimensional dummy dataset. You can use other metrics here, such as [IP/Inner Product](https://en.wikipedia.org/wiki/Inner_product_space) and [CS/Cosine Similarity](https://en.wikipedia.org/wiki/Cosine_similarity); the one you choose depends on the specific context and nature of your data.\n",
"- `type = flat` : We use a [Flat index](https://faiss.ai/cpp_api/struct/structfaiss_1_1IndexFlat.html) here because our data structure is simple, so this is more than adequate. You can use other indexes here, such as [HNSW](https://faiss.ai/cpp_api/struct/structfaiss_1_1IndexHNSW.html) and [IVFPQ](https://faiss.ai/cpp_api/struct/structfaiss_1_1IndexIVFPQ.html); as with metrics, the one you choose depends on your data and your overall performance requirements."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e5e8b782",
"metadata": {
"id": "e5e8b782"
},
"outputs": [],
"source": [
"schema = [\n",
" {\"name\": \"id\", \"type\": \"str\"},\n",
" {\"name\": \"vectors\", \"type\": \"float32s\"},\n",
"]\n",
"\n",
"index_name = \"flat_index\"\n",
"indexes = [{\"name\": index_name, \"column\": \"vectors\", \"type\": \"flat\", \"params\": {\"dims\": 8, \"metric\": \"L2\"}}]"
]
},
{
"cell_type": "markdown",
"id": "09a5caa0",
"metadata": {
"id": "09a5caa0"
},
"source": [
"### Create Table"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "34067680",
"metadata": {
"id": "34067680"
},
"outputs": [],
"source": [
"table = database.create_table(\"data\", schema=schema, indexes=indexes)"
]
},
{
"cell_type": "markdown",
"id": "20afbea1",
"metadata": {
"id": "20afbea1"
},
"source": [
"## 3. Add Data to the KDB.AI Table\n",
"\n",
"First, generate an array of five 8-dimensional vectors, which will serve as the vector embeddings in this example. We will then add these to a pandas DataFrame with column names and types matching the target table."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "37581e86",
"metadata": {
"id": "37581e86"
},
"outputs": [],
"source": [
"# Create a NumPy array of 5 eight-dimensional float32 arrays\n",
"vectors = np.array(\n",
" [\n",
" [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],\n",
" [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],\n",
" [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],\n",
" [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1],\n",
" [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1, 0.2],\n",
" ],\n",
" dtype=np.float32,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f5dc41e8",
"metadata": {
"id": "f5dc41e8"
},
"outputs": [],
"source": [
"# Example ID values\n",
"ids = [\"h\", \"e\", \"l\", \"l\", \"o\"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "730c9f08",
"metadata": {
"id": "730c9f08"
},
"outputs": [],
"source": [
"# column names/types matching the schema\n",
"embeddings = pd.DataFrame({\"id\": ids, \"vectors\": list(vectors)})"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4a31f878",
"metadata": {
"id": "4a31f878",
"outputId": "933caa30-7fd4-4d11-c717-ecfff36fa6c9"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>vectors</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>h</td>\n",
" <td>[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>e</td>\n",
" <td>[0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>l</td>\n",
" <td>[0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>l</td>\n",
" <td>[0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>o</td>\n",
" <td>[0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1, 0.2]</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" id vectors\n",
"0 h [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]\n",
"1 e [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]\n",
"2 l [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]\n",
"3 l [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1]\n",
"4 o [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1, 0.2]"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"embeddings"
]
},
{
"cell_type": "markdown",
"id": "43cd2ad8",
"metadata": {
"id": "43cd2ad8"
},
"source": [
"We can now add data to our KDB.AI table using `insert`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b7e0f8c5",
"metadata": {
"id": "b7e0f8c5"
},
"outputs": [],
"source": [
"table.insert(embeddings)"
]
},
{
"cell_type": "markdown",
"id": "09577e8e",
"metadata": {
"id": "09577e8e"
},
"source": [
"## 4. Query the Table\n",
"\n",
"We can use `query` to retrieve data from the table. Called with no arguments, as below, it returns every row."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f4b8b8e5",
"metadata": {
"id": "f4b8b8e5",
"outputId": "f96e7323-a9f0-4154-abef-dd012be6b1b9"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>vectors</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>h</td>\n",
" <td>[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>e</td>\n",
" <td>[0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>l</td>\n",
" <td>[0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>l</td>\n",
" <td>[0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>o</td>\n",
" <td>[0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1, 0.2]</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" id vectors\n",
"0 h [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]\n",
"1 e [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]\n",
"2 l [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]\n",
"3 l [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1]\n",
"4 o [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1, 0.2]"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"table.query()"
]
},
{
"cell_type": "markdown",
"id": "9c267a58",
"metadata": {
"id": "9c267a58"
},
"source": [
"## 5. Perform Similarity Search\n",
"\n",
"Finally, let's perform similarity search on the table. We do this using the `search` function."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "595829ff",
"metadata": {
"id": "595829ff"
},
"outputs": [],
"source": [
"?table.search"
]
},
{
"cell_type": "markdown",
"id": "9bb341f3",
"metadata": {
"id": "9bb341f3"
},
"source": [
"<div class=\"alert alert-block alert-warning\">\n",
"<b>Note:</b> The dimension of the input query vectors must match the dimension of the vector embeddings in the table, set by <code>dims</code> in the index definition above.\n",
"</div>"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c9301c97",
"metadata": {
"id": "c9301c97",
"outputId": "880dfe5d-86e2-4487-d3d6-771c7be40f57"
},
"outputs": [
{
"data": {
"text/plain": [
"[ id vectors __nn_distance\n",
" 0 e [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9] 0.01]"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Find the closest neighbor of a single query vector\n",
"table.search(vectors={index_name: [[0.1, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]]}, n=1)"
]
},
{
"cell_type": "markdown",
"id": "49758e9d",
"metadata": {
"id": "49758e9d"
},
"source": [
"<div class=\"alert alert-block alert-warning\">\n",
"<b>Note:</b> The output is a list of length one, matching the number of vectors we passed to the search. Index it at position [0] to extract the dataframe corresponding to the single input vector.\n",
"</div>"
]
},
{
"cell_type": "markdown",
"id": "d8aed9bc-72b2-4e70-b763-e7ce054557db",
"metadata": {
"id": "d8aed9bc-72b2-4e70-b763-e7ce054557db"
},
"source": [
"## 6. Delete the KDB.AI Table\n",
"\n",
"We can use `table.drop()` to delete a table."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "548a9d95-aac3-4d63-a87a-99eedfe55f07",
"metadata": {
"id": "548a9d95-aac3-4d63-a87a-99eedfe55f07",
"outputId": "53a714f1-ad13-410c-dbdb-39e5a22e7a86"
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"table.drop()"
]
},
{
"cell_type": "markdown",
"id": "8bc6d801-1371-48d0-98b4-0baa53bc8446",
"metadata": {
"id": "8bc6d801-1371-48d0-98b4-0baa53bc8446"
},
"source": [
"<div class=\"alert alert-block alert-warning\">\n",
"<b>Warning:</b> Once you drop a table, you cannot use it again.\n",
"</div>"
]
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
================================================
FILE: KDB.AI_course/course_specific_content/rag_example.ipynb
================================================
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "on0mJqL80KsJ"
},
"source": [
"## Introduction\n",
"\n",
"[Video Walkthrough](https://www.youtube.com/watch?v=Obbn15rZfvQ&list=PLypX5sYuDqvrqsXTw876gGHosCKvK_7QS&index=13)\n",
"\n",
"This notebook demonstrates the implementation of a Retrieval-Augmented Generation (RAG) pipeline using KDB.AI and Large Language Models. By the end of this tutorial, you'll understand how to leverage vector databases and LLMs to create an effective RAG system."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "s3Eb0JnV0lVJ"
},
"source": [
"### Setup and Dependencies\n",
"Install `kdbai_client` and import the necessary dependencies."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "x68BCmLZ15N2"
},
"source": [
"##### Install Required Libraries"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"collapsed": true,
"id": "OnhoXtx5ggta",
"outputId": "1e69c47d-0034-47fb-fcd9-35ccade1d6d2"
},
"outputs": [],
"source": [
"# Install required libraries\n",
"!pip install llama-index fastembed openai kdbai_client onnxruntime==1.19.2"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "LHMN8-Vd2ANx"
},
"source": [
"##### Import Dependencies"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"id": "RHlEgCWExKo3"
},
"outputs": [],
"source": [
"import os\n",
"from getpass import getpass\n",
"import kdbai_client as kdbai\n",
"import time\n",
"from llama_index.core import Document, SimpleDirectoryReader\n",
"from llama_index.core.node_parser import SentenceSplitter\n",
"import pandas as pd\n",
"from fastembed import TextEmbedding\n",
"import openai\n",
"import textwrap"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Set up OpenAI API key\n",
"OPENAI_API_KEY = (\n",
" os.environ[\"OPENAI_API_KEY\"]\n",
" if \"OPENAI_API_KEY\" in os.environ\n",
" else getpass(\"OpenAI API key: \")\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "0aC8tMVy0vPv"
},
"source": [
"The embeddings we create later need to be stored in a vector database to enable efficient searching.\n",
"\n",
"### Connect to KDB.AI Server\n",
"\n",
"To use KDB.AI Server, you will need to download and run your own container.\n",
"To do this, you will first need to sign up for free [here](https://trykdb.kx.com/kdbaiserver/signup/).\n",
"\n",
"You will receive an email with the required license file and bearer token needed to download your instance.\n",
"Follow the instructions in the signup email to get your session up and running.\n",
"\n",
"Once the [setup steps](https://code.kx.com/kdbai/gettingStarted/kdb-ai-server-setup.html) are complete, you can connect to your KDB.AI Server session by passing your local endpoint to `kdbai.Session`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "4rhRF58Wwxhj",
"outputId": "355c7966-b409-4a52-f86c-b4f62755df97"
},
"outputs": [],
"source": [
"# Set up the KDB.AI Server endpoint, falling back to the local default\n",
"KDBAI_ENDPOINT = os.environ.get(\"KDBAI_ENDPOINT\", \"http://localhost:8082\")\n",
"\n",
"# Connect to KDB.AI Server (the default mode is qipc)\n",
"session = kdbai.Session(endpoint=KDBAI_ENDPOINT)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "TI-33lMv1LYi"
},
"source": [
"##### Initialize Embedding Model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 336
},
"id": "cemGdbEkufnu",
"outputId": "4de02d2c-338e-4034-e784-c26a6abb8550"
},
"outputs": [],
"source": [
"fastembed = TextEmbedding()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "sd3rjwg-kmWL"
},
"source": [
"### Data Preparation\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "24Sj5uYC2SBf"
},
"source": [
"##### Download Dataset\n",
"We'll use the Paul Graham Essay Dataset as our corpus."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "ZgL3bM7GPkUa",
"outputId": "0d2e7772-b5dd-4d02-a23d-9ce62d39343e"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"100%|█████████████████████████████████████████████| 1/1 [00:00<00:00, 4.37it/s]\n",
"Successfully downloaded PaulGrahamEssayDataset to ./data\n"
]
}
],
"source": [
"!llamaindex-cli download-llamadataset PaulGrahamEssayDataset --download-dir ./data"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "BXoGTn42lE-T"
},
"source": [
"### Create a KDB.AI session and table"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "-gOC5u2KW32F"
},
"outputs": [],
"source": [
"KDBAI_TABLE_NAME = \"paul_graham\"\n",
"database = session.database(\"default\")\n",
"\n",
"# Drop existing table if it exists\n",
"try:\n",
" database.table(KDBAI_TABLE_NAME).drop()\n",
"except kdbai.KDBAIException:\n",
" pass"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"id": "SP9oocI1Z2j1"
},
"outputs": [],
"source": [
"# Define table schema\n",
"\n",
"schema = [\n",
" dict(name=\"text\", type=\"bytes\"),\n",
" dict(name=\"embedding\", type=\"float32s\")\n",
"]\n",
"index_name = \"flat_index\"\n",
"indexes = [dict(name=index_name, column=\"embedding\", type=\"flat\", params=dict(metric=\"L2\", dims=384))]\n",
"\n",
"table = database.create_table(KDBAI_TABLE_NAME, schema=schema, indexes=indexes)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "THCzKUyS3E2B"
},
"source": [
"#### Load and Parse Documents"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "mwFpmgDSzZ_A",
"outputId": "78987b65-47af-4d4f-b405-50ad264bb041"
},
"outputs": [
{
"data": {
"text/plain": [
"46"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"node_parser = SentenceSplitter(chunk_size=500, chunk_overlap=100)\n",
"essays = SimpleDirectoryReader(input_dir=\"./data/source_files\").load_data()\n",
"docs = node_parser.get_nodes_from_documents(essays)\n",
"len(docs)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "fpeYGnog3MLs"
},
"source": [
"##### Generate Embeddings"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 49
},
"id": "gjhkkyqHaA5k",
"outputId": "d7e8f17b-c3e1-406a-f8d0-e35bb22165cf"
},
"outputs": [],
"source": [
"embedding_model = TextEmbedding()\n",
"documents = [doc.text for doc in docs]\n",
"embeddings = list(embedding_model.embed(documents))"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "xMHUKw8FcYDZ",
"outputId": "e05c1849-9647-4c9c-bca3-a7f6628bf7b0"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"So I'm not surprised I can't remember any programs I wrote, because they can't have done much. My clearest memory is of the moment I learned it was possible for programs not to terminate, when one of mine didn't. On a machine without time-sharing, this was a social as well as a technical error, as the data center manager's expression made clear.\n",
"\n",
"With microcomputers, everything changed. Now you could have a computer sitting right in front of you, on a desk, that could respond to your keystrokes as it was running instead of just churning through a stack of punch cards and then stopping. [1]\n",
"\n",
"The first of my friends to get a microcomputer built it himself. It was sold as a kit by Heathkit. I remember vividly how impressed and envious I felt watching him sitting in front of it, typing programs right into the computer.\n",
"\n",
"Computers were expensive in those days and it took me years of nagging before I convinced my father to buy one, a TRS-80, in about 1980. The gold standard then was the Apple II, but a TRS-80 was good enough. This was when I really started programming. I wrote simple games, a program to predict how high my model rockets would fly, and a word processor that my father used to write at least one book. There was only room in memory for about 2 pages of text, so he'd write 2 pages at a time and then print them out, but it was a lot better than a typewriter.\n",
"\n",
"Though I liked programming, I didn't plan to study it in college. In college I was going to study philosophy, which sounded much more powerful. It seemed, to my naive high school self, to be the study of the ultimate truths, compared to which the things studied in other fields would be mere domain knowledge. What I discovered when I got to college was that the other fields took up so much of the space of ideas that there wasn't much left for these supposed ultimate truths. All that seemed left for philosophy were edge cases that people in other fields felt could safely be ignored.\n",
"\n",
"I couldn't have put this into words when I was 18.\n"
]
}
],
"source": [
"print(documents[1])"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "OCWVBc0c3Sbs"
},
"source": [
"##### Insert Data into KDB.AI Table"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"id": "LcWJHw4caCt3"
},
"outputs": [],
"source": [
"records_to_insert_with_embeddings = pd.DataFrame({\n",
"    \"text\": [d.encode('utf-8') for d in documents],\n",
"    \"embedding\": embeddings\n",
"})"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "3TXly52oaGvE",
"outputId": "3116011d-fdd9-4255-9086-73240d31e4f4"
},
"outputs": [
{
"data": {
"text/plain": [
"{'rowsInserted': 46}"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"table.insert(records_to_insert_with_embeddings)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "0QtreXjz3W_t"
},
"source": [
"### RAG Implementation"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "bQUkhjTS3Ypm"
},
"source": [
"##### Define Query and Generate Embedding"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"id": "m-zgeI0BaLOc"
},
"outputs": [],
"source": [
"query = \"How does Paul Graham decide what to work on?\""
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"id": "3-YRNxNjaNZT"
},
"outputs": [],
"source": [
"query_embedding = list(embedding_model.embed([query]))[0].tolist()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "zg8SklaJ3biG"
},
"source": [
"##### Perform Vector Search"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"id": "gWSOhN6uaQzQ"
},
"outputs": [],
"source": [
"search_results = table.search({index_name: [query_embedding]}, n=10)\n",
"search_results_df = search_results[0]"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 842
},
"id": "BcM2f_KPaS72",
"outputId": "e7397190-88b2-47a2-cd67-dc1c80a7d09d"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Top Search Results Based on Query: How does Paul Graham decide what to work on?\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>__nn_distance</th>\n",
" <th>text</th>\n",
" <th>embedding</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0.823007</td>\n",
" <td>b'In late 2015 I spent 3 months writing essays, and when I went back to working on Bel I could barely understand the code. Not so much because it was badly written as because the problem is so convoluted. When you\\'re working on an interpreter written in itself, it\\'s hard to keep track of what\\'s happening at what level, and errors can be practically encrypted by the time you get them.\\n\\nSo I said no more essays till Bel was done. But I told few people about Bel while I was working on it. So for years it must have seemed that I was doing nothing, when in fact I was working harder than I\\'d ever worked on anything. Occasionally after wrestling for hours with some gruesome bug I\\'d check Twitter or HN and see someone asking \"Does Paul Graham still code?\"\\n\\nWorking on Bel was hard but satisfying. I worked on it so intensively that at any given time I had a decent chunk of the code in my head and could write more there. I remember taking the boys to the coast on a sunny day in 2015 and figuring out how to deal with some problem involving continuations while I watched them play in the tide pools. It felt like I was doing life right. I remember that because I was slightly dismayed at how novel it felt. The good news is that I had more moments like this over the next few years.\\n\\nIn the summer of 2016 we moved to England. We wanted our kids to see what it was like living in another country, and since I was a British citizen by birth, that seemed the obvious choice. We only meant to stay for a year, but we liked it so much that we still live there. So most of Bel was written in England.\\n\\nIn the fall of 2019, Bel was finally finished. Like McCarthy\\'s original Lisp, it\\'s a spec rather than an implementation, although like McCarthy\\'s Lisp it\\'s a spec expressed as code.\\n\\nNow that I could write essays again, I wrote a bunch about topics I\\'d had stacked up. 
I kept writing essays through 2020, but I also started to think about other things I could work on. How should I choose what to do?'</td>\n",
" <td>[-0.05267877, 0.005840427, -0.01187801, -0.028083289, 0.029767925, -0.01268333, -0.009753024, -0.011209541, 0.030792488, -0.07470311, 0.0005716741, 0.034681723, -0.0025648128, -0.007870674, -0.037071493, -0.0026503617, -0.030294443, -0.046712548, -0.026220752, -0.010382689, -0.047210008, 0.0039388337, -0.009324926, 0.04539282, 0.04298206, 0.051068194, 0.029527958, -0.012021941, -0.051774003, -0.20419116, -0.019487105, 0.03856181, 0.054865412, -0.024023462, 0.005628216, 0.059498444, -0.023029648, -0.011461271, 0.0007990732, 0.01532533, 0.013435846, 0.009714834, 0.010104686, -0.014338494, 0.004052569, 0.020879505, 0.0112869395, -0.048422333, 0.025670612, 0.033183247, -0.071020156, -0.032056253, -0.0013147242, 0.045764726, -0.023884403, 0.013609344, 0.021824384, 0.0791942, 0.0021155155, -0.0058458406, 0.022163069, -0.0010415328, -0.1377265, 0.05194325, -0.035091735, 0.020503322, -0.03358411, -0.039575316, -0.018544003, 0.07090187, -0.030203853, 0.0024145627, -0.050365325, 0.1062729, 0.04504893, 0.020158818, -0.0055481945, 0.0020900085, 0.014658697, -0.01600323, 0.018643875, -0.020128626, 0.001960821, 0.014573526, -0.018745624, -0.011082115, -0.026627902, 0.035287272, 0.033186108, 0.004842385, 0.04288919, -0.051519115, 0.021143924, 0.03511711, -0.032461487, -0.053802498, -2.9269107e-05, 0.022274038, -0.019326271, 0.5066904, ...]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.851789</td>\n",
" <td>b\"He wanted to start a startup to make nuclear reactors. But I kept at it, and in October 2013 he finally agreed. We decided he'd take over starting with the winter 2014 batch. For the rest of 2013 I left running YC more and more to Sam, partly so he could learn the job, and partly because I was focused on my mother, whose cancer had returned.\\n\\nShe died on January 15, 2014. We knew this was coming, but it was still hard when it did.\\n\\nI kept working on YC till March, to help get that batch of startups through Demo Day, then I checked out pretty completely. (I still talk to alumni and to new startups working on things I'm interested in, but that only takes a few hours a week.)\\n\\nWhat should I do next? Rtm's advice hadn't included anything about that. I wanted to do something completely different, so I decided I'd paint. I wanted to see how good I could get if I really focused on it. So the day after I stopped working on YC, I started painting. I was rusty and it took a while to get back into shape, but it was at least completely engaging. [18]\\n\\nI spent most of the rest of 2014 painting. I'd never been able to work so uninterruptedly before, and I got to be better than I had been. Not good enough, but better. Then in November, right in the middle of a painting, I ran out of steam. Up till that point I'd always been curious to see how the painting I was working on would turn out, but suddenly finishing this one seemed like a chore. So I stopped working on it and cleaned my brushes and haven't painted since. So far anyway.\\n\\nI realize that sounds rather wimpy. But attention is a zero sum game. If you can choose what to work on, and you choose a project that's not the best one (or at least a good one) for you, then it's getting in the way of another project that is. And at 50 there was some opportunity cost to screwing around.\"</td>\n",
" <td>[-0.04173409, -0.020306244, 0.026670614, -0.028619805, 0.013841975, -0.004587492, -0.03740281, -0.0023207841, -0.005583664, -0.02458708, 0.032301717, -0.003981511, -0.0022139344, 0.040776156, 0.008303966, 0.065411426, -0.05266241, -0.0147317415, -0.013039435, -0.02108635, -0.08220996, -0.023095597, 0.009018569, -0.06593445, 0.053503707, 0.02561, -0.011278506, -0.029375598, -0.02894449, -0.17977206, 0.015862752, 0.037204675, 0.028550476, -0.008014831, 0.050124772, 0.053289328, -0.037882008, -0.004310019, -0.040979013, 0.031382367, -0.019382592, 0.041386265, -0.06535482, -0.03808074, 0.013384267, 0.010357172, 0.0032444543, -0.052392986, 0.042238504, 0.020043798, -0.028322041, -0.055793695, -0.011091505, 0.020135079, -0.003494716, 0.01618655, 0.08450317, 0.040414557, 0.032989975, 0.011764182, -0.013049825, -0.029259514, -0.102057606, 0.016020596, 0.016062474, 0.010199196, -0.009390674, -0.043287795, 0.034758028, 0.13968067, 0.025622727, 0.016510569, -0.02354023, 0.073845506, 0.009602881, -0.049839057, 0.022470307, 0.043024465, 0.0017405926, -0.028580481, 0.0027170023, 0.010050958, -0.013109462, 0.014532717, -0.04200619, 0.01677191, -0.07769759, 0.0073121856, 0.0189732, 0.08225239, 0.052873313, 0.020460907, 0.017190987, -0.025781311, -0.057865854, -0.015826138, 0.04352462, 0.040577717, -0.045354914, 0.47870147, ...]</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" __nn_distance \\\n",
"0 0.823007 \n",
"1 0.851789 \n",
"\n",
" text \\\n",
"0 b'In late 2015 I spent 3 months writing essays, and when I went back to working on Bel I could barely understand the code. Not so much because it was badly written as because the problem is so convoluted. When you\\'re working on an interpreter written in itself, it\\'s hard to keep track of what\\'s happening at what level, and errors can be practically encrypted by the time you get them.\\n\\nSo I said no more essays till Bel was done. But I told few people about Bel while I was working on it. So for years it must have seemed that I was doing nothing, when in fact I was working harder than I\\'d ever worked on anything. Occasionally after wrestling for hours with some gruesome bug I\\'d check Twitter or HN and see someone asking \"Does Paul Graham still code?\"\\n\\nWorking on Bel was hard but satisfying. I worked on it so intensively that at any given time I had a decent chunk of the code in my head and could write more there. I remember taking the boys to the coast on a sunny day in 2015 and figuring out how to deal with some problem involving continuations while I watched them play in the tide pools. It felt like I was doing life right. I remember that because I was slightly dismayed at how novel it felt. The good news is that I had more moments like this over the next few years.\\n\\nIn the summer of 2016 we moved to England. We wanted our kids to see what it was like living in another country, and since I was a British citizen by birth, that seemed the obvious choice. We only meant to stay for a year, but we liked it so much that we still live there. So most of Bel was written in England.\\n\\nIn the fall of 2019, Bel was finally finished. Like McCarthy\\'s original Lisp, it\\'s a spec rather than an implementation, although like McCarthy\\'s Lisp it\\'s a spec expressed as code.\\n\\nNow that I could write essays again, I wrote a bunch about topics I\\'d had stacked up. 
I kept writing essays through 2020, but I also started to think about other things I could work on. How should I choose what to do?' \n",
"1 b\"He wanted to start a startup to make nuclear reactors. But I kept at it, and in October 2013 he finally agreed. We decided he'd take over starting with the winter 2014 batch. For the rest of 2013 I left running YC more and more to Sam, partly so he could learn the job, and partly because I was focused on my mother, whose cancer had returned.\\n\\nShe died on January 15, 2014. We knew this was coming, but it was still hard when it did.\\n\\nI kept working on YC till March, to help get that batch of startups through Demo Day, then I checked out pretty completely. (I still talk to alumni and to new startups working on things I'm interested in, but that only takes a few hours a week.)\\n\\nWhat should I do next? Rtm's advice hadn't included anything about that. I wanted to do something completely different, so I decided I'd paint. I wanted to see how good I could get if I really focused on it. So the day after I stopped working on YC, I started painting. I was rusty and it took a while to get back into shape, but it was at least completely engaging. [18]\\n\\nI spent most of the rest of 2014 painting. I'd never been able to work so uninterruptedly before, and I got to be better than I had been. Not good enough, but better. Then in November, right in the middle of a painting, I ran out of steam. Up till that point I'd always been curious to see how the painting I was working on would turn out, but suddenly finishing this one seemed like a chore. So I stopped working on it and cleaned my brushes and haven't painted since. So far anyway.\\n\\nI realize that sounds rather wimpy. But attention is a zero sum game. If you can choose what to work on, and you choose a project that's not the best one (or at least a good one) for you, then it's getting in the way of another project that is. And at 50 there was some opportunity cost to screwing around.\" \n",
"\n",
" embedding \n",
"0 [-0.05267877, 0.005840427, -0.01187801, -0.028083289, 0.029767925, -0.01268333, -0.009753024, -0.011209541, 0.030792488, -0.07470311, 0.0005716741, 0.034681723, -0.0025648128, -0.007870674, -0.037071493, -0.0026503617, -0.030294443, -0.046712548, -0.026220752, -0.010382689, -0.047210008, 0.0039388337, -0.009324926, 0.04539282, 0.04298206, 0.051068194, 0.029527958, -0.012021941, -0.051774003, -0.20419116, -0.019487105, 0.03856181, 0.054865412, -0.024023462, 0.005628216, 0.059498444, -0.023029648, -0.011461271, 0.0007990732, 0.01532533, 0.013435846, 0.009714834, 0.010104686, -0.014338494, 0.004052569, 0.020879505, 0.0112869395, -0.048422333, 0.025670612, 0.033183247, -0.071020156, -0.032056253, -0.0013147242, 0.045764726, -0.023884403, 0.013609344, 0.021824384, 0.0791942, 0.0021155155, -0.0058458406, 0.022163069, -0.0010415328, -0.1377265, 0.05194325, -0.035091735, 0.020503322, -0.03358411, -0.039575316, -0.018544003, 0.07090187, -0.030203853, 0.0024145627, -0.050365325, 0.1062729, 0.04504893, 0.020158818, -0.0055481945, 0.0020900085, 0.014658697, -0.01600323, 0.018643875, -0.020128626, 0.001960821, 0.014573526, -0.018745624, -0.011082115, -0.026627902, 0.035287272, 0.033186108, 0.004842385, 0.04288919, -0.051519115, 0.021143924, 0.03511711, -0.032461487, -0.053802498, -2.9269107e-05, 0.022274038, -0.019326271, 0.5066904, ...] \n",
"1 [-0.04173409, -0.020306244, 0.026670614, -0.028619805, 0.013841975, -0.004587492, -0.03740281, -0.0023207841, -0.005583664, -0.02458708, 0.032301717, -0.003981511, -0.0022139344, 0.040776156, 0.008303966, 0.065411426, -0.05266241, -0.0147317415, -0.013039435, -0.02108635, -0.08220996, -0.023095597, 0.009018569, -0.06593445, 0.053503707, 0.02561, -0.011278506, -0.029375598, -0.02894449, -0.17977206, 0.015862752, 0.037204675, 0.028550476, -0.008014831, 0.050124772, 0.053289328, -0.037882008, -0.004310019, -0.040979013, 0.031382367, -0.019382592, 0.041386265, -0.06535482, -0.03808074, 0.013384267, 0.010357172, 0.0032444543, -0.052392986, 0.042238504, 0.020043798, -0.028322041, -0.055793695, -0.011091505, 0.020135079, -0.003494716, 0.01618655, 0.08450317, 0.040414557, 0.032989975, 0.011764182, -0.013049825, -0.029259514, -0.102057606, 0.016020596, 0.016062474, 0.010199196, -0.009390674, -0.043287795, 0.034758028, 0.13968067, 0.025622727, 0.016510569, -0.02354023, 0.073845506, 0.009602881, -0.049839057, 0.022470307, 0.043024465, 0.0017405926, -0.028580481, 0.0027170023, 0.010050958, -0.013109462, 0.014532717, -0.04200619, 0.01677191, -0.07769759, 0.0073121856, 0.0189732, 0.08225239, 0.052873313, 0.020460907, 0.017190987, -0.025781311, -0.057865854, -0.015826138, 0.04352462, 0.040577717, -0.045354914, 0.47870147, ...] "
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.set_option('display.max_colwidth', None)\n",
"print(\"Top Search Results Based on Query:\", query)\n",
"df = pd.DataFrame(search_results_df)\n",
"df.head(2)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "WnIeeDVJ3g47"
},
"source": [
"##### RAG Function Definition"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "fhGl5YDXaXak"
},
"outputs": [],
"source": [
"def RAG(retrieved_data, prompt):\n",
"    # Build a single prompt: instruction, user query, then the retrieved context\n",
"    messages = \"Answer the following query in three sentences based on the context and only the context: \" + \"\\n\"\n",
"    messages += prompt + \"\\n\"\n",
"    if len(retrieved_data) > 0:\n",
"        messages += \"Context: \" + \"\\n\"\n",
"        for data in retrieved_data:\n",
"            # Text was stored as UTF-8 bytes, so decode it before appending\n",
"            messages += data.decode('utf-8') + \"\\n\"\n",
"    openai.api_key = os.environ[\"OPENAI_API_KEY\"]\n",
"    # Send the assembled prompt to the chat model and return the text of its reply\n",
"    response = openai.chat.completions.create(\n",
"        model=\"gpt-4o\",\n",
"        messages=[\n",
"            {\n",
"                \"role\": \"user\",\n",
"                \"content\": [\n",
"                    {\"type\": \"text\", \"text\": messages},\n",
"                ],\n",
"            },\n",
"        ],\n",
"        max_tokens=300,\n",
"    )\n",
"    content = response.choices[0].message.content\n",
"    return content"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Y4yJL1kp3pOr"
},
"source": [
"##### Execute RAG Pipeline"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"id": "hqrg036n3yUs"
},
"outputs": [],
"source": [
"# Utility Function for Text Wrapping\n",
"\n",
"def print_wrapped(text, width=80):\n",
"    wrapper = textwrap.TextWrapper(width=width)\n",
"    word_list = wrapper.wrap(text=text)\n",
"    for line in word_list:\n",
"        print(line)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "bI2xx6IygIN5",
"outputId": "9af82ab0-4637-4501-c398-f0042946ef4d"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Query: How does Paul Graham decide what to work on?\n",
"Paul Graham decides what to work on based on a mix of personal interest, the\n",
"desire to explore untapped potential in various fields, and the influence of\n",
"pivotal moments and advice from close acquaintances. His transition from working\n",
"on Y Combinator, painting, and writing essays to developing the Bel programming\n",
"language and exploring startup ideas, such as the web app for creating web apps,\n",
"reflects a combination of seeking out deeply engaging projects and responding to\n",
"unsolicited advice from trusted collaborators that prompts reflection on his\n",
"trajectory. Graham's choices are driven by the pursuit of projects that not only\n",
"challenge him but also promise a significant impact or learning opportunity,\n",
"reflecting a deliberate process of selection influenced by both internal\n",
"motivations and external inputs.\n"
]
}
],
"source": [
"print(\"Query:\", query)\n",
"\n",
"print_wrapped(RAG(search_results_df[\"text\"],query))"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "1TorfCfi37m2"
},
"source": [
"### Drop Table To Conserve Resources"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"id": "tVmA5Dei36s3"
},
"outputs": [],
"source": [
"table.drop()"
]
}
],
"metadata": {
"colab": {
"provenance": [],
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
================================================
FILE: KDB.AI_course/notebook_references.md
================================================
# KDB.AI Course Notebook References

This document provides a comprehensive list of all notebooks used in the KDB.AI course, including both course-specific content and notebooks referenced from other parts of the repository. This ensures a single point of reference for all course materials.

## Course-Specific Content

These notebooks are located in the `KDB.AI_course/course_specific_content/` directory:

| Notebook Name | Description |
|---------------|-------------|
| [making_queries.ipynb](./course_specific_content/making_queries.ipynb) | Introduction to making queries in KDB.AI |
| [managing_tables.ipynb](./course_specific_content/managing_tables.ipynb) | Guide to managing tables in KDB.AI |
| [rag_example.ipynb](./course_specific_content/rag_example.ipynb) | Example of Retrieval Augmented Generation with KDB.AI |

## Referenced Notebooks

These notebooks are referenced from other parts of the repository:

| Course Section | Notebook Name | Location | Description |
|----------------|---------------|----------|-------------|
| Advanced Search Techniques | Hybrid Search | [/hybrid_search/hybrid_search_inflation.ipynb](../hybrid_search/hybrid_search_inflation.ipynb) | Demonstrates hybrid search techniques using inflation data |
| Advanced Search Techniques | Temporal Similarity Search (Non-Transformed) | [/TSS_non_transformed/Temporal_Similarity_Search_Non-Transformed_Demo.ipynb](../TSS_non_transformed/Temporal_Similarity_Search_Non-Transformed_Demo.ipynb) | Covers non-transformed temporal similarity search |
| Advanced Search Techniques | Temporal Similarity Search (Transformed) | [/TSS_transformed/Temporal_Similarity_Search_Transformed_Demo.ipynb](../TSS_transformed/Temporal_Similarity_Search_Transformed_Demo.ipynb) | Explores transformed temporal similarity search |

## Additional Resources

These notebooks, while not directly part of the course, provide valuable supplementary information:

| Notebook Name | Location | Description |
|---------------|----------|-------------|
| Python Quickstart | [/quickstarts/python_quickstart.ipynb](../quickstarts/python_quickstart.ipynb) | Quick introduction to using KDB.AI with Python |
| Document Search | [/document_search/document_search.ipynb](../document_search/document_search.ipynb) | Example of document search implementation |
| Image Search | [/image_search/image_search.ipynb](../image_search/image_search.ipynb) | Demonstration of image search capabilities |

## Maintenance Notes

- This file should be updated whenever notebooks are added, removed, or relocated within the course or the main repository.
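
One way to notice when this file has drifted out of date is to check that every relative link in it still resolves to a file on disk. The sketch below is only an illustration and is not part of the repository: the function name `find_broken_links` and the link regex are assumptions, and the regex handles only simple inline Markdown links of the form `[text](target)`.

```python
import re
from pathlib import Path

# Matches inline Markdown links such as [text](target) and captures the target.
LINK_PATTERN = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def find_broken_links(markdown_file: Path) -> list[str]:
    """Return relative link targets in markdown_file that do not exist on disk."""
    text = markdown_file.read_text(encoding="utf-8")
    broken = []
    for target in LINK_PATTERN.findall(text):
        # Skip external URLs and in-page anchors; only local paths are checked.
        if target.startswith(("http://", "https://", "#")):
            continue
        # Resolve the target relative to the directory containing the file.
        if not (markdown_file.parent / target).exists():
            broken.append(target)
    return broken
```

Calling `find_broken_links(Path("KDB.AI_course/notebook_references.md"))` from the repository root would return the link targets that no longer exist; an empty list means every referenced notebook was found.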
================================================
FILE: LICENSE
================================================
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specifi
Condensed preview: 71 files, each showing path, character count, and a content snippet. The full structured content is 13,784K chars.
[
{
"path": ".gitignore",
"chars": 37,
"preview": "*.ipynb_checkpoints\n.venv/\n.DS_Store\n"
},
{
"path": "HuggingFace_search/huggingface_inference.ipynb",
"chars": 42501,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"bb2094b8-13a5-4f7c-bd21-d2c709dab914\",\n \"metadata\": {\n \"id\""
},
{
"path": "KDB.AI_course/README.md",
"chars": 1101,
"preview": "# KDB.AI Course\n\nWelcome to the KDB.AI course! This course combines custom content with existing examples from the KDB.A"
},
{
"path": "KDB.AI_course/course_specific_content/making_queries.ipynb",
"chars": 86210,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"metadata\": {\n \"id\": \"CpIrSWxiuFxX\"\n },\n \"source\": [\n \"## Int"
},
{
"path": "KDB.AI_course/course_specific_content/managing_tables.ipynb",
"chars": 19792,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"bb2094b8-13a5-4f7c-bd21-d2c709dab914\",\n \"metadata\": {\n \"id\""
},
{
"path": "KDB.AI_course/course_specific_content/rag_example.ipynb",
"chars": 35329,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"metadata\": {\n \"id\": \"on0mJqL80KsJ\"\n },\n \"source\": [\n \"## Int"
},
{
"path": "KDB.AI_course/notebook_references.md",
"chars": 2596,
"preview": "# KDB.AI Course Notebook References\n\nThis document provides a comprehensive list of all notebooks used in the KDB.AI cou"
},
{
"path": "LICENSE",
"chars": 11357,
"preview": " Apache License\n Version 2.0, January 2004\n "
},
{
"path": "LlamaIndex_advanced_RAG/KDBAI_Advanced_RAG_Demo.ipynb",
"chars": 38336,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"cb212650-86b7-4298-8bbd-c20a5227fbf0\",\n \"metadata\": {\n \"id\""
},
{
"path": "LlamaIndex_samples/Hybrid_Search_LlamaIndex_KDBAI.ipynb",
"chars": 54375,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"36bda246-f167-4114-a5d1-a053d8bb6faa\",\n \"metadata\": {\n \"id\""
},
{
"path": "LlamaIndex_samples/Multimodal_RAG_LLamaIndex_CLIP_KDBAI.ipynb",
"chars": 1315754,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"TTMDGImH5JOM\",\n \"metadata\": {\n \"id\": \"TTMDGImH5JOM\"\n },\n "
},
{
"path": "LlamaIndex_samples/Sub_Question_Query_Engine_LlamaIndex_KDBAI.ipynb",
"chars": 32862,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"f134f72f-ba98-4eed-ac63-276c3057fa95\",\n \"metadata\": {\n \"id\""
},
{
"path": "LlamaParse_pdf_RAG/llamaParse_demo.ipynb",
"chars": 322948,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"metadata\": {\n \"id\": \"NBD-4xlhnyxl\"\n },\n \"source\": [\n \"## Par"
},
{
"path": "README.md",
"chars": 6858,
"preview": "\n\nThe example [KDB.AI](https://kdb.ai) samples provided aim t"
},
{
"path": "TSS_non_transformed/Non_Transformed_TSS_Technical_Analysis.ipynb",
"chars": 550100,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"metadata\": {\n \"id\": \"iSMlD8gdmdpz\"\n },\n \"source\": [\n \"# Non-"
},
{
"path": "TSS_non_transformed/Temporal_Similarity_Search_KDB+.ipynb",
"chars": 1069208,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"metadata\": {},\n \"source\": [\n \"# Temporal Similarity Search on KDB"
},
{
"path": "TSS_non_transformed/Temporal_Similarity_Search_Non-Transformed_Demo.ipynb",
"chars": 60013,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"58ae5b26-0b58-4416-a69d-f5662fce320d\",\n \"metadata\": {},\n \"so"
},
{
"path": "TSS_non_transformed/createHDB.q",
"chars": 3723,
"preview": "dst:`:demo_hdb / database root\n\nnumpart:10;\ned:2024.08.31; / end date the last date partition\ndts:{[ed;n] reverse n#d"
},
{
"path": "TSS_transformed/Temporal_Similarity_Search_Transformed_Demo.ipynb",
"chars": 123849,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"58ae5b26-0b58-4416-a69d-f5662fce320d\",\n \"metadata\": {\n \"id\""
},
{
"path": "TSS_transformed/Transformed_TSS_pattern_matching.ipynb",
"chars": 31352,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"b52fbdd6-10c5-4f52-b015-ce0f812c7d94\",\n \"metadata\": {\n \"id\""
},
{
"path": "document_search/document_search.ipynb",
"chars": 103286,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"3280b01a-d3b7-4ef6-9494-789d15bc48ec\",\n \"metadata\": {\n \"id\""
},
{
"path": "fuzzy_filtering_on_metadata/fuzzy_filtering_demo.ipynb",
"chars": 39510,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"metadata\": {\n \"id\": \"WgVm-xwOXyhY\"\n },\n \"source\": [\n \"# Fuzz"
},
{
"path": "hybrid_search/data/inflation.txt",
"chars": 13058,
"preview": " At last year's Jackson Hole symposium, I delivered a brief, direct message. My remarks this year will be a bit longer, "
},
{
"path": "hybrid_search/hybrid_search_inflation.ipynb",
"chars": 142732,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"metadata\": {\n \"id\": \"yxwE82TZNkco\"\n },\n \"source\": [\n \"## Hyb"
},
{
"path": "image_search/image_search.ipynb",
"chars": 1386564,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"25588ca9-dc13-4136-962b-42e5d090fb31\",\n \"metadata\": {\n \"id\""
},
{
"path": "metadata_filtering/metadata_filtering_demo.ipynb",
"chars": 30130,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"metadata\": {},\n \"source\": [\n \"# Metadata Filtering with KDB.AI Ve"
},
{
"path": "multi_index_multimodal_search/data/bat1.txt",
"chars": 66,
"preview": "Bat with outstretched wings hovering over an apple on leafy branch"
},
{
"path": "multi_index_multimodal_search/data/bat2.txt",
"chars": 62,
"preview": "a bat hanging upside down from a metal bar in an urban setting"
},
{
"path": "multi_index_multimodal_search/data/bear1.txt",
"chars": 54,
"preview": "Brown bear walking through a forest with fallen leaves"
},
{
"path": "multi_index_multimodal_search/data/bear2.txt",
"chars": 57,
"preview": "Close-up portrait of a grizzly bear's face and upper body"
},
{
"path": "multi_index_multimodal_search/data/caterpillar1.txt",
"chars": 44,
"preview": "Bright green caterpillar on a wooden surface"
},
{
"path": "multi_index_multimodal_search/data/caterpillar2.txt",
"chars": 55,
"preview": "Hairy caterpillar with spikes and spots on a green leaf"
},
{
"path": "multi_index_multimodal_search/data/deer1.txt",
"chars": 50,
"preview": "Buck deer with antlers standing in misty grassland"
},
{
"path": "multi_index_multimodal_search/data/deer2.txt",
"chars": 59,
"preview": "Side view of buck deer with large antlers in forest setting"
},
{
"path": "multi_index_multimodal_search/data/fox1.txt",
"chars": 45,
"preview": "Fluffy fox curled up in snow with eyes closed"
},
{
"path": "multi_index_multimodal_search/data/fox2.txt",
"chars": 54,
"preview": "Red fox with alert expression against rocky background"
},
{
"path": "multi_index_multimodal_search/data/hedgehog1.txt",
"chars": 71,
"preview": "Small hedgehog curled up in someone's palm, surrounded by autumn colors"
},
{
"path": "multi_index_multimodal_search/data/hedgehog2.txt",
"chars": 61,
"preview": "Hedgehog in a field of red flowers, some petals on its spines"
},
{
"path": "multi_index_multimodal_search/multi_index_multimodal_search.ipynb",
"chars": 19128,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"metadata\": {\n \"id\": \"D0wSB_3mKl5D\"\n },\n \"source\": [\n \"## Mul"
},
{
"path": "multimodal_RAG_VoyageAI/Multimodal_RAG_VoyageAI.ipynb",
"chars": 766939,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"metadata\": {\n \"id\": \"J5EZ09A9XKSj\"\n },\n \"source\": [\n \"## Mul"
},
{
"path": "multimodal_RAG_VoyageAI/data/text/bat.txt",
"chars": 338,
"preview": "Bats are the only mammals capable of sustained flight, distinct from flying squirrels which can only glide. They have wi"
},
{
"path": "multimodal_RAG_VoyageAI/data/text/bear.txt",
"chars": 366,
"preview": "Bears are large mammals with a stocky body, powerful limbs, and a short tail. They have a large brain and are considered"
},
{
"path": "multimodal_RAG_VoyageAI/data/text/caterpillar.txt",
"chars": 328,
"preview": "Caterpillars are the larval stage of butterflies and moths. They have a segmented body with a distinct head and typicall"
},
{
"path": "multimodal_RAG_VoyageAI/data/text/deer.txt",
"chars": 314,
"preview": "Deer are hoofed mammals known for their graceful bodies and long legs. Most male deer have antlers, which they shed and "
},
{
"path": "multimodal_RAG_VoyageAI/data/text/fox.txt",
"chars": 342,
"preview": "Foxes are small to medium-sized, omnivorous mammals belonging to the Canidae family, which also includes wolves, dogs, a"
},
{
"path": "multimodal_RAG_VoyageAI/data/text/hedgehog.txt",
"chars": 396,
"preview": "Hedgehogs are small, nocturnal mammals known for their distinctive spines, which are modified hairs. These spines provid"
},
{
"path": "multimodal_RAG_unified_text/data/text/bat.txt",
"chars": 338,
"preview": "Bats are the only mammals capable of sustained flight, distinct from flying squirrels which can only glide. They have wi"
},
{
"path": "multimodal_RAG_unified_text/data/text/bear.txt",
"chars": 366,
"preview": "Bears are large mammals with a stocky body, powerful limbs, and a short tail. They have a large brain and are considered"
},
{
"path": "multimodal_RAG_unified_text/data/text/caterpillar.txt",
"chars": 328,
"preview": "Caterpillars are the larval stage of butterflies and moths. They have a segmented body with a distinct head and typicall"
},
{
"path": "multimodal_RAG_unified_text/data/text/deer.txt",
"chars": 314,
"preview": "Deer are hoofed mammals known for their graceful bodies and long legs. Most male deer have antlers, which they shed and "
},
{
"path": "multimodal_RAG_unified_text/data/text/fox.txt",
"chars": 342,
"preview": "Foxes are small to medium-sized, omnivorous mammals belonging to the Canidae family, which also includes wolves, dogs, a"
},
{
"path": "multimodal_RAG_unified_text/data/text/hedgehog.txt",
"chars": 396,
"preview": "Hedgehogs are small, nocturnal mammals known for their distinctive spines, which are modified hairs. These spines provid"
},
{
"path": "multimodal_RAG_unified_text/multi_modal_demo.ipynb",
"chars": 248857,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"metadata\": {},\n \"source\": [\n \"# Multimodal Retrieval Augmented Ge"
},
{
"path": "music_recommendation/music_recommendation.ipynb",
"chars": 155150,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"1c9731fb\",\n \"metadata\": {},\n \"source\": [\n \"# Music Recomm"
},
{
"path": "pattern_matching/pattern_matching.ipynb",
"chars": 537894,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"b52fbdd6-10c5-4f52-b015-ce0f812c7d94\",\n \"metadata\": {},\n \"so"
},
{
"path": "qFlat_index_pdf_search/pdf_qFlat_Search.ipynb",
"chars": 35395,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"3280b01a-d3b7-4ef6-9494-789d15bc48ec\",\n \"metadata\": {\n \"id\""
},
{
"path": "qHnsw_index_pdf_search/pdf_qHNSW_Search.ipynb",
"chars": 85831,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"3280b01a-d3b7-4ef6-9494-789d15bc48ec\",\n \"metadata\": {\n \"id\""
},
{
"path": "quickstarts/python_quickstart.ipynb",
"chars": 66408,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"bb2094b8-13a5-4f7c-bd21-d2c709dab914\",\n \"metadata\": {\n \"id\""
},
{
"path": "requirements.txt",
"chars": 206,
"preview": "gensim >= 4.3\njupyter >= 1.0\nkdbai_client >= 0.1.2\nmatplotlib >= 3.7\nopenai >= 0.28\npypdf >= 3.0\nsentence-transformers >"
},
{
"path": "retrieval_augmented_generation/data/state_of_the_union.txt",
"chars": 38539,
"preview": "Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices "
},
{
"path": "retrieval_augmented_generation/retrieval_augmented_generation.ipynb",
"chars": 34057,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"48eeba82\",\n \"metadata\": {},\n \"source\": [\n \"# Retrieval Au"
},
{
"path": "retrieval_augmented_generation/retrieval_augmented_generation_evaluation.ipynb",
"chars": 31724,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"48eeba82\",\n \"metadata\": {},\n \"source\": [\n \"# Retrieval Au"
},
{
"path": "sentiment_analysis/sentiment_analysis.ipynb",
"chars": 215305,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"48b6c907\",\n \"metadata\": {\n \"id\": \"48b6c907\"\n },\n \"sourc"
},
{
"path": "unstructured_io_RAG/Table_RAG_Unstructured_KDBAI_LangChain_RAG.ipynb",
"chars": 115937,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"metadata\": {\n \"id\": \"Rt-9l5M4gmSD\"\n },\n \"source\": [\n \"# RAG "
},
{
"path": "video_RAG/video_RAG_TwelveLabs.ipynb",
"chars": 1107798,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"fa68ef64\",\n \"metadata\": {\n \"id\": \"fa68ef64\""
},
{
"path": "video_RAG/video_RAG_VoyageAI.ipynb",
"chars": 4590815,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"fa68ef64\",\n \"metadata\": {\n \"id\": \"fa68ef64\""
}
]
// ... and 5 more files (download for full content)
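The condensed preview above is a list of records with three keys: `path`, `chars`, and `preview`. As a minimal sketch of how such records might be inspected, the Python below parses four entries copied verbatim from the preview and reports the total character count, the largest file, and which snippets are shorter than the file they come from. The helper names (`total_chars`, `largest`, `truncated`) are hypothetical, and it is an assumption, not something the page confirms, that the full .json download is a single array with the same keys. The trailing `// ... and 5 more files` line is not valid JSON, so it would need to be dropped before the preview text itself could be parsed.

```python
import json

# Four records copied from the condensed preview; the key names match the
# preview, but the shape of the full download is an assumption.
SAMPLE = """
[
  {"path": "multi_index_multimodal_search/data/bat1.txt", "chars": 66,
   "preview": "Bat with outstretched wings hovering over an apple on leafy branch"},
  {"path": "multi_index_multimodal_search/data/bear1.txt", "chars": 54,
   "preview": "Brown bear walking through a forest with fallen leaves"},
  {"path": "multi_index_multimodal_search/data/fox1.txt", "chars": 45,
   "preview": "Fluffy fox curled up in snow with eyes closed"},
  {"path": "multimodal_RAG_VoyageAI/data/text/hedgehog.txt", "chars": 396,
   "preview": "Hedgehogs are small, nocturnal mammals known for their distinctive spines, which are modified hairs. These spines provid"}
]
"""


def total_chars(entries):
    """Sum the reported character counts of all records."""
    return sum(e["chars"] for e in entries)


def largest(entries):
    """Return the record with the highest reported character count."""
    return max(entries, key=lambda e: e["chars"])


def truncated(entries):
    """Return records whose snippet is shorter than the reported file size."""
    return [e for e in entries if len(e["preview"]) < e["chars"]]


entries = json.loads(SAMPLE)

if __name__ == "__main__":
    print("total chars:", total_chars(entries))
    print("largest file:", largest(entries)["path"])
    for e in truncated(entries):
        print("snippet is partial:", e["path"])
```

Comparing `len(preview)` with `chars` is a cheap way to tell complete snippets (the short caption files) from cut-off ones (notebooks and longer text files) without opening the full download.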
About this extraction
This page contains the full source code of the KxSystems/kdbai-samples GitHub repository, extracted and formatted as plain text. The extraction includes 71 files (143.4 MB), approximately 3.4M tokens.
Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.