[
  {
    "path": ".changeset/README.md",
    "content": "# Changesets\n\nHello and welcome! This folder has been automatically generated by `@changesets/cli`, a build tool that works\nwith multi-package repos, or single-package repos to help you version and publish your code. You can\nfind the full documentation for it [in our repository](https://github.com/changesets/changesets)\n\nWe have a quick list of common questions to get you started engaging with this project in\n[our documentation](https://github.com/changesets/changesets/blob/main/docs/common-questions.md)\n"
  },
  {
    "path": ".changeset/config.json",
    "content": "{\n  \"$schema\": \"https://unpkg.com/@changesets/config@2.3.0/schema.json\",\n  \"changelog\": \"@changesets/cli/changelog\",\n  \"commit\": false,\n  \"fixed\": [],\n  \"linked\": [],\n  \"access\": \"public\",\n  \"baseBranch\": \"main\",\n  \"updateInternalDependencies\": \"patch\",\n  \"ignore\": []\n}\n"
  },
  {
    "path": ".changeset/forty-papayas-push.md",
    "content": "---\n\"client-vector-search\": patch\n---\n\nupdates the docs and dynamic import for @xenova/transformers\n"
  },
  {
    "path": ".changeset/nine-readers-trade.md",
    "content": "---\n'client-vector-search': minor\n---\n\nsupport for experimental hnsw that runs on node and browser with json and binary serialization opitons\n"
  },
  {
    "path": ".changeset/six-coins-care.md",
    "content": "---\n\"client-vector-search\": patch\n---\n\ncreates a proper embedding index\n"
  },
  {
    "path": ".changeset/tall-lies-hope.md",
    "content": "---\n\"client-vector-search\": patch\n---\n\nadds in-memory index creation and brute force knn search\n"
  },
  {
    "path": ".gitignore",
    "content": "node_modules\ndist\ntest.js\nmock/\n.DS_Store\n.pytest_cache/\ntest*/\n"
  },
  {
    "path": ".npmignore",
    "content": "node_modules\ndist\ntest.js\nmock/\n.DS_Store\n.pytest_cache/\ntest*/\n\n.github/\n.changeset/\n"
  },
  {
    "path": ".prettierrc",
    "content": "{\n  \"semi\": true,\n  \"trailingComma\": \"all\",\n  \"singleQuote\": true,\n  \"printWidth\": 80,\n  \"tabWidth\": 2\n}\n"
  },
  {
    "path": "CHANGELOG.md",
    "content": "# client-vector-search\n\n## 0.2.0\n\n### Minor Changes\n\n- support for experimental hnsw that runs on node and browser with json and binary serialization opitons\n\n### Patch Changes\n\n- f09bc2f: updates the docs and dynamic import for @xenova/transformers\n- 46e07d6: creates a proper embedding index\n- 13bddbb: adds in-memory index creation and brute force knn search\n"
  },
  {
    "path": "LICENSE",
    "content": "MIT License\n\nCopyright (c) 2023 Yusuf Hilmi\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n"
  },
  {
    "path": "README.md",
    "content": "# client-vector-search\n\nA client side vector search library that can embed, search, and cache. Works on the browser and server side.\n\nIt outperforms OpenAI's text-embedding-ada-002 and is way faster than Pinecone and other VectorDBs.\n\nI'm the founder of [searchbase.app](https://searchbase.app) and we needed this for our product and customers. We'll be using this library in production. You can be sure it'll be maintained and improved.\n\n- Embed documents using transformers by default: gte-small (~30mb).\n- Calculate cosine similarity between embeddings.\n- Create an index and search on the client side\n- Cache vectors with browser caching support.\n\nLots of improvements are coming!\n\n## Roadmap\n\nOur goal is to build a super simple, fast vector search that works with couple hundred to thousands vectors. ~1k vectors per user covers 99% of the use cases.\n\nWe'll initially keep things super simple and sub 100ms\n\n### TODOs\n- [ ] add HNSW index that works on node and browser env, don't rely on hnsw binder libs\n- [ ] add a proper testing suite and ci/cd for the lib\n  - [ ] simple health tests\n    - [ ] mock the @xenova/transformers for jest, it's not happy with it\n  - [ ] performance tests, recall, memory usage, cpu usage etc.\n\n\n## Installation\n\n```bash\nnpm i client-vector-search\n```\n\n\n## Quickstart\n\nThis library provides a plug-and-play solution for embedding and vector search. It's designed to be easy to use, efficient, and versatile. 
Here's a quick start guide:\n\n\n```ts\n  import { getEmbedding, EmbeddingIndex } from 'client-vector-search';\n\n  // getEmbedding is an async function, so you need to use 'await' or '.then()' to get the result\n  const embedding = await getEmbedding(\"Apple\"); // Returns embedding as number[]\n\n  // Each object should have an 'embedding' property of type number[]\n  const initialObjects = [\n  { id: 1, name: \"Apple\", embedding: embedding },\n  { id: 2, name: \"Banana\", embedding: await getEmbedding(\"Banana\") },\n  { id: 3, name: \"Cheddar\", embedding: await getEmbedding(\"Cheddar\")},\n  { id: 4, name: \"Space\", embedding: await getEmbedding(\"Space\")},\n  { id: 5, name: \"database\", embedding: await getEmbedding(\"database\")},\n  ];\n  const index = new EmbeddingIndex(initialObjects); // Creates an index\n\n  // The query should be an embedding of type number[]\n  const queryEmbedding = await getEmbedding('Fruit'); // Query embedding\n  const results = await index.search(queryEmbedding, { topK: 5 }); // Returns top similar objects\n  console.log(results);\n\n  // specify the storage type\n  await index.saveIndex('indexedDB');\n  // search the saved copy in IndexedDB with the same query embedding\n  const storedResults = await index.search(queryEmbedding, {\n    topK: 5,\n    useStorage: 'indexedDB',\n    // storageOptions: { // use only if you overrode the defaults\n    //   indexedDBName: 'clientVectorDB',\n    //   indexedDBObjectStoreName: 'ClientEmbeddingStore',\n    // },\n  });\n\n  console.log(storedResults);\n\n  await index.deleteIndexedDB(); // if you overrode default, specify db name\n```\n\n## Troubleshooting\n\n### Next.js\nTo use it inside Next.js projects you'll need to update the `next.config.js` file to include the following:\n\n```js\nmodule.exports = {\n  // Override the default webpack configuration\n  webpack: (config) => {\n    // See https://webpack.js.org/configuration/resolve/#resolvealias\n    config.resolve.alias = {\n      ...config.resolve.alias,\n      sharp$: false,\n      \"onnxruntime-node$\": false,\n    };\n    return config;\n 
 },\n};\n```\n\n#### Model load after page is loaded\n\nYou can initialize the model before using it to generate embeddings. This ensures that the model is loaded before you use it and provides a better UX.\n\n```js\nimport { initializeModel } from \"client-vector-search\"\n...\n  useEffect(() => {\n    // initializeModel is async, so failures surface on the returned promise, not in a try/catch\n    initializeModel().catch((e) => console.log(e));\n  }, []);\n```\n\n## Usage Guide\n\nThis guide provides a step-by-step walkthrough of the library's main features. It covers everything from generating embeddings for a string to performing operations on the index such as adding, updating, and removing objects. It also includes instructions on how to save the index to a database and perform search operations within it.\n\nUntil we have reference documentation, you can find all the methods and their usage in this guide. Each step is accompanied by a code snippet to illustrate the usage of the method in question. Make sure to follow along and try out the examples in your own environment to get a better understanding of how everything works.\n\nLet's get started!\n\n### Step 1: Generate Embeddings for String\nGenerate embeddings for a given string using the `getEmbedding` method.\n\n```ts\nconst embedding = await getEmbedding(\"Apple\"); // Returns embedding as number[]\n```\n> **Note**: `getEmbedding` is asynchronous; make sure to use `await`.\n\n---\n\n### Step 2: Calculate Cosine Similarity\nCalculate the cosine similarity between two embeddings.\n\n```ts\nconst similarity = cosineSimilarity(embedding1, embedding2, 6);\n```\n> **Note**: Both embeddings should be of the same length.\n\n---\n\n### Step 3: Create an Index\nCreate an index with an initial array of objects. 
Each object must have an 'embedding' property.\n\n```ts\nconst initialObjects = [...];\nconst index = new EmbeddingIndex(initialObjects);\n```\n\n---\n\n### Step 4: Add to Index\nAdd an object to the index.\n\n```ts\nconst objectToAdd = { id: 6, name: 'Cat', embedding: await getEmbedding('Cat') };\nindex.add(objectToAdd);\n```\n\n---\n\n### Step 5: Update Index\nUpdate an existing object in the index.\n\n```ts\nconst vectorToUpdate = { id: 6, name: 'Dog', embedding: await getEmbedding('Dog') };\nindex.update({ id: 6 }, vectorToUpdate);\n```\n\n---\n\n### Step 6: Remove from Index\nRemove an object from the index.\n\n```ts\nindex.remove({ id: 6 });\n```\n\n---\n\n### Step 7: Retrieve from Index\nRetrieve an object from the index.\n\n```ts\nconst vector = index.get({ id: 1 });\n```\n\n---\n\n### Step 8: Search the Index\nSearch the index with a query embedding.\n\n```ts\nconst queryEmbedding = await getEmbedding('Fruit');\nconst results = await index.search(queryEmbedding, { topK: 5 });\n```\n\n---\n\n### Step 9: Print the Index\nPrint the entire index to the console.\n\n```ts\nindex.printIndex();\n```\n\n---\n\n### Step 10: Save Index to IndexedDB (for browser)\nSave the index to a persistent IndexedDB database. 
\n\n```ts\nawait index.saveIndex(\"indexedDB\", { DBName: \"clientVectorDB\", objectStoreName: \"ClientEmbeddingStore\" });\n```\n\n---\n\n### Important: Search in indexedDB\nPerform a search operation in the IndexedDB.\n\n```ts\nconst results = await index.search(queryEmbedding, {\n  topK: 5,\n  useStorage: \"indexedDB\",\n  storageOptions: { // only if you want to override the default options, defaults are below\n    indexedDBName: 'clientVectorDB',\n    indexedDBObjectStoreName: 'ClientEmbeddingStore'\n  }\n});\n```\n\n---\n\n### Delete Database\nTo delete an entire database.\n\n```ts\nawait index.deleteIndexedDB(\"clientVectorDB\");\n```\n\n---\n\n### Delete Object Store\nTo delete an object store from a database.\n\n```ts\nawait index.deleteIndexedDBObjectStore(\"clientVectorDB\", \"ClientEmbeddingStore\");\n```\n\n---\n\n### Retrieve All Objects\nTo retrieve all objects from a specific object store.\n\n```ts\nconst allObjects = await index.getAllObjectsFromIndexedDB(\"clientVectorDB\", \"ClientEmbeddingStore\");\n```\n"
  },
  {
    "path": "package.json",
    "content": "{\n  \"name\": \"client-vector-search\",\n  \"version\": \"0.2.0\",\n  \"description\": \"A client side vector search library\",\n  \"main\": \"dist/index.js\",\n  \"module\": \"dist/index.mjs\",\n  \"types\": \"dist/index.d.ts\",\n  \"scripts\": {\n    \"build\": \"tsup src/index.ts --format cjs,esm --dts\",\n    \"dev\": \"tsup src/index.ts --format cjs,esm --dts --watch\",\n    \"changeset\": \"changeset\",\n    \"version\": \"changeset version\",\n    \"release\": \"npm run build && changeset publish\",\n    \"lint\": \"tsc\"\n  },\n  \"repository\": {\n    \"type\": \"git\",\n    \"url\": \"git+https://github.com/yusufhilmi/client-vector-search.git\"\n  },\n  \"keywords\": [\n    \"vector\",\n    \"search\",\n    \"embeddings\",\n    \"nlp\",\n    \"models\"\n  ],\n  \"author\": \"yusufhilmi\",\n  \"license\": \"MIT\",\n  \"bugs\": {\n    \"url\": \"https://github.com/yusufhilmi/client-vector-search/issues\"\n  },\n  \"homepage\": \"https://github.com/yusufhilmi/client-vector-search#readme\",\n  \"devDependencies\": {\n    \"@changesets/cli\": \"^2.26.2\",\n    \"fake-indexeddb\": \"^4.0.2\",\n    \"tsup\": \"^6.5.0\",\n    \"typescript\": \"^4.9.5\"\n  },\n  \"dependencies\": {\n    \"@msgpack/msgpack\": \"^3.0.0-beta2\",\n    \"@xenova/transformers\": \"^2.5.2\",\n    \"lru-cache\": \"^10.0.1\"\n  }\n}\n"
  },
  {
    "path": "src/cache.ts",
    "content": "import { LRUCache } from 'lru-cache';\n\nclass Cache {\n  private static instance: LRUCache<string, any[]>;\n\n  private constructor() {}\n\n  public static getInstance(\n    max: number = 10000,\n    maxAge: number = 1000 * 60 * 10,\n  ): LRUCache<string, any[]> {\n    if (!Cache.instance) {\n      const options = {\n        max: max,\n        length: () => 1,\n        maxAge: maxAge,\n      };\n      Cache.instance = new LRUCache<string, any[]>(options);\n    }\n    return Cache.instance;\n  }\n}\n\nexport default Cache;\n"
  },
  {
    "path": "src/hnsw.ts",
    "content": "// an experimental implementation of hnsw that doesn't rely on the hnsw binding libs which only works in browser or node\n// TODOS:\n// - bare bones\n// - find # layers and optimal params\n// - test the speed, accuracy, and memory usage\nimport { encode, decode } from '@msgpack/msgpack';\n\ntype Vector = number[];\ntype Distance = number;\ntype NodeIndex = number;\ntype Layer = LayerNode[];\n\ninterface LayerNode {\n  vector: Vector;\n  connections: NodeIndex[];\n  layerBelow: NodeIndex | null;\n}\n\ninterface HNSWData {\n  L: number;\n  mL: number;\n  efc: number;\n  index: Layer[];\n}\n\n// Simple Priority Queue Implementation\nclass PriorityQueue<T> {\n  private elements: T[];\n  private compareFn: (a: T, b: T) => number;\n\n  constructor(elements: T[], compareFn: (a: T, b: T) => number) {\n    this.elements = elements;\n    this.compareFn = compareFn;\n    this.elements.sort(this.compareFn);\n  }\n\n  push(element: T) {\n    this.elements.push(element);\n    this.elements.sort(this.compareFn);\n  }\n\n  pop(): T | null {\n    return this.elements.shift() || null;\n  }\n\n  isEmpty(): boolean {\n    return this.elements.length === 0;\n  }\n}\n\nconst EuclideanDistance = (a: Vector, b: Vector): Distance => {\n  if (a.length !== b.length) {\n    throw new Error('Vectors must have the same length');\n  }\n\n  return Math.sqrt(\n    a.reduce((acc, val, i) => {\n      const bVal = b[i]; // Check b[i] in a variable\n      if (bVal === undefined) throw new Error('b[i] is undefined');\n      return acc + Math.pow(val - bVal, 2);\n    }, 0),\n  );\n};\n\nconst getInsertLayer = (L: number, mL: number): number => {\n  return Math.min(-Math.floor(Math.log(Math.random()) * mL), L - 1);\n};\nconst _searchLayer = (\n  graph: Layer,\n  entry: NodeIndex,\n  query: Vector,\n  ef: number,\n): [Distance, NodeIndex][] => {\n  if (entry < 0 || entry >= graph.length) {\n    throw new Error(`Invalid entry index: ${entry}`);\n  }\n\n  // Check if the graph at the entry 
index is defined\n  const graphEntry = graph[entry];\n  if (!graphEntry) {\n    throw new Error(`Graph entry at index ${entry} is undefined`);\n  }\n\n  const best: [Distance, NodeIndex] = [\n    EuclideanDistance(graphEntry.vector, query),\n    entry,\n  ];\n  const nns: [Distance, NodeIndex][] = [best];\n  const visited = new Set([best[1]]);\n  const candidates = new PriorityQueue<[Distance, NodeIndex]>(\n    [best],\n    (a, b) => a[0] - b[0],\n  );\n\n  while (!candidates.isEmpty()) {\n    const current = candidates.pop();\n    // Define a variable to hold the last element of nns array\n    const lastNnsElement = nns.length > 0 ? nns[nns.length - 1] : null;\n    // Check if current is not null and lastNnsElement is not undefined before comparing their values\n    if (!current || (lastNnsElement && lastNnsElement[0] < current[0])) break;\n\n    const graphCurrent = graph[current[1]];\n    if (!graphCurrent) continue;\n\n    for (const e of graphCurrent.connections) {\n      const graphE = graph[e];\n      if (!graphE) continue;\n\n      const dist = EuclideanDistance(graphE.vector, query);\n      if (!visited.has(e)) {\n        visited.add(e);\n        const lastNn = nns[nns.length - 1];\n        if (!lastNn || dist < lastNn[0] || nns.length < ef) {\n          candidates.push([dist, e]);\n          nns.push([dist, e]);\n          nns.sort((a, b) => a[0] - b[0]);\n          if (nns.length > ef) {\n            nns.pop();\n          }\n        }\n      }\n    }\n  }\n\n  return nns;\n};\nexport class ExperimentalHNSWIndex {\n  private L: number;\n  private mL: number;\n  private efc: number;\n  private index: Layer[];\n\n  constructor(L = 5, mL = 0.62, efc = 10) {\n    this.L = L;\n    this.mL = mL;\n    this.efc = efc;\n    this.index = Array.from({ length: L }, () => []);\n  }\n  setIndex(index: Layer[]): void {\n    this.index = index;\n  }\n\n  insert(vec: Vector) {\n    const l = getInsertLayer(this.L, this.mL);\n    let startV = 0;\n\n    for (let n = 0; n < 
this.L; n++) {\n      const graph = this.index[n];\n\n      if (graph?.length === 0) {\n        // If the graph layer is empty, add a new node to it\n        // Assign next layer to a variable and check if it's undefined\n        const nextLayer = this.index[n + 1];\n        const nextLayerLength = nextLayer ? nextLayer.length : null;\n        graph?.push({\n          vector: vec,\n          connections: [],\n          layerBelow: n < this.L - 1 ? nextLayerLength : null,\n        });\n        continue;\n      }\n\n      if (n < l && graph) {\n        // Check if the search layer result is not undefined before accessing its properties\n        const searchLayerResult = _searchLayer(graph, startV, vec, 1);\n        startV =\n          searchLayerResult && searchLayerResult[0]\n            ? searchLayerResult[0][1]\n            : startV;\n      } else if (graph) {\n        // Assign next layer to a variable and check if it's undefined\n        const nextLayer = this.index[n + 1];\n        const nextLayerLength = nextLayer ? nextLayer.length : null;\n        const node: LayerNode = {\n          vector: vec,\n          connections: [],\n          layerBelow: n < this.L - 1 ? 
nextLayerLength : null,\n        };\n        const nns = _searchLayer(graph, startV, vec, this.efc);\n        for (const nn of nns) {\n          node.connections.push(nn[1]);\n          graph[nn[1]]?.connections.push(graph.length);\n        }\n        graph?.push(node);\n        // Assign graph[startV] to a variable and check if it's undefined before accessing its properties\n        const graphStartV = graph[startV];\n        if (graphStartV) startV = graphStartV.layerBelow!;\n      }\n    }\n  }\n\n  search(query: Vector, ef = 1): [Distance, NodeIndex][] {\n    if (this.index && this.index[0] && this.index[0].length === 0) {\n      return [];\n    }\n\n    let bestV = 0;\n    for (const graph of this.index) {\n      const searchLayer = _searchLayer(graph, bestV, query, ef);\n      if (searchLayer && searchLayer[0]) {\n        bestV = searchLayer[0][1];\n        if (graph[bestV]?.layerBelow === null) {\n          return _searchLayer(graph, bestV, query, ef);\n        }\n        bestV = graph[bestV]?.layerBelow!;\n      }\n    }\n    return [];\n  }\n\n  toJSON() {\n    return {\n      L: this.L,\n      mL: this.mL,\n      efc: this.efc,\n      index: this.index,\n    };\n  }\n\n  static fromJSON(json: any): ExperimentalHNSWIndex {\n    const hnsw = new ExperimentalHNSWIndex(json.L, json.mL, json.efc);\n    // Restore the serialized layers; without this the deserialized index is empty\n    hnsw.setIndex(json.index);\n    return hnsw;\n  }\n\n  toBinary() {\n    return encode({\n      L: this.L,\n      mL: this.mL,\n      efc: this.efc,\n      index: this.index,\n    });\n  }\n\n  static fromBinary(binary: Uint8Array): ExperimentalHNSWIndex {\n    const data = decode(binary) as HNSWData;\n    const hnsw = new ExperimentalHNSWIndex(data.L, data.mL, data.efc);\n    hnsw.setIndex(data.index);\n    return hnsw;\n  }\n}\n"
  },
  {
    "path": "src/index.ts",
    "content": "const DEFAULT_TOP_K = 3;\n\ninterface Filter {\n  [key: string]: any;\n}\n\nimport Cache from './cache';\nimport { IndexedDbManager } from './indexedDB';\nimport { cosineSimilarity } from './utils';\nexport { ExperimentalHNSWIndex } from './hnsw';\n\n// uncomment if you want to test indexedDB implementation in node env for faster dev cycle\n// import { IDBFactory } from 'fake-indexeddb';\n// const indexedDB = new IDBFactory();\n\nexport interface SearchResult {\n  similarity: number;\n  object: any;\n}\n\ntype StorageOptions = 'indexedDB' | 'localStorage' | 'none';\n\n/**\n * Interface for search options in the EmbeddingIndex class.\n * topK: The number of top similar items to return.\n * filter: An optional filter to apply to the objects before searching.\n * useStorage: A flag to indicate whether to use storage options like indexedDB or localStorage.\n */\ninterface SearchOptions {\n  topK?: number;\n  filter?: Filter;\n  useStorage?: StorageOptions;\n  storageOptions?: { indexedDBName: string; indexedDBObjectStoreName: string }; // TODO: generalize it to localStorage as well\n}\n\nconst cacheInstance = Cache.getInstance();\n\nlet pipe: any;\nlet currentModel: string;\n\nexport const initializeModel = async (\n  model: string = 'Xenova/gte-small',\n): Promise<void> => {\n  if (model !== currentModel) {\n    const transformersModule = await import('@xenova/transformers');\n    const pipeline = transformersModule.pipeline;\n    pipe = await pipeline('feature-extraction', model);\n    currentModel = model;\n  }\n};\n\nexport const getEmbedding = async (\n  text: string,\n  precision: number = 7,\n  options = { pooling: 'mean', normalize: false },\n  model = 'Xenova/gte-small',\n): Promise<number[]> => {\n  const cachedEmbedding = cacheInstance.get(text);\n  if (cachedEmbedding) {\n    return Promise.resolve(cachedEmbedding);\n  }\n\n  if (model !== currentModel) {\n    await initializeModel(model);\n  }\n\n  const output = await pipe(text, 
options);\n  const roundedOutput = Array.from(output.data as number[]).map(\n    (value: number) => parseFloat(value.toFixed(precision)),\n  );\n  cacheInstance.set(text, roundedOutput);\n  return Array.from(roundedOutput);\n};\n\nexport class EmbeddingIndex {\n  private objects: Filter[];\n  private keys: string[];\n\n  constructor(initialObjects?: Filter[]) {\n    // TODO: add support for options while creating index such as  {... indexedDB: true, ...}\n    this.objects = [];\n    this.keys = [];\n    if (initialObjects && initialObjects.length > 0) {\n      initialObjects.forEach((obj) => this.validateAndAdd(obj));\n      if (initialObjects[0]) {\n        this.keys = Object.keys(initialObjects[0]);\n      }\n    }\n  }\n\n  private findVectorIndex(filter: Filter): number {\n    return this.objects.findIndex((object) =>\n      Object.keys(filter).every((key) => object[key] === filter[key]),\n    );\n  }\n\n  private validateAndAdd(obj: Filter) {\n    if (!Array.isArray(obj.embedding) || obj.embedding.some(isNaN)) {\n      throw new Error(\n        'Object must have an embedding property of type number[]',\n      );\n    }\n    if (this.keys.length === 0) {\n      this.keys = Object.keys(obj);\n    } else if (!this.keys.every((key) => key in obj)) {\n      throw new Error(\n        'Object must have the same properties as the initial objects',\n      );\n    }\n    this.objects.push(obj);\n  }\n\n  add(obj: Filter) {\n    this.validateAndAdd(obj);\n  }\n\n  // Method to update an existing vector in the index\n  update(filter: Filter, vector: Filter) {\n    const index = this.findVectorIndex(filter);\n    if (index === -1) {\n      throw new Error('Vector not found');\n    }\n    if (vector.hasOwnProperty('embedding')) {\n      // Validate the new embedding in place; calling validateAndAdd here would also push a duplicate entry\n      if (!Array.isArray(vector.embedding) || vector.embedding.some(isNaN)) {\n        throw new Error(\n          'Object must have an embedding property of type number[]',\n        );\n      }\n    }\n    // Merge the new fields into the existing object\n    this.objects[index] = Object.assign(this.objects[index] as Filter, vector);\n  }\n\n  // Method to remove a 
vector from the index\n  remove(filter: Filter) {\n    const index = this.findVectorIndex(filter);\n    if (index === -1) {\n      throw new Error('Vector not found');\n    }\n    // Remove the vector from the index\n    this.objects.splice(index, 1);\n  }\n\n  // Method to remove multiple vectors from the index\n  removeBatch(filters: Filter[]) {\n    filters.forEach((filter) => {\n      const index = this.findVectorIndex(filter);\n      if (index !== -1) {\n        // Remove the vector from the index\n        this.objects.splice(index, 1);\n      }\n    });\n  }\n\n  // Method to retrieve a vector from the index\n  get(filter: Filter) {\n    const vector = this.objects[this.findVectorIndex(filter)];\n    return vector || null;\n  }\n\n  size(): number {\n    // Returns the size of the index\n    return this.objects.length;\n  }\n\n  clear() {\n    this.objects = [];\n  }\n\n  async search(\n    queryEmbedding: number[],\n    options: SearchOptions = {\n      topK: 3,\n      useStorage: 'none',\n      storageOptions: {\n        indexedDBName: 'clientVectorDB',\n        indexedDBObjectStoreName: 'ClientEmbeddingStore',\n      },\n    },\n  ): Promise<SearchResult[]> {\n    const topK = options.topK || DEFAULT_TOP_K;\n    const filter = options.filter || {};\n    const useStorage = options.useStorage || 'none';\n\n    if (useStorage === 'indexedDB') {\n      const DBname = options.storageOptions?.indexedDBName || 'clientVectorDB';\n      const objectStoreName =\n        options.storageOptions?.indexedDBObjectStoreName ||\n        'ClientEmbeddingStore';\n\n      if (typeof indexedDB === 'undefined') {\n        console.error('IndexedDB is not supported');\n        throw new Error('IndexedDB is not supported');\n      }\n      const results = await this.loadAndSearchFromIndexedDB(\n        DBname,\n        objectStoreName,\n        queryEmbedding,\n        topK,\n        filter,\n      );\n      return results;\n    } else {\n      // Compute similarities\n      const 
similarities = this.objects\n        .filter((object) =>\n          Object.keys(filter).every((key) => object[key] === filter[key]),\n        )\n        .map((obj) => ({\n          similarity: cosineSimilarity(queryEmbedding, obj.embedding),\n          object: obj,\n        }));\n\n      // Sort by similarity and return topK results\n      return similarities\n        .sort((a, b) => b.similarity - a.similarity)\n        .slice(0, topK);\n    }\n  }\n\n  printIndex() {\n    console.log('Index Content:');\n    this.objects.forEach((obj, idx) => {\n      console.log(`Item ${idx + 1}:`, obj);\n    });\n  }\n\n  async saveIndex(\n    storageType: string,\n    options: { DBName: string; objectStoreName: string } = {\n      DBName: 'clientVectorDB',\n      objectStoreName: 'ClientEmbeddingStore',\n    },\n  ) {\n    if (storageType === 'indexedDB') {\n      await this.saveToIndexedDB(options.DBName, options.objectStoreName);\n    } else {\n      throw new Error(\n        `Unsupported storage type: ${storageType} \\n Supported storage types: \"indexedDB\"`,\n      );\n    }\n  }\n\n  async saveToIndexedDB(\n    DBname: string = 'clientVectorDB',\n    objectStoreName: string = 'ClientEmbeddingStore',\n  ): Promise<void> {\n    if (typeof indexedDB === 'undefined') {\n      console.error('IndexedDB is not defined');\n      throw new Error('IndexedDB is not supported');\n    }\n\n    if (!this.objects || this.objects.length === 0) {\n      throw new Error('Index is empty. 
Nothing to save');\n    }\n\n    try {\n      const db = await IndexedDbManager.create(DBname, objectStoreName);\n      await db.addToIndexedDB(this.objects);\n      console.log(\n        `Index saved to database '${DBname}' object store '${objectStoreName}'`,\n      );\n    } catch (error) {\n      console.error('Error saving index to database:', error);\n      throw new Error('Error saving index to database');\n    }\n  }\n\n  async loadAndSearchFromIndexedDB(\n    DBname: string = 'clientVectorDB',\n    objectStoreName: string = 'ClientEmbeddingStore',\n    queryEmbedding: number[],\n    topK: number,\n    filter: { [key: string]: any },\n  ): Promise<SearchResult[]> {\n    const db = await IndexedDbManager.create(DBname, objectStoreName);\n    const generator = db.dbGenerator();\n    const results: { similarity: number; object: any }[] = [];\n\n    for await (const record of generator) {\n      if (Object.keys(filter).every((key) => record[key] === filter[key])) {\n        const similarity = cosineSimilarity(queryEmbedding, record.embedding);\n        results.push({ similarity, object: record });\n      }\n    }\n    results.sort((a, b) => b.similarity - a.similarity);\n    return results.slice(0, topK);\n  }\n\n  async deleteIndexedDB(DBname: string = 'clientVectorDB'): Promise<void> {\n    if (typeof indexedDB === 'undefined') {\n      console.error('IndexedDB is not defined');\n      throw new Error('IndexedDB is not supported');\n    }\n    return new Promise((resolve, reject) => {\n      const request = indexedDB.deleteDatabase(DBname);\n\n      request.onsuccess = () => {\n        console.log(`Database '${DBname}' deleted`);\n        resolve();\n      };\n      request.onerror = (event) => {\n        console.error('Failed to delete database', event);\n        reject(new Error('Failed to delete database'));\n      };\n    });\n  }\n\n  async deleteIndexedDBObjectStore(\n    DBname: string = 'clientVectorDB',\n    objectStoreName: string = 
'ClientEmbeddingStore',\n  ): Promise<void> {\n    const db = await IndexedDbManager.create(DBname, objectStoreName);\n\n    try {\n      await db.deleteIndexedDBObjectStoreFromDB(DBname, objectStoreName);\n      console.log(\n        `Object store '${objectStoreName}' deleted from database '${DBname}'`,\n      );\n    } catch (error) {\n      console.error('Error deleting object store:', error);\n      throw new Error('Error deleting object store');\n    }\n  }\n\n  async getAllObjectsFromIndexedDB(\n    DBname: string = 'clientVectorDB',\n    objectStoreName: string = 'ClientEmbeddingStore',\n  ): Promise<any[]> {\n    const db = await IndexedDbManager.create(DBname, objectStoreName);\n    const objects: any[] = [];\n    for await (const record of db.dbGenerator()) {\n      objects.push(record);\n    }\n    return objects;\n  }\n}\n"
  },
  {
    "path": "src/indexedDB.ts",
    "content": "// uncomment for testing only\n// import { IDBFactory } from 'fake-indexeddb';\n// const indexedDB = new IDBFactory();\n\nexport class IndexedDbManager {\n  private DBname!: string;\n  private objectStoreName!: string;\n\n  constructor(DBname: string, objectStoreName: string) {\n    this.DBname = DBname;\n    this.objectStoreName = objectStoreName;\n  }\n\n  static async create(\n    DBname: string = 'clientVectorDB',\n    objectStoreName: string = 'ClientEmbeddingStore',\n    index: string | null = null,\n  ): Promise<IndexedDbManager> {\n    const instance = new IndexedDbManager(DBname, objectStoreName);\n    return new Promise((resolve, reject) => {\n      const request = indexedDB.open(DBname);\n      let db: IDBDatabase;\n\n      request.onerror = (event) => {\n        console.error('IndexedDB error:', event);\n        reject(new Error('Database initialization failed'));\n      };\n\n      request.onsuccess = async () => {\n        db = request.result;\n        if (!db.objectStoreNames.contains(objectStoreName)) {\n          db.close();\n          await instance.createObjectStore(index);\n        }\n        db.close();\n        resolve(instance);\n      };\n    });\n  }\n\n  async createObjectStore(index: string | null = null): Promise<void> {\n    return new Promise((resolve, reject) => {\n      const request = indexedDB.open(this.DBname);\n      request.onsuccess = () => {\n        let db1 = request.result;\n        var version = db1.version;\n        db1.close();\n        const request_2 = indexedDB.open(this.DBname, version + 1);\n        request_2.onupgradeneeded = async () => {\n          let db2 = request_2.result;\n          if (!db2.objectStoreNames.contains(this.objectStoreName)) {\n            const objectStore = db2.createObjectStore(this.objectStoreName, {\n              autoIncrement: true,\n            });\n            if (index) {\n              objectStore.createIndex(`by_${index}`, index, { unique: false });\n            }\n   
       }\n        };\n        request_2.onsuccess = async () => {\n          let db2 = request_2.result;\n          console.log('Object store creation successful');\n          db2.close();\n          resolve();\n        };\n        request_2.onerror = (event) => {\n          console.error('Error creating object store:', event);\n          reject(new Error('Error creating object store'));\n        };\n      };\n      request.onerror = (event) => {\n        console.error('Error opening database:', event);\n        reject(new Error('Error opening database'));\n      };\n    });\n  }\n\n  async addToIndexedDB(\n    objs: { [key: string]: any }[] | { [key: string]: any },\n  ): Promise<void> {\n    return new Promise((resolve, reject) => {\n      const request = indexedDB.open(this.DBname);\n\n      request.onsuccess = () => {\n        let db = request.result;\n        const transaction = db.transaction([this.objectStoreName], 'readwrite');\n        const objectStore = transaction.objectStore(this.objectStoreName);\n\n        if (!Array.isArray(objs)) {\n          objs = [objs];\n        }\n\n        objs.forEach((obj: { [key: string]: any }) => {\n          const request = objectStore.add(obj);\n\n          request.onerror = (event) => {\n            console.error('Failed to add object', event);\n            // Reject instead of throwing: an exception thrown inside an event handler never reaches the caller\n            reject(new Error('Failed to add object'));\n          };\n        });\n\n        transaction.oncomplete = () => {\n          resolve();\n        };\n\n        transaction.onerror = (event) => {\n          console.error('Failed to add object', event);\n          reject(new Error('Failed to add object'));\n        };\n        db.close();\n      };\n\n      // Without this handler a failed open would leave the promise pending forever\n      request.onerror = (event) => {\n        console.error('Failed to open database', event);\n        reject(new Error('Failed to open database'));\n      };\n    });\n  }\n\n  async *dbGenerator(): AsyncGenerator<any, void, undefined> {\n    const objectStoreName = this.objectStoreName;\n    const dbOpenPromise = new Promise<IDBDatabase>((resolve, reject) => {\n      const request = indexedDB.open(this.DBname);\n      request.onsuccess = () => {\n        
resolve(request.result);\n      };\n      request.onerror = () => {\n        reject(new Error('Could not open DB'));\n      };\n    });\n\n    try {\n      const db = await dbOpenPromise;\n      const transaction = db.transaction([objectStoreName], 'readonly');\n      const objectStore = transaction.objectStore(objectStoreName);\n      const request = objectStore.openCursor();\n\n      let promiseResolver: (value: any) => void;\n\n      request.onsuccess = function (event: Event) {\n        const cursor = (event.target as IDBRequest<IDBCursorWithValue>).result;\n        if (cursor) {\n          promiseResolver(cursor.value);\n          cursor.continue();\n        } else {\n          promiseResolver(null);\n        }\n      };\n\n      while (true) {\n        const promise = new Promise<any>((resolve) => {\n          promiseResolver = resolve;\n        });\n        const value = await promise;\n        if (value === null) break;\n        yield value;\n      }\n\n      db.close();\n    } catch (error) {\n      console.error('An error occurred:', error);\n    }\n  }\n  async deleteIndexedDBObjectStoreFromDB(\n    DBname: string,\n    objectStoreName: string,\n  ): Promise<void> {\n    return new Promise(async (resolve, reject) => {\n      const request = indexedDB.open(this.DBname);\n\n      request.onsuccess = async () => {\n        let db = request.result;\n        var version = db.version;\n        db.close();\n        const request_2 = indexedDB.open(db.name, version + 1);\n        request_2.onupgradeneeded = async () => {\n          let db2 = request_2.result;\n          if (db2.objectStoreNames.contains(objectStoreName)) {\n            db2.deleteObjectStore(objectStoreName);\n          } else {\n            console.error(\n              `Object store '${objectStoreName}' not found in database '${DBname}'`,\n            );\n            reject(\n              new Error(\n                `Object store '${objectStoreName}' not found in database '${DBname}'`,\n       
       ),\n            );\n          }\n        };\n        request_2.onsuccess = () => {\n          let db2 = request_2.result;\n          console.log('Object store deletion successful');\n          db2.close();\n          resolve();\n        };\n        request_2.onerror = (event) => {\n          // After a failed open, request_2.result is undefined, so there is no connection to close here\n          console.error('Failed to delete object store', event);\n          reject(new Error('Failed to delete object store'));\n        };\n      };\n      request.onerror = (event) => {\n        console.error('Failed to open database', event);\n        reject(new Error('Failed to open database'));\n      };\n    });\n  }\n}\n"
  },
  {
    "path": "src/utils.ts",
    "content": "export const cosineSimilarity = (\n  vecA: number[],\n  vecB: number[],\n  precision: number = 6,\n): number => {\n  // Check if both vectors have the same length\n  if (vecA.length !== vecB.length) {\n    throw new Error('Vectors must have the same length');\n  }\n\n  // Compute dot product and magnitudes\n  const dotProduct = vecA.reduce((sum, a, i) => {\n    const b = vecB[i]; // Extract value safely\n    return sum + a * (b !== undefined ? b : 0); // Check for undefined\n  }, 0);\n  const magnitudeA = Math.sqrt(vecA.reduce((sum, a) => sum + a * a, 0));\n  const magnitudeB = Math.sqrt(vecB.reduce((sum, b) => sum + b * b, 0));\n\n  // Check if either magnitude is zero\n  if (magnitudeA === 0 || magnitudeB === 0) {\n    return 0;\n  }\n\n  // Calculate cosine similarity and round to specified precision\n  return parseFloat(\n    (dotProduct / (magnitudeA * magnitudeB)).toFixed(precision),\n  );\n};\n"
  },
  {
    "path": "tsconfig.json",
    "content": "{\n\t\"compilerOptions\": {\n\t\t\"target\": \"ESNext\",\n\t\t\"lib\": [\"ESNext\", \"DOM\"],\n\t\t\"module\": \"esnext\",\n\t\t\"rootDir\": \"./src\",\n\t\t\"moduleResolution\": \"node\",\n\t\t\"baseUrl\": \"./\",\n\t\t\"resolveJsonModule\": true,\n\t\t\"allowJs\": true,\n\t\t\"declaration\": true,\n\t\t\"declarationMap\": true,\n\t\t\"sourceMap\": true,\n\t\t\"outDir\": \"./dist\",\n\t\t\"noUnusedParameters\": true,\n\t\t\"noUnusedLocals\": true,\n\t\t// \"target\": \"es2016\", /* Set the JavaScript language version for emitted JavaScript and include compatible library declarations. */\n\t\t// \"module\": \"commonjs\", /* Specify what module code is generated. */\n\t\t\"esModuleInterop\": true, /* Emit additional JavaScript to ease support for importing CommonJS modules. This enables 'allowSyntheticDefaultImports' for type compatibility. */\n\t\t\"forceConsistentCasingInFileNames\": true, /* Ensure that casing is correct in imports. */\n\t\t\"strict\": true, /* Enable all strict type-checking options. */\n\t\t\"skipLibCheck\": true, /* Skip type checking all .d.ts files. */\n\t\t\"noUncheckedIndexedAccess\": true,\n\t\t\"noEmit\": true\n\t}\n}\n"
  }
]