Repository: yusufhilmi/client-vector-search
Branch: main
Commit: 6c8272d3abc9
Files: 19
Total size: 35.9 KB

Directory structure:
client-vector-search/

├── .changeset/
│   ├── README.md
│   ├── config.json
│   ├── forty-papayas-push.md
│   ├── nine-readers-trade.md
│   ├── six-coins-care.md
│   └── tall-lies-hope.md
├── .gitignore
├── .npmignore
├── .prettierrc
├── CHANGELOG.md
├── LICENSE
├── README.md
├── package.json
├── src/
│   ├── cache.ts
│   ├── hnsw.ts
│   ├── index.ts
│   ├── indexedDB.ts
│   └── utils.ts
└── tsconfig.json

================================================
FILE CONTENTS
================================================

================================================
FILE: .changeset/README.md
================================================
# Changesets

Hello and welcome! This folder has been automatically generated by `@changesets/cli`, a build tool that works
with multi-package repos, or single-package repos to help you version and publish your code. You can
find the full documentation for it [in our repository](https://github.com/changesets/changesets)

We have a quick list of common questions to get you started engaging with this project in
[our documentation](https://github.com/changesets/changesets/blob/main/docs/common-questions.md)


================================================
FILE: .changeset/config.json
================================================
{
  "$schema": "https://unpkg.com/@changesets/config@2.3.0/schema.json",
  "changelog": "@changesets/cli/changelog",
  "commit": false,
  "fixed": [],
  "linked": [],
  "access": "public",
  "baseBranch": "main",
  "updateInternalDependencies": "patch",
  "ignore": []
}


================================================
FILE: .changeset/forty-papayas-push.md
================================================
---
"client-vector-search": patch
---

updates the docs and dynamic import for @xenova/transformers


================================================
FILE: .changeset/nine-readers-trade.md
================================================
---
'client-vector-search': minor
---

support for an experimental HNSW index that runs on Node and the browser, with JSON and binary serialization options


================================================
FILE: .changeset/six-coins-care.md
================================================
---
"client-vector-search": patch
---

creates a proper embedding index


================================================
FILE: .changeset/tall-lies-hope.md
================================================
---
"client-vector-search": patch
---

adds in-memory index creation and brute force knn search


================================================
FILE: .gitignore
================================================
node_modules
dist
test.js
mock/
.DS_Store
.pytest_cache/
test*/


================================================
FILE: .npmignore
================================================
node_modules
dist
test.js
mock/
.DS_Store
.pytest_cache/
test*/

.github/
.changeset/


================================================
FILE: .prettierrc
================================================
{
  "semi": true,
  "trailingComma": "all",
  "singleQuote": true,
  "printWidth": 80,
  "tabWidth": 2
}


================================================
FILE: CHANGELOG.md
================================================
# client-vector-search

## 0.2.0

### Minor Changes

- support for an experimental HNSW index that runs on Node and the browser, with JSON and binary serialization options

### Patch Changes

- f09bc2f: updates the docs and dynamic import for @xenova/transformers
- 46e07d6: creates a proper embedding index
- 13bddbb: adds in-memory index creation and brute force knn search


================================================
FILE: LICENSE
================================================
MIT License

Copyright (c) 2023 Yusuf Hilmi

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


================================================
FILE: README.md
================================================
# client-vector-search

A client-side vector search library that can embed, search, and cache. Works in the browser and on the server.

The default model outperforms OpenAI's text-embedding-ada-002 on embedding benchmarks, and searching locally avoids the network round-trips that make Pinecone and other hosted vector DBs slower for small workloads.

I'm the founder of [searchbase.app](https://searchbase.app) and we needed this for our product and customers. We'll be using this library in production, so you can be sure it'll be maintained and improved.

- Embed documents using transformers by default: gte-small (~30 MB).
- Calculate cosine similarity between embeddings.
- Create an index and search on the client side.
- Cache vectors with browser caching support.

Lots of improvements are coming!

## Roadmap

Our goal is to build a super simple, fast vector search that works with a couple hundred to a few thousand vectors; ~1k vectors per user covers 99% of the use cases.

We'll initially keep things super simple and under 100 ms.

### TODOs
- [ ] add an HNSW index that works in Node and browser environments, without relying on HNSW binding libs
- [ ] add a proper testing suite and CI/CD for the lib
  - [ ] simple health tests
    - [ ] mock @xenova/transformers for Jest (Jest isn't happy with it)
  - [ ] performance tests: recall, memory usage, CPU usage, etc.


## Installation

```bash
npm i client-vector-search
```


## Quickstart

This library provides a plug-and-play solution for embedding and vector search. It's designed to be easy to use, efficient, and versatile. Here's a quick start guide:


```ts
import { getEmbedding, EmbeddingIndex } from 'client-vector-search';

// getEmbedding is an async function, so use 'await' or '.then()' to get the result
const embedding = await getEmbedding('Apple'); // Returns embedding as number[]

// Each object should have an 'embedding' property of type number[]
const initialObjects = [
  { id: 1, name: 'Apple', embedding: embedding },
  { id: 2, name: 'Banana', embedding: await getEmbedding('Banana') },
  { id: 3, name: 'Cheddar', embedding: await getEmbedding('Cheddar') },
  { id: 4, name: 'Space', embedding: await getEmbedding('Space') },
  { id: 5, name: 'database', embedding: await getEmbedding('database') },
];
const index = new EmbeddingIndex(initialObjects); // Creates an index

// The query should be an embedding of type number[]
const queryEmbedding = await getEmbedding('Fruit'); // Query embedding
const results = await index.search(queryEmbedding, { topK: 5 }); // Returns top similar objects

// Specify the storage type
await index.saveIndex('indexedDB');
const storedResults = await index.search(queryEmbedding, {
  topK: 5,
  useStorage: 'indexedDB',
  // storageOptions: { // use only if you overrode the defaults
  //   indexedDBName: 'clientVectorDB',
  //   indexedDBObjectStoreName: 'ClientEmbeddingStore',
  // },
});

console.log(storedResults);

await index.deleteIndexedDB(); // if you overrode the default, specify the db name
```

## Troubleshooting

### NextJS
To use this library inside Next.js projects, update the `next.config.js` file to include the following:

```js
module.exports = {
  // Override the default webpack configuration
  webpack: (config) => {
    // See https://webpack.js.org/configuration/resolve/#resolvealias
    config.resolve.alias = {
      ...config.resolve.alias,
      sharp$: false,
      "onnxruntime-node$": false,
    };
    return config;
  },
};
```

#### Load the model after the page loads

You can initialize the model before using it to generate embeddings. This ensures the model is already loaded when you need it and provides a better UX.

```js
import { initializeModel } from "client-vector-search"
...
  useEffect(() => {
    try {
      initializeModel();
    } catch (e) {
      console.log(e);
    }
  }, []);
```

## Usage Guide

This guide provides a step-by-step walkthrough of the library's main features. It covers everything from generating embeddings for a string to performing operations on the index such as adding, updating, and removing objects. It also includes instructions on how to save the index to a database and perform search operations within it.

Until we have a reference documentation, you can find all the methods and their usage in this guide. Each step is accompanied by a code snippet to illustrate the usage of the method in question. Make sure to follow along and try out the examples in your own environment to get a better understanding of how everything works.

Let's get started!

### Step 1: Generate Embeddings for String
Generate embeddings for a given string using the `getEmbedding` method.

```ts
const embedding = await getEmbedding("Apple"); // Returns embedding as number[]
```
> **Note**: `getEmbedding` is asynchronous; make sure to use `await`.

---

### Step 2: Calculate Cosine Similarity
Calculate the cosine similarity between two embeddings.

```ts
const similarity = cosineSimilarity(embedding1, embedding2, 6);
```
> **Note**: Both embeddings should be of the same length.
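For intuition, cosine similarity is the dot product of the two vectors divided by the product of their magnitudes, and the third argument rounds the result to that many decimal places. A minimal standalone sketch of the math (the `cosineSim` name here is hypothetical; the library exports its own `cosineSimilarity`):

```ts
// Hypothetical standalone sketch of cosine similarity; the library ships its own.
const cosineSim = (a: number[], b: number[], precision = 6): number => {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]; // accumulate dot product
    normA += a[i] * a[i]; // squared magnitude of a
    normB += b[i] * b[i]; // squared magnitude of b
  }
  const sim = dot / (Math.sqrt(normA) * Math.sqrt(normB));
  return parseFloat(sim.toFixed(precision)); // round, like getEmbedding does
};

console.log(cosineSim([1, 0], [1, 0])); // 1 (identical direction)
console.log(cosineSim([1, 0], [0, 1])); // 0 (orthogonal)
```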

---

### Step 3: Create an Index
Create an index with an initial array of objects. Each object must have an 'embedding' property.

```ts
const initialObjects = [...];
const index = new EmbeddingIndex(initialObjects);
```

---

### Step 4: Add to Index
Add an object to the index.

```ts
const objectToAdd = { id: 6, name: 'Cat', embedding: await getEmbedding('Cat') };
index.add(objectToAdd);
```

---

### Step 5: Update Index
Update an existing object in the index.

```ts
const vectorToUpdate = { id: 6, name: 'Dog', embedding: await getEmbedding('Dog') };
index.update({ id: 6 }, vectorToUpdate);
```

---

### Step 6: Remove from Index
Remove an object from the index.

```ts
index.remove({ id: 6 });
```

---

### Step 7: Retrieve from Index
Retrieve an object from the index.

```ts
const vector = index.get({ id: 1 });
```

---

### Step 8: Search the Index
Search the index with a query embedding.

```ts
const queryEmbedding = await getEmbedding('Fruit');
const results = await index.search(queryEmbedding, { topK: 5 });
```

---

### Step 9: Print the Index
Print the entire index to the console.

```ts
index.printIndex();
```

---

### Step 10: Save Index to IndexedDB (for browser)
Save the index to a persistent IndexedDB database.

```ts
await index.saveIndex("indexedDB", { DBName: "clientVectorDB", objectStoreName: "ClientEmbeddingStore" });
```

---

### Important: Search in indexedDB
Perform a search operation in the IndexedDB.

```ts
const results = await index.search(queryEmbedding, {
  topK: 5,
  useStorage: "indexedDB",
  storageOptions: { // only needed if you want to override the defaults, shown here
    indexedDBName: 'clientVectorDB',
    indexedDBObjectStoreName: 'ClientEmbeddingStore'
  }
});
```

---

### Delete Database
To delete an entire database.

```ts
await IndexedDbManager.deleteIndexedDB("clientVectorDB");
```

---

### Delete Object Store
To delete an object store from a database.

```ts
await IndexedDbManager.deleteIndexedDBObjectStore("clientVectorDB", "ClientEmbeddingStore");
```

---

### Retrieve All Objects
To retrieve all objects from a specific object store.

```ts
const allObjects = await IndexedDbManager.getAllObjectsFromIndexedDB("clientVectorDB", "ClientEmbeddingStore");
```


================================================
FILE: package.json
================================================
{
  "name": "client-vector-search",
  "version": "0.2.0",
  "description": "A client side vector search library",
  "main": "dist/index.js",
  "module": "dist/index.mjs",
  "types": "dist/index.d.ts",
  "scripts": {
    "build": "tsup src/index.ts --format cjs,esm --dts",
    "dev": "tsup src/index.ts --format cjs,esm --dts --watch",
    "changeset": "changeset",
    "version": "changeset version",
    "release": "npm run build && changeset publish",
    "lint": "tsc"
  },
  "repository": {
    "type": "git",
    "url": "git+https://github.com/yusufhilmi/client-vector-search.git"
  },
  "keywords": [
    "vector",
    "search",
    "embeddings",
    "nlp",
    "models"
  ],
  "author": "yusufhilmi",
  "license": "MIT",
  "bugs": {
    "url": "https://github.com/yusufhilmi/client-vector-search/issues"
  },
  "homepage": "https://github.com/yusufhilmi/client-vector-search#readme",
  "devDependencies": {
    "@changesets/cli": "^2.26.2",
    "fake-indexeddb": "^4.0.2",
    "tsup": "^6.5.0",
    "typescript": "^4.9.5"
  },
  "dependencies": {
    "@msgpack/msgpack": "^3.0.0-beta2",
    "@xenova/transformers": "^2.5.2",
    "lru-cache": "^10.0.1"
  }
}


================================================
FILE: src/cache.ts
================================================
import { LRUCache } from 'lru-cache';

class Cache {
  private static instance: LRUCache<string, any[]>;

  private constructor() {}

  public static getInstance(
    max: number = 10000,
    ttl: number = 1000 * 60 * 10,
  ): LRUCache<string, any[]> {
    if (!Cache.instance) {
      // lru-cache v7+ renamed `maxAge` to `ttl`; each entry counts as size 1 by default
      const options = {
        max: max,
        ttl: ttl,
      };
      Cache.instance = new LRUCache<string, any[]>(options);
    }
    return Cache.instance;
  }
}

export default Cache;


================================================
FILE: src/hnsw.ts
================================================
// An experimental HNSW implementation that doesn't rely on HNSW binding libs, which each work only in the browser or only in Node
// TODOS:
// - bare bones
// - find # layers and optimal params
// - test the speed, accuracy, and memory usage
import { encode, decode } from '@msgpack/msgpack';

type Vector = number[];
type Distance = number;
type NodeIndex = number;
type Layer = LayerNode[];

interface LayerNode {
  vector: Vector;
  connections: NodeIndex[];
  layerBelow: NodeIndex | null;
}

interface HNSWData {
  L: number;
  mL: number;
  efc: number;
  index: Layer[];
}

// Simple Priority Queue Implementation
class PriorityQueue<T> {
  private elements: T[];
  private compareFn: (a: T, b: T) => number;

  constructor(elements: T[], compareFn: (a: T, b: T) => number) {
    this.elements = elements;
    this.compareFn = compareFn;
    this.elements.sort(this.compareFn);
  }

  push(element: T) {
    this.elements.push(element);
    this.elements.sort(this.compareFn);
  }

  pop(): T | null {
    return this.elements.shift() || null;
  }

  isEmpty(): boolean {
    return this.elements.length === 0;
  }
}

const EuclideanDistance = (a: Vector, b: Vector): Distance => {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }

  return Math.sqrt(
    a.reduce((acc, val, i) => {
      const bVal = b[i]; // Check b[i] in a variable
      if (bVal === undefined) throw new Error('b[i] is undefined');
      return acc + Math.pow(val - bVal, 2);
    }, 0),
  );
};

const getInsertLayer = (L: number, mL: number): number => {
  return Math.min(-Math.floor(Math.log(Math.random()) * mL), L - 1);
};
const _searchLayer = (
  graph: Layer,
  entry: NodeIndex,
  query: Vector,
  ef: number,
): [Distance, NodeIndex][] => {
  if (entry < 0 || entry >= graph.length) {
    throw new Error(`Invalid entry index: ${entry}`);
  }

  // Check if the graph at the entry index is defined
  const graphEntry = graph[entry];
  if (!graphEntry) {
    throw new Error(`Graph entry at index ${entry} is undefined`);
  }

  const best: [Distance, NodeIndex] = [
    EuclideanDistance(graphEntry.vector, query),
    entry,
  ];
  const nns: [Distance, NodeIndex][] = [best];
  const visited = new Set([best[1]]);
  const candidates = new PriorityQueue<[Distance, NodeIndex]>(
    [best],
    (a, b) => a[0] - b[0],
  );

  while (!candidates.isEmpty()) {
    const current = candidates.pop();
    // Define a variable to hold the last element of nns array
    const lastNnsElement = nns.length > 0 ? nns[nns.length - 1] : null;
    // Check if current is not null and lastNnsElement is not undefined before comparing their values
    if (!current || (lastNnsElement && lastNnsElement[0] < current[0])) break;

    const graphCurrent = graph[current[1]];
    if (!graphCurrent) continue;

    for (const e of graphCurrent.connections) {
      const graphE = graph[e];
      if (!graphE) continue;

      const dist = EuclideanDistance(graphE.vector, query);
      if (!visited.has(e)) {
        visited.add(e);
        const lastNn = nns[nns.length - 1];
        if (!lastNn || dist < lastNn[0] || nns.length < ef) {
          candidates.push([dist, e]);
          nns.push([dist, e]);
          nns.sort((a, b) => a[0] - b[0]);
          if (nns.length > ef) {
            nns.pop();
          }
        }
      }
    }
  }

  return nns;
};
export class ExperimentalHNSWIndex {
  private L: number;
  private mL: number;
  private efc: number;
  private index: Layer[];

  constructor(L = 5, mL = 0.62, efc = 10) {
    this.L = L;
    this.mL = mL;
    this.efc = efc;
    this.index = Array.from({ length: L }, () => []);
  }
  setIndex(index: Layer[]): void {
    this.index = index;
  }

  insert(vec: Vector) {
    const l = getInsertLayer(this.L, this.mL);
    let startV = 0;

    for (let n = 0; n < this.L; n++) {
      const graph = this.index[n];

      if (graph?.length === 0) {
        // If the graph layer is empty, add a new node to it
        // Assign next layer to a variable and check if it's undefined
        const nextLayer = this.index[n + 1];
        const nextLayerLength = nextLayer ? nextLayer.length : null;
        graph?.push({
          vector: vec,
          connections: [],
          layerBelow: n < this.L - 1 ? nextLayerLength : null,
        });
        continue;
      }

      if (n < l && graph) {
        // Check if the search layer result is not undefined before accessing its properties
        const searchLayerResult = _searchLayer(graph, startV, vec, 1);
        startV =
          searchLayerResult && searchLayerResult[0]
            ? searchLayerResult[0][1]
            : startV;
      } else if (graph) {
        // Assign next layer to a variable and check if it's undefined
        const nextLayer = this.index[n + 1];
        const nextLayerLength = nextLayer ? nextLayer.length : null;
        const node: LayerNode = {
          vector: vec,
          connections: [],
          layerBelow: n < this.L - 1 ? nextLayerLength : null,
        };
        const nns = _searchLayer(graph, startV, vec, this.efc);
        for (const nn of nns) {
          node.connections.push(nn[1]);
          graph[nn[1]]?.connections.push(graph.length);
        }
        graph?.push(node);
        // Assign graph[startV] to a variable and check if it's undefined before accessing its properties
        const graphStartV = graph[startV];
        if (graphStartV) startV = graphStartV.layerBelow!;
      }
    }
  }

  search(query: Vector, ef = 1): [Distance, NodeIndex][] {
    if (this.index && this.index[0] && this.index[0].length === 0) {
      return [];
    }

    let bestV = 0;
    for (const graph of this.index) {
      const searchLayer = _searchLayer(graph, bestV, query, ef);
      if (searchLayer && searchLayer[0]) {
        bestV = searchLayer[0][1];
        if (graph[bestV]?.layerBelow === null) {
          return _searchLayer(graph, bestV, query, ef);
        }
        bestV = graph[bestV]?.layerBelow!;
      }
    }
    return [];
  }

  toJSON() {
    return {
      L: this.L,
      mL: this.mL,
      efc: this.efc,
      index: this.index,
    };
  }

  static fromJSON(json: any): ExperimentalHNSWIndex {
    const hnsw = new ExperimentalHNSWIndex(json.L, json.mL, json.efc);
    hnsw.setIndex(json.index);
    return hnsw;
  }

  toBinary() {
    return encode({
      L: this.L,
      mL: this.mL,
      efc: this.efc,
      index: this.index,
    });
  }

  static fromBinary(binary: Uint8Array): ExperimentalHNSWIndex {
    const data = decode(binary) as HNSWData;
    const hnsw = new ExperimentalHNSWIndex(data.L, data.mL, data.efc);
    hnsw.setIndex(data.index);
    return hnsw;
  }
}


================================================
FILE: src/index.ts
================================================
const DEFAULT_TOP_K = 3;

interface Filter {
  [key: string]: any;
}

import Cache from './cache';
import { IndexedDbManager } from './indexedDB';
import { cosineSimilarity } from './utils';
export { ExperimentalHNSWIndex } from './hnsw';

// uncomment if you want to test indexedDB implementation in node env for faster dev cycle
// import { IDBFactory } from 'fake-indexeddb';
// const indexedDB = new IDBFactory();

export interface SearchResult {
  similarity: number;
  object: any;
}

type StorageOptions = 'indexedDB' | 'localStorage' | 'none';

/**
 * Interface for search options in the EmbeddingIndex class.
 * topK: The number of top similar items to return.
 * filter: An optional filter to apply to the objects before searching.
 * useStorage: A flag to indicate whether to use storage options like indexedDB or localStorage.
 */
interface SearchOptions {
  topK?: number;
  filter?: Filter;
  useStorage?: StorageOptions;
  storageOptions?: { indexedDBName: string; indexedDBObjectStoreName: string }; // TODO: generalize it to localStorage as well
}

const cacheInstance = Cache.getInstance();

let pipe: any;
let currentModel: string;

export const initializeModel = async (
  model: string = 'Xenova/gte-small',
): Promise<void> => {
  if (model !== currentModel) {
    const transformersModule = await import('@xenova/transformers');
    const pipeline = transformersModule.pipeline;
    pipe = await pipeline('feature-extraction', model);
    currentModel = model;
  }
};

export const getEmbedding = async (
  text: string,
  precision: number = 7,
  options = { pooling: 'mean', normalize: false },
  model = 'Xenova/gte-small',
): Promise<number[]> => {
  const cachedEmbedding = cacheInstance.get(text);
  if (cachedEmbedding) {
    return Promise.resolve(cachedEmbedding);
  }

  if (model !== currentModel) {
    await initializeModel(model);
  }

  const output = await pipe(text, options);
  const roundedOutput = Array.from(output.data as number[]).map(
    (value: number) => parseFloat(value.toFixed(precision)),
  );
  cacheInstance.set(text, roundedOutput);
  return roundedOutput;
};

export class EmbeddingIndex {
  private objects: Filter[];
  private keys: string[];

  constructor(initialObjects?: Filter[]) {
    // TODO: add support for options while creating index such as  {... indexedDB: true, ...}
    this.objects = [];
    this.keys = [];
    if (initialObjects && initialObjects.length > 0) {
      initialObjects.forEach((obj) => this.validateAndAdd(obj));
      if (initialObjects[0]) {
        this.keys = Object.keys(initialObjects[0]);
      }
    }
  }

  private findVectorIndex(filter: Filter): number {
    return this.objects.findIndex((object) =>
      Object.keys(filter).every((key) => object[key] === filter[key]),
    );
  }

  private validateAndAdd(obj: Filter) {
    if (!Array.isArray(obj.embedding) || obj.embedding.some(isNaN)) {
      throw new Error(
        'Object must have an embedding property of type number[]',
      );
    }
    if (this.keys.length === 0) {
      this.keys = Object.keys(obj);
    } else if (!this.keys.every((key) => key in obj)) {
      throw new Error(
        'Object must have the same properties as the initial objects',
      );
    }
    this.objects.push(obj);
  }

  add(obj: Filter) {
    this.validateAndAdd(obj);
  }

  // Method to update an existing vector in the index
  update(filter: Filter, vector: Filter) {
    const index = this.findVectorIndex(filter);
    if (index === -1) {
      throw new Error('Vector not found');
    }
    if (vector.hasOwnProperty('embedding')) {
      // Validate the new embedding without pushing a duplicate entry
      if (!Array.isArray(vector.embedding) || vector.embedding.some(isNaN)) {
        throw new Error(
          'Object must have an embedding property of type number[]',
        );
      }
    }
    // Merge the new fields into the existing vector
    this.objects[index] = Object.assign(this.objects[index] as Filter, vector);
  }

  // Method to remove a vector from the index
  remove(filter: Filter) {
    const index = this.findVectorIndex(filter);
    if (index === -1) {
      throw new Error('Vector not found');
    }
    // Remove the vector from the index
    this.objects.splice(index, 1);
  }

  // Method to remove multiple vectors from the index
  removeBatch(filters: Filter[]) {
    filters.forEach((filter) => {
      const index = this.findVectorIndex(filter);
      if (index !== -1) {
        // Remove the vector from the index
        this.objects.splice(index, 1);
      }
    });
  }

  // Method to retrieve a vector from the index
  get(filter: Filter) {
    const vector = this.objects[this.findVectorIndex(filter)];
    return vector || null;
  }

  size(): number {
    // Returns the size of the index
    return this.objects.length;
  }

  clear() {
    this.objects = [];
  }

  async search(
    queryEmbedding: number[],
    options: SearchOptions = {
      topK: 3,
      useStorage: 'none',
      storageOptions: {
        indexedDBName: 'clientVectorDB',
        indexedDBObjectStoreName: 'ClientEmbeddingStore',
      },
    },
  ): Promise<SearchResult[]> {
    const topK = options.topK || DEFAULT_TOP_K;
    const filter = options.filter || {};
    const useStorage = options.useStorage || 'none';

    if (useStorage === 'indexedDB') {
      const DBname = options.storageOptions?.indexedDBName || 'clientVectorDB';
      const objectStoreName =
        options.storageOptions?.indexedDBObjectStoreName ||
        'ClientEmbeddingStore';

      if (typeof indexedDB === 'undefined') {
        console.error('IndexedDB is not supported');
        throw new Error('IndexedDB is not supported');
      }
      const results = await this.loadAndSearchFromIndexedDB(
        DBname,
        objectStoreName,
        queryEmbedding,
        topK,
        filter,
      );
      return results;
    } else {
      // Compute similarities
      const similarities = this.objects
        .filter((object) =>
          Object.keys(filter).every((key) => object[key] === filter[key]),
        )
        .map((obj) => ({
          similarity: cosineSimilarity(queryEmbedding, obj.embedding),
          object: obj,
        }));

      // Sort by similarity and return topK results
      return similarities
        .sort((a, b) => b.similarity - a.similarity)
        .slice(0, topK);
    }
  }

  printIndex() {
    console.log('Index Content:');
    this.objects.forEach((obj, idx) => {
      console.log(`Item ${idx + 1}:`, obj);
    });
  }

  async saveIndex(
    storageType: string,
    options: { DBName: string; objectStoreName: string } = {
      DBName: 'clientVectorDB',
      objectStoreName: 'ClientEmbeddingStore',
    },
  ) {
    if (storageType === 'indexedDB') {
      await this.saveToIndexedDB(options.DBName, options.objectStoreName);
    } else {
      throw new Error(
        `Unsupported storage type: ${storageType} \n Supported storage types: "indexedDB"`,
      );
    }
  }

  async saveToIndexedDB(
    DBname: string = 'clientVectorDB',
    objectStoreName: string = 'ClientEmbeddingStore',
  ): Promise<void> {
    if (typeof indexedDB === 'undefined') {
      console.error('IndexedDB is not defined');
      throw new Error('IndexedDB is not supported');
    }

    if (!this.objects || this.objects.length === 0) {
      throw new Error('Index is empty. Nothing to save');
    }

    try {
      const db = await IndexedDbManager.create(DBname, objectStoreName);
      await db.addToIndexedDB(this.objects);
      console.log(
        `Index saved to database '${DBname}' object store '${objectStoreName}'`,
      );
    } catch (error) {
      console.error('Error saving index to database:', error);
      throw new Error('Error saving index to database');
    }
  }

  async loadAndSearchFromIndexedDB(
    DBname: string = 'clientVectorDB',
    objectStoreName: string = 'ClientEmbeddingStore',
    queryEmbedding: number[],
    topK: number,
    filter: { [key: string]: any },
  ): Promise<SearchResult[]> {
    const db = await IndexedDbManager.create(DBname, objectStoreName);
    const generator = db.dbGenerator();
    const results: { similarity: number; object: any }[] = [];

    for await (const record of generator) {
      if (Object.keys(filter).every((key) => record[key] === filter[key])) {
        const similarity = cosineSimilarity(queryEmbedding, record.embedding);
        results.push({ similarity, object: record });
      }
    }
    results.sort((a, b) => b.similarity - a.similarity);
    return results.slice(0, topK);
  }

  async deleteIndexedDB(DBname: string = 'clientVectorDB'): Promise<void> {
    if (typeof indexedDB === 'undefined') {
      console.error('IndexedDB is not defined');
      throw new Error('IndexedDB is not supported');
    }
    return new Promise((resolve, reject) => {
      const request = indexedDB.deleteDatabase(DBname);

      request.onsuccess = () => {
        console.log(`Database '${DBname}' deleted`);
        resolve();
      };
      request.onerror = (event) => {
        console.error('Failed to delete database', event);
        reject(new Error('Failed to delete database'));
      };
    });
  }

  async deleteIndexedDBObjectStore(
    DBname: string = 'clientVectorDB',
    objectStoreName: string = 'ClientEmbeddingStore',
  ): Promise<void> {
    const db = await IndexedDbManager.create(DBname, objectStoreName);

    try {
      await db.deleteIndexedDBObjectStoreFromDB(DBname, objectStoreName);
      console.log(
        `Object store '${objectStoreName}' deleted from database '${DBname}'`,
      );
    } catch (error) {
      console.error('Error deleting object store:', error);
      throw new Error('Error deleting object store');
    }
  }

  async getAllObjectsFromIndexedDB(
    DBname: string = 'clientVectorDB',
    objectStoreName: string = 'ClientEmbeddingStore',
  ): Promise<any[]> {
    const db = await IndexedDbManager.create(DBname, objectStoreName);
    const objects: any[] = [];
    for await (const record of db.dbGenerator()) {
      objects.push(record);
    }
    return objects;
  }
}


================================================
FILE: src/indexedDB.ts
================================================
// uncomment for testing only
// import { IDBFactory } from 'fake-indexeddb';
// const indexedDB = new IDBFactory();

export class IndexedDbManager {
  private DBname!: string;
  private objectStoreName!: string;

  constructor(DBname: string, objectStoreName: string) {
    this.DBname = DBname;
    this.objectStoreName = objectStoreName;
  }

  static async create(
    DBname: string = 'clientVectorDB',
    objectStoreName: string = 'ClientEmbeddingStore',
    index: string | null = null,
  ): Promise<IndexedDbManager> {
    const instance = new IndexedDbManager(DBname, objectStoreName);
    return new Promise((resolve, reject) => {
      const request = indexedDB.open(DBname);
      let db: IDBDatabase;

      request.onerror = (event) => {
        console.error('IndexedDB error:', event);
        reject(new Error('Database initialization failed'));
      };

      request.onsuccess = async () => {
        db = request.result;
        const needsStore = !db.objectStoreNames.contains(objectStoreName);
        // Close before any upgrade: a version change cannot complete while
        // this connection is still open.
        db.close();
        if (needsStore) {
          await instance.createObjectStore(index);
        }
        resolve(instance);
      };
    });
  }

  async createObjectStore(index: string | null = null): Promise<void> {
    return new Promise((resolve, reject) => {
      const request = indexedDB.open(this.DBname);
      request.onsuccess = () => {
        const db1 = request.result;
        const version = db1.version;
        db1.close();
        // Re-open with a bumped version so onupgradeneeded fires and the
        // store can be created there.
        const request_2 = indexedDB.open(this.DBname, version + 1);
        request_2.onupgradeneeded = () => {
          const db2 = request_2.result;
          if (!db2.objectStoreNames.contains(this.objectStoreName)) {
            const objectStore = db2.createObjectStore(this.objectStoreName, {
              autoIncrement: true,
            });
            if (index) {
              objectStore.createIndex(`by_${index}`, index, { unique: false });
            }
          }
        };
        request_2.onsuccess = () => {
          const db2 = request_2.result;
          console.log('Object store creation successful');
          db2.close();
          resolve();
        };
        request_2.onerror = (event) => {
          console.error('Error creating object store:', event);
          reject(new Error('Error creating object store'));
        };
      };
      request.onerror = (event) => {
        console.error('Error opening database:', event);
        reject(new Error('Error opening database'));
      };
    });
  }

  async addToIndexedDB(
    objs: { [key: string]: any }[] | { [key: string]: any },
  ): Promise<void> {
    return new Promise((resolve, reject) => {
      const request = indexedDB.open(this.DBname);

      request.onsuccess = () => {
        const db = request.result;
        let transaction: IDBTransaction;
        try {
          transaction = db.transaction([this.objectStoreName], 'readwrite');
        } catch (error) {
          // transaction() throws synchronously if the store does not exist.
          db.close();
          reject(error as Error);
          return;
        }
        const objectStore = transaction.objectStore(this.objectStoreName);

        const records = Array.isArray(objs) ? objs : [objs];

        records.forEach((obj: { [key: string]: any }) => {
          const addRequest = objectStore.add(obj);

          // A throw inside this handler would not reject the outer promise,
          // so reject explicitly instead.
          addRequest.onerror = (event) => {
            console.error('Failed to add object', event);
            reject(new Error('Failed to add object'));
          };
        });

        transaction.oncomplete = () => {
          resolve();
        };

        transaction.onerror = (event) => {
          console.error('Failed to add object', event);
          reject(new Error('Failed to add object'));
        };
        // Safe to call now: close() waits for pending transactions to finish.
        db.close();
      };

      request.onerror = (event) => {
        console.error('Failed to open database', event);
        reject(new Error('Failed to open database'));
      };
    });
  }

  async *dbGenerator(): AsyncGenerator<any, void, undefined> {
    const objectStoreName = this.objectStoreName;
    const dbOpenPromise = new Promise<IDBDatabase>((resolve, reject) => {
      const request = indexedDB.open(this.DBname);
      request.onsuccess = () => {
        resolve(request.result);
      };
      request.onerror = () => {
        reject(new Error('Could not open DB'));
      };
    });

    try {
      const db = await dbOpenPromise;
      const transaction = db.transaction([objectStoreName], 'readonly');
      const objectStore = transaction.objectStore(objectStoreName);
      const request = objectStore.openCursor();

      let promiseResolver: (value: any) => void;
      let promiseRejecter: (reason: Error) => void;

      request.onsuccess = function (event: Event) {
        const cursor = (event.target as IDBRequest<IDBCursorWithValue>).result;
        if (cursor) {
          promiseResolver(cursor.value);
          cursor.continue();
        } else {
          promiseResolver(null);
        }
      };

      // Without an error handler, a failed cursor request would leave the
      // generator awaiting forever.
      request.onerror = () => {
        promiseRejecter(new Error('Cursor iteration failed'));
      };

      while (true) {
        const value = await new Promise<any>((resolve, reject) => {
          promiseResolver = resolve;
          promiseRejecter = reject;
        });
        if (value === null) break;
        yield value;
      }

      db.close();
    } catch (error) {
      console.error('An error occurred:', error);
    }
  }
  async deleteIndexedDBObjectStoreFromDB(
    DBname: string,
    objectStoreName: string,
  ): Promise<void> {
    return new Promise((resolve, reject) => {
      const request = indexedDB.open(this.DBname);

      request.onsuccess = () => {
        const db = request.result;
        const version = db.version;
        db.close();
        const request_2 = indexedDB.open(db.name, version + 1);
        request_2.onupgradeneeded = async () => {
          let db2 = request_2.result;
          if (db2.objectStoreNames.contains(objectStoreName)) {
            db2.deleteObjectStore(objectStoreName);
          } else {
            console.error(
              `Object store '${objectStoreName}' not found in database '${DBname}'`,
            );
            reject(
              new Error(
                `Object store '${objectStoreName}' not found in database '${DBname}'`,
              ),
            );
          }
        };
        request_2.onsuccess = () => {
          let db2 = request_2.result;
          console.log('Object store deletion successful');
          db2.close();
          resolve();
        };
        request_2.onerror = (event) => {
          console.error('Failed to delete object store', event);
          // request_2.result is not available after an error, so there is no
          // connection to close here.
          reject(new Error('Failed to delete object store'));
        };
      };
      request.onerror = (event) => {
        console.error('Failed to open database', event);
        reject(new Error('Failed to open database'));
      };
    });
  }
}
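Taken together, the typical lifecycle is create → add → iterate. A hedged sketch (browser environment assumed; `collectAll` is an illustrative helper, not library API, and the `declare` stub stands in for importing the class above):

```typescript
// Drain any async generator into an array; used below with dbGenerator().
async function collectAll<T>(gen: AsyncGenerator<T, void, undefined>): Promise<T[]> {
  const out: T[] = [];
  for await (const item of gen) out.push(item);
  return out;
}

// Illustrative stub so the sketch stands alone; in practice, use the
// IndexedDbManager class defined in this file.
declare const IndexedDbManager: {
  create(db?: string, store?: string, index?: string | null): Promise<{
    addToIndexedDB(objs: object | object[]): Promise<void>;
    dbGenerator(): AsyncGenerator<any, void, undefined>;
  }>;
};

async function demo() {
  // create() opens the database and, if the store is missing, version-bumps
  // the database to create it.
  const manager = await IndexedDbManager.create('clientVectorDB', 'ClientEmbeddingStore');
  await manager.addToIndexedDB([
    { name: 'apple', embedding: [0.1, 0.2, 0.3] },
    { name: 'pear', embedding: [0.2, 0.1, 0.0] },
  ]);
  const records = await collectAll(manager.dbGenerator());
  console.log(records.length);
}
```

Each method opens its own connection and closes it when done, so the manager holds no long-lived handle between calls.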


================================================
FILE: src/utils.ts
================================================
export const cosineSimilarity = (
  vecA: number[],
  vecB: number[],
  precision: number = 6,
): number => {
  // Check if both vectors have the same length
  if (vecA.length !== vecB.length) {
    throw new Error('Vectors must have the same length');
  }

  // Compute dot product and magnitudes
  const dotProduct = vecA.reduce((sum, a, i) => {
    const b = vecB[i]; // Extract value safely
    return sum + a * (b !== undefined ? b : 0); // Check for undefined
  }, 0);
  const magnitudeA = Math.sqrt(vecA.reduce((sum, a) => sum + a * a, 0));
  const magnitudeB = Math.sqrt(vecB.reduce((sum, b) => sum + b * b, 0));

  // Check if either magnitude is zero
  if (magnitudeA === 0 || magnitudeB === 0) {
    return 0;
  }

  // Calculate cosine similarity and round to specified precision
  return parseFloat(
    (dotProduct / (magnitudeA * magnitudeB)).toFixed(precision),
  );
};
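For example (the function is restated here so the snippet runs standalone; the values follow from the definition above):

```typescript
// Restated from src/utils.ts so this snippet is self-contained.
const cosineSimilarity = (vecA: number[], vecB: number[], precision = 6): number => {
  if (vecA.length !== vecB.length) throw new Error('Vectors must have the same length');
  const dot = vecA.reduce((sum, a, i) => sum + a * (vecB[i] ?? 0), 0);
  const magA = Math.sqrt(vecA.reduce((s, a) => s + a * a, 0));
  const magB = Math.sqrt(vecB.reduce((s, b) => s + b * b, 0));
  if (magA === 0 || magB === 0) return 0;
  return parseFloat((dot / (magA * magB)).toFixed(precision));
};

console.log(cosineSimilarity([1, 2], [2, 4])); // parallel vectors → 1
console.log(cosineSimilarity([1, 0], [0, 1])); // orthogonal vectors → 0
console.log(cosineSimilarity([0, 0], [1, 1])); // zero vector → 0 by convention
```

Returning 0 for a zero-magnitude vector is a design choice: mathematically the similarity is undefined there, but 0 keeps search code free of special cases.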


================================================
FILE: tsconfig.json
================================================
{
	"compilerOptions": {
		"target": "ESNext",
		"lib": ["ESNext", "DOM"],
		"module": "esnext",
		"rootDir": "./src",
		"moduleResolution": "node",
		"baseUrl": "./",
		"resolveJsonModule": true,
		"allowJs": true,
		"declaration": true,
		"declarationMap": true,
		"sourceMap": true,
		"outDir": "./dist",
		"noUnusedParameters": true,
		"noUnusedLocals": true,
		// "target": "es2016", /* Set the JavaScript language version for emitted JavaScript and include compatible library declarations. */
		// "module": "commonjs", /* Specify what module code is generated. */
		"esModuleInterop": true, /* Emit additional JavaScript to ease support for importing CommonJS modules. This enables 'allowSyntheticDefaultImports' for type compatibility. */
		"forceConsistentCasingInFileNames": true, /* Ensure that casing is correct in imports. */
		"strict": true, /* Enable all strict type-checking options. */
		"skipLibCheck": true, /* Skip type checking all .d.ts files. */
		"noUncheckedIndexedAccess": true,
		"noEmit": true
	}
}
SYMBOL INDEX (54 symbols across 4 files)

FILE: src/cache.ts
  class Cache (line 3) | class Cache {
    method constructor (line 6) | private constructor() {}
    method getInstance (line 8) | public static getInstance(

FILE: src/hnsw.ts
  type Vector (line 8) | type Vector = number[];
  type Distance (line 9) | type Distance = number;
  type NodeIndex (line 10) | type NodeIndex = number;
  type Layer (line 11) | type Layer = LayerNode[];
  type LayerNode (line 13) | interface LayerNode {
  type HNSWData (line 19) | interface HNSWData {
  class PriorityQueue (line 27) | class PriorityQueue<T> {
    method constructor (line 31) | constructor(elements: T[], compareFn: (a: T, b: T) => number) {
    method push (line 37) | push(element: T) {
    method pop (line 42) | pop(): T | null {
    method isEmpty (line 46) | isEmpty(): boolean {
  class ExperimentalHNSWIndex (line 127) | class ExperimentalHNSWIndex {
    method constructor (line 133) | constructor(L = 5, mL = 0.62, efc = 10) {
    method setIndex (line 139) | setIndex(index: Layer[]): void {
    method insert (line 143) | insert(vec: Vector) {
    method search (line 192) | search(query: Vector, ef = 1): [Distance, NodeIndex][] {
    method toJSON (line 211) | toJSON() {
    method fromJSON (line 220) | static fromJSON(json: any): ExperimentalHNSWIndex {
    method toBinary (line 225) | toBinary() {
    method fromBinary (line 234) | static fromBinary(binary: Uint8Array): ExperimentalHNSWIndex {

FILE: src/index.ts
  constant DEFAULT_TOP_K (line 1) | const DEFAULT_TOP_K = 3;
  type Filter (line 3) | interface Filter {
  type SearchResult (line 16) | interface SearchResult {
  type StorageOptions (line 21) | type StorageOptions = 'indexedDB' | 'localStorage' | 'none';
  type SearchOptions (line 29) | interface SearchOptions {
  class EmbeddingIndex (line 75) | class EmbeddingIndex {
    method constructor (line 79) | constructor(initialObjects?: Filter[]) {
    method findVectorIndex (line 91) | private findVectorIndex(filter: Filter): number {
    method validateAndAdd (line 97) | private validateAndAdd(obj: Filter) {
    method add (line 113) | add(obj: Filter) {
    method update (line 118) | update(filter: Filter, vector: Filter) {
    method remove (line 132) | remove(filter: Filter) {
    method removeBatch (line 142) | removeBatch(filters: Filter[]) {
    method get (line 153) | get(filter: Filter) {
    method size (line 158) | size(): number {
    method clear (line 163) | clear() {
    method search (line 167) | async search(
    method printIndex (line 218) | printIndex() {
    method saveIndex (line 225) | async saveIndex(
    method saveToIndexedDB (line 241) | async saveToIndexedDB(
    method loadAndSearchFromIndexedDB (line 266) | async loadAndSearchFromIndexedDB(
    method deleteIndexedDB (line 287) | async deleteIndexedDB(DBname: string = 'clientVectorDB'): Promise<void> {
    method deleteIndexedDBObjectStore (line 306) | async deleteIndexedDBObjectStore(
    method getAllObjectsFromIndexedDB (line 323) | async getAllObjectsFromIndexedDB(

FILE: src/indexedDB.ts
  class IndexedDbManager (line 5) | class IndexedDbManager {
    method constructor (line 9) | constructor(DBname: string, objectStoreName: string) {
    method create (line 14) | static async create(
    method createObjectStore (line 41) | async createObjectStore(index: string | null = null): Promise<void> {
    method addToIndexedDB (line 78) | async addToIndexedDB(
    method dbGenerator (line 115) | async *dbGenerator(): AsyncGenerator<any, void, undefined> {
    method deleteIndexedDBObjectStoreFromDB (line 159) | async deleteIndexedDBObjectStoreFromDB(

About this extraction

This page contains the full source code of the yusufhilmi/client-vector-search GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 19 files (35.9 KB), approximately 9.5k tokens, and a symbol index with 54 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.